NASA MODIS data
So something I recently worked on was looking at NASA satellite images. As is with the scientific method, I was originally curious about snow cover on land in a particular area, and wondered what kind of information could be learned from looking at satellite images. A quick search lead me to the MODIS data that NASA collects, and publishes.
With snow cover information, it allows you to form a view onto how much water inflow there might be in the future. But how does one quantify snow cover? From the point of view of a satellite, looking down on an area and capturing an image, you will get:
Now how do you determine what is snow cover?
Normalized Difference Snow Index (NDSI)
Luckily we have this numerical indicator that a smart person came up with, that gives us a ratio of spectral bands over an area, which in turn gives us a ratio of how much radiation is reflected in each band. Snow absorbs short wave infrared, and cloud does not, which is the key way of determining the difference between snow and cloud. This indicator is mainly used in looking at glacier movement, being able to accurate track the effects of global warming. NASA has a publication on it of course.
Again, NASA has a great tool that allows you to select your area of interest, and then it will find corresponding tiles for you. And this is the beginning of exploring GIS. I am not knowledgeable in GIS by any means, and there is definitely a lot of room for improvement.
The format of the files are HDF, which is basically just a binary file format designed to organise large amounts of data. For the MODIS files, the data structure is that of raster images, which is a matrix structure of pixels. To deal with the HDF file formats, I use the h4toh5 converter tool, to change it into a format that I can use with Python hdf library, which mainly deal with the h5 format. After all the data is in the proper format for Python, we are able to begin.
Looking at the area of interest
The first step is to only look at pixels that we care about in each tile. If you are lucky, then you might only have to look at a single tile that contains all your information. But sometimes the area you want to look at happens to need more than a single tile. The first step in grabbing the pixels of interest is to convert the MODIS projection into longitude and latitude.
I do this with the
sinu = pyproj.Proj("+proj=sinu +R=6371007.181 [email protected] +wktext") wgs84 = pyproj.Proj("+init=EPSG:4326") lon, lat = pyproj.transform(sinu, wgs84, xv, yv)
If you have a KML file, say from Google Maps or Google Earth, then you can
convert it into a numerical polygon using
shapely package. Then essentially
what I’ve done is loop over all the points in a single satellite tile(s)
lon/lat and check if it falls within the shape of interest, and output a 2D
numpy array that indicates whether or not that pixel is one we care about.
This gives us a masking layer to apply over the satellite images
Extracting pixels of interest
With the mask layer is effectively a filter, letting us extract pixels that we care about from each satellite image. The task is then to grab the NDSI values from the tile for the pixel that falls within our area of interest.
The rest of the steps involve doing smoothing and clustering, and then finally some averaging to give an output NDSI factor for a particular area of interest.
import h5py import pyproj import keytree import numpy as np import psycopg2 from scipy import interpolate from shapely.geometry import box, shape from astropy.convolution import convolve, Gaussian2DKernel from sklearn.cluster import KMeans, AgglomerativeClustering, MiniBatchKMeans from sklearn.preprocessing import scale import matplotlib as mpl import matplotlib.pyplot as plt from mpl_toolkits.basemap import Basemap
Some of these will require local distributions, I did this all in Linux, so
apt-get got me all the packages I needed.
Is this the best way of solving the task I had? I’m not totally sure, I learned about PostGIS and spatial indexing recently, and it seems like a pretty useful tool, although maybe not needed in my application. The slowest part of my Python script is when I was extracting and processing pixels from each tile, so some improvements could be made there.
All in all, it was a fun exercise and it’s pretty neat being able to play with map objects.