Python GIS first try
NASA MODIS data #
So something I recently worked on was looking at NASA satellite images. As is
with the scientific method, I was originally curious about snow cover on land
in a particular area, and wondered what kind of information could be learned
from looking at satellite images. A quick search lead me to the MODIS
data that NASA collects, and publishes.
With snow cover information, it allows you to form a view onto how much water
inflow there might be in the future. But how does one quantify snow cover?
From the point of view of a satellite, looking down on an area and capturing
an image, you will get:
Now how do you determine what is snow cover?
Normalized Difference Snow Index (NDSI) #
Luckily we have this numerical indicator that a smart person came up with,
that gives us a ratio of spectral bands over an area, which in turn gives us a
ratio of how much radiation is reflected in each band. Snow absorbs short wave
infrared, and cloud does not, which is the key way of determining the
difference between snow and cloud. This indicator is mainly used in looking at
glacier movement, being able to accurate track the effects of global warming.
NASA has a publication on it of course.
Again, NASA has a great tool that allows you to select your area of
interest, and then it will find corresponding tiles for you. And this is the
beginning of exploring GIS. I am not knowledgeable in GIS by any means, and
there is definitely a lot of room for improvement.
The data #
The format of the files are HDF, which is basically just a binary file
format designed to organise large amounts of data. For the MODIS files, the
data structure is that of raster images, which is a matrix structure of
pixels. To deal with the HDF file formats, I use the h4toh5 converter
tool, to change it into a format that I can use with Python hdf library, which
mainly deal with the h5 format. After all the data is in the proper format for
Python, we are able to begin.
Looking at the area of interest #
The first step is to only look at pixels that we care about in each tile. If
you are lucky, then you might only have to look at a single tile that contains
all your information. But sometimes the area you want to look at happens to
need more than a single tile. The first step in grabbing the pixels of
interest is to convert the MODIS projection into longitude and latitude.
I do this with the
sinu = pyproj.Proj("+proj=sinu +R=6371007.181 +nadgrids=@null +wktext") wgs84 = pyproj.Proj("+init=EPSG:4326") lon, lat = pyproj.transform(sinu, wgs84, xv, yv)
If you have a KML file, say from Google Maps or Google Earth, then you can
convert it into a numerical polygon using
shapely package. Then essentially
what I've done is loop over all the points in a single satellite tile(s)
lon/lat and check if it falls within the shape of interest, and output a 2D
numpy array that indicates whether or not that pixel is one we care about.
This gives us a masking layer to apply over the satellite images
Extracting pixels of interest #
With the mask layer is effectively a filter, letting us extract pixels that we
care about from each satellite image. The task is then to grab the NDSI values
from the tile for the pixel that falls within our area of interest.
The rest of the steps involve doing smoothing and clustering, and then finally
some averaging to give an output NDSI factor for a particular area of
Packages used #
import h5py import pyproj import keytree import numpy as np import psycopg2 from scipy import interpolate from shapely.geometry import box, shape from astropy.convolution import convolve, Gaussian2DKernel from sklearn.cluster import KMeans, AgglomerativeClustering, MiniBatchKMeans from sklearn.preprocessing import scale import matplotlib as mpl import matplotlib.pyplot as plt from mpl_toolkits.basemap import Basemap
Some of these will require local distributions, I did this all in Linux, so
apt-get got me all the packages I needed.
Is this the best way of solving the task I had? I'm not totally sure, I
learned about PostGIS and spatial indexing recently, and it seems like a
pretty useful tool, although maybe not needed in my application. The slowest
part of my Python script is when I was extracting and processing pixels from
each tile, so some improvements could be made there.
All in all, it was a fun exercise and it's pretty neat being able to play with