Python GIS first try

NASA MODIS data #

So something I recently worked on was looking at NASA satellite images. As is
with the scientific method, I was originally curious about snow cover on land
in a particular area, and wondered what kind of information could be learned
from looking at satellite images. A quick search lead me to the MODIS
data that NASA collects, and publishes.

With snow cover information, it allows you to form a view onto how much water
inflow there might be in the future. But how does one quantify snow cover?
From the point of view of a satellite, looking down on an area and capturing
an image, you will get:

Naki

Now how do you determine what is snow cover?

Normalized Difference Snow Index (NDSI) #

Luckily we have this numerical indicator that a smart person came up with,
that gives us a ratio of spectral bands over an area, which in turn gives us a
ratio of how much radiation is reflected in each band. Snow absorbs short wave
infrared, and cloud does not, which is the key way of determining the
difference between snow and cloud. This indicator is mainly used in looking at
glacier movement, being able to accurate track the effects of global warming.
NASA has a publication on it of course.

Again, NASA has a great tool that allows you to select your area of
interest, and then it will find corresponding tiles for you. And this is the
beginning of exploring GIS. I am not knowledgeable in GIS by any means, and
there is definitely a lot of room for improvement.

The data #

The format of the files are HDF, which is basically just a binary file
format designed to organise large amounts of data. For the MODIS files, the
data structure is that of raster images, which is a matrix structure of
pixels. To deal with the HDF file formats, I use the h4toh5 converter
tool, to change it into a format that I can use with Python hdf library, which
mainly deal with the h5 format. After all the data is in the proper format for
Python, we are able to begin.

Looking at the area of interest #

Naki

The first step is to only look at pixels that we care about in each tile. If
you are lucky, then you might only have to look at a single tile that contains
all your information. But sometimes the area you want to look at happens to
need more than a single tile. The first step in grabbing the pixels of
interest is to convert the MODIS projection into longitude and latitude.

I do this with the pyproj package:

    sinu = pyproj.Proj("+proj=sinu +R=6371007.181 +nadgrids=@null +wktext")
    wgs84 = pyproj.Proj("+init=EPSG:4326")
    lon, lat = pyproj.transform(sinu, wgs84, xv, yv)

If you have a KML file, say from Google Maps or Google Earth, then you can
convert it into a numerical polygon using shapely package. Then essentially
what I've done is loop over all the points in a single satellite tile(s)
lon/lat and check if it falls within the shape of interest, and output a 2D
numpy array that indicates whether or not that pixel is one we care about.
This gives us a masking layer to apply over the satellite images

Extracting pixels of interest #

With the mask layer is effectively a filter, letting us extract pixels that we
care about from each satellite image. The task is then to grab the NDSI values
from the tile for the pixel that falls within our area of interest.

The rest of the steps involve doing smoothing and clustering, and then finally
some averaging to give an output NDSI factor for a particular area of
interest.

Packages used #

    import h5py
    import pyproj
    import keytree
    import numpy as np
    import psycopg2
    from scipy import interpolate
    from shapely.geometry import box, shape
    from astropy.convolution import convolve, Gaussian2DKernel
    from sklearn.cluster import KMeans, AgglomerativeClustering, MiniBatchKMeans
    from sklearn.preprocessing import scale
    
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    from mpl_toolkits.basemap import Basemap

Some of these will require local distributions, I did this all in Linux, so
apt-get got me all the packages I needed.

Improvements #

Is this the best way of solving the task I had? I'm not totally sure, I
learned about PostGIS and spatial indexing recently, and it seems like a
pretty useful tool, although maybe not needed in my application. The slowest
part of my Python script is when I was extracting and processing pixels from
each tile, so some improvements could be made there.

All in all, it was a fun exercise and it's pretty neat being able to play with
map objects.

← Home