Billy Fung

Python GIS first try

closest I'll get to being an astronaut

Billy Fung / 2017-08-02


So something I recently worked on was looking at NASA satellite images. As is with the scientific method, I was originally curious about snow cover on land in a particular area, and wondered what kind of information could be learned from looking at satellite images. A quick search lead me to the MODIS data that NASA collects, and publishes.

With snow cover information, it allows you to form a view onto how much water inflow there might be in the future. But how does one quantify snow cover? From the point of view of a satellite, looking down on an area and capturing an image, you will get:


Now how do you determine what is snow cover?

Normalized Difference Snow Index (NDSI)

Luckily we have this numerical indicator that a smart person came up with, that gives us a ratio of spectral bands over an area, which in turn gives us a ratio of how much radiation is reflected in each band. Snow absorbs short wave infrared, and cloud does not, which is the key way of determining the difference between snow and cloud. This indicator is mainly used in looking at glacier movement, being able to accurate track the effects of global warming. NASA has a publication on it of course.

Again, NASA has a great tool that allows you to select your area of interest, and then it will find corresponding tiles for you. And this is the beginning of exploring GIS. I am not knowledgeable in GIS by any means, and there is definitely a lot of room for improvement.

The data

The format of the files are HDF, which is basically just a binary file format designed to organise large amounts of data. For the MODIS files, the data structure is that of raster images, which is a matrix structure of pixels. To deal with the HDF file formats, I use the h4toh5 converter tool, to change it into a format that I can use with Python hdf library, which mainly deal with the h5 format. After all the data is in the proper format for Python, we are able to begin.

Looking at the area of interest


The first step is to only look at pixels that we care about in each tile. If you are lucky, then you might only have to look at a single tile that contains all your information. But sometimes the area you want to look at happens to need more than a single tile. The first step in grabbing the pixels of interest is to convert the MODIS projection into longitude and latitude.

I do this with the pyproj package:

    sinu = pyproj.Proj("+proj=sinu +R=6371007.181 [email protected] +wktext")
    wgs84 = pyproj.Proj("+init=EPSG:4326")
    lon, lat = pyproj.transform(sinu, wgs84, xv, yv)

If you have a KML file, say from Google Maps or Google Earth, then you can convert it into a numerical polygon using shapely package. Then essentially what I’ve done is loop over all the points in a single satellite tile(s) lon/lat and check if it falls within the shape of interest, and output a 2D numpy array that indicates whether or not that pixel is one we care about. This gives us a masking layer to apply over the satellite images

Extracting pixels of interest

With the mask layer is effectively a filter, letting us extract pixels that we care about from each satellite image. The task is then to grab the NDSI values from the tile for the pixel that falls within our area of interest.

The rest of the steps involve doing smoothing and clustering, and then finally some averaging to give an output NDSI factor for a particular area of interest.

Packages used

    import h5py
    import pyproj
    import keytree
    import numpy as np
    import psycopg2
    from scipy import interpolate
    from shapely.geometry import box, shape
    from astropy.convolution import convolve, Gaussian2DKernel
    from sklearn.cluster import KMeans, AgglomerativeClustering, MiniBatchKMeans
    from sklearn.preprocessing import scale
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    from mpl_toolkits.basemap import Basemap

Some of these will require local distributions, I did this all in Linux, so apt-get got me all the packages I needed.


Is this the best way of solving the task I had? I’m not totally sure, I learned about PostGIS and spatial indexing recently, and it seems like a pretty useful tool, although maybe not needed in my application. The slowest part of my Python script is when I was extracting and processing pixels from each tile, so some improvements could be made there.

All in all, it was a fun exercise and it’s pretty neat being able to play with map objects.