A Geospatial Diversion

I've added another (largely superfluous) online conference to my growing collection. This one had the very descriptive and very serious title of Cloud-Native Geospatial Outreach Event. The funny thing is that I don't even remember for sure how I found out about this one. It may have been when browsing the social accounts of Joe Hamman and Ryan Abernathey, who were guests on this Python Podcast about Pangeo.

Anyway, the conference turned out to be a fun surprise for me. I'd intended to have it on in the background, waiting for something that might apply to me, but I was sucked right in. The format was, incredibly, like one long series of lightning talks – quick 20-30 minute pitches or lessons by successive speakers. For a two-day conference that makes for a lot of talks by a lot of people. But these talks are always fascinating, and the format was especially nice for me: 20 minutes isn't enough to get down into the weeds, where I would have been lost anyway, but it's just enough for "this is an exciting thing in the world of geospatial science". To be clear, a lot of it was still over my head, but I was inspired to learn more, and there were definite trends in what people were excited about.

Right out of the gate I had to overcome my slight negative knee-jerk reaction to "Cloud Native" being tacked in front of whatever concept or phrase to make it sexy. Usually, if someone says cloud native _____, they're trying to sell you _____. But I quickly realized that's not what was going on here, and that cloud technologies are genuinely a very big deal in this space. If you're doing geospatial anything, you're often dealing with massive datasets that require a huge amount of storage. If you're analyzing that data, you're very likely dealing with massive compute as well. Finally, if you're rendering that data, you really can't do it all at once; you need to pull out sections of it.

All of these points illustrate why the cloud is such a big deal here. Cloud storage is relatively cheap, and it's easy to place these datasets right next to scalable compute that accesses them directly, so you're not streaming data around just to look at it or do something with it. With tools like Jupyter, Dask, Binder, and Pangeo Cloud (which, I believe, is more of a collection of these tools), you can have reproducible environments and work with geospatial data without pulling it down to your computer.

Newer formats and standards have also come along to address that last point: how to index massive geospatial datasets and access just subsets of them. Cloud Optimized GeoTIFFs (COGs) carry metadata describing their contents and let you request only the subset you need, both by area and by "band" (e.g. I want this area of the world, and only the infrared bands). They also support compression. SpatioTemporal Asset Catalogs (STACs) are basically JSON catalogs of space-time things: they provide a map through your data and describe the collections of things in it.
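To make the "map through your data" idea a bit more concrete, here's a rough sketch in Python. It assumes a handful of hand-written items shaped loosely like STAC Items (GeoJSON Features with a `bbox`, a `properties.datetime`, and named `assets`); the ids, dates, asset names and URLs are all made up, and the `search` function is my own illustration rather than part of any STAC library. Real tools such as pystac-client query actual STAC APIs.

```python
from datetime import datetime, timezone

# Hand-written, pared-down items shaped like STAC Items (GeoJSON Features).
# The ids, dates, bounding boxes, asset names and hrefs are invented.
ITEMS = [
    {
        "type": "Feature",
        "id": "scene-001",
        "bbox": [-122.6, 37.6, -122.3, 37.9],  # [west, south, east, north]
        "properties": {"datetime": "2021-06-01T18:30:00Z"},
        "assets": {
            "red": {"href": "https://example.com/scene-001/red.tif"},
            "nir": {"href": "https://example.com/scene-001/nir.tif"},
        },
    },
    {
        "type": "Feature",
        "id": "scene-002",
        "bbox": [2.2, 48.8, 2.5, 49.0],
        "properties": {"datetime": "2021-06-03T10:45:00Z"},
        "assets": {"red": {"href": "https://example.com/scene-002/red.tif"}},
    },
    {
        "type": "Feature",
        "id": "scene-003",
        "bbox": [-122.6, 37.6, -122.3, 37.9],
        "properties": {"datetime": "2019-07-15T18:30:00Z"},
        "assets": {"nir": {"href": "https://example.com/scene-003/nir.tif"}},
    },
]


def bbox_intersects(a, b):
    """True if two [west, south, east, north] boxes overlap (no antimeridian handling)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def parse_datetime(text):
    """Parse a timestamp ending in 'Z' into a timezone-aware datetime."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def search(items, bbox, start, end, asset_key):
    """Return {item id: asset href} for items matching the area, time window and asset."""
    hits = {}
    for item in items:
        when = parse_datetime(item["properties"]["datetime"])
        if (bbox_intersects(item["bbox"], bbox)
                and start <= when <= end
                and asset_key in item["assets"]):
            hits[item["id"]] = item["assets"][asset_key]["href"]
    return hits


# "I want this area of the world, in 2021, and only the near-infrared band."
found = search(
    ITEMS,
    bbox=[-122.5, 37.7, -122.4, 37.8],
    start=datetime(2021, 1, 1, tzinfo=timezone.utc),
    end=datetime(2021, 12, 31, tzinfo=timezone.utc),
    asset_key="nir",
)
print(found)  # {'scene-001': 'https://example.com/scene-001/nir.tif'}
```

The nice part is that the catalog alone tells you which files to touch, and which band within them, before a single pixel gets read.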

An interesting side note that's not cloud-related: apparently the HTTP 206 "Partial Content" response is part of what has enabled this whole stack. From my (admittedly very limited) understanding, it works like this: you first request the metadata describing the contents of a COG, then request just the subset of data you need, which comes back to you in chunks as 206 partial responses.
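Here's a toy simulation of that two-step dance in Python, with no network involved. The "file" layout is hypothetical: a fixed 64-byte JSON header listing where each tile's bytes start, standing in for the real TIFF header and its tile offsets. `serve_range` is likewise a hypothetical stand-in for a web server that honors a single `Range: bytes=start-end` request.

```python
import json
import re

HEADER_SIZE = 64  # hypothetical fixed-size header that holds the tile index


def build_toy_file(tiles):
    """Pack named tiles behind a JSON index of [offset, length] pairs."""
    index, offset, payload = {}, HEADER_SIZE, b""
    for name, data in tiles.items():
        index[name] = [offset, len(data)]
        offset += len(data)
        payload += data
    header = json.dumps(index).encode()
    if len(header) > HEADER_SIZE:
        raise ValueError("tile index does not fit in the header")
    return header.ljust(HEADER_SIZE, b" ") + payload


def serve_range(blob, range_header):
    """Toy server: answer a 'bytes=start-end' request the way HTTP does."""
    match = re.fullmatch(r"bytes=(\d+)-(\d+)", range_header)
    if match is None:
        return 200, {}, blob  # no usable range: send everything
    start = int(match.group(1))
    end = min(int(match.group(2)), len(blob) - 1)  # HTTP ranges are inclusive
    headers = {"Content-Range": f"bytes {start}-{end}/{len(blob)}"}
    return 206, headers, blob[start:end + 1]


def read_tile(blob, name):
    """Fetch one tile with two partial requests instead of the whole file."""
    # Request 1: only the header, to learn where each tile lives.
    _, _, head = serve_range(blob, f"bytes=0-{HEADER_SIZE - 1}")
    offset, length = json.loads(head.decode())[name]
    # Request 2: only the bytes belonging to the tile we actually want.
    return serve_range(blob, f"bytes={offset}-{offset + length - 1}")


blob = build_toy_file({"north_west": b"AAAA", "south_east": b"BBBB"})
status, headers, body = read_tile(blob, "south_east")
print(status, headers["Content-Range"], body)  # 206 bytes 68-71/72 b'BBBB'
```

Against a real server you'd put the same `Range` header on an actual HTTP request (for example with Python's `urllib.request.Request`), and a server that supports ranges answers with status 206 and a `Content-Range` header, just like the toy version.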

Speakers are, pretty much by definition, enthusiastic and passionate about their topic. But I think this whole set of technologies coming together around the same time, and helping to enable all of this work, is genuinely very exciting. Besides making it possible to work effectively with all of this information, they, just as importantly, make it possible to share it effectively. Efforts like Pangeo Forge, Radiant Earth, Microsoft's Planetary Computer and many others encourage open-sourcing and cataloging both the data and the processing and calculations that make sense of it.

Learning a little bit about the history of GIS work helped me appreciate these advances. Formats, interfaces, calculations, storage, publications – all of it has historically been siloed inside of businesses and institutions. The advent of these standards and of common open-source tools and libraries (like PostGIS, Python, Xarray, Jupyter, etc.) has created an explosion of collaborative work and unlocked the potential for much more.

Around the same time that I heard about this conference I learned about NASA's Transform to Open Science (TOPS) initiative to put more of its work out in the open and "increas(e) opportunities for collaboration while promoting scientific innovation, transparency, and reproducibility." This whole theme makes me very optimistic about what this open collaborative concept could mean. I'm not a big fate person, but it feels good to think of all of these efforts and technologies coming together just now, when they are very much needed. The NASA TOPS initiative uses the phrase "to change everything, we need everyone", and I couldn't agree more.

These kinds of projects feel like the best version of open source or open data. Not even open source, really, because it's not about source but about openness: insights shared along with code and data, and brought to bear on real problems. At the Cloud Native Geo conference you heard from people building businesses on these new standards, and about big companies like Google and Microsoft using them too and then contributing compute, mapping catalogs and their own standards back to the community. CarbonPlan's Forest Risk map is a great example of this. It uses open data from the US Forest Service and the World Climate Research Programme, hosted (in the case of CMIP6) for free on Google Public Datasets. CarbonPlan outlined in detail how they came up with the map, posted notebooks on GitHub showing part of the process they used for rasterization, and then open-sourced the React toolkit they built to work with these new cloud data formats.

Science, academia, tech, business – all have had their share of scrutiny lately, and it's easy to be cynical about any of them for a variety of reasons. But dipping my toe in this little confluence of domains for this one field was enlightening and uplifting. The helpers are helping. They're working on big problems out in the open and they're kicking ass. They're apparently too busy for cynicism, or they haven't heard that things are supposed to be broken.

Lastly, this conference yet again confirmed my appreciation for this one small not-crappy part of covid days. I can never get enough of events like this. I was like a kid in a candy store watching these people teach and expound, show off their talents and creations, and wax enthusiastic about the future.