Vector data cubes as a bridge between raster and vector worlds | July 17th-23rd 2023

Abstract

This talk introduces the concept of vector data cubes - multi-dimensional arrays where at least one dimension is composed of vector geometries - and its implementation in Python within a new library Xvec, built on top of Xarray, Shapely 2.0 and GeoPandas.

TalkPyData: Software Packages & Jupyter (2023)

Description

Vector geospatial data, such as points, lines and polygons, are primarily stored in one-dimensional arrays; in Python, typically as a GeoPandas' GeoSeries column within a data frame. At the same time, raster data representing, among others, satellite imagery or digital terrain models (DTM) are multi-dimensional arrays with a number of dimensions being anywhere from 2 (e.g. DTM) to N (e.g. multi-spectral time series from Sentinel 2), in Python mostly captured as Xarray objects. The two, raster and vector, rarely meet, and if so, we either transform one to the other using rasterisation or vectorisation or link vector geometries to a raster index via a unique identifier. However, there is a space in between that would benefit from multi-dimensional arrays with support for vector geometry.

This talk presents a concept of vector data cubes - multi-dimensional arrays where at least one dimension is composed of vector geometries - and its implementation in Python using Xarray, Shapely 2.0 and GeoPandas under the hood. Using the new Xvec package extending Xarray's abilities, we can assign an array of vector geometries as dimension coordinates and use the geospatial capabilities within the Xarray object. A typical use case can be a time series of a land cover proportion per region, where one dimension is composed of vector region boundaries, other as land use classes and a third one capturing time. While the same can be stored as a long-form data frame, it is not an efficient format due to a large amount of data replication. Using the vector data cube instead brings both efficiency and convenience as the included vector dimension can be directly used for indexing, plotting, and other operations that depend on the vector data and for which we would need to switch between Xarray and GeoPandas.

The speaker

Martin Fleischmann

I am a researcher in urban morphology and geographic data science focusing on quantitative analysis and classification of cities, remote sensing, and bits of AI. While not doing the research, I write open source software, promote open science and help others with their data.

I am an author or a maintainer of a range of open scientific software, including GeoPandas, the open source Python package for geographic data handling, momepy, the urban morphology measuring toolkit for Python, Xvec, the tool for vector data cubes, and PySAL, the Python library for spatial analysis.