We can get more from spatial, GIS and public domain datasets! | July 17th-23rd 2023

Abstract

Are prices of short-term rental apartments in your region similar? How similar are they, and at which distance do they tend to be correlated?
Do you have access to a few air pollution measurements but must provide a smooth map over the whole area?
Is your machine learning model based on remote sensing data from Earth Observation satellites, and do you want to include data sampled on Earth?
Do you work with county-level socio-economic factors, but you want to get insights at a finer scale?

if any(answer), then come and see what we can do with the pyinterpolate package designed exactly for spatial interpolation!

Talk | PyData: Machine Learning, Stats (2023)

Description

The talk introduces spatial interpolation in Python at the intermediate/advanced level. You should know Python basics and the statistical concepts of variance and correlation to understand the presentation.

Spatial interpolation is the new kid on the block, and with IoT devices everywhere it will gain much more weight in the future (along with its relative, time-series modeling). Heavily based on statistics, it rests on a single concept:

Everything is related to everything else, but near things are more related than distant things (W. Tobler).
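
One common way to quantify this idea is the experimental semivariogram: half the mean squared difference between pairs of observations, grouped by the distance that separates them. The sketch below is a minimal illustration that uses only NumPy and synthetic data. The function name, the bin edges, and the data are assumptions made for this example, and it is not the pyinterpolate API.

```python
import numpy as np


def experimental_semivariogram(coords, values, bin_edges):
    """Return the semivariance for each distance bin (NaN if a bin is empty)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)

    # Pairwise distances and squared differences between observations
    diff = coords[:, None, :] - coords[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    sq_diffs = (values[:, None] - values[None, :]) ** 2

    # Count each pair once: upper triangle without the diagonal
    iu = np.triu_indices(len(values), k=1)
    dists, sq_diffs = dists[iu], sq_diffs[iu]

    gamma = np.full(len(bin_edges) - 1, np.nan)
    for i in range(len(bin_edges) - 1):
        mask = (dists >= bin_edges[i]) & (dists < bin_edges[i + 1])
        if mask.any():
            gamma[i] = 0.5 * sq_diffs[mask].mean()
    return gamma


# Synthetic, spatially smooth data (an assumption made for illustration)
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
values = (
    np.sin(coords[:, 0] / 2.0)
    + np.cos(coords[:, 1] / 2.0)
    + rng.normal(0, 0.05, size=200)
)

bin_edges = np.linspace(0, 5, 6)  # five distance bins, each one unit wide
gamma = experimental_semivariogram(coords, values, bin_edges)
print(gamma)
```

For spatially correlated data, the semivariance is low for close pairs and grows with distance. The distance at which it levels off indicates how far the correlation reaches.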

The presentation will focus on one concept and two techniques:

the concept of spatial correlation (the example from the leisure & tourism market),
the point kriging technique used to interpolate values at unseen locations (weather readings enhancement),
the Area-to-Point Poisson Kriging technique, which prepares inputs from regional statistics so that they fit satellite-based observations (cancer rates in U.S. counties).
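
To make the point kriging idea concrete, here is a minimal sketch of ordinary kriging written with NumPy only. It assumes an exponential semivariogram model with hypothetical parameters (zero nugget, unit sill, range of 3) that would normally be fitted to the data, and the sample coordinates and values are invented. It does not reproduce the pyinterpolate API, and it does not cover Area-to-Point Poisson Kriging.

```python
import numpy as np


def exponential_variogram(h, nugget=0.0, sill=1.0, range_=3.0):
    """Exponential semivariogram model (parameters here are hypothetical)."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / range_))
    # By definition the semivariance at zero distance is zero
    return np.where(h > 0, gamma, 0.0)


def ordinary_kriging(coords, values, target, variogram=exponential_variogram):
    """Predict the value at `target` and return (prediction, variance, weights)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)

    # Distances between samples, and from each sample to the target
    d_samples = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d_target = np.linalg.norm(coords - target, axis=-1)

    # Kriging system: semivariances plus the unbiasedness constraint
    # (the weights must sum to one, enforced with a Lagrange multiplier)
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = variogram(d_samples)
    a[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(d_target)

    solution = np.linalg.solve(a, b)
    weights, lagrange = solution[:n], solution[n]

    prediction = weights @ values
    variance = weights @ b[:n] + lagrange  # kriging (error) variance
    return prediction, variance, weights


# Invented sample readings on a small grid (illustration only)
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
values = np.array([10.0, 12.0, 11.0, 13.0, 11.5])

prediction, variance, weights = ordinary_kriging(coords, values, [0.25, 0.75])
print(prediction, variance, weights.sum())
```

The prediction is a weighted average of the known readings, with weights derived from the semivariogram rather than from distance alone. The kriging variance is the uncertainty measure that comes with every interpolated value.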

Techniques for spatial data interpolation are used by:

mining industry,
agriculture & forestry,
defense & security,
retail,
public health,
administration, urban planning, water management,
weather forecasting,
and anywhere we sample events distributed over an area or analyze regional statistics.

Why should we bother from the business perspective? After all, we have various ML models, and house-price prediction is the topic of many tutorials. There are a few reasons. With spatial interpolation, we learn about the spatial dependencies between our samples, we get uncertainty measures, and we can analyze a relatively small dataset (more than 50 samples is enough) that would be too small for an ML pipeline. We can even embed geostatistical models in complex machine-learning pipelines. And finally, we may save a lot of money on sampling.
