Robust Data Transformation with Pandas: Typing, Validation, Testing

Level:: intermediate
Room:: club h
Start:: 13:45 on 18 July 2023
Duration:: 180 minutes

Abstract

We will explore ways to make data analyses and transformations in Pandas robust and production-ready. We will see how advanced group-by, resample and rolling aggregations work on a large time series of weather data. (As a bonus, you will learn about the Prague climate.) We will use type annotations and schema validation with the Pandera library to make our code more readable and robust. We will also show the potential of property-based testing with the Hypothesis package, using strategies generated from Pandera schemas, and we will show how to avoid time zone issues when working with time series data. By the end of the tutorial, you will have a deeper understanding of advanced Pandas aggregations and be able to write robust, production-ready Pandas code.
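As a small taste of the time zone topic, here is a minimal sketch that is not taken from the workshop materials. It assumes synthetic hourly values stored in UTC (the series name `temps` and the data are hypothetical) and shows how daily resampling gives different bins depending on whether the index is in UTC or in local Prague time, around the daylight saving change on 26 March 2023.

```python
import pandas as pd

# Hypothetical hourly measurements stored in UTC (synthetic values, not the
# workshop dataset). The range covers exactly three local days in Prague,
# starting at local midnight on 25 March 2023.
index = pd.date_range(
    "2023-03-24 23:00", periods=71, freq=pd.Timedelta(hours=1), tz="UTC"
)
temps = pd.Series(range(71), index=index, dtype="float64", name="temperature")

# Daily bins in UTC do not line up with local calendar days.
utc_counts = temps.resample("D").count()

# Convert to local time first, then resample: bins now start at local midnight.
local = temps.tz_convert("Europe/Prague")
local_counts = local.resample("D").count()

print(utc_counts.tolist())    # [1, 24, 24, 22]
print(local_counts.tolist())  # [24, 23, 24]; 26 March has only 23 hours
```

The 23-hour day is the kind of detail that a naive "24 rows per day" check would wrongly flag as missing data, which is one reason to make time zone handling explicit before aggregating.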

Tutorial

Description

Instructions

Before attending the workshop, please prepare your environment by following the instructions in the repository at https://github.com/coobas/robust-pandas-workshop. This will ensure that you have all the necessary tools and dependencies installed.

We will keep updating the repository in the run-up to the workshop, so please pull the latest changes on the day of the workshop to make sure you have the most up-to-date materials.

The speakers

Jakub Urban

Lead Science Platform Engineer who loves empowering data scientists and bringing their algorithms to production. In general a Python + Data (Science) enthusiast, with education and career history in computational physics. PyData Prague meetup co-organiser, university tutor of scientific Python. https://www.linkedin.com/in/urbanj/

Jan Pipek

Data Scientist at Pace Revenue. Organiser of Prague PyData meetups. Occasional lecturer of Scientific/Data Python.
