Test your data like you test your code
- Level:
- intermediate
- Room:
- south hall 2b
- Start:
- Duration:
- 45 minutes
Abstract
I will introduce the concept of data unit tests and why they are important in the workflow of data scientists when building data products. In this talk, you will learn a new tool you can use to ensure the quality of the products you build.
Description
When data scientists build data products, they usually need to combine multiple data sources to train their models and then serve predictions. Making sure that the code and the data will be as expected throughout the full lifetime of the project is complex. To ensure the quality of the code, it is a best practice in software engineering to use automatic testing, this has a large corpus of support material. However, ensuring the quality of the data input and output holistically is not yet as well covered.
In this talk, I will explain the concept of data unit tests and why they are important. Then I will present an overview of the current libraries helping to build data unit tests. Finally, I will explain how we integrated it into our workflow at GetYourGuide.