Poisoned pickles make you ill | July 17th-23rd 2023

Abstract

Don’t you love pickles? In the data science space, the pickle module has become one of the most popular ways to serialise and distribute machine learning models - yet, pickles introduce a wide range of problems. For starters, it is incredibly easy to poison a pickle. Once this happens, a poisoned pickle can be used by an attacker to inject any arbitrary code into your ML pipelines. And what’s even worse: it’s incredibly hard to detect if a pickle has been poisoned!

Good news? Help is on the way! You now have access to an increasing number of tools to help you generate higher-quality pickles. And when those are not enough, you can always draw inspiration from the DevOps movement and their trust-or-discard processes.

This talk will show you how widespread pickles are and how easy it is to poison models serialised with pickle, but also how easy it is to start protecting them from attacks.

TalkPyData: Machine Learning, Stats (2023)

The speaker

Adrian Gonzalez-Martin

Adrian is the Head of ML Serving at Seldon, where his focus is to extend Seldon’s open source and enterprise MLOps products to solve large scale problems at leading organisations in the Automotive, Pharmaceutical and Technology sectors. When he is not doing that, Adrian loves experimenting with new technologies and catching up with the MLOps open source community, where he leads the MLServer project. Before Seldon, Adrian has worked as a Software Engineer across different startups, where he contributed and led the development of large production codebases. Adrian holds an MSc in Machine Learning from University College London, where he specialised in probabilistic methods applied to healthcare, as well as a MEng in Computer Science from the University of Alicante.