
Robot Holmes and The MLington Murder Mysteries

30 minutes


We will follow master detective Robot Holmes as he works to solve one of his hardest cases so far: a series of mysterious murders in the city of MLington. The traces lead him to the Vision-Language part of town, which until recently was a quiet and tranquil place with few incidents. Over the past few months the neighbourhood has grown rapidly, and careless benchmark leaders are dropping dead at an alarming rate.

Robot Holmes sets out to find the cause of this new development. Along the way he gathers intel on some of the most notorious new citizens of the Vision-Language neighbourhood and finds out what makes them tick.

Talk | PyData: Deep Learning, NLP, CV


Join detective Robot Holmes on an adventure through the streets of the vibrant city of MLington. Together, we will find out who is behind a series of mysterious murders in the bustling neighbourhood of Vision-Language Village (ViLaVi).

ViLaVi has seen a steady influx of posh new Vision-Language models, which are not only rapidly expanding the district but also raising the quality of the services offered on its streets.

On his journey, Robot Holmes will compile an overview of the tasks these new models excel at and find out what makes them so good at them. He will also gather details on some of the most successful Vision-Language models and eventually find out who or what is behind the series of murders.

By the end of our journey, you will have a better overview of the rapidly expanding Vision-Language neighbourhood and will understand the most important inner workings of Vision-Language models like CLIP, OWL-ViT, and BLIP. You will also know how to run them yourself in a few lines of code with the transformers library by Hugging Face.
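As a small taste of what "a few lines of code" can look like, here is a minimal sketch of zero-shot image classification with CLIP through the transformers pipeline API. It is not taken from the talk: the checkpoint name `openai/clip-vit-base-patch32` is simply one publicly available CLIP checkpoint picked for illustration, and the helper names `classify` and `pick_best` are hypothetical.

```python
def pick_best(predictions):
    """Return the label with the highest score.

    `predictions` is a list of {"label": str, "score": float} dicts,
    the shape returned by the zero-shot image classification pipeline.
    """
    return max(predictions, key=lambda p: p["score"])["label"]


def classify(image, candidate_labels, model_name="openai/clip-vit-base-patch32"):
    """Score an image against free-text labels with a CLIP checkpoint.

    `image` may be a local path, a URL, or a PIL image.
    `model_name` is an illustrative choice, not the talk's own setup.
    """
    # Imported lazily so pick_best stays usable without transformers installed.
    from transformers import pipeline

    classifier = pipeline("zero-shot-image-classification", model=model_name)
    # One {"label", "score"} dict per candidate label.
    return classifier(image, candidate_labels=candidate_labels)
```

Calling `classify` with an image and a handful of labels such as "a detective", "a robot", and "a city street" downloads the checkpoint on first use and returns one score per label; `pick_best` then reduces that list to the single most likely suspect.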

The talk is for everyone who is interested in Vision-Language models and wants to gain some first insights into how they work. People who already have a profound knowledge of Vision-Language models are welcome as well, as they have probably never seen them presented as a crime story in a strangely familiar city.

The speaker