Note: this list might change
The CPU in your browser: WebAssembly demystified
In recent years we have seen an explosion in the use of Python in the browser: Pyodide, CPython on WASM, PyScript, etc. All of this is possible thanks to the powerful functionality of the underlying platform, WebAssembly, which is essentially a virtual CPU inside the browser.
Writing a Python interpreter from scratch, in half an hour.
You use the Python interpreter every single day. It does a lot of things for you: checks that your code has valid syntax and is properly indented, imports modules from various locations, and runs your code instruction-by-instruction.
But if you've ever wondered how exactly it happens, this talk will teach you the entire process by building a working Python interpreter from scratch.
Rust for Python data engineers
Python is a popular language for data engineering but has some limitations in performance, concurrency, and production deployments. The Rust programming language offers powerful alternatives with strong compile-time and memory safety guarantees. In this talk, I'll explore how data engineers can leverage Rust to build high-performance data pipelines and processing systems. I'll cover the Rust ecosystem for data work, including frameworks and libraries for working with data formats, databases, streaming systems, and scientific computing. By combining Rust and Python, data engineers can harness the benefits of both languages and build robust end-to-end data systems that scale to meet demanding production needs.
GraphQL as an umbrella for microservices
Systems built with microservices tend to become complex over time. There are several approaches that encapsulate complex distributed system layouts with an API Gateway, or backends for frontends. Having a GraphQL gateway is one of the available options. This method of delivering client-facing APIs has become the standard with modern single-page applications.
Packaging Python Apps with Briefcase
Python has proven itself to be a powerful tool for data science, and for web servers. However, one area where it hasn't historically been popular is in building applications for end users.
In this talk, you'll discover how you can use Briefcase to distribute an app to users on desktop, mobile, and the web - all from a single Python codebase.
Face Off: Brute-force attack on Biometrical-databases
Magic happens every time you take your phone out of your pocket. Somehow, just by looking at the screen, your phone recognizes you (and only you) and magically unlocks.
Have you ever stopped for a minute and thought to yourself - How does that even work? And maybe more importantly, how secure is it?
In this session, we're going to understand how facial recognition works under the hood. We'll dive into some potential security problems, and we'll show you how we were able to break into a biometric database built on the Dlib Python library by applying a sophisticated brute-force attack. The results will surprise you.
Food For Rabbits: Celery From Zero to Hero
In a world full of microservices, distributing tasks is a constant challenge, and there's only one tool that can rule them all.
In this workshop, we'll introduce Celery - a tool for distributing tasks in an easy, fast, and flexible manner, and take you from zero to hero!
- We're going to understand why we need a distributed task system, and why to choose Celery.
- We'll write our first Celery task.
- Understand how to configure and run Celery.
- Familiarize ourselves with Celery's fundamental concepts.
- Dive into Celery's customizable options.
- Finally, we'll see a real-life example of how we used Celery in our production system and how we customized it to fit our needs, and discuss how you can do the same.
Building Secure and Customized REST APIs with Django and DRF
Kuldeep Pisda, Vasundhara Shukla
This tutorial explains how to use Django and Django Rest Framework (DRF) to create REST APIs quickly. It covers implementing permissions on endpoints, serving different responses based on user permissions, and adding pagination. Additionally, it includes creating custom endpoints for specific needs. This tutorial suits those building scalable, secure, and customized REST APIs.
Rotating DB Passwords Without Breaking Your Django Server
This talk will focus on implementing a password rotation strategy for your database without disrupting your Django server or other applications that consume the database. Regular password rotation is a critical security practice, but it can pose challenges for applications and servers that rely on the password for access. We will discuss the importance of password rotation and explore the challenges of rotating passwords for a database in use by a Django server. We will also discuss several techniques for safely rotating database passwords, such as using connection pools and leveraging environment variables. By the end of the session, attendees will better understand the security risks associated with static passwords and how to mitigate those risks through password rotation while keeping their Django server and other applications running smoothly.
How Python can help victims of violence
There are two values that everyone agrees with: Judicial Truth (criminals should be prosecuted, but innocent people left free), and Privacy (others shouldn't know unnecessarily about my private life).
But these two values are constantly put in opposition, e.g. video surveillance helps gather evidence of crime, but it endangers our legitimate rights as citizens.
That's why we launched the WitnessAngel initiative, a research effort to invent new concepts and technologies able to reconcile Judicial Truth and Privacy.
With algorithms like Flightbox, with ideas like VideoTestimony and Familiar, and with the open-source code we provide, we work with associations and enterprises to eventually put life-changing solutions into the hands of the general public, so that countless victims of rape, abuse, and bullying stop facing the usual brick wall: "it's your word against theirs".
DuckDB: Bringing analytical SQL directly to your Python shell
In this talk, we will present DuckDB. DuckDB is a novel data management system that executes analytical SQL queries without requiring a server. DuckDB has a unique, in-depth integration with the existing PyData ecosystem. This integration allows DuckDB to query and output data from and to other Python libraries without copying it. This makes DuckDB an essential tool for the data scientist. In a live demo, we will showcase how DuckDB performs and integrates with the most used Python data-wrangling tool, Pandas. Besides learning about DuckDB's main characteristics, users will also experience a live demo of DuckDB and Pandas in a typical data science scenario, focusing on comparing their performance and usability while showcasing their cooperation. The demo is most interesting for an audience familiar with Python, the Pandas API, and SQL.
HPy: The Future of Python Native Extensions
Tim Felgentreff, Florian Angerer
Updating Python versions often forces us to update native extensions at the same time. But what if you need to update Python because of a security issue, but cannot (yet) move to a newer version of a dependency? Or you are running a proprietary binary extension that cannot easily be recompiled?
The HPy project provides a better C extension API for Python. It compiles to binaries that work across all versions of CPython, PyPy, and GraalPy. HPy makes porting from the existing C API easy, and its design ensures that the binaries we produce today stay binary compatible with future Python versions.
NumPy is the single largest direct user of the CPython C API we know of. After over 2 years of work and more than 30k lines of code ported, we can demonstrate NumPy running its tests and benchmarks with HPy. We will show the same NumPy binary run on multiple CPython versions and GraalPy. And we will discuss performance characteristics of this port across CPython, GraalPy, and PyPy.
Career Building Through Open Source & Community Participation
Open source has grown to the point where people on many different tech career paths can enhance projects with their skills, and it provides jobs for those interested in working with open source. Open source contribution programs build the capacity of interested people to become professionals.
Active community participation helps enhance career growth.
Outreachy is a paid, remote open source internship program that empowers and develops talent and prepares interns for career growth. Outreachy provides internships to people subject to systemic bias and impacted by underrepresentation in the technical industry where they are living.
At the end of this session, beginners and intermediate-level attendees will know enough about how they can build a career in open source; experts will also gain more insight into how they can advance open source contribution by giving back to the community as mentors, helping new contributors understand the open source ecosystem and how to contribute.
From Jupyter Notebooks to a Python Package: The Best of Both Worlds
A Jupyter notebook is quite handy for rapid REPL (Read-Eval-Print-Loop) style tasks such as exploratory data analysis. However, as the notebook grows larger and its code more complicated, at some point we start to feel the lack of proper software engineering support. This is because the Jupyter notebook lacks several important features, including code sharing, refactoring support, git integration, and advanced editing. Fortunately, traditional full-fledged IDEs, such as VS Code or PyCharm, are readily available and support these missing features very well. So why don’t we take advantage of the best of both worlds?
In this beginner-level hands-on talk, I will demonstrate how to transform a Jupyter notebook workflow into a proper Python package using VS Code. I will also introduce several basic but essential refactoring recommendations. By doing so, you can use the refactored package in several notebooks and even share it with your colleagues and friends.
Orchestrating Python Workflows in Apache Airflow
Apache Airflow is an open-source workflow orchestrator. It is a Python library that allows you to automate complex code and integrate it with a plethora of data sources. It comes with an integrated UI and API for both your human and programmatic needs.
After 5 years of running Airflow in production, I hope to share some insights on the technology: its strengths and weaknesses, the recommended features and the more dangerous ones, and similar considerations on the UI.
I'll also be talking about how you can make your own Operators in Airflow.
Come take a deeper dive into the same solution used by Airbnb, Slack, Walmart and many more to efficiently run their data pipelines.
Upgrading Django - from legacy to latest
Django is a framework that's been around for more than 15 years, which makes for enough legacy projects to deal with.
In this talk we'll show practical tips and tricks for how to get Django from legacy to latest & greatest.
Solving Small-Data Problems in Management Accounting
Alexander CS Hendorf, Lucas-Raphael Müller
Controllers deal with numbers all day long. They have to check a lot of data from different sources. Often the reports contain erroneous or missing data. Identifying outliers and suspicious data is time-consuming.
This presentation will introduce a Small Data Problem-End2End workflow using statistical tools and machine learning to make controllers' jobs easier and help them be more productive.
We will demonstrate how we used, amongst others,
- dirty cat
- fastnumbers
to create a self-improving system to automate the screening of reports and report outliers in advance so that they can be eliminated more quickly.
Want to learn something new about yourself? This talk will showcase some approaches to get the best from behavioral tracking as well as silent wearables tracking. I will cover where and how to get data, with my experience regarding its quality (expectation management); what to do with the raw data (IDA plus some knowledge needed); and how to convert insights into actions.
Responding to Earthquakes using Machine Learning and Racing through Time
Right after the devastating earthquakes in Turkey, there was a massive flow of tweets and posts from survivors and their relatives, calling for help. There was a need to extract the data, make it meaningful, and open it to the public, so we came up with afetharita.com. The machine learning part of the application is completely based on open-source tools in Python, and I will go through the pipeline and the process.
Optimizing Your CI Pipelines
Take your Continuous Integration to the next level! Learn how to optimize your pipelines for faster and more efficient builds through parallelization, caching, failing early, conditional runs, and more.
BDD - how to make it work?
Behaviour-driven development promises evergreen documentation or human-readable executable specification - sounds great. However, adopting it takes much more than simply installing behave or pytest-bdd and writing Gherkin. This talk will show what.
Using NLP to Detect Knots in Protein Structures
Proteins are essential components of our bodies, with their function often dependent on their 3D structure. However, uncovering the 3D structure has long required months of hard work in the lab. Recent advances in machine learning and NLP have made it possible to build models (e.g. AlphaFold) capable of predicting a protein's 3D structure with the same precision as experimental methods.
In this talk, I will explore an even more specific application of language models for proteins - the detection of a knot in a protein's 3D structure solely from the protein's amino acid sequence. Knotting in proteins is a phenomenon that can affect their function and stability. Thanks to NLP and interpretation techniques, we can try to uncover why and how proteins tie themselves into a knot. In this research, we rely on many Python-based tools, ranging from Biopython to PyMOL and the Hugging Face Transformers library.
A Magic Implementation of NotImplemented
Dirty Equals is a new Python library by Samuel Colvin, the creator of Pydantic. It will transform how you write tests, especially for APIs.
I made some contributions to it, which forever changed how I thought about NotImplemented. I thought it was a placeholder for unfinished work and unexpected use cases. I thought the language quirks it created in equality comparison were annoying.
But in Dirty Equals, it’s a magic way to transform Python’s built-in equality operator... And that changed how I think about language quirks, full stop.
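The mechanism this abstract alludes to can be sketched in a few lines. This is a minimal illustration and is not taken from the Dirty Equals source: when `__eq__` returns `NotImplemented`, Python tries the reflected `__eq__` of the other operand, so an object on the right-hand side can decide the result. The classes `IsPositive` and `Money` are hypothetical names invented for this example.

```python
class IsPositive:
    """Hypothetical matcher: compares equal to any positive number."""

    def __eq__(self, other):
        if isinstance(other, (int, float)):
            return other > 0
        # Signal "I don't know how to compare"; Python then tries other.__eq__.
        return NotImplemented


class Money:
    """Hypothetical value type that only knows how to compare with Money."""

    def __init__(self, amount):
        self.amount = amount

    def __eq__(self, other):
        if isinstance(other, Money):
            return self.amount == other.amount
        return NotImplemented


# int.__eq__ returns NotImplemented for an unknown type, so Python falls back
# to IsPositive.__eq__ with the operands swapped.
result_int = 5 == IsPositive()

# Both sides return NotImplemented, so Python falls back to an identity
# comparison, which is False for two distinct objects.
result_unknown = Money(3) == IsPositive()
```

Because containers compare their elements with `==`, the same fallback also applies inside nested structures, for example `{"id": 5} == {"id": IsPositive()}`.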
CLI application development made easier with typer
Do you feel that digging through GitHub code to learn how to use it is painful? Do you also think that simply packaging and publishing your library to the world on PyPI sometimes isn't enough to help others use what you are working on? Then come join me, as this talk is definitely for you!
In this presentation, I'd like to introduce you to typer, and explain why it's probably the easiest and most affordable way to create command line applications (in 2023) that your users will love to use. We'll discuss its key strengths, how to structure your CLI application, and how to make it ready to be packaged and published with no hassle.
Build a terminal TODO app with Textual
Learn how to build powerful terminal-based user interfaces (TUIs) with ease using Textual - an open-source Python framework.
Throughout this tutorial, you'll learn how to use Textual's built-in widgets, reactive features, and message-passing system to create a dynamic and user-friendly TODO app that's perfect for managing your daily tasks.
From creating and displaying tasks to editing and deleting them, you'll cover all the essential features needed to make a functional TODO app.
You'll also learn how to use Textual CSS to style your TUI for a polished and elegant look, together with some tips and tricks to make it even easier to develop your TUIs in Textual.
This tutorial provides everything you need to get started with building TUIs in Python. By the end of the tutorial, you'll have a fully functional and stylish TODO app that showcases Textual's versatility and useful features.
Scipp: multi-dimensional arrays with labeled dimensions and physical units
Inspired by Xarray, Scipp (scipp.github.io) enriches raw NumPy-like multi-dimensional data arrays by adding named dimensions and associated coordinates. For an even more intuitive and less error-prone user experience, Scipp adds physical units to arrays and their coordinates. Scipp data arrays additionally support a dictionary of masks, as well as histogram bin-edge coordinates.
One of Scipp's key features is the possibility of using multi-dimensional non-destructive binning to sort record-based "tabular"/"event" data into arrays of bins. This provides fast and flexible binning, rebinning, and filtering operations, all while preserving the original individual records.
Scipp ships with data display and visualization features for Jupyter notebooks, including a powerful plotting interface. Named Plopp, this tool uses a graph of connected nodes to provide interactivity between multiple plots and widgets, requiring only a few lines of code from the user.
Interactive control of robots can be a challenge, as it requires a lot of things to happen in parallel while at the same time reacting to data from sensors and control signals. Using Python's async facilities may greatly simplify this task, allowing us to write code that is similar to the non-parallel version, but that is at the same time easy to compose into a bigger program doing many things at once. I will talk about my own experiences programming the Fluffbug robot with CircuitPython, and point out the problems and the solutions I found.
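To give a rough idea of what "similar to the non-parallel version" can look like, here is a minimal sketch using the standard `asyncio` module on desktop Python. It is not the speaker's robot code: the sensor, the leg names, and the timings are all hypothetical stand-ins for real hardware.

```python
import asyncio


async def read_sensor(readings, samples):
    # Pretend to poll a distance sensor a few times.
    for i in range(samples):
        await asyncio.sleep(0.01)  # yields control while "waiting" for hardware
        readings.append(i)


async def move_leg(name, steps, log):
    # Each leg is written as a plain sequential loop.
    for step in range(steps):
        await asyncio.sleep(0.01)
        log.append((name, step))


async def main():
    readings, log = [], []
    # gather() runs all coroutines concurrently on one event loop.
    await asyncio.gather(
        read_sensor(readings, 3),
        move_leg("front_left", 2, log),
        move_leg("rear_right", 2, log),
    )
    return readings, log


readings, log = asyncio.run(main())
```

Each coroutine reads like ordinary sequential code, and composing them into a larger program is a matter of passing more coroutines to `gather()`.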
How to land your new Python Developer job: a Recruiter's perspective
Looking for a job is already a job. How can you make sure that you are successful in the role of a Python Developer job-seeker? Join this talk to learn tips and tricks directly from an insider: what technologies are in demand, how to look for your next role, how to display your experience (or lack thereof) in your CV, how to prepare for interviews, and much more.
Serverless billion-scale vector search for AI applications
From recommendation systems to LLM-based applications, vector search is a critical component of the modern AI workflow. Existing vector solutions are complicated to use, hard to maintain, and cost too much. LanceDB is a free open-source vector store that can perform low latency vector search on billion-scale vector datasets on a single node.
Learning the ropes: understanding Python generics
What if you don't want a Cat to be an Animal? What is the Liskov Substitution Principle? And what on earth is contravariance?
Discover the answers to these questions and more, as we explore the foundations of generic types in Python. And by the end, you might even understand the weirder errors that Mypy sometimes throws your way.
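As a small taste of these ideas, the sketch below shows covariance and contravariance with the standard `typing` module. The class and function names are hypothetical; the comments describe how a static type checker treats each call, while at runtime Python does not enforce any of it.

```python
from typing import Callable, Generic, List, Sequence, TypeVar


class Animal:
    def speak(self) -> str:
        return "..."


class Cat(Animal):
    def speak(self) -> str:
        return "meow"


T = TypeVar("T")


class Box(Generic[T]):
    """A mutable container; a plain TypeVar makes it invariant in T."""

    def __init__(self, item: T) -> None:
        self.item = item


def chorus(animals: Sequence[Animal]) -> List[str]:
    # Sequence is read-only and covariant: a Sequence[Cat] is accepted here.
    return [animal.speak() for animal in animals]


def describe(animal: Animal) -> str:
    return animal.speak()


def greet_cat(handler: Callable[[Cat], str]) -> str:
    # Callable is contravariant in its arguments: a handler that accepts
    # any Animal is also safe to call with a Cat.
    return handler(Cat())


cats: List[Cat] = [Cat(), Cat()]
sounds = chorus(cats)           # accepted by a type checker (covariance)
greeting = greet_cat(describe)  # accepted by a type checker (contravariance)
```

By contrast, a type checker would reject a `Box[Cat]` where a `Box[Animal]` is expected, because `Box` is mutable and therefore invariant: someone could put a non-cat `Animal` into it.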
Ultimative session about hidden gems of Django Admin.
The Django Admin Panel is a complex and poorly documented tool in Django that can greatly speed up development once you start to understand it. To the question “Isn’t it easier for us to write our own backend?” I will answer: “No, it’s not easier!”. My talk condenses 8 years of insights and discoveries. I want to talk about multiple admin sites, ModelAdmin possibilities, object state versioning, and app configs as a completely forgotten hidden power.
What are you yield from?
Many developers avoid using generators. For example, many well-known Python libraries use lists instead of generators. Generators themselves are slower than normal list loops, but using them in code can greatly increase the speed of the application. Let’s discover why.
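One possible reason, sketched below, is laziness: a generator does work only when the consumer asks for the next item, so an application that needs just part of the data can skip the rest. That this is the effect the talk has in mind is an assumption; the abstract does not say. The function names and the `log` parameter are hypothetical.

```python
from itertools import islice


def expensive(n, log):
    """Stand-in for costly per-item work; records each item it processes."""
    log.append(n)
    return n * n


def squares_list(numbers, log):
    # Eager: processes every item before returning anything.
    return [expensive(n, log) for n in numbers]


def squares_gen(numbers, log):
    # Lazy: processes one item each time the consumer asks for the next value.
    for n in numbers:
        yield expensive(n, log)


eager_log = []
first_three_eager = squares_list(range(1000), eager_log)[:3]

lazy_log = []
first_three_lazy = list(islice(squares_gen(range(1000), lazy_log), 3))
```

Both approaches produce the same three values, but the eager version processes all 1000 items while the lazy version processes only the three that were requested.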
Kubernetes <3 Python - Deploy Python apps & extend Kubernetes with Python
You don't have to be an Ops expert to make Kubernetes useful! In this workshop, you will learn how to overcome complexity, and love Kubernetes as a platform to deploy a Python web application or your data science and machine learning pipelines. You will learn how and when to use basic elements of Kubernetes like Deployments and StatefulSets.
Once you understand these basic elements, you will learn how to extend Kubernetes using Python. You will learn how to define custom resources and controllers to automate all things related to your applications' life cycle, from ETL through sending email for password reset to where your imagination stops.
At the end of this workshop, you will have deployed a Python web application and successfully extended Kubernetes with so-called operators to manage the complete life-cycle of your application.
Diving into Event-Driven Architectures with Python
Event-Driven Architectures (EDAs) target a real need in today's application landscape, as systems grow more complex or need to scale organically.
The talk will introduce the architecture and provide insights into different components which can be managed, connected and implemented with Python.
pytest tips and tricks for a better testsuite
pytest lets you write simple tests fast - but also scales to very complex scenarios: Beyond the basics of no-boilerplate test functions, this training will show various intermediate/advanced features, as well as gems and tricks.
To attend this training, you should already be familiar with the pytest basics (e.g. writing test functions, parametrize, or what a fixture is) and want to learn how to take the next step to improve your test suites.
If you're already familiar with things like fixture caching scopes, autouse, or using the built-in tmp_path/monkeypatch/... fixtures: There will probably be some slides about concepts you already know, but there are also various little hidden tricks and gems I'll be showing.
Private Data Anonymization with Python, Fundamentals
Abel Meneses Abad, Oscar L. Garcell
How to bring large legal document repositories into the public domain without releasing private data? The fundamental concepts behind document anonymization are entity recognition, masking type, and pseudonymization. Using the Python language and a collection of libraries such as spaCy, PyTorch, and others, we can achieve good anonymization scores. How is this applied within a flow containing AI models for NER? Once documents are anonymized, how can the result be improved by doing more text mining with Python-based apps and a human in the loop? Although it was approved in 2016, the application of the GDPR at the European level remains a challenge in banking, legal, and other contexts. This talk covers the process of transforming PDF and DOCX documents into XML, processing them using regular expressions and spaCy/PyTorch models, and parsing the results using AntConc and Textacy. All the ideas will be supported with real experience from the MAPA project, a European anonymization project that finished in 2022.
Subclassing, Composition, Python, and You
Ever seen a code base where understanding a simple method meant jumping through tangled class hierarchies? We all have! And while "Favor composition over inheritance!" is almost as old as object-oriented programming, strictly avoiding all types of subclassing leads to verbose, un-Pythonic code. So, what to do?
The discussion on composition vs. inheritance is so frustrating because far-reaching design decisions like this can only be made with the ecosystem in mind – and because there's more than one type of subclassing!
Let's take a dogma-free stroll through the types of subclassing through a Pythonic lens and untangle some patterns and trade-offs together. By the end, you'll be more confident in deciding when subclassing will make your code more Pythonic and when composition will improve its clarity.
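For orientation, here is a minimal sketch of the two approaches applied to the same problem. It illustrates the general trade-off only and is not an example from the talk; `Repository` and both audited variants are hypothetical names.

```python
class Repository:
    """Hypothetical in-memory store used only for this illustration."""

    def __init__(self):
        self._items = {}

    def save(self, key, value):
        self._items[key] = value

    def load(self, key):
        return self._items[key]


class AuditedRepositoryViaSubclass(Repository):
    """Subclassing: inherits the whole Repository interface, overrides save()."""

    def __init__(self):
        super().__init__()
        self.audit = []

    def save(self, key, value):
        self.audit.append(key)
        super().save(key, value)


class AuditedRepositoryViaComposition:
    """Composition: wraps any object with save()/load() and forwards explicitly."""

    def __init__(self, repo):
        self._repo = repo
        self.audit = []

    def save(self, key, value):
        self.audit.append(key)
        self._repo.save(key, value)

    def load(self, key):
        return self._repo.load(key)


sub = AuditedRepositoryViaSubclass()
sub.save("a", 1)

comp = AuditedRepositoryViaComposition(Repository())
comp.save("a", 1)
```

The subclass gets `load()` for free and passes `isinstance(sub, Repository)`, but it is coupled to the internals of its base class. The composed version has to forward each method by hand, yet it works with any object that offers `save()` and `load()`.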
Language Models for Music Recommendation
Nischal Harohalli Padmanabha, Raghotham Sripadraj
Music streaming services like Spotify and YouTube are famous for their recommendation systems, and each service takes a unique approach to recommending and personalizing content. While most users are happy with the recommendations provided, there is a section of users who are curious about how and why a certain track is recommended. Complex recommendation systems take into account various factors like track metadata, user metadata, and play counts, along with the track content itself.
Inspired by Andrej Karpathy to build our own GPT, we use language models to build our own music recommendation system.
Interactive, animated reports and dashboards in Streamlit with ipyvizzu.
It's great when you can share the results of your analysis not only as a presentation but as something that non-data scientists can explore on their own, looking for insights and applying their business expertise to understand the significance of what they find.
With its accessibility for both creators and viewers, Streamlit offers a brilliant platform for data scientists to build and deploy data apps. Now, with the integration of ipyvizzu - a new, open-source data visualization tool focusing on animation and storytelling - you can quickly create and publish interactive, animated reports and dashboards on top of static or dynamic data sets and your models.
Fish and chips and Apache Kafka®
Apache Kafka® is the de facto standard in the data streaming world for sending messages from multiple producers to multiple consumers, in a fast, reliable and scalable manner.
Come and learn the basic concepts and how to use it, by modelling a traditional British fish and chips shop!
Python on Arm architecture
Arm is everywhere technology matters: 250+ billion chips in everything from sensors to smartphones to servers. Due to its simplicity, versatility, and growth in popularity over the past decade, Python is the most used language in the world.
In this presentation I will show you what the status of Python is on Arm architecture on all major operating systems and how you could help to improve it further.
Solving Multi-Objective Constrained Optimisation Problems using Pymoo
Pymoo is an open-source Python framework with state-of-the-art optimisation and post performance analysis capabilities. It provides an object-oriented interface to solve constrained single- and multi-objective optimisation problems with a catalog of algorithms, customisations, and post-optimisation evaluation functionality. With additional features like visualisation of optimal Pareto fronts, decision making, parallelization, and customised sampling, Pymoo promises to be highly valuable for scalable optimisation solutions.
Most introductory Python books and online resources like w3schools.com try to be complete when a new concept is explained. This does not always work well for beginners. E.g. if you have just grasped how a while-loop works, it may cause too much cognitive load to also understand the break and continue options, let alone the else clause. The learning psychologist Jerome Bruner introduced the term "spiral learning". The idea is that you don't teach all aspects of a new concept, but just enough to use it. At a later stage a teacher can revisit the subject and explain more details, when a student needs this to take the next step. Spiral Python is a road map of subjects that can be found in any introductory book or online resource about Python, but absolutely original in the sense that it takes into account how people learn in a natural way. You do not need to know the whole language before you can use it. Spiral Python also contains exercises (to practice) and challenges (to motivate).
Too Big for DAG Factories?
Do you need to transform, optimize and scale your data workflow? In this talk, we’ll review use cases, and you’ll learn how to dynamically generate thousands of DAGs (Directed Acyclic Graphs) with Airflow.
Whisper AI: Live Translated Subtitles for 96 Languages
Whisper AI, a new model from OpenAI, has been largely overlooked despite its impressive ability to accurately transcribe and translate human speech from audio.
In this talk I will explore the architecture of the model and explain why it works so well. Additionally, I will live demo the model's capabilities in three languages, showing how you can use it on your own computer to generate English subtitles for a wide range of content.
Decorators - A Deep Dive
Python offers decorators to implement reusable code for cross-cutting tasks. They support the separation of cross-cutting concerns such as logging, caching, or checking of permissions. This can improve code modularity and maintainability.
This tutorial is an in-depth introduction to decorators. It covers the usage of decorators and how to implement simple and more advanced decorators. Use cases demonstrate how to work with decorators. In addition to showing how functions can use closures to create decorators, the tutorial introduces callable class instances as an alternative. Class decorators can solve problems that used to be tasks for metaclasses. The tutorial provides use cases for class decorators.
While the focus is on best practices and practical applications, the tutorial also provides deeper insight into how Python works behind the scenes. After the tutorial, participants will feel comfortable with functions that take functions and return new functions.
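The three flavours mentioned above can be sketched briefly. This is an illustrative example and not the tutorial's material; `log_calls`, `CountCalls`, and `add_repr` are hypothetical names.

```python
import functools


def log_calls(func):
    """Closure-based decorator: records each call on the wrapper itself."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)

    wrapper.calls = []
    return wrapper


class CountCalls:
    """Callable class instance used as a decorator."""

    def __init__(self, func):
        functools.update_wrapper(self, func)
        self.func = func
        self.count = 0

    def __call__(self, *args, **kwargs):
        self.count += 1
        return self.func(*args, **kwargs)


def add_repr(cls):
    """Class decorator: adds a simple __repr__ without needing a metaclass."""

    def __repr__(self):
        return f"{cls.__name__}({vars(self)})"

    cls.__repr__ = __repr__
    return cls


@log_calls
def add(a, b):
    return a + b


@CountCalls
def greet(name):
    return f"Hello, {name}"


@add_repr
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
```

In each case the `@` syntax is shorthand for rebinding the name, e.g. `add = log_calls(add)`, which is why a decorator is just a callable that takes a function (or class) and returns a replacement.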
Would Rust make you a better Pythonista?
What would a Pythonista gain from becoming a Rustacean other than semicolons and brackets?
In this talk I'll share the learnings and achievements I got by adding the Rust programming language into my Python life. Illustrating a real story now in production at scale, I'll walk you through all the pains and joys of this unexpected journey which changed me more than I anticipated.
- Project introduction
- Motivations for selecting this project to learn Rust
- Tales of a Pythonista learning Rust
- Results, numbers and production graphs
- How Rust influences my daily Python
- Was it worth it? Should you do it too?
Designing a Human-Friendly CLI for API-Driven Infrastructure
As Bloomberg’s infrastructure grows and evolves, the tools we use to manage it are becoming increasingly important. To streamline infrastructure management, our team set out to design a REST API and constituent CLI (Command Line Interface) that would comprise a single interface for both programmatic and human interaction with our infrastructure. Traditionally, building a CLI that is tightly coupled to an API requires maintaining a separate codebase, which is tedious and error-prone. Instead, we designed a CLI that dynamically generates commands based on the OpenAPI JSON documentation. However, since APIs are designed for computer interaction, we designed our API to include the information needed to implement a human-friendly CLI. Leveraging Python, FastAPI, and numerous other open source projects, we built a stable, extensible tool that greatly improves how we interact with our infrastructure.
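The idea of generating commands from an API description can be sketched with the standard `argparse` module. This is not Bloomberg's implementation: the trimmed OpenAPI document, the `infra` program name, and the operation names below are all hypothetical, and a real tool would also need to send the HTTP request and handle request bodies, authentication, and errors.

```python
import argparse

# Hypothetical, heavily trimmed OpenAPI document.
SPEC = {
    "paths": {
        "/machines": {
            "get": {
                "operationId": "list-machines",
                "summary": "List machines",
                "parameters": [
                    {"name": "region", "in": "query", "required": False,
                     "description": "Filter by region"},
                ],
            }
        },
        "/machines/{id}": {
            "delete": {
                "operationId": "delete-machine",
                "summary": "Delete a machine",
                "parameters": [
                    {"name": "id", "in": "path", "required": True,
                     "description": "Machine identifier"},
                ],
            }
        },
    }
}


def build_parser(spec):
    """Create one subcommand per API operation found in the spec."""
    parser = argparse.ArgumentParser(prog="infra")
    subcommands = parser.add_subparsers(dest="command", required=True)
    for path, methods in spec["paths"].items():
        for method, operation in methods.items():
            sub = subcommands.add_parser(
                operation["operationId"], help=operation.get("summary")
            )
            # Remember which HTTP call this command maps to.
            sub.set_defaults(http_method=method.upper(), path=path)
            for param in operation.get("parameters", []):
                sub.add_argument(
                    f"--{param['name']}",
                    required=param.get("required", False),
                    help=param.get("description"),
                )
    return parser


parser = build_parser(SPEC)
args = parser.parse_args(["delete-machine", "--id", "web-42"])
```

Because the commands are derived from the spec at startup, adding an endpoint to the API document adds a command to the CLI without touching the CLI code, which is the maintenance benefit the abstract describes.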
Python interoperability: building a Python-first, petabyte-scale database
How can you scale Python to run at petabyte scale, with the reliability needed to trade billions of dollars? With ArcticDB we have been doing exactly that for the last four years, by leveraging interoperability between Python and high-performance C++, with a detailed understanding of the data structures inside Python and a few extra tricks up our sleeves.
Come take a peek under Python's bonnet and learn how to hotwire a few things along the way.
PEP 458 a solution not only for PyPI
Kairo de Araujo, Martin Vrachev
PEP 458 uses cryptographic signing on PyPI to protect Python packages against attackers. The implementation of the PEP inspired the Repository Service for TUF (RSTUF), a project accepted into the OpenSSF sandbox. We identified that the design could benefit other organizations and repositories looking to secure their software supply chains. In this talk we will answer the following questions:
- How did the PEP 458 design help to start the Repository Service for TUF (RSTUF)?
- How could RSTUF be used for PyPI with its millions of packages?
- How can RSTUF be deployed by any organization at any scale without requiring TUF expertise?
Additionally, we will give an overview of PEP 458 and how it works, along with a high-level overview of TUF.
A quick guide to logging for Django developers
The logging module is a really powerful tool for troubleshooting, with the potential to save us hours of debugging.
The aim of the talk is to provide an overview of how the logging module in Python works, how Django uses it, and how to improve our logging for our web projects.
High Volume PDF Text Extraction using Python Open-Source Tools
All major companies have huge amounts of (mostly PDF) documents that contain important, even critically important, information that no longer exists anywhere else in their data stores.
Reports, once generated for shareholders and legal or financial authorities, may still be useful for developing long-term forecasts or triggering company management decisions.
By definition, documents are intended for human perception, and as such contain unstructured data from an information technology perspective.
Therefore, tools to extract PDF text content (mostly, but not only text) from millions of pages have become important vehicles to recreate structured information.
This presentation discusses the extraction "need for speed" in this Big Data scenario and the need for integration with OCR capabilities, and presents an open-source toolset that combines top-of-the-class performance with maximum extraction detail.
The Standard Library Tour
Are you tired of writing complicated code only to discover that Python has tools in its standard library that could have made your life easier? Join us for a tour of the standard library where we'll dive into less-known modules that do well-known things and well-known modules that do less-known things. This talk is tailored to beginners or anyone who wants to learn more about Python's standard library.
Robot Holmes and The MLington Murder Mysteries
We will follow master detective Robot Holmes on his way to solve one of his hardest cases so far - a series of mysterious murders in the city of MLington. The traces lead him to the Vision-Language part of town, which has been a quiet and tranquil place with few incidents until lately. For a few months the neighbourhood has been growing extensively and careless benchmark leaders are dropping dead at an alarming rate.
Robot Holmes sets out to find the cause for this new development and will gather intel on some of the most notorious of the new citizens of the Vision-Language neighbourhood and find out what makes them tick.
pip install malware
pip install malware: it’s that easy. Almost all projects depend on external packages, but did you know how easy it can be to install something nasty instead of the dependency you want? I'll be showing this live, as I make malware and install it from PyPI onto my own computer during the talk!
The State of Production Machine Learning in 2023
As the number of production machine learning use cases increases, we find ourselves facing new and bigger challenges where more is at stake. Because of this, it's critical to identify the key areas to focus our efforts on, so we can ensure our machine learning pipelines are reliable and scalable. In this talk we dive into the state of production machine learning, and we will cover the concepts that make production machine learning so challenging, as well as some of the recommended tools available to tackle these challenges.
Site Unseen: hidden python customization
Python offers us the ability to customize how it starts up. In some cases arbitrary Python code can get executed before the first line of your module is reached. This is necessary for some of its dynamic features, like virtualenvs, but it can also be harnessed to make the interpreter experience truly personal.
Polars vs Pandas - what's the difference?
Have you heard about Polars? What are the differences? Is Polars replacing Pandas? In this talk, we are going to answer these questions about Polars. We compare Polars and Pandas and explain the pros and cons of both.
Unleashing the Power of dbt and Python for Modern Data Stack
This talk will introduce dbt and demonstrate how to leverage Python to unlock its full potential. Attendees will learn best practices for working with dbt, how to integrate it with other tools in their data stack, and how to use Python packages like fal to perform complex data analysis. With real-world examples and use cases, this talk will equip attendees with the tools to build a modern, scalable, and maintainable data infrastructure.
Build, Serve, and Deploy a Fast, Production-Ready API with Python and Robyn
Join our hands-on workshop and discover how to build fast, production-ready APIs using Robyn, a developer-friendly web framework for Python. We'll guide you through key features like GraphQL, WebSockets, and data validation, as well as essential topics like app structure, database modeling, and code splitting. With our workshop, you'll gain practical experience and valuable insights into Robyn's simple and extensible API, middleware, and deployment process.
Zero-Copy Zen: Boost Performance with Memory View
Kesia Mary Joies, Aby M Joseph
Are you tired of struggling with memory management in Python? Do you want to take your skills to the next level and achieve maximum performance while minimising memory usage? Look no further, here is Zero-Copy in Python! Zero-copy is a technique in computer programming that allows data to be transferred between different parts of a program without being copied to intermediate buffers. In Python, this technique can be achieved using the memory view object, which provides a view into the memory of other objects. Learn how to efficiently manipulate large datasets and optimise your code with the help of this powerful tool. Whether you're working with sockets, objects or memory profiling, memory view is your key to faster and more efficient Python programming.
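A minimal sketch of the zero-copy idea using the built-in `memoryview`: slicing the view does not copy the underlying buffer, so writes through the slice show up in the original object. The buffer contents are made up for illustration.

```python
# A mutable buffer, e.g. data received from a socket (contents are illustrative).
data = bytearray(b"header:payload-bytes")

view = memoryview(data)

# Slicing a memoryview creates a new view onto the same memory, not a copy.
payload = view[7:]

# Writing through the slice modifies the original bytearray in place.
payload[0:7] = b"PAYLOAD"

# By contrast, slicing the bytearray itself allocates a new, independent object.
copied = data[7:]
copied[0:7] = b"xxxxxxx"  # does not affect `data`
```

After this runs, `data` holds `header:PAYLOAD-bytes`, while the change made to `copied` is not visible in `data`.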
Bulletproof Python – Writing fewer tests with a typed code base
A fully typed code base requires less test code to achieve the same level of confidence in its correctness. We'll analyze specific code examples and see how dependent types and exhaustiveness checking make certain classes of tests obsolete.
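One way to picture exhaustiveness checking is sketched below, assuming a static type checker such as mypy or Pyright is run on the code. The `Status` enum and `describe` function are hypothetical. The `assert_never` helper is defined locally so that the example runs on older interpreters; Python 3.11 ships an equivalent `typing.assert_never`.

```python
from enum import Enum
from typing import NoReturn


class Status(Enum):
    PENDING = "pending"
    PAID = "paid"
    REFUNDED = "refunded"


def assert_never(value: NoReturn) -> NoReturn:
    # Only reachable at runtime if a case was missed.
    raise AssertionError(f"Unhandled value: {value!r}")


def describe(status: Status) -> str:
    if status is Status.PENDING:
        return "waiting for payment"
    if status is Status.PAID:
        return "payment received"
    if status is Status.REFUNDED:
        return "payment returned"
    # A type checker narrows `status` after each branch. If a new member is
    # added to Status and not handled above, this call is expected to be
    # reported as a type error before any test runs.
    assert_never(status)
```

With this in place, a test whose only purpose is to check that "every status is handled" arguably becomes redundant, since the checker is expected to flag the omission statically.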
Stop using print! Understanding and using the "logging" module
If you're like me, then you've long known about Python's "logging" module, but you've ignored it because it seemed too complex. In this talk, I'll show you that "logging" is easy to learn and use, giving you far more flexibility than you can get from inserting calls to "print" all over your code. I'll show you how you can start to use "logging" right away -- but also how you can use it to create a sophisticated logging system that sends different types of output to different destinations. After this talk, you'll know how to use "logging", and you'll be less likely to use "print" in your applications.
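As a small sketch of sending different types of output to different destinations, the example below attaches two handlers with different levels to one logger. In-memory `io.StringIO` streams stand in for the console and an error file, and the logger name `demo_app` is arbitrary.

```python
import io
import logging


def build_logger(console, errors):
    """Return a logger that writes everything to `console` and errors to `errors`."""
    logger = logging.getLogger("demo_app")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False   # keep the example isolated from the root logger
    logger.handlers.clear()    # avoid duplicate handlers if called twice

    fmt = logging.Formatter("%(levelname)s %(message)s")

    everything = logging.StreamHandler(console)
    everything.setLevel(logging.DEBUG)
    everything.setFormatter(fmt)

    only_errors = logging.StreamHandler(errors)
    only_errors.setLevel(logging.ERROR)
    only_errors.setFormatter(fmt)

    logger.addHandler(everything)
    logger.addHandler(only_errors)
    return logger


console, errors = io.StringIO(), io.StringIO()
log = build_logger(console, errors)
log.debug("cache miss for user 42")
log.error("payment service unreachable")
```

Both messages reach `console`, while only the error reaches `errors`. A scattering of `print` calls cannot do that kind of routing without extra code.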
Asyncio Evolved: Enhanced Exception Handling with TaskGroup in Python 3.11
Python 3.11, released in October 2022, implemented PEP 654 "Exception Groups and except*" and added asyncio.TaskGroup(). This enhancement of exception and cancellation handling has allowed asyncio to evolve more flexibly, addressing existing issues with asyncio APIs, such as the insufficient cancellation and exception handling in asyncio.gather.
In this talk, I would like to discuss the problems of existing asyncio APIs and how the newly introduced asyncio.TaskGroup() solves these issues. Attendees will learn about the improved way of handling exceptions and cancellations using asyncio.TaskGroup(), enabling them to write more efficient and robust asynchronous code with Python 3.11.
Unlocking Healthcare data: the power of Open Formats in Python Data Science
Are you a data scientist or developer working in healthcare? Are you tired of dealing with proprietary data formats for biological and vital sign information? It's time to unlock the power of open data and make your research more impactful.
In this talk, we'll explore how you can leverage Python analytics to manipulate and analyze complex datasets of patient information, including blood work, ECG, EEG, echocardiography, radiography, and more.
We'll also dive into the world of open data formats, and show you how using these formats can make it easier to anonymize, convert, and collaborate on research.
Don't miss this opportunity to learn how Python analytics and open data formats can help you unlock the insights hidden in your data and improve patient outcomes.
Gathering data from the web using Python
Information is abundant and readily available on the internet. However, the sheer amount of data can be overwhelming and time-consuming to navigate through. That's where web scraping comes in - a powerful tool used to extract data from websites and turn it into a usable format.
In this tutorial, we will explore the basics of web scraping and how to implement it using Scrapy (a Python framework). Whether you are a data analyst, programmer, or researcher, this tutorial will equip you with the fundamental skills needed to create your own web scraper and extract valuable information from websites.
Story Generation using Stable Diffusion in Python
Most recent work focuses on synthesizing independent images, while real-world applications commonly need a series of coherent images for storytelling. In this work, we mainly focus on story visualization and continuation tasks and propose AR-LDM, a latent diffusion model auto-regressively conditioned on history captions and generated images. To the best of my knowledge, this is the first work to successfully leverage diffusion models for coherent visual story synthesis.
Geospatial Data Processing in Python: A Comprehensive Tutorial
In this tutorial, you will learn about the various Python modules for processing geospatial data, including GDAL, Rasterio, Pyproj, Shapely, Folium, Fiona, OSMnx, Libpysal, Geopandas, Pydeck, Whitebox, ESDA, and Leaflet. You will gain hands-on experience working with real-world geospatial data and learn how to perform tasks such as reading and writing spatial data, reprojecting data, performing spatial analyses, and creating interactive maps. This tutorial is suitable for beginners as well as intermediate Python users who want to expand their knowledge in the field of geospatial data processing.
Vector data cubes as a bridge between raster and vector worlds
This talk introduces the concept of vector data cubes - multi-dimensional arrays where at least one dimension is composed of vector geometries - and its implementation in Python within a new library Xvec, built on top of Xarray, Shapely 2.0 and GeoPandas.
Dive into a codebase like a pro
How do you get familiar with a codebase you need to maintain, with minimum suffering? How do you leave a codebase easier to deal with for your colleagues, so they don’t have to suffer like you did?
Whether you are an experienced developer or a junior just starting your journey, inheriting a codebase can be a very challenging task, especially if the codebase is not quite up to your standards, or is just a huge and complex beast.
I will share the experience, tips and tricks on inheriting code that I acquired during 12 years of software development on new and old projects.
The talk will provide guidelines to ease taking over code from somebody else, as well as remind developers of the importance that planning, preparation and documentation have in facilitating code change and project growth.
Unlocking the Power of Raft Consensus with rqlite using Python
Distributed databases are widely used in modern applications for their high availability and scalability. Have you ever wondered how data integrity is maintained across multiple nodes? One of the key components of achieving this is distributed consensus. Raft is a widely used consensus algorithm that provides a fault-tolerant and highly available system. In this talk, we will explore how to implement Raft consensus using the rqlite distributed database in Python.
Python's stability promise
Many modules you use and love have a portion of their implementation written in other languages, and for that a Python extension needs to be made. Python offers a C API that allows people to extend the language, and since C is a nice glue language, it is also a bridge to many other languages.
So if everything is simple, what's the deal with stability? Changes in the C API might break existing functionality, so PEP 387 saves the day with a policy for backward compatibility. Python 3.2 introduced the Limited API, a subset of Python's C API that comes with a promise: code that uses only this subset can be compiled against one version and run on many others as well.
Also, having a Stable ABI compatible wheel allows you to ship one wheel per OS rather than one wheel per Python version, which can simplify your release process.
This talk will introduce the Limited API concept, and provide the necessary information to include it in your project.
Continue Thinking Small: Next level machine learning with TinyML
The Internet of Things has been flourishing for many years, and Python has played an important role in making many devices “easy to automate”. One of the challenges for the next generation of ML is to think small; you read that right, “thinking small”. It’s time to be able to run well-trained ML models on small devices: ML on microcontrollers. We are going to dive into TinyML and evaluate different setups for interacting with sensors on microcontrollers. We will discuss the different hardware options and frameworks to start with, and look at use cases that TinyML can address, such as agriculture, conservation, health issue detection, and ecology monitoring. In this talk, you will learn about Tiny Machine Learning (TinyML), an approach to deploying machine learning in embedded systems that makes it possible to run ML on microcontrollers. Lastly, we will discuss real use cases and a practical case that could be implemented at home.
Generative AI: Beyond technicalities – an ethical perspective
Machines have become smarter than ever before. Recently, we have started using computers for solving problems beyond computations, and it might not be wrong to call them electronic creators. The future laptop might have a prompt-based word application replacing the current Word, where one has to type their thoughts and formulate an entire document from scratch. Similarly, we might see a prompt-based Paint application, instead of the typical Paint program, that generates the paintings for us. In my opinion, AI-based applications are no longer fiction, and we may soon be using them on our computers. However, there is a possibility that Generative AI could be harmful to society. We need to explore the ethical concerns and how AI can impact our society. In this talk, we will try to understand how Generative AI is becoming a part of our future and how we can use it in a responsible and ethical manner.
What my 300+ fantastic young students taught me about Python.
Computer Programming, Technology, Bit-coinism Success, Climate Change and Billionaires are all associated with one another. This talk will describe how a cohort of 299+ young people (aged 11-14) were introduced to Python programming, at the same time, for the very first time. In this talk, I would like to share a great secret: I have actually learnt more from the young students than they learnt from me. This talk is about how these young people have opened my eyes, mind and heart to alternative ways of looking at and appreciating the humble IF statement, the under-rated FOR loop, the dry return statement, the functional math and random modules, etc., as if one were an artist. We will talk about the renewed delight of looking at these things through fresh pairs of eyes and how we can take these new learnings forward.
How LocalStack is recreating AWS with Python
At LocalStack, we are building a platform that enables development and testing of cloud applications on your local machine. The core is an open source AWS emulator that is primarily written in Python. It is among the top Python projects on GitHub, and has seen a massive uptake in contributions over the past two years. Many Python software developers and architects will relate to the struggles of maintaining a large and complex Python codebase, while keeping developer teams productive. In this talk, we'll explore how we at LocalStack tackle these as we re-create AWS for local development. We'll explain our approaches to automating around AWS specifications, building a highly modular and pluggable system to make it easy for teams to integrate their components, the software patterns we use to keep devs productive, as well as our approach to automated contract testing using pytest.
Apache Spark vs cloud-native SQL engines
Currently, SQL and Cloud Data Warehouses (DWH) are extremely popular for good reason. They are great for dashboarding and business intelligence (BI) use cases due to their ease-of-use. However, their combination might not be the best choice for every problem. More precisely, business-critical data pipelines with high complexity might be better suited for frameworks such as Apache Spark which greatly benefit from the tight integration with general purpose languages like Python (e.g., PySpark).
Expect an opinionated comparison between Apache Spark and seemingly easier-to-use cloud native SQL engines. By the end of this talk, you will be challenged to think about why they are complementary and when each has its justification.
Breaking the Stereotype: Evolution & Persistence of Gender Bias in Tech
Did you know that originally programming was a female-heavy field? How did we get to the stereotype of the antisocial programmer (and therefore male)?
And how is the idea that good programmers are “born, not made” still affecting our tech industry and society?
Asyncio without Asyncio
This tutorial aims to demystify the asyncio module by implementing it from scratch, without any dependencies other than the Python Standard Library. We will go through the problem of blocking IO and how it is possible to solve it without a single "async" or "await" statement, using native Python concepts. Then, we will demystify the async/await syntax and see how it is implemented. We will also build our own scheduler with an API similar to asyncio's, which will be able to run async functions the same way asyncio does. Finally, we will build a real asynchronous HTTP proxy using our own asyncio implementation. Why reinvent the wheel? - "I hear and I forget. I see and I remember. I do and I understand.".
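To give a flavour of cooperative scheduling without "async" or "await", here is a minimal round-robin scheduler built on plain generators. It is a sketch of the general idea, not the tutorial's implementation; the `worker` and `run` names are made up.

```python
from collections import deque


def worker(name, steps, log):
    """A task that does `steps` units of work, yielding control after each one."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back to the scheduler


def run(tasks):
    """Round-robin scheduler: resume each task in turn until all are finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)       # run the task until its next yield
        except StopIteration:
            continue         # task finished, do not reschedule it
        ready.append(task)   # task still has work, put it at the back


log = []
run([worker("a", 2, log), worker("b", 3, log)])
# log is now ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

A real event loop extends this by parking tasks that wait on IO and resuming them only when their socket is ready, instead of rescheduling them immediately.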
Don’t Panic! A Developer’s Guide to Security
As a developer, you play a crucial role in the security of your projects. At the same time, it can be difficult to know if what you’re doing is enough. Luckily, you don’t have to be a security expert to contribute to the security of your projects. Instead, you can use industry standards as a guide for your approach to security.
In this talk, I will introduce you to a framework that is especially accessible to developers, the OWASP DevSecOps Maturity Model, and help you get started with a systematic approach to improving the security of your projects.
Test your data like you test your code
I will introduce the concept of data unit tests and why they are important in the workflow of data scientists when building data products. In this talk, you will learn a new tool you can use to ensure the quality of the products you build.
Games of Life: generative art in Python
We're entering the age of machine-generated art. Many of the new systems are shockingly impressive but impossible to replicate by individuals because they rely on complex machine learning techniques with huge datasets that aren't feasible to do in a home environment. Fortunately, there's an entire group of clever approaches to generate graphics that look cohesive, unique, and deliberate... and that you can easily do on your own computer.
In this short talk we'll go through a few of those algorithms like Clifford attractors, slime mold simulation, and reduction of source imagery to geometric primitives. We'll generate images and animations, we'll dabble in 2D and 3D. You'll leave the talk with your own ideas how to create attractive visualizations out of thin air. The talk assumes familiarity with Python and high-school math.
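As a taste of the first algorithm mentioned, a Clifford attractor iterates x' = sin(a·y) + c·cos(a·x) and y' = sin(b·x) + d·cos(b·y), and the image is the density of visited points. The sketch below uses only the standard library; the parameter values and helper names are arbitrary choices for illustration, not the speaker's.

```python
import math


def clifford_points(a, b, c, d, n, x=0.0, y=0.0):
    """Iterate the Clifford attractor map n times and collect the points."""
    points = []
    for _ in range(n):
        x, y = (math.sin(a * y) + c * math.cos(a * x),
                math.sin(b * x) + d * math.cos(b * y))
        points.append((x, y))
    return points


def histogram(points, size, x_max, y_max):
    """Count how many points fall into each cell of a size x size grid."""
    grid = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(int((x + x_max) / (2 * x_max) * size), size - 1)
        row = min(int((y + y_max) / (2 * y_max) * size), size - 1)
        grid[row][col] += 1
    return grid


a, b, c, d = -1.4, 1.6, 1.0, 0.7
points = clifford_points(a, b, c, d, n=20_000)

# Each coordinate is a sine plus a scaled cosine, so |x| <= 1 + |c|, |y| <= 1 + |d|.
grid = histogram(points, size=40, x_max=1 + abs(c), y_max=1 + abs(d))
```

Mapping the counts in `grid` to brightness (for example with a logarithmic scale) and raising `n` and `size` is typically what turns this into an image.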
Python 3.11 What’s new?
The topic aims to introduce participants to the latest from Python in version 3.11, released in early October 2022, which includes:
- Speed improvements;
- Standard Libraries Improvements;
- Self type;
- Exception Notes;
- Better Error Messages;
- Improved Type Variables;
- Variadic generics;
- Marking individual TypedDict items as required or potentially missing;
- Arbitrary literal string type;
- Data class transforms;
- TOML read-only support in stdlib;
- Exception Groups;
- Negative Zero Formatting.
Understanding Neural Network Architectures with Attention and Diffusion
Neural networks have revolutionized AI, enabling machines to learn from data and make intelligent decisions. In this talk, we'll explore two popular architectures: Attention models and Diffusion models.
First up, we'll discuss Attention models and how they've contributed to the success of large language models like ChatGPT. We'll explore how the Attention mechanism helps GPT focus on specific parts of a text sequence and how this mechanism has been applied to different tasks in natural language processing.
Next, we'll dive into Diffusion models, a class of generative models that have shown remarkable performance in image synthesis. We'll explain how they work, how they're different from other generative models, and their potential applications in the creative industry.
By the end of the talk, you'll have a better understanding of these cutting-edge neural network architectures. We'll also give some examples of how easily you can use them in your own projects.
Building and Deploying Fair and Unbiased ML Systems : An Art, Not Science
There has been a renaissance around Artificial Intelligence systems in recent years. However, despite the hype, only a small percentage (13%) of Machine Learning models see the light of day! Effectively building and deploying machine learning models is more of an art than a science. ML models are inherently complex, have fuzzy boundaries, and rely heavily on data distribution. But what if they are trained on biased data? Then they’ll generate highly biased decisions! As the famous saying goes, “Garbage in, garbage out”: if a model is trained on a skewed and unfair data distribution, it is bound to produce fuzzy output! So, join me in this talk as I share my learnings in developing effective practices to build and deploy ethical, fair and unbiased machine learning models into production.
OCR, information through images
The acquisition and processing of images to find information is a field of multiple possibilities, since the world holds a lot of visual information that, applied to different areas, can demonstrate its great potential.
Building native Rust modules for Python
We'll cover the basics of Rust and demonstrate how to create a Rust module that can be imported and used within Python. Discover the advantages of using Rust in Python, especially regarding improved performance.
Develop your Python cloud applications offline with LocalStack
Waldemar Hummer, Thomas Rausch, Alexander Rashed
This tutorial provides a hands-on introduction to LocalStack - the leading platform to develop and test cloud applications entirely on your local machine!
LocalStack provides a set of 70+ AWS services, running in a local Docker container. The hugely popular open source project (46k+ GitHub stars, 130+ million downloads) is today considered a “must-have” in the toolbox of every AWS cloud developer around the globe.
Outline: (1) Intro to AWS cloud development with Python (2) Developing Python cloud apps with LocalStack (3) Advanced integrations for IaC and CI/CD pipelines (4) Python internals & advanced features in LocalStack (5) Summary and wrap-up
This interactive session covers live coding to showcase common use cases, settings for local debugging of Lambdas and containerized apps, as well as advanced features that can radically improve team collaboration. We'll also glance over the large ecosystem of tools & integrations - including Terraform, Pulumi, CDK, Serverless.
PyScript and the magic of Python in the browser
Python running in the browser is the new frontier for creating true client-side web and mobile applications. Today we can do many incredible things that were not possible just a few months ago, before WASM, Pyodide and PyScript.
The talk will cover what's possible today, cover the major features offered by PyScript and walk through creating amazing applications and games with Python, on the browser, without the need for Python server-side logic.
How Python can help monitor governments
Judite Cypreste, Patricia Bongiovanni Catandi
With the risk of losing access to information, Python has been used to create means for society to continue having the right to know what government officials are doing in Brazil.
This lecture aims to show how the difficulty of accessing Brazilian government information has been combated by creating tools that use Python and how the language has been a useful tool for those who seek to leave society in the light of information.
Performance tips by the FastAPI Expert
Is your FastAPI really fast? Did you benchmark it, or do you just have faith?
In this talk, Marcelo will give tips to improve the performance of your FastAPI application, and you’ll see how impactful those changes can be.
PyTorch 2.0 - Why Should You Care
PyTorch is one of the most popular machine learning frameworks, and its latest iteration (PyTorch 2.0) landed just a couple of days back. Among other things, PyTorch 2.0 offers faster performance with a fully backward-compatible API that guarantees the development ergonomics that PyTorch is known for.
In this talk, we will examine how practitioners (researchers and engineers) can benefit from optimizations provided by PyTorch 2.0 and what other improvements are on the horizon.
Python Linters at Scale
Black, Flake8, isort, and Mypy are useful Python linters but it’s challenging to use them effectively at scale in the case of multiple codebases, in a large codebase, or with many developers. Linter analysis on large codebases is slow. Linters may slow down developers by asking them to fix trivial issues. Running linters in distributed CI jobs makes it hard to understand the overall developer experience.
In this talk, we'll walk you through solving those scaling problems using a reusable linter framework that releases new linter updates automatically, reuses consistent configurations, runs linters on only updated code to speed up runtime, collects logs and metrics to provide observability, and builds auto fixes for common linter issues. Our linter runs are fast and scalable. Every week, they run 10k times on multiple millions of lines of code in over 25 codebases, generating 25k suggestions for more than 200 developers. The autofixes also save 20 hours of developer time every week.
From Dataset to Features: A Python-Based Evolutionary Approach
Neeraj Pandey, Hitesh Khandelwal
Multilabel classification is a machine learning task in which each instance is assigned to a group of labels. It has gained widespread use in various applications in recent years. Preprocessing, such as feature selection, is an important step in any machine learning or data mining task. It helps to improve the performance of an algorithm and reduce computational time by eliminating highly correlated, irrelevant, and noisy features. A new algorithm called Black Hole, inspired by the phenomenon of black holes, has recently been developed to tackle multi-label classification problems. In this talk, we present a modified version of the Black Hole algorithm that combines it with two genetic algorithm operators: crossover and mutation. The combination of Black Hole and genetic algorithms has the potential to solve multi-label classification problems across a range of domains.
Leveraging the power of Django REST Framework's renderers with HTMX.
HTMX has been quite popular lately in the Django circles and has demonstrated how powerful it can be with vanilla Django. But... have you thought about HTMX paired with Django REST Framework and more specifically paired with DRF's flexible renderer system?
sktime - python toolbox for time series
This tutorial presents sktime - a unified, open source framework for machine learning with time series in Python. sktime provides interfaces to algorithms of various types, and modular tools for pipelining, composition, and tuning. You will learn how to identify your learning task, and how to build, use, and evaluate different algorithms on real-world data sets.
Serve notebook as a Web App with Mercury framework
Make your coding meaningful to non-technical recipients! Write your code in Jupyter Notebook, add widgets with the Mercury framework, and easily turn your notebook into an interactive web app. Or... create a dashboard, a report, and DEPLOY it.
The needle and the haystack: visualizing single datapoints out of billions
Python tools like Bokeh and Dash let you build custom Web-based interactive visualization apps and dashboards. While these solutions work well to visualize megabyte-sized datasets, web technologies struggle to render gigabyte or larger datasets efficiently, because they transfer all the data into the client browser. Pre-rendering the data on the server using a tool like Datashader can visualize such large datasets efficiently, but the resulting static renderings make exploring individual datapoints difficult.
This talk demonstrates how the HoloViz ecosystem of tools (holoviz.org) allows you to run exploratory notebooks and build dashboards that do server-side rendering of billions of data points without losing the ability to interactively inspect and annotate individual samples in the browser.
Running Python packages in the browser with Pyodide
Pyodide is a port of CPython to WebAssembly/Emscripten enabling Python packages to run directly in the browser or Node.js. We will provide an overview of Pyodide's architecture, capabilities, and potential use cases before looking into building, running, and testing Python packages for the browser.
We will also discuss how browser-specific optimizations, such as code splitting, tree shaking, and lazy loading could be adapted to Python to reduce package size and load time.
Finally, we will mention some of the common restrictions of the browser runtime and how they can be overcome in Python packages.
Rethinking Graph algorithms: introducing GraphBLAS
What if graph algorithms could be expressed as linear algebraic operations? And what if this translation would make graph algorithms super efficient so that this would represent a viable and scalable alternative for high-performance graph analytics? GraphBLAS provides a powerful and expressive framework for creating graph algorithms based on the elegant mathematics of sparse matrix operations on a semiring. In this talk, we will introduce the general concepts and the theory behind the GraphBLAS standards. We will explore practical examples using python-graphblas, i.e. the official Python API to GraphBLAS, and its integration with
Threat to Life: Preventing Planned Murders with Python
At the Netherlands Forensic Institute (NFI), we've developed a Python-based deep learning model to spot life-threatening messages in lawfully intercepted communication data, like those from the infamous chat service Encrochat.
Thanks to the application of our model in collaboration with the Dutch Police, dozens of potential victims of violent crimes, including murder, serious assault, and kidnapping, have been warned and safeguarded. In this talk, we'll dive into the development, implementation, and success of our deep learning model in the fight against violent criminal activities. We'll also tackle the risks tied to using deep learning for these cases and discuss the precautions we took to ensure responsible and accurate use.
Working in Units: How to Decouple the Database and Domain Layers in Python
A crucial element of architecting a software application for scale is the collaboration of domain experts and developers. For that to happen, the application must separate the domain layer, where elements that represent the real world reside, from the infrastructure layer, where these elements are translated into precise software processes.
Within the Fintech team at Kiwi.com, we are rearchitecting a critical service to accept more payment providers. As part of this refactor, we are adopting the Unit of Work pattern to disentangle domain entities from the database processes that represent them. This way, domain experts can share their knowledge with developers more easily, and developers can find opportunities for optimization without the involvement of domain experts in the process.
Attendees will gain a solid understanding of how to implement the UoW pattern in their Python applications, how it fits into the broader context of DDD, and how to prepare their code for future growth.
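To make the pattern concrete, here is a minimal sketch of a Unit of Work built on the standard-library `sqlite3` module. It is not the Kiwi.com implementation: the names `Payment`, `PaymentRepository`, `UnitOfWork`, the `payments` table and the provider labels are all hypothetical, chosen only to illustrate how a domain entity can stay free of database code while a single object owns commit and rollback.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Payment:
    """Hypothetical domain entity: knows nothing about SQL or connections."""
    provider: str
    amount_cents: int


class PaymentRepository:
    """Translates domain entities to and from database rows."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def add(self, payment: Payment) -> None:
        self._connection.execute(
            "INSERT INTO payments (provider, amount_cents) VALUES (?, ?)",
            (payment.provider, payment.amount_cents),
        )

    def list_all(self) -> list:
        rows = self._connection.execute(
            "SELECT provider, amount_cents FROM payments ORDER BY rowid"
        )
        return [Payment(provider, amount) for provider, amount in rows]


class UnitOfWork:
    """Groups repository operations into one atomic transaction."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection
        self.payments = PaymentRepository(connection)

    def __enter__(self) -> "UnitOfWork":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Commit only if the whole block succeeded; otherwise undo everything.
        if exc_type is None:
            self._connection.commit()
        else:
            self._connection.rollback()
        return False  # never swallow the exception


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE payments (provider TEXT, amount_cents INTEGER)")
connection.commit()

# A successful unit of work is committed as a whole.
with UnitOfWork(connection) as uow:
    uow.payments.add(Payment("provider_a", 1250))

# A failing unit of work leaves no partial writes behind.
try:
    with UnitOfWork(connection) as uow:
        uow.payments.add(Payment("provider_b", 990))
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

saved = UnitOfWork(connection).payments.list_all()
```

In this sketch only the first payment survives, because the second block raised before it could commit. The domain code never touches `commit` or `rollback` directly.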
How we are making CPython faster. Past, present and future.
Python 3.11 is considerably faster than 3.10. How did we do that? And how are we going to make 3.12 and following releases even faster?
In this talk, I will present a high-level overview of the approach we are taking to speeding up CPython. Starting with a simple overview of some basic principles, I will show how we can apply those to streamline and speed up CPython. I will try to avoid computer science and software engineering terminology, in favor of diagrams, a few simple examples, and some high-school math. Finally, I will make some estimates about how much faster the next few releases of CPython will be, and how much faster Python could go.
Word Wranglers & News Navigators: Taming GPT-3 Beast for Media Monitoring
The emergence of ChatGPT has led to an exponential growth of prospects and implementations in the field of Natural Language Processing (NLP). Various teams were struck with FOMO (Fear of Missing Out) and hastened to incorporate Large Language Models (LLMs) into their products. By using OpenAI models (text-curie-001, davinci, gpt-3.5-turbo), we successfully integrated them into our production on March 2, granting our users the ability to receive text summaries in their email reports and comprehend the essence of any article within our application. Three weeks later, we trained our own large language model for the same purpose. This talk will delve into our journey, exploring the lessons and insights gleaned from our hands-on experience with these cutting-edge tools.
The coding conventions that make our lives easier
Discover how coding conventions can enhance code quality, readability, maintainability, and reduce errors. Join us as we discuss the creation and implementation of coding conventions, and how to use linters for maintenance.
Time Made Easy: Simplify Date and Time Handling with Python's Pendulum
Pendulum is a Python package for working with dates, times, and timezones. It offers a simple and intuitive API for common date/time operations and provides advanced functionality for dealing with more complex scenarios. Some of the interesting points of Pendulum include support for leap years, time zones, and daylight saving time, as well as a fluent API for creating and modifying dates and times.
One of the standout features of Pendulum is its support for time zones. The library comes with a comprehensive list of time zones, and it can automatically adjust dates and times to the local time zone of a given location. Additionally, Pendulum can handle time zone conversions with ease, making it easy to work with date/time data across different time zones.
Pendulum also provides a powerful API for creating and modifying dates and times. With its fluent interface, developers can create and manipulate dates and times using a natural, human-readable syntax.
Architecting Data: A Programmer's Guide to Synthetic Data
Finding good datasets or web assets to build data products or websites with, respectively, can be time-consuming. For instance, data professionals might require data from heavily regulated industries like healthcare and finance. In contrast, software developers might want to skip the tedious task of collecting images, text, and videos for a website. Luckily, both scenarios can now benefit from the same solution, Synthetic Data.
Synthetic Data is artificially generated data created with machine learning models, algorithms, and simulations, and this workshop is designed to show you how to enter that synthetic world by teaching you how to create a full-stack tech product with five interrelated projects. These projects include reproducible data pipelines, a dashboard, machine learning models, a web interface, and a documentation site. So, if you want to enhance your data projects or find great assets to build websites with, come and spend 3 fun and knowledge-rich hours in this workshop.
Designing an HTTP client
HTTPX is a fully featured HTTP client for Python 3, which provides sync and async APIs, and support for both HTTP/1.1 and HTTP/2. It also includes a built-in command-line client.
We'll be taking a look at the architecture of the client, learning from the design decisions behind it, and gaining a better understanding of HTTP along the way.
Apache Arrow and Substrait, the secret foundations of Data Engineering
Apache Arrow and its Python library, PyArrow, are becoming the de facto standard for transferring data and for interoperability between libraries and languages. As more compute engines, storage systems and databases start to speak Arrow, you might be relying on it without even knowing. The same transformation is happening with Substrait, which is on track to become the standard representation of query plans themselves, allowing queries to be routed to different engines as long as they speak Substrait, or even decomposed and forwarded to different engines. This talk will provide a quick introduction to the Arrow ecosystem, showing Python developers how libraries like Pandas, Polars and PyArrow itself leverage Arrow and how compute engines like Velox, DataFusion and Acero are embracing Arrow and Substrait. The talk will also show how a basic database system based on Arrow and Substrait can be built with a minimal amount of code thanks to all the foundations they provide.
A Brief History of Data Storage
For millennia, humans have known things. Pretty quickly, we started writing them down; our brains aren't very good at storing all the things we know reliably, and we needed something more durable.
A long time ago, this meant clay tablets with cuneiform on them, and things have only got more complicated from there. Nowadays, we try to store data so that computers can understand it too, and that's given us a bewildering array of options - portable hard drives, magnetic tape storage and so much more.
In this talk, we'll take a look at the history of data storage, and discuss why some methods have worked better than others. We'll talk about why writing things down for humans is different than doing it for computers, and why it's difficult to do both at the same time (this is what code is). Finally, we'll look at today's state-of-the-art for keeping data safe, and discuss what the future might hold.
This talk has no prerequisites, although a fondness for weird facts will help!
Music information retrieval with Python
The advancements of artificial intelligence in computer vision and natural language processing often make the headlines, but the subspace of musical AI is developing just as rapidly. Let’s take a dive into the research area of music information retrieval and see how Python enables some of its proudest achievements. You’ll learn about common MIR tasks and get ideas on how you can analyze, generate and interact with music using code, so you can start exploring right away! No music theory knowledge nor prior experience with MIR is expected.
From Algorithms to Agendas: A Beginner's Guide to Integer Programming
This talk will provide an introduction to Integer Programming and demonstrate how it can be used for conference scheduling. We will explore the basics of Integer Programming and how it can be applied to optimize the allocation of talks to time slots and rooms in a conference program. By the end of the talk, attendees will have a better understanding of how this powerful tool can help to create an efficient and effective conference schedule that maximizes attendee satisfaction. Whether you're a conference organizer or simply interested in learning more about optimization algorithms, this talk is for you!
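As a rough illustration of the kind of model involved, the sketch below writes a toy scheduling problem as an integer program: one binary variable per (talk, slot) pair, a constraint that every talk gets exactly one slot, a constraint that no slot hosts two talks, and an objective that maximizes a preference score. The talk names, slots and scores are invented, and the model is solved by exhaustive enumeration rather than a real integer programming solver. That only works because the example is tiny.

```python
from itertools import product

# Hypothetical toy data: three talks, three slots, one room.
talks = ["keynote", "packaging", "async"]
slots = ["09:00", "10:00", "11:00"]

# Hypothetical preference scores: score[talk][slot], higher is better.
score = {
    "keynote": {"09:00": 9, "10:00": 4, "11:00": 2},
    "packaging": {"09:00": 3, "10:00": 5, "11:00": 6},
    "async": {"09:00": 4, "10:00": 7, "11:00": 5},
}

# One binary decision variable x[talk, slot] per pair.
pairs = [(talk, slot) for talk in talks for slot in slots]

best_value = -1
best_schedule = None
for bits in product((0, 1), repeat=len(pairs)):
    x = dict(zip(pairs, bits))

    # Constraint 1: every talk is scheduled exactly once.
    each_talk_once = all(sum(x[t, s] for s in slots) == 1 for t in talks)
    # Constraint 2: a slot holds at most one talk.
    no_slot_clash = all(sum(x[t, s] for t in talks) <= 1 for s in slots)
    if not (each_talk_once and no_slot_clash):
        continue

    # Objective: total preference score of the chosen assignments.
    value = sum(score[t][s] * x[t, s] for t, s in pairs)
    if value > best_value:
        best_value = value
        best_schedule = {t: s for (t, s), bit in x.items() if bit}
```

A dedicated solver would explore the same variables, constraints and objective far more cleverly. The formulation is the part that carries over to real conference programmes.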
An unbiased evaluation of environment management and packaging tools
Python packaging is quickly evolving and new tools pop up on a regular basis. Lots of talks and posts on packaging exist but none of them give a structured, unbiased overview of the available tools.
This talk will shed light on the jungle of packaging and environment management tools, comparing them on a basis of predefined features.
No Holds Barred Web Framework Battle
Which framework is better: Django, FastAPI, or Flask? What about the host of less well known options? Also, does full stack matter or are micro frameworks the way to go? This talk is a frank discussion of which frameworks you should consider using in 2023, and which frameworks should be avoided.
Caching in microservices
There are two hard problems in programming: naming things and cache invalidation. I'll cover the latter, in a microservice-based system. Given a fairly standard setup with an API Gateway and a backend service with its own database, I'll show how to implement a cache that allows us to avoid database queries without modifying the API client.
The whole talk is based on live coding.
Poisoned pickles make you ill
Don’t you love pickles? In the data science space, the pickle module has become one of the most popular ways to serialise and distribute machine learning models - yet, pickles introduce a wide range of problems. For starters, it is incredibly easy to poison a pickle. Once this happens, a poisoned pickle can be used by an attacker to inject any arbitrary code into your ML pipelines. And what’s even worse: it’s incredibly hard to detect if a pickle has been poisoned!
Good news? Help is on the way! You now have access to an increasing number of tools to help you generate higher-quality pickles. And when those are not enough, you can always draw inspiration from the DevOps movement and their trust-or-discard processes.
This talk will show you how widespread pickles are and how easy it is to poison models serialised with pickle, but also how easy it is to start protecting them from attacks.
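The mechanism behind a poisoned pickle can be sketched with the standard library alone. The example below is an illustration, not material from the talk: the class name `Poisoned` is hypothetical, and the payload is a deliberately harmless expression. It relies on the documented `__reduce__` hook, which lets the author of a pickle choose any callable to be invoked at load time.

```python
import pickle
import pickletools


class Poisoned:
    """Hypothetical object whose pickle runs a callable when it is loaded."""

    def __reduce__(self):
        # Whoever creates the pickle picks the callable and its arguments.
        # A real attacker would choose something far less harmless.
        return (eval, ("'attacker-chosen code ran'.upper()",))


payload = pickle.dumps(Poisoned())

# Loading calls eval(...) instead of rebuilding a Poisoned instance.
restored = pickle.loads(payload)

# The opcode stream can be inspected without executing anything:
# REDUCE is the instruction that calls a callable during loading.
opcodes = [opcode.name for opcode, argument, position in pickletools.genops(payload)]
```

Here `restored` is the string produced by the injected expression, not an instance of `Poisoned`, which is exactly why untrusted pickles should never be loaded. Scanning the opcode stream is only a first line of defence, since legitimate pickles also use `REDUCE`.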
Deep Dive into Asynchronous SQLAlchemy - Transactions and Connections
SQLAlchemy is one of the most popular ORM libraries in Python. In this talk I will present the caveats and gotchas that other Pythonistas may encounter while writing an asynchronous backend application using SQLAlchemy as an ORM. We will mainly focus on how SQLAlchemy handles transactions and connections to the database and what issues we may face because of it.
The Python package repository accelerating software development at CERN
Python’s expressive syntax, ease of use, and powerful ecosystem of third-party packages are all major contributing factors to its thriving use for accelerator controls at CERN. Providing access to this rich ecosystem in a protected environment, whilst also allowing developers to augment this with internally developed packages, is a key enabling service. Existing open-source solutions didn’t meet our needs, and the evolving package index standardisation, as well as exposure to dependency confusion attacks, left us searching for a more modular and flexible approach.
In this presentation we will demonstrate the Python package upload, index, and browsing services developed at CERN. We will discuss the gradual transition from our existing repository service (based on Nexus), and demonstrate - with the help of recent packaging PEPs - the flexibility that modularising the services has brought, helping us to meet our needs for local specialisation and enhanced security measures.
f"yeah!" - How we are supercharging f-strings in Python 3.12
Pablo Galindo Salgado, Marta Gomez
Everybody loves f-strings in Python. But what if they could be even better? Thanks to PEP 701, Python 3.12 will ship with an improved version of f-strings that will once and for all fix the little remaining problems that f-strings have had, while also supercharging them with new cool powers. In this talk, you will discover the dark little secrets of how f-strings were being processed before Python 3.12 and the many things that didn't work and you didn't know about. You will learn how we changed thousands of lines of manually written C code without anybody noticing, how we changed the oldest part of CPython so quotes behave like parentheses, and how we taught the PEG parser to understand f-strings. Plus, you'll gain an understanding of how these new and improved capabilities will provide several advantages for both end-users and library developers, while also reducing the maintenance cost of the CPython implementation.
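One of the restrictions lifted by PEP 701 is that a replacement field could not reuse the quote character of the enclosing f-string. The sketch below probes that behaviour in whichever interpreter runs it. The f-string is kept inside a plain string and compiled at runtime, so the file itself still parses on older versions. The variable names and the sample list are illustrative only.

```python
import sys

# Source text of an f-string that reuses double quotes inside the braces.
# Before Python 3.12 this is a SyntaxError; under PEP 701 it is valid.
source = 'f"Playlist: {", ".join(songs)}"'

try:
    code = compile(source, "<pep701-demo>", "eval")
except SyntaxError:
    code = None  # running on an interpreter without PEP 701

songs = ["first", "second"]
result = eval(code, {"songs": songs}) if code is not None else None

supports_quote_reuse = code is not None
```

On Python 3.12 or newer, `result` holds the formatted playlist string. On earlier versions the compile step fails and `result` stays `None`.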
Instrumenting CPython with eBPF
eBPF is an amazing technology that can run sandboxed programs in a privileged context such as the operating system kernel. But are eBPF programs limited to the operating system kernel? eBPF programs have fast access to resources like memory. These programs can access the memory of running Python applications very quickly, allowing you to instrument Python processes with low overhead!
In my presentation, I will show how Python's internal structure supports instrumentation through the use of eBPF. Following that, we'll experiment with eBPF and other modern techniques for instrumenting Python applications. I'll explain why eBPF is a more appropriate and efficient technology for instrumentation. By the end of the session, we will have developed a simple eBPF-based tracing tool for instrumenting Python applications.
After this presentation, you will better understand how eBPF can help you in the instrumentation of Python applications.
Practical tools for documentation at scale
In a hands-on workshop I'll introduce some of the tools and methods I have developed to improve documentation consistently and effectively, at scale - by a thousand people or more, working on a hundred or so software products and other projects.
Give your program Appeal!
This talk presents Appeal, a new library for command-line parsing in Python. Appeal avoids the cumbersome APIs and repetition endemic to the currently prevalent libraries in this space by leveraging Python's own function call interface. This talk will familiarize the audience with Appeal, its motivation, its approach, and its expressive power, and show them how to use Appeal in their own programs.
Pydantic: Making life easier with data validation
Data validation is a critical component of any software application, ensuring that the data processed by the application is accurate and consistent. However, data validation can often be a tedious and error-prone process, especially when dealing with complex data structures. Pydantic, a powerful and flexible data validation library for Python, simplifies the process of data validation by providing a declarative syntax that is easy to read and write.
The Power of Spec-Based Testing: Adding Functional Requirements to Unit Test
Testing is a crucial part of the software development process. But, with so many testing techniques available, it can be challenging to know which one to use. While unit testing is a popular technique, it's not always the most effective or efficient way to ensure software quality. In this talk, we’ll explore spec-based testing, a technique that focuses on verifying that the software behaves in accordance with its specifications or requirements.
The digital State of the European Union
What is the European digital identity? How can you access digital public services from another EU country? Why is it so hard to create a European ecosystem of digital services? Does the EU support open source?
This (opinionated) talk will present the current state of digital services in the EU. It will summarize the normative and technical challenges, and their impact on the resulting platforms in terms of UX, cybersecurity and maintainability.
Adding zero-downtime migrations strategy in a SaaS project
Zero-downtime migration is a technique for running database migrations without stopping the web app. As clients' databases grow larger, applying necessary updates to the database can become time-consuming or potentially break the database schema. This talk will describe problematic operation types and provide a strategy for writing and running migrations to release new software versions without downtime.
Dynamically generated methods with a non-generic signature
In other words, Descriptors + PEP-362 (function signature object) and a seasoning of PEP-487 (simpler customization of class creation via
There are different ways to have generated methods and attributes attached to all classes in a library, and this talk presents the way we’re doing it in scikit-learn. Here you’ll understand the use-case, and see the details and challenges presented by it, and how we approached them.
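The general idea can be sketched with the standard library alone. The example below is not scikit-learn's code: the names `GeneratedSetter`, `Estimator`, `set_fit_options` and the keyword names are hypothetical. It combines a descriptor, the `__set_name__` hook introduced by PEP 487, and a PEP 362 `inspect.Signature` so that a generated method advertises explicit keyword arguments instead of a generic `**kwargs`.

```python
import inspect


class GeneratedSetter:
    """Descriptor that builds a method with an explicit keyword signature."""

    def __init__(self, keys):
        self.keys = tuple(keys)

    def __set_name__(self, owner, name):
        # PEP 487 hook: called once, when the owning class is created.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self

        def method(**kwargs):
            unknown = sorted(set(kwargs) - set(self.keys))
            if unknown:
                raise TypeError(f"{self.name}() got unexpected arguments: {unknown}")
            instance.options.update(kwargs)
            return instance

        # PEP 362: replace the generic (**kwargs) signature with explicit
        # keyword-only parameters so help() and IDEs show the real options.
        method.__name__ = self.name
        method.__signature__ = inspect.Signature(
            [
                inspect.Parameter(key, inspect.Parameter.KEYWORD_ONLY, default=None)
                for key in self.keys
            ]
        )
        return method


class Estimator:
    # The accepted keywords are data, so the method can differ per class.
    set_fit_options = GeneratedSetter(["sample_weight", "groups"])

    def __init__(self):
        self.options = {}


estimator = Estimator()
estimator.set_fit_options(sample_weight=True)
signature_text = str(inspect.signature(estimator.set_fit_options))
```

Because the signature is built from data, each class can expose a differently shaped method without any hand-written boilerplate, while introspection tools still report the specific parameters.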
From idea to production
In this talk we will take you through a complete journey a website takes - from conception to running in production, the right way.
What is the best setup for local development, how to then move to testing and production? An opinionated talk from two veterans.
Teaching Children Python - What Works?
We will explore the latest research on how children gain programming knowledge, how to keep them interested and excited, and how this might inform the way we support adult newcomers to programming. Practical advice and suggestions for activities will be given to attendees.
Pygoat - Learn Django security the hard way
Learn to secure your Django apps by attacking (and then securing) Pygoat, an intentionally vulnerable Python Django application. Explore the OWASP Top 10 vulnerabilities and understand how to mitigate them in Django apps.
GraphQL Subscriptions: Real-time Data with WebSockets* and Strawberry 🍓
Bring your GraphQL APIs to life with real-time data using Strawberry! 🌟 In this talk, we'll dive into GraphQL Subscriptions and explore how to leverage WebSockets for interactive, real-time updates. Say goodbye to constant polling and hello to efficient, seamless communication!
- Understanding GraphQL Subscriptions and their role in real-time data delivery.
- Setting up WebSocket connections and integrating them with your GraphQL server using Strawberry.
- Designing subscription schemas and handling server-side events for seamless updates.
- Enhancing client-side experiences with real-time data and updates.
Lessons from Prague
Our EuroPython takes place in Prague - a city with some lessons for us, about programming, software and technology. More than 100 years ago Prague produced buildings that hint at how far our ideas in software might take us, and writers and artists who imagined challenges that have lately become real.
Language Model Zen
Beautiful is better than ugly.
The frontier of AI Language Models awaits exploration.
We, Pythonistas, face choices on how to use these tools.
Advanced models like GPT-4, BARD, and LLaMa generate human-like responses.
The nature of Language Models is fear,
But tools like TransformerLens show The Way.
Understanding The Model is possible.
The nature of Language Models is excitement.
Using them out of the box is one option.
Prompt engineering is another.
ChatGPT plugins and LangChain offer a third choice.
Fine-tuning them presents a fourth.
Training them from scratch is the fifth option.
Not using them at all is the final option. It may be safer.
The output for one LM is the prompt for another.
While openai is an excellent library, and
LangChain composes language models and utilities.
GPT's plugin system also composes language models and utilities, and
There should be one-- and preferably only one --obvious way to do it.
Fighting Money Laundering with Python and Open Source Software
In this talk, we will discuss how to detect chains of fraudulent transactions and help investigative agencies fight money laundering by providing useful insights with the help of the Python programming language and its packages.
Introducing Incompatible Changes in Python
The Python 2 to Python 3 migration used the D-day approach, which failed. We learnt from our mistake and we are introducing incompatible changes differently now: we document changes, provide a way to write code compatible with both the old and the new behavior, offer tooling to ease the migration, and design a long-term approach to reduce the need for incompatible changes.
We can get more from spatial, GIS and public domain datasets!
- Are prices of short-term rental apartments in your region similar? How similar are they, and at which distance do they tend to be correlated?
- Do you have access to a few air pollution measurements but must provide a smooth map over the whole area?
- Is your machine learning model based on remote sensing data from Earth Observation satellites, and do you want to include data sampled on Earth?
- Do you work with county-level socio-economic factors, but you want to get insights at a finer scale?
If any(answer), then come and see what we can do with the pyinterpolate package designed exactly for spatial interpolation!
We are Python Weekend!
Alena Osipova, Andrej Zaujec, Lukas Kubis
The code.kiwi.com community has been running Python Weekend, an educational community project, since 2016. Over the past 7 years we have helped hundreds of Python developers complete the program and accelerate their careers, traveled to 10+ cities all over Europe and collaborated with numerous local Python communities to make it happen.
At first glance, Python Weekend is a 2.5-day supervised coding event for Junior+ Python devs, where participants build the prototype of core Kiwi.com technology, while getting support from a group of experienced engineers, for free. But there’s so much more to this.
At this poster session, we will share how to run an educational Python project so that the community, business and local tech scene all benefit. We will show how to shape a culture of connecting real business challenges and junior talent. We will see how dev edu projects can impact the culture of mentorship, and present feedback from mentees all over the world about their experience.
Zero downtime deployments: Is it worth the effort?
Learn about the advantages and disadvantages of zero downtime deployment strategy, as well as best practices for implementing it in your organization. Learn how to make changes to production systems while keeping users up to date. Don't pass up this chance to optimize your software deployments.
How well do we understand our Universe? Let’s Python it out!
As our understanding of the Universe is expanding, the desire to model the physics that govern cosmic evolution is more evident than ever, driving the emergence of cosmological simulations that model the Universe from the beginning of time until the present day. In combination with Machine Learning, they allow for an unprecedented capability; one can train AI models on simulations, where the evolution history of galaxies is available, that can in turn be applied to real galaxies. In this work, we propose the use of Python as an ML tool, through the popular library TensorFlow, to quantify the impact of different cosmological models on the derivation of the history of galaxies. Python accompanies us at every step of the way, from creating the datasets and training the probabilistic neural networks to the visualization of the results, as we attempt to shed light on the cosmic past of galaxies, surpassing the unshakeable reality that we can only observe them at a specific moment in time.
What a screen reader can teach you about remote Python debugging
The NVDA screen reader is a Python application embedded in a C++ program for performance reasons. Its features can be extended through addons that are also written in Python, so a way to debug both the core and the addons code is highly desirable.
However, debugging code that runs within an embedded Python is a bit tricky, and the task gets even more complicated if you are a blind programmer and hitting a breakpoint freezes the screen reader that you are using to access the computer!
In this talk I will teach you how I found a solution to this challenge by using Microsoft's debugpy library for remote debugging, a technique that could be useful to debug other C++ apps that use embedded Python. I will explore how remote debugging works, how to set up a debugging environment using VSCode and how to perform a debugging session attaching to the embedded Python as a remote process.