From Jupyter Notebooks to a Python Package: The Best of Both Worlds

Level:: beginner
Room:: terrace 2a
Start:: 12:10 on 19 July 2023
Duration:: 45 minutes

Abstract

A Jupyter notebook is handy for rapid REPL (read-eval-print loop) style tasks such as exploratory data analysis and other data science work. However, as a notebook grows larger and its code becomes more complicated, the lack of proper software engineering support starts to hurt. The Jupyter notebook lacks several important features, including code sharing, refactoring support, version control and advanced editing. Fortunately, traditional full-fledged IDEs such as VS Code and PyCharm are readily available and support these missing features very well. So why not take advantage of the best of both worlds?

In this beginner-level hands-on talk, I will demonstrate how to transform Jupyter notebook workflows into a proper Python package using VS Code. I will also introduce several basic but essential refactoring recommendations. With these, you can reuse the package across several notebooks and even share it with your colleagues and friends.

Talk, PyData: Software Packages & Jupyter (2023)

Description

Notebooks, code and slides are available in this repo.

Introduction

Jupyter Notebook
- Provides ideal workflows for data science
- Pros: REPL, interactivity, integration of code / output / documentation, visualization, rapid prototyping, result sharing, etc.
- Cons: lacks debugging, code sharing, refactoring, version control, advanced editing, etc.
Full-fledged IDEs
- Designed to maximize programmer productivity
- But a single iteration (edit, run, inspect) can take a long time
We can benefit from the best of both worlds by using a Python package

Jupyter Notebook Data Science Workflow

Data Loading
Preprocessing
Exploratory Data Analysis (EDA)
Prediction
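
As a rough illustration of the four stages above, the sketch below writes each stage as a small function instead of a loose notebook cell, which is the kind of refactoring that makes the code easy to move into a package later. It is not taken from the talk materials: the inline data, the column names `hours` and `score`, and all function names are hypothetical, and it uses only the standard library where a real notebook would more likely use pandas and scikit-learn.

```python
import csv
import io
import statistics

# Hypothetical inline data standing in for a real CSV file.
RAW_CSV = "hours,score\n1,52\n2,55\n3,61\n,70\n5,68\n"


def load_data(text):
    """Data loading: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def preprocess(rows):
    """Preprocessing: drop rows with missing values and convert to floats."""
    pairs = []
    for row in rows:
        if row["hours"] and row["score"]:
            pairs.append((float(row["hours"]), float(row["score"])))
    return pairs


def explore(pairs):
    """EDA: compute a few summary statistics."""
    return {
        "n": len(pairs),
        "mean_hours": statistics.mean(h for h, _ in pairs),
        "mean_score": statistics.mean(s for _, s in pairs),
    }


def fit_line(pairs):
    """Prediction, step 1: fit a least-squares line score = slope * hours + intercept."""
    mean_h = statistics.mean(h for h, _ in pairs)
    mean_s = statistics.mean(s for _, s in pairs)
    covariance = sum((h - mean_h) * (s - mean_s) for h, s in pairs)
    variance = sum((h - mean_h) ** 2 for h, _ in pairs)
    slope = covariance / variance
    return slope, mean_s - slope * mean_h


def predict(model, hours):
    """Prediction, step 2: apply the fitted line to a new input."""
    slope, intercept = model
    return slope * hours + intercept


pairs = preprocess(load_data(RAW_CSV))
summary = explore(pairs)
model = fit_line(pairs)
print(summary)
print(predict(model, 4.0))
```

Because each stage takes its input as an argument and returns its result, the functions do not depend on notebook cell execution order and can be copied into a module unchanged.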

To (Your Own) Python Package

What is a package and why do we want to use it?
How to create a (minimal) package
How to import and use
Live refactoring examples
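
The points above can be sketched in a few lines. A minimal package is a directory containing an `__init__.py` file plus one or more modules. The script below creates such a directory in a temporary location and imports it. The package name `mytools`, the module `cleaning` and the function `drop_missing` are hypothetical and not from the talk. Putting the directory on `sys.path` is only a shortcut for this demonstration: a real project would more typically add a `pyproject.toml` and install the package in editable mode with `pip install -e .`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical project layout, created in a temporary directory:
#   <project_dir>/mytools/__init__.py
#   <project_dir>/mytools/cleaning.py
project_dir = Path(tempfile.mkdtemp())
package_dir = project_dir / "mytools"
package_dir.mkdir()

# The __init__.py file marks the directory as a package and
# re-exports the function so users can call mytools.drop_missing.
(package_dir / "__init__.py").write_text(
    "from .cleaning import drop_missing\n"
)

# A module holding code that would otherwise live in a notebook cell.
(package_dir / "cleaning.py").write_text(
    "def drop_missing(values):\n"
    '    """Return the values with None entries removed."""\n'
    "    return [v for v in values if v is not None]\n"
)

# Make the project directory importable for this session only.
sys.path.insert(0, str(project_dir))
importlib.invalidate_caches()

import mytools

cleaned = mytools.drop_missing([1, None, 3])
print(cleaned)
```

Once the package is importable, any number of notebooks can use the same `import mytools` line, so a fix made in `cleaning.py` reaches all of them.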

Wrap up / Some tips

Publish your awesome package
PyScaffold
VS Code

The speaker

Sin-seok SEO

I work at Safran as a research engineer. My main responsibility there is analyzing data obtained from airplanes and helicopters using various statistical models and machine-learning algorithms. Previously, I worked at Samsung Electronics in South Korea for 3 years as a senior engineer. At Samsung, I developed various computer networking algorithms and software for smartphones and IoT devices to improve the user experience. Before joining Samsung, I completed my Ph.D. at Pohang University of Science and Technology (POSTECH) in South Korea. The topic of my thesis was "Traffic Engineering in Data Center Networks using Software Defined Networking."
