From Jupyter Notebooks to a Python Package: The Best of Both Worlds
- Level: beginner
- Room: terrace 2a
- Start:
- Duration: 45 minutes
Abstract
A Jupyter notebook is handy for rapid REPL (Read-Eval-Print Loop) style tasks such as exploratory data analysis and data science. However, as a notebook grows larger and its code more complicated, the lack of proper software engineering support becomes noticeable. This is because Jupyter notebooks lack several important features, including code sharing, refactoring support, version control and advanced editing. Fortunately, traditional full-fledged IDEs such as VS Code and PyCharm are readily available, and they support these missing features very well. So why don’t we take advantage of the best of both worlds?
In this beginner-level hands-on talk, I will demonstrate how to transform Jupyter notebook workflows into a proper Python package using VS Code. I will also introduce several basic but essential refactoring recommendations. By doing so, you can reuse the package across several notebooks and even share it with your colleagues and friends.
Description
Notebooks, code and slides are available in this repo.
Introduction
- Jupyter Notebook
- Provides ideal workflows for data science
- Pros: REPL, interactivity, integration of code / output / documentation, visualization, rapid prototyping, result sharing, etc.
- Cons: lacks debugging, code sharing, refactoring, version control, advanced editing, etc.
- Full-fledged IDEs
- Designed to maximize programmer productivity
- A single iteration can take a long time
- We can benefit from the best of both worlds by using a Python package
Jupyter Notebook Data Science Workflow
- Data Loading
- Preprocessing
- Exploratory Data Analysis (EDA)
- Prediction
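As a rough illustration of these four steps, here is a minimal sketch that uses only the standard library. The dataset, the function names (`load_data`, `preprocess`, `explore`, `fit_line`) and the choice of a least-squares line as the "prediction" step are all assumptions made for this example; they are not taken from the talk's notebooks.

```python
import csv
import io
from statistics import mean

# Hypothetical toy dataset standing in for a real CSV file.
RAW_CSV = """x,y
1,2
2,4
3,
4,8
"""


def load_data(text):
    """Data loading: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def preprocess(rows):
    """Preprocessing: drop rows with missing values and convert to floats."""
    return [
        (float(row["x"]), float(row["y"]))
        for row in rows
        if row["x"] and row["y"]
    ]


def explore(points):
    """EDA: compute a few simple summary statistics."""
    xs, ys = zip(*points)
    return {"n": len(points), "mean_x": mean(xs), "mean_y": mean(ys)}


def fit_line(points):
    """Prediction: least-squares fit of y = slope * x + intercept."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in points) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


rows = load_data(RAW_CSV)
points = preprocess(rows)
summary = explore(points)
slope, intercept = fit_line(points)
prediction = slope * 5 + intercept  # predicted y for a new x = 5
print(summary, slope, intercept, prediction)
```

In a notebook, each of these functions would typically start life as one or more cells. Writing each step as a small function is one way to make it easier to move the code elsewhere later.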
To (Your Own) Python Package
- What is a package and why do we want to use it?
- How to create a (minimal) package
- How to import and use
- Live refactoring examples
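The outline above can be sketched end to end in a few lines. The example below builds a minimal package in a temporary directory and then imports it, roughly as a notebook would. The package name `mypkg`, the module `preprocessing.py` and the function `drop_missing` are hypothetical names chosen for illustration, not the ones used in the talk.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a minimal package layout in a temporary directory
# (all names here are hypothetical):
#   mypkg/
#       __init__.py
#       preprocessing.py
root = Path(tempfile.mkdtemp())
pkg_dir = root / "mypkg"
pkg_dir.mkdir()

# __init__.py marks the directory as a package and re-exports the function.
(pkg_dir / "__init__.py").write_text(
    "from .preprocessing import drop_missing\n"
)

# A function that might have been refactored out of a notebook cell.
(pkg_dir / "preprocessing.py").write_text(
    "def drop_missing(rows):\n"
    '    """Return only the rows that contain no None values."""\n'
    "    return [row for row in rows if None not in row]\n"
)

# Make the parent directory importable for this session only.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import mypkg  # noqa: E402  (import placed after the path setup on purpose)

cleaned = mypkg.drop_missing([(1, 2), (3, None), (4, 8)])
print(cleaned)
```

Editing `sys.path` keeps this sketch self-contained. In a real project, a more common approach is to add packaging metadata and install the package in editable mode with `pip install -e .`, so that every notebook in the environment can import it.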
Wrap up / Some tips
- Publish your awesome package
- PyScaffold
- VS Code