Robust and Reproducible Data Science Code

Welcome to the Robust and Reproducible Data Science Code repository! This project is designed as part of a class on writing clean, maintainable, and reproducible data science code. The repository demonstrates best practices for structuring data science projects, implementing reusable components, and ensuring reproducibility.

Overview

The goal of this project is to teach foundational concepts in writing robust data science code. The repository includes examples and tools to help you learn and apply these concepts in your own projects.

Key topics covered include:

- Writing clean and maintainable code using object-oriented programming (OOP).
- Designing reusable components with abstract base classes (ABC).
- Ensuring type safety with tools like beartype.
- Using configuration-driven workflows with Hydra.
- Setting up reproducible environments with hatch.
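
To make the first two topics concrete, here is a minimal sketch of an abstract base class with one concrete model. The names `BaseModel` and `MeanModel` and their methods are hypothetical illustrations, not the classes shipped in this repository.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseModel(ABC):
    """Hypothetical interface that every model must implement."""

    @abstractmethod
    def fit(self, x: List[float], y: List[float]) -> "BaseModel":
        """Learn parameters from training data and return self."""

    @abstractmethod
    def predict(self, x: List[float]) -> List[float]:
        """Return one prediction per input value."""


class MeanModel(BaseModel):
    """Toy model that always predicts the mean of the training targets."""

    def __init__(self) -> None:
        self.mean_ = 0.0

    def fit(self, x: List[float], y: List[float]) -> "MeanModel":
        # The inputs are ignored; only the targets determine the prediction.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, x: List[float]) -> List[float]:
        return [self.mean_ for _ in x]


model = MeanModel().fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
predictions = model.predict([10.0, 20.0])
print(predictions)
```

Instantiating `BaseModel` directly raises a `TypeError`, which is what forces every subclass to provide `fit` and `predict`. A decorator such as `@beartype` from the beartype package could additionally be applied to these methods to check the annotated types at call time.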

Repository Structure

.
├── src/
│   ├── postgrad_class/
│   │   ├── conf/               # Configuration files for Hydra
│   │   ├── model/              # Model implementations (e.g., LinearModel, SimpleNNModel)
│   │   ├── notebooks/          # Jupyter notebooks for examples and exercises
│   │   ├── __about__.py        # Project metadata
│   │   ├── __init__.py         # Package initialization
│   │   └── main.py             # Entry point for running models
├── tests/                      # Unit tests for the project
├── pyproject.toml              # Project configuration for Hatch
├── LICENSE                     # License information
├── README.md                   # Project documentation
└── .gitignore                  # Git ignore rules
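
The split between conf/, model/, and main.py suggests a config-driven flow: a configuration names the model and its parameters, and the entry point builds it. The sketch below imitates that idea with a plain dictionary and the standard library only. It does not use Hydra's API, and `MODEL_REGISTRY`, `ConstantModel`, `ScaledModel`, and `build_model` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ConstantModel:
    """Hypothetical model that always predicts the same value."""

    value: float = 0.0

    def predict(self, x: float) -> float:
        return self.value


@dataclass
class ScaledModel:
    """Hypothetical model that multiplies its input by a fixed factor."""

    factor: float = 1.0

    def predict(self, x: float) -> float:
        return self.factor * x


# Maps the name used in the config to the class that implements it.
MODEL_REGISTRY: Dict[str, Callable[..., Any]] = {
    "constant": ConstantModel,
    "scaled": ScaledModel,
}


def build_model(config: Dict[str, Any]) -> Any:
    """Instantiate the model named in the config with its parameters."""
    name = config["model"]["name"]
    params = config["model"].get("params", {})
    if name not in MODEL_REGISTRY:
        raise KeyError(f"Unknown model: {name!r}")
    return MODEL_REGISTRY[name](**params)


# In a real project this dictionary would be loaded from a configuration file.
config = {"model": {"name": "scaled", "params": {"factor": 2.0}}}
model = build_model(config)
result = model.predict(3.0)
print(result)
```

With this layout, switching models means editing the configuration rather than the code that runs the experiment.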

Getting Started

Prerequisites

To run this project, you need:

- Python 3.8 or later
- hatch for managing the project environment and dependencies (see the installation instructions for Hatch).
  - If you're used to conda, an easy way to install hatch is to create a separate conda environment with your desired Python version and install hatch there. You can then run the hatch commands below inside this environment.
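
A quick way to confirm both prerequisites before continuing is a small check script. This is a sketch using only the standard library; `python_is_supported` and `hatch_is_installed` are hypothetical helper names, and the check assumes hatch is exposed as an executable called `hatch` on your PATH.

```python
import shutil
import sys
from typing import Tuple

MIN_PYTHON: Tuple[int, int] = (3, 8)  # minimum version stated in the prerequisites


def python_is_supported(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if the given (major, minor) version meets the minimum."""
    return tuple(version) >= MIN_PYTHON


def hatch_is_installed() -> bool:
    """Return True if a `hatch` executable is found on the PATH."""
    return shutil.which("hatch") is not None


current = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Python {current} supported: {python_is_supported()}")
print(f"hatch found on PATH: {hatch_is_installed()}")
```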

Installation

Clone the repository:

git clone https://github.com/BojeDeforce/postgrad-class.git
cd postgrad-class

Set up the environment using hatch:
```
hatch env create
hatch shell
```

Examples

The src/postgrad_class/notebooks/example.py notebook (in py:percent format) introduces key concepts and provides hands-on examples. Open it in Jupyter Notebook.
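
For readers unfamiliar with the py:percent format: it is a plain Python script in which `# %%` comments mark cell boundaries, so the same file can run as a script or be opened as a notebook by tools that understand the format (for example Jupytext). The snippet below is an illustrative sketch, not the content of example.py, and `describe` is a hypothetical helper.

```python
# %% [markdown]
# # A tiny example
# Markdown cells are written as comments under a `# %% [markdown]` marker.

# %%
from statistics import mean, stdev
from typing import Dict, List


def describe(values: List[float]) -> Dict[str, float]:
    """Return the mean and sample standard deviation of the values."""
    return {"mean": mean(values), "stdev": stdev(values)}


# %%
# Each `# %%` line starts a new code cell.
summary = describe([1.0, 2.0, 3.0, 4.0])
print(summary)
```

Because the cell markers are ordinary comments, the file diffs cleanly in version control, which is one reason the format suits reproducible projects.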

Running Jupyter Notebooks on a Notebook Server with Hatch

Jupyter notebooks run on a notebook server, which is a Python process that provides a web interface to interactively write, run, and visualize code. This server handles kernel management, file I/O, and communication between the front-end (UI) and the back-end (kernel) over HTTP. The actual computation happens in a kernel — typically a Python interpreter — that executes code sent from the notebook interface.

Launching a Jupyter Notebook Server with Hatch

You can launch a notebook server within your active Hatch environment (see the hatch commands under Installation above) by running:

jupyter notebook --no-browser

This will start the Jupyter server and print a URL with a token, for example:

http://127.0.0.1:8888/?token=your-token-here
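
The printed URL bundles three pieces of information a client needs to reach the server: the host, the port, and the access token. As a small illustration, the sketch below splits the example URL above into those parts with the standard library; `parse_server_url` is a hypothetical helper, not part of Jupyter.

```python
from typing import Dict, Optional, Union
from urllib.parse import parse_qs, urlsplit


def parse_server_url(url: str) -> Dict[str, Optional[Union[str, int]]]:
    """Extract host, port, and token from a Jupyter server URL."""
    parts = urlsplit(url)
    # parse_qs returns a list of values per query key; take the first token.
    token_values = parse_qs(parts.query).get("token", [])
    return {
        "host": parts.hostname,
        "port": parts.port,
        "token": token_values[0] if token_values else None,
    }


info = parse_server_url("http://127.0.0.1:8888/?token=your-token-here")
print(info)
```

The token acts as a password for the server, so treat the full URL as a secret and avoid pasting it into shared documents or logs.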

You now have two main options:

- In your browser: paste the provided URL into your browser to access the classic Jupyter notebook interface.
- In VSCode or another IDE:
  - Open the Command Palette and select "Jupyter: Specify local or remote Jupyter server".
  - Paste the URL with the token.
  - Now, when you open a .ipynb file, VSCode will connect to the running kernel on that server.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Author

Created by Boje Deforce. For questions or feedback, feel free to reach out via GitHub.
