jhlee0637/OpenLearn-BioAI

OpenLearn-BioAI: Reproducible Bioinformatics & AI Pipeline Collection

License: MIT Python Nextflow Machine Learning

Project Overview

Bioinformatics is OPEN SCIENCE.

The internet is filled with bioinformatics resources - tools, pipelines, and tutorials for studying, practicing, and applying computational biology. But here's the problem: many of these valuable resources share a critical flaw - the data they rely on is missing.

Why are datasets often unavailable?

  1. The data is too big - Multi-terabyte genomic datasets exceed typical storage capacity
  2. The data is too private - Clinical and other sensitive biological information cannot be released
  3. The owners just don't want to share it - Proprietary or competitive restrictions

So here, I'm gathering and re-creating hands-on bioinformatics exercises built on public datasets that you can actually download, run, and analyze. This project bridges the gap between theoretical knowledge and practical implementation, giving you real-world bioinformatics experience.

| Project Name | Data Size | Technology Stack | Status | Documentation |
| Fetal Health Multiple Classification | 223 KB | Python, TensorFlow/PyTorch | ✅ Complete | README |
| BrainOmics2024 | Variable | Python, scikit-learn | 🚧 Active | README |
| Nextflow RNA-seq STAR Feature Count | Variable | Nextflow, STAR, featureCounts | ✅ Complete | README |

Goal

  • Actually runnable: Every workflow includes complete, accessible data
  • End-to-end implementation: From raw data to final analysis
  • Production-ready code: Professional-grade implementations
  • Educational focus: Designed to enhance learning and skill development
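
As an illustration of the "actually runnable" goal, here is a minimal sketch of how a workflow could confirm that its dataset is present and intact before running. This is not code from the repository: the helper names `sha256_of` and `check_dataset` are hypothetical, and the expected checksum is assumed to be published alongside the dataset.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_dataset(path, expected_sha256):
    """Return (ok, message) describing whether the dataset file is usable.

    Hypothetical helper: `expected_sha256` is assumed to come from the
    dataset's own documentation or download page.
    """
    path = Path(path)
    if not path.is_file():
        return False, f"missing: {path}"
    if sha256_of(path) != expected_sha256:
        return False, f"checksum mismatch: {path}"
    return True, f"ok: {path}"
```

A pipeline entry point could call `check_dataset` on each input file and stop with the returned message when the check fails, so that a missing or corrupted download is reported before any analysis starts.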

About

Bioinformatics and AI practice with "public" datasets
