NYSCF Automated Deep Phenotyping Dataset (ADPD)

The New York Stem Cell Foundation Research Institute (NYSCF), in collaboration with Google Accelerated Science, has built a high-throughput, high-content Cell Painting–based phenotyping platform that combines advanced and scalable cell culture automation with cutting-edge deep learning algorithms. Great care was taken to minimize experimental confounds throughout the study design, including the choice and source of cell lines and the automation of cell handling and assaying (cell expansion, freeze-down, thawing, seeding and Cell Painting). An important goal of this work was to achieve sufficient reproducibility that analysis and cross-validation could be performed across plates, plate layouts and, most importantly, batches, to create a technical foundation for large-scale, population-based phenotypic profiling and drug screening.
We chose to test our platform on primary fibroblasts from subjects with Parkinson’s disease (PD) and demographically matched healthy controls. Using deep learning in parallel with automated Cell Painting analysis, we were able to confidently separate PD (both sporadic and LRRK2) from healthy controls, with a ROC AUC of 0.79 (0.08 standard deviation (SD)), demonstrating the potential of this platform for unbiased PD disease modeling and drug discovery. Furthermore, our platform was able to identify an individual cell line within a cohort of 96 lines with 91% mean accuracy (6% SD), across batches and plate layouts, demonstrating the robustness of our screening platform and revealing the presence of surprisingly strong individual signatures. Full details of our platform and study are summarized in: [article link]. As part of the publication, the entire dataset of raw and processed images has been made available to the scientific community, along with example code for reproducing our findings and a near real-time image analysis Fiji macro. These resources are detailed below. To our knowledge, this is the largest publicly available Cell Painting–based high-content imaging dataset in the world. The released code covers the key findings of our paper, including the PD vs. healthy classification and the cell-line classification, and includes sample code for generating deep embeddings from microscopy images as well as the CellProfiler pipeline.
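For readers unfamiliar with how a figure such as "ROC AUC 0.79 (0.08 SD)" is typically summarized, the following is a minimal sketch of computing a per-fold ROC AUC and reporting its mean and standard deviation across cross-validation folds. It is not the code released with the dataset: the function names `roc_auc` and `summarize_folds` are hypothetical, the labels and scores are toy values rather than results from the study, and the use of the sample standard deviation is an assumption, since the text above does not say which estimator was used.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """Rank-based ROC AUC: the fraction of (positive, negative) pairs in which
    the positive example receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_folds(folds):
    """Return (mean AUC, sample SD of AUC) over (labels, scores) pairs,
    one pair per held-out fold. Sample SD is an assumption of this sketch."""
    aucs = [roc_auc(labels, scores) for labels, scores in folds]
    return mean(aucs), stdev(aucs)


# Toy example with made-up values: 1 = PD, 0 = healthy control.
toy_folds = [
    ([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]),
    ([0, 1], [0.20, 0.90]),
]
mean_auc, sd_auc = summarize_folds(toy_folds)
print(f"ROC AUC {mean_auc:.2f} ({sd_auc:.2f} SD)")
```

In practice each fold would hold out data the classifier has not seen (for example, a whole batch or a set of cell lines), so the spread across folds reflects how well the model generalizes across those groupings.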
The dataset is available through our website: https://nyscf.org/open-source/nyscf-adpd/
The ADPD Dataset © 2021 by NYSCF is licensed under CC BY-NC-SA 4.0. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/