
nf-core radseq design #1

@remiolsen

This builds on https://github.com/remiolsen/NGI-RADseqQC

This is a major rewrite to bring the pipeline in line with nf-core conventions and to update the tools used (move to Stacks 2.0, drop read joining, etc.). It also covers a few usability improvements that have been on my wishlist.

Main tasks

Core pipeline tasks

  • Remove FLASH
  • Make a Dockerfile. Is Stacks 2.0 on Bioconda?
  • Write a Python script to parse the de novo Stacks output to get: coverage, raw number of sample loci, catalog loci per sample, and a "shared" loci histogram. Parse the process_radtags output as well?
  • Make a MultiQC configuration to import this data
  • Get publicly available data from ENA and make proper test data from it.
  • Make a MultiQC module for Stacks >= 2.0
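As a rough idea of what the "shared" loci histogram could look like, here is a minimal sketch. It assumes the Stacks output has already been parsed into a hypothetical in-memory mapping of sample name to the set of catalog locus IDs found in that sample; the actual Stacks file parsing is not shown, and `shared_loci_histogram` is an illustrative name, not an existing function.

```python
from collections import Counter
from typing import Dict, Set


def shared_loci_histogram(loci_per_sample: Dict[str, Set[int]]) -> Dict[int, int]:
    """Map k -> number of catalog loci found in exactly k samples.

    `loci_per_sample` is a hypothetical pre-parsed structure:
    sample name -> set of catalog locus IDs present in that sample.
    """
    # Count in how many samples each catalog locus ID occurs
    samples_per_locus = Counter()
    for loci in loci_per_sample.values():
        samples_per_locus.update(loci)
    # Turn "locus -> #samples" into "#samples -> #loci", sorted by #samples
    return dict(sorted(Counter(samples_per_locus.values()).items()))


if __name__ == "__main__":
    example = {
        "sample_A": {1, 2, 3, 4},
        "sample_B": {1, 2, 3},
        "sample_C": {1, 5},
    }
    # Catalog loci per sample is just the size of each set
    print({name: len(loci) for name, loci in example.items()})
    print(shared_loci_histogram(example))  # {1: 2, 2: 2, 3: 1}
```

A plain dict of integers like this would be straightforward to dump as a table for a MultiQC custom-content section, although the exact MultiQC configuration is still to be decided.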

Polish

  • Make a GH release
  • Documentation, documentation, documentation
  • Travis-CI
  • Python 3 support for the in silico digest helper script
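For reference while porting, a minimal Python 3 sketch of an in silico digest is given below. It is not the existing helper script, whose interface is not described here; `digest` is a hypothetical function that cuts a sequence at every occurrence of a recognition site, given the top-strand cut offset within the site (EcoRI, GAATTC cut after the first base, is used only as an example).

```python
import re
from typing import List


def digest(sequence: str, site: str, cut_offset: int) -> List[str]:
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the top-strand cut position within the site,
    e.g. 1 for EcoRI (G^AATTC). Empty fragments are dropped.
    """
    seq = sequence.upper()
    # Zero-width lookahead so that overlapping occurrences are also found
    pattern = f"(?={re.escape(site.upper())})"
    cut_positions = [m.start() + cut_offset for m in re.finditer(pattern, seq)]

    fragments = []
    prev = 0
    for pos in cut_positions:
        fragments.append(seq[prev:pos])
        prev = pos
    fragments.append(seq[prev:])
    return [frag for frag in fragments if frag]


if __name__ == "__main__":
    frags = digest("AAAGAATTCTTTGAATTCCC", "GAATTC", 1)
    print(frags)  # ['AAAG', 'AATTCTTTG', 'AATTCCC']
```

Running this over a reference genome (e.g. one FASTA record at a time) would give a rough count of cut sites per enzyme, which is presumably the kind of estimate the helper is meant to provide.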

Others -- Stretch goals

  • Think about what output files stacks should be creating by default.
  • Let the user specify which output files to create -- Nah the defaults are probably fine -- Nuh-uh we need more!
    • genepop
    • structure
  • Scripts for running the Stacks web UI -- it has been removed in Stacks >= 2.0
  • Pick a set of “best practice” parameters for Stacks and run all of these.
  • Clearly report the r80 statistic for each run, i.e. the number of polymorphic loci shared by at least 80% of individuals in the population -- http://doi.org/10.1111/2041-210X.12775
  • Support running Stacks with a reference genome
  • Support for a premade population map file
  • Support for already processed reads (skipping trimming and process_reads)
  • Option to not output trimmed and/or processed fastq files
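The r80 statistic as described above can be sketched as follows. This assumes hypothetical pre-parsed inputs (a mapping of locus ID to the set of samples it was found in, plus the set of loci flagged as polymorphic); how those are extracted from the Stacks output is left open. The threshold comparison uses integer arithmetic to avoid floating-point rounding at the 80% boundary.

```python
from typing import Dict, Set


def r80(
    locus_samples: Dict[int, Set[str]],
    polymorphic_loci: Set[int],
    n_samples: int,
    min_pct: int = 80,
) -> int:
    """Count polymorphic loci present in at least `min_pct` % of `n_samples`.

    Inputs are hypothetical pre-parsed structures:
    locus ID -> set of sample names, and the set of polymorphic locus IDs.
    """
    return sum(
        1
        for locus, samples in locus_samples.items()
        # present * 100 >= pct * total  <=>  present / total >= pct / 100
        if locus in polymorphic_loci and len(samples) * 100 >= min_pct * n_samples
    )


if __name__ == "__main__":
    samples = ["s1", "s2", "s3", "s4", "s5"]
    loci = {
        1: set(samples),       # 5/5 samples, polymorphic -> counted
        2: set(samples[:4]),   # 4/5 = 80%, polymorphic -> counted
        3: set(samples[:3]),   # 3/5 = 60% -> not counted
        4: set(samples),       # monomorphic -> not counted
    }
    print(r80(loci, polymorphic_loci={1, 2, 3}, n_samples=len(samples)))  # 2
```

Keeping `min_pct` as a parameter would also make it easy to report related values (e.g. r50) alongside r80 if that turns out to be useful.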
