cubar

Comprehensive Codon Usage Bias Analysis in R

Overview

Codon usage bias is the non-uniform use of synonymous codons (codons that encode the same amino acid), which varies across organisms, genes, and functional categories. cubar is an R package for analyzing codon usage bias in coding sequences. It provides a unified framework for calculating established codon usage metrics, running sliding-window and differential usage analyses, and optimizing sequences for heterologous expression.

Features

🧬 Codon-Level Analysis

RSCU calculation: Relative synonymous codon usage analysis
Amino acid usage: Frequency of each amino acid in sequences
Codon weights: Calculate weights based on gene expression, tRNA availability, and mRNA stability
Optimal codon inference: Machine learning-based identification of optimal codons
Codon-anticodon visualization: Visualization of codon-tRNA pairing relationships
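
To make the RSCU idea concrete, here is a minimal base-R sketch. It does not use cubar and is not how cubar computes RSCU. The codon counts and the two-family amino acid mapping are hypothetical toy values chosen for illustration.

```r
# Hypothetical codon counts for two synonymous families (toy data)
counts <- c(AAA = 30, AAG = 10,                    # lysine: 2 synonymous codons
            GCT = 20, GCC = 10, GCA = 6, GCG = 4)  # alanine: 4 synonymous codons

# Amino acid encoded by each codon (only the two families above)
aa <- c(AAA = "Lys", AAG = "Lys",
        GCT = "Ala", GCC = "Ala", GCA = "Ala", GCG = "Ala")

# RSCU = observed count / mean count within the synonymous family.
# A value above 1 means the codon is used more often than expected
# under uniform synonymous usage; below 1 means less often.
rscu <- counts / ave(counts, aa, FUN = mean)
print(round(rscu, 2))
```

Within each family the RSCU values sum to the number of synonymous codons, which is a quick sanity check on any RSCU table.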

📊 Gene-Level Metrics

Codon frequency tabulation: Count codon occurrences across sequences
CAI (Codon Adaptation Index): Measure similarity to highly expressed genes
ENC (Effective Number of Codons): Assess codon usage bias strength
Fop (Fraction of Optimal codons): Calculate proportion of optimal codons
tAI (tRNA Adaptation Index): Match codon usage to tRNA availability
CSCg (Codon Stabilization Coefficients): Quantify mRNA stability effects
Dp (Deviation from Proportionality): Analyze virus-host codon usage relationships
GC content metrics: Overall GC, GC3s (3rd codon positions), GC4d (4-fold degenerate sites)
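
As an illustration of the GC content metrics listed above, the following base-R sketch computes overall GC and a GC3s-style value for one toy sequence. It assumes the standard genetic code and treats ATG, TGG, and the three stop codons as the codons excluded from GC3s. The sequence is hypothetical, and cubar's own functions may handle edge cases differently.

```r
# Hypothetical toy CDS whose length is a multiple of 3
cds <- "ATGGCTAAAGCCTGGGCGAAGTAA"

# Split the sequence into codons
starts <- seq(1, nchar(cds), by = 3)
codons <- substring(cds, starts, starts + 2)

# Overall GC: fraction of G or C over all nucleotides
bases <- strsplit(cds, "")[[1]]
gc <- mean(bases %in% c("G", "C"))

# GC3s: G or C at third positions, restricted to codons that have
# synonymous alternatives (drop Met, Trp, and stop codons)
excluded <- c("ATG", "TGG", "TAA", "TAG", "TGA")
third <- substr(codons[!(codons %in% excluded)], 3, 3)
gc3s <- mean(third %in% c("G", "C"))

print(c(GC = gc, GC3s = gc3s))
```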

🛠️ Utilities & Tools

Sliding window analysis: Positional codon usage patterns within genes
Sequence optimization: Redesign sequences for optimal expression
Differential codon usage: Statistical comparison between sequence sets
Quality control: Comprehensive CDS validation and preprocessing
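
The sliding window idea can be sketched in base R as follows. This is a conceptual illustration only, not cubar's sliding-window interface. The codon vector, `window_size`, and `step` are hypothetical values, and the per-window statistic is simply the GC fraction at third codon positions.

```r
# Hypothetical gene already split into codons
codons <- c("ATG", "GCT", "AAA", "GCC", "TGG",
            "GCG", "AAG", "GAC", "TTT", "TAA")

window_size <- 4  # codons per window (assumed)
step <- 2         # codons to advance between windows (assumed)

# Start index of every full window along the gene
starts <- seq(1, length(codons) - window_size + 1, by = step)

# Compute the statistic inside each window
gc3_window <- vapply(starts, function(s) {
  w <- codons[s:(s + window_size - 1)]
  mean(substr(w, 3, 3) %in% c("G", "C"))
}, numeric(1))

print(data.frame(start = starts,
                 end = starts + window_size - 1,
                 gc3 = gc3_window))
```

Any gene-level metric could replace the GC3 calculation inside the window function; the windowing logic stays the same.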

Why Choose cubar?

🚀 High Performance: Process large datasets (>100,000 sequences) efficiently using optimized Biostrings and data.table backends
🧬 Flexible Genetic Codes: Support for all NCBI genetic codes plus custom genetic code tables
🔗 R Ecosystem Integration: Seamlessly integrate with other bioinformatics and data analysis packages
📚 Comprehensive Documentation: Extensive tutorials, examples, and theoretical background
🔬 Research Ready: Implements established metrics with proper citations and validation

Installation

Stable Release (Recommended)

Install the latest stable version from CRAN:

install.packages("cubar")

Development Version

Install the latest development version from GitHub:

# Install devtools if not already installed
if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
}

# Install cubar from GitHub
devtools::install_github("mt1022/cubar", dependencies = TRUE)

Dependencies

System Requirements:

R (≥ 4.1.0)

Required Packages:

Biostrings (≥ 2.60.0) - Bioconductor package for sequence manipulation
IRanges (≥ 2.34.0) - Bioconductor infrastructure for range operations
data.table (≥ 1.14.0) - High-performance data manipulation
ggplot2 (≥ 3.3.5) - Data visualization
rlang (≥ 0.4.11) - Language tools

Note: Bioconductor packages will be installed automatically, but you may need to update your R installation if you encounter compatibility issues.
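
If installation reports a missing dependency, a quick check like the sketch below can show which of the packages listed above are not yet available. The helper `missing_pkgs` is a hypothetical name introduced here, not part of cubar. The commented `BiocManager::install()` call is the usual way to install Bioconductor packages manually, should that be needed.

```r
# Packages listed under "Required Packages"
required <- c("Biostrings", "IRanges", "data.table", "ggplot2", "rlang")

# Hypothetical helper: return the packages that cannot be loaded
missing_pkgs <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

not_installed <- missing_pkgs(required)
if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
}

# Bioconductor dependencies can be installed manually if required:
# install.packages("BiocManager")
# BiocManager::install(c("Biostrings", "IRanges"))
```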

Documentation & Tutorials

📖 Complete documentation is available within R (?function_name) and on our package website.

🎯 Getting Started

Introduction to cubar - Basic usage and core functionality
Non-standard Genetic Codes - Working with alternative genetic codes
Codon Optimization - Sequence optimization strategies

📚 Advanced Topics

Mathematical Foundations - Detailed theory behind the metrics
Function Reference - Complete function documentation

Example Workflow

Here's a typical analysis workflow demonstrating key functionality:

library(cubar)
library(ggplot2)

# 1. Load and quality-check sequences
data(yeast_cds)
clean_cds <- check_cds(yeast_cds)

# 2. Calculate codon frequencies
codon_freq <- count_codons(clean_cds)

# 3. Calculate multiple metrics
enc <- get_enc(codon_freq)           # Effective number of codons
gc3s <- get_gc3s(codon_freq)         # GC content at 3rd positions

# 4. Analyze highly expressed genes
data(yeast_exp)
yeast_exp <- yeast_exp[yeast_exp$gene_id %in% rownames(codon_freq), ]
high_expr <- head(yeast_exp[order(-yeast_exp$fpkm), ], 500)
rscu_high <- est_rscu(codon_freq[high_expr$gene_id, ])
cai <- get_cai(codon_freq, rscu_high)

# 5. Visualize results
df <- data.frame(ENC = enc, CAI = cai, GC3s = gc3s)
ggplot(df, aes(color = GC3s, x = ENC, y = CAI)) + 
  geom_point(alpha = 0.6) + 
  scale_color_viridis_c() +
  labs(title = "Codon Usage Bias Relationships",
       x = "Effective Number of Codons", y = "Codon Adaptation Index")
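
Step 4 of the workflow derives codon weights from highly expressed genes and scores every gene against them. The base-R sketch below shows the idea behind CAI on toy data: each codon's relative adaptiveness is its reference count divided by the largest count in its synonymous family, and a gene's CAI is the geometric mean of those weights. The reference counts and the four-codon gene are hypothetical, and this is not cubar's implementation of `get_cai()`.

```r
# Hypothetical codon counts from a highly expressed reference set
ref <- c(AAA = 10, AAG = 30,
         GCT = 20, GCC = 10, GCA = 6, GCG = 4)
aa <- c(AAA = "Lys", AAG = "Lys",
        GCT = "Ala", GCC = "Ala", GCA = "Ala", GCG = "Ala")

# Relative adaptiveness: count / maximum count in the synonymous family
w <- ref / ave(ref, aa, FUN = max)

# Hypothetical gene, given as its codons
gene_codons <- c("AAG", "GCT", "AAA", "GCC")

# CAI: geometric mean of the weights of the gene's codons
# (assumes no weight is zero, otherwise log() yields -Inf)
cai_toy <- exp(mean(log(w[gene_codons])))
print(round(cai_toy, 3))
```

A gene built only from the most frequent codon of each family would score 1, and genes that rely on rare codons score lower.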

🆘 Getting Help

📋 GitHub Issues: Report bugs, request features, or ask questions
📖 Documentation: Check function help (?function_name) and online docs

Related Packages

For complementary analysis, consider these R packages:

Biostrings - Sequence input/output and manipulation
Peptides - Peptide and protein property calculations

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

GitHub Copilot was used to suggest code snippets during development
GitHub Education for providing free access to development tools
The R and Bioconductor communities for excellent foundational packages
Contributors and users who have provided feedback and improvements

