Population Genomics Methods Development

Developing computational tools and genomic technologies for evolutionary biology

Overview

The Cresko Laboratory has been at the forefront of developing genomic tools and computational methods that have transformed evolutionary biology research. Our innovations have enabled cost-effective, high-resolution genomic studies in both model and non-model organisms, democratizing population genomics research worldwide.

RAD-seq Technology Development

Revolutionary Innovation

Restriction site Associated DNA sequencing (RAD-seq), developed through a collaboration between the Cresko and Johnson labs at the University of Oregon, has fundamentally changed how we study population genomics:

Core Advantages

Genome-wide coverage: Thousands to millions of markers across the genome
De novo discovery: No requirement for prior genomic resources
Cost effectiveness: Orders of magnitude cheaper than whole-genome sequencing
Scalability: From individuals to thousands of samples
Flexibility: Adjustable marker density for different applications
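
Adjustable marker density follows largely from the choice of restriction enzyme: a longer recognition site cuts less often, so it yields fewer loci. As a rough illustration only, the sketch below counts recognition sites in a sequence and compares enzymes using the approximation genome length / 4^k for a k-base site, which assumes equal and independent base frequencies. The function names and the 500 Mb genome size are hypothetical and not taken from any published protocol. The SbfI and EcoRI recognition sequences are the standard ones, and both are palindromic, so scanning one strand is sufficient.

```python
def count_sites(sequence: str, site: str) -> int:
    """Count occurrences of a recognition site, allowing overlaps."""
    sequence, site = sequence.upper(), site.upper()
    count = 0
    start = sequence.find(site)
    while start != -1:
        count += 1
        start = sequence.find(site, start + 1)
    return count


def expected_sites(genome_length: int, site: str) -> float:
    """Approximate expected site count, assuming equal, independent base frequencies."""
    return genome_length / 4 ** len(site)


def expected_rad_tags(genome_length: int, site: str) -> float:
    """Each cut site can be sequenced in both directions, giving two tags per site."""
    return 2 * expected_sites(genome_length, site)


# Hypothetical genome size in base pairs, chosen only for illustration.
genome_length = 500_000_000
for name, site in [("SbfI", "CCTGCAGG"), ("EcoRI", "GAATTC")]:
    print(name, round(expected_rad_tags(genome_length, site)), "expected tags")
```

Real genomes deviate from this expectation because of base composition and repeats, so an in silico digest of an actual assembly (using something like `count_sites`) is the more reliable guide where a reference exists.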

Stacks Software Pipeline

Comprehensive Analysis Platform

Stacks, our flagship software package, provides an integrated pipeline for RAD-seq data analysis:

Core Functionality

De novo assembly: Building loci without a reference genome
Reference alignment: Mapping reads to existing genome assemblies
SNP discovery: Identifying and genotyping polymorphic sites
Population statistics: Calculating FST, π, and other metrics
Data export: Multiple output formats for downstream analyses
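
To make the population statistics concrete, here is a minimal sketch of per-site nucleotide diversity (π) and FST computed from allele counts. It uses one common formulation: π is the probability that two alleles drawn without replacement differ, and FST compares within-population π, weighted by the number of allele pairs, with π in the pooled sample. Whether this matches the Stacks implementation in every detail is an assumption, and the function names are hypothetical.

```python
from math import comb


def site_pi(allele_counts):
    """Per-site diversity: chance that two alleles sampled without replacement differ."""
    n = sum(allele_counts)
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    identical_pairs = sum(comb(c, 2) for c in allele_counts)
    return 1.0 - identical_pairs / comb(n, 2)


def site_fst(pop_counts):
    """FST at one site.

    pop_counts holds one allele-count list per population, with alleles
    listed in the same order in every population.
    """
    pooled = [sum(column) for column in zip(*pop_counts)]
    pi_all = site_pi(pooled)
    if pi_all == 0.0:
        return 0.0  # monomorphic site: no differentiation to measure
    pair_weights = [comb(sum(counts), 2) for counts in pop_counts]
    within = sum(w * site_pi(c) for w, c in zip(pair_weights, pop_counts))
    return 1.0 - within / (pi_all * sum(pair_weights))


# Two populations fixed for different alleles are maximally differentiated.
print(site_fst([[10, 0], [0, 10]]))
```

With small samples this estimator can return slightly negative values when populations share the same allele frequencies; these are usually interpreted as zero differentiation.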

Visit the Stacks website →

Algorithm Development

Key algorithmic innovations in Stacks include:

Assembly Algorithms

Maximum likelihood framework for locus assembly
Error correction for sequencing mistakes
Paralogy detection and filtering
Coverage-based genotype calling
Population-aware genotyping models
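
The idea behind likelihood-based, coverage-aware genotype calling can be sketched as follows. This is a simplified biallelic illustration, not the Stacks implementation: it assumes a fixed per-read error rate, compares the three diploid genotypes by log-likelihood, and accepts a call only when a likelihood-ratio statistic between the best and second-best genotype exceeds the chi-square critical value for one degree of freedom at α = 0.05. The function name, the default error rate, and the threshold are assumptions made for the example.

```python
from math import log

CHI2_CRIT_1DF = 3.841  # chi-square critical value, 1 d.f., alpha = 0.05


def call_genotype(n_ref, n_alt, error_rate=0.01, threshold=CHI2_CRIT_1DF):
    """Return (genotype, likelihood_ratio) from reference and alternate read counts.

    Genotypes use VCF-style labels; "./." marks a site left uncalled.
    The binomial coefficient is identical for every genotype, so it is omitted.
    """
    depth = n_ref + n_alt
    if depth == 0:
        return "./.", 0.0
    loglik = {
        "0/0": n_ref * log(1 - error_rate) + n_alt * log(error_rate),
        "0/1": depth * log(0.5),
        "1/1": n_alt * log(1 - error_rate) + n_ref * log(error_rate),
    }
    ranked = sorted(loglik, key=loglik.get, reverse=True)
    best, second = ranked[0], ranked[1]
    ratio = 2 * (loglik[best] - loglik[second])
    if ratio < threshold:
        return "./.", ratio  # too little evidence to separate the genotypes
    return best, ratio


print(call_genotype(20, 0))   # deep coverage, all reference reads
print(call_genotype(2, 0))    # shallow coverage: the call is withheld
```

The second call shows why coverage matters: two reads of the same allele cannot rule out a heterozygote, so the site stays uncalled rather than being forced to a homozygote.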

Statistical Methods

Kernel-smoothed FST calculations
Sliding window analyses
Bootstrap confidence intervals
Correction for multiple testing
Effective population size estimation
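
Kernel smoothing replaces noisy per-site values with a distance-weighted average along the chromosome. The sketch below is a generic Gaussian-kernel version, offered as an illustration rather than the Stacks code: each site within three standard deviations of a window center contributes with weight exp(-d² / 2σ²), where d is its distance from the center. The function name and the default σ of 150 kb are assumptions made for the example.

```python
from math import exp, isnan


def kernel_smooth(positions, values, centers, sigma=150_000.0):
    """Gaussian-kernel weighted average of per-site values at each center.

    positions and centers are base-pair coordinates on one chromosome.
    Sites farther than 3 * sigma from a center are ignored.
    """
    smoothed = []
    for center in centers:
        weighted_sum = 0.0
        weight_total = 0.0
        for pos, val in zip(positions, values):
            dist = pos - center
            if abs(dist) > 3 * sigma:
                continue
            weight = exp(-(dist * dist) / (2 * sigma * sigma))
            weighted_sum += weight * val
            weight_total += weight
        # No sites in range: report NaN rather than inventing a value.
        smoothed.append(weighted_sum / weight_total if weight_total > 0 else float("nan"))
    return smoothed


# Hypothetical per-site FST values at four positions on one chromosome.
positions = [10_000, 60_000, 110_000, 400_000]
fst = [0.02, 0.05, 0.40, 0.03]
print(kernel_smooth(positions, fst, centers=[50_000, 400_000], sigma=50_000.0))
```

A bootstrap confidence interval for a window can be built on the same function by resampling the per-site values with replacement and re-smoothing each replicate.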

Community Impact

Stacks is now used by thousands of researchers worldwide:

10,000+ citations: Widely adopted across diverse fields
Active user community: Support forum and mailing list
Regular updates: Continuous improvement and bug fixes
Comprehensive documentation: Tutorials, protocols, and examples
Workshop training: Regular courses on RAD-seq analysis

New Computational Approaches

Machine Learning Applications

We’re pioneering the use of machine learning in population genomics:

Deep Learning Models

Neural networks for genotype imputation
Convolutional networks for selection detection
Recurrent networks for demographic inference
Autoencoders for dimensionality reduction
Generative models for simulation

Applications

Predicting adaptive potential
Classifying selection signatures
Inferring population structure
Detecting hybridization
Identifying technical artifacts

Graph-based Methods

Moving beyond linear reference genomes:

Pangenome Approaches

Graph construction from population data
Variation graphs for complex regions
Structural variant detection
Haplotype-resolved assembly
Population-specific references

Network Analyses

Gene regulatory network inference
Epistatic interaction networks
Population connectivity graphs
Phylogenetic networks
Coexpression networks

Long-read Sequencing Integration

Hybrid Approaches

Combining short and long-read technologies:

Method Development

Linked-read RAD-seq protocols
Long-read validation of RAD loci
Phasing using long-read data
Structural variant discovery
Gap filling in genome assemblies

Computational Tools

Read simulation and validation
Hybrid assembly pipelines
Error correction algorithms
Variant calling methods
Visualization tools

Single-cell Genomics Methods

Population-level Single-cell Studies

Extending population genomics to the cellular level:

Technical Development

Single-cell RAD-seq protocols
Cell-type-specific genotyping
Allele-specific expression in single cells
Somatic mutation detection
Lineage tracing methods

Analytical Frameworks

Population genetics of somatic variation
Cell-type evolution models
Developmental trajectory inference
Integration with bulk sequencing
Multi-modal data integration

Environmental DNA Methods

eDNA for Population Genomics

Developing methods for population studies using environmental samples:

Protocol Development

Species-specific eDNA capture
Population-level eDNA genotyping
Quantitative eDNA methods
Temporal sampling strategies
Spatial distribution mapping

Computational Approaches

Allele frequency estimation from eDNA
Population size inference
Community composition analysis
Contamination detection
Quality control metrics
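
As a starting point for allele frequency estimation from eDNA, a naive estimate treats each read as an independent draw from the pool of template molecules and attaches a Wilson score interval to the observed proportion. This is a deliberately simple sketch under that assumption. Real eDNA data violate it through PCR duplicates, uneven shedding among individuals, and contamination, so the interval is best read as a lower bound on the true uncertainty. The function name is hypothetical.

```python
from math import sqrt


def wilson_interval(alt_reads, total_reads, z=1.96):
    """Return (estimate, lower, upper) for an allele frequency from read counts.

    z = 1.96 gives an approximate 95% interval.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    p = alt_reads / total_reads
    denom = 1 + z * z / total_reads
    center = (p + z * z / (2 * total_reads)) / denom
    half_width = z * sqrt(p * (1 - p) / total_reads
                          + z * z / (4 * total_reads ** 2)) / denom
    return p, max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts: 30 of 100 reads carry the alternate allele.
estimate, lower, upper = wilson_interval(30, 100)
print(f"frequency {estimate:.2f}, interval {lower:.2f} to {upper:.2f}")
```

The Wilson interval is used here instead of the simpler normal approximation because it behaves sensibly when the allele is rare or read depth is low, both of which are common in environmental samples.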

Software Ecosystem

Additional Tools Developed

Beyond Stacks, we’ve created numerous specialized tools:

Population Genetics Software

PopGen Pipeline: Automated population genetics workflows
Selection Scanner: Genome-wide selection detection
Demography Inferencer: Coalescent-based demographic modeling
Hybrid Detector: Identifying admixed individuals
Phylo Constructor: Phylogenetic tree building from RAD data

Visualization Tools

RAD Viewer: Interactive exploration of RAD loci
Population Structure Plotter: Advanced STRUCTURE-like plots
Genome Scanner: Sliding window visualization
Network Visualizer: Population connectivity graphics
Tree Annotator: Phylogenetic tree decoration

Best Practices and Standards

Method Standardization

We contribute to community standards for genomic methods:

Protocol Guidelines

Sample collection and preservation
DNA extraction optimization
Library preparation standards
Sequencing depth recommendations
Replication and validation strategies

Analytical Standards

Quality filtering thresholds
Statistical significance criteria
Multiple testing corrections
Reproducibility requirements
Data sharing formats
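
Genome scans test thousands of loci at once, so a multiple testing correction is part of any significance criterion. One widely used option is the Benjamini-Hochberg procedure, which controls the false discovery rate. The sketch below is a plain implementation of that standard procedure, included as an example of the kind of correction meant here rather than as a prescribed standard. The function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values for four loci.
p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(adjusted) if q <= 0.05]
print(adjusted, significant)
```

Loci whose adjusted value falls at or below the chosen false discovery rate (0.05 in this example) are reported as significant.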

Training and Education

Capacity Building

We’re committed to training the next generation:

Educational Resources

Online tutorials and videos
Comprehensive user manuals
Example datasets and workflows
Troubleshooting guides
FAQ compilations

Workshops and Courses

Annual RAD-seq workshops
Bioinformatics boot camps
Software-specific training
Custom institutional workshops
Virtual learning modules

Current Development Projects

Active Software Projects

Cloud-based RAD-seq processing
Real-time data quality monitoring
Automated parameter optimization
Integration with Galaxy platform
Mobile app for field data collection

Method Innovation

Ultra-low input protocols
Ancient DNA applications
Methylation-sensitive RAD-seq
Chromatin accessibility mapping
Expression-RAD integration

Collaborations

Our methods development involves:

Computer scientists and software engineers
Statisticians and mathematicians
Molecular biologists and biochemists
Field biologists and ecologists
Conservation practitioners

Impact and Applications

Our methods have been applied to:

Conservation genetics of endangered species
Agricultural crop improvement
Fisheries stock assessment
Human disease gene mapping
Microbial community analysis
Ancient DNA studies
Forensic genetics
Invasion biology

Future Directions

Ongoing and planned developments include:

Quantum computing applications
Blockchain for data integrity
Federated learning approaches
Automated experimental design
Real-time sequencing analysis

Learn More

For more information about our methods development:

Download Stacks →
Contact us →