Getting Started¶
This guide will walk you through running your first analysis with Anatalyst.
Basic Usage¶
Anatalyst is designed to be run from a configuration file that defines the modules to use and their parameters. The basic workflow is:
- Create a configuration file
- Run the pipeline
- Review the results
Creating a Configuration File¶
Create a YAML file that defines your pipeline. Here's a minimal example:
pipeline:
name: my_first_pipeline
output_dir: ./output
r_memory_limit_gb: 8
figure_defaults:
width: 8
height: 6
modules:
- name: data_loading
type: DataLoading
params:
file_path: /path/to/your/filtered_feature_bc_matrix.h5
- name: qc_metrics
type: QCMetrics
params:
mito_pattern: "^MT-"
ribo_pattern: "^RP[SL]"
- name: report_generator
type: ReportGenerator
Save this as minimal_config.yaml
.
Running the Pipeline¶
Execute the pipeline using the run_pipeline.py
script:
This will:
- Load your data
- Calculate QC metrics
- Generate a report in the output directory
Example with a Complete Analysis¶
Here's a more comprehensive example configuration for a full analysis workflow:
pipeline:
name: complete_analysis
output_dir: ./output
r_memory_limit_gb: 8
figure_defaults:
width: 8
height: 6
checkpointing:
enabled: true
modules_to_checkpoint: all
max_checkpoints: 5
modules:
- name: data_loading
type: DataLoading
params:
file_path: /path/to/filtered_feature_bc_matrix.h5
- name: pre_qc
type: QCMetrics
params:
mito_pattern: "^MT-"
ribo_pattern: "^RP[SL]"
create_plots: true
- name: ambient_removal
type: AmbientRNARemoval
params:
raw_counts_path: /path/to/raw_feature_bc_matrix.h5
filtered_counts_path: /path/to/filtered_feature_bc_matrix.h5
ndims: 30
resolution: 0.8
- name: doublet_detection
type: DoubletDetection
params:
expected_doublet_rate: 0.05
- name: filtering
type: Filtering
params:
filters:
n_genes_by_counts:
type: numeric
pct_counts_mt:
type: numeric
predicted_doublet:
type: boolean
- name: post_qc
type: QCMetrics
params:
mito_pattern: "^MT-"
ribo_pattern: "^RP[SL]"
- name: normalization
type: PearsonNormalization
params:
n_top_genes: 2000
- name: dim_reduction
type: DimensionalityReduction
params:
n_pcs: 50
compute_umap: true
compute_tsne: true
- name: report_generator
type: ReportGenerator
params:
generate_html: true
Save this as complete_config.yaml
and run it as before:
Resuming from a Checkpoint¶
If your pipeline fails or stops for any reason, you can resume from the last checkpoint:
python -m /workspace/scripts/run_pipeline.py --config complete_config.yaml --checkpoint doublet_detection
This will resume the pipeline from after the "doublet_detection" module.
Viewing the Results¶
After running the pipeline, check the output directory:
output/
├── checkpoints/ # Pipeline checkpoints
├── images/ # Generated figures
├── analysis_report.md # Markdown report
└── analysis_report.html # HTML report
Open the HTML report in a web browser to view the analysis results, including visualizations and parameter settings.
Next Steps¶
- Explore the Configuration documentation to learn about all available options
- Browse the Modules section to understand each analysis step in detail
- Check out the Examples for more use cases and sample analyses