Getting Started

This guide will walk you through running your first analysis with Anatalyst.

Basic Usage

Anatalyst is designed to be run from a configuration file that defines the modules to use and their parameters. The basic workflow is:

1. Create a configuration file
2. Run the pipeline
3. Review the results

Creating a Configuration File

Create a YAML file that defines your pipeline. Here's a minimal example:

pipeline:
  name: my_first_pipeline
  output_dir: ./output
  r_memory_limit_gb: 8
  figure_defaults:
    width: 8
    height: 6

modules:
  - name: data_loading
    type: DataLoading
    params:
      file_path: /path/to/your/filtered_feature_bc_matrix.h5

  - name: qc_metrics
    type: QCMetrics
    params:
      mito_pattern: "^MT-"
      ribo_pattern: "^RP[SL]"

  - name: report_generator
    type: ReportGenerator

Save this as minimal_config.yaml.
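Before running anything, it can help to sanity-check the structure of the file. The sketch below is not Anatalyst's own validation; it is a hypothetical helper, `check_config`, that operates on the configuration after it has been parsed into a Python dict (for example with a YAML library). It assumes that `pipeline` needs `name` and `output_dir`, that every module needs `name` and `type`, and that module names must be unique. None of these rules is confirmed by this guide.

```python
from typing import Any


def check_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a parsed configuration.

    The rules here are assumptions made for illustration only.
    """
    problems: list[str] = []

    pipeline = config.get("pipeline")
    if not isinstance(pipeline, dict):
        problems.append("missing 'pipeline' section")
    else:
        for key in ("name", "output_dir"):
            if key not in pipeline:
                problems.append(f"pipeline is missing '{key}'")

    modules = config.get("modules")
    if not isinstance(modules, list) or not modules:
        problems.append("'modules' must be a non-empty list")
        return problems

    seen: set[str] = set()
    for index, module in enumerate(modules):
        if not isinstance(module, dict):
            problems.append(f"module #{index} is not a mapping")
            continue
        for key in ("name", "type"):
            if key not in module:
                problems.append(f"module #{index} is missing '{key}'")
        name = module.get("name")
        if name is not None:
            if name in seen:
                problems.append(f"duplicate module name '{name}'")
            seen.add(name)
    return problems


# The minimal example from this guide, as it would look once parsed.
minimal_config = {
    "pipeline": {
        "name": "my_first_pipeline",
        "output_dir": "./output",
        "r_memory_limit_gb": 8,
        "figure_defaults": {"width": 8, "height": 6},
    },
    "modules": [
        {
            "name": "data_loading",
            "type": "DataLoading",
            "params": {"file_path": "/path/to/your/filtered_feature_bc_matrix.h5"},
        },
        {
            "name": "qc_metrics",
            "type": "QCMetrics",
            "params": {"mito_pattern": "^MT-", "ribo_pattern": "^RP[SL]"},
        },
        {"name": "report_generator", "type": "ReportGenerator"},
    ],
}

print(check_config(minimal_config))  # prints []
```

An empty list means none of the assumed rules was violated; it does not guarantee that the pipeline itself will accept the file.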

Running the Pipeline

Execute the pipeline using the run_pipeline.py script:

python /workspace/scripts/run_pipeline.py --config minimal_config.yaml

This will:

Load your data
Calculate QC metrics
Generate a report in the output directory
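To see what the QC step's `mito_pattern` and `ribo_pattern` values select, the sketch below applies the two regular expressions from the example configuration to a few gene symbols. It assumes the patterns are matched case-sensitively against the start of each gene symbol; how the QCMetrics module applies them internally is not described in this guide, and `classify_gene` is a hypothetical helper.

```python
import re

# Patterns copied from the example configuration.
MITO_PATTERN = re.compile(r"^MT-")
RIBO_PATTERN = re.compile(r"^RP[SL]")


def classify_gene(symbol: str) -> str:
    """Label a gene symbol by which pattern (if any) it matches."""
    if MITO_PATTERN.search(symbol):
        return "mitochondrial"
    if RIBO_PATTERN.search(symbol):
        return "ribosomal"
    return "other"


# Human-style symbols, plus one mouse-style symbol (lowercase "mt-").
for symbol in ["MT-CO1", "MT-ND1", "RPS6", "RPL13A", "ACTB", "mt-Co1"]:
    print(f"{symbol}: {classify_gene(symbol)}")
```

Under this assumption the mouse-style symbol `mt-Co1` is not matched by `^MT-`, so data with lowercase mitochondrial prefixes would need a different `mito_pattern`.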

Example with a Complete Analysis

Here's a more comprehensive example configuration for a full analysis workflow:

pipeline:
  name: complete_analysis
  output_dir: ./output
  r_memory_limit_gb: 8
  figure_defaults:
    width: 8
    height: 6
  checkpointing:
    enabled: true
    modules_to_checkpoint: all
    max_checkpoints: 5

modules:
  - name: data_loading
    type: DataLoading
    params:
      file_path: /path/to/filtered_feature_bc_matrix.h5

  - name: pre_qc
    type: QCMetrics
    params:
      mito_pattern: "^MT-"
      ribo_pattern: "^RP[SL]"
      create_plots: true

  - name: ambient_removal
    type: AmbientRNARemoval
    params:
      raw_counts_path: /path/to/raw_feature_bc_matrix.h5
      filtered_counts_path: /path/to/filtered_feature_bc_matrix.h5
      ndims: 30
      resolution: 0.8

  - name: doublet_detection
    type: DoubletDetection
    params:
      expected_doublet_rate: 0.05

  - name: filtering
    type: Filtering
    params:
      filters:
        n_genes_by_counts:
          type: numeric
        pct_counts_mt:
          type: numeric
        predicted_doublet:
          type: boolean

  - name: post_qc
    type: QCMetrics
    params:
      mito_pattern: "^MT-"
      ribo_pattern: "^RP[SL]"

  - name: normalization
    type: PearsonNormalization
    params:
      n_top_genes: 2000

  - name: dim_reduction
    type: DimensionalityReduction
    params:
      n_pcs: 50
      compute_umap: true
      compute_tsne: true

  - name: report_generator
    type: ReportGenerator
    params:
      generate_html: true

Save this as complete_config.yaml and run it as before:

python /workspace/scripts/run_pipeline.py --config complete_config.yaml
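If you want to start runs from another Python program (for example, to process several configuration files in a loop), a thin wrapper around the command can be enough. The sketch below invokes the script directly by path with the `--config` flag used in this guide. `build_command` and `run_pipeline` are hypothetical helper names, and the sketch assumes the script exits with a non-zero status when a run fails.

```python
import subprocess
import sys
from pathlib import Path

# Script location as given in this guide; adjust for your installation.
RUN_SCRIPT = "/workspace/scripts/run_pipeline.py"


def build_command(config_path: str) -> list[str]:
    """Build the argument list for one pipeline run."""
    # sys.executable reuses the interpreter that is running this program.
    return [sys.executable, RUN_SCRIPT, "--config", config_path]


def run_pipeline(config_path: str) -> None:
    """Run the pipeline; raise if the config is missing or the run fails."""
    if not Path(config_path).is_file():
        raise FileNotFoundError(f"config file not found: {config_path}")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_command(config_path), check=True)


# Show the command without executing it.
print(" ".join(build_command("complete_config.yaml")))
```

Passing the arguments as a list avoids shell quoting problems when a configuration path contains spaces.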

Resuming from a Checkpoint

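If the pipeline fails or stops, and checkpointing is enabled in your configuration, you can resume from a saved checkpoint by passing the name of the module whose checkpoint you want to start from: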

python /workspace/scripts/run_pipeline.py --config complete_config.yaml --checkpoint doublet_detection

This resumes the pipeline immediately after the "doublet_detection" module, so the modules before it are not run again.
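The effect of `--checkpoint` on the complete example can be pictured as slicing the module list. The sketch below illustrates that behavior only; it is not Anatalyst's resume logic, and `remaining_modules` is a hypothetical helper that assumes modules run in the order they are listed in the configuration.

```python
def remaining_modules(module_names: list[str], checkpoint: str) -> list[str]:
    """Return the modules that still run when resuming after `checkpoint`."""
    if checkpoint not in module_names:
        raise ValueError(f"no module named '{checkpoint}' in the pipeline")
    # Everything up to and including the checkpointed module is skipped.
    return module_names[module_names.index(checkpoint) + 1:]


# Module names from the complete example configuration, in order.
complete_analysis = [
    "data_loading",
    "pre_qc",
    "ambient_removal",
    "doublet_detection",
    "filtering",
    "post_qc",
    "normalization",
    "dim_reduction",
    "report_generator",
]

print(remaining_modules(complete_analysis, "doublet_detection"))
# prints ['filtering', 'post_qc', 'normalization', 'dim_reduction', 'report_generator']
```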

Viewing the Results

After running the pipeline, check the output directory:

output/
├── checkpoints/          # Pipeline checkpoints
├── images/               # Generated figures
├── analysis_report.md    # Markdown report
└── analysis_report.html  # HTML report

Open the HTML report in a web browser to view the analysis results, including visualizations and parameter settings.
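For scripted runs, a quick check that the expected files were produced can catch a silent failure early. The sketch below looks for the four entries shown in the listing above. `missing_outputs` is a hypothetical helper, and the sketch assumes that every entry is always created; in practice some of them (such as `checkpoints/` or the HTML report) may depend on your configuration.

```python
from pathlib import Path

# Entries taken from the output listing in this guide.
EXPECTED_OUTPUTS = (
    "checkpoints",
    "images",
    "analysis_report.md",
    "analysis_report.html",
)


def missing_outputs(output_dir: str) -> list[str]:
    """Return the expected entries that do not exist under output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).exists()]


missing = missing_outputs("./output")
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All expected outputs are present.")
```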

Next Steps

Explore the Configuration documentation to learn about all available options
Browse the Modules section to understand each analysis step in detail
Check out the Examples for more use cases and sample analyses