Quickstart Guide

Installation

Sunbeam is a Python package that can be installed in a variety of ways.

Tip

The core of Sunbeam is written in Snakemake, which means it might not behave well if you’re trying to integrate it into a larger pipeline. Instead, consider writing a Sunbeam extension for what you need.

From the PyPI repository (Python 3.11+ required):

python3.13 -m venv sunbeam_env/
source sunbeam_env/bin/activate
pip install sunbeamlib

sunbeam -h

Tip

Refer to the examples page for lots of walkthroughs of common Sunbeam use cases.

Project Initialization

Let’s say your sequencing reads live in a folder called /sequencing/project/reads, with one or two files per sample (for single- and paired-end sequencing, respectively). These files must be in gzipped FASTQ format (.fastq.gz).
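Before initializing a project, it can help to confirm that the reads folder really contains gzipped FASTQ files. The following is a minimal sketch, not part of Sunbeam: the function name check_reads_folder is hypothetical, and the check only looks at the .fastq.gz file extension and the two-byte gzip signature, not at the FASTQ content itself.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def check_reads_folder(reads_dir):
    """Split the files in reads_dir into usable .fastq.gz files and everything else.

    Hypothetical helper for illustration; Sunbeam does not provide it.
    """
    usable, rejected = [], []
    for path in sorted(Path(reads_dir).iterdir()):
        if not path.is_file():
            continue
        # A file named .fastq.gz that is not actually gzip-compressed is rejected too.
        with open(path, "rb") as handle:
            is_gzip = handle.read(2) == GZIP_MAGIC
        if path.name.endswith(".fastq.gz") and is_gzip:
            usable.append(path.name)
        else:
            rejected.append(path.name)
    return usable, rejected


# Example call, using the folder from this guide:
# usable, rejected = check_reads_folder("/sequencing/project/reads")
```

Any file names returned in the second list would need to be compressed or renamed before Sunbeam can use them.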

Let’s create a new Sunbeam project (we’ll call it my_project):

Tip

These commands pick up where the installation instructions left off. If you’re in a virtual environment, you should still be in it. If you’re using Docker, you should have already run the container and be inside it.

Using Conda to manage worker environments and keeping all compute local:

sunbeam init my_project --data_fp /sequencing/project/reads

Tip

Snakemake supports a number of different environment managers, compute services, and storage backends. See the Snakemake docs on executor and storage plugins for more information, and remember that you have to install the relevant plugin before you can use it.

Sunbeam will create a new folder called my_project and put three files there:

  • config.yaml contains a Snakemake profile that will be used to run my_project.

  • sunbeam_config.yml contains all the configuration parameters for each step of the Sunbeam pipeline.

  • samples.csv is a comma-separated list of samples that Sunbeam found in the given data folder, along with absolute paths to their FASTQ files.
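It is worth checking samples.csv after initialization to confirm that every sample was found and that its files exist. The sketch below assumes a layout that this guide does not spell out: one headerless row per sample, with the sample name first and then one or two FASTQ paths. The helper names load_samples and missing_files are hypothetical. Adjust the parsing if your file looks different.

```python
import csv
from pathlib import Path


def load_samples(samples_csv):
    """Map each sample name to its list of FASTQ paths (one or two entries).

    Assumes headerless rows of the form: name,path1[,path2]
    """
    samples = {}
    with open(samples_csv, newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            name, *paths = row
            # An empty trailing field means the sample is single-end.
            samples[name] = [Path(p) for p in paths if p]
    return samples


def missing_files(samples):
    """Return every listed path that does not exist on disk."""
    return [p for paths in samples.values() for p in paths if not p.exists()]


# Example call, using the project from this guide:
# samples = load_samples("my_project/samples.csv")
# print(len(samples), "samples;", len(missing_files(samples)), "missing files")
```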

Right now we have everything we need to do basic quality control. However, let’s go ahead and set up contaminant filtering to make things interesting.

Contaminant filtering

Sunbeam can align your reads to an arbitrary number of contaminant sequences or host genomes and remove reads that map above a given threshold.

To use this, make a folder containing all the target sequences in FASTA format. The filenames should end in .fasta to be recognized by Sunbeam. In your sunbeam_config.yml file, edit the host_fp: line in the qc section to point to this folder.
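Because only files ending in .fasta are recognized, a quick check of the host folder can catch sequences that would be silently skipped. This is a minimal sketch with a hypothetical helper name, check_host_folder. The list of look-alike extensions is illustrative and not taken from Sunbeam.

```python
from pathlib import Path

# Common FASTA-like extensions that would NOT match the required .fasta suffix.
# Illustrative only, not an exhaustive or official list.
LOOKALIKE_SUFFIXES = (".fa", ".fna", ".fas", ".fasta.gz")


def check_host_folder(host_dir):
    """Return (recognized, needs_rename) file names found in host_dir."""
    recognized, needs_rename = [], []
    for path in sorted(Path(host_dir).iterdir()):
        if path.name.endswith(".fasta"):
            recognized.append(path.name)
        elif path.name.endswith(LOOKALIKE_SUFFIXES):
            needs_rename.append(path.name)
    return recognized, needs_rename


# Example call, with a hypothetical folder path:
# recognized, needs_rename = check_host_folder("/sequencing/project/hosts")
```

Files reported in the second list are probably FASTA files that need to be renamed (and decompressed, if gzipped) to end in .fasta.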

Running the Pipeline

Tip

If you installed Sunbeam using pip, you will need to have either Conda or Apptainer/Singularity installed to run the pipeline, depending on your choice of dependency manager (Conda is the default).

After you’ve finished editing your config file, you’re ready to run Sunbeam.

In most cases (Standard, Slurm, Apptainer/Singularity from the Init instructions), you can run the pipeline with:

sunbeam run --profile my_project/

By default, this runs several steps, including trimming and quality-controlling your reads and removing contaminant, host, and low-complexity sequences.

Viewing Results

The output is stored under my_project/sunbeam_output. QCed and decontaminated reads are in my_project/sunbeam_output/qc/decontam/.
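One quick way to inspect the results is to count how many reads survived decontamination. The sketch below assumes the files in the decontam folder are gzipped FASTQ files ending in .fastq.gz with the usual four lines per record. The helper names count_reads and summarize_reads are hypothetical.

```python
import gzip
from pathlib import Path


def count_reads(fastq_gz):
    """Count records in a gzipped FASTQ file, assuming four lines per record."""
    with gzip.open(fastq_gz, "rt") as handle:
        return sum(1 for _ in handle) // 4


def summarize_reads(folder):
    """Map each .fastq.gz file name in folder to its read count."""
    return {p.name: count_reads(p) for p in sorted(Path(folder).glob("*.fastq.gz"))}


# Example call, using the output folder from this guide:
# for name, n in summarize_reads("my_project/sunbeam_output/qc/decontam/").items():
#     print(name, n)
```

Comparing these counts with the same counts on the input reads gives a rough idea of how much was removed by quality control and decontamination.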

Extending the Pipeline

See the Sunbeam Extensions page for instructions on how to add extensions to your Sunbeam project.