User Guide¶
Requirements¶
A relatively recent Linux computer with more than 2 GB of RAM
We do not currently support Windows or Mac. (You can run this on Windows using the Ubuntu [WSL](https://docs.microsoft.com/en-us/windows/wsl/about).)
Installation¶
Sunbeam has two options for installation: git or tar. For development work on sunbeam, use git. For standard usage, we recommend installing each version of sunbeam that you need from a tarball into its own directory. For example, to have versions 3 and 4 installed, repeat the tar install process below once for sunbeam3.1.1 and once for sunbeam4.0.0 (or whichever specific versions you want).
To install from a tarball on a Linux machine, download the tarball for the sunbeam version you want (sunbeamX.X.X), then unpack and install it.
wget https://github.com/sunbeam-labs/sunbeam/releases/download/v4.0.0/sunbeam.tar.gz
mkdir sunbeam4.0.0
tar -zxf sunbeam.tar.gz -C sunbeam4.0.0
cd sunbeam4.0.0 && ./install.sh
To install with git on a Linux machine, clone a copy of Sunbeam from our GitHub repository and run the installer.
git clone --branch v4.0.0 https://github.com/sunbeam-labs/sunbeam.git
cd sunbeam
./install.sh
Tip
If you’re planning on doing development work on sunbeam, use ‘git clone git@github.com:sunbeam-labs/sunbeam.git’ instead.
The installer will check for and install the three components necessary for Sunbeam to work. The first is Conda, a system for downloading and managing software environments. The second is the Sunbeam environment, which will contain all the core dependencies. The third is the Sunbeam library, which provides the necessary commands to run Sunbeam.
If you don’t have Conda installed prior to this, you will need to add a line (displayed during install) to your shell config file (usually ~/.bashrc or ~/.profile). Restart your terminal after installation for this to take effect.
Testing¶
We’ve included tests that verify all the dependencies are installed and Sunbeam can run properly. We strongly recommend running them after installing or updating Sunbeam:
python -m pytest tests/ -vvl
If the tests fail, you should either refer to our troubleshooting guide or file an issue on our GitHub page.
Tip
You can speed up the testing process by using the environment created during the install process with something like this ‘bash tests/run_tests.bash -e SUNBEAM_ENV_NAME’. Without this argument the script will create a temporary environment.
Updating¶
Sunbeam follows semantic versioning practices. In short, this means that the version has three numbers: major, minor and patch. For instance, a version number of 1.2.1 has 1 as the major version, 2 as the minor, and 1 as the patch.
When we update Sunbeam in a way that keeps your config files and environment working, we increment only the patch or minor number (e.g. 1.0.0 -> 1.1.0). In that case, all you need to do is the following:
git pull
./install.sh --upgrade all
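As a rough illustration of the versioning rule above, the following Python sketch checks whether moving between two versions keeps the same major number, which is the case where the in-place upgrade applies. The function names are hypothetical and not part of Sunbeam. Treating a change in the major number as a sign that configs or environments may not carry over is an assumption drawn from general semantic versioning practice.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'major.minor.patch' (optionally prefixed with 'v') into integers."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch


def is_in_place_upgrade(installed: str, available: str) -> bool:
    """True when 'available' is newer and shares the major version (assumption)."""
    old, new = parse_version(installed), parse_version(available)
    return new[0] == old[0] and new > old


print(is_in_place_upgrade("1.0.0", "1.1.0"))  # True
print(is_in_place_upgrade("3.1.1", "4.0.0"))  # False
```

When the check returns False for a newer release, installing that version separately, as described next, is the safer route.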
Sunbeam v3+ is designed to be installable separately on a system that already has sunbeam installed. This means multiple versions of sunbeam can be installed on the same machine in different repositories.
Uninstalling or reinstalling¶
If things go awry and updating doesn’t work, simply uninstall and reinstall Sunbeam.
source deactivate
conda remove -n sunbeamX.X.X --all
cd ../ && rm -rf sunbeam/
Then follow the installation instructions above.
Installing Sunbeam extensions¶
As of version 3.0, Sunbeam extensions can be installed by running sunbeam extend followed by the URL of the extension’s GitHub repo:
sunbeam extend https://github.com/sunbeam-labs/sbx_kaiju/
For Sunbeam versions prior to 3.0, follow the legacy installation instructions on the extension to install.
Setup¶
Activating Sunbeam¶
Almost all commands from this point forward require us to activate the Sunbeam conda environment:
source activate SUNBEAM_ENV_NAME
You should see ‘(SUNBEAM_ENV_NAME)’ in your prompt when you’re in the environment. To leave the environment, run source deactivate or close the terminal.
Tip
You can see a list of installed sunbeam environments using the command ‘conda env list’.
Creating a new project using local data¶
We provide a utility, sunbeam init, to create a new config file, profile, and sample list for a project. The utility takes one required argument: a path to your project folder. This folder will be created if it doesn’t exist. You can also specify the path to your gzipped fastq files, and Sunbeam will try to guess how your samples are named, and whether they’re paired.
sunbeam init --data_fp /path/to/fastq/files /path/to/my_project
In the project directory, this creates a new config file and a new sample list (by default named sunbeam_config.yml and samplelist.csv, respectively) as well as a profile file (named config.yaml). Edit the config and profile files in your favorite text editor. All the keys for the config are described below.
Note
Sunbeam will do its best to determine how your samples are named in the data_fp you specify. It assumes they are named something regular, like MP66_S109_L008_R1.fastq.gz and MP66_S109_L008_R2.fastq.gz. In this case, the sample name would be ‘MP66_S109_L008’ and the read pair indicator would be ‘1’ and ‘2’. Thus, the filename format would look like {sample}_R{rp}.fastq.gz, where {sample} defines the sample name and {rp} defines the 1 or 2 in the read pair.
If you have single-end reads, you can pass --single_end to sunbeam init and it will not try to identify read pairs.
If the guessing doesn’t work as expected, you can manually specify the filename format after the --format option in sunbeam init.
Finally, if you don’t have your data ready yet, simply omit the --data_fp option. You can create a sample list later with sunbeam list_samples > samples.csv.
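To make the filename format in the note above concrete, here is a small Python sketch that converts a format string such as {sample}_R{rp}.fastq.gz into a regular expression and extracts the sample name and read-pair indicator. This is an illustration only and not how Sunbeam implements its guessing. The function name is hypothetical, and restricting {rp} to 1 or 2 is an assumption based on the example in the note.

```python
import re


def format_to_regex(fmt: str) -> re.Pattern:
    """Compile a filename format with {sample} and {rp} fields into a regex."""
    pattern = re.escape(fmt)
    # Swap the escaped placeholders for named capture groups.
    pattern = pattern.replace(re.escape("{sample}"), r"(?P<sample>.+)")
    pattern = pattern.replace(re.escape("{rp}"), r"(?P<rp>[12])")
    return re.compile(pattern)


matcher = format_to_regex("{sample}_R{rp}.fastq.gz")
match = matcher.fullmatch("MP66_S109_L008_R1.fastq.gz")
print(match["sample"], match["rp"])  # MP66_S109_L008 1
```

A filename that does not fit the format makes fullmatch return None, which is roughly the situation where passing an explicit --format would be needed.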
If some config values are always the same for all projects (e.g. paths to shared databases), you can put these keys in a file and auto-populate your config file with them during initialization. For instance, if you have a custom trimmomatic adapter template located at /home/user/adapter.fa, you could have a file containing the following called common_values.yml:
qc:
  adapter_template: "/home/user/adapter.fa"
When you make a new Sunbeam project, pass --defaults common_values.yml as part of the init command.
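Conceptually, the defaults file fills in matching keys of the generated config and leaves everything else alone. The sketch below illustrates that idea with plain dictionaries. It is not Sunbeam's actual implementation: the function name is hypothetical, and the values in the generated config are placeholders made up for illustration.

```python
def apply_defaults(config: dict, defaults: dict) -> dict:
    """Return a copy of config with values from defaults laid over it, section by section."""
    merged = dict(config)
    for key, value in defaults.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so that sibling keys in the same section are preserved.
            merged[key] = apply_defaults(merged[key], value)
        else:
            merged[key] = value
    return merged


# Placeholder stand-in for a freshly generated config (values are illustrative).
generated = {"qc": {"suffix": "qc", "adapter_template": "PLACEHOLDER"}}
# Contents of common_values.yml from the example above, as a dictionary.
common_values = {"qc": {"adapter_template": "/home/user/adapter.fa"}}

result = apply_defaults(generated, common_values)
print(result["qc"]["adapter_template"])  # /home/user/adapter.fa
print(result["qc"]["suffix"])  # qc
```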
If you have Sunbeam extensions installed, in Sunbeam >= 3.0, the extension config options will be automatically included in new config files generated by sunbeam init.
If you want to customize options in the profile instead, you can create a custom profile template named sunbeamlib/data/custom_profile.yaml and fill it with whatever options you want included in each sunbeam run. Snakemake maintains a curated list of common profiles for working with HPC platforms and job schedulers. A default profile and a slurm profile are included with Sunbeam. You would use this custom profile with --profile custom as part of the init command.
Further usage information is available by typing sunbeam init --help.
Configuration¶
Sunbeam has lots of configuration options, but most don’t need individual attention. Below, each is described by section.
Sections¶
all¶
root: The root project folder, used to resolve any relative paths in the rest of the config file.
output_fp: Path to where the Sunbeam outputs will be stored.
samplelist_fp: Path to a comma-separated file where each row contains a sample name and one or two paths (if single- or paired-end) to raw gzipped fastq files. This can be created for you by sunbeam init or sunbeam list_samples.
paired_end: ‘true’ or ‘false’ depending on whether you are using paired- or single-end reads.
version: Automatically added for you by sunbeam init. Ensures compatibility with the right version of Sunbeam.
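To illustrate the sample list layout described under samplelist_fp, the following Python sketch reads rows of the form "sample name, path[, path]" into a dictionary. The function name and the file paths are hypothetical, and the sketch assumes the file has no header row, which this guide does not state.

```python
import csv
import io


def read_samplelist(handle) -> dict[str, list[str]]:
    """Map each sample name to its one (single-end) or two (paired-end) read files."""
    samples = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        name, *paths = [field.strip() for field in row]
        paths = [p for p in paths if p]
        if len(paths) not in (1, 2):
            raise ValueError(f"{name}: expected 1 or 2 paths, found {len(paths)}")
        samples[name] = paths
    return samples


# Hypothetical paired-end sample list, inlined here instead of read from disk.
example = io.StringIO(
    "MP66_S109_L008,/data/MP66_S109_L008_R1.fastq.gz,/data/MP66_S109_L008_R2.fastq.gz\n"
)
samples = read_samplelist(example)
print(len(samples["MP66_S109_L008"]))  # 2
```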
qc¶
suffix: the name of the subfolder to hold outputs from the quality-control steps
leading: (trimmomatic) remove the leading bases of a read if below this quality
trailing: (trimmomatic) remove the trailing bases of a read if below this quality
slidingwindow: (trimmomatic) the [width, avg. quality] of the sliding window
minlength: (trimmomatic) drop reads smaller than this length
adapter_template: (trimmomatic) path to the Illumina paired-end adaptors (templated with $CONDA_ENV) (autofilled)
fwd_adapters: (cutadapt) custom forward adaptor sequences to remove using cutadapt. Replace with "" to skip.
rev_adapters: (cutadapt) custom reverse adaptor sequences to remove using cutadapt. Replace with "" to skip.
cutadapt_opts: (cutadapt) options to pass to cutadapt. Replace with "" to pass no extra options.
kz_threshold: a value between 0 and 1 to determine the low-complexity boundary (1 is most stringent). Ignored if not masking low-complexity sequences.
host_fp: the path to the folder with host/contaminant genomes (ending in *.fasta)
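For intuition about how slidingwindow and minlength interact, here is a simplified Python sketch of sliding-window trimming: the read is cut at the first window whose average quality falls below the threshold, and the trimmed read is then dropped if it is shorter than minlength. This is a teaching approximation rather than Trimmomatic's actual code, and the function names and quality values are hypothetical.

```python
def sliding_window_keep(quals: list[int], width: int, min_avg: float) -> int:
    """Return how many leading bases survive a simplified sliding-window trim."""
    for start in range(len(quals) - width + 1):
        window = quals[start:start + width]
        if sum(window) / width < min_avg:
            return start  # cut where the first failing window begins
    return len(quals)


def passes_minlength(kept: int, minlength: int) -> bool:
    """A trimmed read is retained only if it is at least minlength bases long."""
    return kept >= minlength


# Hypothetical per-base quality scores: good start, poor tail.
quals = [30, 30, 30, 30, 10, 10, 10, 10]
kept = sliding_window_keep(quals, width=4, min_avg=15)
print(kept)  # 4
print(passes_minlength(kept, minlength=6))  # False
```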
classify¶
suffix: the name of the subfolder to hold outputs from the taxonomic classification steps
assembly¶
suffix: the name of the folder to hold outputs from the assembly steps
annotation¶
suffix: the name of the folder to hold contig annotation results
blastdbs¶
root_fp: path to a directory containing BLAST databases (if they’re all in the same place)
mapping¶
suffix: the name of the subfolder to create for mapping output (bam files, etc)
benchmarks¶
suffix: the name of the subfolder to create for benchmark data
logs¶
suffix: the name of the subfolder to create for logs
Building Databases¶
A detailed discussion on building databases for tools used by Sunbeam, while important, is beyond the scope of this document. Please see the following resources for more details:
Tip
These were all moved to extensions in sunbeam v4. Some vestiges remain in the main pipeline for compatibility with extensions but these should be considered deprecated and will be removed in future versions.
Running¶
To run Sunbeam, make sure you’ve activated the sunbeam environment. Then run:
sunbeam run --profile path/to/project/
There are many options that you can use to determine which outputs you want. By default, if nothing is specified, this runs the entire pipeline. However, each section is broken up into subsections that can be called individually, and will only execute the steps necessary to get their outputs. These are specified after the command above and consist of the following:
all_qc: basic quality control on all reads (no host read removal)
all_decontam: quality control and host read removal on all samples
To use one of these options, simply run it like so:
sunbeam run --profile path/to/project/ all_qc
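If you launch Sunbeam from a script, the command line above can be assembled programmatically. The Python sketch below only builds and prints the command and does not execute it. The function name is hypothetical, and the argument order simply mirrors the example above.

```python
import shlex


def build_run_command(profile_dir: str, targets: tuple[str, ...] = ()) -> list[str]:
    """Assemble 'sunbeam run --profile <dir> [targets...]' as an argument list."""
    return ["sunbeam", "run", "--profile", profile_dir, *targets]


cmd = build_run_command("path/to/project/", ("all_qc",))
print(shlex.join(cmd))  # sunbeam run --profile path/to/project/ all_qc
```

Passing the resulting list to subprocess.run would start the pipeline, provided the Sunbeam environment is active.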
In addition, since Sunbeam is really just a set of snakemake rules, all the (many) snakemake options apply here as well. Some useful ones are:
-n performs a dry run, and will just list which rules are going to be executed without actually doing so.
-k allows the workflow to continue with unrelated rules if one produces an error (useful for malformed samples).
-p prints the actual shell command executed for each rule, which is very helpful for debugging purposes.
--cores specifies the total number of cores used by Sunbeam. For example, if you run Sunbeam with --cores 100 and each rule/processing step uses 20 threads, it will run 5 rules at once.
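The --cores arithmetic above can be written out as a short Python sketch. The function name is hypothetical, and the result is an upper bound: how many rules actually run together also depends on which jobs are ready at a given moment.

```python
def max_concurrent_rules(total_cores: int, threads_per_rule: int) -> int:
    """Upper bound on rules running at once, given cores and per-rule threads."""
    if total_cores < 1 or threads_per_rule < 1:
        raise ValueError("cores and threads must be positive")
    # A rule asking for more threads than there are cores still runs, one at a time.
    return max(1, total_cores // threads_per_rule)


print(max_concurrent_rules(100, 20))  # 5
```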
Cluster options¶
Sunbeam inherits its cluster abilities from Snakemake. There’s nothing special about installing Sunbeam on a cluster, but in order to distribute work to cluster nodes, you have to use the --cluster and --jobs flags. This is handled by using a cluster profile instead of the default. Sunbeam comes with a slurm profile template, but you can create others or use existing ones from Snakemake’s curated list of profiles. Once you’ve initialized a project with a cluster profile, run it as normal:
sunbeam run --profile /path/to/cluster/project/
Edit any options set in the profile as if they are snakemake command line arguments.
Outputs¶
This section describes all the outputs from Sunbeam. Here is an example output directory.
sunbeam_output
├ logs
└ qc
  ├ cleaned
  ├ decontam
  ├ log
  │ ├ decontam
  │ ├ cutadapt
  │ └ trimmomatic
  └ reports
Quality control¶
└ qc
  ├ 00_samples
  ├ 01_cutadapt
  ├ 02_trimmomatic
  ├ 03_komplexity
  ├ cleaned
  ├ decontam
  ├ log
  │ ├ decontam
  │ └ komplexity
  └ reports
This folder contains the trimmed, low-complexity filtered reads in cleaned. The decontam folder contains the cleaned reads that did not map to any contaminant or host genomes. In general, most downstream steps should reference the decontam reads.
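For downstream scripts, the locations of these two folders can be derived from the output path and the qc suffix. The Python sketch below assumes the layout shown in the example tree above, with output folder sunbeam_output and qc suffix qc. The function name is hypothetical.

```python
from pathlib import PurePosixPath


def qc_read_dirs(output_fp: str, qc_suffix: str = "qc") -> dict[str, PurePosixPath]:
    """Return the 'cleaned' and 'decontam' folders under <output_fp>/<qc_suffix>."""
    qc_root = PurePosixPath(output_fp) / qc_suffix
    return {"cleaned": qc_root / "cleaned", "decontam": qc_root / "decontam"}


dirs = qc_read_dirs("sunbeam_output")
print(dirs["decontam"])  # sunbeam_output/qc/decontam
```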