Quickstart Guide¶
Installation¶
There are two installation methods available, installing via git or via tar. We do not currently support non-Linux environments.
On a Linux machine, download the tarball for the sunbeam version you want (sunbeamX.X.X
)
then unpack and install it.
wget https://github.com/sunbeam-labs/sunbeam/releases/download/v4.0.0/sunbeam.tar.gz
mkdir sunbeam4.0.0
tar -zxf sunbeam4.0.0.tar.gz -C sunbeam4.0.0
cd sunbeam4.0.0 && ./install.sh
On a Linux machine, download a copy of Sunbeam from our GitHub repository, and install.
git clone --branch v4.0.0 https://github.com/sunbeam-labs/sunbeam.git
cd sunbeam
./install.sh
Tip
If you’re planning on doing development work on sunbeam, use ‘git clone git@github.com:sunbeam-labs/sunbeam.git’ instead.
This installs Sunbeam and all its dependencies, including the Conda environment manager, if required. It will finish by printing instructions to continue that should look like:
conda activate ENV_NAME
python -m pytest tests/
This runs some tests to make sure everything was installed correctly.
Tip
If you’ve never installed Conda before, you’ll need to add it to your shell’s path. If you’re running Bash (the most common terminal shell), the installation script should print the necessary command.
If the tests fail, check out our troubleshooting section or file an issue on our GitHub page.
Setup¶
Let’s say your sequencing reads live in a folder called
/sequencing/project/reads
, with one or two files per sample (for single- and
paired-end sequencing, respectively). These files must be in gzipped FASTQ
format.
Let’s create a new Sunbeam project (we’ll call it my_project
):
source activate ENV_NAME
sunbeam init my_project --data_fp /sequencing/project/reads
Sunbeam will create a new folder called my_project
and put three files
there:
config.yaml
contains a `snakemake profile<https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles>`_ that will be used to runmy_project
.sunbeam_config.yml
contains all the configuration parameters for each step of the Sunbeam pipeline.samples.csv
is a comma-separated list of samples that Sunbeam found in the given data folder, along with absolute paths to their FASTQ files.
Right now we have everything we need to do basic quality-control. However, let’s go ahead and set up contaminant filtering to make things interesting.
Contaminant filtering¶
Sunbeam can align your reads to an arbitrary number of contaminant sequences or host genomes and remove reads that map above a given threshold.
To use this, make a folder containing all the target sequences in FASTA
format. The filenames should end in “fasta” to be recognized by Sunbeam. In your sunbeam_config.yml
file, edit the host_fp:
line in the qc
section to point to this folder.
Running¶
After you’ve finished editing your config file, you’re ready to run Sunbeam:
sunbeam run --profile my_project/
By default, this will do a lot, including trimming and quality-controlling your
reads and removing contaminant, host, and low-complexity sequences. Each of these steps can also be run independently by adding arguments after the sunbeam run
command. See Running for more info.
Viewing results¶
The output is stored by default under my_project/sunbeam_output
. For more information on the output files and all of Sunbeam’s different parts, see our full User Guide!