Software Structure
Overview
Sunbeam is a snakemake pipeline with a Python library, sunbeamlib, acting as a wrapper around it. Calling sunbeam run [args] [options] invokes this wrapper library, which then issues the necessary snakemake commands. The main Snakefile can be found in the root directory. It uses rules from workflow/rules/ and extensions/, scripts from workflow/scripts/, and environments from workflow/envs/. Tests are run with pytest and live in the tests/ directory. Documentation lives in docs/ and is served by ReadTheDocs.
Tip
Some of these directories won’t exist if you install from a tarball.
Sections
sunbeam/ (root directory)
The root sunbeam directory holds a few important files, including environment.yml, setup.py, and install.sh. The environment file defines the dependencies required to run sunbeam and is used to create the main sunbeam environment. The setup file defines the structure and dependencies of sunbeamlib and makes it installable via pip. The install script installs sunbeam and has its own page in the documentation.
Tip
environment.yml defines the main sunbeam environment that you activate in order to run the pipeline. Internally, sunbeam then manages a number of other environments (defined in workflow/envs/) on a per-rule basis.
There is also .readthedocs.yaml, which configures the Sphinx build of the documentation so that it can import sunbeamlib, and MANIFEST.in, which tells the Python packaging tools to include sunbeamlib's data/ subdirectory when the package is installed.
docs/
Each page of the sunbeam documentation lives here as a .rst file. The additional files are all involved in building the docs with Sphinx and deploying them to ReadTheDocs, and most of them are autogenerated by Sphinx. The one tricky part is importing the sunbeam version into the docs build. conf.py does this by adding the sunbeam root to sys.path and then importing sunbeamlib, which stores the version tag in a __version__ variable using semantic_version.
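A minimal sketch of that conf.py pattern follows, wrapped in a function so the steps are explicit. The helper name import_version and the assumption that conf.py sits directly inside docs/ are illustrative; the only detail taken from the text is that sunbeamlib exposes __version__. version and release are the standard Sphinx configuration variables for the short and full version strings.

```python
import importlib
import sys
from pathlib import Path


def import_version(repo_root: Path, package: str = "sunbeamlib") -> str:
    """Make `package` importable from a source checkout and return its __version__."""
    root = str(repo_root.resolve())
    if root not in sys.path:
        # Prepend so the checkout is found before any installed copy
        sys.path.insert(0, root)
    # Clear finder caches so a directory added at runtime is searched
    importlib.invalidate_caches()
    module = importlib.import_module(package)
    return str(module.__version__)


# In docs/conf.py, assuming conf.py sits directly inside docs/:
# release = import_version(Path(__file__).resolve().parent.parent)
# version = ".".join(release.split(".")[:2])
```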
workflow/envs/
This directory contains .yml files defining environments that snakemake manages as it runs. Wherever a rule is defined with conda: /path/to/ENV_NAME.yml, snakemake creates that environment (if it doesn’t already exist) when it reaches the rule and activates it while running the rule. By default, these environments are created in sunbeam/.snakemake/.
The accompanying files, named like ENV_NAME.ARCH.pin.txt, are generated with snakedeploy. Each one lists every package and its exact version in a given environment, for the architecture it was generated on (e.g. linux-64), so that snakemake can first try to create that exact environment and, only if that fails, solve the .yml file itself.
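The lookup order described above can be sketched as follows. The helper names pin_file_for and env_specs_to_try are hypothetical, and the code only mirrors the described behavior (prefer ENV_NAME.ARCH.pin.txt next to the .yml, otherwise fall back to solving the .yml); it is not snakemake's own code.

```python
from pathlib import Path
from typing import List


def pin_file_for(env_yml: Path, arch: str) -> Path:
    """Return the pin file that would accompany ENV_NAME.yml for `arch`."""
    # envs/qc.yml + "linux-64" -> envs/qc.linux-64.pin.txt
    return env_yml.with_name(f"{env_yml.stem}.{arch}.pin.txt")


def env_specs_to_try(env_yml: Path, arch: str) -> List[Path]:
    """Order in which environment definitions would be attempted."""
    pin = pin_file_for(env_yml, arch)
    # Exact pinned packages first (if a pin file exists), then solve the .yml
    return [pin, env_yml] if pin.exists() else [env_yml]
```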
extensions/
This directory contains any extensions you install with sunbeam extend or develop yourself, plus a .placeholder file that exists only to make sure the directory is always present. Each extension should live in its own directory whose name starts with sbx_.
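Under that naming convention, installed extensions can be discovered by scanning for sbx_ subdirectories. The sketch below assumes only the convention stated above; find_extensions is a hypothetical helper, not part of sunbeamlib's API.

```python
from pathlib import Path
from typing import List


def find_extensions(extensions_dir: Path) -> List[str]:
    """List extension directories, i.e. subdirectories whose names start with 'sbx_'."""
    if not extensions_dir.is_dir():
        return []
    return sorted(
        p.name
        for p in extensions_dir.iterdir()
        # Files (such as .placeholder) and non-sbx_ directories are skipped
        if p.is_dir() and p.name.startswith("sbx_")
    )
```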
workflow/rules/
This directory contains all of the snakemake rules imported by the main Snakefile. The rules are organized into subdirectories by function, and each subdirectory has an associated environment in workflow/envs/ in which its rules run.
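The pairing of rule subdirectories with environments can be sketched as a small directory walk. Two conventions here are assumptions, not confirmed by the text: that rule files end in .smk, and that the environment for workflow/rules/GROUP/ is named workflow/envs/GROUP.yml. The function rule_groups is hypothetical.

```python
from pathlib import Path
from typing import Dict


def rule_groups(rules_dir: Path, envs_dir: Path, suffix: str = ".smk") -> Dict[str, dict]:
    """Map each rules subdirectory to its rule files and its environment file.

    Assumed conventions (hypothetical): rule files end in `suffix`, and the
    environment for rules/<group>/ is envs/<group>.yml.
    """
    groups = {}
    for group in sorted(p for p in rules_dir.iterdir() if p.is_dir()):
        env = envs_dir / f"{group.name}.yml"
        groups[group.name] = {
            "rule_files": sorted(group.glob(f"*{suffix}")),
            # None flags a group whose expected environment file is missing
            "env": env if env.exists() else None,
        }
    return groups
```

In a Snakefile, each path in rule_files would be pulled in with an include: statement; returning None for a missing environment makes a misnamed or absent .yml easy to spot.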
workflow/scripts/
This directory contains any Python code that needs to be executed by snakemake rules. Like the rules, the scripts are organized into subdirectories by function, and each script is named after the rule that calls it.
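Scripts run through a rule's script: directive receive a global snakemake object exposing that rule's input, output, params and so on. The sketch below shows the shape of such a script. The read-counting task and the function names are hypothetical, and the globals() guard is one way to let the same file be imported and tested outside snakemake.

```python
from pathlib import Path


def count_fastq_reads(fastq_path: str) -> int:
    """Count records in an uncompressed FASTQ file (4 lines per record)."""
    with open(fastq_path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def main(fastq_path: str, output_path: str) -> None:
    Path(output_path).write_text(f"{count_fastq_reads(fastq_path)}\n")


# snakemake injects `snakemake` into the script's globals when it runs the file
# via script:; the name is absent when the module is imported, e.g. by tests.
if "snakemake" in globals():
    main(snakemake.input[0], snakemake.output[0])  # noqa: F821
```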
sunbeamlib/
This directory contains the Python library that acts as a wrapper for snakemake. The Python files in its root contain a number of utility functions, while those in scripts/ define the sunbeam commands. scripts/command.py takes in sunbeam [cmd] and routes it to the file matching the given command. The data/ directory contains the default config file as well as some sample config templates for running on a cluster. It also contains the default profile template and one for Slurm.
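The routing done by scripts/command.py can be sketched as a dispatch table: the first argument selects a command, and the remaining arguments go to that command's handler. The command names come from the list of sunbeam programs in the tests section; the table, the stub handlers and their signatures are hypothetical placeholders for the per-command modules in scripts/.

```python
import sys
from typing import Callable, Dict, List, Optional

Handler = Callable[[List[str]], int]


def _stub(name: str) -> Handler:
    """Placeholder handler; a real one would parse argv and do the work."""
    def handler(argv: List[str]) -> int:
        print(f"sunbeam {name}: {argv}")
        return 0
    return handler


# Hypothetical table: one entry per command module under sunbeamlib/scripts/
COMMANDS: Dict[str, Handler] = {
    name: _stub(name) for name in ("init", "config", "list_samples", "run", "extend")
}


def main(argv: Optional[List[str]] = None) -> int:
    argv = sys.argv[1:] if argv is None else argv
    if not argv or argv[0] not in COMMANDS:
        print(f"usage: sunbeam {{{','.join(COMMANDS)}}} [args]", file=sys.stderr)
        return 2
    # Hand everything after the command name to that command's handler
    return COMMANDS[argv[0]](argv[1:])
```

A console-script entry point would call main() and pass its return value to sys.exit. Keeping each command behind a table entry means adding a command touches only its own module and one line here.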
tests/
This directory contains the tests for the core sunbeam pipeline. data/ holds raw, shortened bacterial and host genomes that are used to generate the reads used as test input. e2e/ contains end-to-end tests for each sunbeam program: config, extend, init, list_samples, and run. unit/ contains unit tests in two sections: rules/, which tests each rule in the pipeline individually, and sunbeamlib, which tests functions within sunbeamlib.
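Unit tests in pytest are plain functions named test_* containing assert statements, and fixtures such as tmp_path supply a fresh temporary directory per test. The sketch below tests a hypothetical helper that strips read-pair suffixes from FASTQ filenames; both sample_name and the filename pattern it accepts are assumptions, not sunbeamlib's actual sample-naming code.

```python
import re
from pathlib import Path


def sample_name(fastq: Path) -> str:
    """Hypothetical helper: 'S1_R1.fastq.gz' -> 'S1'."""
    return re.sub(r"_R[12](_001)?\.f(ast)?q(\.gz)?$", "", fastq.name)


def test_sample_name_strips_pair_suffix():
    assert sample_name(Path("S1_R1.fastq.gz")) == "S1"
    assert sample_name(Path("reads/S2_R2_001.fq")) == "S2"


def test_sample_name_finds_files(tmp_path):
    # pytest passes tmp_path as a fresh temporary directory
    for name in ("A_R1.fastq.gz", "A_R2.fastq.gz", "B_R1.fastq.gz"):
        (tmp_path / name).touch()
    names = sorted({sample_name(p) for p in tmp_path.glob("*.fastq.gz")})
    assert names == ["A", "B"]
```

Running pytest tests/unit collects every test_* function in test_*.py files under that directory.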