Developing Sunbeam Extensions
Sunbeam extensions allow you to add new features to the pipeline. You might use an extension to carry out a new type of analysis, or to produce a report from Sunbeam output files. Extensions can be re-distributed and installed by other researchers to facilitate reproducible analysis.
This page will go deep into the weeds on how to develop extensions for Sunbeam. If you’re looking for how to install or run them, check out Sunbeam Extensions. Make sure you’ve read the Developing Sunbeam page first so that you have a good grounding in the internals of Sunbeam. The more familiar you are with Snakemake and Python, the better (both have great documentation).
Starting a new extension
A Sunbeam extension consists of a directory that contains a rules file. The rules file must have a name that ends with the file extension “.smk”. It is customary for the name of the extension to start with “sbx_”, which is shorthand for “Sunbeam extension.” Technically, only the rules file is needed for a Sunbeam extension. In practice, almost all extensions include other files to facilitate the installation of software dependencies, to specify parameters in the configuration file, and to give instructions to users.
Using sbx_template
We’ve created a GitHub template for creating new Sunbeam extensions easily. To use the template, go to the sbx_template repo and click the Use this template button in the top right. Once you’ve created the new repository, wait a minute for the CI to update all the names and links, then clone the repository to your computer. You can then edit the files in the repository to create your extension.
Writing the rules file
The rules file contains the code for one or more rules in Snakemake format. These rules describe how to take files produced by Sunbeam and do something more with them. When Sunbeam is run, the Snakemake workflow management system determines how to put the rules together in the right order to produce the desired result.
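To build intuition for how rules get put in order, here is a toy sketch in plain Python. It is not how Snakemake is implemented; it only illustrates the idea of working backwards from a requested target to the rules that produce its inputs. The rule names and file labels are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: each one lists the files it needs and the files it makes.
RULES = {
    "decontaminate": {"inputs": ["raw_reads"], "outputs": ["clean_reads"]},
    "run_awesome_tool": {"inputs": ["clean_reads"], "outputs": ["awesome_results"]},
    "awesome_report": {"inputs": ["awesome_results"], "outputs": ["report"]},
}


def execution_order(rules, target):
    """Return the rules needed to build `target`, dependencies first."""
    # Map every output file to the rule that produces it.
    producer = {out: name for name, rule in rules.items() for out in rule["outputs"]}
    graph = {}
    pending = [target]
    while pending:
        wanted = pending.pop()
        name = producer.get(wanted)
        if name is None or name in graph:
            continue  # a source file nobody produces, or a rule already visited
        # A rule depends on whichever rules produce its inputs.
        graph[name] = {producer[i] for i in rules[name]["inputs"] if i in producer}
        pending.extend(rules[name]["inputs"])
    return list(TopologicalSorter(graph).static_order())


print(execution_order(RULES, "report"))
```

Asking for a target further up the chain, such as `clean_reads`, only schedules the rules that target needs. That is why an extension's target rule can be run without rerunning unrelated parts of the pipeline.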
Example: An extension for AwesomeTool
Say we have a tool we want to include in Sunbeam called AwesomeTool. It provides strain-level taxonomic classification of every read and then produces a human-readable report of how interesting each one is, in descending order of interest. (Unfortunately, AwesomeTool doesn’t exist outside of this example.) AwesomeTool has a command line interface (CLI) that takes gzipped fastq files and a database file as input, and produces a gzipped output file. The command looks like:
We’ll start our development effort by updating the rules file (this assumes you’re using sbx_template; if not, follow along with those files in mind). We want to remove the dummy rules from the template sbx_awesome_tool.smk and replace them with our own:
This gives us our target (all_awesome_tool) and the rule that runs the program (run_awesome_tool). The target rule is a list of all the output files we expect to produce. The runner rule describes how to run the program, including the input files, output files, and command to run. You might notice that there are a few things here we will need to define to supplement these rules. For starters, we need to define the awesome_db option in the config. To do this, we will modify the config.yml file:
We also need to specify the environment that AwesomeTool is installed in. To do this, we will modify envs/sbx_awesome_tool_env.yml:
And with that, we have a working extension! You can run it with the command:
We can go check out our output files in /path/to/project/sunbeam_output/classify/awesome.
Example: A reproducible report
After running sbx_awesome_tool for a while, you might realize you’re compiling the same standard report over and over again. To address this, we can create a new rule that produces a summary report on the results of AwesomeTool for each sample. To do this, we will add a new rule to our rules file and adjust our target rule:
We choose R because we’re familiar with it (but you should choose what you’re familiar with). The script command points to a file scripts/awesome_report.R that we now have to make:
Note how the Snakemake rule variables are accessed through the injected snakemake object. Access patterns differ across languages but this is the general idea. After running the pipeline, we can check out our report in /path/to/project/sunbeam_output/classify/awesome/awesome_report.html.
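For comparison with the R script, here is a minimal sketch of the same access pattern in a Python script. When a file is run through a rule's `script:` directive, Snakemake injects a `snakemake` object whose `input` and `output` attributes mirror the rule. Everything else here is an assumption made for illustration: the `write_summary` helper is hypothetical, and the summary is a plain TSV of line counts rather than the HTML report described above.

```python
import gzip
from pathlib import Path


def write_summary(result_paths, report_path):
    """Count the lines in each gzipped result file and write a small TSV."""
    rows = []
    for path in result_paths:
        with gzip.open(path, "rt") as handle:
            n_lines = sum(1 for _ in handle)
        rows.append(f"{Path(path).name}\t{n_lines}")
    Path(report_path).write_text("file\tlines\n" + "\n".join(rows) + "\n")


# `snakemake` only exists when this file is run via a rule's `script:` directive,
# so the guard lets the function be imported and tested on its own.
if "snakemake" in globals():
    write_summary(list(snakemake.input), snakemake.output[0])
```

Keeping the logic in a function that takes plain paths, and touching the `snakemake` object in a single place, makes the script easy to exercise outside the pipeline.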
Example: Complicated use case
AwesomeTool is great, but now we realize the standard CLI doesn’t offer enough flexibility for what we want to do. We’ll have to make a custom script to run it, importing some of the internals from the AwesomeTool package. To do this, we will need to modify our runner rule:
And now we make the script:
Testing the extension
Testing is a tricky thing with Snakemake. Because it is a workflow language, most of the code we write is just putting pieces together. So if we took the orthodox functional programming approach, we would test that the pipeline compiles (i.e. that a dry run completes without errors) and then claim that it will work because we can trust each piece to be tested on its own. So we will often start our testing with a dry run:
Tip
Adding the --directory flag pointing to a temp directory is a good idea for running these tests in a non-CI environment. This will prevent a bunch of snakemake garbage from being left behind in your working directory and make the tests more reliable.
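As a rough sketch of a dry-run check that follows the tip above, the helper below runs the pipeline's target in a throwaway working directory. The `-n` and `--directory` options are Snakemake's dry-run and working-directory flags. The `sunbeam run --profile ...` form, the `all_awesome_tool` default, and the assumption that the project directory has already been initialized should all be checked against your installed Sunbeam version.

```python
import subprocess
import tempfile


def dry_run_command(project_dir, target, work_dir):
    """Build the dry-run command line (argument layout is an assumption)."""
    return [
        "sunbeam", "run",
        "--profile", str(project_dir),
        target,
        "--directory", str(work_dir),  # keep Snakemake's files out of the repo
        "-n",                          # dry run: plan the jobs, execute nothing
    ]


def dry_run(project_dir, target="all_awesome_tool"):
    """Run the dry run in a temporary directory and return the exit code."""
    with tempfile.TemporaryDirectory() as work_dir:
        result = subprocess.run(dry_run_command(project_dir, target, work_dir))
    return result.returncode
```

A test function would then assert that `dry_run(project_dir)` returns `0`.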
This is nice, but in practice, we don’t really trust the components of our pipeline enough to stop here. If we can create a test case small enough to run the whole extension, that is the ideal way to test everything easily. For example, here we can take our small set of synthetic reads in .tests/data/reads and a small dummy database we created with AwesomeTool’s awesome database building utility in .tests/data/db. We can then run the pipeline on these small files and check that the output is what we expect. This is a bit of a pain to set up, but it is worth it in the end. We can do this by adding this to our test file:
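One piece of such a test is the output check itself. The helper below is a sketch of that step only, under assumptions: the `.awesome.gz` suffix and the per-sample file naming are hypothetical, so substitute whatever your runner rule actually writes.

```python
from pathlib import Path


def missing_outputs(output_dir, samples, suffix=".awesome.gz"):
    """Return the expected result files that are absent or empty."""
    expected = [Path(output_dir) / f"{sample}{suffix}" for sample in samples]
    return [
        path for path in expected
        if not path.is_file() or path.stat().st_size == 0
    ]
```

After a full run on the test data, a test can assert that `missing_outputs(...)` returns an empty list for the directory holding the extension's results.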
If you don’t have the ability to run all the rules of your extension in a CI environment (some tools just won’t work on that small a dataset), you’ll have to get more creative. You can test individual rules, explore Snakemake’s unit testing framework, or test script logic directly.
Tip
If you have enough custom logic that it needs testing, consider moving it to an extension library. This will allow you to test the logic separately from Snakemake’s machinations. For instance, say we need a Python function that filters out reads with fewer than 20 A’s. We would start by writing the function in the extension library, then call it from the Snakemake script file (passing the snakemake input, output, etc. objects), and then test the function.
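A minimal sketch of that read filter and its test might look like the following. The function names, the tuple layout for a read, and the choice to count both upper- and lower-case A’s are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality)


def has_enough_a(sequence: str, min_a: int = 20) -> bool:
    """True if the sequence contains at least `min_a` A's (case-insensitive)."""
    return sequence.upper().count("A") >= min_a


def filter_reads(reads: Iterable[Read], min_a: int = 20) -> Iterator[Read]:
    """Yield only the reads whose sequence has at least `min_a` A's."""
    for read in reads:
        if has_enough_a(read[1], min_a):
            yield read


def test_filter_reads():
    reads = [
        ("@r1", "A" * 20, "I" * 20),        # exactly 20 A's: kept
        ("@r2", "A" * 19 + "C", "I" * 20),  # only 19 A's: dropped
    ]
    assert [r[0] for r in filter_reads(reads)] == ["@r1"]
```

Because `filter_reads` knows nothing about Snakemake, the script file only has to open the input and output paths and pass the parsed reads through it.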
The extension comes with a .github directory that defines CI workflows for automatically running tests.
Releasing the extension
Releases for Sunbeam extensions are pretty mellow. We don’t have a proper repository or enforced release structure. You could never release your extension and it would still work fine. But if you want to maintain good versioning practices and developer hygiene, you can use GitHub’s release system and update the version number in VERSION.