6. Snakemake

This tutorial was developed on a Unix-like system (Ubuntu 14.04).

6.1. Introduction

6.1.1. Snakemake concepts

  • Inspired by GNU Make: a system of rules and targets
  • A rule is the recipe for building a target (an output file)
  • Rules are chained by matching the inputs of one rule to the outputs of another
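The rule/target idea can be illustrated with a toy model in plain Python. This is a sketch of Make-style resolution, not Snakemake's implementation: the hypothetical RULES dictionary maps each output file to its inputs and a command, and the hypothetical plan() function walks back from the requested target, scheduling a rule only when its output is missing.

```python
# Toy model (hypothetical): output file -> (input files, shell command).
RULES = {
    "GSM521934.bam": (
        ["GSM521934.sam"],
        "samtools view -bS GSM521934.sam > GSM521934.bam",
    ),
}


def plan(target, existing, rules=RULES):
    """Return the commands needed to build `target`, inputs first.

    `existing` is the set of files already on disk. A missing file that
    no rule produces is an error, as in Make. Shared inputs are not
    deduplicated in this toy version.
    """
    if target in existing:
        return []
    if target not in rules:
        raise FileNotFoundError(f"no rule to produce {target}")
    inputs, command = rules[target]
    commands = []
    for needed in inputs:
        # Build (or confirm) every input before the rule itself runs.
        commands.extend(plan(needed, existing, rules))
    commands.append(command)
    return commands


print(plan("GSM521934.bam", existing={"GSM521934.sam"}))
# ['samtools view -bS GSM521934.sam > GSM521934.bam']
```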

6.1.2. Installation

sudo apt-get -y install python3-pip
sudo pip3 install snakemake

6.2. Downloads for practical exercises

6.2.1. Ubuntu libraries

sudo apt-get -y install zlib1g-dev                          # samtools (1-6)
sudo apt-get -y install libncurses5-dev libncursesw5-dev    # samtools (1-6)

sudo apt-get -y install r-base-core                         # Rsamtools (4-6)
sudo pip3 install "rpy2<2.5.6"                              # Rsamtools (4-6)

sudo pip3 install pyyaml                                    # Config management (5-6)

6.2.2. Tutorial material

wget https://github.com/rioualen/SnakeChunks/archive/1.0.tar.gz
tar xvzf 1.0.tar.gz
cd SnakeChunks-1.0/doc/snakemake_tutorial

6.2.3. Samtools

wget -nc http://sourceforge.net/projects/samtools/files/samtools/1.3/samtools-1.3.tar.bz2
bunzip2 -f samtools-1.3.tar.bz2
tar xvf samtools-1.3.tar
cd samtools-1.3
make
sudo make install
cd ..

6.2.4. Rsamtools

R
source("http://bioconductor.org/biocLite.R")
biocLite("Rsamtools")
quit()

6.3. Demo workflows

6.3.1. Workflow 1: Rules and targets

  • By default, Snakemake builds the target of the first rule in the file
  • Rule all declares the final target as its input
  • Rule sam_to_bam runs automatically because its output matches that target
# file: workflow1.py
rule all:
    input: "GSM521934.bam"

rule sam_to_bam:
    input: "GSM521934.sam"
    output: "GSM521934.bam"
    shell: "samtools view -bS {input} > {output}"

In the terminal:

snakemake -s workflow1/workflow1.py

6.3.2. Workflow 2: Introducing wildcards

  • Wildcards such as {sample} or {file} act as placeholders in file names
  • The same rule then applies to a whole list of files or samples
  • The expand function generates the list of target files
# file: workflow2.py
SAMPLES = ["GSM521934", "GSM521935"]

rule all:
    input: expand("{sample}.bam", sample = SAMPLES)

rule sam_to_bam:
    input: "{file}.sam"
    output: "{file}.bam"
    shell: "samtools view -bS {input} > {output}"

In the terminal:

snakemake -s workflow2/workflow2.py
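By default, expand() fills a pattern with every combination of the values it is given. The stand-in below, built on itertools.product, is a sketch of that default behaviour and not the library's code; expand_sketch is a hypothetical name. It shows which files rule all ends up requesting.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Format `pattern` once per combination of wildcard values.

    Approximates the default (Cartesian product) behaviour of
    Snakemake's expand(); each keyword argument is a list of values
    for one wildcard.
    """
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, values)))
        for values in product(*(wildcards[n] for n in names))
    ]


SAMPLES = ["GSM521934", "GSM521935"]
print(expand_sketch("{sample}.bam", sample=SAMPLES))
# ['GSM521934.bam', 'GSM521935.bam']
```

Each requested file is then matched against the output pattern {file}.bam of sam_to_bam, which binds the wildcard file to the sample name.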

6.3.3. Workflow 3: Keywords

  • Rules can use a variety of keywords (params, log, benchmark, ...)
  • An exhaustive list is given in the Snakemake documentation
# file: workflow3.py
SAMPLES = ["GSM521934", "GSM521935"]

rule all:
    input: expand("{sample}.bam", sample = SAMPLES)

rule sam_to_bam:
    input: "{file}.sam"
    output: "{file}.bam"
    params:
        threads = 2
    log: "{file}.log"
    benchmark: "{file}.json"
    shell: "(samtools view -bS --threads {params.threads} {input} > {output}) 2> {log}"

In the terminal:

snakemake -s workflow3/workflow3.py
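Placeholders such as {input}, {log} or {params.threads} in a shell command follow Python's format-string syntax, including attribute access with a dot. The sketch below imitates that substitution for a single job using str.format and types.SimpleNamespace; it is an approximation for illustration only, since Snakemake does its own formatting (for instance, several input files are inserted space-separated). The job dictionary is hypothetical.

```python
from types import SimpleNamespace

# Values bound for the job that builds GSM521934.bam (wildcard
# file = "GSM521934"), modeled here with plain Python objects.
job = {
    "input": "GSM521934.sam",
    "output": "GSM521934.bam",
    "log": "GSM521934.log",
    # params is an object, so {params.threads} is an attribute lookup.
    "params": SimpleNamespace(threads=2),
}

template = "samtools view -bS --threads {params.threads} {input} > {output} 2> {log}"
command = template.format(**job)
print(command)
# samtools view -bS --threads 2 GSM521934.sam > GSM521934.bam 2> GSM521934.log
```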

6.3.4. Workflow 4: Combining rules

  • Dependencies are resolved implicitly, by matching file names
  • Commands are given with the shell keyword (a shell command) or the run keyword (Python code)
  • Several languages can be used: bash, Python, R
# file: workflow4.py
from snakemake.utils import R

SAMPLES = ["GSM521934", "GSM521935"]

rule all:
    input: expand("{sample}_sorted.bam", sample = SAMPLES)

rule sam_to_bam:
    input: "{file}.sam"
    output: "{file}.bam"
    params:
        threads = 2
    log: "{file}.log"
    benchmark: "{file}.json"
    shell: "(samtools view -bS --threads {params.threads} {input} > {output}) 2> {log}"

rule bam_sorted:
    input: "{file}.bam"
    output: "{file}_sorted.bam"
    run:
        R("""
        library(Rsamtools)
        # sortBam() appends ".bam" to the destination prefix
        sortBam("{input}", "{wildcards.file}_sorted")
        """)

In the terminal:

snakemake -s workflow4/workflow4.py
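How implicit dependencies are found can be sketched as backward chaining over output patterns. In this toy version (RULES, to_regex and resolve are hypothetical names; Snakemake itself builds a full DAG and also checks for ambiguous rules and file timestamps), the requested file is matched against each rule's output pattern, the inferred wildcard values fill in the input pattern, and the search recurses until it reaches a file that already exists.

```python
import re

# (rule name, output pattern, input pattern), in file order.
RULES = [
    ("sam_to_bam", "{file}.bam", "{file}.sam"),
    ("bam_sorted", "{file}_sorted.bam", "{file}.bam"),
]


def to_regex(pattern):
    """'{file}.bam' -> regex with one named group per wildcard."""
    # re.split keeps the captured wildcard names at odd positions.
    parts = re.split(r"\{(\w+)\}", pattern)
    return re.compile("".join(
        f"(?P<{p}>.+)" if i % 2 else re.escape(p)
        for i, p in enumerate(parts)
    ))


def resolve(target, existing):
    """Return the (rule, output) jobs needed for `target`, in run order."""
    if target in existing:
        return []
    for name, out_pat, in_pat in RULES:
        m = to_regex(out_pat).fullmatch(target)
        if m is None:
            continue
        needed = in_pat.format(**m.groupdict())
        try:
            return resolve(needed, existing) + [(name, target)]
        except FileNotFoundError:
            continue  # this rule's input cannot be produced; try the next rule
    raise FileNotFoundError(f"cannot produce {target}")


print(resolve("GSM521934_sorted.bam", existing={"GSM521934.sam"}))
# [('sam_to_bam', 'GSM521934.bam'), ('bam_sorted', 'GSM521934_sorted.bam')]
```

Note that sam_to_bam also matches GSM521934_sorted.bam (with file = GSM521934_sorted); the toy falls through to bam_sorted only because GSM521934_sorted.sam cannot be produced.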

6.3.5. Workflow 5: Configuration file

  • The configuration file can be written in JSON or YAML
  • Its content is accessible through the global variable config (a dictionary)
# file: workflow5.py
from snakemake.utils import R

configfile: "config.yml"

SAMPLES = config["samples"].split()
OUTDIR = config["outdir"]

rule all:
    input: expand(OUTDIR + "{sample}_sorted.bam", sample = SAMPLES)

rule sam_to_bam:
    input: "{file}.sam"
    output: "{file}.bam"
    params:
        threads = config["samtools"]["threads"]
    log: "{file}.log"
    benchmark: "{file}.json"
    shell: "(samtools view -bS --threads {params.threads} {input} > {output}) 2> {log}"

rule bam_sorted:
    input: "{file}.bam"
    output: "{file}_sorted.bam"
    run:
        R("""
        library(Rsamtools)
        # sortBam() appends ".bam" to the destination prefix
        sortBam("{input}", "{wildcards.file}_sorted")
        """)
# file: config.yml
samples: "GSM521934 GSM521935"
outdir: "SnakeChunks-1.0/doc/snakemake_tutorial/results/"
samtools:
    threads: "2"

In the terminal:

snakemake -s workflow5/workflow5.py
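Because the configuration file may also be JSON, config.yml could equally be written as a config.json file. The snippet below uses only the standard library; the JSON text is a hand translation of config.yml, not a file shipped with the tutorial. It shows that config is simply a nested dictionary, so SAMPLES, OUTDIR and the thread setting come out the same way as in workflow5.py.

```python
import json

# Hand-written JSON equivalent of config.yml (illustrative).
CONFIG_JSON = """
{
    "samples": "GSM521934 GSM521935",
    "outdir": "SnakeChunks-1.0/doc/snakemake_tutorial/results/",
    "samtools": {"threads": "2"}
}
"""

config = json.loads(CONFIG_JSON)

SAMPLES = config["samples"].split()      # ['GSM521934', 'GSM521935']
OUTDIR = config["outdir"]
threads = config["samtools"]["threads"]  # the string "2", as in the YAML

print(SAMPLES, OUTDIR, threads)
```

In the Snakefile, only the configfile line would change, to configfile: "config.json".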

6.3.6. Workflow 6: Separated files

  • The include keyword imports rules from separate files
# file: workflow6.py

from snakemake.utils import R

configfile: "config.yml"

SAMPLES = config["samples"].split()
OUTDIR = config["outdir"]

include: "sam_to_bam.rules"
include: "bam_sorted.rules"

rule all:
    input: expand(OUTDIR + "{sample}_sorted.bam", sample = SAMPLES)
# file: sam_to_bam.rules

rule sam_to_bam:
    input: "{file}.sam"
    output: "{file}.bam"
    params:
        threads = config["samtools"]["threads"]
    log: "{file}.log"
    benchmark: "{file}.json"
    shell: "(samtools view -bS --threads {params.threads} {input} > {output}) 2> {log}"
# file: bam_sorted.rules

rule bam_sorted:
    input: "{file}.bam"
    output: "{file}_sorted.bam"
    run:
        R("""
        library(Rsamtools)
        # sortBam() appends ".bam" to the destination prefix
        sortBam("{input}", "{wildcards.file}_sorted")
        """)

In the terminal:

snakemake -s workflow6/workflow6.py

6.3.7. Workflow 7: The ruleorder keyword

6.3.8. Workflow 8: Combining wildcards with zip
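By default, expand() combines wildcard values as a Cartesian product; passing zip as its second argument pairs them position by position instead. The two behaviours correspond to itertools.product and the built-in zip, as this plain-Python illustration shows (the replicate labels are made up for the example).

```python
from itertools import product

samples = ["GSM521934", "GSM521935"]
reps = ["r1", "r2"]  # hypothetical replicate labels
pattern = "{sample}_{rep}.bam"

# Default expand(): every sample with every replicate (2 x 2 = 4 files).
as_product = [pattern.format(sample=s, rep=r) for s, r in product(samples, reps)]

# expand(..., zip, ...): first with first, second with second (2 files).
as_zip = [pattern.format(sample=s, rep=r) for s, r in zip(samples, reps)]

print(as_product)
print(as_zip)
```

In a Snakefile, the zipped form would read expand("{sample}_{rep}.bam", zip, sample=samples, rep=reps).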

6.3.9. Workflow 9: Combining wildcards selectively

6.3.10. Workflow 10: Using regular expressions in wildcards
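A wildcard can be restricted by a regular expression written after a comma, as in {file,GSM\d+}; without a constraint it matches any non-empty string. The hypothetical to_regex function below mimics that translation in plain Python and shows why a constraint helps: the unconstrained pattern {file}.bam also matches GSM521934_sorted.bam, whereas {file,GSM\d+}.bam does not.

```python
import re


def to_regex(pattern):
    """Translate '{name}' or '{name,regex}' placeholders into named groups.

    Unconstrained wildcards default to '.+'; literal text is escaped.
    Simplification: constraints containing braces (e.g. '\\d{3}') are
    not handled by this sketch.
    """
    out, last = [], 0
    for m in re.finditer(r"\{(\w+)(?:,([^{}]+))?\}", pattern):
        out.append(re.escape(pattern[last:m.start()]))
        out.append(f"(?P<{m.group(1)}>{m.group(2) or '.+'})")
        last = m.end()
    out.append(re.escape(pattern[last:]))
    return re.compile("".join(out))


loose = to_regex("{file}.bam")
strict = to_regex(r"{file,GSM\d+}.bam")

print(loose.fullmatch("GSM521934_sorted.bam").group("file"))  # GSM521934_sorted
print(strict.fullmatch("GSM521934_sorted.bam"))               # None
print(strict.fullmatch("GSM521934.bam").group("file"))        # GSM521934
```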

6.3.11. Other

  • temp()
  • touch()
  • target/all

6.4. Bonus: generating flowcharts

snakemake -s workflow6/workflow6.py --dag | dot -Tpng -o d.png
snakemake -s workflow6/workflow6.py --rulegraph | dot -Tpng -o r.png

6.5. More on Snakemake…

6.5.1. Documentation

6.5.2. Installation

sudo apt-get -y install python3-pip
sudo pip3 install snakemake

6.5.3. Reference

Köster, Johannes and Rahmann, Sven. “Snakemake - A scalable bioinformatics workflow engine”. Bioinformatics 2012.