Installation¶
The SNP Pipeline software package consists of python scripts with dependencies on executable programs launched by the scripts.
Step 1 - Operating System Requirements¶
The SNP Pipeline runs in a Linux environment. It has been tested on the following platforms:
- Red Hat
- CentOS
- Ubuntu
Step 2 - Executable Software Dependencies¶
You should have the following software installed before using the SNP Pipeline.
Software | Tested Version | Description |
---|---|---|
Bowtie2 | 2.3.4.1 | A tool for aligning reads to long reference sequences |
SMALT | 0.7.6 | A tool for aligning reads to long reference sequences |
SAMtools | 1.8 | Utilities for manipulating alignments in the SAM format |
Picard | 2.18.4 | A set of tools for manipulating sequence data |
GATK | 3.8-1-0 | Variant discovery and genotyping tools |
VarScan | 2.3.9 | A tool to detect variants in NGS data |
tabix | 1.8 | A generic indexer for tab-delimited genome position files |
bgzip | 1.8 | Part of the tabix package, bgzip is a block compression utility |
BcfTools | 1.8 | Utilities for variant calling and manipulating VCFs and BCFs |
Note: the versions above are tested and known to work together. Other versions may also work.
Note: you will need either Bowtie2 or SMALT. You do not have to install both. However, the included result files were generated with Bowtie2. Your results may differ when using SMALT.
Note: Picard is required when removing deplicate reads and when realigning reads around indels. Both of these functions are enabled by default, but can be disabled in the configuration file.
Note: GATK is required when realigning reads around indels, which is enabled by default, but can be disabled in the configuration file.
Step 3 - Environment Variables¶
Define the CLASSPATH environment variable to specify the location of the Picard, VarScan, and GATK jar files. Add the following lines (or something similar) to your .bashrc file:
export CLASSPATH=~/software/varscan.v2.3.9/VarScan.jar:$CLASSPATH
export CLASSPATH=~/software/picard/picard.jar:$CLASSPATH
export CLASSPATH=~/software/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef/GenomeAnalysisTK.jar:$CLASSPATH
Step 4 - Python¶
The SNP pipeline is compatible with python version 2.7, 3.4, 3.5, 3.6 and 3.7. The pipeline has not been tested on other python versions. You can either build from source or install a precompiled version with your Linux package manager.
Step 5 - Pip¶
This can be a troublesome installation step – proceed with caution. The pip tool is used to install python packages including the snp-pipeline and other packages used by the snp-pipeline. Some newer versions of Python include pip. Check to see if pip is already installed:
$ pip -V
If pip is not already installed, proceed as follows:
Download get-pip.py from https://pip.pypa.io/en/latest/installing.html#install-pip
$ python get-pip.py --user
Note: avoid using sudo when installing pip. Some users have experienced problems installing and loading packages when pip is installed using sudo.
Step 6 - Install the SNP Pipeline Python Package¶
There is more than one way to install the SNP Pipeline depending on whether you intend to work with the source code or just run it.
Installation Method 1 for Most Users¶
This is the recommended installation method for new users.
If you want to run the software without viewing or changing the source code, follow the instructions below.
At the command line:
$ pip install --user snp-pipeline
Or, if you have virtualenvwrapper installed:
$ mkvirtualenv snp-pipeline
$ pip install snp-pipeline
Installation Method 2 for Software Developers¶
If you intend to work with the source code in the role of a software developer, you should clone the GitHub repository as described in the Contributing section of this documentation.
Upgrading SNP Pipeline¶
If you previously installed with pip, you can upgrade to the newest version from the command line:
$ pip install --user --upgrade snp-pipeline
Uninstalling SNP Pipeline¶
If you installed with pip, you can uninstall from the command line:
$ pip uninstall snp-pipeline
Tips¶
There is a dependency on the python psutil package. Pip will attempt to install the psutil package automatically when installing snp-pipeline. If it fails with an error message about missing Python.h, you will need to manually install the python-dev package. In Ubuntu, use this command:
$ sudo apt-get install python-dev
You may need to upgrade your Java Runtime Environment (JRE) to run Picard.