1. Introduction
1.1. Description
Sentieon® Genomics software is a set of software tools that perform analysis of genomic data obtained from DNA sequencing.
1.2. Benefits and Value
The Sentieon® Genomics software enables rapid and accurate analysis of next-generation sequence data. The Sentieon® DNAseq pipelines enable germline variant calling on hundreds of thousands of samples simultaneously. The Sentieon® TNseq pipelines enable accurate calling of somatic variants in paired tumor-normal samples or in unpaired tumor samples. On third-party benchmarks, the Sentieon® Genomics software produces more accurate results than other tools, and it obtains them in one-tenth the time of comparable pipelines.
1.3. Platform Requirements
Sentieon® Genomics software is designed to run on Linux and other POSIX-compatible platforms.
For Linux systems, we recommend using the following Linux distributions or later versions: RedHat/CentOS 6.5, Debian 7.7, OpenSUSE 13.2, or Ubuntu 14.04.
Sentieon® Genomics software requires at least 16 GB of memory, although we recommend using 64 GB of memory. The alignment step is typically the most memory-intensive stage; see Section 7.2.3 for ways to control the memory used in alignment.
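As a rough illustration of the memory requirement, the sketch below compares the total memory reported by a Linux system against the 16 GB minimum and the 64 GB recommendation. The function name `classify_mem_kb` is hypothetical and not part of the Sentieon® software. It assumes that `/proc/meminfo` is available, and because the kernel reports slightly less than the installed memory, the result should be treated as approximate.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Sentieon software): classify a memory
# size given in kB against the 16 GB minimum and 64 GB recommendation.
classify_mem_kb() {
    mem_kb=$1
    min_kb=$((16 * 1024 * 1024))   # 16 GB expressed in kB
    rec_kb=$((64 * 1024 * 1024))   # 64 GB expressed in kB
    if [ "$mem_kb" -lt "$min_kb" ]; then
        echo "below-minimum"
    elif [ "$mem_kb" -lt "$rec_kb" ]; then
        echo "minimum-met"
    else
        echo "recommended-met"
    fi
}

# MemTotal is Linux-specific, so the check is skipped on other platforms.
# MemTotal is also slightly lower than the installed RAM, so a machine with
# exactly 16 GB installed may be reported as below-minimum.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    if [ -n "$total_kb" ]; then
        echo "MemTotal: ${total_kb} kB -> $(classify_mem_kb "$total_kb")"
    fi
fi
```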
Sentieon® Genomics software processes file data at speeds of around 20-30 MB/s per core on a state-of-the-art server. To take advantage of the maximum speed that the software can provide, we recommend that you use high-speed SSDs in your system, preferably two identical drives in a high-speed RAID 0 striped volume configuration. Because RAID 0 provides no redundancy, we advise using those drives only for intermediate files and moving any final results or important information off them; your server should therefore have additional storage space for the results.
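To get a feel for what the per-core figure means for storage, the following sketch multiplies the 20-30 MB/s range by a core count. It assumes that throughput scales linearly with the number of cores, which is a simplification, and the function name `io_demand_mb_s` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: estimate the aggregate I/O range in MB/s for a given
# number of cores, assuming the 20-30 MB/s per-core figure scales linearly.
io_demand_mb_s() {
    cores=$1
    echo "$((cores * 20))-$((cores * 30))"
}

# Number of online processors; fall back to 1 if getconf cannot report it.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "With ${cores} cores: roughly $(io_demand_mb_s "$cores") MB/s"
```

Under this assumption, a 32-core server would need storage that sustains roughly 640-960 MB/s, which is more than a single spinning disk typically delivers.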
Sentieon® Genomics software requires that your system have either python2.6.x, python2.7.x, or python3.x. You can check the version of the default python in your system by running:
python --version
If the default python in your system is not python2.6.x, python2.7.x, or python3.x, you will need to install one of these versions and set an environment variable to tell the software where the python binary is located:
export SENTIEON_PYTHON=Python_binary_location
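The check and the export can be combined in a small script. The sketch below is one possible approach, not an official procedure: it looks for the first interpreter on the PATH whose reported version is 2.6.x, 2.7.x, or 3.x and exports its location. The function names and the list of candidate command names are assumptions, and it is assumed that setting SENTIEON_PYTHON explicitly is harmless even when the default python already qualifies.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a "python --version" string names one of
# the versions listed above (2.6.x, 2.7.x, or 3.x).
version_supported() {
    case "$1" in
        "Python 2.6."*|"Python 2.7."*|"Python 3."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the path of the first supported interpreter
# among the command names given as arguments.
find_supported_python() {
    for candidate in "$@"; do
        path=$(command -v "$candidate" 2>/dev/null) || continue
        # Python 2 prints its version on stderr, hence 2>&1.
        if version_supported "$("$path" --version 2>&1)"; then
            echo "$path"
            return 0
        fi
    done
    return 1
}

# The candidate names below are assumptions; adjust them to your system.
if python_path=$(find_supported_python python python3 python2.7 python2.6); then
    export SENTIEON_PYTHON="$python_path"
    echo "SENTIEON_PYTHON=$SENTIEON_PYTHON"
else
    echo "No supported Python interpreter found on PATH" >&2
fi
```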
1.4. Installation procedure for a Linux-based system
To install the Sentieon® Genomics software on a Linux-based system, follow these installation instructions:
- Obtain the software release package. The software release package is named sentieon-genomics-202308.03.tar.gz, where 202308.03 is the release version.
- Obtain the license file from Sentieon®. The license file has the extension .lic. The Sentieon® license control is based on a lightweight floating license server process that runs on a node and serves licenses through TCP. In an HPC cluster, the license server process typically runs on a special non-computing node on the cluster periphery that has unrestricted access to the outside world through HTTPS, and serves the licenses to the rest of the nodes in the cluster by listening on a specific TCP port that needs to be open within the cluster. The license server needs external HTTPS access to validate the license, while the computing client nodes do not need access to the internet. You need to provide Sentieon® with the hostname (FQDN or IP) and port where you plan to deploy the license server.
- Upload the software release package and license file to your server.
- Decompress the release package with the command below. This will create a folder named sentieon-genomics-202308.03.
tar xvzf sentieon-genomics-202308.03.tar.gz
- Copy the license file to the directory that stores the license files (LICENSE_DIR). If you are running a license server process, start the license server.
<SENTIEON_FOLDER>/bin/sentieon licsrvr --start LICENSE_DIR/LICENSE_FILE.lic
- Set an environment variable in your system to indicate where the license is located (either LICSRVR_HOST:LICSRVR_PORT if you are using a license server or LICENSE_DIR/LICENSE_FILE.lic if you are using a license file). We recommend that you add this line to the shell scripts you use to run the Sentieon® software, or to your shell profile:
#if you are using a license server
export SENTIEON_LICENSE=LICSRVR_HOST:LICSRVR_PORT
#if you are using a license file
export SENTIEON_LICENSE=LICENSE_DIR/LICENSE_FILE.lic
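Since SENTIEON_LICENSE accepts two different forms, a quick sanity check before running a pipeline can catch typos. The sketch below is only an illustration: the function `license_kind` is hypothetical, and it relies solely on the two forms described above (a value ending in .lic, or a host followed by a numeric port). It does not contact the license server or validate the license.

```shell
#!/bin/sh
# Hypothetical helper: report whether a SENTIEON_LICENSE value looks like a
# license file path, a HOST:PORT pair, or neither.
license_kind() {
    case "$1" in
        *.lic)
            echo "file" ;;
        *:*)
            # Everything after the last colon must be a non-empty number.
            port=${1##*:}
            case "$port" in
                ''|*[!0-9]*) echo "unknown" ;;
                *) echo "server" ;;
            esac ;;
        *)
            echo "unknown" ;;
    esac
}

# Report on the current setting, if any.
if [ -n "${SENTIEON_LICENSE:-}" ]; then
    echo "SENTIEON_LICENSE looks like: $(license_kind "$SENTIEON_LICENSE")"
else
    echo "SENTIEON_LICENSE is not set" >&2
fi
```

For example, `license_kind licsrvr.example.com:8990` prints `server`; the hostname and port here are made up for illustration.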
1.5. Useful environment variables when using the software
When using Sentieon® Genomics software, certain environment variables may be useful or are required:
- SENTIEON_LICENSE: tells the software where the license is located. This variable is required.
#if you are using a license server
export SENTIEON_LICENSE=LICSRVR_HOST:LICSRVR_PORT
#if you are using a license file
export SENTIEON_LICENSE=LICENSE_DIR/LICENSE_FILE.lic
- SENTIEON_PYTHON: tells the software the location of the python binary. This variable is useful when python2.6.x, python2.7.x, or python3.x is not the default python version in your system.
export SENTIEON_PYTHON=Python_binary_location
- SENTIEON_TMPDIR: tells the software where to store the intermediate temporary files. This variable is recommended and should point to a fast storage location to prevent the software from being I/O limited; this is especially important when using slow NFS storage for input/output. When using this variable, the software will create a unique temporary subfolder in SENTIEON_TMPDIR for each command, which will reduce contention if multiple commands output to the same folder.
export SENTIEON_TMPDIR=/local_fast_scratch
- Environment settings to use jemalloc memory allocation: this is recommended on systems with large memory or a large number of CPUs (more than 32 vCPUs). Please see the application note at https://support.sentieon.com/appnotes/jemalloc/ for more information.
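When setting SENTIEON_TMPDIR in a wrapper script, it can help to confirm that the target directory exists and is writable before exporting it. The sketch below shows one way to do this; the function `use_tmpdir` is hypothetical, and the fallback to the system temporary directory is only for illustration, since that location may not be fast storage.

```shell
#!/bin/sh
# Hypothetical helper: export SENTIEON_TMPDIR only if the given directory
# exists and is writable; otherwise leave the variable unchanged.
use_tmpdir() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        SENTIEON_TMPDIR=$1
        export SENTIEON_TMPDIR
        return 0
    fi
    echo "warning: $1 is not a writable directory; SENTIEON_TMPDIR unchanged" >&2
    return 1
}

# /local_fast_scratch is the example path used above. The fallback to
# ${TMPDIR:-/tmp} is an assumption made for this illustration.
for dir in /local_fast_scratch "${TMPDIR:-/tmp}"; do
    if use_tmpdir "$dir"; then
        echo "SENTIEON_TMPDIR=$SENTIEON_TMPDIR"
        break
    fi
done
```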
1.6. Keywords
Bioinformatics, DNA sequencing, Genomics.