1. Introduction
1.1. Description
Sentieon Genomics software is a set of software tools that perform analysis of genomic data obtained from DNA sequencing.
1.2. Benefits and Value
The Sentieon Genomics software enables rapid and accurate analysis of next-generation sequence data. The Sentieon DNAseq pipelines enable germline variant calling on hundreds of thousands of samples simultaneously. The Sentieon TNseq pipelines enable accurate calling of somatic variants in paired tumor-normal samples or in unpaired tumor samples. On third-party benchmarks, the Sentieon Genomics software produces more accurate results than other tools, and obtains them in one tenth the time of comparable pipelines.
1.3. Platform Requirements
Sentieon Genomics software is designed to run on Linux and other POSIX-compatible platforms.
For Linux systems, we recommend the following distributions at the listed version or later: RedHat/CentOS 6.5, Debian 7.7, OpenSUSE 13.2, or Ubuntu 14.04. Sentieon Genomics software has not been tested on older releases and may not work on them.
For Apple OSX systems, OSX 10.9 (Mavericks) or higher is recommended.
Sentieon Genomics software requires at least 16 GB of memory, although we recommend 64 GB. Alignment is typically the most memory-intensive stage; see Section 7.3.3 for ways to control the memory used in alignment.
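As a quick pre-flight check, the sketch below compares a machine's total memory against the 16 GB minimum and the 64 GB recommendation stated above. The function name classify_memory_kb and the /proc/meminfo lookup are illustrative assumptions and not part of the Sentieon software; /proc/meminfo is specific to Linux, and the kernel usually reports slightly less than the installed RAM, so the value is rounded to the nearest GiB before comparing.

```shell
# Hypothetical helper (not part of Sentieon): classify a memory size in kB,
# as reported by /proc/meminfo, against the 16 GB minimum and the
# 64 GB recommendation.
classify_memory_kb() {
    local kb="$1"
    # Round to the nearest GiB, because the kernel reports slightly less
    # than the installed RAM.
    local gib=$(( (kb + 524288) / 1048576 ))
    if [ "$gib" -lt 16 ]; then
        echo "below-minimum"
    elif [ "$gib" -lt 64 ]; then
        echo "minimum-met"
    else
        echo "recommended"
    fi
}

# Linux only: read the total memory when /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "Total memory: ${total_kb} kB -> $(classify_memory_kb "$total_kb")"
fi
```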
Sentieon Genomics software processes file data at around 20-30 MB/s per core on a state-of-the-art server. To take advantage of the maximum speed the software can provide, we recommend high-speed SSDs, preferably two identical drives in a RAID 0 striped volume. When using RAID 0, use those drives only for intermediate files, and move final results and other important data off them; your server should therefore have additional storage space for the results.
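A back-of-the-envelope estimate shows why fast storage matters. The sketch below multiplies the 20-30 MB/s per-core figure by a core count. It assumes throughput scales linearly with the number of cores, which is an assumption made here for illustration and not a claim from this manual; the function name estimate_io_mbps is hypothetical.

```shell
# Hypothetical helper: estimate the aggregate I/O range (MB/s) for a given
# number of cores, ASSUMING linear scaling of the 20-30 MB/s per-core figure.
estimate_io_mbps() {
    local cores="$1"
    echo "$(( cores * 20 ))-$(( cores * 30 )) MB/s"
}

estimate_io_mbps 32   # prints "640-960 MB/s"
```

Under that assumption, a 32-core server would need several hundred MB/s of sustained throughput, well beyond what a single spinning hard disk typically delivers.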
Sentieon Genomics software requires that your system have python2.6.x or python2.7.x. You can check the version of the default python in your system by running:
python --version
If python2.6.x or python2.7.x is not the default python version in your system, you will need to install it and set an environment variable to tell the software where the python2.6.x or python2.7.x binary is located:
export SENTIEON_PYTHON=Python_2_X_binary_location
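The check and the export can be combined in a small script. The following is a minimal sketch, assuming the candidate binary names listed in the loop (python, python2.7, python2.6, python2) are the ones present on your system; the helper name is_supported_python is hypothetical. Only SENTIEON_PYTHON comes from this manual.

```shell
# Hypothetical helper: succeed only if a "python --version" string reports
# Python 2.6.x or 2.7.x.
is_supported_python() {
    case "$1" in
        "Python 2.6."*|"Python 2.7."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Try a few common binary names (assumed; adjust for your system) and export
# SENTIEON_PYTHON for the first one that qualifies.
for candidate in python python2.7 python2.6 python2; do
    path=$(command -v "$candidate" 2>/dev/null) || continue
    # Python 2 prints its version to stderr, so merge the two streams.
    version=$("$path" --version 2>&1) || continue
    if is_supported_python "$version"; then
        export SENTIEON_PYTHON="$path"
        echo "Using $path ($version)"
        break
    fi
done
```

If no candidate qualifies, the loop leaves SENTIEON_PYTHON untouched, so an existing setting is preserved.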
1.4. Installation procedure for a Linux Based system
To install the Sentieon Genomics software on a Linux-based system, follow these steps:
- Obtain the software release package. The software release package is named sentieon-genomics-201911.tar.gz, where 201911 is the release version.
- Obtain the license file from Sentieon. The license file has the extension .lic. Sentieon license control is based on a lightweight floating license server process that runs on one node and serves licenses through TCP. In an HPC cluster, the license server process typically runs on a special non-computing node on the cluster periphery that has unrestricted access to the outside world through HTTPS, and serves licenses to the rest of the nodes by listening on a specific TCP port that must be open within the cluster. The license server needs external HTTPS access to validate the license, while the computing client nodes do not need internet access. You need to provide Sentieon with the hostname (FQDN or IP) and port where you plan to deploy the license server.
- Upload the software release package and license file to your server.
- Decompress the release package with the command below. This will create a folder named sentieon-genomics-201911.
tar xvzf sentieon-genomics-201911.tar.gz
- Copy the license file to the directory that stores the license files (LICENSE_DIR). If you are running a license server process, start the license server.
<SENTIEON_FOLDER>/bin/sentieon licsrvr --start LICENSE_DIR/LICENSE_FILE.lic
- Set an environment variable in your system to indicate where the license is located (either LICSRVR_HOST:LICSRVR_PORT if you are using a license server, or LICENSE_DIR/LICENSE_FILE.lic if you are using a license file). We recommend that you add the appropriate line to the shell scripts you use to run the Sentieon software, or to your shell profile:
# if you are using a license server
export SENTIEON_LICENSE=LICSRVR_HOST:LICSRVR_PORT
# if you are using a license file
export SENTIEON_LICENSE=LICENSE_DIR/LICENSE_FILE.lic
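Because SENTIEON_LICENSE accepts two different forms, a small sanity check before launching jobs can catch typos early. The sketch below is illustrative only: the function name classify_license is hypothetical, and the check looks purely at the shape of the value (a path ending in .lic that exists, or HOST:PORT with a numeric port). It does not contact the license server or validate the license.

```shell
# Hypothetical helper: report whether a SENTIEON_LICENSE value looks like a
# license server (HOST:PORT) or an existing .lic file. Shape check only.
classify_license() {
    local value="$1"
    case "$value" in
        *.lic)
            if [ -f "$value" ]; then echo "file"; else echo "missing-file"; fi
            ;;
        ?*:*)
            # Everything after the last colon must be a numeric port.
            local port="${value##*:}"
            case "$port" in
                ''|*[!0-9]*) echo "invalid" ;;
                *) echo "server" ;;
            esac
            ;;
        *) echo "invalid" ;;
    esac
}

echo "SENTIEON_LICENSE looks like: $(classify_license "${SENTIEON_LICENSE:-}")"
```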
1.5. Installation procedure for an Apple OSX system
To install the Sentieon Genomics software on an Apple OSX system, follow these steps:
- Obtain the software release package. The software release package is named mac-sentieon-genomics-201911.tar.gz, where 201911 is the release version.
- Obtain the license file. The license file has the extension .lic.
- Upload the software release package and license file to your computer.
- Decompress the release package with the command below. This will create a folder named mac-sentieon-genomics-201911.
tar xvzf mac-sentieon-genomics-201911.tar.gz
- Copy the license file to the directory that stores the license files (LICENSE_DIR).
- Set the environment variable in your system to indicate where the license is located (LICENSE_DIR/LICENSE_FILE.lic). We recommend that you add this line to your shell profile, ~/.bash_profile:
export SENTIEON_LICENSE=LICENSE_DIR/LICENSE_FILE.lic
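When the profile is edited from a script, it helps to avoid adding the same line twice. The following is a minimal sketch of an idempotent append; the function name add_license_to_profile is hypothetical, and the paths in the example call are the same placeholders used above.

```shell
# Hypothetical helper: append the SENTIEON_LICENSE export to a profile file
# unless an identical line is already present.
add_license_to_profile() {
    local profile="$1" license_path="$2"
    local line="export SENTIEON_LICENSE=$license_path"
    # -x matches whole lines only, -F treats the text literally.
    if ! grep -qxF -- "$line" "$profile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile"
    fi
}

# Example call (placeholders as in the text above):
# add_license_to_profile "$HOME/.bash_profile" "LICENSE_DIR/LICENSE_FILE.lic"
```

Running the function a second time with the same arguments leaves the profile unchanged.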
1.5.1. Using the software on an Apple OSX system
When using Sentieon Genomics software on a laptop running Apple OSX, we recommend that you disable the energy saving capabilities of your computer, or temporarily change them while running commands.
To disable the energy saving capabilities of your computer:
- Open the system preferences.
- Select Energy Saver.
- Make sure you check the option to “Prevent computer from sleeping automatically when display is off”, as shown in Fig. 1.1.
Alternatively, you can temporarily disable the energy saving capabilities of your computer by running the command from a Terminal:
pmset noidle
1.6. Useful environmental variables when using the software
When using Sentieon Genomics software there are certain environmental variables that may be useful or are required:
- SENTIEON_LICENSE: tells the software where the license is located. This variable is required.
# if you are using a license server
export SENTIEON_LICENSE=LICSRVR_HOST:LICSRVR_PORT
# if you are using a license file
export SENTIEON_LICENSE=LICENSE_DIR/LICENSE_FILE.lic
- SENTIEON_PYTHON: tells the software the location of the python binary. This variable is useful when python2.6.x or python2.7.x is not the default python version in your system.
export SENTIEON_PYTHON=Python_2_X_binary_location
- SENTIEON_TMPDIR: tells the software where to store the intermediate temporary files. This variable is recommended and should point to a fast storage location to prevent the software from being I/O limited; this is especially important when using slow NFS storage for input/output. When this variable is set, the software creates a unique temporary subfolder in SENTIEON_TMPDIR for each command, which reduces contention if multiple commands output to the same folder.
export SENTIEON_TMPDIR=/local_fast_scratch
- Environment settings to use jemalloc memory allocation: use the settings below to have the software use jemalloc instead of the standard OS memory allocation library. This is recommended on systems with large memory or a large number of CPUs (more than 32 vCPU). In the code below, SENTIEON_INSTALL_DIR points to the location of the Sentieon software package.
export LD_PRELOAD=$SENTIEON_INSTALL_DIR/lib/libjemalloc.so.1
export MALLOC_CONF=lg_dirty_mult:-1
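The variables above can be collected in one setup script that is sourced before running the software. The following is a minimal sketch, not an official script: the default install path, the scratch directory check, and the helper use_jemalloc are assumptions made for illustration. The sketch enables jemalloc only when the library file exists and the machine has more than 32 CPUs; it does not try to detect the "large memory" case mentioned above.

```shell
# Hypothetical setup script; every path below is a placeholder.
SENTIEON_INSTALL_DIR="${SENTIEON_INSTALL_DIR:-/opt/sentieon-genomics-201911}"

# Required: warn if the license location has not been set.
[ -n "${SENTIEON_LICENSE:-}" ] || echo "warning: SENTIEON_LICENSE is not set" >&2

# Recommended: use fast local scratch only if it exists and is writable.
scratch=/local_fast_scratch
if [ -d "$scratch" ] && [ -w "$scratch" ]; then
    export SENTIEON_TMPDIR="$scratch"
fi

# Hypothetical helper: true when the CPU count ($1) exceeds 32 and the
# jemalloc library file ($2) is present.
use_jemalloc() { [ "$1" -gt 32 ] && [ -f "$2" ]; }

ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
jemalloc_lib="$SENTIEON_INSTALL_DIR/lib/libjemalloc.so.1"
if use_jemalloc "$ncpu" "$jemalloc_lib"; then
    export LD_PRELOAD="$jemalloc_lib"
    export MALLOC_CONF=lg_dirty_mult:-1
fi
```

Guarding LD_PRELOAD on the existence of the library avoids loader warnings on machines where the package is installed in a different location.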
1.7. Keywords
Bioinformatics, DNA sequencing, Genomics.