GEM - mappability

Yuwei BaoMarch 21, 2023

GEM mappability [1] is a tool to estimate genome mappability and here is a tutorial by Dmytro Kryvokhyzha [2]. Thanks to Nick Riddiford for sharing his insights and scripts [3].

Why do we want to obtain the genome mappable region?

In the paper Fast Computation and Applications of Genome Mappability [1:1], the authors mentioned these three major reasons:

  1. Mappability information can be used as a prior to optimize an HTS experiment then to increase the number of uniquely mappable reads.
  2. Mappability information is essential when producing qualitative estimates.
  3. Mappability information is important when there is a need to re-sequence a particualr genomic region, or to produce quantitative estimates of transcript abundance from RNASeq experiments.

How do we obtain the genome mappable region?

Following the tutorial by Dmytro Kryvokhyzha [2:1], we can get GEM mappability and convert it to BED file.

1. Download GEM library

wget http://barnaserver.com/gemtools/releases/GEMTools-static-i3-1.7.1.tar.gz
tar -xvf GEMTools-static-i3-1.7.1.tar.gz
export PATH=$PATH:/PATH_TO/gemtools-1.7.1-i3/bin

2. Estimate GEM mappability

gem-indexer -T 10 -i reference.genome.fa -o OUT_FOLDER/PREFIX

gem-indexer Parameters

ParameterMeaningOther info
-iinput_file(mandatory)
-oindex_output_prefix(mandatory)
-T --threadsthread_number(for the BWT generator, default=1)
gem-mappability -T 10 -I OUT_FOLDER/PREFIX.gem -l 150 -o OUT_FOLDER/PREFIX_mappability_150

gem-mappability Parameters

ParameterMeaningOther info
-Iindex_prefix(mandatory)
-lread_length(mandatory)
-ooutput_prefix(mandatory)
-Tthread_number(default=1)

The reasons Dmytro use -T 10 and -l 150: "I used a 150bp kmer size because my data was generated with 150bp read length. Also, I run it on 10 cores (-T 10). You can change these options to fit your needs."

3. Convert GEM mappability to BED

gem-2-wig -I OUT_FOLDER/PREFIX.gem -i OUT_FOLDER/PREFIX_mappability_150.gem.mappability -o OUT_FOLDER/PREFIX_mappability

gem-2-wig Parameters

ParameterMeaningOther info
-I --indexarchive_file(mandatory)
-l --inputmappability_file(mandatory)
-ooutput_prefix(mandatory)

4. Merge overlapping intervals in BED

Dmytro wrote a python script [4] to merge overlapping mappability intervals.

python ~/git/genotype-files-manipulations/combine_overlapping_BEDintervals.py -i OUT_FOLDER/PREFIX_mappability_150.bed -o OUT_FOLDER/PREFIX_mappability_150.merged.bed -v 0

where -v defines the overhang size between intervals.

For more details, please visit Dmytro Kryvokhyzha's post: [2:2]

Summary script

#!/bin/bash
# Input files
ref=reference.genome.fa
out=PATH_TO_OUTPUT
pre=dml6

# Personalize parameters: threads (N) and read length (RL)
N=10
RL=150

gem-indexer -T $N -i $ref -o $out/$pre
gem-mappability -T $N -I $out/$pre.gem -l $RL -o $out/$pre\_mappability\_$RL
gem-2-wig -I $out/$pre.gem -i $out/$pre\_mappability\_$RL.gem.mappability -o $out/$pre\_mappability

  1. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030377open in new window ↩︎ ↩︎

  2. https://evodify.com/gem-mappability/open in new window ↩︎ ↩︎ ↩︎

  3. https://github.com/nriddiford/Investigating-structural-variation-in-cancer-genomes/blob/master/GenomeMappability.mdopen in new window ↩︎

  4. https://github.com/evodify/genotype-files-manipulations/blob/master/combine_overlapping_BEDintervals.pyopen in new window ↩︎