dnaPipeTE (de-novo assembly & annotation Pipeline for Transposable Elements)
dnaPipeTE can find, annotate, and quantify transposable elements (TEs) in small samples of NGS datasets [1].
Installation
Use Docker:
sudo docker pull clemgoub/dnapipete:latest
Usage
# Start the dnaPipeTE container in an interactive session, mounting ~/Project on the host as /mnt in the container
sudo docker run -it -v ~/Project:/mnt clemgoub/dnapipete:latest
# Inside the container, run dnaPipeTE
python3 dnaPipeTE.py -input /mnt/reads_input.fastq -output /mnt/output -RM_lib ../RepeatMasker/Libraries/RepeatMasker.lib -genome_size 170000000 -genome_coverage 0.1 -sample_number 2 -RM_t 0.2 -cpu 2
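To get a feel for what -genome_size, -genome_coverage and -sample_number imply, the arithmetic can be sketched as below. This assumes dnaPipeTE sizes each sample at genome_size × genome_coverage bases and assumes a read length of 150 bp, which is a hypothetical value; reads_per_sample is a hypothetical helper, not part of dnaPipeTE. With the values above, each sample would be 170,000,000 × 0.1 = 17,000,000 bases, or roughly 113,333 reads.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dnaPipeTE): estimate reads drawn per sample,
# ASSUMING each sample holds genome_size * genome_coverage bases.
reads_per_sample() {
  local genome_size="$1" coverage="$2" read_len="$3"
  # awk handles the fractional coverage; %d truncates to a whole read count
  awk -v g="$genome_size" -v c="$coverage" -v l="$read_len" \
      'BEGIN { printf "%d\n", g * c / l }'
}

# 170 Mb genome at 0.1x with assumed 150 bp reads: 17,000,000 bases per sample
reads_per_sample 170000000 0.1 150   # prints 113333
```

Under the same assumption, -sample_number 2 means two such samples, about 34 Mb of reads in total.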
According to the original paper for the tool [2], the reads need to be pre-processed first.
Our own data are paired-end short reads from Illumina sequencing. Following [2:1], we use only the first read (R1) of each pair for the analysis.
Fun fact
We actually ran the analysis separately on the first reads (R1) and on the second reads (R2), and the results differed for the same sample.
I considered merging the paired-end reads, but the tool's documentation [1:1] states explicitly:
"The input file must be a single-end FASTQ or FASTQ.GZ file of NGS reads. It can be either the R1 or R2 end of a paired-end library."
We do, however, need to pre-process the reads to remove mitochondrial DNA so that TE identification stays accurate.
Read Pre-processing
The paper [2:2] suggests:
- Quality filtering: use FASTX-toolkit (http://hannonlab.cshl.edu/fastx_toolkit/), requiring a Phred score of at least 20 on 90% of the bases in each read.
- Mitochondrial DNA removal: use Bowtie2 to map the reads to the complete mitochondrial genome sequence of the species being studied (from NCBI), and discard the reads that map.
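The quality rule above can be made concrete with a small awk filter. This is a sketch of the criterion that FASTX-toolkit's fastq_quality_filter applies with -q 20 -p 90 (keep a read only if at least 90% of its bases have Phred ≥ 20), not a replacement for the tool. It assumes 4-line FASTQ records with Phred+33 quality encoding, and qual_filter is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep FASTQ reads in which at least MIN_PCT percent of
# bases have Phred quality >= MIN_Q (the -q / -p rule of fastq_quality_filter).
# ASSUMES 4-line records and Phred+33 encoding.
# Usage: qual_filter MIN_Q MIN_PCT < in.fastq > out.fastq
qual_filter() {
  awk -v q="$1" -v p="$2" '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }  # char -> ASCII code
    NR % 4 == 1 { head = $0; next }   # header line
    NR % 4 == 2 { seq = $0; next }    # sequence line
    NR % 4 == 3 { plus = $0; next }   # "+" separator line
    {                                 # quality line: decide whether to keep the read
      n = length($0); good = 0
      for (i = 1; i <= n; i++)
        if (ord[substr($0, i, 1)] - 33 >= q) good++   # Phred+33 decode
      if (n > 0 && good * 100 >= p * n) print head "\n" seq "\n" plus "\n" $0
    }
  '
}
```

With the toolkit itself, the corresponding call would be along the lines of fastq_quality_filter -q 20 -p 90 -i in.fastq -o out.fastq.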
Our approach:
- Use fastp [3] for QC, adapter trimming, and quality filtering (fastp can also split and merge reads)
- Map the reads to the mitochondrial genome
- Extract the reads that do not map to the mitochondrial genome
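The three steps above might be wired together roughly as follows. This is a sketch, not a tested production script: preprocess_r1 and all file names are hypothetical, and fastp's -q 20 -u 10 (discard reads with more than 10% of bases below Q20) is an assumed setting chosen to approximate the paper's FASTX filter. Bowtie2's --un-gz writes the reads that fail to align, which are the non-mitochondrial reads to pass on to dnaPipeTE.

```shell
#!/usr/bin/env bash
# Sketch: QC R1 with fastp, then keep only reads that do NOT align to the
# mitochondrial genome. Function and file names are hypothetical.
# Usage: preprocess_r1 R1.fastq.gz mito.fasta SAMPLE_PREFIX [THREADS]
preprocess_r1() {
  local r1="$1" mito="$2" prefix="$3" threads="${4:-4}"

  # 1. QC / adapter trimming / quality filtering of the single-end R1 file.
  #    -q 20 -u 10: drop reads with >10% of bases below Q20 (assumed settings).
  fastp -i "$r1" -o "${prefix}.clean.fastq.gz" -q 20 -u 10 -w "$threads" \
        -h "${prefix}.fastp.html" -j "${prefix}.fastp.json" || return 1

  # 2. Build a Bowtie2 index of the mitochondrial genome.
  bowtie2-build "$mito" "${prefix}.mito_idx" || return 1

  # 3. Align the cleaned reads; --un-gz keeps the reads that fail to align
  #    (non-mitochondrial), while the alignments themselves are discarded.
  bowtie2 -p "$threads" -x "${prefix}.mito_idx" -U "${prefix}.clean.fastq.gz" \
          --un-gz "${prefix}.nomito.fastq.gz" -S /dev/null
}
```

The resulting SAMPLE_PREFIX.nomito.fastq.gz is a single-end FASTQ.GZ file, so it could then be given to dnaPipeTE through -input.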
Script
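If the mitochondrial genome is one of the sequences in your genome FASTA, extract it directly with samtools faidx. The region name (here mitochondrion_genome) must match the sequence ID in the FASTA header; if genome.fasta has no .fai index yet, samtools creates one.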
samtools faidx genome.fasta mitochondrion_genome > mitochondrial_genome.fasta
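Where samtools is not available, the same record can be pulled out with awk. A minimal sketch, assuming the sequence ID is the first word of the header line after ">" (the same convention samtools faidx uses); extract_fasta_record is a hypothetical helper, and mitochondrion_genome stands for whatever ID your FASTA actually uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the FASTA record whose ID (first word after ">")
# equals $1; all other records are skipped. Sequence lines are copied as-is.
# Usage: extract_fasta_record mitochondrion_genome < genome.fasta > mitochondrial_genome.fasta
extract_fasta_record() {
  awk -v id="$1" '
    /^>/ { keep = (substr($1, 2) == id) }  # header line: decide whether to keep this record
    keep                                   # print header and sequence lines while keep is set
  '
}
```

Unlike samtools faidx, this does not re-wrap sequence lines, so the output keeps the line layout of the input file.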