Use svtools/SVTyper to generate a PON for identifying SVs
svtools [1] is a comprehensive set of utilities for exploring structural variations in genomes [2].
Python version
svtools requires Python 2.7.
Installation [3]
My personal preference is to always use docker images [^docker]. For the sake of experiment, though, I want to install svtools (v0.3.0) with pip install.
Failure
Attempt 1 failed using pip
conda activate py27_svtools
pip install svtools==0.3.0
svtools --version
This fails with ImportError: No module named abc
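Before retrying, it can help to see which interpreter the svtools entry point is tied to, because a console script installed by pip carries the interpreter path in its shebang line. A minimal sketch; describe_tool is a made-up helper name, not part of svtools or pip:

```shell
#!/usr/bin/env bash
# describe_tool is a hypothetical helper for illustration only.
# It prints where a command lives and, if the file is a script, its shebang line.
describe_tool() {
    local name path
    name="$1"
    if ! path="$(command -v "$name")"; then
        echo "$name: not found on PATH"
        return 1
    fi
    echo "$name: $path"
    # Compiled binaries have no shebang, so fall back to a short note.
    head -n 1 "$path" 2>/dev/null | grep '^#!' || echo "(no shebang line)"
}

# Compare the interpreter behind svtools with the python and pip in the active environment.
describe_tool svtools || true
describe_tool python || true
describe_tool pip || true
```

If the shebang of svtools points at a different Python than the one in the activated environment, the install landed somewhere other than expected.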
Clean workspace
Uninstall whatever the previous attempt left behind, if needed
pip uninstall svtools
Attempt 2 failed using direct installation from the git repo
git clone https://github.com/hall-lab/svtools.git svtools_test
cd svtools_test
git tag -l
git checkout tags/v0.3.0
pip install .
svtools --version
Even though pip reports Successfully installed svtools-0.3.0, the version check still fails with the same error as in the previous attempt.
Attempt 3 failed using github release tarball
wget https://github.com/hall-lab/svtools/archive/refs/tags/v0.3.0.tar.gz
tar -xvzf v0.3.0.tar.gz
cd svtools-0.3.0/
pip install .
svtools --version
Even though pip reports Successfully installed svtools-0.3.0, the version check still fails with the same error as in the previous attempt.
Success
Attempt 4: Just use docker image
I have not found a way to work around it to get svtools (v0.3.0) to run properly, so I will ignore the version and just use docker images of the latest version of svtools (v0.5.1). [^docker]
docker pull halllab/svtools:v0.5.1
docker run --rm halllab/svtools:v0.5.1 svtools --help
It works. Docker scores another point in my heart. Let's move on.
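To run svtools on local files, the container also needs a volume mount. A minimal sketch, assuming the image is invoked the same way as above; the wrapper name svtools_docker and the /data mount point are placeholders, not anything the image prescribes:

```shell
#!/usr/bin/env bash
# svtools_docker is a hypothetical wrapper; /data is an arbitrary mount point.
SVTOOLS_IMAGE="halllab/svtools:v0.5.1"

svtools_docker() {
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        echo docker run --rm -v "$PWD:/data" -w /data "$SVTOOLS_IMAGE" svtools "$@"
    else
        # Mount the current directory so relative file names resolve inside the container.
        docker run --rm -v "$PWD:/data" -w /data "$SVTOOLS_IMAGE" svtools "$@"
    fi
}

# Show the command that would run, without needing docker installed.
DRY_RUN=1 svtools_docker --version
```

Because docker run passes the container's standard output through, a redirect such as `svtools_docker lsort a.vcf b.vcf > sorted.vcf` writes the file on the host.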
Usage
svtools [-h] [--version] [--support] subcommand ...
svtools vcfpaste [-h] -f <FILE> [-m <VCF>] [-t <DIR>] [-q]
Use svtools to create a panel of normals (PON)
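# 1. Compress and index each normal VCF
bgzip sample1.vcf
tabix -p vcf sample1.vcf.gz
# 2. Combine and sort the normal VCF files
svtools lsort sample1.vcf.gz sample2.vcf.gz sample3.vcf.gz > sorted_normals.vcf
# 3. Create a PON by merging overlapping breakpoints across the normals
svtools lmerge -i sorted_normals.vcf -f 20 > PON.vcf
# 4. Filter your tumor sample VCF file using the PON
# afreq is not the tool for this step: it only adds allele frequency annotations to a VCF. svtools varlookup, which looks for variants common to two BEDPE files, may be a closer fit (see svtools varlookup --help).
Parameters
Here, -i names the sorted input file for lmerge, and -f 20 widens each breakpoint confidence interval by a fixed 20 bp on both sides before overlapping calls are merged; -p does the same as a proportion of the interval size. The value 20 is only a starting point, so adjust it according to your needs. lsort and lmerge write to standard output, which is why the results are redirected into files; the last redirect creates the PON file named PON.vcf. vcfpaste is left out on purpose: its -f <FILE> option takes a list of per-sample VCFs that already contain the same variants and pastes their genotype columns side by side, so it cannot combine independent call sets.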
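The -f <FILE> option in the vcfpaste usage line above takes a file that lists VCF paths, one per line, rather than the VCFs themselves. A small sketch for building such a list; make_vcf_list and the demo directory are placeholders for illustration:

```shell
#!/usr/bin/env bash
# make_vcf_list is a hypothetical helper: it writes every *.vcf.gz path
# found in a directory to a list file, one path per line.
make_vcf_list() {
    dir="$1"; out="$2"
    # Reuse the positional parameters to hold the glob matches.
    set -- "$dir"/*.vcf.gz
    # If nothing matched, the pattern stays literal and the file test fails.
    [ -e "$1" ] || { echo "no .vcf.gz files in $dir" >&2; return 1; }
    printf '%s\n' "$@" > "$out"
}

# Demo with empty placeholder files in a temporary directory.
demo="$(mktemp -d)"
touch "$demo/sample1.vcf.gz" "$demo/sample2.vcf.gz" "$demo/sample3.vcf.gz"
make_vcf_list "$demo" "$demo/normals.list"
cat "$demo/normals.list"
```

The resulting list file is what goes after -f.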
SVTyper
SVTyper computes genotypes of structural variants based on breakpoint depth.
Python version
SVTyper requires Python 2.7.
Installation
pip install git+https://github.com/hall-lab/svtyper.git
Usage
- Using normal.bam to call normal.SV.vcf
svtyper -i $input.vcf -T $REF -B $normal -o $out/$NO.vcf
- Merge all normal.SV.vcf files (bcftools merge needs each input to be bgzip-compressed and indexed)
bcftools merge <path/to/normal_sample1.vcf> <path/to/normal_sample2.vcf> ... <path/to/normal_sampleN.vcf> \
-o <path/to/pon.vcf> \
-m all \
--threads <number of threads>
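With more than a couple of normals, the two steps above are easier to run as a loop. A sketch under assumptions: the function names, the bams/ and pon_work/ paths, and the DRY_RUN switch are all made up for illustration, and the svtyper flags are just the ones used above:

```shell
#!/usr/bin/env bash
# run is a hypothetical helper: with DRY_RUN set it prints the command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# genotype_normals SITES_VCF REF_FASTA OUTDIR BAM...
# Genotypes each normal BAM, then compresses and indexes the result for bcftools merge.
genotype_normals() {
    sites="$1"; ref="$2"; outdir="$3"; shift 3
    for bam in "$@"; do
        sample="$(basename "$bam" .bam)"
        run svtyper -i "$sites" -T "$ref" -B "$bam" -o "$outdir/$sample.SV.vcf"
        # bcftools merge needs bgzip-compressed, indexed inputs.
        run bgzip -f "$outdir/$sample.SV.vcf"
        run tabix -p vcf "$outdir/$sample.SV.vcf.gz"
    done
}

# Print the commands first to check the file names before running anything.
DRY_RUN=1 genotype_normals input.vcf ref.fa pon_work bams/n1.bam bams/n2.bam
```

Dropping DRY_RUN=1 runs the commands for real; the resulting .SV.vcf.gz files are then the inputs for bcftools merge.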
Now I realize that I am missing input.vcf, which is the output of a structural variant caller. It is weird: to use Delly, we want a PON in hand, while to produce a PON using SVTyper, we need the results produced by SV callers such as Delly or Lumpy.