Filter BAM files using GATK
March 21, 2023
All the following information is from the GATK (v4.2.6.1) tool documentation [1].
1. MarkDuplicates (Picard): Identifies duplicate reads [2]

    java -jar picard.jar MarkDuplicates \
        -I input.bam \
        -O marked_duplicates.bam \
        -M marked_dup_metrics.txt \
        --REMOVE_DUPLICATES TRUE

Parameters
-M marked_dup_metrics.txt: the file to which duplication metrics are written.
--REMOVE_DUPLICATES: optional (default false). If true, duplicates are omitted from the output file; otherwise they are written with the appropriate flags set.
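
To read the duplication rate back out of the metrics file, something like the following may help. It is a minimal sketch that assumes the usual Picard metrics layout: a `## METRICS CLASS` line, then a tab-separated header row containing `LIBRARY` and `PERCENT_DUPLICATION`, then one row per library, ended by a blank line. `dup_rate` is a hypothetical helper name, not part of Picard.

```shell
# Hypothetical helper: print "LIBRARY<TAB>PERCENT_DUPLICATION" for each
# library row in a MarkDuplicates metrics file (layout assumed as described).
dup_rate() {
  awk -F'\t' '
    /^## METRICS CLASS/ { want_header = 1; next }   # header row follows this line
    want_header {
      for (i = 1; i <= NF; i++) col[$i] = i         # map column name -> position
      want_header = 0; in_table = 1; next
    }
    in_table && NF == 0 { in_table = 0 }            # blank line ends the table
    in_table { print $(col["LIBRARY"]) "\t" $(col["PERCENT_DUPLICATION"]) }
  ' "$1"
}
```

Usage would be `dup_rate marked_dup_metrics.txt`, run after MarkDuplicates has finished.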

2. AddOrReplaceReadGroups (Picard): Assigns all the reads in a file to a single new read-group [3]

    java -jar picard.jar AddOrReplaceReadGroups \
        I=input.bam \
        O=output.bam \
        RGID=4 \
        RGLB=lib1 \
        RGPL=ILLUMINA \
        RGPU=unit1 \
        RGSM=read_group_sample_name
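
One way to confirm that the read group was written is to inspect the `@RG` lines of the BAM header. The sketch below reads a header on stdin, so it can be fed from `samtools view -H`. `check_read_groups` is a hypothetical helper, and the choice of `ID`, `SM`, and `PL` as the tags to require is an assumption made for this example.

```shell
# Hypothetical helper: read a SAM/BAM header on stdin and succeed only if at
# least one @RG line exists and every @RG line carries ID, SM, and PL tags.
check_read_groups() {
  awk -F'\t' '
    /^@RG/ {
      n++
      have_id = have_sm = have_pl = 0
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ID:/) have_id = 1
        if ($i ~ /^SM:/) have_sm = 1
        if ($i ~ /^PL:/) have_pl = 1
      }
      if (!(have_id && have_sm && have_pl)) bad++
    }
    END { exit !(n > 0 && bad == 0) }   # exit status 0 means all @RG lines passed
  '
}
```

With samtools installed, a typical call would be `samtools view -H output.bam | check_read_groups && echo "read groups OK"`.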

3. BaseRecalibrator: Generates a recalibration table for Base Quality Score Recalibration (BQSR) [4]

    gatk BaseRecalibrator \
        -I my_reads.bam \
        -R reference.fasta \
        --known-sites sites_of_variation.vcf \
        --known-sites another/optional/setOfSitesToMask.vcf \
        -O recal_data.table

Parameters
--known-sites: One or more databases of known polymorphic sites, used to exclude regions around known polymorphisms from analysis.
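
BaseRecalibrator generally expects the reference and each known-sites file to be indexed before it runs. The following pre-flight check is only a sketch: `check_bqsr_inputs` is a hypothetical helper, and the companion file names it looks for (`.fai`, `.dict`, `.idx` for plain VCF, `.tbi` for bgzipped VCF) are assumptions based on common GATK setups.

```shell
# Hypothetical helper: report missing companion files for BaseRecalibrator inputs.
# Usage: check_bqsr_inputs reference.fasta known1.vcf [known2.vcf.gz ...]
check_bqsr_inputs() {
  ref=$1; shift
  missing=0
  # The reference needs a FASTA index (.fai) and a sequence dictionary (.dict).
  [ -f "$ref.fai" ] || { echo "missing: $ref.fai (samtools faidx $ref)"; missing=1; }
  dict="${ref%.*}.dict"
  [ -f "$dict" ] || { echo "missing: $dict (gatk CreateSequenceDictionary -R $ref)"; missing=1; }
  # Each known-sites file needs an index next to it.
  for vcf in "$@"; do
    case $vcf in
      *.gz) idx="$vcf.tbi" ;;
      *)    idx="$vcf.idx" ;;
    esac
    [ -f "$idx" ] || { echo "missing: $idx (gatk IndexFeatureFile -I $vcf)"; missing=1; }
  done
  return "$missing"
}
```

The function only inspects the file system and prints a suggested command for each missing file; it does not run GATK or samtools itself.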

4. ApplyBQSR: Applies base quality score recalibration [5]

    gatk ApplyBQSR \
        -R reference.fasta \
        -I input.bam \
        --bqsr-recal-file recalibration.table \
        -O output.bam
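
The table written by BaseRecalibrator (`-O`) is the file that ApplyBQSR reads (`--bqsr-recal-file`), so the two steps have to agree on its name. A minimal sketch of chaining them is shown here. The `run` and `bqsr` helpers, the `DRY_RUN` variable, and the `.recal.table` naming scheme are all hypothetical; only the GATK flags come from the commands above.

```shell
# Hypothetical wrapper: echo each command; execute it unless DRY_RUN=1.
run() {
  printf '+ %s\n' "$*"
  [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Hypothetical helper: build the table, then apply it to the same BAM.
# Usage: bqsr input.bam reference.fasta known_sites.vcf output.bam
bqsr() {
  bam=$1; ref=$2; known=$3; out=$4
  table="${out%.bam}.recal.table"   # assumed naming scheme for the table
  run gatk BaseRecalibrator -I "$bam" -R "$ref" --known-sites "$known" -O "$table" &&
  run gatk ApplyBQSR -R "$ref" -I "$bam" --bqsr-recal-file "$table" -O "$out"
}
```

Setting `DRY_RUN=1` before calling `bqsr` prints both commands without executing them, which is a cheap way to check the file names first.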

5. BuildBamIndex (Picard): Generates a BAM index (.bai) file [6]

    java -jar picard.jar BuildBamIndex \
        I=input.bam
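
After indexing, it can be worth checking that the index exists and is not older than the BAM it describes, since a stale index can cause confusing errors downstream. This is a sketch only: `bam_index_ok` is a hypothetical helper, and the two index names it tries (`input.bai` and `input.bam.bai`) are assumptions about where the index was written.

```shell
# Hypothetical helper: succeed if an index for the BAM exists and the BAM
# has not been modified more recently than that index.
bam_index_ok() {
  bam=$1
  for bai in "${bam%.bam}.bai" "$bam.bai"; do
    if [ -f "$bai" ] && [ ! "$bam" -nt "$bai" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, `bam_index_ok input.bam || echo "index missing or stale"`.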

[1] https://gatk.broadinstitute.org/hc/en-us/articles/5358824293659--Tool-Documentation-Index
[2] https://gatk.broadinstitute.org/hc/en-us/articles/5358880192027-MarkDuplicates-Picard-
[3] https://gatk.broadinstitute.org/hc/en-us/articles/5358911906459-AddOrReplaceReadGroups-Picard-
[4] https://gatk.broadinstitute.org/hc/en-us/articles/5358896138011-BaseRecalibrator
[5] https://gatk.broadinstitute.org/hc/en-us/articles/5358826654875-ApplyBQSR
[6] https://gatk.broadinstitute.org/hc/en-us/articles/5358886012443-BuildBamIndex-Picard-