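Introduction. SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments. SAM aims to be a format that has five properties. It is flexible enough to store the alignment information produced by various aligners. It is simple to generate or to convert to from existing formats. It is compact in file size. It can be processed as a stream without loading the whole alignment into memory. It can be indexed by genomic position, so that the reads aligning to a locus can be retrieved efficiently.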
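To make the format concrete, here is a minimal Python sketch that assumes a plain-text SAM file (not binary BAM). Each alignment line has 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL), followed by optional TAG:TYPE:VALUE fields. Header lines start with `@`. The names `parse_sam_line` and `iter_alignments` are illustrative and do not come from any library. For real work, a library such as pysam is the usual choice.

```python
from typing import Iterable, Iterator, NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM columns, in order, plus optional tags."""
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position (0 if unavailable)
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: dict    # optional fields: TAG -> (TYPE, raw value string)


def parse_sam_line(line: str) -> SamRecord:
    """Split one tab-delimited SAM alignment line into typed fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(cols)}")
    tags = {}
    for field in cols[11:]:
        tag, typ, value = field.split(":", 2)
        tags[tag] = (typ, value)
    return SamRecord(
        cols[0], int(cols[1]), cols[2], int(cols[3]), int(cols[4]), cols[5],
        cols[6], int(cols[7]), int(cols[8]), cols[9], cols[10], tags)


def iter_alignments(lines: Iterable[str]) -> Iterator[SamRecord]:
    """Stream records one at a time, skipping @ header and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        yield parse_sam_line(line)


if __name__ == "__main__":
    example = "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0"
    rec = parse_sam_line(example)
    print(rec.rname, rec.pos, rec.cigar, rec.tags)
```

Because `iter_alignments` is a generator, passing it an open file handle processes the file line by line. This mirrors the streaming goal listed above.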

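Annotate the filtered variant calls with snpEff using the hg19 database. JVM options such as -Xmx4g must come before -jar:

$ java -Xmx4g -jar snpEff.jar -v hg19 SRR000982.filtered.variants.vcf > SRR000982.filtered.variants.annotated.vcf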
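snpEff writes its annotations into the INFO column of the output VCF. The following Python sketch reads them back. It assumes a snpEff version that emits the `ANN` field: comma-separated annotations, each made of pipe-separated subfields beginning with Allele, Annotation, Annotation_Impact, Gene_Name, and so on. Older versions used an `EFF` field instead, so check which one your output contains. The helper names are hypothetical.

```python
# Leading subfield names of one ANN entry (assumes the snpEff "ANN" format).
ANN_FIELDS = ["Allele", "Annotation", "Annotation_Impact", "Gene_Name",
              "Gene_ID", "Feature_Type", "Feature_ID", "Transcript_BioType"]


def parse_info(info: str) -> dict:
    """Turn a VCF INFO column (key=value;FLAG;...) into a dict."""
    out = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True  # flags without '=' become True
    return out


def parse_ann(info: str) -> list:
    """Return one dict per comma-separated annotation in the ANN field."""
    ann = parse_info(info).get("ANN")
    if not isinstance(ann, str):
        return []
    # zip() keeps only the leading subfields named in ANN_FIELDS
    return [dict(zip(ANN_FIELDS, entry.split("|"))) for entry in ann.split(",")]


def variants_with_impact(vcf_lines, impact="HIGH"):
    """Yield (CHROM, POS, gene) for each annotation with the given impact."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip ## meta lines and the #CHROM header
        cols = line.rstrip("\n").split("\t")
        for ann in parse_ann(cols[7]):
            if ann.get("Annotation_Impact") == impact:
                yield cols[0], int(cols[1]), ann.get("Gene_Name", "")
```

For example, iterating `variants_with_impact(open("SRR000982.filtered.variants.annotated.vcf"))` would list high-impact hits. A dedicated VCF parser is more robust for production use.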

GATK CombineVariants: see the GATK documentation page "CombineVariants - Combine variant records from different sources". Its usage examples include merging two separate callsets:

$ java -jar GenomeAnalysisTK.jar \
    -T CombineVariants \
    -R reference.fasta \
    --variant input1.vcf \
    --variant input2.vcf \
    -o output.vcf \
    -genotypeMergeOptions UNIQUIFY

The same page also gives an example for getting the union of calls made on the same samples.

Now it is time to import the VCF files into a genomics database (GenomicsDB). GATK developed this database format, but we don't need to know anything more about it here. Before running the next step, we first need to create a file containing the sample names. Create a text file called samples.txt and put the following contents into it.

• Recalibration using GATK. At the time of writing, CountCovariates is multithreaded but TableRecalibration is not. Increasing the number of threads therefore speeds up only part of the process, so users can speed up recalibration only to some extent by increasing GATK_THREADS.
• VCF Calls by GATK.
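Amdahl's law quantifies why raising GATK_THREADS helps only "to some extent" when CountCovariates is parallel but TableRecalibration is not. This Python sketch treats the fraction of wall-clock time spent in CountCovariates as a hypothetical input. The 0.6 below is an illustrative assumption, not a measured value, so you should time your own runs to get a real figure.

```python
def recalibration_speedup(parallel_fraction: float, threads: int) -> float:
    """Amdahl's law: overall speedup when only `parallel_fraction` of the
    runtime (here, CountCovariates) scales with `threads` and the rest
    (TableRecalibration) stays single-threaded."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


if __name__ == "__main__":
    # Hypothetical split: CountCovariates takes 60% of the single-thread runtime.
    for t in (1, 2, 4, 8, 16):
        print(t, round(recalibration_speedup(0.6, t), 2))
```

With that assumed 60% split, no thread count can push the overall speedup past 1 / (1 - 0.6) = 2.5, because the single-threaded TableRecalibration step always remains.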

After a VCF merge, read the merged VCF and look back at the BAM files to tell whether the missing genotypes were homozygous-reference or simply not called. If the number of reads is greater than min.depth, the missing genotype is called hom-ref.

Usage
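The decision rule above can be sketched in Python as follows. This sketch assumes diploid genotypes written as `./.` or `.|.` when missing. It also assumes the per-sample read depth at the site has already been obtained from each BAM, for example with samtools depth or pysam, and that lookup is not shown. The function names are hypothetical and are not the tool's actual implementation.

```python
# Missing diploid genotypes (assumption: diploid calls only).
MISSING = {"./.", ".|."}


def fix_missing_genotype(gt: str, depth: int, min_depth: int) -> str:
    """Return the genotype to write for one sample at one site.

    Called genotypes are kept unchanged. A missing genotype becomes
    homozygous reference (0/0) when the BAM read depth is greater than
    min_depth; otherwise it stays missing (not called).
    """
    if gt not in MISSING:
        return gt
    return "0/0" if depth > min_depth else gt


def fix_site(genotypes: dict, depths: dict, min_depth: int) -> dict:
    """Apply fix_missing_genotype to every sample at one site.

    `depths` maps sample name -> read depth; absent samples count as 0.
    """
    return {sample: fix_missing_genotype(gt, depths.get(sample, 0), min_depth)
            for sample, gt in genotypes.items()}
```

For example, `fix_site({"S1": "0/1", "S2": "./.", "S3": "./."}, {"S2": 25, "S3": 3}, min_depth=10)` keeps S1 as called, turns S2 into `0/0`, and leaves S3 missing because 3 reads are not enough evidence.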