
Variant Call Format

2021-10-19  吴十三和小可爱的札记

INFO fields

Name    Brief description
AC      allele count in genotypes, for each ALT allele, in the same order as listed
AF      allele frequency for each ALT allele, in the same order as listed (use this when estimated from primary data, not called genotypes)
AN      total number of alleles in called genotypes
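
To make the relationship between the three fields concrete, here is a minimal Python sketch that derives AC and AN from a list of GT strings. The function name allele_stats and the sample genotypes are made up for illustration. Note that the AF it returns is simply AC/AN, i.e. a frequency based on called genotypes, whereas the table above says the AF field is meant for frequencies estimated from primary data.

```python
from collections import Counter


def allele_stats(genotypes, n_alt):
    """Return (AC, AN, AF) computed from a list of GT strings.

    genotypes: GT values such as "0/1" or "1|2", one per sample
    n_alt:     number of alleles listed in the ALT column
    """
    counts = Counter()
    for gt in genotypes:
        # Phasing does not matter for counting, so treat | like /
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":  # missing alleles are not counted
                counts[int(allele)] += 1

    an = sum(counts.values())                      # all called alleles, REF included
    ac = [counts[i] for i in range(1, n_alt + 1)]  # one count per ALT allele, in ALT order
    af = [c / an if an else 0.0 for c in ac]       # genotype-based frequency AC/AN
    return ac, an, af


# Four hypothetical samples at a site with two ALT alleles
ac, an, af = allele_stats(["0/1", "1|2", "0/0", "./."], n_alt=2)
print("AC =", ac, "AN =", an, "AF =", af)
```

The sample with the missing genotype contributes nothing to AN, which is why AN can be smaller than ploidy times the number of samples.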


Genotype fields

GT : genotype, encoded as allele values separated by either / (unphased) or | (phased). The allele values are 0 for the reference allele (what is in the REF field), 1 for the first allele listed in ALT, 2 for the second allele listed in ALT, and so on.

For diploid calls, examples could be 0/1, 1|0, or 1/2.

- 0/0 : the sample is homozygous reference
- 0/1 : the sample is heterozygous, carrying 1 copy of each of the REF and ALT alleles
- 1/1 : the sample is homozygous alternate

For haploid calls, e.g. on Y, male nonpseudoautosomal X, or mitochondrion, only one allele value should be given; a triploid call might look like 0/0/1.

If a call cannot be made for a sample at a given locus, ‘.’ should be specified for each missing allele in the GT field (for example, ‘./.’ for a diploid genotype and ‘.’ for a haploid genotype).
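
The GT rules above can be sketched as a small Python parser. This is only an illustration: the function name describe_gt and the labels it returns are my own choices, not terms from the VCF specification.

```python
def describe_gt(gt):
    """Split a GT string into alleles and give a rough description of the call."""
    phased = "|" in gt
    alleles = gt.replace("|", "/").split("/")
    ploidy = len(alleles)

    if "." in alleles:
        kind = "missing"                      # at least one allele was not called
    elif ploidy == 1:
        kind = "haploid reference" if alleles[0] == "0" else "haploid alternate"
    elif all(a == "0" for a in alleles):
        kind = "homozygous reference"
    elif len(set(alleles)) == 1:
        kind = "homozygous alternate"
    else:
        kind = "heterozygous"                 # includes calls such as 1/2 and 0/0/1

    return {"alleles": alleles, "ploidy": ploidy, "phased": phased, "kind": kind}


for gt in ["0/0", "0/1", "1|0", "1/1", "1/2", "1", "0/0/1", "./.", "."]:
    print(gt, describe_gt(gt))
```

The ploidy is just the number of allele values, so a haploid call on Y and a triploid call such as 0/0/1 go through the same code path.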

Subset a VCF

# keep only the samples listed in sample.txt (one sample name per line)
bcftools view -S sample.txt input.vcf -Oz -o sample.vcf.gz

# drop the samples listed in sample.txt (the ^ prefix inverts the selection)
bcftools view -S ^sample.txt input.vcf -Oz -o sample.vcf.gz

# or, with vcftools, name each sample to drop explicitly
vcftools --vcf input.vcf --recode --recode-INFO-all --stdout --remove-indv sample1 --remove-indv sample2 --remove-indv sample3 > sample.vcf
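
To see what sample subsetting does to the file itself, here is a rough pure-Python sketch that keeps only selected sample columns. It is not how bcftools or vcftools are implemented. The function name subset_samples and the tiny in-memory VCF are assumptions made for illustration. It relies only on the standard layout: meta lines start with ##, the #CHROM header line has 9 fixed columns, and sample columns follow.

```python
def subset_samples(lines, keep):
    """Yield VCF lines restricted to the sample columns whose names are in `keep`."""
    cols = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line                        # meta lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # The first 9 columns (CHROM..FORMAT) are fixed; samples start at column 10
            cols = list(range(9)) + [
                i for i, name in enumerate(fields) if i >= 9 and name in keep
            ]
        if cols is None:
            raise ValueError("data line found before the #CHROM header line")
        yield "\t".join(fields[i] for i in cols)


# A tiny hypothetical VCF with three samples
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\t0/0",
]
for out_line in subset_samples(vcf, keep={"S1", "S3"}):
    print(out_line)
```

This sketch leaves the INFO column as it is, so values such as AC or AN may no longer match the remaining samples. For real data, the dedicated tools above are the safer choice.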