Why does a tabix fetch fail?
You can retrieve subsections of both bam and vcf files, either through our browser or on the command line.
In the browser, use our Data Slicer tool, which is documented here. The tool lets you retrieve specific subsections of externally visible bam and vcf files, and for vcf files you can also subsample by individual and population.
Both tasks are also possible on the command line.
For bam files you need to first install samtools. Then run a command like:
samtools view -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00154/alignment/HG00154.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam 17:7512445-7513455
For vcf files you need to use tabix with a command like:
tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.genotypes.vcf.gz 2:39967768-39967768
You can also subset the samples you get genotypes for using vcftools like this:
tabix -h ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20100804/ALL.2of4in... 17:1471000-1472000 | perl /nfs/1000g-work/G1K/work/bin/vcftools/perl/vcf-subset -c HG00098 | bgzip -c > /tmp/HG00098.20100804.genotypes.vcf.gz
vcftools will also allow you to compare your own snp calls with those in another vcf file using vcf-compare.
The vcf files can also provide information such as allele frequency for the variants. The allele frequency is given in the info column of the vcf file with the key AF. This frequency is based on all the individuals/populations in the file unless otherwise documented. When providing global variant files we also try to provide AF values for the super populations, and these will generally be found in the supporting directory under the release directory.
The info column also provides allele count and allele number values. Allele number (AN) is the total number of called alleles, e.g. AN=120 means 60 diploid individuals have called genotypes for this variant. Allele count (AC) is the number of alternative alleles observed. Where there is more than one alternative allele, you get a comma-separated list giving the allele counts in the same order as the alternative alleles appear in the ALT field. Please note that AC/AN will not always be the same as the given allele frequency (AF). This is because AF estimates are based on additional data as well as the allele numbers and counts.
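As a rough illustration of how AC and AN relate, the sketch below computes AC/AN for each alternative allele from the info column. The function name ac_over_an and the sample record are hypothetical and invented for this example; the record is not taken from any release file. As noted above, the result may differ from the AF value reported in the file.

```shell
# Print CHROM, POS, ALT allele and AC/AN for each VCF data line on stdin.
# Header lines (starting with #) and records lacking AC or AN are skipped.
ac_over_an() {
  awk -F'\t' '
    !/^#/ {
      ac = ""; an = ""
      n = split($8, info, ";")              # column 8 is INFO
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^AC=/) ac = substr(info[i], 4)
        if (info[i] ~ /^AN=/) an = substr(info[i], 4)
      }
      if (ac == "" || an == "" || an == 0) next
      m = split(ac, counts, ",")            # one count per ALT allele
      split($5, alts, ",")                  # column 5 is ALT, same order as AC
      for (j = 1; j <= m; j++)
        printf "%s\t%s\t%s\t%.4f\n", $1, $2, alts[j], counts[j] / an
    }'
}

# Invented example record with two alternative alleles.
printf '2\t39967768\t.\tA\tG,T\t.\tPASS\tAC=6,3;AN=120;AF=0.05,0.03\n' | ac_over_an
```

In practice the function could be placed at the end of a tabix pipeline, for example after a tabix -h command like the one shown earlier.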
The majority of our vcf files are currently in format version 4.0.
You can convert a vcf file into a plink/ped formatted file using vcftools other_formats.
You can also upload a remotely accessible vcf file which has a tabix index into our browser using the Attach Remote File option described in the tutorial.
Please note that all our vcf files use plain integers and X/Y for their chromosome names, in the Ensembl style, rather than chr1 in the UCSC style. If you request a subsection of a vcf file with a UCSC-style name, like
tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.genotypes.vcf.gz chr2:39967768-39967768
it will fail, because the file contains no chromosome named chr2.
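If your regions come from a UCSC-style source, one way to avoid this is to strip the chr prefix before calling tabix. The sketch below is only an illustration: the function name to_ensembl_region is hypothetical, and it handles only the leading chr prefix, not other naming differences such as the mitochondrial chromosome.

```shell
# Convert a UCSC-style region (chr2:100-200) to the Ensembl style (2:100-200).
# Regions without a leading "chr" are returned unchanged.
to_ensembl_region() {
  printf '%s\n' "${1#chr}"
}

to_ensembl_region chr2:39967768-39967768   # prints 2:39967768-39967768
to_ensembl_region X:1000-2000              # prints X:1000-2000
```

The converted region can then be passed to tabix in place of the original, for example as "$(to_ensembl_region chr2:39967768-39967768)".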