Our final release of the Phase 3 variant set is now available on the FTP site, including a newly added VCF file for chrY.
The chrY variant calls were generated using a different process from the one used for the autosomes; a separate README in the release directory describes the details.
The chrX VCF file has been updated to include the standard annotations, including per-site sequence depth (DP) and continental super-population allele frequencies.
The sites file in the release directory is now genome-wide (wgs), containing the autosomes, chrX and chrY.
Two algorithms were used to discover short tandem repeats (STRs) in the Phase 3 data. However, the STRs did not make it into the final integrated call set. They are now available separately here.
The VCF files in the main release directory are also now available here in BCF format for faster processing.
This release includes super-population allele frequencies in the main release VCFs and functional annotation from the Ensembl Variant Effect Predictor, alongside many other datasets in the supporting directory. The complete list of data is covered in the Supporting Directory README. The issues that have been raised and resolved since our initial release are covered in the Known Issues README.
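For readers working with the release VCFs programmatically, here is a minimal Python sketch of pulling the super-population allele frequencies out of a record's INFO column. It assumes the INFO keys are named `AFR_AF`, `AMR_AF`, `EAS_AF`, `EUR_AF` and `SAS_AF` (check the `##INFO` lines in the VCF header to confirm), and it works on plain-text VCF lines, so a BCF file would first need converting. The helper names `parse_info` and `super_pop_afs` are hypothetical, and the record in the example is a made-up toy line, not real data.

```python
# Assumed INFO key prefixes for the five continental super-populations;
# verify against the ##INFO header lines of the file you download.
SUPER_POPS = ("AFR", "AMR", "EAS", "EUR", "SAS")


def parse_info(info: str) -> dict:
    """Split a VCF INFO column (KEY=VALUE;FLAG;...) into a dict.

    Flag entries without a value are stored as True; an empty INFO ('.')
    yields an empty dict.
    """
    fields = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def super_pop_afs(vcf_line: str) -> dict:
    """Return {super-population: allele frequency} for one VCF data line.

    Multi-allelic sites carry comma-separated values; to keep the sketch
    short, only the first ALT allele's frequency is returned.
    """
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the 8th VCF column
    afs = {}
    for pop in SUPER_POPS:
        raw = info.get(f"{pop}_AF")
        if isinstance(raw, str):
            afs[pop] = float(raw.split(",")[0])
    return afs


if __name__ == "__main__":
    # Toy record for illustration only (not taken from the release).
    toy = "1\t12345\t.\tA\tG\t100\tPASS\tAC=10;AF=0.002;DP=1000;EAS_AF=0.01;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0"
    print(super_pop_afs(toy))
    print(parse_info(toy.split("\t")[7])["DP"])  # per-site depth, as a string
```

Values are left as strings by `parse_info` because INFO types vary per key; converting only the keys you need, as `super_pop_afs` does, avoids guessing types for the rest.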
Please send any questions about this data set to firstname.lastname@example.org
Recent project announcements
The EMBL-EBI FTP site will be at reduced capacity between November 21st and December 8th while EMBL-EBI consolidates its web infrastructure into a single data centre.
Where possible, please use the NCBI FTP site instead during this period.
If you have any questions about this, please email email@example.com
We have now added a set of chromosome X variants as part of our final release.
The genotypes and sites are available in our main release directory.
We will update the file during November to add functional annotation, super-population allele frequencies and per-site sequence depth information.
The 1000 Genomes Project is an international collaboration to produce an extensive public catalog of human genetic variation, including SNPs and structural variants, and their haplotype contexts. This resource will support genome-wide association studies and other medical research studies.
The genomes of about 2500 unidentified people from about 25 populations around the world will be sequenced using next-generation sequencing technologies. The results of the study will be freely and publicly accessible to researchers worldwide.
Further information about the project is available in the About tab. Information about downloading, browsing or using the 1000 Genomes data is available in the Data tab.