How do I find the reference genomes files?

Answer:

Our reference data sets can be found in technical/reference/ and this includes items like the reference genome, ancestral alignments and standard annotation sets.

There is also a frozen version of the reference data used for the pilot project available in pilot_data/technical/reference

Related questions:

Which reference assembly do you use?

Answer:

The reference assembly the 1000 Genomes Project has mapped sequence data to has changed over the course of the project.

For the pilot phase we mapped data to NCBI36. A copy of our reference fasta file can be found on the ftp site.

For the phase 1 and phase 3 analysis we mapped to GRCh37. Our fasta file which can be found on our ftp site called human_g1k_v37.fasta.gz, it contains the autosomes, X, Y and MT but no haplotype sequence or EBV.

Our most recent alignment release was mapped to GRCh38, this also contained decoy sequence, alternative haplotypes and EBV. It was mapped using an alt aware version of BWA-mem. The fasta files can be found on our ftp site.

Related questions: