Are there any statistics about how much sequence data is in IGSR?

Answer:

We do not provide summary statistics that span all the collections in IGSR. However, the index files provided for each data collection on the FTP site include information about that collection. The description below uses the 1000 Genomes Project files as an example; similar files are available for the other data collections.

For raw data, a sequence.index file contains base and read counts for each of the active FASTQ files.
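The following is a minimal Python sketch of how the base and read counts in a sequence.index file could be totalled. It assumes the file is tab-separated, that its header row names the columns READ_COUNT, BASE_COUNT and (optionally) WITHDRAWN, and that a WITHDRAWN value of "0" marks an active FASTQ file. These column names and conventions are assumptions based on the 1000 Genomes Project index layout. Please check the header of the specific index you download, because layouts differ between collections. The function name `summarise_sequence_index` is hypothetical.

```python
import csv
import io
from typing import TextIO


def summarise_sequence_index(handle: TextIO) -> dict:
    """Sum READ_COUNT and BASE_COUNT over active (non-withdrawn) FASTQ rows.

    Assumed layout: tab-separated, one header row naming the columns
    (a leading '#' on the header is tolerated), optional '##' metadata lines.
    """
    # Drop '##' metadata lines and blank lines before parsing.
    lines = (line for line in handle if line.strip() and not line.startswith("##"))
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [col.lstrip("#") for col in next(reader)]
    idx = {name: i for i, name in enumerate(header)}

    files = reads = bases = 0
    for row in reader:
        # Assumption: WITHDRAWN is "0" (or empty) for active files.
        if "WITHDRAWN" in idx and row[idx["WITHDRAWN"]] not in ("0", ""):
            continue
        reads += int(row[idx["READ_COUNT"]] or 0)
        bases += int(row[idx["BASE_COUNT"]] or 0)
        files += 1
    return {"files": files, "reads": reads, "bases": bases}


# Usage with a tiny, made-up index (not real IGSR data):
example = io.StringIO(
    "FASTQ_FILE\tWITHDRAWN\tREAD_COUNT\tBASE_COUNT\n"
    "a_1.fastq.gz\t0\t100\t10000\n"
    "b_1.fastq.gz\t1\t50\t5000\n"
    "c_1.fastq.gz\t0\t200\t20000\n"
)
print(summarise_sequence_index(example))
# {'files': 2, 'reads': 300, 'bases': 30000}
```

The withdrawn row is skipped, so only the two active files contribute to the totals.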

For aligned data, every BAM and CRAM file has an associated BAS file, which contains read-group-level statistics for the alignment. These statistics are also provided in collected form in alignment index files. The alignment indices for the alignments of the 1000 Genomes Project data to GRCh38 are available on the FTP site. There is also a historic alignment indices directory. It contains a .hsmetrics file with the results of the Picard tool CalculateHsMetrics for all the exome alignments. It also holds summary files comparing statistics between old and new alignment releases during the 1000 Genomes Project.
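Because BAS files report statistics per read group, per-sample figures have to be obtained by summing over read groups. The sketch below shows one way to do that in Python. It assumes a tab-separated BAS file with a header row containing the columns `sample`, `#_total_bases` and `#_mapped_bases`. These column names are an assumption based on the 1000 Genomes BAS layout, so confirm them against the header of the files you use. The function name `bases_per_sample` is hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import TextIO


def bases_per_sample(handle: TextIO) -> dict:
    """Aggregate read-group rows of a BAS file into per-sample totals.

    Assumed columns: 'sample', '#_total_bases', '#_mapped_bases'.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    totals = defaultdict(lambda: {"read_groups": 0, "total_bases": 0, "mapped_bases": 0})
    for row in reader:
        entry = totals[row["sample"]]
        entry["read_groups"] += 1
        entry["total_bases"] += int(row["#_total_bases"])
        entry["mapped_bases"] += int(row["#_mapped_bases"])
    return dict(totals)


# Usage with a tiny, made-up BAS file (not real IGSR data):
example = io.StringIO(
    "bam_filename\tsample\treadgroup\t#_total_bases\t#_mapped_bases\n"
    "x.bam\tS1\tRG1\t1000\t900\n"
    "x.bam\tS1\tRG2\t500\t450\n"
    "y.bam\tS2\tRG3\t800\t760\n"
)
print(bases_per_sample(example))
```

The same approach works on a collected alignment index if it repeats the BAS columns, but that is an assumption to verify against the actual file.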

How many individuals have been sequenced in IGSR projects and how were they selected?

Answer:

IGSR contains data from 4973 individuals, some of whom are related.

What sequencing platforms and methods were used by different projects within IGSR?

Answer:

Data in IGSR spans a wide range of technologies.

The technologies used in recent work are listed in the data portal and can be seen in the ‘Technology view’. This view does not include older technologies, such as those used in the pilot phase of the 1000 Genomes Project. The portal also allows data sets to be filtered by technology.

The most common form of data is Illumina genomic sequence data; however, the number of samples with long-read data from PacBio and Oxford Nanopore is growing.

Further details on the technologies used in each collection can be found in the publications accompanying that collection.
