Description of data download and file structure
(73,983 files, 3.0 GB)
As an example, AR2/DATASETS/Dvar0.005/STOX1_DIR/ contains the case and control datasets for simulation 54 of gene STOX1, simulated under AR2 with only deleterious effects introduced and with the locus explaining 0.5% of the total variance. All datasets are GZIP-compressed VCF files.
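Because the datasets are GZIP-compressed VCF files, they can be inspected without first decompressing them on disk. The sketch below uses Python's standard gzip module to count meta-information lines, sample columns and variant records; summarize_vcf_gz is a hypothetical helper, not part of the distributed files, and it assumes standard tab-separated VCF layout.

```python
import gzip


def summarize_vcf_gz(path):
    """Count meta lines, samples and variant records in a .vcf.gz file."""
    n_meta = 0
    samples = []
    n_records = 0
    # "rt" opens the compressed stream in text mode
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                n_meta += 1
            elif line.startswith("#CHROM"):
                # Columns after the 8 fixed fields plus FORMAT are sample IDs
                samples = line.rstrip("\n").split("\t")[9:]
            elif line.strip():
                n_records += 1
    return {"meta_lines": n_meta, "samples": samples, "records": n_records}
```

For example, calling summarize_vcf_gz on one of the *.cases.vcf.gz files in AR2/DATASETS/Dvar0.005/STOX1_DIR/ would report how many individuals and variants that simulated dataset holds.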
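Results are stored under AR2/RESULTS/, in directories such as AR2/RESULTS/Dvar0.01_MAFthr0.005. MAF cut-offs of 5% and 0.5% were used, and the MAFthr suffix of each directory name gives the cut-off applied.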
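The simulation setting of each directory appears to be encoded in its name. Assuming the convention suggested by the examples (Dvar followed by the fraction of variance explained, optionally followed by _MAFthr and the MAF cut-off as a fraction), a hypothetical helper like the one below can recover both values. The pattern is inferred from the directory names above, not taken from a formal specification.

```python
import re

# Hypothetical pattern inferred from "Dvar0.005" and "Dvar0.01_MAFthr0.005"
_DIR_PATTERN = re.compile(
    r"^Dvar(?P<dvar>\d*\.?\d+)(?:_MAFthr(?P<maf>\d*\.?\d+))?$"
)


def parse_setting_dir(name):
    """Return the variance explained and (optional) MAF cut-off encoded in a directory name."""
    m = _DIR_PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognised directory name: {name!r}")
    maf = m.group("maf")
    return {
        "variance_explained": float(m.group("dvar")),
        # None when the directory name carries no MAFthr suffix
        "maf_threshold": float(maf) if maf is not None else None,
    }
```

Under this reading, parse_setting_dir("Dvar0.01_MAFthr0.005") yields a variance fraction of 0.01 and a MAF cut-off of 0.005 (0.5%).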
Results are presented using the same scheme as PSEQ; result files from the other gene-based methods were reformatted in the same way for consistency. Each results file has 9 columns, of which columns 1, 2, 5, 6 and 7 are of interest to the reader. In that order, they contain the dataset tested, the locus, the number of variants tested, the test used and the resulting p-value.
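The column layout described above can be read with a short parser. This is a sketch under two assumptions not stated in the text: columns are whitespace-separated, and any header line can be recognised by a non-numeric value in the p-value column. The line is split into at most 9 fields so that a free-text ninth column may contain spaces.

```python
def parse_results(lines):
    """Extract columns 1, 2, 5, 6 and 7 (1-based) from a results file."""
    rows = []
    for line in lines:
        # At most 9 fields; the last one keeps any internal spaces
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) < 7:
            continue  # blank or truncated line
        try:
            p_value = float(fields[6])   # column 7: p-value
            n_variants = int(fields[4])  # column 5: number of variants
        except ValueError:
            continue  # header line or non-numeric entry such as "NA"
        rows.append({
            "dataset": fields[0],  # column 1
            "locus": fields[1],    # column 2
            "n_variants": n_variants,
            "test": fields[5],     # column 6
            "p_value": p_value,
        })
    return rows
```

Passing an open file handle (for example `parse_results(open(path))`) returns one dictionary per test, which can then be filtered, e.g. by p-value threshold.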
For instance, the script in AR2/SCRIPTS/Dvar0.005/STOX1_DIR/ was used to create datasets such as STOX1_HM.Dvar0.005_54.cases.vcf.gz, mentioned above. The relevant information is on line 5 of each script, where the HAPGEN2 command is run.
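Linking a dataset back to its generating command involves two steps: reading the setting from the dataset file name and pulling line 5 from the corresponding script. The sketch below does both. The file-name pattern is a hypothetical one inferred from the single example STOX1_HM.Dvar0.005_54.cases.vcf.gz; the meaning of the "HM" tag is not documented here, so it is kept as an opaque field.

```python
import re

# Hypothetical pattern inferred from one example:
# <gene>_<tag>.Dvar<variance>_<simulation>.<group>.vcf.gz
_DATASET_NAME = re.compile(
    r"^(?P<gene>[^_.]+)_(?P<tag>[^.]+)"
    r"\.Dvar(?P<dvar>\d*\.?\d+)_(?P<sim>\d+)"
    r"\.(?P<group>[^.]+)\.vcf\.gz$"
)


def parse_dataset_name(filename):
    """Split a dataset file name into its assumed components."""
    m = _DATASET_NAME.match(filename)
    if m is None:
        raise ValueError(f"unrecognised dataset name: {filename!r}")
    return {
        "gene": m.group("gene"),
        "tag": m.group("tag"),  # e.g. "HM"; meaning not documented here
        "variance_explained": float(m.group("dvar")),
        "simulation": int(m.group("sim")),
        "group": m.group("group"),  # e.g. "cases"
    }


def hapgen2_command(script_path):
    """Return line 5 (1-based) of a script, where the HAPGEN2 command is run."""
    with open(script_path) as fh:
        for lineno, line in enumerate(fh, start=1):
            if lineno == 5:
                return line.rstrip("\n")
    raise ValueError(f"{script_path} has fewer than 5 lines")
```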
A separate section on the website ("Dataset Generation", currently under construction) will contain required files and instructions for building datasets under various settings, including the ones employed in this work.
All download files are GZIP-compressed TAR archives (*.tar.gz).
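Each *.tar.gz download can be unpacked with Python's standard tarfile module. This is a minimal sketch: extract_archive is a hypothetical helper that rejects members whose path would land outside the destination directory and, on Python versions that provide it, also applies the library's "data" extraction filter.

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse members that would escape dest_dir (e.g. "../x" or absolute paths)
        for member in members:
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: let tarfile also sanitise links and permissions
            tar.extractall(dest_root, members=members, filter="data")
        else:
            # Older Pythons: rely on the path check above only
            tar.extractall(dest_root, members=members)
    return [m.name for m in members]
```

The shell equivalent is simply `tar -xzf archive.tar.gz`; the Python version is convenient when unpacking many archives from a script.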
Please be aware that some browsers handle downloaded files differently, which can change their MD5 checksums.
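To check whether a download was altered, its MD5 checksum can be recomputed locally and compared with the published value (presumably the one listed alongside each download). A minimal sketch using Python's standard hashlib, reading the file in chunks so that large archives do not need to fit in memory; md5sum here is a hypothetical helper name.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the result differs from the published checksum, re-downloading with a command-line tool such as wget or curl avoids any browser-side handling of the file.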