Post by Jian Yang on Jul 10, 2017 7:22:53 GMT
This method runs a BLUP analysis using summary data from a GWAS or meta-analysis, with LD estimated from a reference sample with individual-level data. Similar methods have been proposed in recent studies to predict complex traits and diseases from GWAS summary data (Vilhjálmsson et al. 2015 AJHG; Robinson et al. 2017 Nat Hum Behav) and to predict age from the summary data of transcriptome-wide association studies (Peters et al. 2015 Nat Comm).
Example
gcta64 --bfile test --cojo-file test.ma --cojo-sblup 1.33e6 --cojo-wind 1000 --thread-num 20
--cojo-file test.ma
Input file in GCTA-COJO format (see GCTA-COJO)
--cojo-sblup 1.33e6
Perform COJO-SBLUP analysis. The input parameter is m * (1 / h2SNP - 1), where m is the total number of SNPs used in this analysis (i.e. the number of SNPs in common between the summary data and the reference set) and h2SNP is the proportion of variance in the phenotype explained by all SNPs. h2SNP can be estimated with GCTA-GREML if individual-level data are available, or by LD score regression analysis of the summary data.
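As a rough illustration of how the input parameter might be computed, the sketch below evaluates m * (1 / h2SNP - 1) with awk. The values of m and h2SNP are hypothetical, chosen only because they give a result close to the 1.33e6 used in the example command; substitute the SNP count and h2SNP estimate from your own data.

```shell
# Hypothetical inputs (not from the original post):
m=1000000     # number of SNPs shared by the summary data and the reference set
h2snp=0.43    # SNP-based heritability, e.g. from GREML or LD score regression

# Parameter passed to --cojo-sblup: m * (1 / h2SNP - 1)
lambda=$(awk -v m="$m" -v h2="$h2snp" 'BEGIN { printf "%.3e", m * (1 / h2 - 1) }')
echo "--cojo-sblup $lambda"
```

With these assumed inputs the parameter comes out at roughly 1.33e6.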
--cojo-wind 1000
Specify a distance d (in kb). LD between SNPs more than d kb apart is ignored. The default is 10000 kb (i.e. 10 Mb) if not specified. We recommend a window size of 1 Mb (i.e. --cojo-wind 1000) for ease of computation.
Note: since SNPs on different chromosomes are treated as independent, the analysis can be performed for each chromosome separately:
gcta64 --bfile test --chr 1 --cojo-file test.ma --cojo-sblup 1.33e6 --cojo-wind 1000 --thread-num 20
gcta64 --bfile test --chr 2 --cojo-file test.ma --cojo-sblup 1.33e6 --cojo-wind 1000 --thread-num 20
...
gcta64 --bfile test --chr 22 --cojo-file test.ma --cojo-sblup 1.33e6 --cojo-wind 1000 --thread-num 20
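The per-chromosome commands above could be generated with a small shell loop. This is only a sketch: it prints the 22 commands for review rather than running them, the function name sblup_per_chr is made up for illustration, and the --out prefix (GCTA's general option for naming output files) is an addition that is not in the original commands, included so that each run writes to its own file.

```shell
# Print one COJO-SBLUP command per autosome (dry run).
# sblup_per_chr is a hypothetical helper name; --out is an added assumption.
sblup_per_chr() {
  chr=1
  while [ "$chr" -le 22 ]; do
    echo "gcta64 --bfile test --chr $chr --cojo-file test.ma --cojo-sblup 1.33e6 --cojo-wind 1000 --thread-num 20 --out test_chr$chr"
    chr=$((chr + 1))
  done
}

sblup_per_chr            # review the commands
# sblup_per_chr | sh     # uncomment to run them, assuming gcta64 is on PATH
```

Printing first makes it easy to check the file names and parameter value before committing 22 multi-threaded jobs.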
Output file format
rs10057531 T -0.0075 -0.0003681
rs10039735 T -0.0095 -0.000469503
rs1507712 A 0.0091 0.000438602
rs6869386 T -0.0068 -3.35496e-05
rs7734346 T 0.0106 0.000133413
...
Columns are: SNP, the coded allele, the effect size in the original GWAS summary data, and the BLUP estimate of the SNP effect (with all SNPs fitted jointly).
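To make the column layout concrete, the sketch below copies the five example rows shown above into a temporary file and prints, for each SNP, the ratio of the BLUP estimate (column 4) to the original GWAS effect (column 3). The temporary file and the ratio summary are illustrative assumptions, not something GCTA produces.

```shell
# Copy the example rows from this post into a temporary file
# (a stand-in for a real COJO-SBLUP output file).
out=$(mktemp)
cat > "$out" <<'EOF'
rs10057531 T -0.0075 -0.0003681
rs10039735 T -0.0095 -0.000469503
rs1507712 A 0.0091 0.000438602
rs6869386 T -0.0068 -3.35496e-05
rs7734346 T 0.0106 0.000133413
EOF

# Columns: 1 = SNP, 2 = coded allele, 3 = GWAS effect, 4 = BLUP effect.
# Print SNP, coded allele and the ratio BLUP effect / GWAS effect.
ratios=$(awk '$3 != 0 { printf "%s %s %.4f\n", $1, $2, $4 / $3 }' "$out")
echo "$ratios"
rm -f "$out"
```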
References
COJO-SBLUP method: Robinson et al. (2017) Genetic evidence of assortative mating in humans. Nat Hum Behav, 1:0016.
GCTA software: Yang J, Lee SH, Goddard ME and Visscher PM (2011) GCTA: a tool for Genome-wide Complex Trait Analysis. Am J Hum Genet, 88: 76-82. [PubMed ID: 21167468]