Post by cdl on Oct 31, 2014 11:49:57 GMT
Hi,
I am interested in using the conditional and joint analysis in GCTA, but I was wondering how the program aligns the effect directions in the summary statistics file with those in the reference data. Though both files contain information about the reference and non-reference alleles, they typically don't specify which strand those alleles refer to. This would seem to be especially problematic for A/T and C/G SNPs, since for those a strand mismatch would essentially be undetectable. For the other types of SNPs, are strand mismatches simply discarded, or internally flipped to the correct strand? Since misspecifying the sign of a SNP-SNP correlation could potentially induce a very strong (but spurious) joint/conditional effect for those SNPs, and I couldn't find any explicit mention of strand alignment in the documentation, I wanted to make sure how this is handled first.
Also, in a similar vein: are allele mismatches automatically dealt with as well (e.g. a SNP with A/C in the reference data and A/G in the summary statistics file)?
|
|
|
Post by Jian Yang on Nov 4, 2014 3:41:23 GMT
Yes, you are right.
1) The SNP alleles in the summary data and those in the reference data (PLINK genotype) need to be called on the same strand (positive strand).
2) In the summary file, you need to specify the coded allele (some would call it the "effect allele" or "reference allele") to which the effect size refers. In GCTA format, it needs to be "A1" (the second column of the input file). www.complextraitgenomics.com/software/gcta/cojo.html
3) GCTA will match the coded allele specified in the input summary data to that in the reference data. If the alleles are not matched, you will see some funny results where the effect sizes of two SNPs blow up.
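Since the alleles have to agree before the data go into GCTA, one option is to check them yourself beforehand. Below is a minimal Python sketch of such a pre-check. It is not what GCTA does internally; the function name `align_snp` and the status labels are made up for illustration, and it only handles single-nucleotide alleles. It compares the summary-statistics alleles with the reference alleles for one SNP and reports whether they match, are swapped (effect sign reversed), look like a strand flip, are strand-ambiguous (A/T or C/G), or do not match at all.

```python
# Hypothetical pre-check, not part of GCTA: classify one SNP's alleles
# in the summary statistics against the alleles in the reference data.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def align_snp(sum_a1, sum_a2, beta, ref_a1, ref_a2):
    """Return (status, beta relative to ref_a1), or (status, None) if unusable.

    sum_a1 is the coded (effect) allele in the summary statistics and
    beta is its effect size; ref_a1/ref_a2 are the reference-data alleles.
    """
    alleles = [a.upper() for a in (sum_a1, sum_a2, ref_a1, ref_a2)]
    if any(a not in COMPLEMENT for a in alleles):
        return ("unsupported", None)  # indels, missing codes, etc.
    s1, s2, r1, r2 = alleles

    # A/T and C/G SNPs read the same on both strands, so a strand
    # mismatch cannot be detected from the allele codes alone.
    if COMPLEMENT[s1] == s2:
        if {s1, s2} == {r1, r2}:
            return ("ambiguous", None)
        return ("mismatch", None)

    if (s1, s2) == (r1, r2):
        return ("match", beta)
    if (s1, s2) == (r2, r1):
        return ("swapped", -beta)  # same strand, other allele coded

    # Try the complementary strand.
    f1, f2 = COMPLEMENT[s1], COMPLEMENT[s2]
    if (f1, f2) == (r1, r2):
        return ("strand_flip", beta)
    if (f1, f2) == (r2, r1):
        return ("strand_flip_swapped", -beta)

    return ("mismatch", None)  # e.g. A/G in summary vs. A/C in reference
```

For example, `align_snp("A", "G", 0.1, "A", "C")` is reported as a mismatch, and `align_snp("A", "T", 0.1, "T", "A")` as ambiguous. What to do with the ambiguous and mismatched SNPs (drop them, or resolve them by comparing allele frequencies) is a separate decision that this sketch does not make.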