|
Post by Paul de Vries on Jun 20, 2014 7:02:03 GMT
I am doing conditional analysis (--massoc-cond cond.snplist), using a best-guess 1000G imputed data set with > 5000 unrelated individuals as the reference. The plink files do not contain information on position, so --massoc-wind is not doing its job.
What should we think of new loci that become significant only after conditional analyses?
In my case, there were 2 such loci on chromosome 10. They were borderline before the conditional analysis, and became significant after adjustment. The jumps are not very big, but also not tiny (for example from 3E-8 to 9E-9, where the threshold we are using for 1000G data is 2.5E-8).
They are 10 and 30 Mbp away from the nearest hit.
Is this analogous to gaining more power when adjusted for age and sex, and thus something to report? Or are these artifacts that require steps to be taken (adding positional information to the plink files)?
Thanks!
|
|
|
Post by Jian Yang on Jun 20, 2014 7:12:28 GMT
A reference sample of > 4000 individuals is recommended (Yang et al. 2012 NG) for LD estimation. You can use your own GWAS sample as the reference sample (the best option) or one of the large participating cohorts in a meta-analysis. You used > 5000 unrelated samples, which seems OK to me.
In Supplementary Fig 7 of Yang et al. 2012 NG, we have shown that SNPs more than 10 Mb apart are usually not in LD. I have two suggestions: 1) You can match your SNPs to 1000G data to get the chr and bp information. Then the --massoc-wind option will apply. 2) You may replicate this using a new reference sample (with a large sample size and a population structure similar to that of your discovery sample).
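A minimal sketch of suggestion 1, assuming the positions are available as a lookup from SNP ID to (chromosome, bp) built from 1000G data. It relies only on the standard six-column PLINK .bim layout (chromosome, SNP ID, genetic distance, bp position, allele 1, allele 2). The function name `annotate_bim` and the example rows are made up for illustration. It rewrites the text columns only and does not reorder variants, so the result would probably still need to go through PLINK before use.

```python
def annotate_bim(bim_lines, positions):
    """Fill in the chromosome and bp columns of PLINK .bim rows.

    bim_lines: iterable of .bim rows (chr, SNP id, cM, bp, allele 1, allele 2).
    positions: dict mapping SNP id -> (chromosome, bp); a hypothetical lookup
               that would be built from 1000G data.
    Returns (updated_rows, ids_without_a_match).
    """
    updated, missing = [], []
    for line in bim_lines:
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got: {line!r}")
        snp_id = fields[1]
        if snp_id in positions:
            chrom, bp = positions[snp_id]
            fields[0], fields[3] = str(chrom), str(bp)
        else:
            # Leave the row unchanged but report it to the caller.
            missing.append(snp_id)
        updated.append("\t".join(fields))
    return updated, missing


# Illustrative rows only: positions set to 0 mimic files without map info.
rows = ["0\trs1\t0\t0\tA\tG", "0\trs9\t0\t0\tC\tT"]
lookup = {"rs1": (10, 65000000)}
new_rows, unmatched = annotate_bim(rows, lookup)
```

SNPs listed in `unmatched` keep their original (empty) position and would need to be resolved or dropped separately.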
The conditional analysis is not analogous to that of adjusting for age and sex. The latter is more about adjusting the residual variance whereas conditional analysis relies on LD between SNPs.
|
|
|
Post by Paul de Vries on Jun 20, 2014 21:54:01 GMT
So even for SNPs not in LD with any of the conditional SNPs, the residual variance explained by the conditional SNPs is not adjusted away.
OK thanks! I will follow your suggestions.
Paul
|
|
|
Post by Jian Yang on Jun 21, 2014 1:34:15 GMT
We fixed the residual variance to be the same as the phenotypic variance (assuming the variance explained by an individual SNP is small) to avoid overfitting the model. Of course, this may be overly conservative, but it is robust. If SNPs are not in LD, the conditional result will be the same as that from the original GWAS/meta-analysis.
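As a rough illustration of the last point, here is the two-SNP case. This is a simplified sketch, not GCTA's actual code. It assumes standardized genotypes, the same sample size for both SNPs, and the residual variance fixed at the phenotypic variance; under those assumptions the conditional z-score of SNP 2 given SNP 1 reduces to (z2 - r*z1) / sqrt(1 - r^2), where r is the LD correlation estimated from the reference sample. The function names and the example z-scores are made up.

```python
import math


def conditional_z(z2, z1, r):
    """Approximate z-score of SNP 2 conditional on SNP 1.

    z2, z1: marginal z-scores from the original GWAS/meta-analysis.
    r:      LD correlation between the two SNPs in the reference sample.
    Assumes standardized genotypes, equal sample sizes, and residual
    variance fixed at the phenotypic variance (a simplification).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    return (z2 - r * z1) / math.sqrt(1.0 - r * r)


def two_sided_p(z):
    """Two-sided normal p-value for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical z-scores, chosen only for illustration.
z_hit, z_borderline = 6.0, 5.5
no_ld = conditional_z(z_borderline, z_hit, 0.0)          # unchanged
small_neg_ld = conditional_z(z_borderline, z_hit, -0.0061)  # slightly larger
```

With r = 0 the conditional z-score equals the marginal one. With a small negative r and effects in the same direction it increases slightly, so a borderline p-value can cross a fixed threshold.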
|
|
|
Post by Paul de Vries on Jun 24, 2014 14:25:28 GMT
Dear Jian,
I have now added chr and position information to the plink files, but the loci still remain. I am now running --massoc-slct, though.
The LD pattern is as follows:
rs1-rs2: -0.0061
rs1-rs3: 0
rs2-rs3: 0
rs1 is the significant one at position 65Mbp. rs2 is borderline at position 71Mbp. rs3 is borderline at 91Mbp.
Yet they are jointly significant.
Does this make sense?
|
|
|
Post by Jian Yang on Jun 25, 2014 6:45:42 GMT
I can understand rs1 and rs2 because of the small negative LD. I would try to replicate it using another independent reference sample. However, the 5e-8 threshold is somewhat arbitrary, and the conditional p-value also has some sampling variance due to LD estimation.
The p-value for rs3 should remain unchanged if there are no other genome-wide significant SNPs within its window (±10 Mb by default).
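The window rule can be sketched as below, using the approximate positions quoted earlier in the thread (rs1 at 65 Mbp, rs2 at 71 Mbp, rs3 at 91 Mbp, all on chromosome 10). The function name `in_window` and the inclusive boundary are assumptions for illustration, not GCTA's implementation.

```python
DEFAULT_WINDOW_BP = 10_000_000  # 10 Mb, the default quoted above


def in_window(chr_a, bp_a, chr_b, bp_b, window_bp=DEFAULT_WINDOW_BP):
    """True if two SNPs are on the same chromosome and within the window.

    SNPs outside the window are assumed to be in zero LD, so conditioning
    on one should leave the other's p-value unchanged.
    The inclusive boundary (<=) is an assumption of this sketch.
    """
    return chr_a == chr_b and abs(bp_a - bp_b) <= window_bp


# Approximate positions from the thread (chromosome, bp).
snps = {
    "rs1": (10, 65_000_000),
    "rs2": (10, 71_000_000),
    "rs3": (10, 91_000_000),
}

# For each SNP, list the other SNPs that fall inside its window.
neighbours = {
    name: [other for other in snps
           if other != name and in_window(*snps[name], *snps[other])]
    for name in snps
}
```

Under these positions rs1 and rs2 fall within each other's window, while rs3 has no neighbours, which is why its p-value would be expected to stay the same.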
|
|