tlong
New Member
Posts: 5
|
Post by tlong on Feb 24, 2016 0:00:23 GMT
Hi,
I am using cojo-slct to find independent SNPs from a GWAS analysis. I applied the approach to a few quantitative phenotypes, but the most significant SNP does not always appear in the independent SNP list. In the Nature Genetics 2012 paper, the "Model selection" section of the online methods says that the approach should "start with a model with the most significant SNP in the single-SNP meta-analysis across the whole genome with P value below a cutoff P value". So I assumed that, without backward selection, the most significant SNP would always be included in the independent SNP list. Could you please help me understand what is going on?
Here is the script I use:
gcta64 --bfile plink_bed_filename --extract single_gwas_significant_SNPs_for_the_phenotype --cojo-file single_gwas_SNPs_summary_statistics --cojo-slct --cojo-p 5e-8 --out independent_SNPs
Thank you very much!
Tao
|
Post by tlong on Feb 24, 2016 0:13:05 GMT
single_gwas_SNPs_summary file:
ID alt ref AltFreq SnpWeight SnpWeightSE pvalue num_GWAS_genomes
chr19_54136420_GT_G G GT 0.627 -0.020550385946 0.00312256019222 5.97698890709e-11 1953.0
chr19_54139698_G_T T G 0.5881 -0.0241693025795 0.00306677022351 5.352759898e-15 1953.0
chr19_54140254_A_G G A 0.6027 -0.0237988814953 0.00303911765836 7.89080046128e-15 1953.0
chr19_54140572_G_A A G 0.5901 -0.0244411811735 0.00307547961321 3.19968501637e-15 1953.0
chr19_54140778_C_T T C 0.5904 -0.0247518433672 0.00308044658761 1.60241114882e-15 1953.0
chr19_54140834_G_C C G 0.5911 -0.0248172780364 0.00307978084241 1.33620139603e-15 1953.0
chr19_54141163_A_G G A 0.5891 -0.023881926488 0.0030678910651 1.127272188e-14 1953.0
chr19_54142105_G_A A G 0.5893 -0.0243344738719 0.00308030164922 4.61844937839e-15 1953.0
chr19_54142554_C_T T C 0.5896 -0.0252110656229 0.00307803439002 4.64509507436e-16 1953.0
chr19_54149837_T_C C T 0.7268 -0.0206450571623 0.00305464352391 1.83432873939e-11 1953.0
chr19_54149862_C_CTT CTT C 0.7197 -0.0213092896388 0.00314557025403 1.64878494285e-11 1953.0
chr19_54151042_T_C C T 0.7243 -0.0201683711118 0.00306030397779 5.63104653336e-11 1953.0
chr19_54160472_C_G G C 0.498 0.0216668818683 0.00308337655278 2.90552062445e-12 1953.0
chr19_54161015_C_A A C 0.5182 -0.0214961043349 0.00307824920128 3.94161898387e-12 1953.0
chr19_54162234_T_C C T 0.6813 -0.0220631515587 0.00309038234083 1.3182043331e-12 1953.0
chr19_54170611_GTTTTA_G G GTTTTA 0.5591 -0.0283636096163 0.00308350995823 9.04854003302e-20 1953.0
chr19_54171009_T_C C T 0.555 -0.0233102837535 0.00299086282878 1.04916854658e-14 1953.0
chr19_54171328_C_T T C 0.5607 -0.0272444141288 0.00303937514945 7.16159504368e-19 1953.0
chr19_54172738_T_C C T 0.5748 -0.0262191806894 0.00305591583888 1.90627528528e-17 1953.0
chr19_54173068_T_C C T 0.5655 -0.0279448807592 0.00306646853372 1.93229224353e-19 1953.0
chr19_54173119_GC_G G GC 0.5499 -0.0263643896937 0.00308207483073 2.3636501922e-17 1953.0
chr19_54173307_C_G G C 0.5453 -0.0255508776745 0.00296373449593 1.3469999786e-17 1953.0
chr19_54173495_T_C C T 0.5676 -0.0281705493472 0.00306204330958 8.93331316279e-20 1953.0
chr11_61783884_T_C C T 0.3607 -0.0210599661074 0.00307867202433 1.05188921969e-11 1953.0
chr11_61784455_A_C C A 0.3592 -0.020806344318 0.00308181974079 1.92635764583e-11 1953.0
chr11_61785208_G_T T G 0.3541 -0.02077128801 0.00308270916616 2.10701425303e-11 1953.0
chr11_61790354_T_C C T 0.3554 -0.0207127027827 0.00308245948698 2.38509628797e-11 1953.0
chr11_61796827_G_T T G 0.3658 -0.0202166586798 0.00308441290644 7.12752787251e-11 1953.0
chr11_61798436_T_C C T 0.3661 -0.0201808869442 0.00307545009168 6.78668696587e-11 1953.0
chr11_61801834_C_G G C 0.3546 -0.0205322358179 0.00308336974328 3.57306867222e-11 1953.0
chr11_61802358_C_T T C 0.3543 -0.0202499697986 0.00308694358577 6.87872135393e-11 1953.0
chr11_61803876_C_G G C 0.3257 -0.0203775343064 0.00308455892108 5.06744420646e-11 1953.0
chr11_61803910_G_A A G 0.3198 -0.0208613360468 0.00308300338286 1.73817311701e-11 1953.0
chr11_61804006_T_C C T 0.3538 -0.0202233119426 0.00308651625076 7.23632325028e-11 1953.0
chr11_61806212_T_C C T 0.3469 -0.0203722946473 0.00305783109033 3.49524279678e-11 1953.0
chr11_61807686_A_G G A 0.3546 -0.0203466306548 0.00308386506134 5.36106142099e-11 1953.0
chr11_61812288_T_C C T 0.3193 -0.0202474271296 0.00308291153025 6.53731415649e-11 1953.0
chr11_61813163_C_T T C 0.3195 -0.0206087604763 0.00308370580624 3.04323086072e-11 1953.0
chr11_61814292_T_C C T 0.3257 -0.0203835518846 0.00308254479522 4.86187125373e-11 1953.0
chr11_61815236_T_C C T 0.32 -0.0205773402498 0.00307951702436 3.06590866951e-11 1953.0
chr11_61817672_A_G G A 0.3551 -0.0204715335963 0.00308338713455 4.07327668715e-11 1953.0
chr11_61826344_C_T T C 0.3543 -0.0203187231113 0.00308331379711 5.6465902131e-11 1953.0
chr11_61830500_A_G G A 0.3566 -0.0209346395344 0.00308404610131 1.50325250295e-11 1953.0
|
Post by tlong on Feb 24, 2016 0:14:08 GMT
output independent SNPs file:
Chr SNP bp refA freq b se p n freq_geno bJ bJ_se pJ LD_r
11 chr11_61783884_T_C 61783884 C 0.3607 -0.02106 0.00307867 1.05189e-11 1954.02 0.356929 -0.02106 0.00311454 1.36264e-11 0
19 chr19_54140834_G_C 54140834 C 0.5911 -0.0248173 0.00307978 1.3362e-15 1842.63 0.586241 -0.0671509 0.00503278 1.30661e-40 0.774139
19 chr19_54160472_C_G 54160472 G 0.498 0.0216669 0.00308338 2.90552e-12 1790.62 0.493519 0.220196 0.00710561 7.56195e-211 0.889408
19 chr19_54161015_C_A 54161015 A 0.5182 -0.0214961 0.00307825 3.94162e-12 1799.79 0.514457 -0.165854 0.00701608 1.52841e-123 0
|
Post by tlong on Feb 24, 2016 0:16:14 GMT
In this example, the single most significant SNP, chr19_54173495_T_C (p-value 8.93331316279e-20), is not included in the independent SNP list.
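For anyone wanting to repeat this check on their own files, here is a minimal Python sketch. It assumes the summary file is whitespace-delimited with the p-value in the 7th column, as in the file posted above; the function name `top_snp` and the inlined four-row excerpt are only for illustration.

```python
# Excerpt of the posted summary statistics (whitespace-delimited, p-value in column 7).
SUMSTATS = """ID alt ref AltFreq SnpWeight SnpWeightSE pvalue num_GWAS_genomes
chr19_54170611_GTTTTA_G G GTTTTA 0.5591 -0.0283636096163 0.00308350995823 9.04854003302e-20 1953.0
chr19_54173495_T_C C T 0.5676 -0.0281705493472 0.00306204330958 8.93331316279e-20 1953.0
chr19_54140834_G_C C G 0.5911 -0.0248172780364 0.00307978084241 1.33620139603e-15 1953.0
chr11_61783884_T_C C T 0.3607 -0.0210599661074 0.00307867202433 1.05188921969e-11 1953.0
"""

# SNP IDs reported in the posted cojo-slct output.
SELECTED = [
    "chr11_61783884_T_C",
    "chr19_54140834_G_C",
    "chr19_54160472_C_G",
    "chr19_54161015_C_A",
]


def top_snp(text):
    """Return (SNP ID, p-value) of the row with the smallest single-SNP p-value."""
    rows = [line.split() for line in text.strip().splitlines()[1:]]  # skip header
    best = min(rows, key=lambda r: float(r[6]))
    return best[0], float(best[6])


snp, p = top_snp(SUMSTATS)
print(snp, p, "selected" if snp in SELECTED else "not selected")
```

With a real file, the excerpt string would be replaced by the file contents and `SELECTED` by the SNP column of the cojo-slct output.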
|
Post by Jian Yang on Feb 24, 2016 5:42:02 GMT
One possible explanation is that chr19_54173495_T_C is no longer the most significant SNP when fitted jointly with other SNPs.
BTW, your result doesn't look right to me. Multiple SNPs in high LD with each other are being picked up by model selection, which is highly unlikely to be real. Normally this happens because the reference alleles of some of the SNPs in the region are mislabelled.
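One rough way to look for mislabelled alleles is to compare the allele frequency in the summary statistics (`freq`) with the frequency in the reference genotypes (`freq_geno`), both of which appear in the output posted above. The sketch below is only an illustration: the function name `classify_allele_check` and the tolerance of 0.05 are arbitrary assumptions, not part of GCTA. Note that for SNPs with frequency near 0.5 a swapped label cannot be detected this way, and that applies to two of the four selected SNPs here.

```python
def classify_allele_check(freq_sumstat, freq_ref, tol=0.05):
    """Compare summary-statistic and reference-panel allele frequencies.

    tol is a hypothetical tolerance chosen for illustration only.
    """
    direct = abs(freq_sumstat - freq_ref)           # labels agree
    flipped = abs((1.0 - freq_sumstat) - freq_ref)  # labels swapped
    if direct <= tol and flipped <= tol:
        return "ambiguous"      # frequency near 0.5: a flip cannot be ruled out
    if direct <= tol:
        return "consistent"
    if flipped <= tol:
        return "possible flip"
    return "mismatch"


# (freq, freq_geno) pairs copied from the posted output file.
selected = {
    "chr11_61783884_T_C": (0.3607, 0.356929),
    "chr19_54140834_G_C": (0.5911, 0.586241),
    "chr19_54160472_C_G": (0.498, 0.493519),
    "chr19_54161015_C_A": (0.5182, 0.514457),
}

results = {snp: classify_allele_check(f, g) for snp, (f, g) in selected.items()}
for snp, status in results.items():
    print(snp, status)
```

An "ambiguous" result says nothing either way; for those SNPs the allele coding would have to be checked against the original GWAS pipeline.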
|
Post by tlong on Feb 24, 2016 18:44:14 GMT
Thank you for your reply!
So step (4) described in the "Model selection" part of the NG2012 online methods is a kind of backward selection. In this case the most significant SNP is likely to be causal, yet it is not included in the model. Maybe there should be some kind of regularization on the number of SNPs included in the model?
The highly correlated SNPs have opposite signs of effect size. Maybe that's the reason they are all kept in the model?
|
Post by Jian Yang on Feb 24, 2016 23:10:06 GMT
Re 1) Yes, that's very likely given the complicated LD structure.
Re 2) In theory, it's possible that there are multiple linked causal variants with masking effects. However, I think it's unlikely to be true. As I said in my previous post, it's likely that the reference alleles are mislabelled.
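To see why opposite-sign marginal effects at two highly correlated SNPs give such large joint effects, here is a two-SNP illustration in Python. It is a simplification, not GCTA's actual computation: it assumes standardized genotypes (a reasonable approximation here since both SNPs have frequency close to 0.5), fits only two SNPs, and takes the LD_r value 0.889408 from the posted output as the correlation between chr19_54160472_C_G and chr19_54161015_C_A. Under those assumptions the joint effects are bJ1 = (b1 - r*b2) / (1 - r^2) and bJ2 = (b2 - r*b1) / (1 - r^2). The function name `joint_effects` is hypothetical.

```python
def joint_effects(b1, b2, r):
    """Joint effects of two SNPs from their marginal effects and LD correlation r.

    Assumes standardized genotypes, so the joint estimate is R^{-1} b
    with R = [[1, r], [r, 1]].
    """
    denom = 1.0 - r * r
    return (b1 - r * b2) / denom, (b2 - r * b1) / denom


b1 = 0.0216669    # marginal effect of chr19_54160472_C_G (from the posted output)
b2 = -0.0214961   # marginal effect of chr19_54161015_C_A (from the posted output)
r = 0.889408      # LD_r between the two SNPs, assumed to refer to this pair

# As reported: opposite signs with high positive r inflate both joint effects
# to roughly ten times the marginal effects.
as_reported = joint_effects(b1, b2, r)

# Hypothetical scenario: the effect allele of the second SNP is mislabelled,
# so its true effect has the opposite sign. The joint effects then shrink.
if_flipped = joint_effects(b1, -b2, r)

print("as reported:", as_reported)
print("if second SNP were flipped:", if_flipped)
```

The inflated values are of the same order as the bJ values in the output (0.220196 and -0.165854), which is what you would expect either from genuinely masked linked causal variants or from a swapped allele label. The sketch cannot tell the two apart; it only shows that a single flip would be enough to produce this pattern.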
|