swp
New Member
Posts: 2
|
Post by swp on Apr 20, 2015 19:27:44 GMT
I know this has been addressed in previous threads and the typical explanation is sample size. However, I'm considering nearly 10,000 samples...seems like something other than sample size might be causing this. Any suggestions for how to address this?
|
|
|
Post by Zhihong Zhu on Apr 26, 2015 12:03:42 GMT
Hi there,
Did you perform any QC on the dataset? One possible reason is that the true signal might be removed by the QC. How about the estimate without QC?
Cheers, Zhihong
|
|
swp
New Member
Posts: 2
|
Post by swp on Apr 27, 2015 18:01:05 GMT
Yes...I filtered out SNPs with maf<1%, but that's it. The same dataset has been used in GWAS before and significant hits are detectable. Not sure that's the problem. Any other ideas?
|
|
|
Post by Zhihong Zhu on Apr 30, 2015 2:00:10 GMT
Hi there,
It is possible that some SNPs were genome-wide significant but the total SNP heritability is very small, nearly zero. Maybe trying estimating SNP heritability chromosome by chromosome to find out what's going on in each chromosome.
Cheers, Zhihong
|
|
|
Post by jaleal on Jun 4, 2015 0:12:00 GMT
I am trying to estimate additive and dominance variance for some SNPs. I can fit each grm separately, but when fitting together I get the "Error: the information matrix is not invertible." I tried switching to the EM algorithm with --reml-alg 2, but it failed to converge after 200 iterations and was quite slow in getting there.
The sample size is 6000 individuals with 1070 SNPS.
I realize that the dominance variance is low. Is that the only reason? because I have many other replicates with equivalent or lower dominance variances that went through no problem. This error occurs about in about 1/10000 simulated replicates.
I can send you the data as examples if there is an easy way to do that.
GCTA.add.hsq Source Variance SE V(G) 0.000460 0.000048 V(e) 0.002211 0.000042 Vp 0.002670 0.000060 V(G)/Vp 0.172143 0.015372 logL 15076.009 logL0 14893.753 LRT 364.513 df 1 Pval 0 n 6000
GCTA.domi.hsq Source Variance SE V(G) 0.000014 0.000010 V(e) 0.002553 0.000047 Vp 0.002567 0.000047 V(G)/Vp 0.005442 0.003768 logL 14894.016 logL0 14893.753 LRT 0.526 df 1 Pval 0.2342 n 6000
|
|
|
Post by Zhihong Zhu on Jun 4, 2015 7:29:27 GMT
Hi Jaleal, Yes, I think it could be because dominance is low. I estimated hA (additive heritability at all SNPs) and hD (dominance heritability at all SNPs) for 79 traits, where on average hD/hA = 0.2, and var(hD) / var(hA)=2. Now your hA is almost equal to zero, it is possible that dominance GRM is not invertible. (Please refer details to www.cell.com/ajhg/abstract/S0002-9297(15)00009-9)Cheers, Zhihong
|
|
|
Post by jaleal on Jun 4, 2015 14:58:58 GMT
Hey Zhihong,
Okay, but thousands of my runs have those characteristics and do not fail. This example is not even close to having the smallest values. For example, this run was successful. All of my hA are on the order 0.05-0.25.
Source Variance SE V(G1) 0.000236 0.000031 V(G2) -0.000005 0.000004 V(e) 0.002140 0.000041 Vp 0.002370 0.000048 V(G1)/Vp 0.099412 0.012285 V(G2)/Vp -0.002312 0.001730
Am I misinterpreting the meaning of orthogonality in this context? I took it to mean that the estimate of variance associated with a grm is independent any other grm's fit. I expected I could run the following code and get the same estimates of V(A) and V(D) whether I did two separate calls to --reml --grm or one call to --reml --mgrm. Indeed for many examples this appears to be true, within some error. However, in my initial example I can successfully run two separate calls to --grm, but it fails with --mgrm. If the dominance grm is not invertible, why can be it be successfully fit with --reml --grm?
Thanks, Jaleal
Code: gcta64 --bfile data --autosome --make-grm --out test_add
##Do the domimance thing ##Generates test_domi.*.grm.* gcta64 --bfile data --autosome --make-grm-d --out test_domi
#make the file echo "test_add" > add_domi_grm. txt echo "test_domi.d" >> add_domi_grm. txt
#pheno file cut -d" " -f 1,2,6 data.ped > test.phen
#fit the models gcta64 --reml --grm test_add. --pheno test.phen --out GCTA.add --reml-no-constrain --reml-alg 1 --reml-maxit 200 gcta64 --reml --grm test_domi.d --pheno test.phen --out GCTA.domi --reml-no-constrain --reml-alg 1 --reml-maxit 200 gcta64 --reml --mgrm add_domi_grm.txt --pheno test.phen --out GCTAoutput --reml-no-constrain --reml-alg 1 --reml-maxit 200
|
|
|
Post by Zhihong Zhu on Jun 5, 2015 5:33:18 GMT
Hi Jaleal
No, I don't think you misinterpret the meaning of orthogonality and your conclusion (hD is tiny) is right. But I think there may be something unusual in your dataset.
1. That hD is independent from hA doesn't mean the slope is exactly 0 if I plot hA vs hD, e.g. Figure S2 in that paper. So hD from joint analysis (fitting both GRM_A and GRM_D) could be slightly different from hD from separate analysis (fitting GRM_D only). And hD is ~0 when you fit GRM_D only, i.e. it is likely that the hD would be negative when you fit both GRM_A and GRM_D
2. The GRM is not invertible, suggesting the hD is largely < 0. Although basically we can estimate the hD, it is possible that GRM_D is not invertible when sample size is not large, because var(hD) is large. In the paper, I compared the SNP heritability (i.e. both hA and hD) estimated from Hapmap3 SNPs with the ones estimated from genotyped SNPs in ARIC (figure was not shown in the paper). In general, hA (or hD) estimated from Hapmap3 SNPs is consistent with the ones estimated from genotyped SNPs. Whereas variance of hD from two SNP sets > variance of hA from two SNP sets. So hD is more sensitive to the quality of the data. But I don't think it could change the conclusion.
Go back to your results, I'm sorry I didn't read your results carefully last time. The SEs for both hA and hD are too large. Too my experience, the SE for hA would be ~0.05, for hD would be ~0.07, since sample size of ARIC data in the paper is ~6600 (Table S3 in the paper). I'm not sure why the SE is so large, maybe because you didn't remove related samples or include rare SNPs. Please check your dataset.
Cheers, Zhihong
|
|
|
Post by jaleal on Jun 5, 2015 16:20:29 GMT
Thanks Zhihong,
Getting rid of rare variants (MAF<0.01) seemed to solve this issue.
-Jaleal
|
|