|
Post by Fernanda on Aug 28, 2014 10:11:23 GMT
Hi,
I'm doing chromosomal partitioning on my sample. Since the sample size is very small I'm fitting one chr at the time and including PCs as covariates to correct for pop. structure.
The V(G)/Vp values I get per crh are 0.## (e.g. 0.52, 0.32, 0.25...) meaning that the sum of such values goes above 1 (100%). Checking Yang et. al 2011 NG, the values per chr are ~0.0## (e.g. 0.005, 0.002...) even when the chromosomes were fitted separately (same way I am fitting mine), and therefore the sum of var explained by all chr equals the var explained by all autosomal SNPs (calculated without chr partitioning).
I know that some overestimation of V(G) could be due to relatedness, but I dont think that explains the pattern, because I checked the behavior of the V(G) estimate when increasing the # of PCs included, and it stabilizes after the 10th PC is included.
Any explanation for such huge values?
Thanks, F.
|
|
|
Post by Zhihong Zhu on Sept 2, 2014 13:03:25 GMT
It may be due to SE. If the SE is similar to the estimate, I don't think the estimate is significantly different from zero.
And I would use all SNPs to estimate the hg^2 explained by all the SNPs, because the error of hg^2 could be accumulated if summing up the estimates directly.
Cheers, Zhihong
|
|
|
Post by Fernanda on Sept 4, 2014 13:43:57 GMT
Thanks for the comment Zhihong. But I think the SE is not the problem. The LRT is highly significant for all the chr, the worse chromosome is p= 0.001 and still the estimates per chr are too big to make sense. Here is an example of the results I get:
h2(vG/vP) se pvalue chr1 0.48 0.12 3.79E-06 chr2 0.52 0.11 1.90E-06 chr3 0.44 0.12 2.03E-04 chr4 0.52 0.11 5.39E-07 chr5 0.47 0.12 3.28E-05 chr6 0.64 0.10 1.61E-08 chr7 0.58 0.10 7.66E-08
|
|
|
Post by Zhihong Zhu on Sept 6, 2014 1:59:43 GMT
Did you remove related samples? I think you can check the distribution of off-diagonal elements of GRM, because extremely large value (ie. > 1) may cause the REML algorithm unstable and the related samples will increase the estimates.
And I think you can also do a gwas fitting the same PCs as covariate and draw a qq plot to find if there is any inflation.
Cheers, Zhihong
|
|
|
Post by Jian Yang on Sept 8, 2014 12:15:21 GMT
Thanks for the comment Zhihong. But I think the SE is not the problem. The LRT is highly significant for all the chr, the worse chromosome is p= 0.001 and still the estimates per chr are too big to make sense. Here is an example of the results I get: h2(vG/vP) se pvalue chr1 0.48 0.12 3.79E-06 chr2 0.52 0.11 1.90E-06 chr3 0.44 0.12 2.03E-04 chr4 0.52 0.11 5.39E-07 chr5 0.47 0.12 3.28E-05 chr6 0.64 0.10 1.61E-08 chr7 0.58 0.10 7.66E-08 It seems very likely that you have a large proportion of close relatives in the sample. The estimate of h2chr will certainly be inflated in a single chromosome GREML analysis. I would suggest a 22-component GREML analysis. In Yang et al. 2011 Nat Genet, we should how you can quantify the confounding of population stratification and cryptic relatedness by comparing the estimate of h2chr from a single component GREML to those from a 22-component joint GREML analysis.
|
|
yang
New Member
Posts: 4
|
Post by yang on May 6, 2015 9:47:49 GMT
Thanks for your suggestions. I come across a similar problem in my GREML analysis, where h2 per chromosome are inflated. However, when I try to follow Yang et al. 2011 Nat Genet, and perform a 22-component GREML analysis, the information matrix is no longer invertible.
My samples (~10K) are from a homogeneous population, and I have already removed all close relatives (genetic relationship > 0.025). Any suggestion on how this problem can be solved will be much appreciated.
Yang
|
|
|
Post by Zhihong Zhu on May 10, 2015 3:32:10 GMT
Hi Yang,
How about fitting only one GRM (not 22 GRMs together) in the model? I think GRMs for some chromosomes (at least one) are not invertible. Then you could estimate h2 by fitting GRM without doing any QC for the chromosomes (one chromosome by one chromosome). If the GRM is invertible, QC may be too strong.
Cheers, Zhihong
|
|
yang
New Member
Posts: 4
|
Post by yang on May 11, 2015 8:52:54 GMT
Thanks Zhihong for your reply. The single GRM (all autosomes combined) is invertible, so I will try to use a less stringent QC as you suggested. Please can you also provide an explanation of this particular behaviour of GREML analysis? As it is not clear to me why fitting a joint 22-component analysis become possible when individual chromosome is not invertible. Thanks!
Yang
|
|
|
Post by Zhihong Zhu on May 11, 2015 23:35:37 GMT
Hi Yang,
Because GREML needs to inverse a V matrix ( information matrix) which is equal to sum(sigma * GRM), where sigma is the variance (e.g. genetic variation for each chromosome). If the sigma is negative and very small (e.g. -3), the V matrix is not invertible.
I think at least one of the 22 GRMs (22 chromosomes) are not invertible. I'm not sure the reason, but it may be due to too stringent QC. Please check which GRMs (in the 22 GRMs) are not invertible.
Cheers, Zhihong
|
|
yang
New Member
Posts: 4
|
Post by yang on May 13, 2015 8:30:04 GMT
Hi Zhihong,
I have double checked, every individual chr-GRM is invertible, so is the single GRM of all autosome. The 'not invertible' issue only appears when I'm doing a 22-component joint analysis. Like Fernanda pointed out in this thread, my problem seems to come from an overestimation of per chr V(G) /Vp (~0.49 for each chromosome).
I have also re-done my analysis by relaxing the QC, but it didn't solve the non-invertible problem either. However, in my analysis, I am particularly interested in the heritability explained due to rare variants (0.1%<MAF<1%). Thus, I did not remove rare variants in the GREML analysis as the conventional approach. As I understand, the genotypes have been standardised in GCTA, z = x/(2*p*(1-p)), which supposes the contribution of each SNP is equal. I can imagine the summarisation of these rare variants might cause the information matrix become more difficult to invert. Do you have any suggestion for using GCTA when one would like to include rare variants in the heritability analysis?
Many thanks,
Yang
|
|
|
Post by Zhihong Zhu on May 14, 2015 12:31:01 GMT
Hi Yang,
So the rare SNPs may cause the collinearity, i.e. the GRMs are not invertible in the joint analysis. I haven't doing any analysis on rare SNPs before, but can you please try estimating GRMs based on MAFs ( i.e. rare SNPs and common SNPs) and merging them together, chromosome by chromosome?
Cheers, Zhihong
|
|
yang
New Member
Posts: 4
|
Post by yang on May 15, 2015 13:40:21 GMT
Thanks Zhihong. I have done common and rare variants GREML analysis separately, and both of them converge. However, I am not sure whether simply adding individual h2 estimate together (assuming no effect from population structure) is the correct approach. For example, I have the following estimates for the common variant (MAF>5%) analysis: Source Variance SE V(G) 0.227790 0.003831 V(e) 0.000000 0.001061 Vp 0.227791 0.003608 V(G)/Vp 0.999999 0.004658
and the following for rare (0.1%<MAF<1%): Source Variance SE V(G) 0.160655 0.003799 V(e) 0.000000 0.003228 Vp 0.160655 0.002489 V(G)/Vp 0.999998 0.020095
But the combined single GRM analysis gives me: Source Variance SE V(G) 0.346998 0.019509 V(e) 0.000000 0.015634 Vp 0.346999 0.006870 V(G)/Vp 0.999999 0.045055
Thus the combined h2_common + h2_rare = 0.388 is greater than the single GRM estimate (0.346). I suspect this phenomenon follows similar argument of the joint 22-component analysis present in Yang et al 2011 -- there is an effect of population structure, common SNPs are correlated with rare SNPs such that h2_sep is an overestimate in the separate analysis. Since the joint analysis encounters the collinearity problem, what would be an appropriate way to overcome this overestimation from separate analysis?
Thanks,
Yang
|
|
|
Post by Zhihong Zhu on May 16, 2015 4:53:42 GMT
Hi Yang, Apart from the question you asked, I think 1. the SE of V(G)/VP is a little small in three scenarios; 2. it is impossible that V(G)/VP =1. I don't know what's the reason, but please check your data carefully. GCTA power calculator could provide you the approximate SE of SNP heritability, spark.rstudio.com/ctgg/gctaPower/. Given a sample size of 10K, SE is ~0.03 for common SNPs. The var(GRM), variance of off-digonal elements of GRM, would be larger for rare SNPs, ie. SE of SNPs heritability for rare SNPs would be a little smaller than 0.03. But your SE is ~0.005 for common SNPs, i.e. there may be some errors in estimating GRM. So please check you data carefully. In terms of your question, VG and VG/VP are estimates (which have SEs) not true parameters, so we can not sum them up. The whole SNP heritability can be estimated by fitting the combined single GRM, and the SNP heritability for rare and common SNPs can be estimated by fitting two GRMs together, as what you did. Cheers, Zhihong
|
|