Post by pingye on Jun 2, 2015 4:09:53 GMT
Hi GCTA developers,
I have 2 questions regarding the use of GCTA:
(1) My understanding is that by default the GRM is computed as WW'/N, where W is the standardized genotype matrix and N is the number of SNPs used. Is that right? If the GRM is computed this way, I think the estimated h^2 is the proportion of variance explained by the causal variants that are in LD with the genotyped SNPs. I also see that in your 2010 Nat Genet paper you discuss adjusting Ajk in order to estimate the total heritability due to all causal variants (multiplying Ajk by beta). Is GCTA capable of doing that? If so, which option enables it? --grm-adj 0.1?
(2) Suppose I pick the top 1,000 SNPs from a GWAS of a quantitative trait and then use them to estimate the variance explained by the causal variants in LD with those 1,000 SNPs, using the same sample. I often see that the estimated h^2 is very large. I think the estimated h^2 is biased upward in this situation? My thinking is that the sample already "saw" those 1,000 SNPs in the GWAS analysis and "believed" them to be in LD with causal SNPs (with relatively large effects). Then, during heritability estimation, the same sample tends to push sigma^2(G) upward and sigma^2(e) downward, similar to overfitting. Am I on the right track? I would like to hear your comments on this.
Thank you very much for your help and I appreciate your time!!
Best,
Pingye
Post by Zhihong Zhu on Jun 4, 2015 14:18:38 GMT
Hi there,
1 - 1, Yes, W is the matrix of standardised genotypes and N is the number of SNPs.
1 - 2, Yes, the observed SNPs can capture the genetic variation at causal variants. The proportion of heritability captured by the observed SNPs depends on the average LD between the observed SNPs and the causal variants.
1 - 3, I don't think you need to do that. Basically, the observed SNPs (e.g. HapMap2 or HapMap3 SNPs) capture most of the genetic variation at causal variants.
2. I'm not sure about the SE of hg. Even for height, for which hg = 0.5, the 2,000 most strongly associated SNPs explained only ~20% of the phenotypic variation, and that was with a sample size of ~250,000. So if it is a polygenic phenotype, I don't think the hg of 1,000 SNPs would be very large.
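For readers who want to see the WW'/N definition from point 1 - 1 written out, here is a minimal NumPy sketch. It is not GCTA code: the function name `compute_grm`, the toy data, and the choice to estimate allele frequencies from the sample itself are all assumptions made for illustration, and GCTA's own handling of diagonal elements and missing genotypes may differ.

```python
import numpy as np


def compute_grm(genotypes):
    """Return WW'/N for an (individuals x SNPs) matrix of 0/1/2 allele counts.

    Hypothetical helper for illustration only; not part of GCTA.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    # Allele frequency of each SNP, estimated from the sample (an assumption).
    p = genotypes.mean(axis=0) / 2.0
    # Drop monomorphic SNPs, which cannot be standardised.
    keep = (p > 0.0) & (p < 1.0)
    x, p = genotypes[:, keep], p[keep]
    # Standardise each SNP: subtract the mean 2p, divide by sqrt(2p(1-p)).
    w = (x - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    n_snps = w.shape[1]
    return w @ w.T / n_snps


# Toy data: 200 unrelated individuals, 1,000 independent SNPs.
rng = np.random.default_rng(1)
freqs = rng.uniform(0.1, 0.5, size=1000)
toy_genotypes = rng.binomial(2, freqs, size=(200, 1000))
grm = compute_grm(toy_genotypes)

print(grm.shape)
print("mean diagonal:", np.trace(grm) / grm.shape[0])
```

For unrelated individuals the diagonal elements should sit close to 1 and the off-diagonal elements close to 0, which is a quick sanity check on any GRM.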
PS: title of the publication: Defining the role of common variation in the genomic and biological architecture of adult human height.
Cheers, Zhihong
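The overfitting concern raised in question (2) can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not GCTA's method: it uses ordinary least squares and adjusted R^2 as a simple stand-in for REML, the phenotype is pure noise (true heritability of zero), and the sample sizes, SNP counts, and the helper name `adjusted_r2` are all made up for the example. It compares the variance explained by the top SNPs in the sample that selected them against the same SNPs in an independent sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 500, 5000, 50  # individuals, SNPs, number of top SNPs kept


def adjusted_r2(x, y):
    """Adjusted R^2 from an OLS fit of y on x with an intercept."""
    design = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - resid.var() / y.var()
    n_obs, n_pred = x.shape
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_pred - 1)


# Two independent samples with the same allele frequencies.
freqs = rng.uniform(0.1, 0.5, size=m)
geno_gwas = rng.binomial(2, freqs, size=(n, m)).astype(float)
geno_indep = rng.binomial(2, freqs, size=(n, m)).astype(float)

# Phenotypes are pure noise, so no SNP has any real effect.
y_gwas = rng.standard_normal(n)
y_indep = rng.standard_normal(n)

# "GWAS": rank SNPs by absolute marginal correlation with the phenotype.
z = (geno_gwas - geno_gwas.mean(axis=0)) / geno_gwas.std(axis=0)
corr = z.T @ (y_gwas - y_gwas.mean()) / (n * y_gwas.std())
top = np.argsort(-np.abs(corr))[:k]

r2_same_sample = adjusted_r2(geno_gwas[:, top], y_gwas)
r2_independent = adjusted_r2(geno_indep[:, top], y_indep)

print("same sample:       ", round(r2_same_sample, 3))
print("independent sample:", round(r2_independent, 3))
```

Because the SNPs were chosen for their chance correlation with the phenotype in the first sample, the same-sample estimate comes out well above zero even though nothing is heritable here, while the independent-sample estimate stays near zero. Selecting SNPs in one sample and estimating their variance explained in another avoids this.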