Can I run a GREML analysis in a small sample? | Complex Trait Genetics Forum


Post by Jian Yang on Sept 15, 2015 1:06:08 GMT

It is not recommended to run a GCTA-GREML analysis in a small sample. When the sample size is small, the sampling variance (standard error squared) of the estimate is large (see GCTA-GREML power calculator), so the estimate of SNP-heritability (h2-SNP) will fluctuate a lot and could even hit the boundary (0 or 1). Therefore, when the sample size is small, it is not surprising to observe an estimate of SNP-heritability being 0 or 1 (with a large standard error).

If the estimate hits the boundary (0 or 1), the phenotypic variance-covariance matrix (V) will often become non-invertible, and you will see the error message
"Error: the variance-covaraince matrix V is not positive definite"
or the REML analysis will fail to converge, with the error message
"Log-likelihood not converged"

Q1: How many samples are required for a GCTA-GREML analysis?
A1: For unrelated individuals and common SNPs, you will need at least 3160 unrelated samples to get the SE of the h2-SNP estimate down to 0.1 (see Visscher et al. 2014 PLoS Genet). For a GREML analysis with multiple GRMs and/or GRM(s) computed from 1000G imputed data, a much larger sample size is required (see Yang et al. 2015 Nat Genet).
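
As a rough illustration of where the 3160 figure comes from, here is a minimal Python sketch. It assumes the commonly quoted approximation var(h2-SNP) ≈ 2 / (N^2 * var(A_ij)), with the variance of the off-diagonal GRM entries taken to be about 2e-5 for unrelated individuals and common SNPs, which gives SE ≈ 316 / N. The formula, the 2e-5 value and the function names are assumptions for illustration and are not stated in this post; use the GCTA-GREML power calculator for real study planning.

```python
import math

# Assumption: variance of the off-diagonal GRM entries for unrelated
# individuals and common SNPs. This value is illustrative, not from the post.
VAR_GRM_OFFDIAG = 2e-5


def approx_se_h2_snp(n_samples, var_grm=VAR_GRM_OFFDIAG):
    """Approximate SE of the h2-SNP estimate: sqrt(2 / (N^2 * var(A_ij)))."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return math.sqrt(2.0 / (n_samples ** 2 * var_grm))


def samples_needed(target_se, var_grm=VAR_GRM_OFFDIAG):
    """Smallest N whose approximate SE is at or below target_se."""
    if target_se <= 0:
        raise ValueError("target_se must be positive")
    return math.ceil(math.sqrt(2.0 / var_grm) / target_se)


if __name__ == "__main__":
    # Print the approximate SE for a few hypothetical sample sizes.
    for n in (500, 1000, 3160, 10000):
        print(n, round(approx_se_h2_snp(n), 3))
```

Under these assumptions the SE at N = 3160 is very close to 0.1, and the SE shrinks in proportion to 1/N, so halving the SE requires roughly doubling the sample size.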

Q2: Why do I need a small standard error (SE)?
A2: The 95% confidence interval (CI) is approximately h2-SNP estimate ± 1.96 * SE. If the SE is too large, the 95% CI will cover the whole parameter space (from 0 to 1) so that you won't be able to make any meaningful inference from the estimate.
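
The interval arithmetic in A2 can be sketched in a few lines of Python. The function names and the example values (an estimate of 0.5 with an SE of 0.3 or 0.1) are hypothetical and chosen only to show the effect of the SE.

```python
def approx_ci(estimate, se, z=1.96):
    """Approximate 95% CI: estimate +/- z * SE (not truncated to [0, 1])."""
    if se < 0:
        raise ValueError("se must be non-negative")
    return estimate - z * se, estimate + z * se


def covers_parameter_space(lower, upper, lo=0.0, hi=1.0):
    """True if the interval spans the whole range from lo to hi."""
    return lower <= lo and upper >= hi


if __name__ == "__main__":
    # Hypothetical estimate of 0.5 with a large and a small SE.
    for se in (0.3, 0.1):
        lower, upper = approx_ci(0.5, se)
        print(se, round(lower, 3), round(upper, 3),
              covers_parameter_space(lower, upper))
```

With an SE of 0.3 the interval around 0.5 extends below 0 and above 1, so it rules nothing out; with an SE of 0.1 it narrows to roughly 0.30 to 0.70.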