SNP-based heritability from imputed genotype data

anbai
New Member

Posts: 10

Post by anbai on Aug 6, 2023 1:10:19 GMT

Hi,

First, thank you for this wonderful software!
I have two questions: 1) SNP-based heritability, and 2) fastGWA using imputed genotype data.

Question 1: I am estimating the h2 of a quantitative trait by following this tutorial: yanglab.westlake.edu.cn/software/gcta/#GREMLinWGSorimputeddata

I followed the steps exactly, except that I ran Step 1 separately for each chromosome, because running all 22 autosomes at once takes too long. I then merged the per-chromosome output files.
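A minimal sketch of the per-chromosome merge described above, assuming each run wrote a whitespace-delimited table with one header line. The file names (chr1.score.ld ... chr22.score.ld, all_chr.score.ld) and the helper merge_ld_scores are hypothetical. Only the first file's header is kept, so that the quartile stratification in Step 2 sees a single table:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Concatenate per-chromosome LD-score tables into one file, keeping the
# header line from the first input only.
# Usage: merge_ld_scores OUT IN1 [IN2 ...]   (OUT must not be one of the inputs)
merge_ld_scores() {
  local out="$1"
  shift
  # FNR == 1 is the first line of each input; NR != 1 skips it for every file after the first.
  awk 'FNR == 1 && NR != 1 { next } { print }' "$@" > "$out"
}

# Hypothetical usage, assuming files named chr1.score.ld ... chr22.score.ld:
# files=()
# for chr in $(seq 1 22); do files+=("chr${chr}.score.ld"); done
# merge_ld_scores all_chr.score.ld "${files[@]}"
```

The quartiles should then be computed on the merged table, not per chromosome, so that the cut-offs are genome-wide.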

The program ran without error, but the result is unexpected: h2 ≈ 0.

I also merged the multiple GRMs into a single one, which is the input shown in the log below: 'segment_based_ld_score_stratfied'

Here is the log:

*******************************************************************
* Genome-wide Complex Trait Analysis (GCTA)
* version 1.93.2 beta Linux
* (C) 2010-present, Jian Yang, The University of Queensland
* Please report bugs to Jian Yang <jian.yang.qt@gmail.com>
*******************************************************************
Analysis started at 17:40:43 EDT on Sat Aug 05 2023.
Hostname: ***

Accepted options:
--reml
--grm segment_based_ld_score_stratfied
--pheno pheno_normalized_residualized_h2.phen
--keep keep_for_gcta_h2.txt
--thread-num 8
--mpheno 1
--grm-adj 0
--out h2

Note: the program will be running on 8 threads.

Reading IDs of the GRM from [segment_based_ld_score_stratfied.grm.id].
33743 IDs read from [segment_based_ld_score_stratfied.grm.id].
Reading the GRM from [segment_based_ld_score_stratfied.grm.bin].
Reading the number of SNPs for the GRM from [segment_based_ld_score_stratfied.grm.N.bin].
GRM for 33743 individuals are included from [segment_based_ld_score_stratfied.grm.bin].
Reading phenotypes from [pheno_normalized_residualized_h2.phen].
Non-missing phenotypes of 25784 individuals are included from [pheno_normalized_residualized_h2.phen].
25784 individuals are kept from [keep_for_gcta_h2.txt].
Adjusting the GRM for sampling errors ...

25784 individuals are in common in these files.

Performing REML analysis ... (Note: may take hours depending on sample size).
25784 observations, 1 fixed effect(s), and 2 variance component(s)(including residual variance).
Calculating prior values of variance components by EM-REML ...
Updated prior values: 0.495765 0.971934
logL: -16812
Running AI-REML algorithm ...
Iter. logL V(G) V(e)
1 -13043.33 0.00000 0.63638 (1 component(s) constrained)
2 -14287.52 0.00000 0.70855 (1 component(s) constrained)
3 -13624.12 0.00000 0.91135 (1 component(s) constrained)
4 -12850.05 0.00000 0.93494 (1 component(s) constrained)
5 -12825.20 0.00000 0.95211 (1 component(s) constrained)
6 -12812.96 0.00000 0.96440 (1 component(s) constrained)
7 -12807.02 0.00000 0.97308 (1 component(s) constrained)
8 -12804.17 0.00000 0.97914 (1 component(s) constrained)
9 -12802.81 0.00000 0.98336 (1 component(s) constrained)
10 -12802.17 0.00000 0.98627 (1 component(s) constrained)
11 -12801.86 0.00000 0.99262 (1 component(s) constrained)
12 -12801.60 0.00000 0.99266 (1 component(s) constrained)
13 -12801.60 0.00000 0.99266 (1 component(s) constrained)
Log-likelihood ratio converged.

Calculating the logLikelihood for the reduced model ...
(variance component 1 is dropped from the model)
Calculating prior values of variance components by EM-REML ...
Updated prior values: 0.99266
logL: -12801.59520
Running AI-REML algorithm ...
Iter. logL V(e)
1 -12801.60 0.99266
Log-likelihood ratio converged.

Summary result of REML analysis:
Source Variance SE
V(G) 0.000001 0.000464
V(e) 0.992658 0.008755
Vp 0.992659 0.008743
V(G)/Vp 0.000001 0.000468

Sampling variance/covariance of the estimates of variance components:
2.155698e-07 -2.140670e-07
-2.140670e-07 7.664834e-05

Question 2: I also want to run fastGWA by following this tutorial: yanglab.westlake.edu.cn/software/gcta/#fastGWA

For the full-dense GRM, I used the one above (segment_based_ld_score_stratfied) to generate the sparse GRM. The program has now been running for one day (using 8 threads) and is still at this step:

After matching all the files, 44875 individuals to be included in the analysis.
Estimating the genetic variance (Vg) by fastGWA-REML (grid search)...

Any suggestions for my two questions?

Thanks

Post by anbai on Aug 6, 2023 11:07:25 GMT

I think the problem may come from how I merged the multiple GRMs. I used this code:

for quan in {1..4}; do
cat ${output_dir}/segment_based_ld_score_stratfied_snpquantile_${quan}.grm.id > ${output_dir}/segment_based_ld_score_stratfied.grm.id
cat ${output_dir}/segment_based_ld_score_stratfied_snpquantile_${quan}.grm.bin > ${output_dir}/segment_based_ld_score_stratfied.grm.bin
cat ${output_dir}/segment_based_ld_score_stratfied_snpquantile_${quan}.grm.N.bin > ${output_dir}/segment_based_ld_score_stratfied.grm.N.bin
done

I confused this with "--make-grm-part". The LD-score stratification splits the SNPs into four groups by quartile, whereas --make-grm-part splits the GRM computation by individuals, right?

I know GCTA has a "--mgrm" option, but do you have example code for fitting these multiple GRMs together?
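A note on the loop above: because it uses ">" and not ">>", each iteration overwrites the previous one, so the "merged" files contain only quantile 4. Appending would not help either, since the four GRMs are built from different SNP sets over the same individuals and are meant to be fitted as separate variance components. A minimal sketch of the --mgrm route, assuming the same file prefixes as in the loop; the list file name multi_grm.txt and the output prefix h2_mgrm are hypothetical, and the script only prints the GCTA command so it can be checked before running:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Same variable as in the loop above; defaults to the current directory if unset.
output_dir="${output_dir:-.}"
mgrm_list="${output_dir}/multi_grm.txt"   # hypothetical name for the list file

# --mgrm expects a text file with one GRM prefix per line (no .grm.bin suffix).
: > "$mgrm_list"
for quan in 1 2 3 4; do
  printf '%s\n' "${output_dir}/segment_based_ld_score_stratfied_snpquantile_${quan}" >> "$mgrm_list"
done

# Print the REML command without executing it; h2_mgrm is a hypothetical output prefix.
echo "gcta64 --reml --mgrm ${mgrm_list} --pheno pheno_normalized_residualized_h2.phen --keep keep_for_gcta_h2.txt --thread-num 8 --out h2_mgrm"
```

Concatenating with cat is the documented way to reassemble the pieces written by --make-grm-part, where every piece covers the same SNPs but a different block of individuals.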

Thanks

Last Edit: Aug 6, 2023 14:01:06 GMT by anbai

Post by anbai on Aug 6, 2023 14:00:15 GMT

I reran the h2 estimate using the --mgrm option and got a reasonable result:

Source Variance SE
V(G1) 0.096528 0.023989
V(G2) 0.088521 0.014297
V(G3) 0.055769 0.009791
V(G4) 0.030261 0.005524
V(e) 0.721661 0.024656
Vp 0.992739 0.008822
V(G1)/Vp 0.097234 0.024120
V(G2)/Vp 0.089168 0.014335
V(G3)/Vp 0.056177 0.009825
V(G4)/Vp 0.030482 0.005541

Sum of V(G)/Vp 0.273061 0.024641
logL -12692.462
logL0 -12700.705
LRT 16.487
df 1
Pval 2.4490e-05
n 25784

Does it make sense to simply add the four V(G1..4)/Vp estimates from the four GRMs to get the final V(G)/Vp?

My h2 estimate is higher than the one reported in a previous study of the same population: 0.27 ± 0.02 vs. 0.19 ± 0.02.
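As a quick arithmetic check on the output above, the four per-component ratios do add up to the reported "Sum of V(G)/Vp" line. The sketch below uses only numbers copied from that output. It also computes the standard error the sum would have if the four estimates were independent, which is an assumption made purely for comparison:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sum of the four V(Gi)/Vp estimates reported by GCTA.
sum_h2=$(awk 'BEGIN { printf "%.6f", 0.097234 + 0.089168 + 0.056177 + 0.030482 }')

# SE of the sum IF the four estimates were uncorrelated (comparison only).
naive_se=$(awk 'BEGIN { printf "%.6f", sqrt(0.024120^2 + 0.014335^2 + 0.009825^2 + 0.005541^2) }')

echo "sum of V(Gi)/Vp: ${sum_h2}"
echo "SE assuming independent components: ${naive_se}"
```

The independence-based SE comes out at roughly 0.030, larger than the 0.024641 that GCTA reports. That gap suggests the reported SE accounts for the (apparently negative) sampling covariances between components, so the "Sum of V(G)/Vp" line and its SE are the values to quote, not a hand-computed SE.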

Thanks for your feedback

Last Edit: Aug 6, 2023 14:00:52 GMT by anbai

Post by anbai on Aug 6, 2023 19:40:28 GMT

Update on the fastGWA results: I used the GRM from the first quantile (segment_based_ld_score_stratfied_quantile_1.*) to make a sparse GRM (--sparse-cutoff 0.05), then supplied this sparse GRM to the fastGWA command. The results make sense: they are similar to those from PLINK's linear regression model, but with more significant P-values!

Can you explain more about this? Is it sound to build the GRM (or the sparse GRM) from only a subset of the SNPs (i.e., the first quantile of SNPs by LD score)?
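For reference, a dry-run sketch of the two-step route in the fastGWA tutorial: build a sparse GRM from a full-dense GRM with --make-bK-sparse, then pass it to fastGWA with --grm-sparse. All file prefixes here (dense_grm, sp_grm, geno, fastgwa_assoc) are hypothetical placeholders, and the script only prints the commands. Whether a GRM built from a single LD-score quantile is adequate for this purpose is the open question and is not settled by this sketch:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file prefixes; replace with your own.
dense_grm="dense_grm"   # full-dense GRM built from one SNP set for all individuals
sparse_grm="sp_grm"     # output prefix for the sparse GRM
geno="geno"             # PLINK bed/bim/fam prefix with the SNPs to test
pheno="pheno_normalized_residualized_h2.phen"

# Step 1: keep only GRM entries above 0.05 (the cutoff used in the post above).
make_sparse=(gcta64 --grm "$dense_grm" --make-bK-sparse 0.05 --out "$sparse_grm")

# Step 2: mixed-model association using the sparse GRM.
run_fastgwa=(gcta64 --bfile "$geno" --grm-sparse "$sparse_grm" --fastGWA-mlm
             --pheno "$pheno" --thread-num 8 --out fastgwa_assoc)

# Dry run: print the commands without executing them.
echo "${make_sparse[*]}"
echo "${run_fastgwa[*]}"
```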

Thanks