Post by moldach on Jan 9, 2019 20:12:58 GMT
I'm running into this apparently common error and trying to understand/troubleshoot it.
I ran GREML first on two cohorts from a GWAS (caucasians and africans) separately but ran into issues of high SE (and p-value) because of the low sample size (this was done because a post-doc who requested the analysis wanted to compare these against LDHUB which calculates SNP heritability based on GWAS summary statistics for a specific ethnic group). I also wanted to combine these two and run GREML with ethnicity as a covariate.
It's my understanding that if the variance-covariance V is negative definite this could be cause by a low sample number; however, this shouldn't be the case for positive definite. Furthermore, "for unrelated individuals and common SNPs, you will need at least 3160 unrelated samples to get a SE down to 0.1 (see Visscher et al. 2014 PLoS Genet). Once I combine Caucasian and Africans groups (2061 + 1131 = 3192) I should have sufficient power?
The advice here (gcta.freeforums.net/thread/191/gcta-reml-error) suggests that there may be something wrong with the GRMs and to check for any missing values in the GRM(s) with the following R function:
So in GCTA::GREML Step #3 we created multiple GRMs (e.g.):
So I check for missing values in all the GRMs like this (e.g. for GRM 1):
and the --reml-bendV option (gcta.freeforums.net/thread/243/variance-covaraince-matrix-positive-definite) which seems to run (doesn't given any errors) but only produces the .log file and not the .hsq (results)
I ran GREML first on two cohorts from a GWAS (caucasians and africans) separately but ran into issues of high SE (and p-value) because of the low sample size (this was done because a post-doc who requested the analysis wanted to compare these against LDHUB which calculates SNP heritability based on GWAS summary statistics for a specific ethnic group). I also wanted to combine these two and run GREML with ethnicity as a covariate.
I combined the binary files (.bed, .bim, .fam) of the two cohorts together with plink merge before using GCTA::GREML.
plink --bfile AA --bmerge EA --make-bed --out merged
plink --bfile EA --exclude merged.missnp --make-bed --out EA_tmp
plink --bfile AA --exclude merged.missnp --make-bed --out AA_tmp
plink --bfile EA_tmp --bmerge AA_tmp --make-bed --out merged
Next I continued with the commands from the documentation (including Step2 option#1): cnsgenomics.com/software/gcta/#GREMLinWGSorimputeddata until the "REML analysis with multiple GRMs" when I get "Error: the variance-covariance matrix V is not positive definite"
gcta64 --reml --mgrm multi_GRMs.txt --pheno phen.txt --out test
It's my understanding that if the variance-covariance V is negative definite this could be cause by a low sample number; however, this shouldn't be the case for positive definite. Furthermore, "for unrelated individuals and common SNPs, you will need at least 3160 unrelated samples to get a SE down to 0.1 (see Visscher et al. 2014 PLoS Genet). Once I combine Caucasian and Africans groups (2061 + 1131 = 3192) I should have sufficient power?
The advice here (gcta.freeforums.net/thread/191/gcta-reml-error) suggests that there may be something wrong with the GRMs and to check for any missing values in the GRM(s) with the following R function:
ReadGRMBin=function(prefix, AllN=F, size=4){
sum_i=function(i){
return((1+i)*i/2)
}
BinFileName=paste(prefix,".grm.bin",sep="")
NFileName=paste(prefix,".grm.N.bin",sep="")
IDFileName=paste(prefix,".grm.id",sep="")
id = read.table(IDFileName)
n=dim(id)[1]
BinFile=file(BinFileName, "rb");
grm=readBin(BinFile, n=n*(n+1)/2, what=numeric(0), size=size)
NFile=file(NFileName, "rb");
if(AllN==T){
N=readBin(NFile, n=n*(n+1)/2, what=numeric(0), size=size)
}
else N=readBin(NFile, n=1, what=numeric(0), size=size)
i=sapply(1:n, sum_i)
return(list(diag=grm[i], off=grm[-i], id=id, N=N))
}
So in GCTA::GREML Step #3 we created multiple GRMs (e.g.):
gcta64 --bfile merged --extract snp_group1.txt --make-grm --out merged_group1
So I check for missing values in all the GRMs like this (e.g. for GRM 1):
grmb1 <- ReadGRMBin("merged_group1")
anyNA(grmb1$diag)
[1] FALSE
I assume I'm checking these GRMs correctly and since I get FALSE for each of the four GRMs these should be okay, right?
I tried two other suggestions as well. The --reml-no-constrain option (gcta.freeforums.net/thread/190/variance-covaraince-matrix-error), which gives the same error:
gcta64 --reml-no-constrain --mgrm multi_GRMs.txt --pheno merged.phen --out merged
and the --reml-bendV option (gcta.freeforums.net/thread/243/variance-covaraince-matrix-positive-definite) which seems to run (doesn't given any errors) but only produces the .log file and not the .hsq (results)
gcta64 --reml-bendV --mgrm multi_GRMs.txt --pheno merged.phen --out merged
I'm currently trying to get around the problem using the Fisher scoring approach at this step:
Any feedback is greatly appreciated!
gcta64 --bfile merged --extract snp_group1.txt --make-grm-alg 1 --out alg1_group1
Any feedback is greatly appreciated!