Post by user987 on Jun 4, 2019 2:10:20 GMT
I'm using R to simulate GWAS data, and then using the mixed.solve() function from the "rrBLUP" package to find REML estimates of the variance components and heritability. This is the mixed.solve() function: www.rdocumentation.org/packages/rrBLUP/versions/4.6/topics/mixed.solve
But then I noticed that GCTA has standard errors for the REML estimates of the h^2 and the variance components, and I'd like to get the standard errors to compute confidence intervals. So I'm trying to convert my genotype matrix in R to GRM to input the GRM in GCTA. Is this the correct way to do it?
If Z, an nxp matrix, is a scaled genotype matrix where n is the number of individuals and p is the number of SNPs, then the GRM is grm_data = ZZ' / p.
Then, use the write_GRMBin() function from the "OmicKriging" package to write a binary GRM format recently introduced by GCTA. This is the function: www.rdocumentation.org/packages/OmicKriging/versions/1.4.0/topics/write_GRMBin .
In particular, do
1.) In R, do:
write_GRMBin(grm_data, n.snps=p, prefix="grm_data")
write.table(cbind(1:n, 1:n), file = 'grm_data.grm.id' , quote=F, row.names=F, col.names=F)
2.) Then, in GCTA, do this to get the REML estimates of the variance components, h^2, and their standard errors:
/gcta_1.91.6beta/gcta64 --grm grm_data --pheno pheno_data --reml --reml-est-fix --out gcta_reml_est
Is this the correct way to convert the genotype matrix in R to a usable format for GCTA to get REML estimates and their standard errors?
Also, is the confidence interval the REML estimate +/- Z_{\alpha/2} * standard error, where Z_{\alpha/2} is the z score for percentile?
----
P.S., this is the write_GRMBin() function:
write_GRMBin <- function(X, n.snps = 0.0, prefix, size = 4) {
sum_i <- function(i){
return(sum(1:i))
}
## file connections
NFileName <- paste(prefix,".grm.N.bin",sep="")
IDFileName <- paste(prefix,".grm.id",sep="")
BinFileName <- paste(prefix,".grm.bin",sep="")
## pull sample ids and dimension of GRM
id <- rownames(X)
n <- length(id)
## pull diagonal elements
diag.elem <- diag(X)
## pull upper triangular elements (column major order)
off.diag.elem <- X[upper.tri(X, diag=FALSE)]
## collapse GRM into vector
i <- sapply(1:n, sum_i)
collapsed.grm <- vector(mode="numeric", length = n*(n+1)/2)
collapsed.grm <- diag.elem
collapsed.grm[-i] <- off.diag.elem
## write binary files
BinFile <- file(BinFileName, "wb")
NFile <- file(NFileName, "wb")
writeBin(con = BinFile, collapsed.grm, size = size)
writeBin(con = NFile, rep(n.snps, n*(n+1)/2), size = size )
close(BinFile)
close(NFile)
## write sample ID file -- we are dropping sample family IDs here
write.table(cbind(id, id), file = IDFileName, col.names=FALSE,
row.names=FALSE, quote=FALSE)
}
But then I noticed that GCTA has standard errors for the REML estimates of the h^2 and the variance components, and I'd like to get the standard errors to compute confidence intervals. So I'm trying to convert my genotype matrix in R to GRM to input the GRM in GCTA. Is this the correct way to do it?
If Z, an nxp matrix, is a scaled genotype matrix where n is the number of individuals and p is the number of SNPs, then the GRM is grm_data = ZZ' / p.
Then, use the write_GRMBin() function from the "OmicKriging" package to write a binary GRM format recently introduced by GCTA. This is the function: www.rdocumentation.org/packages/OmicKriging/versions/1.4.0/topics/write_GRMBin .
In particular, do
1.) In R, do:
write_GRMBin(grm_data, n.snps=p, prefix="grm_data")
write.table(cbind(1:n, 1:n), file = 'grm_data.grm.id' , quote=F, row.names=F, col.names=F)
2.) Then, in GCTA, do this to get the REML estimates of the variance components, h^2, and their standard errors:
/gcta_1.91.6beta/gcta64 --grm grm_data --pheno pheno_data --reml --reml-est-fix --out gcta_reml_est
Is this the correct way to convert the genotype matrix in R to a usable format for GCTA to get REML estimates and their standard errors?
Also, is the confidence interval the REML estimate +/- Z_{\alpha/2} * standard error, where Z_{\alpha/2} is the z score for percentile?
----
P.S., this is the write_GRMBin() function:
write_GRMBin <- function(X, n.snps = 0.0, prefix, size = 4) {
sum_i <- function(i){
return(sum(1:i))
}
## file connections
NFileName <- paste(prefix,".grm.N.bin",sep="")
IDFileName <- paste(prefix,".grm.id",sep="")
BinFileName <- paste(prefix,".grm.bin",sep="")
## pull sample ids and dimension of GRM
id <- rownames(X)
n <- length(id)
## pull diagonal elements
diag.elem <- diag(X)
## pull upper triangular elements (column major order)
off.diag.elem <- X[upper.tri(X, diag=FALSE)]
## collapse GRM into vector
i <- sapply(1:n, sum_i)
collapsed.grm <- vector(mode="numeric", length = n*(n+1)/2)
collapsed.grm <- diag.elem
collapsed.grm[-i] <- off.diag.elem
## write binary files
BinFile <- file(BinFileName, "wb")
NFile <- file(NFileName, "wb")
writeBin(con = BinFile, collapsed.grm, size = size)
writeBin(con = NFile, rep(n.snps, n*(n+1)/2), size = size )
close(BinFile)
close(NFile)
## write sample ID file -- we are dropping sample family IDs here
write.table(cbind(id, id), file = IDFileName, col.names=FALSE,
row.names=FALSE, quote=FALSE)
}