Post by ST89 on Sept 11, 2014 21:59:08 GMT
I have a dataset with ~10 million genotyped/imputed variants for ~2,500 people and was hoping to run an MLM analysis on it. My question is: does the GRM have to be estimated from exactly the same set of variants as the one being tested? Would such a high number of variants bias the GRM or introduce excessive noise into the estimate? And would it make sense to compute the GRM on a pruned set of variants and then use it to test the full set? Also, GCTA crashes when handling this dataset, returning "std::bad_alloc" (possibly because of the size of the dataset?).
Post by Jian Yang on Sept 12, 2014 9:47:45 GMT
# MLM based association analysis - If you have already computed the GRM
gcta64 --mlma --bfile test --grm test --pheno test.phen --out test --thread-num 10
The GRM can be computed from a subset of SNPs, e.g. all the common SNPs in HapMap3. This is what I would do.
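A sketch of that workflow, assuming PLINK is used for LD pruning (the pruning thresholds and all file names here are illustrative placeholders, not from this thread; using HapMap3 SNPs instead would mean passing that SNP list to --extract). Computing the GRM from a reduced SNP set also lowers memory use, which may help with the std::bad_alloc crash:

```shell
# 1. (Optional) LD-prune the genotypes with PLINK to get a reduced SNP set.
#    Window of 50 SNPs, step of 5, r^2 threshold 0.2 (illustrative values).
plink --bfile test --indep-pairwise 50 5 0.2 --out pruned

# 2. Compute the GRM from the pruned SNP subset (autosomes only).
gcta64 --bfile test --extract pruned.prune.in --autosome \
       --make-grm --out test_grm --thread-num 10

# 3. Run the MLM association analysis on the full variant set,
#    using the GRM estimated from the subset.
gcta64 --mlma --bfile test --grm test_grm --pheno test.phen \
       --out test_mlma --thread-num 10
```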