Post by ST89 on Sept 11, 2014 21:59:08 GMT
I have a dataset with ~10 million genotyped/imputed variants for ~2,500 people and was hoping to run an MLM analysis on it. My question is: does the GRM have to be estimated from exactly the same set of variants as the one being tested? Would such a high number of variants bias the GRM or introduce excessive noise into the estimate? And would it make sense to compute the GRM on a pruned set of variants and then use it to test the full set? Also, GCTA crashes when handling this dataset, returning "std::bad_alloc" (possibly because of the size of the dataset?).
Post by Jian Yang on Sept 12, 2014 9:47:45 GMT
# MLM based association analysis - If you have already computed the GRM
gcta64 --mlma --bfile test --grm test --pheno test.phen --out test --thread-num 10
The GRM can be computed from a subset of SNPs, e.g. all the common SNPs in HapMap3. This is what I would do.
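A sketch of that workflow, assuming PLINK is used for LD pruning (the pruning thresholds and all file names here are illustrative placeholders, not from this thread; using HapMap3 SNPs instead would mean passing that SNP list to --extract). Computing the GRM from a reduced SNP set also lowers memory use, which may help with the std::bad_alloc crash:

```shell
# 1. (Optional) LD-prune the genotypes with PLINK to get a reduced SNP set.
#    Window of 50 SNPs, step of 5, r^2 threshold 0.2 (illustrative values).
plink --bfile test --indep-pairwise 50 5 0.2 --out pruned

# 2. Compute the GRM from the pruned SNP subset (autosomes only).
gcta64 --bfile test --extract pruned.prune.in --autosome \
       --make-grm --out test_grm --thread-num 10

# 3. Run the MLM association analysis on the full variant set,
#    using the GRM estimated from the subset.
gcta64 --mlma --bfile test --grm test_grm --pheno test.phen \
       --out test_mlma --thread-num 10
```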