I'm calculating a GRM on 40K samples imputed up to 1KG phase 1. I split the data by chromosome, but the larger chromosomes still require > 160 GB of RAM, which is more than our cluster will let me use for > 2 hours. I could try to split chromosomes 1-8 into smaller pieces, but I wonder how this would affect the accuracy of the GRM once the pieces are merged again. Does anyone have experience merging multiple GRMs into one big GRM, and know how this affects h2 estimates? Thanks.
PLINK 1.9 has a memory-efficient --make-grm-bin implementation which is compatible with GCTA. For 40k samples, 32 GB RAM should be sufficient, and there should be no need to do any chromosome splitting. You can combine --make-grm-bin with --parallel to get the job done with even less memory, if necessary.
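On the merging question itself: as far as I understand it, a GRM is an average over SNPs, so per-chunk GRMs can be recombined without loss as long as each chunk is weighted by the number of SNPs it contributed and allele frequencies are computed on the same samples throughout. Below is a minimal NumPy sketch of that arithmetic on simulated genotypes. The function names `grm` and `merge_grms`, the toy sizes, and the chunk boundaries are all made up for illustration; this is not PLINK's or GCTA's actual code, and it assumes no missing genotypes (with missing data the weights become per-pair SNP counts rather than one number per chunk).

```python
import numpy as np

def grm(genotypes):
    """GRM from an (n_samples, n_snps) matrix of 0/1/2 allele counts."""
    p = genotypes.mean(axis=0) / 2.0                 # per-SNP allele frequency
    z = (genotypes - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))  # standardize each SNP
    return z @ z.T / genotypes.shape[1]              # average over SNPs

def merge_grms(grms, snp_counts):
    """Combine per-chunk GRMs, weighting each by its SNP count."""
    weights = np.asarray(snp_counts, dtype=float)
    return sum(w * g for w, g in zip(weights, grms)) / weights.sum()

# Toy data (hypothetical sizes, far smaller than a real 40K-sample study).
rng = np.random.default_rng(0)
n_samples, n_snps = 50, 600
freqs = rng.uniform(0.1, 0.9, size=n_snps)
genotypes = rng.binomial(2, freqs, size=(n_samples, n_snps)).astype(float)
genotypes = genotypes[:, genotypes.var(axis=0) > 0]  # drop monomorphic SNPs

# GRM computed in one go.
full = grm(genotypes)

# GRMs computed on three unequal chunks of SNPs, then merged.
chunks = np.split(genotypes, [100, 350], axis=1)
merged = merge_grms([grm(c) for c in chunks], [c.shape[1] for c in chunks])

# Difference is at the level of floating-point rounding.
print(np.max(np.abs(full - merged)))
```

In this sketch the merged matrix matches the single-pass one up to rounding error, so splitting by itself should not change h2 estimates. A plain unweighted average of the chunk GRMs would not match when the chunks have different SNP counts.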