I have 1000 Genomes phase 3 imputed data for roughly 40K individuals (10-12M SNPs), stored in IMPUTE2 dosage format in chunks of 5 Mb x 500 individuals, and I would like to use GCTA to build a GRM. Before I start merging all the imputed chunks and converting them to MACH format, I would like to know whether such a calculation is feasible. Can you give me an estimate of the required memory and CPU hours?
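For reference, my own back-of-envelope estimate is below (a rough sketch only; the assumption that the GRM is held as a double-precision lower triangle, and the batch size of 100K SNPs, are mine and not taken from the GCTA documentation):

    # Rough memory estimate for a GRM over ~40,000 individuals.
    # Assumption (mine, not from the GCTA docs): the GRM is kept in memory
    # as the lower triangle of an n x n matrix in double precision (8 bytes),
    # plus one batch of genotypes/dosages at a time.
    n = 40_000                        # individuals
    pairs = n * (n + 1) // 2          # lower-triangle entries incl. diagonal
    grm_bytes = pairs * 8             # 8 bytes per double
    print(f"GRM lower triangle: {grm_bytes / 1e9:.1f} GB")    # ~6.4 GB

    snps_in_memory = 100_000          # hypothetical SNP batch size
    geno_bytes = n * snps_in_memory   # ~1 byte per genotype call
    print(f"One genotype batch: {geno_bytes / 1e9:.1f} GB")   # ~4.0 GB

If that reasoning is roughly right, a single node with 15 GB might be tight once the full SNP panel and working buffers are included, which is why I am asking about the actual requirements.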
I have the opportunity to use our HPC cluster and can parallelise the calculation across more than 1000 nodes (15 GB of memory per node) at a time via a submit host. Are there ways to parallelise the calculation other than the --thread-num option (e.g. splitting the 40K individuals into smaller chunks), as sketched below?
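To illustrate the kind of splitting I have in mind, here is a sketch of how I imagined generating one job per GRM part (this assumes the --make-grm-part option of recent GCTA versions; the input file name, part count, and output prefix are placeholders, and my data would first need converting from IMPUTE2 dosages):

    # Sketch: write one GCTA command per GRM part, to be submitted as
    # separate jobs on the cluster. Assumes GCTA's --make-grm-part option
    # (recent GCTA versions); "mydata" and the part count are placeholders.
    n_parts = 100
    with open("grm_jobs.txt", "w") as jobs:
        for i in range(1, n_parts + 1):
            jobs.write(
                f"gcta64 --bfile mydata --make-grm-part {n_parts} {i} "
                f"--thread-num 4 --out mydata_grm\n"
            )
    # After all parts finish, the per-part GRM files would presumably be
    # concatenated into a single GRM before downstream analysis.

Is this a sensible way to distribute the work, or is there a better-supported approach?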