Hi, I am running GCTA MLMA analysis in tandem with GSEA pathway analysis. My dataset has about 1000 cases, 1000 controls and 4.9 million SNPs.
GSEA requires P+1 sets of SNP-level association statistics: one computed with the real patient labels and P computed with permuted labels. From these, it computes pathway enrichment by comparing the ranked-sum statistic obtained with the real labels against the null distribution obtained from the permuted labels.
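To make the permutation setup concrete, here is a minimal Python sketch of what I mean. It is only an illustration under my own assumptions: the 1/0 case/control coding, the function names, and the "+1" empirical p-value convention are mine, and the GSEA tool may compute its enrichment p-value differently.

```python
import random


def permute_labels(labels, seed):
    """Return a shuffled copy of the case/control labels.

    Only the labels are shuffled; the genotypes are left untouched.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def empirical_p(observed, null_stats):
    """Share of null statistics at least as large as the observed one.

    The +1 in numerator and denominator is a common convention (an
    assumption here) that keeps the p-value from being exactly zero.
    """
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


# Toy example: 5 cases (1) and 5 controls (0), with P = 3 permutations.
labels = [1] * 5 + [0] * 5
permuted_sets = [permute_labels(labels, seed=i) for i in range(3)]
```

Each permuted label set keeps the same number of cases and controls as the real one, and each would be written to its own phenotype file for one MLMA run.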
I am currently making P+1 calls to GCTA (--mlma-loco), which computes the GRM P+1 times. Running this on our compute cluster (30 GB RAM, 1 thread) results in most of my jobs being aborted, probably due to out-of-memory errors.
I'm wondering whether I can instead compute the GRM once and use the same GRM for both the real and the label-permuted MLMA analyses (i.e., a one-time GRM computation provided as input to all P+1 MLMA analyses). The GRM depends only on the genotypes and not on the case/control status of the patients; is that correct?
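For concreteness, the workflow I have in mind looks roughly like the Python sketch below, which only builds the command lines and does not run them. The file prefixes (`mydata`, `mydata_grm`, `real.phen`, `perm_1.phen`, ...) and the binary name `gcta64` are placeholders of mine. Note also that the sketch uses `--mlma` with `--grm` rather than `--mlma-loco`; whether the leave-one-chromosome-out variant can reuse a precomputed GRM in the same way is part of what I am asking.

```python
def make_grm_cmd(bfile, grm_prefix):
    """Command that computes the GRM once from the genotype files."""
    return ["gcta64", "--bfile", bfile, "--make-grm", "--out", grm_prefix]


def mlma_cmd(bfile, grm_prefix, pheno_file, out_prefix):
    """Command for one MLMA run that reads the precomputed GRM."""
    return [
        "gcta64", "--mlma",
        "--bfile", bfile,
        "--grm", grm_prefix,
        "--pheno", pheno_file,
        "--out", out_prefix,
    ]


def build_commands(bfile, grm_prefix, n_perm):
    """One GRM command followed by 1 + n_perm MLMA commands."""
    cmds = [make_grm_cmd(bfile, grm_prefix)]
    # Real labels first (hypothetical phenotype file name).
    cmds.append(mlma_cmd(bfile, grm_prefix, "real.phen", "mlma_real"))
    # Then one run per permuted phenotype file.
    for i in range(1, n_perm + 1):
        cmds.append(
            mlma_cmd(bfile, grm_prefix, f"perm_{i}.phen", f"mlma_perm_{i}")
        )
    return cmds


commands = build_commands("mydata", "mydata_grm", n_perm=2)
```

Each command list could then be passed to `subprocess.run` or written into a cluster job script, so that the GRM is computed in a single job and every MLMA job only reads it.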
Separately, I am wondering whether other pathway enrichment strategies (e.g. MAGMA) have been tried with GCTA.