How much memory do I need to run a GREML analysis?


Post by Jian Yang on Sept 21, 2015 1:22:32 GMT

1. Making a GRM

This process involves the genotype data in PLINK binary format, a SNP genotype matrix, a GRM, and an n x n matrix of the number of SNPs used for the GRM calculation.

Size of the n x m genotype matrix (n individuals, m SNPs) in PLINK binary format (2 bits per genotype) = m * n / 4 bytes

Size of GRM in double precision float = n * n * 8 bytes

Size of the n x n matrix of the number of SNPs used to calculate the GRM, in single precision = n * n * 4 bytes

Size of SNP genotype matrix in single precision float = m * n * 4 bytes, where m is the number of SNPs

Total memory usage ~= m * n / 4 + m * n * 4 + n * n * 8 + n * n * 4 = (4.25 * m + 12 * n) * n bytes

This is usually very large, particularly for 1000G imputed data. I would recommend running the analysis per chromosome and then merging the GRMs.
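The GRM estimate above can be sketched as a small Python helper. This is only an illustration of the arithmetic in this post, not part of GCTA; the function name `grm_memory_bytes` and the example sample size are assumptions made up for the sketch.

```python
def grm_memory_bytes(n: int, m: int) -> dict:
    """Approximate memory (in bytes) for making a GRM.

    n: number of individuals, m: number of SNPs.
    Hypothetical helper that follows the formulas in this post.
    """
    parts = {
        "plink_genotypes": m * n / 4,  # PLINK binary format, 2 bits per genotype
        "float_genotypes": m * n * 4,  # SNP genotype matrix, single precision
        "grm": n * n * 8,              # GRM, double precision
        "snp_counts": n * n * 4,       # number of SNPs per pair, single precision
    }
    # Sum of the four components = (4.25 * m + 12 * n) * n
    parts["total"] = sum(parts.values())
    return parts


if __name__ == "__main__":
    # Hypothetical data set: 10,000 individuals and 1,000,000 SNPs
    estimate = grm_memory_bytes(n=10_000, m=1_000_000)
    for name, size in estimate.items():
        print(f"{name:16s} {size / 1e9:8.2f} GB")
```

For this hypothetical data set the total is about 43.7 GB (taking 1 GB = 1e9 bytes), almost all of it from the single precision genotype matrix. That is why splitting by chromosome, which reduces m per run, helps so much.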

2. REML analysis

The REML process is a bit more complicated. It involves a number of n x n matrices, e.g. the GRM(s), the variance-covariance matrix V, the projection matrix P, and temporary matrices for calculating the inverse of V.

Total memory usage ~= (t + 4) * n * n * 8 bytes, where t is the number of genetic components (i.e. the number of GRMs) fitted in the model.

Note that these calculations don't take into account vectors and other, smaller matrices. Therefore, when submitting a job to a computer cluster I would request 20% more memory than the predicted amount.
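The REML estimate and the 20% safety margin can be sketched in the same way. Again, this is just the arithmetic from this post written out in Python; the function name `reml_memory_bytes` and its `margin` parameter are assumptions for illustration, not GCTA options.

```python
def reml_memory_bytes(n: int, t: int, margin: float = 0.20) -> tuple:
    """Approximate memory (in bytes) for a REML analysis.

    n: number of individuals, t: number of GRMs fitted in the model.
    Returns (predicted, requested), where requested adds the safety margin.
    Hypothetical helper that follows the formula in this post.
    """
    predicted = (t + 4) * n * n * 8       # (t + 4) n x n matrices, double precision
    requested = predicted * (1 + margin)  # extra room for vectors and smaller matrices
    return predicted, requested


if __name__ == "__main__":
    # Hypothetical analysis: 10,000 individuals, one GRM
    predicted, requested = reml_memory_bytes(n=10_000, t=1)
    print(f"predicted: {predicted / 1e9:.2f} GB")
    print(f"requested: {requested / 1e9:.2f} GB")
```

With 10,000 individuals and one GRM the formula gives 4 GB (1 GB = 1e9 bytes), so I would request about 4.8 GB. Note that the cost grows with n squared, so doubling the sample size quadruples the memory.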