jonic
New Member
Posts: 7
Post by jonic on Sept 19, 2017 12:15:42 GMT
Hi,
We are trying to run GREML analyses using the full UKBiobank data.
We have successfully made the GRM for our purposes (300K+ individuals) using --make-grm-part, but when we try to run REML (using either the combined GRM or the parts) we get segfaults. We are using 380 GB of RAM - is this not enough to load a 300K GRM? Is there anything we can do to reduce the memory usage?
Command line:
gcta_1.90.0beta/gcta64 \
--reml \
--grm [path to 300K GRM] \
--pheno [path to pheno] \
--covar [path to covar] \
--qcovar [path to qcovar] \
--prevalence 0.283 \
--out [path to output] \
--thread-num 20 \
--keep [path to list of IDs with data]
Output:
Note: the program will be running on 20 threads.
Reading IDs of the GRM from [path to full GRM].
385768 IDs read from [path to full .grm.id].
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
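For scale, here is a back-of-envelope memory estimate, assuming GCTA stores the GRM as a single-precision lower triangle and that REML manipulates at least one dense N x N double-precision matrix (the exact internals may differ):

```shell
# Rough memory estimate for N = 385,768 individuals.
# Assumption: the GRM is a lower triangle of 4-byte floats, and REML
# needs at least one dense N x N matrix of 8-byte doubles.
N=385768
awk -v n="$N" 'BEGIN{printf "GRM lower triangle: %.0f GB\n", n*(n+1)/2*4/2^30}'
awk -v n="$N" 'BEGIN{printf "One dense NxN (doubles): %.0f GB\n", n*n*8/2^30}'
```

Under these assumptions the GRM alone is roughly 277 GB, and a single dense double-precision N x N matrix exceeds 1 TB, which would explain why 380 GB is not enough at this sample size.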
Post by stephwen on Nov 1, 2017 14:08:11 GMT
I was having similar problems (same error message: core dump with std::bad_alloc) when using GCTA with the --make-grm-gz option. Basically what I did was try out different combinations of:
- the memory allocation requested from the job scheduler I'm using
- the --thread-num value
I got it working by lowering --thread-num to 2. Hope this helps anyone stumbling on this thread through a Google search.
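If you are on a cluster, it can also help to make the scheduler's memory request explicit and to match the reserved cores to --thread-num. A minimal job-script sketch, assuming a SLURM scheduler and hypothetical file names (adjust everything to your setup):

```shell
#!/bin/bash
#SBATCH --mem=380G          # explicit memory request (adjust to your node)
#SBATCH --cpus-per-task=2   # match --thread-num below

# Hypothetical input/output prefixes; substitute your own paths.
gcta64 --make-grm-gz \
  --bfile mydata \
  --thread-num 2 \
  --out mygrm
```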
Post by zhilizheng on Nov 10, 2017 0:52:25 GMT
This problem will be fixed in GCTA version 1.91.1, which will be released in a few days. We will test it fully on other Linux kernel versions. We have confirmed that the new GCC compiler caused this strange problem: the pthread package used by OpenMP in GCTA leaks memory in every thread. As a result, the more threads you use, the sooner the problem recurs, and larger data also hits the problem sooner.
We have fixed the problem by changing the way the underlying package is linked. The new version will use slightly less memory and will be much faster in calculation.
Post by zhilizheng on Nov 29, 2017 3:26:27 GMT
The new version has been released and should solve this problem. Please comment if you still run into issues.
vw260
New Member
Posts: 1
Post by vw260 on Jan 21, 2018 6:22:41 GMT
Hi,
I seem to be running into the same issue. I've updated to GCTA v1.91.1 (beta). I'm using data from the UK Biobank (N ~ 150,000). GRMs were calculated individually by chromosome and then merged: GRMs for chromosomes 5 - 22 were calculated using the older version of GCTA, and those for chromosomes 1 - 5 using GCTA 1.91.1. The GRMs were then merged across all chromosomes using GCTA v1.91.1 to generate a genome-wide GRM.
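The per-chromosome-then-merge workflow described above can be sketched as follows. The file names here are hypothetical, and --mgrm together with --make-grm is the documented way to merge binary GRMs in GCTA 1.91.x:

```shell
# Build one GRM per chromosome (hypothetical PLINK prefix "ukb_genotypes").
for chr in $(seq 1 22); do
  gcta64 --bfile ukb_genotypes --chr "$chr" --make-grm --out "grm_chr${chr}"
done

# List the per-chromosome GRM prefixes (one per line) for merging.
ls grm_chr*.grm.bin | sed 's/\.grm\.bin$//' > multi_grm.txt

# Merge them into a single genome-wide GRM.
gcta64 --mgrm multi_grm.txt --make-grm --out grm_all
```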
Now, I'm trying to run GREML using the following commands (in GCTA v 1.91.1).
--reml \
--grm [path to merged GRM] \
--pheno [path to phenotype] \
--qcovar [path to qcovar file] \
--covar [path to covar file] \
--out [path to results file] \
--thread-num 10
However, I'm getting the following error message:
41 quantitative variable(s) included as covariate(s).
2 discrete variable(s) included as covariate(s).
134942 individuals are in common in these files.
Performing REML analysis ... (Note: may take hours depending on sample size).
134942 observations, 148 fixed effect(s), and 2 variance component(s) (including residual variance).
Calculating prior values of variance components by EM-REML ...
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
Any suggestions?
Edit: I've tried it with a higher number of threads (30), and while it progresses for a bit longer, I still get the same message (Aborted (core dumped)) in the EM-REML stage. Any idea how much memory this requires, and whether I can reduce the memory load in any other way? Equally, is there any way to meta-analyse the heritability estimates after subsetting the data into discrete, manageable chunks of, say, 30K and running GREML on these chunks separately?
Thanks!
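On the chunk-and-meta-analyse idea: if the sample were split into independent chunks and GREML run on each, a standard inverse-variance-weighted meta-analysis could combine the per-chunk estimates (whether that is statistically appropriate here, e.g. given relatedness across chunks, is exactly the open question). A minimal sketch with made-up numbers:

```shell
# Inverse-variance-weighted meta-analysis of per-chunk h2 estimates.
# Columns: h2 SE (these three rows are made-up illustration values).
cat > h2_chunks.txt <<'EOF'
0.25 0.04
0.31 0.05
0.28 0.06
EOF

# Weight each estimate by 1/SE^2; the meta SE is 1/sqrt(sum of weights).
awk '{w = 1/($2*$2); sw += w; swh += w*$1}
     END {printf "h2_meta=%.3f SE=%.3f\n", swh/sw, 1/sqrt(sw)}' h2_chunks.txt
# -> h2_meta=0.275 SE=0.028
```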
ruth
New Member
Posts: 1
Post by ruth on Mar 4, 2018 22:07:12 GMT
Hi, we are also trying to run various GCTA commands on the UK Biobank data. We are in the process of creating GRMs for each of the chromosomes. However, we could not create these as whole chromosomes, as we also got the std::bad_alloc error. We are now using --make-grm-part to make each chromosome in 20 parts. We have a couple of chromosomes through this process and have merged the parts into whole-chromosome binary GRMs. To test whether we could run the next phase of our analysis, we tried running --reml-pred-rand on one chromosome's GRM. However, we again got the following error:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
We are running GCTA version 1.91.2beta. My questions are: Is there a solution to running these analyses on large GRMs? And is it statistically sound to run --reml-pred-rand on parts of a chromosome? Computationally, it does run --reml-pred-rand over the 20 parts of the chromosome using the --mgrm-bin option, but I do not understand the underlying statistics well enough to know whether this is an acceptable thing to do.
Cheers
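The multi-part setup described above might look like the sketch below (hypothetical file names for the 20 parts). One caveat worth noting: with --mgrm-bin each GRM part is fitted as its own variance component, so the model is not identical to fitting a single whole-chromosome GRM.

```shell
# Hypothetical prefixes for the 20 GRM parts of one chromosome.
seq 1 20 | sed 's/^/chr1_part/' > chr1_parts.txt

# Fit all parts jointly and output BLUP solutions for the random effects.
gcta64 --reml --mgrm-bin chr1_parts.txt --reml-pred-rand \
  --pheno pheno.txt --out chr1_blup
```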
Post by jxs1996 on Jun 9, 2021 9:32:38 GMT
I'm having the same problem!