sample size limit in GCTA?

mckeller
New Member

Posts: 3

Post by mckeller on Apr 14, 2016 22:54:59 GMT

Hi Jian,

We're trying to run some interaction analyses in the UK Biobank, but GCTA is failing with ~60K samples. We have plenty of RAM to do this (1 TB). I think there are some vector size limits in C++; maybe that's the issue? What is the max sample size we can run GCTA with? We could use BOLT-REML, but it does not have the ability to estimate interaction effects. For now, we're splitting the sample into smaller subsamples of 40K. But even a sample of that size leads to large SEs for some of the analyses we'd like to perform.
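
To make the "vector size limit" idea concrete, here is a rough Python sketch. It assumes, hypothetically, that some buffer holds a dense n x n matrix and is indexed with a signed 32-bit integer; nothing in this thread confirms that GCTA does this. The function names are made up for illustration.

```python
import math

# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1


def dense_elements(n: int) -> int:
    """Number of elements in a dense n x n matrix."""
    return n * n


def exceeds_int32_index(n: int) -> bool:
    """True if a flat index into an n x n matrix would overflow int32.

    Assumption: the matrix is stored as one flat array and indexed
    with a signed 32-bit integer. This is a hypothetical scenario.
    """
    return dense_elements(n) > INT32_MAX


def largest_safe_n() -> int:
    """Largest n for which n * n still fits in a signed 32-bit integer."""
    return math.isqrt(INT32_MAX)


def dense_double_gib(n: int) -> float:
    """Size in GiB of one dense n x n matrix of 8-byte doubles."""
    return dense_elements(n) * 8 / 1024**3


if __name__ == "__main__":
    for n in (40_000, 60_000):
        print(n, exceeds_int32_index(n), round(dense_double_gib(n), 1))
    print("largest n under the int32 assumption:", largest_safe_n())
```

Under that assumption the cutoff would fall between 40K and 60K samples, but this is only arithmetic, not a diagnosis of the failure.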

Matt

Last Edit: Apr 14, 2016 22:55:23 GMT by mckeller

Jian Yang
Administrator

Posts: 362

Post by Jian Yang on Apr 15, 2016 7:17:13 GMT

We have applied a 3-component GREML analysis to the UKB data. It works fine. What kind of error message did you get?

agwills
New Member

Posts: 2

Post by agwills on Apr 15, 2016 22:48:00 GMT

Hi Jian,

I am working with Matt on this issue, and it seems like this is a memory allocation problem on our side of things. We have hopefully fixed this (as GCTA appears to now be running smoothly), but will let you know if we do get any errors or problems.

Thank you,
Amanda

agwills
New Member

Posts: 2

Post by agwills on May 5, 2016 23:01:16 GMT

Hi Jian,

Univariate analyses have been working fine using 100k+ sample sizes; however, I am getting the following error message when running a bivariate REML analysis:

"line 12: 75730 Segmentation fault"

I was able to run the bivariate analysis successfully on about 50k individuals, but the analysis has repeatedly failed when I upped the sample to about 70k individuals. From the log, the program seems to fail right after reporting the number of cases and controls for each trait.

I used the following options:
--grm --reml-bivar 2 3 --keep --pheno --qcovar --covar --reml-bivar-prevalence --thread-num --out

Any help with figuring this out would be much appreciated!

Amanda

ukucam
New Member

Posts: 4

Post by ukucam on Aug 29, 2017 8:09:25 GMT

Dear Amanda,
Did you manage to find a solution to this problem?

Jian Yang
Administrator

Posts: 362

Post by Jian Yang on Sept 1, 2017 1:32:50 GMT

The new version should work with n > 100K.

ukucam
New Member

Posts: 4

Post by ukucam on Sept 10, 2017 9:25:58 GMT

Sept 1, 2017 1:32:50 GMT Jian Yang said:

The new version should work with n > 100K.

This sounds excellent!
I have been attempting to run the new version with a sample size of ~40k and ~96k SNPs, and the memory requirement seems to be high.
Is there any way to estimate an upper bound on the memory requirement for a bivariate REML run?
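
A back-of-the-envelope estimate can be sketched in Python as below. It is not taken from GCTA's code. It assumes that REML working memory is dominated by a handful of dense matrices of 8-byte doubles, and that a bivariate run stacks the two traits so each matrix is 2n x 2n. The default `n_matrices = n_grm + 4` follows a rule of thumb I recall from the GCTA FAQ for REML and should be treated as an assumption, as should the function names.

```python
import math


def reml_memory_gib(n: int, n_grm: int = 1, bivariate: bool = False,
                    n_matrices=None) -> float:
    """Rough memory estimate in GiB for a REML run on n individuals.

    Assumptions (not confirmed by this thread):
      - memory is dominated by dense square matrices of 8-byte doubles;
      - a bivariate model works with 2n x 2n matrices;
      - about (n_grm + 4) such matrices are held at once unless
        n_matrices is given explicitly.
    """
    dim = 2 * n if bivariate else n
    k = n_grm + 4 if n_matrices is None else n_matrices
    return k * dim * dim * 8 / 1024**3


if __name__ == "__main__":
    n = 40_000  # sample size mentioned in the post above
    uni = reml_memory_gib(n)
    biv = reml_memory_gib(n, bivariate=True)
    print(f"univariate: ~{uni:.0f} GiB, bivariate: ~{biv:.0f} GiB")
    # Doubling the matrix dimension quadruples the estimate.
    assert math.isclose(biv, 4 * uni)
```

Under these assumptions the estimate depends on the number of individuals and not on the number of SNPs, since the SNPs only enter through the precomputed GRM.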
