|
Post by Jaden on Apr 30, 2014 17:42:41 GMT
Hi, I have about 3 million imputed genome-wide SNPs and I need to estimate the genetic relationship using GCTA. I don't think I should use all the SNPs; that would be too many. Should I use only the SNPs with a good quality score, such as R2 > 0.95? How many SNPs should I include to get a good estimate of the genetic relationship? Also, if I have imputed data for about 1 million SNPs, how much memory and how many threads do I need to maximize computing efficiency and shorten the computing time when estimating the genetic relationship matrix? Thank you for your help. Have a good day.
|
|
|
Post by Zhihong Zhu on May 1, 2014 11:24:45 GMT
Hi,
Yes, the SNPs need to go through QC before the analysis. Usually, I include only the HapMap3 SNPs with MAF > 0.01, HWE p-value > 1e-6, and imputation R^2 > 0.6. The memory used in the computation also depends on the sample size. If the sample size is ~6k, the GRM can be generated chromosome by chromosome within 1 hour, using 20 to 30 threads and ~20*22 G memory (~20 G for each chromosome).
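The thresholds above can be written as a simple filter. This is only a minimal sketch: the per-SNP record layout and the field names (`id`, `maf`, `hwe_p`, `r2`, `in_hapmap3`) are hypothetical, and the real column names depend on what your imputation and QC tools output.

```python
# Hypothetical per-SNP records; the field names are illustrative only
# and do not come from any particular tool's output format.

def passes_qc(snp, min_maf=0.01, min_hwe_p=1e-6, min_r2=0.6):
    """Return True if the SNP meets the thresholds quoted in the post above."""
    return bool(
        snp["in_hapmap3"]
        and snp["maf"] > min_maf
        and snp["hwe_p"] > min_hwe_p
        and snp["r2"] > min_r2
    )


def select_snps(snps):
    """Return the IDs of SNPs that pass QC, preserving input order."""
    return [s["id"] for s in snps if passes_qc(s)]


# Made-up example values, purely to exercise the filter.
example = [
    {"id": "rs1", "maf": 0.20, "hwe_p": 0.5, "r2": 0.90, "in_hapmap3": True},
    {"id": "rs2", "maf": 0.005, "hwe_p": 0.5, "r2": 0.90, "in_hapmap3": True},
    {"id": "rs3", "maf": 0.20, "hwe_p": 0.5, "r2": 0.50, "in_hapmap3": True},
    {"id": "rs4", "maf": 0.20, "hwe_p": 0.5, "r2": 0.90, "in_hapmap3": False},
]

if __name__ == "__main__":
    print(select_snps(example))
```

The resulting ID list could then be written one ID per line to a text file and passed to GCTA with `--extract`.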
Cheers, Zhihong
|
|
|
Post by Jaden on May 1, 2014 16:13:35 GMT
Thanks for your answer. I got an error when I ran --make-grm on a dataset with 8400 samples and about 800,000 SNPs.
The error is: terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc
Is this because I did not request enough RAM or threads? I requested 10 threads and 40G. It used about 23G to read in the files and then it crashed.
Thanks again.
|
|
|
Post by chrchang on May 2, 2014 11:16:33 GMT
As a temporary measure, you can use PLINK 1.9's --make-grm, which is much more memory-efficient--it should be able to handle 8400 samples x 800k SNPs even when restricted to 2 GB RAM.
|
|
|
Post by Zhihong Zhu on May 2, 2014 13:43:25 GMT
Yes, the memory is insufficient. Generating the GRM (with GCTA) chromosome by chromosome is more efficient and needs less memory, ~20G for each chromosome. For the whole genome at once, > 150G of memory is required.
PLINK2 requires only a little memory, but I'm not sure how long it will take. My guess is that GCTA may run faster than PLINK2, because I think GCTA keeps every matrix in memory, while PLINK2 may spend some time on I/O.
|
|
|
Post by Jaden on May 2, 2014 17:06:59 GMT
Thanks for the answer. So --make-grm is probably more computationally intensive. But if I use another program such as PLINK2 to estimate the genetic relationship, can the result be read into GCTA and used for the next step? Thank you.
|
|
|
Post by chrchang on May 2, 2014 17:44:12 GMT
Yes, the files generated by PLINK2 --make-grm/--make-grm-gz are compatible with GCTA.
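As a rough illustration of that hand-off, the sketch below only builds the two command lines and does not run anything. The file prefixes are hypothetical, the 2048 MB memory cap simply mirrors the 2 GB figure mentioned earlier in the thread, and `--pca` is just one possible downstream GCTA step chosen for illustration.

```python
# Builds (but does not execute) the command lines for a PLINK 1.9 -> GCTA hand-off.
# All file prefixes are hypothetical placeholders.

def plink_grm_command(bfile, out_prefix, threads=10, memory_mb=2048):
    """PLINK 1.9 command that writes <out_prefix>.grm.gz and <out_prefix>.grm.id."""
    return [
        "plink", "--bfile", bfile,
        "--make-grm-gz",
        "--threads", str(threads),
        "--memory", str(memory_mb),  # PLINK takes this value in MB
        "--out", out_prefix,
    ]


def gcta_read_grm_command(grm_prefix, out_prefix, n_pcs=20):
    """GCTA command that reads the gzipped GRM; --pca is an illustrative next step."""
    return [
        "gcta64", "--grm-gz", grm_prefix,
        "--pca", str(n_pcs),
        "--out", out_prefix,
    ]


if __name__ == "__main__":
    print(" ".join(plink_grm_command("mydata", "mydata_grm")))
    print(" ".join(gcta_read_grm_command("mydata_grm", "mydata_pca")))
```

The key point is that the prefix given to PLINK's `--out` is the same prefix GCTA reads with `--grm-gz`.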
|
|
|
Post by Jaden on May 5, 2014 19:57:03 GMT
"Yes, memory is insufficient. Generating GRM (by GCTA) chromosome by chromosome will be more efficient, and less memory is needed, ~20G for each chromosome." Will this create a separate genetic relationship matrix for each chromosome? If so, how do you combine them at the end for later use? Thank you.
|
|
|
Post by Jian Yang on May 6, 2014 1:15:52 GMT
You can use --mgrm followed by the --make-grm option to merge the GRMs into a single GRM.
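The per-chromosome workflow plus the merge can be sketched as follows. This is a hedged example, not an official recipe: it only generates the command lines and the list file contents, the prefixes (`mydata`, `grm`) and the list file name are hypothetical, and it assumes the 22 autosomes.

```python
# Generates (but does not run) GCTA commands for per-chromosome GRMs and their merge.
# File names and prefixes are hypothetical placeholders.

def per_chromosome_commands(bfile, out_prefix, threads=10, chromosomes=range(1, 23)):
    """One `gcta64 --make-grm` command per chromosome, as argument lists."""
    return [
        ["gcta64", "--bfile", bfile, "--chr", str(c), "--make-grm",
         "--thread-num", str(threads), "--out", f"{out_prefix}_chr{c}"]
        for c in chromosomes
    ]


def mgrm_list_text(out_prefix, chromosomes=range(1, 23)):
    """Contents of the text file given to --mgrm: one GRM prefix per line."""
    return "".join(f"{out_prefix}_chr{c}\n" for c in chromosomes)


def merge_command(list_file, out_prefix):
    """Command that merges the listed GRMs into a single GRM."""
    return ["gcta64", "--mgrm", list_file, "--make-grm", "--out", out_prefix]


if __name__ == "__main__":
    for cmd in per_chromosome_commands("mydata", "grm"):
        print(" ".join(cmd))
    print(mgrm_list_text("grm"), end="")
    print(" ".join(merge_command("grm_chrs.txt", "grm_all")))
```

Each per-chromosome run writes its own GRM under `grm_chrN`, and the merge step reads those prefixes from the list file, so only one chromosome's memory footprint is needed at a time if the runs are done sequentially.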
|
|