Post by aquaqua on Apr 23, 2018 19:30:01 GMT
Hello,
I want to use GCTA to generate a grm for UKBiobank data. I generated 200 parts for each chromosome, so in total I got 200*22 grm parts.
I then combined the GRMs in the following order. First, I combined the GRMs across chromosomes: for example, I merged part 1 of chr1 to chr22 using --mgrm, and likewise for every other part, which left me with 200 GRM parts. Second, I combined those 200 parts using the commands given in the manual:
cat test.part_200_*.grm.id > test.grm.id
cat test.part_200_*.grm.bin > test.grm.bin
cat test.part_200_*.grm.N.bin > test.grm.N.bin
Everything works until here. Thanks for adding this useful new feature to generate grm by parts!
The problem appears from here on. My final goal is to select all family clusters using --grm-singleton 0.4. However, it gives the error message: The GRM id and GRM binary is not matching [test]
The command I use is:
/gcta64 --grm test --grm-singleton 0.4 --out testing
The same command works if I run it on any single part instead of the whole combined GRM.
I would really appreciate it if anyone could help me with this.
Thanks a lot, Bowen
Post by zhilizheng on May 4, 2018 2:02:08 GMT
Hi Bowen,
Can you give the sizes of these three files?
wc -l test.grm.id
ls -l test.grm.bin
ls -l test.grm.N.bin
It seems one of the steps might have gone wrong.
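The line count and the file sizes can be compared directly. A GCTA binary GRM stores the lower triangle (diagonal included) as 4-byte floats, so with n lines in the .grm.id file the .grm.bin file should hold n(n+1)/2 values. The following is only a sketch of that check: the function name expected_grm_bytes is made up for illustration, and the prefix test is taken from the commands above.

```shell
#!/bin/sh
# Expected size in bytes of a binary GRM for n individuals, assuming the
# lower triangle with diagonal, n*(n+1)/2 values, at 4 bytes per value.
# (expected_grm_bytes is a hypothetical helper, not part of GCTA.)
expected_grm_bytes() {
    n=$1
    echo $(( n * (n + 1) / 2 * 4 ))
}

# Compare against the combined files if they are present.
prefix=test
if [ -f "$prefix.grm.id" ] && [ -f "$prefix.grm.bin" ]; then
    n=$(( $(wc -l < "$prefix.grm.id") ))        # number of individuals
    actual=$(( $(wc -c < "$prefix.grm.bin") ))  # size of the binary in bytes
    expected=$(expected_grm_bytes "$n")
    if [ "$actual" -eq "$expected" ]; then
        echo "sizes agree: $n individuals, $actual bytes"
    else
        echo "mismatch: $n individuals, expected $expected bytes, found $actual"
    fi
fi
```

If the numbers disagree, the combined .grm.id and .grm.bin do not describe the same set of individuals, which would be consistent with the error message reported above.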
For large data, we can do it this way, without using --mgrm to merge (update to the latest version, 1.91.4):
# assume your genotype files are geno_chr1 ... geno_chr22
for index in `seq 1 22`
do
    echo geno_chr$index >> beds.txt
done

gcta64 --mbfile beds.txt --make-grm-part 200 1 --out test --threads 5
...
# use nohup, or submit as a job array from 1 to 200
gcta64 --mbfile beds.txt --make-grm-part 200 200 --out test --threads 5
# so that the parts run in parallel
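Since the 200 commands differ only in the part index, they can be generated in a loop. The sketch below only prints the command lines rather than running GCTA. The helper names part_cmd and all_part_cmds are made up for illustration, and the options are copied from the commands above.

```shell
#!/bin/sh
n_parts=200

# Print the gcta64 command for one part index (hypothetical helper).
part_cmd() {
    echo "gcta64 --mbfile beds.txt --make-grm-part $n_parts $1 --out test --threads 5"
}

# Print the commands for parts 1 to n_parts, one per line.
all_part_cmds() {
    i=1
    while [ "$i" -le "$n_parts" ]; do
        part_cmd "$i"
        i=$((i + 1))
    done
}

all_part_cmds
```

The printed lines can be saved to a file and handed to whatever scheduler is available, one line per job. How the job array itself is submitted depends on the cluster and is left out here.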
Zhili