|
Post by xyyin666666 on Jan 21, 2016 21:37:30 GMT
Hi Jian,
I estimated the GRM from case-control GWAS data with ~3,800 samples. After applying the GRM cutoff of 0.025 as suggested, almost all of the samples (about 3,700) were removed and only 40 were left. There are indeed some cryptic relatives in my data, but I am quite sure there are not that many. I also checked the relationship column of the .grm.gz file and found that most of the absolute pairwise relationships were less than 0.025. How did the cutoff throw out most of the samples? Note that this is multiethnic data. Does population structure affect this result?
The commands I used were: gcta64 --bfile test --make-grm-gz --autosome --out test; gcta64 --grm-gz test --grm-cutoff 0.025 --make-grm-gz --out test-cutoff
Thanks,
Xianyong
|
|
|
Post by Jian Yang on Jan 22, 2016 3:09:01 GMT
You might check the distribution of the off-diagonal elements of the GRM. It's likely that the number of SNPs used to calculate GRM is small so that the sampling variation is large.
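For anyone who wants to run this check, here is a minimal Python sketch that summarizes the off-diagonal elements. It assumes the usual headerless .grm.gz layout (index i, index j, number of non-missing SNPs, relatedness estimate) and assumes that --grm-cutoff compares the signed estimate against the cutoff. The function names `read_offdiag` and `summarize` are made up for illustration and are not part of GCTA.

```python
import gzip
import statistics


def read_offdiag(path):
    """Return the off-diagonal relatedness estimates from a .grm.gz file.

    Assumed columns (no header): index i, index j, n SNPs, estimate.
    """
    values = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 4:
                continue  # skip blank or malformed lines
            i, j, rel = int(fields[0]), int(fields[1]), float(fields[3])
            if i != j:  # the diagonal (i == j) is not a pairwise relationship
                values.append(rel)
    return values


def summarize(values, cutoff=0.025):
    """Basic summary of the off-diagonal distribution."""
    n_above = sum(1 for v in values if v > cutoff)
    return {
        "n_pairs": len(values),
        "mean": statistics.fmean(values),
        "sd": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
        "frac_above_cutoff": n_above / len(values),
    }


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(summarize(read_offdiag(sys.argv[1])))
```

If a large fraction of pairs sits above the cutoff, or the standard deviation is much larger than expected for unrelated individuals, that points to population structure, too few SNPs, or a QC problem rather than genuine relatives.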
|
|
|
Post by xyyin666666 on Jan 22, 2016 13:00:28 GMT
Thanks, Jian. Yes, I checked the off-diagonal elements of the GRM, and most of the absolute relationship coefficients are more than 0.025. But I used more than 800,000 SNPs. And why are the off-diagonal values different from those in the .grm.gz file?
|
|
|
Post by Jian Yang on Jan 24, 2016 2:50:45 GMT
Another possibility is that the data are not well QCed.
|
|
|
Post by xyyin666666 on Jan 24, 2016 14:38:34 GMT
I did the general QC, including MAF, call rate per SNP and per sample, and an HWE test. My only concern is that these data contain genotypes for multiple ancestries. I am not sure whether I should remove the population outliers and focus on one specific ethnicity.
My second thought is to keep only the samples whose relationship values in the .grm.gz file are below the 0.025 threshold.
Thanks
|
|
|
Post by Jian Yang on Jan 26, 2016 6:17:29 GMT
That might be a problem. I would apply a threshold of 0.05 to remove relatedness in each ethnic group separately.
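A rough Python sketch of the per-group workflow, for illustration only. It assumes you already have an ancestry label for each individual (e.g. from PCA) as (FID, IID, label) tuples. The helper names and the output file naming are hypothetical. The gcta64 options in the generated commands are the ones used earlier in this thread plus --keep, which restricts the analysis to a list of individuals.

```python
from collections import defaultdict


def split_by_group(assignments):
    """Group (FID, IID) pairs by ancestry label."""
    groups = defaultdict(list)
    for fid, iid, label in assignments:
        groups[label].append((fid, iid))
    return dict(groups)


def write_keep_files(groups, prefix):
    """Write one two-column (FID, IID) keep file per group; return the paths."""
    paths = {}
    for label, members in sorted(groups.items()):
        path = f"{prefix}.{label}.keep"  # hypothetical naming scheme
        with open(path, "w") as out:
            for fid, iid in members:
                out.write(f"{fid}\t{iid}\n")
        paths[label] = path
    return paths


def gcta_commands(bfile, keep_path, out_prefix, cutoff=0.05):
    """Build the two commands: estimate the GRM in one group, then prune it."""
    return [
        f"gcta64 --bfile {bfile} --keep {keep_path} --autosome "
        f"--make-grm-gz --out {out_prefix}",
        f"gcta64 --grm-gz {out_prefix} --grm-cutoff {cutoff} "
        f"--make-grm-gz --out {out_prefix}-cutoff",
    ]
```

The sketch only prints or returns the commands rather than running them, so you can inspect them first. Estimating the GRM within each group matters because allele frequencies are then computed within that group instead of across the mixed sample.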
|
|
|
Post by xyyin666666 on Jan 27, 2016 12:45:55 GMT
Yes, you are right. I ran the analyses in each population separately and they went through. I used PCA to assign an ancestry to each individual, then built the PLINK binary data for each ancestry and applied the 0.025 threshold to each data set. But what do you mean by applying 0.05 in each ethnic group separately? Do you mean I can merge the ethnic data together after applying this less stringent threshold?
|
|
|
Post by Jian Yang on Jan 28, 2016 5:31:26 GMT
A threshold of 0.025 is a bit too stringent. I tend to use 0.05 these days. These thresholds only apply to a GRM estimated from common SNPs (the variance of the GRM becomes much larger when rare variants are included).
You can merge the data if your subsequent analysis requires that.
|
|
|
Post by xyyin666666 on Jan 28, 2016 12:35:12 GMT
Thank you. I get it. I have pedigree samples in my data (the proportion of pedigree samples is ~5%), and I am attempting to estimate the broad-sense h^2 using the --bK parameter. I am not sure whether such a small proportion of family samples would affect the estimates.
|
|