grm cutoff move most of the samples

xyyin666666
New Member

Posts: 9

Post by xyyin666666 on Jan 21, 2016 21:37:30 GMT

Hi Jian,

I estimated the GRM from case-control GWAS data with ~3,800 samples. After I applied the GRM cutoff of 0.025 as suggested, it removed the vast majority of the samples (about 3,700), leaving only 40. There are indeed some cryptic relatives in my data, but I am quite sure there are not that many. I checked the relationship column of the .grm.gz file and found that most of the absolute pairwise relationships were less than 0.025. How did the cutoff threshold throw out most of the samples? Note that this is multiethnic data. Does population structure affect this result?

The commands I used are:
gcta64 --bfile test --make-grm-gz --autosome --out test
gcta64 --grm-gz test --grm-cutoff 0.025 --make-grm-gz --out test-cutoff

Thanks,

Xianyong

Jian Yang
Administrator

Posts: 362

Post by Jian Yang on Jan 22, 2016 3:09:01 GMT

You might check the distribution of the off-diagonal elements of the GRM. It's likely that the number of SNPs used to calculate GRM is small so that the sampling variation is large.
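One way to run the check suggested above is to summarize the off-diagonal values directly from the gzipped GRM. The sketch below assumes the usual four-column layout of a .grm.gz file (index i, index j, number of non-missing SNPs, estimated relatedness, no header). The function names and the tiny demo file are hypothetical and only there for illustration.

```python
import gzip
import os
import tempfile

def read_offdiag(path):
    """Return the relatedness values of all pairs with i != j."""
    values = []
    with gzip.open(path, "rt") as fh:
        for line in fh:
            i, j, _n_snps, rel = line.split()
            if i != j:  # skip the diagonal (an individual with itself)
                values.append(float(rel))
    return values

def summarize(values, cutoff=0.025):
    """Mean, SD, and fraction of off-diagonal values above the cutoff."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    frac_above = sum(v > cutoff for v in values) / n
    return {"n_pairs": n, "mean": mean, "sd": sd, "frac_above": frac_above}

# Tiny made-up GRM for three individuals, written to a temporary file.
demo_rows = [
    "1\t1\t1000\t1.01",
    "2\t1\t1000\t0.03",
    "2\t2\t1000\t0.99",
    "3\t1\t1000\t-0.01",
    "3\t2\t1000\t0.02",
    "3\t3\t1000\t1.00",
]
demo_path = os.path.join(tempfile.mkdtemp(), "demo.grm.gz")
with gzip.open(demo_path, "wt") as fh:
    fh.write("\n".join(demo_rows) + "\n")

stats = summarize(read_offdiag(demo_path))
print(stats)
```

For the real data, replacing `demo_path` with `test.grm.gz` would give the same summary. A mean well above zero or a large fraction of pairs above the cutoff would point to a problem beyond a handful of cryptic relatives.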

xyyin666666
New Member

Posts: 9

Post by xyyin666666 on Jan 22, 2016 13:00:28 GMT

Thanks, Jian. Yes, I checked the off-diagonal elements of the GRM, and most of the absolute relationship coefficients are greater than 0.025. But I used more than 800,000 SNPs. And why do the off-diagonal values differ from those in the .grm.gz file?

Jian Yang
Administrator

Posts: 362

Post by Jian Yang on Jan 24, 2016 2:50:45 GMT

Another possibility is that the data are not well QCed.

xyyin666666
New Member

Posts: 9

Post by xyyin666666 on Jan 24, 2016 14:38:34 GMT

I did the standard QC, including MAF, call rate per SNP and per sample, and the HWE test. My only concern is that this dataset contains genotypes from multiple ancestries. I am not sure whether I should remove the population outliers and focus on one specific ethnicity.

My second thought is to keep only samples whose pairwise relationship values in the .grm.gz file are below 0.025.

Thanks
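The filtering idea in the post above can be sketched as a greedy pruning over the pairs listed in the .grm.gz file. This is only an illustrative stand-in, not the algorithm GCTA uses for --grm-cutoff. The function name `prune_related` and the example pairs are hypothetical.

```python
from collections import defaultdict

def prune_related(pairs, cutoff=0.025):
    """Greedily pick individuals to remove so that no remaining pair
    has relatedness above the cutoff.

    pairs: iterable of (i, j, relatedness) tuples.
    Returns the set of removed individual indices.
    """
    partners = defaultdict(set)
    for i, j, rel in pairs:
        if i != j and rel > cutoff:
            partners[i].add(j)
            partners[j].add(i)

    removed = set()
    while partners:
        # Individual involved in the most remaining over-cutoff pairs
        # (ties broken by the larger index).
        worst = max(partners, key=lambda k: (len(partners[k]), k))
        if not partners[worst]:
            break  # no over-cutoff pairs are left
        for other in partners.pop(worst):
            partners[other].discard(worst)
        removed.add(worst)
    return removed

# Made-up example: individual 1 is related to both 2 and 3.
example_pairs = [(2, 1, 0.30), (3, 1, 0.20), (3, 2, 0.01), (4, 3, 0.00)]
removed = prune_related(example_pairs)
print(sorted(removed))
```

Dropping the most connected individual first tends to keep more samples than dropping one member of each pair in file order. If nearly every pair exceeds the cutoff, though, any pruning scheme will discard almost everyone.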

Jian Yang
Administrator

Posts: 362

Post by Jian Yang on Jan 26, 2016 6:17:29 GMT

That might be a problem. I would apply a threshold of 0.05 to remove relatedness in each ethnic group separately.
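To see why pooling ancestries might be a problem, here is a small simulation sketch. It assumes the standard GRM estimator (genotypes standardized by sample allele frequencies, then averaged over SNPs) and two hypothetical populations whose allele frequencies differ by 0.2 at every SNP. All names and parameter values are made up for illustration. None of the simulated individuals are related.

```python
import random

def draw_genotypes(freqs, n, rng):
    """Draw n unrelated diploid genotypes (0/1/2) from per-SNP frequencies."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n)]

def mean_offdiag_grm(genos, pairs):
    """Mean GRM entry over the given (j, k) pairs, using allele
    frequencies estimated from all individuals in genos."""
    n, m = len(genos), len(genos[0])
    z = [[] for _ in range(n)]  # standardized genotypes per individual
    for i in range(m):
        p = sum(g[i] for g in genos) / (2 * n)
        if 0 < p < 1:  # skip monomorphic SNPs
            sd = (2 * p * (1 - p)) ** 0.5
            for j in range(n):
                z[j].append((genos[j][i] - 2 * p) / sd)
    m_used = len(z[0])
    total = sum(sum(a * b for a, b in zip(z[j], z[k])) / m_used
                for j, k in pairs)
    return total / len(pairs)

rng = random.Random(1)
n_per_pop, n_snps = 30, 400
base = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
shift = [rng.choice((-0.1, 0.1)) for _ in range(n_snps)]
pop_a = draw_genotypes([p + s for p, s in zip(base, shift)], n_per_pop, rng)
pop_b = draw_genotypes([p - s for p, s in zip(base, shift)], n_per_pop, rng)

# All pairs of distinct individuals inside population A.
within_a = [(j, k) for j in range(n_per_pop) for k in range(j)]

pooled_mean = mean_offdiag_grm(pop_a + pop_b, within_a)   # frequencies from A+B
separate_mean = mean_offdiag_grm(pop_a, within_a)         # frequencies from A only
print(f"pairs within A, pooled GRM:   {pooled_mean:.3f}")
print(f"pairs within A, separate GRM: {separate_mean:.3f}")
```

In this toy setup the same unrelated pairs look related on average when the GRM is computed on the pooled sample, because the pooled allele frequencies sit between the two populations. Computing the GRM within one population removes that shift. Real data will differ in the size of the effect, so this is only a sketch of the mechanism.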

xyyin666666
New Member

Posts: 9

Post by xyyin666666 on Jan 27, 2016 12:45:55 GMT

Yes, you are right. I ran the analyses within each population separately and they went through.
I used PCA to assign each individual's ancestry, built a PLINK binary dataset for each ancestry, and applied the 0.025 threshold within each dataset. But what do you mean by applying 0.05 in each ethnic group separately? Do you mean I can merge the ethnic groups back together after applying this less stringent threshold?

Jian Yang
Administrator

Posts: 362

Post by Jian Yang on Jan 28, 2016 5:31:26 GMT

A threshold of 0.025 is a bit too stringent. I tend to use 0.05 these days. These thresholds apply only to a GRM estimated from common SNPs (the variance of the GRM becomes much larger when rare variants are included).

You can merge the data if your subsequent analysis requires that.

xyyin666666
New Member

Posts: 9

Post by xyyin666666 on Jan 28, 2016 12:35:12 GMT

Thank you. I get it.
Given that I have pedigree samples in my data (about 5% of the samples), I am attempting to estimate the broad-sense h^2 using the --bK parameter. I am not sure whether such a small proportion of family samples would affect the estimates.