QC for case-control study

rr
New Member

Posts: 3

QC for case-control study Jul 28, 2016 7:35:27 GMT

Post by rr on Jul 28, 2016 7:35:27 GMT

Hi!

I am currently trying to reproduce the results in the Lee et al (2011) paper using the WTCCC-CD data (5k individuals, 500k SNPs).
Starting with the QC, I applied the following filters (as in the paper), using the options shown:

exclude SNPs with MAFs < 0.01: --maf 0.01
exclude SNPs with missing rates > 0.05: --geno 0.05
exclude SNPs whose p-values were < 0.05 for the H-W test: --hwe 0.05
exclude individuals with missing rates > 0.01: --mind 0.01
exclude pairs with an estimated relationship of > 0.05: --grm-cutoff 0.05
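
For reference, here is a rough sketch of how these thresholds might be laid out as command lines. It only assembles and prints the commands; it does not run anything. The executable names (`plink`, `gcta64`) and every file prefix (`wtccc_cd`, `wtccc_cd_qc`, `wtccc_cd.phen`) are placeholders I made up for illustration, so adjust them to your own setup.

```python
import shlex

# Hypothetical file prefixes; replace with your own data.
raw_prefix = "wtccc_cd"
qc_prefix = "wtccc_cd_qc"
pheno_file = "wtccc_cd.phen"

# SNP- and individual-level filters applied in a single PLINK call.
plink_cmd = [
    "plink", "--bfile", raw_prefix,
    "--maf", "0.01",    # drop SNPs with MAF < 0.01
    "--geno", "0.05",   # drop SNPs with missing rate > 0.05
    "--hwe", "0.05",    # drop SNPs with HWE p-value < 0.05
    "--mind", "0.01",   # drop individuals with missing rate > 0.01
    "--make-bed", "--out", qc_prefix,
]

# Estimate the GRM from the filtered data.
grm_cmd = ["gcta64", "--bfile", qc_prefix, "--make-grm", "--out", qc_prefix]

# REML with the relatedness cut-off applied to the GRM.
reml_cmd = [
    "gcta64", "--grm", qc_prefix, "--grm-cutoff", "0.05",
    "--pheno", pheno_file, "--reml", "--out", qc_prefix + "_reml",
]

for cmd in (plink_cmd, grm_cmd, reml_cmd):
    print(shlex.join(cmd))
```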

First, I made a filtered bed file by performing the first 4 options in plink (all individuals were retained, SNP count down to 391335).
Then I used the new bed file for estimating grm.
Finally, I ran reml with the last option.
This resulted in the following:

Source Variance SE
V(G) 0.231562 0.015219
V(e) 0.000000 0.013046
Vp 0.231562 0.005012
V(G)/Vp 0.999999 0.056338
logL 1247.802
logL0 1061.235
LRT 373.135
df 1
Pval 0
n 4677
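
As a quick sanity check on the numbers above, the ratio and the LRT statistic can be recomputed by hand. This sketch assumes the usual definitions, V(G)/Vp with Vp = V(G) + V(e), and LRT = 2 × (logL − logL0); the variable names are mine.

```python
# Values copied from the REML output above.
v_g = 0.231562
v_e = 0.000000
log_l = 1247.802   # full model
log_l0 = 1061.235  # reduced model without the genetic component

v_p = v_g + v_e
ratio = v_g / v_p             # proportion of variance attributed to V(G)
lrt = 2 * (log_l - log_l0)    # likelihood-ratio test statistic

print(f"V(G)/Vp = {ratio:.6f}")
print(f"LRT     = {lrt:.3f}")
```

The recomputed LRT agrees with the reported 373.135 up to rounding of the log-likelihoods. With V(e) estimated at zero, the ratio is essentially 1.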

In the paper, there were 3833 individuals and 322142 SNPs retained.
I'm not really sure whether my approach in QC was correct. Could you please point out if I missed anything?

Thank you very much!

RR

Jian Yang
Administrator

Posts: 362

QC for case-control study Jul 29, 2016 1:02:43 GMT

Post by Jian Yang on Jul 29, 2016 1:02:43 GMT

You might also remove SNPs with a significant difference in missingness rate between cases and controls.
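
To illustrate what such a check involves, here is a minimal sketch that compares missing versus called genotype counts between cases and controls for one SNP, using a two-sided Fisher's exact test written in plain Python. The counts are toy numbers, and the function name is made up for this example. In practice a tool option such as PLINK's `--test-missing` would normally be used instead of hand-rolled code.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: cases, controls. Columns: missing calls, non-missing calls.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(k):
        # Unnormalised hypergeometric probability of k missing calls in cases.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / total

# Toy counts for a single SNP (not real data).
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"p = {p_value:.5f}")
```

SNPs whose p-value falls below a chosen threshold would then be added to the exclusion list; the threshold itself is a judgment call that this sketch does not settle.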

rr
New Member

Posts: 3

QC for case-control study Aug 2, 2016 7:54:15 GMT

Post by rr on Aug 2, 2016 7:54:15 GMT

Thank you for the suggestion.
I did as you advised. Instead of using --grm-cutoff, I found all pairs with an estimated relationship > 0.05 and removed each such pair from the list of individuals. (I'm not sure, but the description of --grm-cutoff says it removes one of a pair of individuals. Does it remove only one individual in the pair, or both?)
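
To make the difference between the two readings concrete, here is a small sketch on made-up IDs. It contrasts dropping both members of every related pair with dropping one member at a time until no related pair remains. The greedy rule used here (drop the individual involved in the most remaining pairs) is only an illustrative heuristic and is not claimed to be what --grm-cutoff does internally.

```python
def drop_both(pairs):
    """Remove every individual that appears in any related pair."""
    return {ind for pair in pairs for ind in pair}

def drop_one_greedy(pairs):
    """Remove one individual at a time until no related pair is left."""
    remaining = [tuple(p) for p in pairs]
    removed = set()
    while remaining:
        degree = {}
        for a, b in remaining:
            degree[a] = degree.get(a, 0) + 1
            degree[b] = degree.get(b, 0) + 1
        # Pick the most connected individual; ties go to the smallest ID.
        worst = min(degree, key=lambda ind: (-degree[ind], ind))
        removed.add(worst)
        remaining = [p for p in remaining if worst not in p]
    return removed

# Hypothetical pairs with estimated relationship > 0.05.
related_pairs = [("A", "B"), ("B", "C"), ("D", "E")]

removed_both = drop_both(related_pairs)
removed_one = drop_one_greedy(related_pairs)
print(sorted(removed_both))
print(sorted(removed_one))
```

On this toy input the first approach discards five individuals and the second discards two, which is one way the final sample size can diverge depending on how the pairs are handled.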
I also removed SNPs with CHR = 0 or 23 in the bim file.
In the end, I got 363393 SNPs and 4289 individuals (still different from the paper).
I was wondering: does it matter if I remove some individuals/SNPs first and then compute the missingness or allele-frequency statistics? (i.e., the missing rate/MAF might change.)
If so, what approach would you suggest? For example, is it recommended to apply all filters at once, i.e., running --maf, --hwe, --geno, and --mind in the same command? Or should it be sequential: run --maf first, then --hwe on the output, then --geno on the second output, and so on?
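
To show why the order can matter, here is a toy sketch with a tiny made-up genotype matrix (rows are individuals, columns are SNPs, `None` is a missing call). The thresholds are chosen only to make the effect visible and are not the ones from the paper. The sketch does not describe the order any particular tool applies internally; that should be checked in the tool's documentation.

```python
def missing_rate(values):
    """Fraction of missing (None) calls in a list of genotypes."""
    return sum(v is None for v in values) / len(values)

def filter_individuals(geno, max_missing):
    """Keep individuals (rows) whose missing rate is at most max_missing."""
    return [row for row in geno if missing_rate(row) <= max_missing]

def filter_snps(geno, max_missing):
    """Keep SNPs (columns) whose missing rate is at most max_missing."""
    keep = [j for j in range(len(geno[0]))
            if missing_rate([row[j] for row in geno]) <= max_missing]
    return [[row[j] for j in keep] for row in geno]

def shape(geno):
    return (len(geno), len(geno[0]))

# Toy data: the last individual is missing three of four calls.
genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [0, 2, 1, 1],
    [None, None, None, 1],
]

# Individuals first, then SNPs.
ind_first = filter_snps(filter_individuals(genotypes, 0.5), 0.2)
# SNPs first, then individuals.
snp_first = filter_individuals(filter_snps(genotypes, 0.2), 0.5)

print("individuals first:", shape(ind_first))
print("SNPs first:       ", shape(snp_first))
```

Removing the poorly genotyped individual first leaves all four SNPs with no missing calls, whereas filtering SNPs first discards three of them before that individual is ever evaluated.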