Population structure in GWAS simulation | Complex Trait Genetics Forum

Post by lshall on Jul 20, 2017 12:43:26 GMT

Hello,

I have been using the GCTA GWAS simulation function to simulate phenotypes under two disease prevalence estimates (0.10 and 0.25) and four heritability estimates (0.20, 0.40, 0.60, 0.80). I provided a SNP list but allowed GCTA to assign the SNP effect sizes (drawn from a normal distribution), e.g.

gcta64 --bfile GCTA_GWAS_sim_input --simu-cc 1742 5225 --simu-causal-loci snplist.txt --simu-hsq 0.8 --simu-k 0.25 --out GCTA_sim_k25_h80
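
For reference, the full grid of eight runs (two prevalences by four heritabilities) can be written out with a small helper like the one below. This is only a sketch: it builds and prints the command lines without running GCTA. The output naming pattern is extrapolated from the single example above, and reusing the same case/control counts (1742 and 5225) for the 0.10 prevalence runs is an assumption, so adjust `n_cases` and `n_controls` as needed.

```python
from itertools import product

PREVALENCES = [0.10, 0.25]
HERITABILITIES = [0.20, 0.40, 0.60, 0.80]


def build_command(k, hsq, n_cases=1742, n_controls=5225):
    """Build one gcta64 simulation command as a list of arguments.

    The default case/control counts come from the example command; using
    them for every prevalence is an assumption, not something stated.
    """
    # Output prefix pattern inferred from the example: GCTA_sim_k25_h80
    out = f"GCTA_sim_k{round(k * 100):02d}_h{round(hsq * 100):02d}"
    return [
        "gcta64",
        "--bfile", "GCTA_GWAS_sim_input",
        "--simu-cc", str(n_cases), str(n_controls),
        "--simu-causal-loci", "snplist.txt",
        "--simu-hsq", str(hsq),
        "--simu-k", str(k),
        "--out", out,
    ]


# One command per (prevalence, heritability) combination.
commands = [build_command(k, h) for k, h in product(PREVALENCES, HERITABILITIES)]

for cmd in commands:
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` to actually launch the simulation, assuming `gcta64` is on the PATH.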

When I run PCA on the simulated phenotype, I consistently find significant principal components (the same PCs are required at a given prevalence). And when I run a GWAS, I consistently get a peak at the HLA region.

This seemed quite unusual for two reasons: a) I assumed population stratification would not be simulated, so I did not expect to see any; and b) the significant PCs were the same for all 4 simulated traits at a given prevalence.

It is worth noting that the genotype data I am using is from a case-control study of an auto-immune disease. But I feel this should be irrelevant if GCTA is assigning the effect sizes to produce this simulated phenotype. Or am I missing something in that regard?

My questions are:
Does the disease prevalence have any bearing on how SNP effects are assigned by GCTA?
Is there a means of seeing which SNPs are influencing the PC loadings the most?
Are there any suggestions for how to prevent this scenario when using GCTA?
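
To make the second question concrete, the kind of check I have in mind is sketched below on a toy genotype matrix. This is not a GCTA feature: it is plain Python that standardizes each SNP, finds the first principal component by power iteration, and ranks SNPs by the absolute value of their loading. The SNP names and genotype values are made up for illustration, and the sketch assumes no SNP is monomorphic.

```python
import math


def standardize(column):
    """Center and scale one SNP's allele counts (assumes it is not monomorphic)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((g - mean) ** 2 for g in column) / n)
    return [(g - mean) / sd for g in column]


def first_pc_loadings(genotypes, n_iter=500):
    """Return SNP -> loading on the first PC, found by power iteration.

    genotypes: dict mapping a SNP id to a list of 0/1/2 allele counts,
    one entry per individual, same order for every SNP.
    """
    snp_ids = list(genotypes)
    cols = [standardize(genotypes[s]) for s in snp_ids]
    n_ind, n_snp = len(cols[0]), len(cols)
    v = [1.0] * n_snp  # starting vector
    for _ in range(n_iter):
        # Individual scores on the current direction: X v
        scores = [sum(cols[j][i] * v[j] for j in range(n_snp)) for i in range(n_ind)]
        # Back-project onto SNPs: X^T (X v)
        w = [sum(cols[j][i] * scores[i] for i in range(n_ind)) for j in range(n_snp)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return dict(zip(snp_ids, v))


# Hypothetical toy data: snp0 and snp1 differ between the first four and
# last four individuals; snp2 and snp3 do not track that split.
toy = {
    "snp0": [0, 0, 0, 1, 2, 2, 2, 1],
    "snp1": [0, 0, 1, 0, 2, 2, 1, 2],
    "snp2": [0, 1, 2, 1, 1, 0, 1, 2],
    "snp3": [1, 2, 0, 1, 1, 2, 0, 1],
}

loadings = first_pc_loadings(toy)
ranked = sorted(loadings, key=lambda s: abs(loadings[s]), reverse=True)
for snp in ranked:
    print(snp, round(loadings[snp], 3))
```

On real data the same ranking would be done with whatever PCA tool produced the PCs, provided it can export per-SNP loadings; the point is only to see whether the top-loading SNPs cluster in one region such as the HLA.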

Thanks in advance for any advice.

Lynsey