Post by eoind on Dec 15, 2015 16:56:18 GMT
Hi all,
I want to run an association analysis on 3,000 samples (from 60 dog breeds) genotyped at 97,000 SNPs. The trait of interest (a percentage) is quantitative. Specifically, I want to account for population stratification so that breed differences do not produce spurious SNP-trait associations.
My data set is in standard PLINK format, the species is dog, and the phenotype file looks like this:
FAMID INDID Percent
PFZ1E02 PFZ1E02 0.0190609818983
PFZ1C03 PFZ1C03 0.0316493529134
PFZ1F03 PFZ1F03 0.0316493529134
PFZ1G03 PFZ1G03 0.0316493529134
I ran this set of commands:
1. Make a genetic relationship matrix, accounting for the fact that I'm using dog breeds:
./gcta_1/gcta64 --bfile Data --make-grm --autosome-num 38 --autosome
2. Calculate the first 20 principal component axes:
./gcta_1/gcta64 --grm gcta.grm --pca 20 --out eigen
3. Include the principal components as covariates in the association analysis, to identify associations between the trait and SNPs while accounting for breed differences:
/gcta_1/gcta64 --mlma --bfile Data --qcovar eigen.eigenvec --grm gcta --pheno pheno.dat2 --out test_gcta
The problem is that in my output, almost all of my SNPs are significant (P < 1e-10).
Sample of output file:
Chr SNP bp A1 A2 Freq b se p
1 BICF2P1383091 212740 A G 0.0208885 0.0972874 0.000891171 0
1 BICF2G630707908 273487 A G 0.161405 0.0190486 0.000217773 0
1 BICF2P41862 390563 A G 0.123784 -0.044304 0.000407207 0
1 BICF2G630707932 420036 G A 0.465413 -0.0385957 0.000249306 0
I've clearly done something wrong, as I do not think it is possible for almost all of my 97,000 SNPs to have extremely low p-values for association with my trait after accounting for population stratification.
Could someone please tell me the correct command that I should have used? The command should: read in the PLINK bed/bim files and a phenotype file (formatted as in the example above), correct for any potential population stratification, and test for association between my quantitative trait and the set of SNPs.
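For anyone wanting to check the same thing on their own output, here is a minimal Python sketch of how the share of SNPs below a p-value threshold could be tallied. It assumes a whitespace-delimited file with a header row containing a `p` column, as in the sample above. The function name `fraction_below` and the embedded `SAMPLE` text are only illustrative and are not part of GCTA.

```python
import math

# Rows copied from the sample output above (illustrative input only).
SAMPLE = """Chr SNP bp A1 A2 Freq b se p
1 BICF2P1383091 212740 A G 0.0208885 0.0972874 0.000891171 0
1 BICF2G630707908 273487 A G 0.161405 0.0190486 0.000217773 0
1 BICF2P41862 390563 A G 0.123784 -0.044304 0.000407207 0
1 BICF2G630707932 420036 G A 0.465413 -0.0385957 0.000249306 0
"""

def fraction_below(lines, threshold=1e-10):
    """Return (n_snps, fraction of SNPs with p < threshold).

    `lines` is any iterable of text lines whose first line is a header
    that includes a column named "p" (an assumption based on the sample).
    """
    rows = iter(lines)
    header = next(rows).split()
    p_col = header.index("p")
    total = 0
    hits = 0
    for line in rows:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or malformed rows
        try:
            p = float(fields[p_col])
        except ValueError:
            continue  # skip rows whose p-value is not numeric
        if math.isnan(p):
            continue  # skip missing p-values
        total += 1
        if p < threshold:
            hits += 1
    fraction = hits / total if total else 0.0
    return total, fraction

if __name__ == "__main__":
    n, frac = fraction_below(SAMPLE.splitlines())
    print(f"{n} SNPs read, {frac:.1%} with p < 1e-10")
```

To run it on the real results, pass an open file handle for the association output in place of `SAMPLE.splitlines()`.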
Thanks.