Post by tallnutt on May 27, 2015 6:23:32 GMT
Hi,
I apologise if my question is a bit simplistic. I have a very basic problem that I thought GCTA could address, but I cannot find anything clearly written in the PLINK or GCTA manuals that addresses it.
I have a set of 127000 SNPs from 44 bacterial strains, in .vcf format. I also have growth measurements from 10 different treatments. Can GCTA determine which SNPs have the highest correlation with each trait, whether traits interact (i.e. whether treatments correlate with the same SNPs), and finally whether SNPs interact with each other?
I apologise again for my ignorance. I am new to SNP analysis and want to analyse my data as thoroughly as possible, but I am not familiar with PLINK (on which GCTA seems to be dependent).
Thanks for your help,
Theo
|
Post by Zhihong Zhu on May 28, 2015 0:07:23 GMT
Hi Theo,
If my understanding is correct, what you want to do is run an association analysis between the measurements and the bacterial SNPs. There are two ways to achieve that:
1. Run a regression analysis in R, one SNP at a time, with the model y = x + covariates + e, where y is the measurements, x is the SNP genotypes, and the covariates are the treatments or any other factors you want to fit in the model.
2. Use PLINK2. It has the options "--linear" and "--logistic", which do the same thing. Please check the description at pngu.mgh.harvard.edu/~purcell/plink/anal.shtml#glm.
Before doing that, you might need to convert your data file from .vcf format to PLINK file format.
Cheers,
Zhihong
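For illustration, the one-SNP-at-a-time regression in option 1 could be sketched roughly as follows. This is a minimal sketch in Python rather than R, using only the standard library; the function names (`solve`, `snp_effect`), the 0/1 genotype coding, and the toy data are all assumptions made up for the example, not anything taken from GCTA or PLINK. It fits y = b0 + b1*x + covariates + e by ordinary least squares for each SNP and ranks SNPs by the size of b1. A real analysis would also need standard errors and p-values, which this sketch does not compute.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix (e.g. a SNP with no variation)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def snp_effect(y, genotypes, covariates=None):
    """Fit y = b0 + b1*genotype + covariates + e by OLS and return b1."""
    n = len(y)
    covariates = covariates or [[] for _ in range(n)]
    # Design matrix: intercept, genotype, then any covariate columns.
    X = [[1.0, float(g)] + [float(c) for c in cov]
         for g, cov in zip(genotypes, covariates)]
    p = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve(xtx, xty)[1]


# Toy data (made up): 6 strains, 2 SNPs coded 0/1, one growth measurement.
growth = [1.0, 3.0, 1.0, 3.0, 3.0, 1.0]
snps = {
    "snp1": [0, 1, 0, 1, 1, 0],
    "snp2": [0, 0, 1, 1, 0, 1],
}

effects = {name: snp_effect(growth, g) for name, g in snps.items()}
ranked = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)
print(ranked, effects)
```

With 127000 SNPs and 10 treatments the same loop would simply run over every SNP for each trait in turn; comparing the ranked lists across traits is one simple way to see whether treatments point to the same SNPs.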