Post by federico on Oct 11, 2021 9:58:32 GMT
I have some survival data where the outcome is coded as 0 = alive, 1 = dead.
I am trying to understand the general rule for how SNPs are coded and how betas are calculated. For instance, I have a SNP with alleles G (the most common allele) and A (the minor allele).
If I take a rough mean of the outcome (i.e. the proportion dead) by genotype, I see
AA = 0.6041667
AG = 0.4766889
GG = 0.4230221
(This is a very rough way of looking at it, but individuals with an AA genotype have a higher mortality however I look at it.)
If I get the PLINK (PLINK 1.90) binary data, I see that the same individuals have their genotypes coded as 0/1/2 and the mean value for each genotype is
0 = 0.6041667
1 = 0.4766889
2 = 0.4230221
That is, AA = 0, AG = 1, GG = 2. PLINK 1.90 seems to code the 0/1/2 genotypes as counts of the most common allele (I am quite sure I read that this is the case in PLINK's documentation). The problem is that the beta coefficient I get from GCTA is 0.0676, which does not make sense to me, since the mean outcome goes down as the genotype code goes from 0 to 2. It would make sense if GCTA switched the genotype coding around, turning the genotypes into counts of the minor allele.
Is that correct, i.e. does GCTA code/order the genotypes based on the minor allele? If so, the beta makes sense (and, to be fair, coding the genotypes as counts of the minor allele seems more sensible). Clarifying this point would let me work out which allele has which effect.
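To make the sign argument concrete, here is a minimal sketch in Python. It is my own illustration, not what GCTA does internally: it fits an unweighted least-squares line through the three per-genotype means quoted above, once with the genotype coded as the count of G and once as the count of A. It ignores how many individuals carry each genotype and any model GCTA actually fits, so the magnitude is not comparable to 0.0676; only the sign is of interest. The helper name `slope` is made up for this example.

```python
# Per-genotype mean outcome (0 = alive, 1 = dead), copied from the post.
means = {"AA": 0.6041667, "AG": 0.4766889, "GG": 0.4230221}


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (unweighted)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


ys = [means["AA"], means["AG"], means["GG"]]

# Coding 1: genotype = number of G alleles (AA = 0, AG = 1, GG = 2).
count_G = [0, 1, 2]
# Coding 2: genotype = number of A alleles (AA = 2, AG = 1, GG = 0).
count_A = [2, 1, 0]

slope_G = slope(count_G, ys)
slope_A = slope(count_A, ys)

print(f"slope per copy of G: {slope_G:+.4f}")
print(f"slope per copy of A: {slope_A:+.4f}")
```

The two slopes have the same magnitude and opposite signs (roughly -0.09 per copy of G and +0.09 per copy of A), so a positive beta is what I would expect if the effect is reported per copy of the minor allele A.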