I am running --cojo-slct on separate chromosomes. In my output I have a few independent signals in one locus, but none of them is the SNP with the lowest p-value in that locus, and that SNP is not in the badsnps file either. Shouldn't the SNP with the lowest p-value be picked first? I expected the analysis to condition on it, select the next SNP, and so on.
This is the command I am using:
gcta64 --bfile file --cojo-slct --cojo-p 5e-8 --maf 0.01 --chr 1 --cojo-file sumstats.txt --out results-cojo
I have run into this exact problem. I think the cause is that OR, BETA, or SE in the sumstats file is not stored with sufficient precision, so that, for instance, 2*pnorm(abs(log(OR))/SE, lower.tail=F) != P. In some sumstats output the SE column might be rounded to only 4 digits. Suppose a SNP with a larger p-value has a slightly larger effect size than the SNP with the smallest p-value, but their standard errors round to the same value. Then the SNP with the larger effect size will be chosen first.
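To make the tie-breaking effect concrete, here is a small Python sketch with two hypothetical SNPs. The names rsA and rsB and their numbers are made up for illustration. The sketch ranks SNPs by the p-value recomputed from beta and se. That ranking rule is an assumption based on the behaviour described in this thread, not GCTA's actual code. With full-precision standard errors, rsA has the smaller p-value. Once both se values are rounded to 4 decimal places they become identical (0.0080), and rsB, with the slightly larger beta, comes out on top.

```python
import math


def p_from_beta_se(b, se):
    """Two-sided p-value, i.e. R's 2*pnorm(abs(b)/se, lower.tail=F).

    Uses math.erfc because 2*pnorm(-|z|) == erfc(|z|/sqrt(2)), which stays
    accurate far into the tail, unlike computing 1 - cdf(z).
    """
    return math.erfc(abs(b / se) / math.sqrt(2))


def lead_snp(rows, se_decimals=None):
    """Return the name of the SNP with the smallest recomputed p-value.

    rows: iterable of (snp, beta, se) tuples.
    se_decimals: if given, round se to this many decimals first, mimicking a
    sumstats file that stores se with limited precision.
    """
    def recomputed_p(row):
        _, b, se = row
        if se_decimals is not None:
            se = round(se, se_decimals)
        return p_from_beta_se(b, se)

    return min(rows, key=recomputed_p)[0]


# Hypothetical SNPs: (name, beta, full-precision se)
snps = [("rsA", 0.0500, 0.00796), ("rsB", 0.0502, 0.00804)]

print(lead_snp(snps))                 # rsA: z = 6.28 vs 6.24
print(lead_snp(snps, se_decimals=4))  # rsB: both se become 0.0080, z = 6.25 vs 6.275
```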
--cojo-slct seems to get the p-value of each SNP by recalculating it from the b and se columns rather than taking it from the p column. You can check this in the p column of the .jma.cojo and .cma.cojo output files. The values there do not match the input p-values, but they do equal 2*pnorm(abs(b)/se, lower.tail=F).
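A rough way to spot affected SNPs before running COJO is to recompute p from b and se for every row and compare it with the stored p. The Python sketch below works on (snp, b, se, p) tuples and leaves out reading them from your sumstats file. The function names and the tolerance of 0.01 on the -log10(p) scale are arbitrary choices for illustration.

```python
import math


def recomputed_p(b, se):
    """2*pnorm(abs(b)/se, lower.tail=F), computed with erfc for tail accuracy."""
    return math.erfc(abs(b / se) / math.sqrt(2))


def inconsistent_rows(rows, tol=0.01):
    """Return (snp, stored_p, recomputed_p) for rows where the two disagree.

    rows: iterable of (snp, b, se, p).
    tol: maximum allowed difference on the -log10(p) scale (arbitrary default).
    Rows where either p-value is 0 (e.g. underflow) are always reported.
    """
    flagged = []
    for snp, b, se, p in rows:
        p_re = recomputed_p(b, se)
        # Short-circuit keeps log10 from ever seeing a zero.
        if p <= 0 or p_re <= 0 or abs(math.log10(p) - math.log10(p_re)) > tol:
            flagged.append((snp, p, p_re))
    return flagged


# Hypothetical example: the stored p was computed from se = 0.00796,
# but the file only keeps se = 0.0080.
true_p = recomputed_p(0.05, 0.00796)
rows = [
    ("rsA_exact", 0.05, 0.00796, true_p),
    ("rsA_rounded", 0.05, 0.0080, true_p),
]
for snp, p, p_re in inconsistent_rows(rows):
    print(snp, p, p_re)  # only rsA_rounded is reported
```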
The solution I've found is to recalculate a more precise se column for the input file from the effect size and the original p-value, e.g. se=abs(log(OR))/qnorm(P/2, lower.tail=F).
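The same fix as a Python sketch, with these assumptions:
- the b column holds log(OR) (or a beta on the same scale);
- p is strictly between 0 and 1, so p-values stored as 0 cannot be recovered this way.

se_from_p is a hypothetical helper name. Here -NormalDist().inv_cdf(p / 2) plays the role of R's qnorm(P/2, lower.tail=F). Evaluating the quantile in the lower tail avoids the precision loss that inv_cdf(1 - p/2) would suffer for very small p.

```python
import math
from statistics import NormalDist


def se_from_p(beta, p):
    """Recompute se so that 2*pnorm(abs(beta)/se, lower.tail=F) == p.

    Equivalent to R's se = abs(beta) / qnorm(p/2, lower.tail=F).
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if beta == 0:
        raise ValueError("se cannot be recovered from p when beta is 0")
    # Upper-tail quantile z with P(Z > z) = p/2, taken from the lower tail
    # because inv_cdf(1 - p/2) would lose precision when p is tiny.
    z = -NormalDist().inv_cdf(p / 2)
    return abs(beta) / z


# Hypothetical SNP: OR = 1.10 reported with p = 3.2e-12
beta = math.log(1.10)
se = se_from_p(beta, 3.2e-12)
print(se)
# Round trip: the p-value recomputed from beta and the new se is close to 3.2e-12
print(math.erfc(abs(beta / se) / math.sqrt(2)))
```

If the file stores p with more precision than se, this should make b, se and p mutually consistent. Ranking by the recomputed p-value should then agree with ranking by the stored one.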
Last Edit: Nov 12, 2021 15:13:58 GMT by waddington