Post by ajschork on Dec 12, 2013 19:59:32 GMT
Hi,
Thank you for providing this software; it is a really helpful tool. I have been using the simulation functions to generate quantitative phenotypes from real genotyping data, using predefined markers and per-allele effect sizes (the "--simu-qt" and "--simu-causal-loci" options). I have come across an inconsistency between the program and its documentation.
The documentation states that the phenotype is generated according to this model:
1) yj = sum(xij*bi) + ej
xij = number of reference alleles for the i-th causal variant of the j-th individual
bi = allelic effect of the i-th causal variant
ej = the residual effect, generated from a normal distribution with mean 0 and variance var(sum(xij*bi)) * (1 / h2 - 1)
It appears that the program actually computes the phenotype from standardized genotype counts, such that
2) yj = sum(zij*bi) + ej
zij = ( xij - mean(xij) ) / sd(xij), with the mean and sd taken over individuals j for each variant i
with all other terms the same.
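To make the difference between the two models concrete, here is a minimal NumPy sketch of both. It is my own illustration, not the program's code: the function name simulate_qt, the use of the sample sd (ddof=1), and the binomial genotypes are all assumptions, and a monomorphic variant would cause a division by zero in the standardized branch.

```python
import numpy as np

def simulate_qt(x, b, h2, standardize, rng):
    """Simulate y_j = sum_i(x_ij * b_i) + e_j (model 1), or the same with
    standardized genotypes z_ij (model 2) when standardize is True.

    x: (n_individuals, n_causal) allele counts; b: (n_causal,) effects.
    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    b = np.asarray(b, dtype=float)
    if standardize:
        # Model 2: z_ij = (x_ij - mean(x_i)) / sd(x_i), per variant (column).
        # Sample sd (ddof=1) is an assumption about what "sd" means here.
        x = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    g = x @ b                                   # genetic value per individual
    var_e = g.var(ddof=1) * (1.0 / h2 - 1.0)    # residual variance from h2
    e = rng.normal(0.0, np.sqrt(var_e), size=g.shape)
    return g + e

rng = np.random.default_rng(0)
n = 5000
freqs = np.array([0.05, 0.5])                   # one rare, one common variant
x = rng.binomial(2, freqs, size=(n, 2))         # made-up genotypes
b = np.array([1.0, 1.0])                        # equal nominal effects

y1 = simulate_qt(x, b, 0.5, False, rng)         # model 1
y2 = simulate_qt(x, b, 0.5, True, rng)          # model 2

# Variance each variant contributes to the genetic value under each model
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
print("model 1 per-variant variance:", (x * b).var(axis=0, ddof=1))
print("model 2 per-variant variance:", (z * b).var(axis=0, ddof=1))
```

With equal nominal effects, the rare variant contributes much less variance than the common one under model 1, while under model 2 both contribute the same.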
The results files (.phen / .par) are then inconsistent: the .phen file contains the phenotype as generated from the second model, but the metadata in the .par file suggests that model 1 was used. This change of model alters the per-allele effects, scaling up the effects of rare alleles relative to more common alleles in their contributions to the simulated phenotype.
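The rescaling of rare-allele effects can be written out explicitly. The short derivation below assumes Hardy-Weinberg proportions with allele frequency p_i for the variance of the allele count; that assumption is mine and is used only to put numbers on the effect.

```latex
% Model 2 rewritten in terms of raw allele counts x_{ij}
y_j = \sum_i b_i \,\frac{x_{ij} - \bar{x}_i}{\mathrm{sd}(x_i)} + e_j
    = \sum_i \frac{b_i}{\mathrm{sd}(x_i)}\, x_{ij}
      \;-\; \sum_i \frac{b_i\,\bar{x}_i}{\mathrm{sd}(x_i)} \;+\; e_j

% Implied per-allele effect, with \mathrm{var}(x_i) = 2 p_i (1 - p_i) under HWE
b_i^{\text{per-allele}} = \frac{b_i}{\mathrm{sd}(x_i)}
                        \approx \frac{b_i}{\sqrt{2 p_i (1 - p_i)}}

% Examples with b_i = 1
p_i = 0.05:\quad 1/\sqrt{0.095} \approx 3.24
\qquad
p_i = 0.5:\quad 1/\sqrt{0.5} \approx 1.41
```

So an input effect of 1 corresponds to a per-allele effect of roughly 3.24 for a variant at frequency 0.05, but only about 1.41 for one at frequency 0.5.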
As an example, I ran the following test:
I selected 1 SNP to be causal, assigned it an effect size of 1, and set the desired heritability to 1, so that the residual variance is 0.
The phenotype (.phen) should then simply equal the genotype count (0, 1, 2), but it came out as the standardized genotype count (zij instead of xij). The corresponding .par file still reported the per-allele effect as 1, and a Qsq consistent with model 1.
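For anyone who wants to check their own output, this kind of test can be sketched as below. The genotypes here are made up (allele frequency 0.2) and the two phenotypes are computed directly rather than read from a .phen file; the helper name per_allele_slope and the use of the sample sd are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.binomial(2, 0.2, size=1000).astype(float)  # hypothetical allele counts
b = 1.0                                            # effect size set to 1
# h2 = 1, so the residual variance is 0 and no noise term is added.

y_model1 = x * b                                   # raw genotype counts
y_model2 = (x - x.mean()) / x.std(ddof=1) * b      # standardized counts

def per_allele_slope(x, y):
    """Least-squares slope of phenotype on allele count (hypothetical helper)."""
    return np.polyfit(x, y, 1)[0]

print("model 1 phenotype values:", np.unique(y_model1))
print("model 2 phenotype values:", np.unique(y_model2))
print("model 1 slope:", per_allele_slope(x, y_model1))
print("model 2 slope:", per_allele_slope(x, y_model2))
print("1 / sd(x):    ", 1.0 / x.std(ddof=1))
```

If the regression slope of the simulated phenotype on the allele count is 1, the output matches model 1; if it equals 1 / sd(x), it matches model 2.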
I am not sure which is the intended approach.
Thanks again for providing this tool,
-A.