### Post by elansary on Jan 6, 2016 15:07:29 GMT

Dear,

I have a large data set for a disease (around 50,000 samples, with a controls-to-cases ratio of 3:1, i.e. more controls). I am going to build a GBLUP model based on your paper "http://www.sciencedirect.com/science/article/pii/S0002929710005987". I am going to divide my data set into a training set and a test set. Finally, I have a new data set, distinct from both the training set and the test set, and my main goal is to predict the genetic liability of the binary disease in it.

My data set consists of compressed MaCH dosage files. The steps I used are as follows:

1) Calculate the GRM from all samples.

2) Subset the GRM so that it keeps only the training-set individuals.

3) Use the training-set GRM to fit the REML model and compute the BLUP solutions for the individuals.

4) Call --blup-snp to get the SNP effects; as described in your paper, this should give u = (W' A^-1 g) / N.

To double-check my work, I compared the breeding values in the *.ind.blp file (the total genetic value column) against the breeding values I computed from the SNP effects from step 4 (g_training = W * u). I also tried g_training = X * u. Both give breeding values that are very different from those in the *.ind.blp file, and they are not even in the (-1, 1) range of the liability values found there.
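To make the intended check concrete, here is a minimal NumPy sketch on simulated data. It assumes A = W W' / N (the GRM definition used with the formula above), stores individuals in rows and SNPs in columns, and uses a random vector as a stand-in for the BLUP genetic values; none of these names or numbers come from GCTA output. In this toy setting, u = (W' A^-1 g) / N followed by W * u recovers g exactly, so a large mismatch on real data would suggest a difference in scaling, allele coding, or allele frequencies rather than in the formula itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 20, 200  # n individuals, N SNPs (N > n so that A is invertible)

# Simulated allele frequencies and genotype counts (hypothetical data).
p = rng.uniform(0.1, 0.9, size=N)
X = rng.binomial(2, p, size=(n, N)).astype(float)

# Standardize with the simulating frequencies p, not sample estimates:
# centring on sample frequencies would make every column sum to zero
# and A singular.
W = (X - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))

# Genetic relationship matrix, assumed here to be A = W W' / N.
A = W @ W.T / N

# Stand-in for the BLUP genetic values of the training individuals.
g = rng.normal(size=n)

# SNP effects: u = W' A^-1 g / N (solve instead of an explicit inverse).
u = W.T @ np.linalg.solve(A, g) / N

# Back-calculated genetic values from the SNP effects.
g_back = W @ u

print("max abs difference:", np.max(np.abs(g_back - g)))
```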

Could you please tell me how to calculate the liability (breeding value) for the training set, and also for a new cohort?

I calculate W as described in your paper: "W is a standardized genotype matrix with the ijth element w_ij = (x_ij - 2p_i) / sqrt(2p_i(1 - p_i)), where x_ij is the number of copies of the reference allele for the ith SNP of the jth individual and p_i is the frequency of the reference allele".
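For reference, the standardization quoted above can be sketched in NumPy as follows. The function name `standardize_dosages` is hypothetical, and the layout (individuals in rows, SNPs in columns) is an assumption of this sketch. Allele frequencies are estimated from the dosages unless supplied, which leaves open which sample's frequencies should be used for a new cohort.

```python
import numpy as np

def standardize_dosages(X, p=None):
    """Return (W, p) with W[j, i] = (x_ij - 2 p_i) / sqrt(2 p_i (1 - p_i)).

    X : array of shape (n_individuals, n_snps), reference-allele dosages in [0, 2].
    p : optional reference-allele frequencies, one per SNP; if omitted they are
        estimated from X as half the mean dosage.
    """
    X = np.asarray(X, dtype=float)
    if p is None:
        p = X.mean(axis=0) / 2.0
    p = np.asarray(p, dtype=float)
    sd = np.sqrt(2.0 * p * (1.0 - p))
    if np.any(sd == 0.0):
        # A monomorphic SNP (p = 0 or 1) cannot be standardized.
        raise ValueError("monomorphic SNP: 2p(1-p) is zero")
    return (X - 2.0 * p) / sd, p

# Tiny hypothetical example: 4 individuals, 2 SNPs.
X_demo = np.array([[0.0, 1.0],
                   [2.0, 1.0],
                   [1.0, 2.0],
                   [1.0, 0.0]])
W_demo, p_demo = standardize_dosages(X_demo)
```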

One last question. The manual says to use plink --score to get liabilities for new individuals, but plink simply computes X * u, whereas the original paper says the liability of the new cohort is W * u. How can I tell plink to convert X to W? My data are dosages, and once I use the --dosage option the values must be in the range 0 to 2, but W can take values outside the dosage range. So how can I make plink do the prediction? Also, does --blup-snp produce u = (W' A^-1 g) / N, or does it produce this: ars.els-cdn.com/content/image/1-s2.0-S0002929710005987-si35.gif?
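The relation between the two scores can be written out algebraically: W * u equals X * b minus a constant, where b_i = u_i / sqrt(2p_i(1 - p_i)) and the constant is the sum over SNPs of 2 p_i b_i. The sketch below checks this on simulated dosages. The function name `per_allele_effects` and all data are hypothetical, and the sketch does not settle which of the two scalings --blup-snp actually writes out.

```python
import numpy as np

def per_allele_effects(u, p):
    """Convert effects on standardized genotypes (u) to per-allele effects (b).

    Returns (b, offset) such that W @ u == X @ b - offset, where
    W = (X - 2p) / sqrt(2p(1-p)) with individuals in rows and SNPs in columns.
    """
    u = np.asarray(u, dtype=float)
    p = np.asarray(p, dtype=float)
    b = u / np.sqrt(2.0 * p * (1.0 - p))
    offset = float(np.sum(2.0 * p * b))
    return b, offset

rng = np.random.default_rng(1)
n_snps = 50
p = rng.uniform(0.1, 0.9, size=n_snps)       # hypothetical allele frequencies
X = rng.uniform(0.0, 2.0, size=(5, n_snps))  # hypothetical dosages in [0, 2]
u = rng.normal(scale=0.01, size=n_snps)      # hypothetical standardized effects

W = (X - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
b, offset = per_allele_effects(u, p)

score_from_W = W @ u           # score on standardized genotypes
score_from_X = X @ b - offset  # same score from raw dosages
```

Because the offset is identical for every individual, a tool that only computes X * b would rank individuals the same way as W * u and differ by a constant shift.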

Thank you so much for all the great work on this software. I hope you can answer soon. Thanks again.

Mahmoud Elansary

Unit of Animal Genomics

GIGA-R & Faculty of Veterinary Medicine

University of Liège (B34)

1 Avenue de l'Hôpital

4000-Liège (Sart Tilman)

Belgium

Tel: +32-(0)4-366.41.57

Fax: +32-(0)4-366.41.98

mahmoud.elansary@ulg.ac.be

"Do you really need to print this mail? Think green!"
