Hi, I want to run a mixed model for an association analysis in a sample that includes some relatives. I calculated IBD between pairs in GCTA, but when I compared the estimates for some pairs whose pihat I already knew, the results were not similar at all. For example, 2nd degree relatives with a pihat of 0.025 calculated in PLINK get an estimate of 8.66e-04 in GCTA. In addition, not all individuals that are in the grm.id file are present in the grm.grm file. Last but not least, I am getting IBD estimates > 1 for individuals with themselves. How can that be? I thought that the estimate from the grm.grm file was also pihat.
This is the log from the commands I used to calculate the GRM:
Reading IDs of the GRM from [RS123chr8.grm.id].
11496 IDs read from [RS123chr8.grm.id].
Reading the GRM from [RS123chr8.grm.bin].
Reading the number of SNPs for the GRM from [RS123chr8.grm.N.bin].
Pairwise genetic relationships between 11496 individuals are included from [RS123chr8.grm.bin].
Saving the genetic relationship matrix to the file [chr8grm.grm.gz] (in compressed text format).
The genetic relationship matrix has been saved in the file [chr8grm.grm.gz] (in compressed text format).
IDs for the GRM file [chr8grm.grm.gz] have been saved in the file [chr8grm.grm.id].
1. I think the IBD from PLINK is different from the GRM from GCTA, which looks more like a correlation between each pair of individuals.
2. The grm file stores the GRM as a lower triangle, so it has n*(n+1)/2 rows, where n is the number of individuals, equal to the number of lines in the id file (for details, see www.complextraitgenomics.com/software/gcta/estimate_grm.html). Do you mean that the number of rows in your grm file is not n*(n+1)/2?
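To check point 2 on your own files, something like the following Python sketch might help. It assumes the text GRM layout described on the GCTA page linked above (four whitespace-separated columns: row number of individual 1, row number of individual 2, number of non-missing SNPs, relatedness estimate, with row numbers counted from 1 in the id file). The function names and the toy rows are made up for illustration.

```python
import gzip


def expected_pairs(n):
    """Number of rows in a lower-triangle GRM, diagonal included."""
    return n * (n + 1) // 2


def summarize_grm_rows(lines, n_ids):
    """Count rows and list id-file row numbers that never appear in a pair.

    Assumes each line is: idx1 idx2 n_snps relatedness (1-based indices).
    """
    seen = set()
    n_rows = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        seen.add(int(fields[0]))
        seen.add(int(fields[1]))
        n_rows += 1
    missing = sorted(set(range(1, n_ids + 1)) - seen)
    return n_rows, missing


def summarize_grm_file(grm_gz_path, id_path):
    """Same check, reading a .grm.gz file and its .grm.id file from disk."""
    with open(id_path) as fh:
        n_ids = sum(1 for line in fh if line.strip())
    with gzip.open(grm_gz_path, "rt") as fh:
        n_rows, missing = summarize_grm_rows(fh, n_ids)
    return n_ids, n_rows, missing


# Toy example with 3 individuals: 3 * 4 / 2 = 6 rows expected.
toy = [
    "1 1 1000 1.01",
    "2 1 1000 -0.002",
    "2 2 1000 0.99",
    "3 1 1000 0.004",
    "3 2 1000 0.25",
    "3 3 1000 1.02",
]
toy_rows, toy_missing = summarize_grm_rows(toy, 3)
```

For the 11496 IDs in the log above, expected_pairs(11496) gives 66084756 rows. Note that the grm file refers to individuals by their row number in the id file, not by their ID, so searching the grm file for the IDs themselves will not find them.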
The GRM from GCTA is different from the IBD/IBS from PLINK. The GRM can be thought of as a kind of IBD relative to an ancestral population, but rescaled using the current population as the base population, whereas the traditional definition of IBD usually uses the founders of the pedigree as the base population. Please see this paper for more clarification: www.nature.com/nrg/journal/v11/n11/abs/nrg2865.html
Since the GRM is rescaled with the current population as the base population, the average relatedness is 0. That's why you observe negative values: pairs that are less related than the sample average get estimates below zero.
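A toy simulation can illustrate the point about the average being 0. This is only a sketch, not GCTA's code: it simulates unrelated individuals, estimates allele frequencies from the same sample, and uses the standardised cross-product (x_j - 2p)(x_k - 2p) / (2p(1-p)) averaged over SNPs for every entry. I believe GCTA uses a different expression for the diagonal and has its own handling of missing genotypes, so treat the function names and details here as assumptions.

```python
import random


def simulate_genotypes(n_ind, n_snp, seed=1):
    """Unrelated individuals; each genotype is an allele count (0, 1 or 2)."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_snp)]
    return [
        [int(rng.random() < p) + int(rng.random() < p) for p in freqs]
        for _ in range(n_ind)
    ]


def toy_grm(geno):
    """GRM from genotypes standardised with sample allele frequencies."""
    n_ind, n_snp = len(geno), len(geno[0])
    z = [[] for _ in range(n_ind)]
    for i in range(n_snp):
        col = [row[i] for row in geno]
        p = sum(col) / (2 * n_ind)  # allele frequency in this sample
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic in the sample, carries no information
        sd = (2 * p * (1 - p)) ** 0.5
        for j in range(n_ind):
            z[j].append((col[j] - 2 * p) / sd)
    used = len(z[0])
    return [
        [sum(a * b for a, b in zip(z[j], z[k])) / used for k in range(n_ind)]
        for j in range(n_ind)
    ]


geno = simulate_genotypes(n_ind=30, n_snp=300)
A = toy_grm(geno)
n = len(A)

# Mean over all entries, diagonal included: zero up to rounding error.
all_mean = sum(sum(row) for row in A) / (n * n)

# Off-diagonal entries: slightly negative on average, many below zero.
off_diag = [A[j][k] for j in range(n) for k in range(n) if j != k]
off_mean = sum(off_diag) / len(off_diag)
n_negative = sum(1 for v in off_diag if v < 0)
```

Because each SNP is centred on the sample's own allele frequency, the entries of the matrix sum to zero, so the off-diagonal values have to scatter around a value just below zero even though nobody in the simulation is related.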
The diagonal elements of the GRM are 1 + F, where F is an estimator of the inbreeding coefficient. If an individual is highly inbred, e.g. the parents are relatives, you'd expect to see a large inbreeding coefficient, so that 1 + F is larger than 1. However, if there is population stratification, the ethnic outliers will also appear to be highly inbred.
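As a rough illustration of the 1 + F diagonal, here is a small Python sketch. It assumes the per-SNP form that I understand the GCTA paper to give for the diagonal, (x^2 - (1 + 2p)x + 2p^2) / (2p(1-p)), averaged over SNPs, where x is the allele count and p the allele frequency in the sample. The function names and genotypes are hypothetical.

```python
def inbreeding_estimate(genotypes, freqs):
    """Average per-SNP contribution to F for one individual.

    Each SNP contributes (x^2 - (1 + 2p) x + 2p^2) / (2p(1 - p)):
    -1 for a heterozygote, a positive value for either homozygote.
    """
    total = 0.0
    for x, p in zip(genotypes, freqs):
        total += (x * x - (1 + 2 * p) * x + 2 * p * p) / (2 * p * (1 - p))
    return total / len(genotypes)


def grm_diagonal(genotypes, freqs):
    """Diagonal element of the GRM, 1 + F."""
    return 1.0 + inbreeding_estimate(genotypes, freqs)


# Four SNPs, all with allele frequency 0.5 in the sample.
half = [0.5] * 4
diag_all_hom = grm_diagonal([2, 0, 2, 0], half)  # F = 1,  diagonal 2.0
diag_mixed = grm_diagonal([2, 1, 0, 1], half)    # F = 0,  diagonal 1.0
diag_all_het = grm_diagonal([1, 1, 1, 1], half)  # F = -1, diagonal 0.0

# An individual homozygous for alleles that are rare in the sample
# (frequency 0.1), as can happen for an ethnic outlier: diagonal about 10.
rare = [0.1] * 4
diag_outlier = grm_diagonal([2, 2, 2, 2], rare)
```

The last case shows why stratification inflates the diagonal: the outlier is not necessarily inbred, but the allele frequencies of the rest of the sample make its homozygous genotypes look very unlikely.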
Thank you for the replies. 1) Still, if I use GCTA to remove close relatives (> 0.025), the program should exclude sib pairs, father/offspring pairs and avuncular pairs, right? This is not the case. GCTA does not exclude the pairs that are first degree relatives (I only know of a few pairs that are first degree relatives, but I need to analyse a bigger dataset, and that's why I am using GCTA to do this). 2) The total number of pairs is indeed n*(n+1)/2. But I do not understand why some individuals that are listed in the id file are not present in any pair in the grm matrix.