|
Post by user987 on Oct 8, 2018 22:12:45 GMT
I did this:
plink --bfile filename --geno 0.01 --hwe 0.001 --maf 0.05 --autosome --rel-cutoff 0.025 --make-bed --out cleaned
And then got 304,259 variants after.
But the number of variants (i.e., the N) differed when I created a GRM:
./gcta64 --bfile cleaned --make-grm --out cleanedgrm
In particular, when I did this:
# R script to read the GRM binary file
ReadGRMBin=function(prefix, AllN=F, size=4){
sum_i=function(i){
return(sum(1:i))
}
BinFileName=paste(prefix,".grm.bin",sep="")
NFileName=paste(prefix,".grm.N.bin",sep="")
IDFileName=paste(prefix,".grm.id",sep="")
id = read.table(IDFileName)
n=dim(id)[1]
BinFile=file(BinFileName, "rb");
grm=readBin(BinFile, n=n*(n+1)/2, what=numeric(0), size=size)
NFile=file(NFileName, "rb");
if(AllN==T){
N=readBin(NFile, n=n*(n+1)/2, what=numeric(0), size=size)
}
else N=readBin(NFile, n=1, what=numeric(0), size=size)
i=sapply(1:n, sum_i)
return(list(diag=grm, off=grm[-i], id=id, N=N))
}
A = ReadGRMBin(prefix = 'cleanedgrm')
A$N
A$N was 304,172. Isn't it supposed to be the original number of variants, which is 304,259?
|
|
|
Post by zhilizheng on Oct 10, 2018 5:44:11 GMT
Because some samples have NA genotype, which is excluded from the counting.
|
|
|
Post by user987 on Oct 10, 2018 16:36:26 GMT
Because some samples have NA genotype, which is excluded from the counting. Well, if I take out SNPs that have NA in the genotype matrix, then there are a lot fewer variants, like ~26k...
|
|