Post by brentp on Mar 28, 2014 19:46:55 GMT
Hi, I am attempting to use GCTA on some methylation data. I would like to estimate the proportion of disease variance in 192 samples that is explained by a pre-selected set of about 300 methylation probes, many of which are correlated with one another. First, is it sensible to use just this subset, or is there a reason to use all of the probes?
I have generated data in the MACH dosage format like this:
$ zless gcta-test-dose.gz | cut -f 1-10 -d" " | head
15-02-001-3->15-02-001-3 ML_DOSE 0.920 0.694 0.642 0.707 0.751 0.853 0.750 0.900
15-02-002-1->15-02-002-1 ML_DOSE 0.912 0.751 0.831 0.804 0.787 0.880 0.689 0.867
15-02-003-2->15-02-003-2 ML_DOSE 0.895 0.670 0.813 0.808 0.773 0.865 0.699 0.846
15-02-004-5->15-02-004-5 ML_DOSE 0.922 0.771 0.817 0.799 0.748 0.855 0.744 0.865
15-02-005-0->15-02-005-0 ML_DOSE 0.917 0.763 0.860 0.825 0.837 0.940 0.815 0.904
15-02-006-6->15-02-006-6 ML_DOSE 0.927 0.838 0.835 0.826 0.789 0.884 0.810 0.920
15-02-007-8->15-02-007-8 ML_DOSE 0.941 0.850 0.863 0.867 0.865 0.946 0.833 0.902
15-02-008-4->15-02-008-4 ML_DOSE 0.913 0.799 0.843 0.792 0.824 0.934 0.837 0.908
15-02-009-7->15-02-009-7 ML_DOSE 0.933 0.789 0.818 0.820 0.767 0.828 0.726 0.866
15-02-010-7->15-02-010-7 ML_DOSE 0.899 0.775 0.866 0.841 0.804 0.885 0.771 0.907
The methylation values range from 0 to 1.
The corresponding .info file looks like this:
$ zless gcta-test-info.gz | head
SNP Al1 Al2 Freq1 MAF Quality Rsq
chr1:1091165 0 1 1 1 1 1
chr1:1091211 0 1 1 1 1 1
chr1:1091294 0 1 1 1 1 1
chr1:1091325 0 1 1 1 1 1
chr1:2220461 0 1 1 1 1 1
chr1:2220500 0 1 1 1 1 1
chr1:2220528 0 1 1 1 1 1
chr1:2220783 0 1 1 1 1 1
chr1:6263727 0 1 1 1 1 1
Are those reasonable values for these columns, given that none of this data is imputed?
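For reference, the two files above could be produced with something like the following. This is only a minimal sketch: the helper names (`dose_line`, `info_lines`, `write_gz`) are hypothetical, and the constant `0 1 1 1 1 1` columns simply mirror the placeholder values shown in the .info listing, not values GCTA is known to require.

```python
import gzip


def dose_line(sample_id, values):
    """Format one sample as a MACH-style dosage row: 'ID->ID ML_DOSE v1 v2 ...'."""
    fields = ["%s->%s" % (sample_id, sample_id), "ML_DOSE"]
    fields += ["%.3f" % v for v in values]
    return " ".join(fields)


def info_lines(probe_ids):
    """Build the .info rows, using the same placeholder columns as in the post."""
    header = "SNP Al1 Al2 Freq1 MAF Quality Rsq"
    return [header] + ["%s 0 1 1 1 1 1" % probe for probe in probe_ids]


def write_gz(path, lines):
    """Write lines to a gzip-compressed text file, one per row."""
    with gzip.open(path, "wt") as handle:
        for line in lines:
            handle.write(line + "\n")
```

For example, `write_gz("gcta-test-info.gz", info_lines(probe_ids))` would write the .info file, with `probe_ids` being a list of names such as `chr1:1091165`.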
And the phenotype data:
$ head gcta-test.pheno
15-02-001-3 15-02-001-3 1
15-02-002-1 15-02-002-1 1
15-02-003-2 15-02-003-2 1
15-02-004-5 15-02-004-5 1
15-02-005-0 15-02-005-0 1
15-02-006-6 15-02-006-6 1
15-02-007-8 15-02-007-8 0
15-02-008-4 15-02-008-4 0
15-02-009-7 15-02-009-7 1
15-02-010-7 15-02-010-7 1
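Since the sample IDs have to line up between the dosage and phenotype files, a quick consistency check can be sketched as below. It assumes the layouts shown above (`FID->IID` in the first dosage column, and the individual ID in the second phenotype column); the function names are hypothetical.

```python
def dose_sample_ids(dose_lines):
    """Extract the ID after '->' from the first column of each dosage row."""
    ids = []
    for line in dose_lines:
        if line.strip():
            first_column = line.split()[0]
            ids.append(first_column.split("->", 1)[1])
    return ids


def pheno_sample_ids(pheno_lines):
    """Extract the individual ID (second column) from each phenotype row."""
    return [line.split()[1] for line in pheno_lines if line.strip()]


def compare_ids(dose_lines, pheno_lines):
    """Return (IDs only in the dosage file, IDs only in the phenotype file)."""
    dose = set(dose_sample_ids(dose_lines))
    pheno = set(pheno_sample_ids(pheno_lines))
    return sorted(dose - pheno), sorted(pheno - dose)
```

Two empty lists from `compare_ids` would mean every sample appears in both files.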
I have run gcta like this:
gcta64 --dosage-mach-gz gcta-test-dose.gz gcta-test-info.gz --make-grm --out icac
gcta64 --grm icac --reml --out icac_test_out --pheno gcta-test.pheno --prevalence 0.02
Are those reasonable parameters for what I'm attempting?
How much does adding covariates affect the estimate of the variance explained?
Thanks,
Brent