Post by reamog on Sept 7, 2023 20:40:29 GMT
Hello Community,
I'm hitting a segmentation fault when running an MLM association analysis (--mlma) with GCTA on MACH dosage (machdose) files. The crash is reproducible across several versions of the software: 1.94.1, 1.93.0beta, 1.93.3beta2, and 1.92.3beta3.
For context, my genotype data is split into chunks of 100,000 nucleotides, and I'm working on the first chunk of chromosome 22. The same code ran without problems on another High-Performance Computing (HPC) system.
I would be grateful for any insights or suggestions on how to troubleshoot and resolve this segmentation fault. Here are the command I ran and the output I get:
chr=22
chunk=1
pheno=1
path=/data/path_to_data/
gcta_1.93.3beta2/gcta64 --grm /data/my_grm \
> --mlma --dosage-mach-gz /path/mach$chr\/AGR.$chr.$chunk.machdose.gz /path/mach$chr\/AGR.$chr.$chunk.machinfo.gz --chr $chr --maf 0.01 \
> --pheno /data/br_pheno.txt \
> --mpheno $pheno \
> --covar /data/br_cov_sex_age_season.txt \
> --out /path/output/tmp.$pheno.mlma.$chr.$chunk
*******************************************************************
* Genome-wide Complex Trait Analysis (GCTA)
* version 1.93.3 beta Linux
* (C) 2010-present, Jian Yang, The University of Queensland
* Please report bugs to Jian Yang <jian.yang.qt@gmail.com>
*******************************************************************
Analysis started at 12:29:30 EDT on Thu Sep 07 2023.
Hostname: helix.nih.gov
Accepted options:
--grm /data/brazil_grm
--mlma
--dosage-mach-gz /data/mach22/AGR.22.1.machdose.gz /data/mach22/AGR.22.1.machinfo.gz
--chr 22
--maf 0.01
--pheno /data/br_pheno.txt
--mpheno 1
--covar /data/br_cov_sex_age_season.txt
--out /data/output/tmp.1.mlma.22.1
Note: This is a multi-thread program. You could specify the number of threads by the --thread-num option to speed up the computation if there are multiple processors in your machine.
Reading map file of the imputed dosage data from [/data/mach22/AGR.22.1.machinfo.gz].
50000 SNPs to be included from [/data/mach22/AGR.22.1.machinfo.gz].
Warning: the option --chr, --autosome or --nonautosome is inactive for dosage data.
Reading dosage data from [/data/mach22/AGR.22.1.machdose.gz] in individual-major format (Note: may use huge RAM).
(Imputed dosage data for 1309 individuals detected).
Imputed dosage data for 1309 individuals are included from [/data/mach22/AGR.22.1.machdose.gz].
Calculating allele frequencies ...
Filtering SNPs with MAF > 0.01 ...
After filtering SNPs with MAF > 0.01, there are 49947 SNPs (53 SNPs with MAF < 0.01).
Reading phenotypes from [/data/br_pheno.txt].
Non-missing phenotypes of 753 individuals are included from [/data/br_pheno.txt].
Reading discrete covariate(s) from [/data/br_cov_sex_age_season.txt].
3 discrete covariate(s) of 741 individuals are included from [/data/br_cov_sex_age_season.txt].
Reading IDs of the GRM from [/data/brazil_grm.grm.id].
1309 IDs read from [/data/brazil_grm.grm.id].
Reading the GRM from [/data/brazil_grm.grm.bin].
GRM for 1309 individuals are included from [/data/brazil_grm.grm.bin].
739 individuals are in common in these files.
3 discrete variable(s) included as covariate(s).
Performing MLM association analyses (including the candidate SNP) ...
Performing REML analysis ... (Note: may take hours depending on sample size).
739 observations, 10 fixed effect(s), and 2 variance component(s)(including residual variance).
Calculating prior values of variance components by EM-REML ...
Updated prior values: 0.475754 0.473573
logL: -356.314
Running AI-REML algorithm ...
Iter. logL V(G) V(e)
1 -355.88 0.65580 0.28437
2 -355.23 0.92388 0.00000 (1 component(s) constrained)
3 -354.64 0.92727 0.00000 (1 component(s) constrained)
4 -354.64 0.92728 0.00000 (1 component(s) constrained)
Log-likelihood ratio converged.
Running association tests for 49947 SNPs ...
Segmentation fault (core dumped)