Post by Jian Yang on Jun 22, 2016 4:18:03 GMT
GCTA-fastBAT: a fast and flexible set-Based Association Test using GWAS summary data
This method performs a fast set-based association analysis for human complex traits using summary-level data from genome-wide association studies (GWAS) and linkage disequilibrium (LD) data from a reference sample with individual-level genotypes. Please see Bakshi et al. (2016, Scientific Reports) for details about the method. This module was developed by Andrew Bakshi and Jian Yang.
Note: most other GCTA options are also valid in this analysis.
Examples
# Gene-based test
gcta64 --bfile test --maf 0.01 --fastBAT assoc.txt --fastBAT-gene-list gene_list.txt --out test --thread-num 10
# Segment-based test (size of a segment = 100Kb)
gcta64 --bfile test --maf 0.01 --fastBAT assoc.txt --fastBAT-seg 100 --out test --thread-num 10
# Set-based test with a customized set file (note that this can be used to test all SNPs involved in a pathway)
gcta64 --bfile test --maf 0.01 --fastBAT assoc.txt --fastBAT-set-list set.txt --out test --thread-num 10
Options
--bfile test
Input SNP genotype data (in PLINK binary PED format) as the reference set for LD estimation. For a single-cohort GWAS, the GWAS cohort itself can be used as the reference set. For a meta-analysis, you can use one of the largest participating cohorts as the reference set. If neither is available, you can use data from the 1000 Genomes Project (use the PLINK2 --vcf option to convert the data into PLINK binary PED format). Please see Figure 1 of Bakshi et al. 2016 for a comparison of results using different reference sets for LD.
--fastBAT assoc.txt
Input association p-values of a list of SNPs. This can be from a standard GWAS or from a meta-analysis.
Input file format
assoc.txt
SNP p
rs1001 0.0055
rs1002 0.0115
……
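A minimal sketch of preparing assoc.txt from a wider summary-statistics table, assuming the two-column layout with a "SNP p" header shown above. The function name and the input column names (rsid, pval) are hypothetical and not part of GCTA; adjust them to your own GWAS output.

```python
def write_fastbat_assoc(rows, out_path, snp_col="rsid", p_col="pval"):
    """Write a two-column 'SNP p' file from an iterable of dict rows.

    Rows whose p-value is missing, non-numeric, or outside (0, 1] are skipped.
    Returns the number of SNPs written.
    """
    written = 0
    with open(out_path, "w", newline="") as fh:
        fh.write("SNP p\n")
        for row in rows:
            try:
                p = float(row[p_col])
                snp = str(row[snp_col]).strip()
            except (KeyError, TypeError, ValueError):
                continue
            if not (0.0 < p <= 1.0) or not snp:
                continue
            # Keep the original p-value text so no precision is lost.
            fh.write(f"{snp} {str(row[p_col]).strip()}\n")
            written += 1
    return written
```

The rows can come from csv.DictReader or any other source that yields one dict per SNP.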
--fastBAT-gene-list gene_list.txt
Input gene list with gene start and end positions.
Input file format
gene_list.txt (columns are chromosome, left and right side boundary of the gene region, and gene ID)
Chr Start End Gene
1 19774 19899 Gene1
1 34627 35558 Gene2
……
Please click the link below to download the gene list file.
Gene list (hg18): glist-hg18.txt
Gene list (hg19): glist-hg19.txt
--fastBAT-set-list set.txt
Input set list with name and list of SNPs in the set.
Input file format
set.txt (set ID, followed by SNPs, then END, then blank space before next set)
Set1
rs1234534
rs5827743
rs9737542
END
Set2
rs1252514
……
This option provides an opportunity for you to customize your own sets of SNPs. For example, you can create a SNP set which contains all the 1KGP SNPs in genes involved in a pathway listed in the file below.
pathway list: c2.cp.v5.1.symbols.gmt (downloaded from Broad GSEA)
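One way to build such a pathway-based set file can be sketched as follows. This is an illustration, not part of GCTA: all function names are hypothetical, the GMT layout is assumed to be tab-separated (pathway name, description, then gene symbols), SNP positions are assumed to come from a source such as a PLINK .bim file, and a blank line is placed between sets following the format description above.

```python
def parse_gmt(lines):
    """Parse GMT lines: pathway name, description, then gene symbols (tab-separated)."""
    pathways = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            pathways[fields[0]] = [g for g in fields[2:] if g]
    return pathways


def snps_in_genes(genes, gene_regions, snp_positions):
    """Return IDs of SNPs lying inside any of the given genes.

    gene_regions: {gene: (chr, start, end)}
    snp_positions: iterable of (snp_id, chr, bp) with unique SNP IDs
    The result is sorted by chromosome label and then by position.
    """
    regions = [gene_regions[g] for g in genes if g in gene_regions]
    hits = []
    for snp, chrom, bp in snp_positions:
        if any(chrom == c and s <= bp <= e for c, s, e in regions):
            hits.append((chrom, bp, snp))
    hits.sort()
    return [snp for _, _, snp in hits]


def format_sets(snp_sets):
    """Render {set_id: [snp, ...]} as: set ID, SNPs, END, blank line before the next set."""
    blocks = []
    for set_id, snps in snp_sets.items():
        if snps:  # skip pathways with no SNPs in the reference data
            blocks.append("\n".join([set_id, *snps, "END"]))
    return "\n\n".join(blocks) + "\n"
```

The gene boundaries can be taken from the gene list file described above; whether to widen them by a window before matching SNPs is a choice left to the user.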
--fastBAT-seg 100
Perform fastBAT analysis based on segments of size 100Kb (default).
Other options
--fastBAT-wind 50
Used in conjunction with --fastBAT-gene-list to define a gene region. By default, a gene region is defined as ±50 kb of the UTRs of a gene.
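The effect of the window on a gene region can be illustrated with a short sketch. The function name is hypothetical, and the sketch assumes that 1 kb equals 1000 bp and that the lower boundary is not allowed to drop below position 1; it is not taken from the GCTA source.

```python
def gene_region(start, end, window_kb=50):
    """Extend gene boundaries by window_kb kilobases on each side.

    Assumes 1 kb = 1000 bp; the left boundary is floored at position 1.
    """
    pad = int(window_kb * 1000)
    return max(1, start - pad), end + pad
```

For example, a gene spanning 100000 to 120000 with the default window would cover 50000 to 170000.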
--fastBAT-ld-cutoff 0.9
Threshold LD r-squared value for LD pruning. The default value is 0.9. You can turn off LD pruning by setting this value to 1.
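To make the role of the cutoff concrete, here is a generic greedy pruning sketch. It is an assumption-laden illustration, not GCTA's implementation: the post does not describe which SNP of a correlated pair is retained, so this version simply keeps the first one encountered. The function names are hypothetical, and genotypes are assumed to be complete 0/1/2 allele counts.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 allele counts)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def prune_by_ld(genotypes, cutoff=0.9):
    """Keep a SNP unless its r-squared with an already kept SNP exceeds cutoff.

    genotypes: {snp_id: [allele counts per individual]}, visited in insertion order.
    A cutoff of 1 (or more) disables pruning.
    """
    if cutoff >= 1:
        return list(genotypes)
    kept = []
    for snp, g in genotypes.items():
        if all(r_squared(g, genotypes[k]) <= cutoff for k in kept):
            kept.append(snp)
    return kept
```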
--fastBAT-write-snpset
Write the sets of SNPs included in the analysis. The SNP sets will be saved in a text file in the same format as the input file of --fastBAT-set-list.
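Because the written SNP sets use the same layout as the --fastBAT-set-list input, a single reader can handle both. The sketch below is a hypothetical helper, assuming one token per line (set ID, SNP IDs, then END) with optional blank lines between sets; the name of the file that GCTA writes is not stated in this post, so none is assumed here.

```python
def parse_set_list(text):
    """Parse set-list text into {set_id: [snp, ...]}.

    Layout assumed: a set ID line, one SNP ID per line, then END.
    Blank lines are ignored wherever they occur.
    """
    sets = {}
    current = None
    for raw in text.splitlines():
        token = raw.strip()
        if not token:
            continue
        if current is None:
            current = token  # first non-blank line of a block is the set ID
            sets[current] = []
        elif token == "END":
            current = None
        else:
            sets[current].append(token)
    return sets
```

This can be used, for instance, to count how many SNPs remained in each set after filtering.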
Output file format
Possible output file names:
test.fastbat (set-based test)
test.seg.fastbat (segment-based test)
test.gene.fastbat (gene-based test)
test.gene.fastbat (columns are
Gene: gene ID
Chr: chromosome
Start and End: left and right side boundaries of the gene region
No.SNPs: number of SNPs in the gene region
SNP_start and SNP_end: the SNPs at the left and right side boundaries of the gene region
Chisq(Obs): sum of chi-squared test statistics of all SNPs in the gene region
Pvalue: gene-based test p-value
TopSNP.Pvalue: smallest single-SNP GWAS p-value in the gene region
TopSNP: the top associated GWAS SNP
).
test.seg.fastbat (columns are
Chr: chromosome
Start and End: left and right side boundaries of the segment
No.SNPs: number of SNPs in the segment
SNP_start and SNP_end: the SNPs at the left and right side boundaries of the segment
Chisq(Obs): sum of chi-squared test statistics of all SNPs in the segment
Pvalue: segment-based test p-value
TopSNP.Pvalue: smallest single-SNP GWAS p-value in the segment
TopSNP: the top associated GWAS SNP
).
test.fastbat (columns are
Set: set ID
No.SNPs: number of SNPs in the set
SNP_start and SNP_end: the SNPs at the left and right side boundaries of the set
Chisq(Obs): sum of chi-squared test statistics of all SNPs in the set
Pvalue: set-based test p-value
TopSNP.Pvalue: smallest single-SNP GWAS p-value in the set
TopSNP: the top associated GWAS SNP
).
References:
fastBAT method: Bakshi A., Zhu Z., Vinkhuyzen A.A.E., Hill W.D., McRae A.F., Visscher P.M., and Yang J. (2016). Fast set-based association analysis using summary data from GWAS identifies novel gene loci for human complex traits. Scientific Reports 6, 32894.
GCTA Software: Yang J, Lee SH, Goddard ME and Visscher PM. GCTA: a tool for Genome-wide Complex Trait Analysis. Am J Hum Genet. 2011 Jan 88(1): 76-82. [PubMed ID: 21167468]