Functionalize Expression
(FuncExpression)
Lishuang Shen
1. Introduction:
FuncExpression is a web-based resource for functional interpretation of high-throughput genomics data. FuncExpression focuses on two-way integration of PLANT gene functional information and large-scale gene expression data. Major animal and fungal species are also supported.
FuncExpression consists of two major function classes.
(1). Expression2Function provides gene function information for gene lists and allows further cross-validation with gene expression data from BarleyBase/PLEXdb. The gene lists are obtained from microarray and other non-microarray genomics experiments.
(2). Function2Expression retrieves plant gene expression profiles according to gene functional annotations.
Both function classes are fully integrated with our microarray data numerical analysis tools when applicable.
The gene function information includes the well-structured gene ontology classification, InterPro functional domain prediction/annotation, metabolic pathways and gene family information.
In addition to interpreting microarray data, FuncExpression is a general-purpose tool for functional comparison of other types of PLANT, FUNGAL, and ANIMAL gene name lists generated from genomics, proteomics, or EST projects. This module can be used independently of microarray data.
2. Gene Functional Annotations and Preparation
(1). Gene ontology:
FuncExpression supports 4 types of GO annotations, totaling 138 databases, compiled by BarleyBase. This is the major component of FuncExpression.
Microarray:
All Affymetrix human, animal and prokaryotic gene expression analysis arrays listed at the Affymetrix Support Page are supported with Affymetrix Gene Ontology annotations.
All Affymetrix plant arrays except Citrus are supported with BarleyBase in-house annotations, based on TAIR, Gramene, GOA and UniProt annotations. The 22K ATH1 (TAIR), 8K AG (TAIR) and 57K Rice (GRAMENE) annotations are based on direct mapping to same-species proteins. The others are transitive annotations derived from GOA, based on sequence similarity between GeneChip exemplars and UniProt protein entries. Users can choose stringent (Expect <= 1e-20 in BLASTX) or loose (Expect <= 1e-5 in BLASTX) annotations. Stringent annotation is recommended. These include the Barley1 (BarleyBase), 16K Grape (TIGR), Maize 18K, and 61K Soybean (BarleyBase) GeneChips.
The NSF 58K maize and 20K rice spotted arrays and the 18K fungal Fusarium GeneChip are also annotated with the second method;
Protein:
All 33 species listed at the Gene Ontology Consortium GO Annotation page, including Arabidopsis (TAIR, TIGR, and GOA), Rice (Gramene), Pseudomonas syringae DC3000, budding yeast, and fission yeast. Animals include human, mouse, rat, C. elegans, fly, zebrafish, etc. The annotations are current as of January 2006;
Gene Index:
Electronic annotations for 16 plant species (TIGR) and 9 fungal species (TIGR), current as of March 2005;
ESTs and cDNAs:
Electronic annotations for 16 plant species (BarleyBase) and 9 fungal species (BarleyBase), using GenBank accession numbers as input. They are transitive annotations based on TIGR Gene Index annotations and the membership of ESTs in the Gene Index, current as of March 2005. Notice: p-values cannot be meaningfully compared for this input type because the IDs are highly redundant. Use it only for roughly assigning sequences to GO classes.
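The stringent and loose BLASTX thresholds described for the transitive microarray annotations can be illustrated with a minimal Python sketch. This is not FuncExpression's own code: the function name, the tuple layout (query, protein, Expect value), and the sample hits are all hypothetical and serve only to show how the two cutoffs select different sets of matches.

```python
# Expect-value cutoffs quoted in the text above.
STRINGENT_EXPECT = 1e-20
LOOSE_EXPECT = 1e-5

def filter_hits(hits, stringent=True):
    """Keep BLASTX hits whose Expect value is at or below the chosen cutoff.

    `hits` is a list of (query_id, protein_id, expect) tuples; this layout
    is an assumption made for illustration only.
    """
    threshold = STRINGENT_EXPECT if stringent else LOOSE_EXPECT
    return [hit for hit in hits if hit[2] <= threshold]

# Hypothetical hits, not real BLASTX output.
hits = [
    ("probe_A", "protein_1", 1e-30),
    ("probe_B", "protein_2", 1e-10),
    ("probe_C", "protein_3", 1e-3),
]

stringent_hits = filter_hits(hits, stringent=True)
loose_hits = filter_hits(hits, stringent=False)
```

With these sample values, only probe_A passes the stringent cutoff, while probe_A and probe_B both pass the loose one.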
(2). Metabolic pathways:
Annotations for proteins of 14 species are downloaded from KEGG. 56 Affymetrix platforms for Barley1, human, mouse and other species are electronically annotated by BarleyBase based on KEGG annotations, either by same-species mapping to exemplar-matching proteins or by sequence similarity to model-species genes associated with pathways.
(3). Gene family (Discontinued):
annotation for Arabidopsis (TAIR).
(4). InterPro functional domain (Discontinued):
annotation for Arabidopsis proteins (TAIR).
3. Expression2Function -- Interpreting Expression Profiles
Under Gene Function Context
Multiple gene lists can be classified, compared and visualized according to the gene ontology, metabolic pathway and gene family information of their member genes. The results can be further cross-validated with expression data from related experiments, backed by our comprehensive plant microarray expression data repository at BarleyBase/PLEXdb.
(1). Modules
A. Expression2GO -- Compare several types of gene lists for their distribution in GO classes among the lists.
B. Expression2Pathway -- Compare the ATH1 lists for their distribution in metabolic pathways.
C. Expression2GeneFamily -- Compare the ATH1 lists for their distribution among gene families.
D. Expression2Domain -- Compare the ATH1 lists for distribution among functional domains from InterPro.
(2). Input -- Data Sources and Formats
Source A. Gene lists from microarray experiments conducted on BarleyBase-supported platforms. The inputs are microarray element names, including exemplar and probe set names. Supports GO, Pathway and Gene Family.
Source B. Gene lists from non-microarray experiments, including genomics, proteomics, EST, and other high-throughput experiments. The inputs are Gene Index TC numbers, protein accession numbers, and protein names. Supports GO, Pathway, InterPro domain and Gene Family.
Input Preparation Method 1: Using gene lists and data pre-saved within BarleyBase: (1) Compare multiple saved lists; (2) Compare two pre-saved gene lists and their gene subsets, including the intersection and difference subsets; and (3) Compare gene lists from the clusters of a BarleyBase clustering/partitioning result.
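The intersection and difference subsets mentioned in option (2) correspond to ordinary set operations. The following Python sketch shows the idea under stated assumptions: the function name and result keys are hypothetical, and the probe set names are used only as sample data, not as a real comparison result.

```python
def compare_two_lists(list_a, list_b):
    """Split two gene lists into their shared and list-specific subsets."""
    a, b = set(list_a), set(list_b)
    return {
        "intersection": sorted(a & b),  # genes present in both lists
        "only_in_a": sorted(a - b),     # difference subset: A minus B
        "only_in_b": sorted(b - a),     # difference subset: B minus A
    }

# Sample probe set names, chosen only for illustration.
subsets = compare_two_lists(
    ["245306_at", "245628_at", "245637_at"],
    ["245628_at", "252102_at"],
)
```

Each of the three subsets could then be treated as a gene list in its own right for functional classification.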
Input Preparation Method 2: Importing gene lists from outside BarleyBase. Multiple gene lists (up to 10) can be input for comparison with the reference list and between all the lists.
Data Format for Input Preparation Method 2:
- No data formatting is needed for Input Preparation Method 1.
- For Input Preparation Method 2, please refer to the Sample Input for ATH1 GeneChip. The input consists of multiple lists, each introduced by a list header line: "MY_LIST:###LIST_Name###", where LIST_Name can be any name the user prefers. After each header line, users can enter multiple gene names separated by commas, tabs, or white space. Other free-text input is accepted, though it may not always be parsed accurately.
MY_LIST:###LIST1###
245306_at
245628_at
245637_at
......
MY_LIST:###LIST2###
252102_at
252123_at
252265_at
......
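For users preparing or checking input files offline, the list format above can be read with a few lines of Python. This is a minimal sketch and not the parser FuncExpression itself uses; the function name `parse_gene_lists` is hypothetical, and the sketch assumes only what the format description states (a header line per list, then names separated by commas, tabs, or white space).

```python
import re

# Header line of the form MY_LIST:###LIST_Name###
HEADER = re.compile(r"^MY_LIST:###(.+?)###\s*$")

def parse_gene_lists(text):
    """Parse 'MY_LIST:###name###' blocks into a {name: [gene, ...]} dict."""
    lists = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            lists[current] = []
            continue
        if current is None:
            continue  # ignore any text before the first header line
        # Gene names may be separated by commas, tabs, or white space.
        for token in re.split(r"[,\s]+", line):
            if token:
                lists[current].append(token)
    return lists

sample = """MY_LIST:###LIST1###
245306_at
245628_at, 245637_at
MY_LIST:###LIST2###
252102_at\t252123_at 252265_at
"""

parsed = parse_gene_lists(sample)
```

Here `parsed` maps each list name to its gene names in input order, which makes it easy to confirm that no more than 10 lists are being submitted.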
(3). Output -- Visualization and Tables
A detailed classification of each gene list in each functional class is output as color-highlighted HTML tables. It includes the number of matches, enrichment fold and p-values, and the names of matching genes.
Fisher's Exact Test and the Hypergeometric Distribution are used to find significantly enriched and depleted gene functional classes. Benjamini and Hochberg (BH) multiple test correction is used to obtain the FDR (false discovery rate). The p-values and FDRs are stored as tab-delimited text files.
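To make the statistics concrete, the sketch below computes an upper-tail hypergeometric p-value for enrichment (the same probability a one-sided Fisher's exact test for enrichment gives) and applies the Benjamini and Hochberg adjustment. It is an illustrative Python sketch, not the code FuncExpression runs; the function names and argument names are assumptions, and the tool's exact test variants (for example, how depletion is tested) are not specified here.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): chance of seeing at least k class members in a list of n
    genes drawn from a reference of N genes, K of which are in the class."""
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDR estimates) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical counts: 2 of 2 listed genes fall in a class that holds
# 5 of the 10 reference genes.
p_example = hypergeom_enrichment_p(k=2, n=2, K=5, N=10)

# Hypothetical p-values for four functional classes.
fdr_example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A depletion p-value would use the lower tail, P(X <= k), in the same way.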
Two types of barplots are used to visualize the comparison results:
A. Plot by gene functional class for all the gene lists, including the population reference list. The enrichment folds, the number, and % of matches in each gene list are plotted side-by-side, together with the reference list. The actual match number, %, and enrichment fold of each list against the reference list are shown as a legend in the plots.
B. Plot by gene list for the represented gene functional classes. Barplots are drawn for Enrichment Fold vs. Reference List, or for Percentage in Functional Classes. The enrichment folds, the number, and % of matches in each GO class are plotted side-by-side, optionally sorted by enrichment fold. The actual match number, %, and enrichment fold of each list against the reference list are shown as a legend in the plots.
(4). Usage of Expression2Function
-> Choose Gene Function Type (for example, GO)
-> Choose your top level GO term
-> Select input type
-> Select species/platform combination from list
-> If applicable, change the annotations
threshold for BLASTX-based annotations
-> If applicable, change the Total and Annotated numbers of sequences of the Reference. This is needed for accurate enrichment quantification and p-value calculation of users' own lists vs. the reference, but not for comparison between users' own input lists. The Reference is defined as the global sequence population from which you draw your input list. You can override the values provided by FuncExpression if you have your own information about the Reference size and the number of GO-annotated sequences in the Reference.
-> Select gene list input type and provide input
-> Press "Run" button
-> Check and save results. If microarray data is available, cross-validate gene list classification results with expression values by (a) selecting the target microarray experiment and (b) clicking in the textboxes for gene names.
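The effect of overriding the Reference numbers can be seen in a small calculation. The sketch below assumes that the enrichment fold is the fraction of annotated genes in a list that fall in a class, divided by the same fraction in the Reference. This definition is an assumption made for illustration, since the exact formula used by FuncExpression is not given here, and all names and counts are hypothetical.

```python
def enrichment_fold(list_hits, list_annotated, ref_hits, ref_annotated):
    """Ratio of the class fraction in the list to that in the reference.

    Assumed definition (not confirmed by the tool's documentation):
    (list_hits / list_annotated) / (ref_hits / ref_annotated).
    """
    list_fraction = list_hits / list_annotated
    ref_fraction = ref_hits / ref_annotated
    return list_fraction / ref_fraction

# Hypothetical counts: 10 of 100 annotated list genes are in the class,
# versus 200 of 10000 annotated reference genes.
default_fold = enrichment_fold(10, 100, 200, 10000)

# Same list, but the user overrides the annotated reference size to 20000.
overridden_fold = enrichment_fold(10, 100, 200, 20000)
```

Under this assumed definition, doubling the annotated Reference size doubles the fold, which is why accurate Reference numbers matter when comparing a list against the Reference.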
4. Function2Expression -- From Gene Functional Annotation to Expression Profile
This is a collection of gene list creation methods for retrieval of gene
expression data, based on several types of
gene functional annotations.
(1). Modules
A. GO2Expression -- Browse and search the Gene Ontology tree, and retrieve probe sets or genes from selected GO classes.
B. Pathway2Expression -- Find probe sets from the Arabidopsis ATH1 GeneChip corresponding to enzymes from metabolic or regulatory pathways of interest. Based on KEGG and TAIR pathway data.
C. GeneFamily2Expression -- Find probe sets from the Arabidopsis ATH1 GeneChip corresponding to a given gene family (Discontinued).
(2). Input
The inputs are selected GO terms, pathways, or gene families. A target experiment must be defined or selected for retrieving expression values. Please follow the instructions on the corresponding pages.
(3). Output
The outputs are the qualifying gene lists, which can be further fed into microarray data numerical analysis and visualization tools.
5. Change Log
June 11, 2006:
Added GO support to all Affymetrix Plant GeneChips (except for
Citrus), and NSF maize 58K and Rice 20K spotted array. Added Fusarium
18K GeneChip.
For BLASTX-based annotations in plants and Fusarium, added an option for choosing the stringent or loose threshold in BLAST.
Added option for overriding Reference gene list total and annotated
sequence numbers for accurate enrichment and p-value calculation versus
reference list.
December 29, 2005:
Added pathways for multiple animal and microbe species with KEGG
annotations.
Added GO support to ALL Affymetrix Animal and microbe platforms. More
plant species supported.
Added GO support to all species annotations from Gene Ontology
Consortium annotation page.
March 2005:
FuncExpression was added to the GO tool list at the Gene Ontology Consortium website.
Added GO and pathway support for animal and fungal proteins.
November 14, 2004:
The prototype of FuncExpression was released. It supported GO, Gene family, InterPro, and KEGG pathway analysis for the Arabidopsis ATH1 22K and Barley1 22K GeneChips.
FuncExpression is under ACTIVE development. Please regard it as a Beta test version, and use caution in interpreting results.
Please send questions, feature requests, bug reports, and comments about this tool to Lishuang Shen: lshen@iastate.edu or Shen_Lishang@yahoo.com.