RiCES Guide


Quick Start

All users should check at least the following points:

  1. General setting
  2. Specification of gene ID list
  3. Start analysis with "Apply" button

See the following sections for details.

What Does This Tool Do?

We have developed a novel tool that searches for cis-element candidates in the upstream, downstream, or coding regions of differentially regulated genes.

Figure: Scheme of the promoter region

RiCES lists possible cis-element motifs associated with genes of interest and is intended to contribute to a deeper understanding of gene regulatory mechanisms in plants.

Figure: Flowchart of the RiCES analysis

Practice

The tool first accepts a list of genes that the user is interested in and compiles cis-element candidate motifs corresponding to those genes. It then scores the likelihood of each candidate motif by association rule analysis. Finally, the most notable cis-element candidates are selected and presented with related information.

Figure: Snapshot 1

RiCES is a Web application. Users operate it by entering appropriate data into the form at http://hpc.irri.cgiar.org/tool/nias/ces. No special skills are required.

Gene List

RiCES assumes that the user has already identified genes of interest through experimental analysis (e.g. clusters of coordinately regulated genes). RiCES recognizes GenBank accession numbers, identifiers of transcription units (TUs) as defined in the TIGR pseudomolecular assemblies, and several other major gene identification systems (see the separate page for details). From this list, it retrieves the upstream, downstream, or coding-region sequences of the specified genes from available genomic sequence data.
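As a rough illustration of the retrieval step, the Python sketch below extracts an upstream region for one gene. It assumes 0-based, half-open gene coordinates on a chromosome string and a strand of '+' or '-'; the function names and the 1,000-bp default are hypothetical and are not RiCES's actual code or parameters.

```python
# Hypothetical sketch, not RiCES's retrieval code.
# Assumes 0-based, half-open (start, end) gene coordinates on a chromosome string.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom, start, end, strand, length=1000):
    """Return up to `length` bp upstream of a gene, read 5'->3' on the gene's strand.

    For '+' genes the upstream region lies before `start`; for '-' genes it lies
    after `end` and is reverse-complemented. The region is clipped at chromosome ends.
    """
    if strand == "+":
        return chrom[max(0, start - length):start]
    if strand == "-":
        return reverse_complement(chrom[end:min(len(chrom), end + length)])
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


if __name__ == "__main__":
    chrom = "AAAACCCCGGGGTTTT"
    print(upstream_region(chrom, 8, 12, "+", length=4))  # '+' gene at 8..12: upstream is CCCC
    print(upstream_region(chrom, 0, 4, "-", length=4))   # '-' gene at 0..4: rc(chrom[4:8]) is GGGG
```

Downstream regions follow the same logic with the sides swapped.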

Preliminary Cis-Element Candidate List

The second step of the analysis compiles a list of candidate cis-element motifs by the following methods:

  1. Motif searching
  2. Pre-compiled list of cis-elements known in plants
  3. User-defined motif list

Motif Searching

The first method relies on ab initio motif searching, based on the supposition that if cis-elements play important roles in regulating a given set of genes, they will be statistically overrepresented in the associated promoter sequences as conserved motifs that a suitable motif search program can identify. Several programs implementing different algorithms are available; we chose MEME, a publicly available motif discovery program based on an expectation-maximization algorithm. In our analysis, MEME is invoked to identify motifs 6 to 8 bp long that appear highly conserved among the promoter sequences of the selected genes. Users can modify some of MEME's search parameters via the Web form.
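MEME's expectation-maximization search is not reproduced here. As a much simpler illustration of the overrepresentation idea only, the Python sketch below counts, for every k-mer of length 6 to 8, the fraction of target promoters and of background promoters containing it, and ranks k-mers by the ratio. The function names, pseudocount, and single-strand counting are assumptions for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Distinct k-mers occurring in seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def overrepresented_kmers(targets, background, kmin=6, kmax=8, pseudocount=1.0, top=5):
    """Rank k-mers by (fraction of targets containing) / (fraction of background containing).

    Counting distinct k-mers per sequence means each promoter contributes at most
    once per k-mer. The pseudocount avoids division by zero for k-mers absent
    from the background. This is a counting illustration, not MEME.
    """
    scores = {}
    for k in range(kmin, kmax + 1):
        tgt = Counter(km for s in targets for km in kmers(s, k))
        bkg = Counter(km for s in background for km in kmers(s, k))
        for km, n in tgt.items():
            f_t = n / len(targets)
            f_b = (bkg[km] + pseudocount) / (len(background) + pseudocount)
            scores[km] = f_t / f_b
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


if __name__ == "__main__":
    targets = ["TTCACGTGTT", "GGCACGTGAA", "ACACGTGCCC"]
    background = ["TTTTTTTTTT", "AAAAAAAAAA", "GGGGCCCCAA"]
    print(overrepresented_kmers(targets, background)[0])
```

Here the 6-mer CACGTG, present in every target and no background sequence, ranks first.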

Pre-Compiled List of Cis-Elements Known in Plants

The second method relies on the hypothesis that common, known cis-elements play important roles under the experimental conditions that gave rise to the list of genes specified by the user. Therefore, RiCES searches for matches to a pre-compiled list of known cis-elements.
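The matching step can be sketched as follows, assuming promoter sequences are held in a dict keyed by gene ID and the pre-compiled list is a dict from element name to regular expression. Both structures are hypothetical, the two entries shown are illustrative rather than taken from the RiCES database, and scanning both strands is an assumption. The result records, for each element, the genes whose promoter contains at least one match.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a plain DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def genes_with_element(promoters, elements):
    """Map each element name to the set of gene IDs whose promoter matches it
    on either strand (the two-strand scan is an assumption of this sketch)."""
    compiled = {name: re.compile(pattern) for name, pattern in elements.items()}
    hits = {name: set() for name in elements}
    for gene, seq in promoters.items():
        strands = (seq, reverse_complement(seq))
        for name, rx in compiled.items():
            if any(rx.search(s) for s in strands):
                hits[name].add(gene)
    return hits


if __name__ == "__main__":
    elements = {"G-box": "CACGTG", "GCC-box": "AGCCGCC"}  # illustrative entries
    promoters = {"g1": "TTCACGTGAA", "g2": "GGCGGCTTTT", "g3": "AAAAAAAAAA"}
    print(genes_with_element(promoters, elements))
```

In this example g2 is reported for the GCC-box only because its reverse strand contains AGCCGCC.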

Existing databases of plant cis-elements do not reliably distinguish 'core' motifs, which determine the function of a cis-element, from co-occurring sequences in neighboring regions. As a result, many cis-element sequences in these databases include apparent core motifs for which no evidence of function has been obtained. Using such data hinders effective informatic analysis.

We compiled a novel database of known cis-elements and incorporated it into RiCES. The cis-elements are collected from reports of experiments such as gel shift assays and footprint analyses, categorized by transcription factor, and documented with respect to known activity in the plant genome. Some cis-elements known only in organisms other than plants are also listed, in consideration of their possible, albeit unknown, roles in plants.

The database includes four types of cis-elements:

  1. G-box and E-box, which are bound by transcription factor families common to many organisms, such as bHLH and bZIP.
  2. A-box, T-box, and GGTTTAG repeats, which are bound by factors common to many organisms, such as homeodomain and Myb proteins.
  3. CArG boxes and GCC-box, which are bound by plant MADS-domain, zinc finger, and AP2/EREBP transcription factors.
  4. Other cis-elements reported in animals, such as those bound by HSF, PcG, and HMG proteins.

User-Defined Motif List

Instead of using MEME or the pre-compiled list, users can specify their own cis-element candidate sequences. Enter the candidate nucleotide sequences in the "Motif List" box in the "Optional Items" section of the application form, one sequence per line. The sequences must be written as regular expressions.
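Before submitting, it can help to check the contents of the "Motif List" box locally. The Python sketch below splits the text one pattern per line, skips blank lines, and reports lines that contain unexpected characters or do not compile. The function name and the allowed-character set are assumptions, and Python's `re` syntax is close to, though not identical with, Perl's.

```python
import re

# Assumed acceptable characters: nucleotide letters plus common regex syntax.
_ALLOWED = re.compile(r"^[ACGTNacgtn\[\]\(\)\{\}\|\.\*\+\?\^\$,0-9]+$")


def parse_motif_list(text):
    """Split box contents into one pattern per line, skipping blank lines.

    Returns (valid, errors): compiled patterns keyed by their source text, and a
    list of (line_number, line, reason) for rejected lines.
    """
    valid, errors = {}, []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if not _ALLOWED.match(line):
            errors.append((lineno, line, "unexpected characters"))
            continue
        try:
            valid[line] = re.compile(line)
        except re.error as exc:  # e.g. an unterminated character set
            errors.append((lineno, line, str(exc)))
    return valid, errors


if __name__ == "__main__":
    box = "CACGTG\n\n(CTAATTG){2,3}\nCACG[TG\nHELLO\n"
    valid, errors = parse_motif_list(box)
    print(sorted(valid))
    for lineno, line, reason in errors:
        print(lineno, line, reason)
```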

Final Cis-Element Candidate List

Association Rule Analysis

The third step of the analysis is the likelihood evaluation of the cis-element candidates by association rule analysis, which is a data mining method designed to discover significant relationships between pairs of characteristics observed in data sets. Candidates showing the highest likelihood (specificity) are retained in the final cis-element candidate list.

The strategy depends on the idea that motifs overrepresented in the promoter region of the genes of interest could play specific roles in regulation of the expression of those genes. Implied cause-and-effect relationships documented as 'rules' are evaluated by using several well-known indices of likelihood, including support, confidence, and lift. On the basis of sample data sets, the lift index appeared to best discriminate significant relationships between experimental conditions and cis-element candidates. We set the default threshold of lift to 1.0, and the cis-element candidates are included in the final candidate list only if their lift value is higher than this threshold.
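The three indices can be sketched for the rule "promoter contains the motif → gene is in the target list" using their standard textbook definitions. Treating the reference gene list as the universe of genes is an assumption about how RiCES sets its denominators; the 1.0 threshold and the strict "higher than" comparison follow the text above. Function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RuleScores:
    support: float     # P(motif and target)
    confidence: float  # P(target | motif)
    lift: float        # confidence / P(target)


def score_rule(with_motif, targets, reference):
    """Score 'has motif -> is a target' with sets of gene IDs.

    `reference` is treated as the whole gene universe (an assumption).
    """
    n = len(reference)
    m = with_motif & reference
    t = targets & reference
    both = len(m & t)
    support = both / n
    confidence = both / len(m) if m else 0.0
    lift = confidence / (len(t) / n) if t else 0.0
    return RuleScores(support, confidence, lift)


def final_candidates(hits, targets, reference, min_lift=1.0):
    """Keep motifs whose lift is strictly greater than `min_lift`, best first."""
    scored = {name: score_rule(genes, targets, reference) for name, genes in hits.items()}
    kept = {name: s for name, s in scored.items() if s.lift > min_lift}
    return sorted(kept.items(), key=lambda kv: -kv[1].lift)


if __name__ == "__main__":
    reference = {f"g{i}" for i in range(10)}
    targets = {"g0", "g1"}
    hits = {"A": {"g0", "g1", "g2"}, "B": {"g2", "g3", "g4", "g5"}}
    for name, s in final_candidates(hits, targets, reference):
        print(name, s)
```

Motif A occurs in both targets and one other gene, giving confidence 2/3 against a base rate of 2/10, so its lift is about 3.3; motif B never co-occurs with a target and is dropped.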

Output

The final cis-element candidate list is presented as an association table listing the identifiers of the submitted genes (the current version uses TU identifiers based on the TIGR gene model annotation), annotated with any available information from RiceCyc (http://www.gramene.org/pathway/) and Gene Ontology. RiCES also provides information on each candidate motif, including its sequence, its positions in the promoter regions of the corresponding TUs, and related information from AtcisDB. The positions of the cis-element candidates are presented both as text and graphically.
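Positions like those in the output table could be computed as in the Python sketch below. It assumes the promoter string ends immediately upstream of the transcription start site, so an offset of -k means a match begins k bp upstream; this coordinate convention, the two-strand scan, and the function name are assumptions, not RiCES's documented output format.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_positions(promoter, pattern):
    """Return sorted (strand, offset, matched_text) for non-overlapping matches.

    Offsets are negative distances from the 3' end of `promoter` (assumed to be
    the transcription start site). Reverse-strand hits are reported at the
    leftmost forward-strand base they cover.
    """
    rx = re.compile(pattern)
    n = len(promoter)
    hits = [("+", m.start() - n, m.group()) for m in rx.finditer(promoter)]
    rc = promoter.translate(_COMPLEMENT)[::-1]
    for m in rx.finditer(rc):
        fwd_start = n - m.end()  # rc[s:e] covers promoter[n - e : n - s]
        hits.append(("-", fwd_start - n, m.group()))
    return sorted(hits, key=lambda h: (h[1], h[0]))


if __name__ == "__main__":
    print(motif_positions("TTCACGTGAAAA", "CACGTG"))
    print(motif_positions("GGCGGCTTTT", "AGCCGCC"))
```

A palindromic motif such as CACGTG is reported once per strand at the same offset; a real report might merge such pairs.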

Optional Parameters

Reference List

Association rule analysis is based on simple arithmetic. The analysis starts from the ratio of the number of genes possessing to those not possessing a certain cis-element in their promoter regions.

Although association rule analysis is a simple and effective approach, note that it tends to produce false-positive results when the number of user-defined target genes is much smaller than the size of the reference gene list. By default, the reference gene list is the complete set of available TUs stored in the KOME database, which includes up to 30,000 genes.
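A hypothetical worked example (numbers chosen for illustration, not taken from RiCES output) shows how a small target list can inflate lift. Suppose the reference list has N = 30,000 genes, the user submits |T| = 10 target genes, and a motif occurs in the promoters of |M| = 100 reference genes, 2 of which are targets. With the standard definition of lift:

```latex
\[
\mathrm{lift}(M \Rightarrow T)
  = \frac{\mathrm{conf}(M \Rightarrow T)}{P(T)}
  = \frac{|M \cap T| / |M|}{|T| / N}
  = \frac{2/100}{10/30\,000}
  = 0.02 \times 3000
  = 60 .
\]
```

A lift of 60 is far above the default threshold of 1.0, yet the rule rests on only two genes; one more chance occurrence in a target gene would raise it to 90.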

Users can instead run association rule analysis against a smaller reference set, selected from the default reference gene set by certain conditions, such as possession of a known cis-element motif in the upstream region. Several such reference sets have been prepared and can be selected in the form on the starting page.
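A condition-based reference subset could be built as in the sketch below, keeping only reference genes whose promoter matches a given known motif. Always retaining the target genes, so that every target still falls inside the universe used for the lift calculation, is an assumption of this sketch; the function name is hypothetical.

```python
import re


def reference_subset(promoters, required_pattern, targets=frozenset()):
    """Keep reference genes whose promoter matches `required_pattern`.

    `promoters` maps gene ID -> upstream sequence. Target genes are always kept
    (an assumption, so rule denominators still cover every target).
    """
    rx = re.compile(required_pattern)
    return {g for g, seq in promoters.items() if rx.search(seq)} | set(targets)


if __name__ == "__main__":
    promoters = {
        "g1": "TTCACGTGAA",
        "g2": "AAAAAAAAAA",
        "g3": "GGCACGTGCC",
        "g4": "TTTTTTTTTT",
    }
    print(sorted(reference_subset(promoters, "CACGTG")))
    print(sorted(reference_subset(promoters, "CACGTG", targets={"g2"})))
```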

Random Select

The size of the reference gene list can be reduced by choosing the 'random selection' option.
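Random reduction of the reference list might look like the Python sketch below. How RiCES actually draws its sample is not described here; retaining all target genes, sampling without replacement from the remaining genes, and the optional seed are assumptions of this sketch.

```python
import random


def random_reference(reference, size, targets=frozenset(), seed=None):
    """Draw `size` non-target genes without replacement, then add all targets.

    Sorting before sampling makes the draw reproducible for a fixed seed,
    because set iteration order of strings can vary between runs.
    """
    others = sorted(set(reference) - set(targets))
    if size > len(others):
        raise ValueError("requested size exceeds available non-target genes")
    rng = random.Random(seed)
    return set(rng.sample(others, size)) | set(targets)


if __name__ == "__main__":
    reference = {f"g{i}" for i in range(100)}
    subset = random_reference(reference, 20, targets={"g0", "g1"}, seed=1)
    print(len(subset), sorted(subset)[:5])
```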

Miscellaneous Features

Regular Expression to Represent Ambiguous Sequences

The sequences in the pre-compiled cis-element list are stored as Perl-compatible regular expressions so that ambiguous sequence patterns can be represented. Results are presented in the same notation. Users should follow this syntax when defining a user-defined candidate list.

See http://perldoc.perl.org/perlrequick.html to find out more about regular expressions.

Examples of regular expressions for known cis-elements:

  Name                      Regular expression
  Helix-turn-helix (HTH)    (CTAATTG){2,3}
  BBR/BPC                   ((GA)+|(TC)+)
  RAV                       CAACA[ACGT]*CACCTG
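The example patterns can be tried directly with Python's `re` module, whose syntax for these particular constructs (groups, alternation, `+`, `*`, `{m,n}`, character classes) agrees with Perl's. The sketch uses `re.fullmatch` to show which whole sequences each pattern describes; the test sequences and function name are illustrative.

```python
import re

# Example patterns from the cis-element examples in this section.
EXAMPLES = {
    "Helix-turn-helix (HTH)": r"(CTAATTG){2,3}",
    "BBR/BPC": r"((GA)+|(TC)+)",
    "RAV": r"CAACA[ACGT]*CACCTG",
}


def fully_matching(seq):
    """Names of example elements whose pattern matches the whole of `seq`."""
    return [name for name, pattern in EXAMPLES.items() if re.fullmatch(pattern, seq)]


if __name__ == "__main__":
    for seq in ["CTAATTGCTAATTG", "GAGAGAGA", "CAACATTTCACCTG", "CTAATTG" * 4]:
        print(seq, fully_matching(seq))
```

Note that when scanning a longer promoter with `re.search`, which accepts a match anywhere, a pattern such as ((GA)+|(TC)+) is satisfied by any sequence containing a single GA or TC.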

Combined Motif Analysis

We also evaluate pairwise combinations of motifs in the preliminary candidate list, to account for possible protein-protein interactions among multiple transcription factors binding to cis-elements, as shown by experimental evidence (Ulmasov et al., 1995; Ulmasov et al., 1999).

Currently this is done by generating a "combined cis-element candidate list" from the preliminary candidate list. For example, if the preliminary list consists of three candidates, "AACC", "ATAT", and "GCAT", then "AACC.*ATAT", "AACC.*GCAT", "ATAT.*GCAT", "ATAT.*AACC", and so on will be included in the combined cis-element candidate list. Reverse-complement sequences are also taken into account, so motifs such as "GGTT.*ATGC" will also be added to the combined list. The members of this list are evaluated just like members of the original candidate list.
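The construction described above can be sketched in Python: every ordered pair "A.*B" is added, together with its reverse complement "rc(B).*rc(A)". This reproduces the example pairs in the text, but it is an illustration rather than RiCES's code. The simple `reverse_complement` here handles only plain A/C/G/T motifs; patterns containing regex syntax such as [ACGT] or {2,3} would need a regex-aware complement.

```python
from itertools import permutations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a plain A/C/G/T string (no regex syntax)."""
    return seq.translate(_COMPLEMENT)[::-1]


def combined_candidates(motifs, gap=".*"):
    """Ordered pairs 'A.*B' plus their reverse complements 'rc(B).*rc(A)'."""
    combined = set()
    for a, b in permutations(motifs, 2):
        combined.add(f"{a}{gap}{b}")
        combined.add(f"{reverse_complement(b)}{gap}{reverse_complement(a)}")
    return sorted(combined)


if __name__ == "__main__":
    for pattern in combined_candidates(["AACC", "ATAT", "GCAT"]):
        print(pattern)
```

With n candidates there are n(n-1) ordered pairs, each doubled by its reverse complement, so the combined list grows quadratically with the preliminary list.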

Users can run this analysis by selecting "Yes" for the "Do Combined motif analysis" option. Note that it can be quite time-consuming.


References


Last Update: 11 July, 2007