GAGG: an algorithm (R code) for gene clustering

This software was developed (or is under development) within the higher education and research community. Its stability can vary (see fields below) and its working state is not guaranteed.
Higher Edu - Research dev card
  • Creation or important update: 23/04/10
  • Minor correction: 23/04/10
  • Index card author: Florian Salipante (IGF - Contrôle de l'apoptose et de la prolifération dans les systèmes neuronaux et endocriniens)
  • Theme leader: Christelle Dantec (CRBM)
General software features

GAGG (Genetic Algorithm for Gene Gathering) is a new statistical method that detects differentially expressed genes and clusters them according to their expression profiles. It is a factorial method based on an integer encoding of the projection variables, which lets it take the multivariate nature of the data into account. It relies on a genetic algorithm and combines several statistical methods, such as PCA and k-means. The code is written in R and consists of five functions: a main function, GAGG; three internal functions, GAGG1, GAGG2 and GAGG3; and PlotProfiles, which displays gene profiles.
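To give a feel for two of the building blocks named above, here is a minimal sketch in base R that projects genes with PCA and then groups them with k-means. It is not the GAGG algorithm: the genetic algorithm and the integer encoding are left out, and the simulated data, the three profiles and every variable name are assumptions made for illustration.

```r
# Illustration only: PCA followed by k-means on simulated data (not GAGG itself).
set.seed(1)

# Three hypothetical expression profiles over 6 conditions.
profile_up        <- c(0, 0, 0, 2, 2, 2)
profile_down      <- c(2, 2, 2, 0, 0, 0)
profile_transient <- c(0, 0, 2, 2, 0, 0)

# 60 genes in rows (20 per profile), 6 conditions in columns, plus noise.
expr <- rbind(
  matrix(rep(profile_up,        20), nrow = 20, byrow = TRUE),
  matrix(rep(profile_down,      20), nrow = 20, byrow = TRUE),
  matrix(rep(profile_transient, 20), nrow = 20, byrow = TRUE)
) + matrix(rnorm(60 * 6, sd = 0.2), nrow = 60)
rownames(expr) <- paste0("gene", 1:60)
colnames(expr) <- paste0("cond", 1:6)

# PCA with genes as observations; keep the scores on the first two components.
pca    <- prcomp(expr, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]

# k-means on the PCA scores; nstart > 1 reduces the risk of a poor local optimum.
km       <- kmeans(scores, centers = 3, nstart = 20)
clusters <- km$cluster
table(clusters)
```

Here the number of groups is fixed at three by hand, whereas the card states that GAGG generates its groups in a self-organizing manner.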


Context in which the software is used

The GAGG algorithm is used to cluster genes according to their expression profiles.


It is mainly intended for biologists, statisticians and bioinformaticians who have basic experience with R.

Statistical knowledge, particularly of principal component analysis, helps in interpreting the plots but is not essential, because the groups are generated in a self-organizing manner.

Likewise, the genetic algorithm comes with default parameter values. The user can change Tpop, the population size, and Ngene, the number of generations. The higher these values, the better the chance of converging to the optimal solution, but the longer the computation time.
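The role of these two parameters can be illustrated with a toy genetic algorithm over integer-encoded chromosomes. This is a generic sketch, not the GAGG implementation: only the names Tpop and Ngene come from this card, while toy_ga, the fitness function, the tournament selection, the crossover scheme and the 5% mutation rate are assumptions chosen for the example.

```r
# Toy genetic algorithm (illustration only, NOT the GAGG code).
# fitness: function scoring one chromosome (higher is better)
# n_vars : chromosome length; levels: allowed integer values (at least two)
toy_ga <- function(fitness, n_vars, levels, Tpop = 20, Ngene = 30) {
  # Random initial population: Tpop chromosomes, one per row.
  pop <- matrix(sample(levels, Tpop * n_vars, replace = TRUE), nrow = Tpop)
  best <- NULL
  best_fit <- -Inf
  for (g in seq_len(Ngene)) {
    fit <- apply(pop, 1, fitness)            # Tpop evaluations per generation
    if (max(fit) > best_fit) {
      best_fit <- max(fit)
      best <- pop[which.max(fit), ]
    }
    if (g == Ngene) break                    # last generation: nothing left to breed
    children <- pop
    for (k in seq_len(Tpop)) {
      # Tournament selection: each parent is the fitter of two random individuals.
      a <- sample.int(Tpop, 2)
      b <- sample.int(Tpop, 2)
      p1 <- pop[a[which.max(fit[a])], ]
      p2 <- pop[b[which.max(fit[b])], ]
      # Uniform crossover, then mutation of about 5% of the positions.
      child <- ifelse(runif(n_vars) < 0.5, p1, p2)
      hit <- runif(n_vars) < 0.05
      child[hit] <- sample(levels, sum(hit), replace = TRUE)
      children[k, ] <- child
    }
    children[1, ] <- best                    # elitism: never lose the best solution
    pop <- children
  }
  list(solution = best, fitness = best_fit)
}

# Hypothetical problem: recover a target vector encoded with the integers -1, 0, 1.
set.seed(42)
target  <- c(1, -1, 0, 1, 1, -1, 0, 0, 1, -1)
fit_fun <- function(x) sum(x == target)
small <- toy_ga(fit_fun, 10, c(-1L, 0L, 1L), Tpop = 6,  Ngene = 3)
large <- toy_ga(fit_fun, 10, c(-1L, 0L, 1L), Tpop = 60, Ngene = 60)
c(small = small$fitness, large = large$fitness)
```

In this sketch the number of fitness evaluations is Tpop x Ngene, which is why raising either parameter lengthens the run while giving the search more opportunities to reach the optimum.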

The algorithm handles single-color and two-color microarrays alike. Data pre-processing is left to the user, who can choose the normalization (quantile normalization, loess, lowess, etc.) and standardization techniques.
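As an example of one such pre-processing choice, quantile normalization can be sketched in a few lines of base R. The function name quantile_normalize and the simulated arrays are assumptions for illustration; the sketch assumes a numeric matrix without missing values and breaks ties by order of appearance. Dedicated packages offer more complete implementations.

```r
# Minimal quantile normalization (illustration only).
# x: numeric matrix, genes in rows, arrays in columns, no missing values.
quantile_normalize <- function(x) {
  sorted <- apply(unname(x), 2, sort)        # sort each array separately
  ref    <- rowMeans(sorted)                 # reference distribution: mean of the
                                             # k-th smallest values across arrays
  ranks  <- apply(x, 2, rank, ties.method = "first")
  out    <- apply(ranks, 2, function(r) ref[r])  # replace each value by the
                                                 # reference value of its rank
  dimnames(out) <- dimnames(x)
  out
}

# Two simulated arrays with different location and spread.
set.seed(3)
x <- cbind(array1 = rnorm(50, mean = 8, sd = 1),
           array2 = rnorm(50, mean = 9, sd = 2))
rownames(x) <- paste0("gene", 1:50)

qn <- quantile_normalize(x)
summary(qn)   # both columns now share the same distribution of values
```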

The data matrix must have genes in rows and experimental conditions in columns. If necessary, a pre-processing step will be added to the algorithm later.
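A table exported from a spreadsheet can be brought into this layout as sketched below. The gene identifiers, condition names and values are hypothetical.

```r
# Hypothetical table: one row per gene, first column holds the gene identifier.
raw <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  ctrl = c(7.1, 5.3, 9.8),
  t1h  = c(7.4, 6.9, 9.7),
  t4h  = c(8.2, 8.8, 9.9)
)

# Numeric matrix with genes in rows and experimental conditions in columns.
expr <- as.matrix(raw[, -1])
rownames(expr) <- raw$gene

# Basic sanity checks before any clustering.
stopifnot(is.numeric(expr), !anyNA(expr))
dim(expr)   # number of genes, number of conditions
```

If the data arrive with conditions in rows instead, t(expr) transposes the matrix.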

The GAGG method gives good results for gene clustering. However, the genetic algorithm it uses is computationally expensive, so execution takes a long time (several hours), depending on the Tpop and Ngene parameters. At startup, a message asks the user how many components to compute (some information is given to help with this choice); in most cases two components are sufficient.
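One common way to judge how many components are worth keeping is the share of variance each principal component explains. The sketch below shows that idea on simulated data; it is not a reproduction of the information GAGG displays at startup, and the two-factor data set and the 80% threshold are assumptions.

```r
# Illustration only: variance explained by principal components.
set.seed(7)
n_genes <- 100

# Simulated data driven by two hidden factors, plus a little noise.
factors  <- matrix(rnorm(n_genes * 2), nrow = n_genes)
loadings <- rbind(c(1,  1, 1, -1, -1, -1),
                  c(1, -1, 0,  0,  1, -1))
expr <- factors %*% loadings + matrix(rnorm(n_genes * 6, sd = 0.1), nrow = n_genes)

pca <- prcomp(expr, center = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)   # proportion per component
cum_explained <- cumsum(var_explained)

# Smallest number of components reaching a hypothetical 80% threshold.
n_comp <- which(cum_explained >= 0.80)[1]
round(var_explained, 3)
n_comp
```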

The source code is available for download.

Publications related to software

An article will soon be published in the journal CSDA.