CIBB 2010: Plenary Talks

Plenary Talk #1

Raffaele Giancarlo:
Dipartimento di Matematica ed Informatica
Università degli Studi di Palermo

Title: The Three Steps of Clustering in the Post-Genomic Era*
*Joint work with G. Lo Bosco, L. Pinello and F. Utro
Clustering is one of the best-known activities in scientific investigation
and the object of research in many disciplines, ranging from Statistics to Computer Science.
It can be summarized as a three step process: (a) Choice of a Distance Function;
(b) Choice of a Clustering Algorithm; (c) Choice of a Validation method.
Although such a purist approach to Clustering is rarely seen in many areas of Science,
genomic data require that level of attention if inferences made from Cluster Analysis
are to be of some relevance to Biomedical research.
Unfortunately, the high dimensionality of the data and their noisy nature make Cluster
Analysis of genomic data particularly difficult.
In this talk, the state of the art on the subject will be presented, discussing specific
limitations of the steps involved in Clustering and possible ways to make progress.
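
As an illustration of the three steps named in the abstract, the following is a minimal sketch and not the speaker's method. It assumes Euclidean distance for step (a), single-linkage agglomerative clustering for step (b) and mean silhouette width for step (c); the function names and the toy data are made up for this example.

```python
import math

def euclidean(a, b):
    """Step (a): distance function between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(points, k, dist):
    """Step (b): merge the two closest clusters until only k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist(points[p], points[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

def silhouette(points, clusters, dist):
    """Step (c): internal validation by mean silhouette width (needs k >= 2)."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for p in cluster:
            if len(cluster) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster.
            a = sum(dist(points[p], points[q])
                    for q in cluster if q != p) / (len(cluster) - 1)
            # b: mean distance to the nearest other cluster.
            b = min(sum(dist(points[p], points[q]) for q in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data (made up): two well-separated groups of 2-D profiles.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
clusters = single_linkage(data, k=2, dist=euclidean)
score = silhouette(data, clusters, euclidean)
print(clusters, round(score, 3))
```

Each of the three functions can be swapped independently, which is the point of treating Clustering as three separate choices.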

Plenary Talk #2

Paulo J. Lisboa:
John Moores University, Liverpool

Title: The continuum from bioinformatics to biostatistics*
*Joint work with D. Bacciu, I.H. Jarman, T.A. Etchells, S.J. Chambers, J. Whittaker and J. Garibaldi
The elucidation of biological networks regulating the metabolic basis of
disease is critical for understanding disease progression and identifying therapeutic targets.
This paper will highlight the need for multidisciplinary research across computational
intelligence methods and traditional statistics, by reference to a data set of cytometric
protein expression markers for breast cancer. In particular, it will focus on the interplay
between robust clustering, visualisation by dimensionality reduction and modelling with directed acyclic graphs.

Plenary Talk #3

Gianluca Pollastri:
School of Computer Science and Informatics
University College Dublin

Title: De Novo Protein Subcellular Localization Prediction by N-to-1 Neural Networks

Knowledge of the subcellular location of a protein provides valuable
information about its function and possible interaction with other proteins.
In the post-genomic era, fast and accurate predictors of subcellular location
are required if this abundance of sequence data is to be fully exploited.
We have developed a subcellular location predictor (SCL_pred) using
high-throughput machine learning models trained on large non-redundant sets of
protein sequences.
The algorithm powering SCL_pred is a new Neural Network (N-to-1 Neural
Network, or N1-NN), which is capable of mapping whole sequences into single
properties (a functional class, in this work) without resorting to predefined transformations,
but rather by adaptively compressing the sequence into a hidden feature vector.
I will describe the model, and report on extensive benchmarking of SCL_pred against
other state-of-the-art predictors of subcellular location. The results are
favourable; moreover, the N1-NN algorithm is fully general and may be applied to
a host of problems of similar shape, that is, in which a whole sequence needs to
be mapped into a fixed-size array of properties. The adaptive compression
operated by N1-NN may even shed light on the space of protein sequences.
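
To make the idea of mapping a variable-length sequence to a fixed-size feature vector concrete, here is a minimal forward-pass sketch. It is not the SCL_pred implementation: the windowed one-hot encoding, the averaging over positions, the layer sizes, the random untrained weights and all function names are assumptions chosen for illustration only.

```python
import math
import random

AMINO = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def one_hot_window(seq, centre, half):
    """One-hot encode residues centre-half..centre+half (zeros past the ends)."""
    vec = []
    for pos in range(centre - half, centre + half + 1):
        slot = [0.0] * len(AMINO)
        if 0 <= pos < len(seq) and seq[pos] in AMINO:
            slot[AMINO.index(seq[pos])] = 1.0
        vec.extend(slot)
    return vec

def dense(weights, bias, x):
    """Affine layer: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def random_layer(rng, n_out, n_in):
    """Hypothetical initialiser: small random weights, zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def n_to_1_forward(seq, params, half):
    """Compress a non-empty sequence into one feature vector, then classify it."""
    (w_h, b_h), (w_o, b_o) = params
    feature = [0.0] * len(w_h)
    for centre in range(len(seq)):
        # The same hidden layer is applied to every window of the sequence.
        window = one_hot_window(seq, centre, half)
        hidden = [math.tanh(v) for v in dense(w_h, b_h, window)]
        feature = [f + h for f, h in zip(feature, hidden)]
    # Averaging keeps the feature vector the same size for any sequence length.
    feature = [f / len(seq) for f in feature]
    logits = dense(w_o, b_o, feature)
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]  # numerically stable softmax
    total = sum(exps)
    return feature, [e / total for e in exps]

rng = random.Random(0)
half, n_feat, n_classes = 2, 8, 4  # assumed sizes, not taken from the abstract
params = (random_layer(rng, n_feat, (2 * half + 1) * len(AMINO)),
          random_layer(rng, n_classes, n_feat))
feat_short, probs_short = n_to_1_forward("MKTAYIAK", params, half)
feat_long, probs_long = n_to_1_forward("MKTAYIAKQRQISFVKSHFSRQ", params, half)
print(len(feat_short), len(feat_long), probs_long)
```

Both sequences yield a feature vector of the same length, so a single output layer can serve inputs of any length; in a trained model the hidden and output weights would be learned jointly.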