We propose a multistage recognition method built as a cascade of a multi-layer perceptron (MLP) and a k-nearest neighbor (k-NN) classifier. The MLP, being a distributed method, generalizes to learn a "rule", while the k-NN, being a local method, learns the localized "exceptions" rejected by the rule. Because the rule-learner handles a large percentage of the examples using a simple and general rule, only a small subset of the training set is stored as exceptions during training. Similarly, during testing most patterns are handled by the MLP and only a few by the k-NN, causing only a small increase in memory and computation. A multistage method like cascading is a better approach than multiexpert methods like voting and stacking, where all learners are used for all cases; the extra computation and memory for the second learner is unnecessary if we are sufficiently certain that the first one's response is correct. We discuss how such a system can be trained using cross validation. The method is tested on the real-world application of handwritten digit recognition.
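As a rough sketch of the cascade logic (using scikit-learn estimators for brevity; the confidence threshold, which the paper selects by cross validation, and all hyperparameters here are illustrative, not taken from the paper):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

def train_cascade(X_train, y_train, threshold=0.9):
    # Stage 1: the MLP learns a general "rule" over the whole training set.
    mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X_train, y_train)
    # Store as "exceptions" only the training patterns the rule is unsure about
    # (assumes enough patterns are rejected to fit the k-NN).
    conf = mlp.predict_proba(X_train).max(axis=1)
    rejected = conf < threshold
    knn = KNeighborsClassifier(n_neighbors=3).fit(X_train[rejected], y_train[rejected])
    return mlp, knn

def predict_cascade(mlp, knn, X, threshold=0.9):
    proba = mlp.predict_proba(X)
    y_pred = mlp.classes_[proba.argmax(axis=1)]
    reject = proba.max(axis=1) < threshold
    if reject.any():
        # Stage 2: low-confidence patterns fall through to the local k-NN.
        y_pred[reject] = knn.predict(X[reject])
    return y_pred
```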
An approach to the pattern classification of inexact information is considered. The approach is based on recursive analysis of the information. The proposed technique makes it possible to discover logical regularities in the initial information.
Detecting a cluster structure is considered. This means solving either the problem of discovering a natural decomposition of data points into groups (clusters) or the problem of detecting clouds of data points of a specific form. In this paper both problems are considered. To discover a cluster structure of a specific arrangement, or a cloud of data of a specific form, a class of nonlinear projections is introduced. Fitness functions are presented that estimate to what extent a given subset of data points (in the form of the corresponding projection) represents a good solution to the first or the second problem. To find a good solution, a search and optimization procedure in the form of Evolutionary Programming is used. The problems of cluster validity and robustness of the algorithms are discussed. Examples of applications are presented.
Classifiers can be combined to reduce classification errors. Experiments were done on real data consisting of different sets of features of handwritten digits. Different types of classifiers were trained on these feature sets, and the performances of these classifiers and of several combination rules were tested. The best results were obtained with the mean, median and product combination rules: the product was best for combining linear classifiers, the median for k-nearest neighbour classifiers. Training a single classifier on all features combined did not result in fewer classification errors.
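A minimal sketch of the three fixed combining rules applied to stacked class-posterior outputs (the array shapes and the epsilon guard are our assumptions, not from the paper):

```python
import numpy as np

def combine(probas, rule="mean"):
    """Combine per-classifier posterior estimates.

    probas: array of shape (n_classifiers, n_samples, n_classes),
    each slice one classifier's class-posterior estimates.
    Returns predicted class indices per sample.
    """
    if rule == "mean":
        scores = probas.mean(axis=0)
    elif rule == "median":
        scores = np.median(probas, axis=0)
    elif rule == "product":
        # Product rule; a small epsilon guards against zero posteriors.
        scores = np.prod(probas + 1e-12, axis=0)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return scores.argmax(axis=1)
```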
In this paper, feature selection for the classification of remote-sensing images is addressed. In particular, attention is focused on feature selection approaches for multiclass cases. Several criteria proposed in the literature, and one proposed by the authors, are compared with one another. Experiments on two remote-sensing data sets are presented and discussed. These experiments confirm the value of the proposed criterion, which performs slightly better than all the others considered in the paper.
In this paper we consider the problem of temperature estimation using thermochromic paint. We present a novel framework for recovering the temperature with an associated estimate of the covariance.
An iterative fuzzy clustering method is proposed to partition a set of multivariate binary observation vectors located at neighboring geographic sites. The method described here applies in a binary setup a recently proposed algorithm, called Neighborhood EM, which seeks a partition that is both well clustered in the feature space and spatially regular. This approach is derived from the EM algorithm applied to mixture models viewed as an alternate optimization method. The criterion optimized by EM is penalized by a spatial smoothing term that favors classes having many neighbors. The resulting algorithm has a structure similar to EM, with an unchanged M-step and an iterative E-step. The criterion optimized by Neighborhood EM is closely related to a posterior distribution with a multilevel logistic Markov random field as prior. The application of this approach to binary data relies on a mixture of multivariate Bernoulli distributions. Experiments on simulated spatial binary data yield encouraging results.
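A hedged sketch of the penalized criterion, reconstructed in our notation from the description above (the exact form in the paper may differ): with fuzzy memberships $c_{ik}$, mixing weights $\pi_k$ and neighborhood indicators $v_{ij}$ ($v_{ij}=1$ if sites $i$ and $j$ are neighbors, 0 otherwise),

$$U(c,\theta)=\sum_{i=1}^{n}\sum_{k=1}^{K}c_{ik}\log\big(\pi_k f_k(x_i\mid\theta_k)\big)-\sum_{i=1}^{n}\sum_{k=1}^{K}c_{ik}\log c_{ik}\;+\;\frac{\beta}{2}\sum_{i=1}^{n}\sum_{j\neq i}v_{ij}\sum_{k=1}^{K}c_{ik}\,c_{jk},$$

where the first two terms form the fuzzy EM criterion, $\beta\ge 0$ weights the spatial smoothing term, and for binary data each component is a multivariate Bernoulli density $f_k(x\mid\theta_k)=\prod_{d}\theta_{kd}^{x_d}(1-\theta_{kd})^{1-x_d}$.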
In this paper the possibilities are investigated for training statistical pattern recognizers based on a distance representation of the objects instead of a feature representation. Distances or similarities between the unknown object to be classified and a selected subset of the training objects (the support objects) are used. These distances are combined into linear or nonlinear classifiers. In this approach the problem of finding discriminative features is replaced by that of finding good (dis)similarity measures. The proposal corresponds to determining classification functions in a Hilbert space using an infinite feature set, and is a direct consequence of Vapnik's support vector classifier.
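A minimal numpy sketch of the representation step, with a least-squares linear classifier on top (the Euclidean metric, the two-class coding, and the choice of support objects are illustrative assumptions):

```python
import numpy as np

def dissimilarity_features(X, support):
    # Represent every object by its distances to the support objects.
    return np.linalg.norm(X[:, None, :] - support[None, :, :], axis=2)

def fit_linear(D, y):
    # Least-squares linear classifier on the distance representation,
    # with labels coded as +/-1 and an appended bias column.
    A = np.hstack([D, np.ones((len(D), 1))])
    w, *_ = np.linalg.lstsq(A, np.where(y == 1, 1.0, -1.0), rcond=None)
    return w

def predict_linear(w, D):
    A = np.hstack([D, np.ones((len(D), 1))])
    return np.where(A @ w >= 0, 1, 0)

# Usage sketch: support = X_train[:20]  # e.g. a selected subset of training objects
# w = fit_linear(dissimilarity_features(X_train, support), y_train)
```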
Prototype selection (PS) techniques have traditionally been applied prior to Nearest Neighbour (NN) classification rules both to improve its accuracy (editing) and to alleviate its computational burden (condensing). Methods based on selecting/discarding prototypes and methods based on adapting prototypes have been separately introduced to deal with this problem. Different approaches to this problem are considered in this paper and their main advantages and drawbacks are pointed out along with some suggestions for their joint application in some cases.
This paper introduces a novel method for selecting a feature subset yielding an optimal trade-off between class separability and feature space dimensionality. We assume the following feature properties: (a) the features are ordered into a sequence, (b) the robustness of the features decreases with increasing order, and (c) higher-order features supply more detailed information about the objects. We present a general algorithm that finds the optimal feature subset under these assumptions. Its performance is demonstrated experimentally in the space of moment-based descriptors of 1-D signals, which are invariant to linear filtering.
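Under assumptions (a)-(c), the candidate subsets reduce to prefixes of the ordered sequence, so one may scan the prefix length and pick the best trade-off. The criterion below (separability minus a dimensionality penalty) is our illustrative stand-in, not the paper's:

```python
import numpy as np

def best_prefix(features, labels, separability, penalty=0.05):
    """features: (n_samples, m) with columns ordered by decreasing robustness.
    separability: a callable scoring class separability of a feature subset.
    Returns the prefix length k maximizing separability minus a penalty on k."""
    best_k, best_score = 1, -np.inf
    for k in range(1, features.shape[1] + 1):
        score = separability(features[:, :k], labels) - penalty * k
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```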
A fuzzy ISODATA procedure for the classification of symbolic objects is proposed. The procedure makes use of similarity values in order to find the class memberships. The descriptions of the clusters obtained are represented using composite symbolic objects. The proposed fuzzy ISODATA procedure differs from the conventional algorithm in that it works on symbolic objects of complex types instead of simple numerical ones. The efficacy of the procedure is brought out by applying it to symbolic objects drawn from the domains of fat-oil, microcomputer and botanical data sets. The experimental results are compared with the results of crisp versions.
Recently a new and interesting architecture of neural networks called the "mixture of experts" has been proposed as a tool for multivariate approximation or classification. It is shown that, in some cases, the underlying problem of prediction can be solved by estimating the joint probability density of the involved variables. Assuming a model of Gaussian mixtures, we can explicitly write the optimal minimum-dispersion prediction formula, which can be interpreted as a mixture-of-experts network. In this way the optimization problem reduces to standard estimation of normal mixtures by means of the EM algorithm. The computational aspects are discussed in more detail.
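A sketch of the resulting predictor in our notation: writing the joint density of input $x$ and output $y$ as a Gaussian mixture with weights $\pi_k$, the minimum-dispersion prediction is the conditional expectation

$$\hat y(x)=\mathbb{E}[y\mid x]=\sum_{k=1}^{K} w_k(x)\Big(\mu_k^{y}+\Sigma_k^{yx}\big(\Sigma_k^{xx}\big)^{-1}(x-\mu_k^{x})\Big),\qquad w_k(x)=\frac{\pi_k\,\mathcal N\!\big(x;\mu_k^{x},\Sigma_k^{xx}\big)}{\sum_{l}\pi_l\,\mathcal N\!\big(x;\mu_l^{x},\Sigma_l^{xx}\big)},$$

so each mixture component acts as a linear "expert" and the gating weights $w_k(x)$ are the component posteriors given the input.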
Judicious use of number theory and abstract algebra can increase the efficiency of neural network pattern classifiers. We use the example of our neural network tissue classification of MRI of the breast to illustrate that efficient pattern recognition algorithms can be designed from algebraic fundamentals. The algorithm is based on number theoretic transform speed-up of an FFT-based convolution scheme.
The paper proposes a new method for lossless image compression that performs better than the other techniques we are aware of. We develop further Slessinger's idea of representing an image as the residuals of a special local predictor. The predictor configurations in a binary image are grouped into couples that differ only in the representative point; only the residuals that correspond to the less frequent predictor of the couple are stored. An optimal predictor is based on the frequency of predictor configurations in the image. Two main extensions are proposed: (1) the method is generalized to grey-level images or images with even more degrees of freedom, and (2) a variant that works with an adaptive estimator is proposed. The resulting FH-Adapt algorithm performs better than most current algorithms for the same purpose. The predictor can learn automatically from the compressed image and cope even with complicated fine textures. We are able to estimate the influence of the cells in a neighbourhood and use only the significant ones for compression.
This paper describes the behavior of linear algorithms for dimension reduction. The well-known principal component analysis will be compared to linear discriminant analysis. The effects of sample set properties and feature extraction methods relating to dimension reduction are discussed. Some experimental results with different sample sets are presented for both techniques.
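A compact numpy sketch of the two linear dimension-reduction techniques being compared (a textbook formulation; all details here are our illustrative choices):

```python
import numpy as np

def pca(X, n_components):
    # Directions of maximal total variance: right singular vectors
    # of the centered data matrix.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_components].T                     # (d, n_components)

def lda(X, y, n_components):
    # Directions maximizing between-class over within-class scatter.
    classes, mean = np.unique(y), X.mean(axis=0)
    Sw = sum(np.cov(X[y == c].T, bias=True) * (y == c).sum() for c in classes)
    Sb = sum((y == c).sum() * np.outer(X[y == c].mean(0) - mean,
                                       X[y == c].mean(0) - mean) for c in classes)
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1]
    return vecs[:, order[:n_components]].real      # (d, n_components)

# Usage sketch: Z = X @ pca(X, 2)   or   Z = X @ lda(X, y, 2)
```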
The problems of statistical classification of multivariate random observations are considered under distortions of the hypothetical models. Classification of types of distortions is presented. Estimates of the guaranteed upper risk (expected losses) of classification, critical distortion levels and robust decision rules are given for three types of distortions of hypothetical probability density functions.
The need for feature selection in medium and large problems (medium- and large-scale feature selection) is increasing in many fields, including medicine and image processing. Previous comparative studies of feature selection algorithms are not sufficient in problem size or in criterion function, and no way has been shown to compare algorithms with different objectives. In this study, we propose a unified way to compare a large variety of algorithms. Our results on some medium and large problems show that the sequential floating search algorithms are promising for problems of up to medium size, and genetic algorithms for medium and large problems.
In this paper we present a novel approach to decomposing high-dimensional spaces using a multiobjective genetic algorithm for identifying (near-)optimal subspaces for hierarchical classification. This strategy of pre-processing the data and explicitly optimising the partitions for subsequent mapping onto a hierarchical classifier is found to reduce both the learning complexity and the classification time, with no degradation in overall classification error rate. Results of partitioning pattern spaces are presented and compared with various algorithms.
This paper presents and discusses a methodology for interpreting the results obtained from the application of a pattern classifier to an independent test set. It addresses the problem of testing the random-classification null hypothesis in the multiclass case by introducing an exact probability technique. The discussion of this technique includes the presentation of an interval estimation technique for the probability of correct classification which is slightly more accurate than the ones described in some statistics textbooks.
A large number of monitored features in examined objects often complicates the technical realization of decision-making and extends the time necessary for providing a decision. It is possible to decrease the dimensionality of such tasks without decreasing the quality of decision-making, and the main subject of this contribution relates to some of the possible approaches. The basis of these methods lies in finding a linear transformation of the original m-dimensional feature space into a new n-dimensional feature space, where n < m.
For object recognition systems it is essential to have an internal representation of the object to be recognized. We introduce a system which learns such a representation from training images of an object class. Every image is preprocessed with banana wavelets; the result is a description of local areas of an image in terms of curved lines. These features are the input to a clustering algorithm which learns to separate features specific to an object class from features generated accidentally by variations in background or illumination. This leads to a representation of an object class which can be visualized in the form of a line drawing. The representation is sparse, for the most part free of redundancies, and independent of the varying backgrounds and illuminations in the training images. It comprises representative features only, and has already been utilized successfully for object recognition tasks.
An interesting analogy can be found between the recognition of noisy, distorted, or incomplete structural patterns and the analysis, modelling and control of actual discrete event systems, where different types of uncertainty can occur.
Known theoretical results concerning the dimensionality-sample size relationship are analyzed. They show that for the parametric Euclidean distance (nearest means) classifier, the non-parametric Parzen window and minimal empirical error classifiers, as well as a non-linear single-layer perceptron, not the real but an intrinsic dimensionality of the data should be taken into account when determining the small sample properties. An exact definition of the "intrinsic dimensionality" depends both on the true distribution of the pattern classes and on the type of classifier used.
Although a nonlinear discrimination function may be superior to linear or quadratic classifiers, it is difficult to construct such a function. In this paper, we propose a method to construct a nonlinear discrimination function using Legendre polynomials. The selection of an optimal set of Legendre polynomials is determined by the MDL (Minimum Description Length) criterion. Results on many real data sets show the effectiveness of this method.
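To illustrate the flavor of MDL-driven model-order selection, here is a sketch cast as one-dimensional least-squares fitting with Legendre polynomials (the paper constructs a discrimination function, and its exact MDL criterion may differ; the two-part score used below is a common approximation):

```python
import numpy as np
from numpy.polynomial import legendre

def mdl_select(x, y, max_degree=10):
    """Pick the Legendre expansion degree minimizing a two-part MDL score:
    (n/2) log(RSS/n) for the data, plus (params/2) log(n) for the model."""
    n = len(x)
    best = None
    for k in range(1, max_degree + 1):
        coef = legendre.legfit(x, y, deg=k)              # least-squares fit
        rss = np.sum((legendre.legval(x, coef) - y) ** 2)
        score = 0.5 * n * np.log(rss / n) + 0.5 * (k + 1) * np.log(n)
        if best is None or score < best[0]:
            best = (score, k, coef)
    return best[1], best[2]                              # degree, coefficients
```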
Various discrimination techniques for the automatic classification and treatment of weeds in row crops are investigated. We discuss both parametric and non-parametric classifiers, and also the problem of predictor variable selection. The predictor variables include both texture and shape information. Texture is modelled using the variogram and the second-order Gaussian Markov Random Field (GMRF); higher-order GMRFs are also investigated, and Akaike's Information Criterion (AIC) is used to select an appropriate order of model. Shape is described using simple measures based on the area and perimeter of a plant in a thresholded image. Classification results are then presented for a variety of row crops, using several different classifiers and combinations of suitably selected predictor variables.
In discriminant analysis one often runs into the small training sample size problem. One of the ways to overcome this problem is to add noise to the training objects. Sometimes the intrinsic dimensionality of high-dimensional data is smaller than the dimensionality of the feature space in which the data are represented; moreover, features can have large scale differences. In this case the injection of k-NN directed noise can be preferable to the injection of Gaussian noise. As the noise injection is data dependent, it is also dependent on the data scaling. In this paper the effectiveness and the scale dependence of injecting Gaussian distributed noise and k-NN directed noise are studied with respect to classifier performance for small training sample sizes.
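One plausible construction of k-NN directed noise, assuming noisy copies are drawn along segments toward nearest neighbours so that the noise follows the local data manifold (the paper's exact generation scheme may differ; all parameters are illustrative):

```python
import numpy as np

def knn_directed_noise(X, k=2, n_copies=1, scale=0.5, rng=None):
    """Return n_copies noisy versions of each object, each displaced
    part-way toward one of its k nearest neighbours."""
    rng = rng or np.random.default_rng(0)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)                     # exclude self-distances
    nn = np.argsort(D, axis=1)[:, :k]               # indices of k neighbours
    out = []
    for _ in range(n_copies):
        j = nn[np.arange(len(X)), rng.integers(0, k, len(X))]
        t = rng.random(len(X))[:, None] * scale     # random step length
        out.append(X + t * (X[j] - X))              # step toward a neighbour
    return np.vstack(out)
```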
A method for estimating the probability distribution of transformed random variables is presented. The proposed approach admits an approximation of the transformation of the random variables. The approximate probability density function (pdf) is corrected to obtain a resulting pdf which incorporates prior knowledge of the approximation errors. The corrected pdf is not contaminated by any uncontrollable approximation. The method is applied to pattern recognition. It is shown that the class-conditional pdf of features can easily be computed even when the feature extraction is performed by a nonlinear mapping of an input pattern.
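For reference, the exact transformation rule that the approximation stands in for: if $y = g(x)$ with $g$ invertible and differentiable, then

$$p_Y(y) \;=\; p_X\big(g^{-1}(y)\big)\,\left|\det \frac{\partial\, g^{-1}(y)}{\partial y}\right|.$$

In our reading of the abstract, the proposed method substitutes an approximation $\tilde g$ for $g$ in this rule and then corrects the resulting pdf using prior knowledge of the approximation error.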
To obtain better classification results, the outputs of an ensemble of classifiers can be combined instead of just choosing the best classifier. This combining is often done by using a simple linear combination of the outputs of the classifiers or by using order statistics (using the order in the outputs for the different classes). In this paper we show that using the normalized product of the outputs of the classifiers can yield better classification performance. We show in which cases a product combination is to be preferred and where a combination by averaging can be more useful. This is supported by theoretical and experimental observations.
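For context, the standard derivation behind the product rule (in our notation): assuming the $R$ classifier outputs $P(\omega_c \mid x_j)$ are posteriors computed from conditionally independent representations $x_1,\dots,x_R$,

$$P(\omega_c \mid x_1,\dots,x_R) \;\propto\; \frac{\prod_{j=1}^{R} P(\omega_c \mid x_j)}{P(\omega_c)^{\,R-1}},$$

normalized over the classes $\omega_c$; the mean rule instead averages, $\frac{1}{R}\sum_{j} P(\omega_c \mid x_j)$, which is less sensitive to a single near-zero posterior estimate.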
This paper presents a new type of scratch removal algorithm based on causal adaptive multidimensional prediction. Owing to the spectral and spatial correlation of multispectral data, the predictor uses the available information from the surroundings of the failed pixel, but no information from the failed pixel itself. The predictor parameters cannot be identified directly, so a special approximation is introduced.
We propose a new method to construct piecewise linear classifiers. This method constructs the hyperplanes of a piecewise linear classifier so as to keep the correct recognition rate on a training set above a threshold. The threshold is determined automatically by the MDL (Minimum Description Length) criterion so as to avoid overfitting the classifier to the training set. In experiments, the proposed method showed better results than a previous method.
This work highlights advantageous properties of the Bhattacharyya metric over the chi-squared statistic for comparing frequency-distributed data. The original interpretation of the Bhattacharyya metric as a geometric similarity measure is reviewed, and it is pointed out that this derivation is independent of the use of the Bhattacharyya measure as an upper bound on the probability of misclassification in a 2-class problem. The affinity between the Bhattacharyya metric and the Matusita measure is described, and we suggest the use of the Bhattacharyya measure for comparing histogram data. We explain how the chi-squared statistic compensates for the implicit assumption of a Euclidean distance measure being the shortest path between two points in high-dimensional space. By using the square-root transformation, the Bhattacharyya metric requires no such standardization, and by its multiplicative nature it has no singularity problems with zero count data (which arise from the denominator of the chi-squared statistic).
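For two normalized histograms $p$ and $q$, the measures discussed are, in our notation,

$$\rho(p,q)=\sum_i \sqrt{p_i\,q_i},\qquad d_M(p,q)=\Big(\sum_i\big(\sqrt{p_i}-\sqrt{q_i}\big)^2\Big)^{1/2}=\sqrt{2\,\big(1-\rho(p,q)\big)},\qquad \chi^2(p,q)=\sum_i \frac{(p_i-q_i)^2}{p_i+q_i},$$

where the Matusita distance $d_M$ makes the relation to the Bhattacharyya coefficient $\rho$ explicit, and the chi-squared statistic is given here in one common symmetric form; empty bins leave $\rho$ and $d_M$ well defined while they can zero the $\chi^2$ denominator.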
It is useful to perform a Monte Carlo study in order to evaluate a Multidimensional Scaling (MDS) method. For nonmetric MDS, one may use ordinal data as dissimilarities. In this paper, we first propose a method to appropriately simulate ordinal data in such a study. We then suggest an approach to fusing, on an ordinal scale, inter-stimuli dissimilarity evaluations coming from different judges. It is found that this kind of fusion is a means of reducing the amount of error contained in the data collected from the judges' evaluations. The data fusion is illustrated both on simulated data and on real data coming from 12 stimuli and 30 judges.
A problem with modelling clusters by d-dimensional centroids is that centroids cannot convey much information about cluster shape, i.e. elongated, circular, irregular, etc. The Agglomerative-Partitional Clustering (APC) methodology introduced here attempts to remedy this situation by joining together centroids coexisting within regions of relatively high density with line segments. Interconnected clusters are then modelled by the line segments as opposed to the original centroids, and all interconnected centroids are treated as a single cluster. In addition, APC also allows the analyst to derive a hierarchical clustering tree based on inter-cluster density as opposed to distance. Performance comparisons with other clustering techniques are given.
Neural networks with radial basis functions are considered, together with the Shannon information that their output carries about the input. The role of information-preserving input transformations is discussed for networks specified by the maximum information principle and by the maximum likelihood principle. A transformation is found which simplifies the input structure in the sense that it minimizes the entropy within the class of all information-preserving transformations. Such a transformation need not be unique; under some assumptions it may be any minimal sufficient statistic.
The paper briefly reviews recent advances in the methodology of feature selection (FS) and the conceptual base of a consulting system for solving FS problems. It also attempts to provide a guideline on which approach to choose with respect to the extent of a priori knowledge of the problem. The methods discussed here form the core of a software package being developed for solving FS problems. Two basic approaches are reviewed and the conditions under which they should be used are specified. One approach involves the use of the computationally effective floating search methods. The alternative approach trades off the requirement for a priori information against the requirement for sufficient data to represent the distributions involved; owing to its nature, it is particularly suitable for cases where the underlying probability distributions are not unimodal. This approach attempts to achieve simultaneous feature selection and decision rule inference. According to the criterion adopted, there are two variants, allowing the selection of features either for optimal representation or for discrimination.
LEARNING OBJECT REPRESENTATIONS BY CLUSTERING BANANA WAVELET RESPONSES
Gabi Peters and Norbert Krueger
Ruhr-University of Bochum, Germany

TRANSFORMATION OF STRUCTURAL PATTERNS OR DISCRETE EVENTS? AN APPLICATION OF STRUCTURAL METHODS IN DISCRETE EVENT SYSTEMS
Jiri Pik
Institute of Information Theory and Automation, Prague, Czech Republic

INTRINSIC DIMENSIONALITY AND SMALL SAMPLE PROPERTIES OF STATISTICAL CLASSIFIERS
Sarunas Raudys
Institute of Mathematics and Informatics, Vilnius, Lithuania

CONSTRUCTION OF A NONLINEAR DISCRIMINATION FUNCTION BASED ON THE MDL CRITERION
M. Sato, M. Kudo, J. Toyama and M. Shimbo
Hokkaido University, Sapporo, Japan

THE USE OF TEXTURE AND SHAPE IN THE DISCRIMINATION OF WEEDS AND CROPS
M.R. Scarr, C.C. Taylor, I.L. Dryden
Department of Statistics, University of Leeds, UK

SCALE DEPENDENCE OF NOISE INJECTION IN PERCEPTRON TRAINING
M. Skurichina, R.P.W. Duin
Delft University of Technology, Delft, The Netherlands

PROBABILITY DISTRIBUTION OF TRANSFORMED RANDOM VARIABLES WITH APPLICATION TO NONLINEAR FEATURE EXTRACTION
Lubomir Soukup
Department of Image Processing, Institute of Information Theory and Automation, Prague, Czech Republic

COMPARISON BETWEEN PRODUCT AND MEAN CLASSIFIER COMBINATION RULES
David M.J. Tax, Robert P.W. Duin and Martijn van Breukelen
Delft University of Technology, Delft, The Netherlands

A SCRATCH REMOVAL METHOD
M. Haindl, Institute of Information Theory and Automation, Prague, Czech Republic
S. Simberova, Astronomical Institute, Ondrejov, Czech Republic

PIECEWISE LINEAR CLASSIFIERS PRESERVING HIGH LOCAL RECOGNITION RATES
H. Tenmoto, M. Kudo and M. Shimbo
Hokkaido University, Sapporo, Japan

THE BHATTACHARYYA METRIC AS AN ABSOLUTE SIMILARITY MEASURE FOR FREQUENCY CODED DATA
N.A. Thacker, F.J. Aherne and P.I. Rockett
University of Sheffield, UK

SIMULATION AND FUSION OF ORDINAL DATA IN A NONMETRIC MULTIDIMENSIONAL SCALING
L. Tsogo, M.H. Masson, Centre de Recherches de Royallieu UMR CNRS 6599, Universite de Technologie de Compiegne, France
A. Bardot, Centre Technique Citroen, DRAS/RPA, Vélizy-Villacoublay, France

MODELLING CLUSTERS OF ARBITRARY SHAPE WITH AGGLOMERATIVE-PARTITIONAL CLUSTERING
E.W. Tyree and J.A. Long
Department of Business Computing Systems, City University, Northampton Square, London EC1V 0HB, United Kingdom

ABOUT THE MAXIMUM INFORMATION AND MAXIMUM LIKELIHOOD PRINCIPLES IN NEURAL NETWORKS
I. Vajda and J. Grim
Institute of Information Theory and Automation, Prague, Czech Republic

CONCEPTUAL BASE OF FEATURE SELECTION CONSULTING SYSTEM
P. Pudil, J. Novovicova, P. Somol, R. Vrnata
Institute of Information Theory and Automation, Prague, Czech Republic