We propose a multistage recognition method built as a cascade of a multi-layer perceptron (MLP) and a k-nearest neighbor (k-NN) classifier. The MLP, being a distributed method, generalizes to learn a "rule", while the k-NN, being a local method, learns the localized "exceptions" rejected by the rule. Because the rule-learner handles a large percentage of the examples using a simple and general rule, only a small subset of the training set is stored as exceptions during training. Similarly, during testing most patterns are handled by the MLP and only a few by the k-NN, causing only a small increase in memory and computation. A multistage method like cascading is a better approach than multiexpert methods like voting and stacking, where all learners are used for all cases; the extra computation and memory for the second learner is unnecessary if we are sufficiently certain that the first one's response is correct. We discuss how such a system can be trained using cross validation. The method is tested on the real-world application of handwritten digit recognition.
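As a rough sketch of the cascade logic (using scikit-learn estimators for brevity; the confidence threshold, which the paper selects by cross validation, and all hyperparameters here are illustrative, not taken from the paper):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

def train_cascade(X_train, y_train, threshold=0.9):
    # Stage 1: the MLP learns a general "rule" over the whole training set.
    mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X_train, y_train)
    # Store as "exceptions" only the training patterns the rule is unsure about
    # (assumes enough patterns are rejected to fit the k-NN).
    conf = mlp.predict_proba(X_train).max(axis=1)
    rejected = conf < threshold
    knn = KNeighborsClassifier(n_neighbors=3).fit(X_train[rejected], y_train[rejected])
    return mlp, knn

def predict_cascade(mlp, knn, X, threshold=0.9):
    proba = mlp.predict_proba(X)
    y_pred = mlp.classes_[proba.argmax(axis=1)]
    reject = proba.max(axis=1) < threshold
    if reject.any():
        # Stage 2: low-confidence patterns fall through to the local k-NN.
        y_pred[reject] = knn.predict(X[reject])
    return y_pred
```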
An approach to the pattern classification of inexact information is considered. The approach is based on recursive analysis of the information. The proposed technique makes it possible to discover logical regularities in the initial information.
Detecting a cluster structure is considered. This means solving either the problem of discovering a natural decomposition of data points into groups (clusters) or the problem of detecting clouds of data points of a specific form. In this paper both problems are considered. To discover a cluster structure of a specific arrangement, or a cloud of data of a specific form, a class of nonlinear projections is introduced. Fitness functions are presented that estimate to what extent a given subset of data points (in the form of the corresponding projection) represents a good solution to the first or the second problem. To find a good solution, a search and optimization procedure in the form of Evolutionary Programming is used. The problems of cluster validity and robustness of the algorithms are discussed. Examples of applications are presented.
Classifiers can be combined to reduce classification errors. Experiments were done on real data consisting of different sets of features of handwritten digits. Different types of classifiers were trained on these feature sets, and the performances of these classifiers and of several combination rules were tested. The best results were obtained with the mean, median and product combination rules: the product was best for combining linear classifiers, the median for k-nearest neighbour classifiers. Training a single classifier on all features combined did not result in fewer classification errors.
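A minimal sketch of the three fixed combining rules applied to stacked class-posterior outputs (the array shapes and the epsilon guard are our assumptions, not from the paper):

```python
import numpy as np

def combine(probas, rule="mean"):
    """Combine per-classifier posterior estimates.

    probas: array of shape (n_classifiers, n_samples, n_classes),
    each slice one classifier's class-posterior estimates.
    Returns predicted class indices per sample.
    """
    if rule == "mean":
        scores = probas.mean(axis=0)
    elif rule == "median":
        scores = np.median(probas, axis=0)
    elif rule == "product":
        # Product rule; a small epsilon guards against zero posteriors.
        scores = np.prod(probas + 1e-12, axis=0)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return scores.argmax(axis=1)
```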
In this paper, feature selection for the classification of remote-sensing images is addressed. In particular, attention is focused on feature selection approaches for multiclass cases. Several criteria proposed in the literature, and one proposed by the authors, are compared with one another. Experiments on two remote-sensing data sets are presented and discussed. These experiments confirm the value of the proposed criterion, which performs slightly better than all the others considered in the paper.
In this paper we consider the problem of temperature estimation using thermochromic paint. We present a novel framework for recovering the temperature with an associated estimate of the covariance.
An iterative fuzzy clustering method is proposed to partition a set of multivariate binary observation vectors located at neighboring geographic sites. The method described here applies in a binary setup a recently proposed algorithm, called Neighborhood EM, which seeks a partition that is both well clustered in the feature space and spatially regular. This approach is derived from the EM algorithm applied to mixture models viewed as an alternate optimization method. The criterion optimized by EM is penalized by a spatial smoothing term that favors classes having many neighbors. The resulting algorithm has a structure similar to EM, with an unchanged M-step and an iterative E-step. The criterion optimized by Neighborhood EM is closely related to a posterior distribution with a multilevel logistic Markov random field as prior. The application of this approach to binary data relies on a mixture of multivariate Bernoulli distributions. Experiments on simulated spatial binary data yield encouraging results.
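A hedged sketch of the penalized criterion, reconstructed in our notation from the description above (the exact form in the paper may differ): with fuzzy memberships $c_{ik}$, mixing weights $\pi_k$ and neighborhood indicators $v_{ij}$ ($v_{ij}=1$ if sites $i$ and $j$ are neighbors, 0 otherwise),

$$U(c,\theta)=\sum_{i=1}^{n}\sum_{k=1}^{K}c_{ik}\log\big(\pi_k f_k(x_i\mid\theta_k)\big)-\sum_{i=1}^{n}\sum_{k=1}^{K}c_{ik}\log c_{ik}\;+\;\frac{\beta}{2}\sum_{i=1}^{n}\sum_{j\neq i}v_{ij}\sum_{k=1}^{K}c_{ik}\,c_{jk},$$

where the first two terms form the fuzzy EM criterion, $\beta\ge 0$ weights the spatial smoothing term, and for binary data each component is a multivariate Bernoulli density $f_k(x\mid\theta_k)=\prod_{d}\theta_{kd}^{x_d}(1-\theta_{kd})^{1-x_d}$.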
In this paper the possibilities are investigated for training statistical pattern recognizers based on a distance representation of the objects instead of a feature representation. Distances or similarities between the unknown object to be classified and a selected subset of the training objects (the support objects) are used. These distances are combined into linear or nonlinear classifiers. In this approach the problem of finding discriminative features is replaced by that of finding good (dis)similarity measures. The proposal corresponds to determining classification functions in a Hilbert space using an infinite feature set, and is a direct consequence of Vapnik's support vector classifier.
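A minimal numpy sketch of the representation step, with a least-squares linear classifier on top (the Euclidean metric, the two-class coding, and the choice of support objects are illustrative assumptions):

```python
import numpy as np

def dissimilarity_features(X, support):
    # Represent every object by its distances to the support objects.
    return np.linalg.norm(X[:, None, :] - support[None, :, :], axis=2)

def fit_linear(D, y):
    # Least-squares linear classifier on the distance representation,
    # with labels coded as +/-1 and an appended bias column.
    A = np.hstack([D, np.ones((len(D), 1))])
    w, *_ = np.linalg.lstsq(A, np.where(y == 1, 1.0, -1.0), rcond=None)
    return w

def predict_linear(w, D):
    A = np.hstack([D, np.ones((len(D), 1))])
    return np.where(A @ w >= 0, 1, 0)

# Usage sketch: support = X_train[:20]  # e.g. a selected subset of training objects
# w = fit_linear(dissimilarity_features(X_train, support), y_train)
```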
Prototype selection (PS) techniques have traditionally been applied prior to Nearest Neighbour (NN) classification rules both to improve its accuracy (editing) and to alleviate its computational burden (condensing). Methods based on selecting/discarding prototypes and methods based on adapting prototypes have been separately introduced to deal with this problem. Different approaches to this problem are considered in this paper and their main advantages and drawbacks are pointed out along with some suggestions for their joint application in some cases.
This paper introduces a novel method for selecting a feature subset yielding an optimal trade-off between class separability and feature space dimensionality. We assume the following feature properties: (a) the features are ordered into a sequence, (b) the robustness of the features decreases with increasing order, and (c) higher-order features supply more detailed information about the objects. We present a general algorithm that finds the optimal feature subset under these assumptions. Its performance is demonstrated experimentally in the space of moment-based descriptors of 1-D signals, which are invariant to linear filtering.
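Under assumptions (a)-(c), the candidate subsets reduce to prefixes of the ordered sequence, so one may scan the prefix length and pick the best trade-off. The criterion below (separability minus a dimensionality penalty) is our illustrative stand-in, not the paper's:

```python
import numpy as np

def best_prefix(features, labels, separability, penalty=0.05):
    """features: (n_samples, m) with columns ordered by decreasing robustness.
    separability: a callable scoring class separability of a feature subset.
    Returns the prefix length k maximizing separability minus a penalty on k."""
    best_k, best_score = 1, -np.inf
    for k in range(1, features.shape[1] + 1):
        score = separability(features[:, :k], labels) - penalty * k
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```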
A fuzzy ISODATA procedure for the classification of symbolic objects is proposed. The procedure makes use of similarity values in order to find the class memberships. The descriptions of the clusters obtained are represented using composite symbolic objects. The proposed fuzzy ISODATA procedure differs from the conventional algorithm in that it works on symbolic objects of complex types instead of simple numerical ones. The efficacy of the procedure is brought out by applying it to symbolic objects drawn from the domains of fat-oil, microcomputer and botanical data sets. The experimental results are compared with the results of crisp versions.
Recently a new and interesting architecture of neural networks called the "mixture of experts" has been proposed as a tool for multivariate approximation or classification. It is shown that, in some cases, the underlying problem of prediction can be solved by estimating the joint probability density of the involved variables. Assuming a model of Gaussian mixtures, we can explicitly write the optimal minimum-dispersion prediction formula, which can be interpreted as a mixture-of-experts network. In this way the optimization problem reduces to standard estimation of normal mixtures by means of the EM algorithm. The computational aspects are discussed in more detail.
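A sketch of the resulting predictor in our notation: writing the joint density of input $x$ and output $y$ as a Gaussian mixture with weights $\pi_k$, the minimum-dispersion prediction is the conditional expectation

$$\hat y(x)=\mathbb{E}[y\mid x]=\sum_{k=1}^{K} w_k(x)\Big(\mu_k^{y}+\Sigma_k^{yx}\big(\Sigma_k^{xx}\big)^{-1}(x-\mu_k^{x})\Big),\qquad w_k(x)=\frac{\pi_k\,\mathcal N\!\big(x;\mu_k^{x},\Sigma_k^{xx}\big)}{\sum_{l}\pi_l\,\mathcal N\!\big(x;\mu_l^{x},\Sigma_l^{xx}\big)},$$

so each mixture component acts as a linear "expert" and the gating weights $w_k(x)$ are the component posteriors given the input.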
Judicious use of number theory and abstract algebra can increase the efficiency of neural network pattern classifiers. We use the example of our neural network tissue classification of MRI of the breast to illustrate that efficient pattern recognition algorithms can be designed from algebraic fundamentals. The algorithm is based on number theoretic transform speed-up of an FFT-based convolution scheme.
The paper proposes a new method for lossless image compression that performs better than the other techniques we are aware of. We develop further Slessinger's idea of representing an image as the residuals of a special local predictor. The predictor configurations in a binary image are grouped into couples that differ only in the representative point; only the residuals that correspond to the less frequent predictor of the couple are stored. An optimal predictor is based on the frequency of predictor configurations in the image. Two main extensions are proposed: (1) the method is generalized to grey-level images or images with even more degrees of freedom, and (2) a variant that works with an adaptive estimator is proposed. The resulting FH-Adapt algorithm performs better than most current algorithms for the same purpose. The predictor can learn automatically from the compressed image and cope even with complicated fine textures. We are able to estimate the influence of the cells in a neighbourhood and use only the significant ones for compression.
This paper describes the behavior of linear algorithms for dimension reduction. The well-known principal component analysis will be compared to linear discriminant analysis. The effects of sample set properties and feature extraction methods relating to dimension reduction are discussed. Some experimental results with different sample sets are presented for both techniques.
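A compact numpy sketch of the two linear dimension-reduction techniques being compared (a textbook formulation; all details here are our illustrative choices):

```python
import numpy as np

def pca(X, n_components):
    # Directions of maximal total variance: right singular vectors
    # of the centered data matrix.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_components].T                     # (d, n_components)

def lda(X, y, n_components):
    # Directions maximizing between-class over within-class scatter.
    classes, mean = np.unique(y), X.mean(axis=0)
    Sw = sum(np.cov(X[y == c].T, bias=True) * (y == c).sum() for c in classes)
    Sb = sum((y == c).sum() * np.outer(X[y == c].mean(0) - mean,
                                       X[y == c].mean(0) - mean) for c in classes)
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1]
    return vecs[:, order[:n_components]].real      # (d, n_components)

# Usage sketch: Z = X @ pca(X, 2)   or   Z = X @ lda(X, y, 2)
```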
The problems of statistical classification of multivariate random observations are considered under distortions of the hypothetical models. Classification of types of distortions is presented. Estimates of the guaranteed upper risk (expected losses) of classification, critical distortion levels and robust decision rules are given for three types of distortions of hypothetical probability density functions.
The need for feature selection in medium and large problems (medium- and large-scale feature selection) is increasing in many fields, including medicine and image processing. Previous comparative studies of feature selection algorithms are not sufficient in problem size or in criterion function, and no way has been shown to compare algorithms with different objectives. In this study, we propose a unified way to compare a large variety of algorithms. Our results on some medium and large problems show that the sequential floating search algorithms are promising for problems of up to medium size, and genetic algorithms for medium and large problems.
In this paper we present a novel approach to decomposing high-dimensional spaces using a multiobjective genetic algorithm for identifying (near-)optimal subspaces for hierarchical classification. This strategy of pre-processing the data and explicitly optimising the partitions for subsequent mapping onto a hierarchical classifier is found to reduce both the learning complexity and the classification time, with no degradation in overall classification error rate. Results of partitioning pattern spaces are presented and compared with various algorithms.
This paper presents and discusses a methodology for interpreting the results obtained from the application of a pattern classifier to an independent test set. It addresses the problem of testing the random-classification null hypothesis in the multiclass case by introducing an exact probability technique. The discussion of this technique includes the presentation of an interval estimation technique for the probability of correct classification which is slightly more accurate than the ones described in some statistics textbooks.
A large number of monitored features in examined objects often complicates the technical realization of decision-making and extends the time necessary for providing a decision. It is possible to decrease the dimensionality of such tasks without decreasing the quality of decision-making, and the main subject of this contribution relates to some of the possible approaches. The basis of these methods lies in finding a linear transformation of the original m-dimensional feature space into a new n-dimensional feature space, where n < m.
For object recognition systems it is essential to have an internal representation of the object to be recognized. We introduce a system which learns such a representation from training images of an object class. Every image is preprocessed with banana wavelets; the result is a description of local areas of an image in terms of curved lines. These features are the input to a clustering algorithm which learns to separate features specific to an object class from features generated accidentally by variations in background or illumination. This leads to a representation of an object class which can be visualized in the form of a line drawing. The representation is sparse, for the most part free of redundancies, and independent of the varying backgrounds and illuminations in the training images. It comprises representative features only, and has already been utilized successfully for object recognition tasks.
An interesting analogy can be found between the recognition of noisy, distorted, or incomplete structural patterns and the analysis, modelling and control of actual discrete event systems, where different types of uncertainty can occur.
Known theoretical results concerning the dimensionality-sample size relationship are analyzed. They show that for the parametric Euclidean distance (nearest means) classifier, the non-parametric Parzen window and minimal empirical error classifiers, as well as a non-linear single-layer perceptron, not the real but an intrinsic dimensionality of the data should be taken into account when determining the small sample properties. An exact definition of the "intrinsic dimensionality" depends both on the true distribution of the pattern classes and on the type of classifier used.
Although a nonlinear discrimination function may be superior to linear or quadratic classifiers, it is difficult to construct such a function. In this paper, we propose a method to construct a nonlinear discrimination function using Legendre polynomials. The selection of an optimal set of Legendre polynomials is determined by the MDL (Minimum Description Length) criterion. Results on many real data sets show the effectiveness of this method.
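To illustrate the flavor of MDL-driven model-order selection, here is a sketch cast as one-dimensional least-squares fitting with Legendre polynomials (the paper constructs a discrimination function, and its exact MDL criterion may differ; the two-part score used below is a common approximation):

```python
import numpy as np
from numpy.polynomial import legendre

def mdl_select(x, y, max_degree=10):
    """Pick the Legendre expansion degree minimizing a two-part MDL score:
    (n/2) log(RSS/n) for the data, plus (params/2) log(n) for the model."""
    n = len(x)
    best = None
    for k in range(1, max_degree + 1):
        coef = legendre.legfit(x, y, deg=k)              # least-squares fit
        rss = np.sum((legendre.legval(x, coef) - y) ** 2)
        score = 0.5 * n * np.log(rss / n) + 0.5 * (k + 1) * np.log(n)
        if best is None or score < best[0]:
            best = (score, k, coef)
    return best[1], best[2]                              # degree, coefficients
```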
Various discrimination techniques for the automatic classification and treatment of weeds in row crops are investigated. We discuss both parametric and non-parametric classifiers, and also the problem of predictor variable selection. The predictor variables include both texture and shape information. Texture is modelled using the variogram and the second-order Gaussian Markov Random Field (GMRF); higher-order GMRFs are also investigated, and Akaike's Information Criterion (AIC) is used to select an appropriate order of model. Shape is described using simple measures based on the area and perimeter of a plant in a thresholded image. Classification results are then presented for a variety of row crops, using several different classifiers and combinations of suitably selected predictor variables.
In discriminant analysis one often runs into the small training sample size problem. One of the ways to overcome this problem is to add noise to the training objects. Sometimes the intrinsic dimensionality of high-dimensional data is smaller than the dimensionality of the feature space in which the data are represented; moreover, features can have large scale differences. In this case the injection of k-NN directed noise can be preferable to the injection of Gaussian noise. As the noise injection is data dependent, it is also dependent on the data scaling. In this paper the effectiveness and the scale dependence of injecting Gaussian distributed noise and k-NN directed noise are studied with respect to classifier performance for small training sample sizes.
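One plausible construction of k-NN directed noise, assuming noisy copies are drawn along segments toward nearest neighbours so that the noise follows the local data manifold (the paper's exact generation scheme may differ; all parameters are illustrative):

```python
import numpy as np

def knn_directed_noise(X, k=2, n_copies=1, scale=0.5, rng=None):
    """Return n_copies noisy versions of each object, each displaced
    part-way toward one of its k nearest neighbours."""
    rng = rng or np.random.default_rng(0)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)                     # exclude self-distances
    nn = np.argsort(D, axis=1)[:, :k]               # indices of k neighbours
    out = []
    for _ in range(n_copies):
        j = nn[np.arange(len(X)), rng.integers(0, k, len(X))]
        t = rng.random(len(X))[:, None] * scale     # random step length
        out.append(X + t * (X[j] - X))              # step toward a neighbour
    return np.vstack(out)
```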
A method for estimating the probability distribution of transformed random variables is presented. The proposed approach admits an approximation of the transformation of the random variables. The approximate probability density function (pdf) is corrected to obtain a resulting pdf which incorporates prior knowledge of the approximation errors. The corrected pdf is not contaminated by any uncontrollable approximation. The method is applied to pattern recognition. It is shown that the class-conditional pdf of features can easily be computed even when the feature extraction is performed by a nonlinear mapping of an input pattern.
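For reference, the exact transformation rule that the approximation stands in for: if $y = g(x)$ with $g$ invertible and differentiable, then

$$p_Y(y) \;=\; p_X\big(g^{-1}(y)\big)\,\left|\det \frac{\partial\, g^{-1}(y)}{\partial y}\right|.$$

In our reading of the abstract, the proposed method substitutes an approximation $\tilde g$ for $g$ in this rule and then corrects the resulting pdf using prior knowledge of the approximation error.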
To obtain better classification results, the outputs of an ensemble of classifiers can be combined instead of just choosing the best classifier. This combining is often done by using a simple linear combination of the outputs of the classifiers or by using order statistics (using the order in the outputs for the different classes). In this paper we show that using the normalized product of the outputs of the classifiers can yield better classification performance. We show in which cases a product combination is to be preferred and where a combination by averaging can be more useful. This is supported by theoretical and experimental observations.
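For context, the standard derivation behind the product rule (in our notation): assuming the $R$ classifier outputs $P(\omega_c \mid x_j)$ are posteriors computed from conditionally independent representations $x_1,\dots,x_R$,

$$P(\omega_c \mid x_1,\dots,x_R) \;\propto\; \frac{\prod_{j=1}^{R} P(\omega_c \mid x_j)}{P(\omega_c)^{\,R-1}},$$

normalized over the classes $\omega_c$; the mean rule instead averages, $\frac{1}{R}\sum_{j} P(\omega_c \mid x_j)$, which is less sensitive to a single near-zero posterior estimate.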
This paper presents a new type of scratch removal algorithm based on causal adaptive multidimensional prediction. Owing to the spectral and spatial correlation of multispectral data, the predictor uses the available information from the surroundings of the failed pixel, but no information from the failed pixel itself. The predictor parameters cannot be identified directly, so a special approximation is introduced.
We propose a new method to construct piecewise linear classifiers. This method constructs the hyperplanes of a piecewise linear classifier so as to keep the correct recognition rate on a training set above a threshold. The threshold is determined automatically by the MDL (Minimum Description Length) criterion so as to avoid overfitting the classifier to the training set. In experiments, the proposed method showed better results than a previous method.
This work highlights advantageous properties of the Bhattacharyya metric over the chi-squared statistic for comparing frequency-distributed data. The original interpretation of the Bhattacharyya metric as a geometric similarity measure is reviewed, and it is pointed out that this derivation is independent of the use of the Bhattacharyya measure as an upper bound on the probability of misclassification in a 2-class problem. The affinity between the Bhattacharyya metric and the Matusita measure is described, and we suggest the use of the Bhattacharyya measure for comparing histogram data. We explain how the chi-squared statistic compensates for the implicit assumption of a Euclidean distance measure being the shortest path between two points in high-dimensional space. By using the square-root transformation, the Bhattacharyya metric requires no such standardization, and by its multiplicative nature it has no singularity problems with zero count data (which arise from the denominator of the chi-squared statistic).
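For two normalized histograms $p$ and $q$, the measures discussed are, in our notation,

$$\rho(p,q)=\sum_i \sqrt{p_i\,q_i},\qquad d_M(p,q)=\Big(\sum_i\big(\sqrt{p_i}-\sqrt{q_i}\big)^2\Big)^{1/2}=\sqrt{2\,\big(1-\rho(p,q)\big)},\qquad \chi^2(p,q)=\sum_i \frac{(p_i-q_i)^2}{p_i+q_i},$$

where the Matusita distance $d_M$ makes the relation to the Bhattacharyya coefficient $\rho$ explicit, and the chi-squared statistic is given here in one common symmetric form; empty bins leave $\rho$ and $d_M$ well defined while they can zero the $\chi^2$ denominator.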
It is useful to perform a Monte Carlo study in order to evaluate a Multidimensional Scaling (MDS) method. For nonmetric MDS, one may use ordinal data as dissimilarities. In this paper, we first propose a method to appropriately simulate ordinal data in such a study. We then suggest an approach to fusing, on an ordinal scale, inter-stimuli dissimilarity evaluations coming from different judges. It is found that this kind of fusion is a means of reducing the amount of error contained in the data collected from the judges' evaluations. The data fusion is illustrated both on simulated data and on real data coming from 12 stimuli and 30 judges.
A problem with modelling clusters by d-dimensional centroids is that centroids cannot convey much information about cluster shape, i.e. elongated, circular, irregular, etc. The Agglomerative-Partitional Clustering (APC) methodology introduced here attempts to remedy this situation by joining together centroids coexisting within regions of relatively high density with line segments. Interconnected clusters are then modelled by the line segments as opposed to the original centroids, and all interconnected centroids are treated as a single cluster. In addition, APC also allows the analyst to derive a hierarchical clustering tree based on inter-cluster density as opposed to distance. Performance comparisons with other clustering techniques are given.
Neural networks with radial basis functions are considered, together with the Shannon information that their output carries about the input. The role of information-preserving input transformations is discussed for networks specified by the maximum information principle and by the maximum likelihood principle. A transformation is found which simplifies the input structure in the sense that it minimizes the entropy within the class of all information-preserving transformations. Such a transformation need not be unique; under some assumptions it may be any minimal sufficient statistic.
The paper briefly reviews recent advances in the methodology of feature selection (FS) and the conceptual base of a consulting system for solving FS problems. It also attempts to provide a guideline on which approach to choose with respect to the extent of a priori knowledge of the problem. The methods discussed here form the core of a software package being developed for solving FS problems. Two basic approaches are reviewed and the conditions under which they should be used are specified. One approach involves the use of the computationally effective floating search methods. The alternative approach trades off the requirement for a priori information against the requirement for sufficient data to represent the distributions involved; owing to its nature, it is particularly suitable for cases where the underlying probability distributions are not unimodal. This approach attempts to achieve simultaneous feature selection and decision rule inference. According to the criterion adopted, there are two variants, allowing the selection of features either for optimal representation or for discrimination.
LEARNING OBJECT REPRESENTATIONS BY CLUSTERING BANANA WAVELET RESPONSES
Gabi Peters and Norbert Krueger
Ruhr-University of Bochum, Germany

TRANSFORMATION OF STRUCTURAL PATTERNS OR DISCRETE EVENTS? AN APPLICATION OF STRUCTURAL METHODS IN DISCRETE EVENT SYSTEMS
Jiri Pik
Institute of Information Theory and Automation, Prague, Czech Republic

INTRINSIC DIMENSIONALITY AND SMALL SAMPLE PROPERTIES OF STATISTICAL CLASSIFIERS
Sarunas Raudys
Institute of Mathematics and Informatics, Vilnius, Lithuania

CONSTRUCTION OF A NONLINEAR DISCRIMINATION FUNCTION BASED ON THE MDL CRITERION
M. Sato, M. Kudo, J. Toyama and M. Shimbo
Hokkaido University, Sapporo, Japan

THE USE OF TEXTURE AND SHAPE IN THE DISCRIMINATION OF WEEDS AND CROPS
M.R. Scarr, C.C. Taylor, I.L. Dryden
Department of Statistics, University of Leeds, UK

SCALE DEPENDENCE OF NOISE INJECTION IN PERCEPTRON TRAINING
M. Skurichina, R.P.W. Duin
Delft University of Technology, Delft, The Netherlands

PROBABILITY DISTRIBUTION OF TRANSFORMED RANDOM VARIABLES WITH APPLICATION TO NONLINEAR FEATURE EXTRACTION
Lubomir Soukup
Department of Image Processing, Institute of Information Theory and Automation, Prague, Czech Republic

COMPARISON BETWEEN PRODUCT AND MEAN CLASSIFIER COMBINATION RULES
David M.J. Tax, Robert P.W. Duin and Martijn van Breukelen
Delft University of Technology, Delft, The Netherlands

A SCRATCH REMOVAL METHOD
M. Haindl, Institute of Information Theory and Automation, Prague, Czech Republic
S. Simberova, Astronomical Institute, Ondrejov, Czech Republic

PIECEWISE LINEAR CLASSIFIERS PRESERVING HIGH LOCAL RECOGNITION RATES
H. Tenmoto, M. Kudo and M. Shimbo
Hokkaido University, Sapporo, Japan

THE BHATTACHARYYA METRIC AS AN ABSOLUTE SIMILARITY MEASURE FOR FREQUENCY CODED DATA
N.A. Thacker, F.J. Aherne and P.I. Rockett
University of Sheffield, UK

SIMULATION AND FUSION OF ORDINAL DATA IN A NONMETRIC MULTIDIMENSIONAL SCALING
L. Tsogo, M.H. Masson, Centre de Recherches de Royallieu UMR CNRS 6599, Universite de Technologie de Compiegne, France
A. Bardot, Centre Technique Citroen, DRAS/RPA, Vélizy-Villacoublay, France

MODELLING CLUSTERS OF ARBITRARY SHAPE WITH AGGLOMERATIVE-PARTITIONAL CLUSTERING
E.W. Tyree and J.A. Long
Department of Business Computing Systems, City University, Northampton Square, London EC1V 0HB, United Kingdom

ABOUT THE MAXIMUM INFORMATION AND MAXIMUM LIKELIHOOD PRINCIPLES IN NEURAL NETWORKS
I. Vajda and J. Grim
Institute of Information Theory and Automation, Prague, Czech Republic

CONCEPTUAL BASE OF FEATURE SELECTION CONSULTING SYSTEM
P. Pudil, J. Novovicova, P. Somol, R. Vrnata
Institute of Information Theory and Automation, Prague, Czech Republic