Classification error in multiclass discrimination from Markov data

Abstract

As a model for an on-line classification setting we consider a stochastic process $(X_{-n}, Y_{-n})_n$, the present time point being denoted by 0, with observables $\ldots, X_{-n}, X_{-n+1}, \ldots, X_{-1}, X_0$ from which the pattern $Y_0$ is to be inferred. In this setting, in addition to the present observation $X_0$, a number $l$ of preceding observations may be used for classification, thus taking into account a possible dependence structure such as occurs, e.g., in the ongoing classification of handwritten characters. We treat the question of how much the performance of classifiers improves when such additional information is used. For our analysis, a hidden Markov model is used. Letting $R_l$ denote the minimal risk of misclassification using $l$ preceding observations, we show that the difference $\sup_k |R_l - R_{l+k}|$ decreases exponentially fast as $l$ increases. This suggests that a small $l$ might already lead to a noticeable improvement. To pursue this point we look at the use of past observations for kernel classification rules. Our practical findings in simulated hidden Markov models and in the classification of handwritten characters indicate that using $l=1$, i.e. just the last preceding observation in addition to $X_0$, can lead to a substantial reduction of the risk of misclassification. So, in the presence of stochastic dependencies, we advocate using $X_{-1}, X_0$ for finding the pattern $Y_0$ instead of only $X_0$, as one would in the independent situation.
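
The idea of feeding a kernel classification rule with the last preceding observation can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are not taken from the paper: a hypothetical two-state hidden Markov chain with a 0.95 probability of staying in the current state, unit-variance Gaussian emissions centred at -1 and +1, a Gaussian kernel with bandwidth h = 0.5, and illustrative function names (`simulate_hmm`, `windows`, `kernel_rule`, `error_rate`). It compares the empirical misclassification rate when using only the current observation (l = 0) with that when using the current and the last preceding observation (l = 1).

```python
import numpy as np

def simulate_hmm(n, stay=0.95, rng=None):
    """Hypothetical two-state hidden chain Y with emissions X ~ N(2Y - 1, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.empty(n, dtype=int)
    y[0] = rng.integers(2)
    switch = rng.random(n) > stay  # True where the hidden state flips
    for t in range(1, n):
        y[t] = 1 - y[t - 1] if switch[t] else y[t - 1]
    x = rng.normal(loc=2.0 * y - 1.0, scale=1.0)
    return x, y

def windows(x, y, l):
    """Rows (X_{t-l}, ..., X_t) paired with the label Y_t."""
    feats = np.stack([x[i:len(x) - l + i] for i in range(l + 1)], axis=1)
    return feats, y[l:]

def kernel_rule(train_f, train_y, test_f, h):
    """Gaussian-kernel rule: pick the class with the larger total kernel weight."""
    d2 = ((test_f[:, None, :] - train_f[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * h ** 2))
    score1 = w @ (train_y == 1).astype(float)
    score0 = w @ (train_y == 0).astype(float)
    return (score1 > score0).astype(int)

def error_rate(l, x_tr, y_tr, x_te, y_te, h=0.5):
    """Empirical misclassification rate when l preceding observations are used."""
    f_tr, lab_tr = windows(x_tr, y_tr, l)
    f_te, lab_te = windows(x_te, y_te, l)
    pred = kernel_rule(f_tr, lab_tr, f_te, h)
    return float(np.mean(pred != lab_te))

rng = np.random.default_rng(0)
x_tr, y_tr = simulate_hmm(1500, rng=rng)  # training sequence
x_te, y_te = simulate_hmm(1500, rng=rng)  # independent test sequence

err0 = error_rate(0, x_tr, y_tr, x_te, y_te)  # X_0 only
err1 = error_rate(1, x_tr, y_tr, x_te, y_te)  # (X_{-1}, X_0)
print(f"l=0 error: {err0:.3f}   l=1 error: {err1:.3f}")
```

How large the gap between the two error rates is depends on the assumed persistence of the hidden chain and on the overlap of the emission distributions. With independent labels (`stay=0.5`) the preceding observation carries no information about the current pattern, so no gain should be expected.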
