How random are a learner's mistakes?
Abstract
Consider a random binary sequence X(n) of random variables Xt, t=1,2,...,n, for instance one generated by a Markov source (the teacher) of order k* (each state is represented by k* bits). Assume that the probability of the event Xt=1 is constant and denote it by β. Consider a learner based on a parametric model, for instance a Markov model of order k, which trains on a sequence x(m) randomly drawn by the teacher. The learner's performance is tested by giving it a sequence x(n) (generated by the teacher) and checking its prediction on every bit of x(n). An error occurs at time t if the learner's prediction Yt differs from the true bit value Xt. Denote by ξ(n) the sequence of errors, where the error bit ξt at time t equals 1 if an error occurs and 0 otherwise. Consider the subsequence ξ(ν) of ξ(n) that corresponds to the errors made when predicting a 0, i.e., ξ(ν) consists of the bits of ξ(n) only at times t such that Yt=0. In this paper we compute an estimate of the deviation of the frequency of 1s in ξ(ν) from β. The result shows that the level of randomness of ξ(ν) decreases as the complexity of the learner increases.
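The setup in the abstract can be made concrete with a small simulation. The following Python sketch is an illustration only, not the paper's method: the function names, the order-1 teacher with transition probabilities 0.3 and 0.7, the sequence lengths, and the majority-vote learner are all assumptions chosen for the example. The empirical frequency of 1s in the test sequence stands in for β. Note that at times where the prediction is 0, the error bit equals the true bit, so the frequency of 1s in the selected subsequence is the frequency of Xt=1 at those times.

```python
import random
from collections import defaultdict


def generate_teacher(n, k_star, p_one, rng):
    """Draw n bits from a binary Markov source of order k_star.

    p_one maps each k_star-bit context (a tuple) to P(next bit = 1).
    """
    bits = [rng.randint(0, 1) for _ in range(k_star)]  # arbitrary initial state
    while len(bits) < n + k_star:
        context = tuple(bits[len(bits) - k_star:])
        bits.append(1 if rng.random() < p_one[context] else 0)
    return bits[k_star:]


def train_learner(x_train, k):
    """Count, for every k-bit context, how often a 0 or a 1 followed it."""
    counts = defaultdict(lambda: [0, 0])
    for t in range(k, len(x_train)):
        counts[tuple(x_train[t - k:t])][x_train[t]] += 1
    return counts


def predict(counts, context):
    """Majority-vote prediction (assumed rule); ties and unseen contexts give 0."""
    zeros, ones = counts.get(context, (0, 0))
    return 1 if ones > zeros else 0


def error_subsequence(x_test, counts, k):
    """Return the error sequence and its subsequence at times where Y_t = 0."""
    xi, xi_nu = [], []
    for t in range(k, len(x_test)):
        y = predict(counts, tuple(x_test[t - k:t]))
        err = int(y != x_test[t])
        xi.append(err)
        if y == 0:
            xi_nu.append(err)
    return xi, xi_nu


def demo(seed=0, m=2000, n=2000):
    """Compare the frequency of 1s in the subsequence with an estimate of beta."""
    rng = random.Random(seed)
    # Hypothetical order-1 teacher; by symmetry its stationary P(X_t = 1) is 0.5.
    p_one = {(0,): 0.3, (1,): 0.7}
    x_train = generate_teacher(m, 1, p_one, rng)
    x_test = generate_teacher(n, 1, p_one, rng)
    beta_hat = sum(x_test) / len(x_test)  # empirical stand-in for beta
    rows = []
    for k in (0, 1, 2, 3):
        counts = train_learner(x_train, k)
        _, xi_nu = error_subsequence(x_test, counts, k)
        freq = sum(xi_nu) / len(xi_nu) if xi_nu else None
        rows.append((k, freq, beta_hat))
        print(f"k={k}  freq of 1s in subsequence={freq}  beta_hat={beta_hat:.3f}")
    return rows


if __name__ == "__main__":
    demo()
```

Running `demo()` prints, for each learner order k, the frequency of 1s in the selected error subsequence next to the estimate of β, which is the deviation the abstract refers to. A single run at one seed is only a toy illustration and says nothing about the bound derived in the paper.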