Minimum Probabilistic Finite State Learning Problem on Finite Data Sets: Complexity, Solution and Approximations

Abstract

In this paper, we study the problem of determining a minimum-state probabilistic finite state machine capable of generating symbol sequences that are statistically identical to the samples provided. This problem is qualitatively similar to the classical Hidden Markov Model problem and has been studied from a practical point of view in several works, beginning with Shalizi, C.R., Shalizi, K.L., Crutchfield, J.P. (2002), An algorithm for pattern discovery in time series, Technical Report 02-10-060, Santa Fe Institute, arxiv.org/abs/cs.LG/0210025. We show that the underlying problem is NP-hard and thus all existing polynomial-time algorithms must be approximations on finite data sets. Using our NP-hardness proof, we derive a provably correct algorithm for constructing a minimum-state probabilistic finite state machine from given data, and we study its running time empirically.
