The complexity of some regex crossword problems
Abstract
In a typical regular expression (regex) crossword puzzle, you are given two nonempty lists R1,…,Rm and C1,…,Cn of regular expressions over some alphabet, and your goal is to fill in an m× n grid with letters from that alphabet so that the string formed by the ith row is in L(Ri), and the string formed by the jth column is in L(Cj), for all 1 i m and 1 j n. Such a grid is a solution to the puzzle. It is known that determining whether a solution exists is NP-complete. We consider a number of restrictions and variants to this problem where all the Ri are equal to some regular expression R, and all the Cj are equal to some regular expression C. We call the solution to such a puzzle an (R,C)-crossword. Our main results are the following: 1. There exists a fixed regular expression C over the alphabet \0,1\ such that the following problem is NP-complete: "Given a regular expression R over \0,1\ and positive integers m and n given in unary, does an m× n (R,C)-crossword exist?" This improves the result mentioned above. 2. The following problem is NP-hard: "Given a regular expression E over \0,1\ and positive integers m and n given in unary, does an m× n (E,E)-crossword exist?" 3. There exists a fixed regular expression C over \0,1\ such that the following problem is undecidable (equivalent to the Halting Problem): "Given a regular expression R over \0,1\, does an (R,C)-crossword exist (of any size)?" 4. The following problem is undecidable (equivalent to the Halting Problem): "Given a regular expression E over \0,1\, does an (E,E)-crossword exist (of any size)?"