String Matching with Inversions and Translocations in Linear Average Time (Most of the Time)
Abstract
We present an efficient algorithm for finding all approximate occurrences of a given pattern p of length m in a text t of length n allowing for translocations of equal length adjacent factors and inversions of factors. The algorithm is based on an efficient filtering method and has an (nm(α, β))-time complexity in the worst case and ((α, β))-space complexity, where α and β are respectively the maximum length of the factors involved in any translocation and inversion. Moreover we show that under the assumptions of equiprobability and independence of characters our algorithm has a (n) average time complexity, whenever σ = ( m / 1-ε m), where ε > 0 and σ is the dimension of the alphabet. Experiments show that the new proposed algorithm achieves very good results in practical cases.