B(eo)W(u)LF: Facilitating recurrence analysis on multi-level language
Abstract
Discourse analysis may seek to characterize not only the overall composition of a given text but also the dynamic patterns within the data. This technical report introduces a data format intended to facilitate multi-level investigations, which we call the by-word long-form or B(eo)W(u)LF. Inspired by the long-form data format required for mixed-effects modeling, B(eo)W(u)LF structures linguistic data into an expanded matrix encoding any number of researchers-specified markers, making it ideal for recurrence-based analyses. While we do not necessarily claim to be the first to use methods along these lines, we have created a series of tools utilizing Python and MATLAB to enable such discourse analyses and demonstrate them using 319 lines of the Old English epic poem, Beowulf, translated into modern English.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.