One common approach to text structure treats a text as an answer to a Question under Discussion (QUD). The text arrives at a complete answer by dividing the central QUD into sub-questions and answering them in turn. The way the central question is divided into sub-questions, and the order in which those sub-questions are answered, reflects the underlying rationale and systematicity of the text structure. So far, theories of QUD structure have been applied only to the analysis of texts. This project tests the validity of QUD approaches by examining whether an extracted question-subquestion hierarchy (QUD tree), representing text structure, can be used to recreate the original text.
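To make the notion of a QUD tree concrete, the following minimal sketch models a question-subquestion hierarchy and reads the answers off in depth-first, left-to-right order. The class `QUDNode`, the function `linearize`, and the example questions and answers are hypothetical illustrations, not the project's annotation scheme or data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QUDNode:
    """One question in a hypothetical QUD tree (illustrative only)."""
    question: str
    answer: Optional[str] = None  # in this sketch, only leaf questions carry an answer
    subquestions: List["QUDNode"] = field(default_factory=list)


def linearize(node: QUDNode) -> List[str]:
    """Collect answers depth-first, left to right, mirroring the order
    in which the sub-questions are answered."""
    answers = [node.answer] if node.answer is not None else []
    for sub in node.subquestions:
        answers.extend(linearize(sub))
    return answers


# Invented example: a central QUD divided into sub-questions.
tree = QUDNode(
    question="What happened at the council meeting?",
    subquestions=[
        QUDNode("What was decided?", answer="The council approved the budget."),
        QUDNode(
            "How did people react?",
            subquestions=[
                QUDNode("How did the mayor react?",
                        answer="The mayor welcomed the vote."),
                QUDNode("How did the opposition react?",
                        answer="The opposition criticized it."),
            ],
        ),
    ],
)

text = " ".join(linearize(tree))
```

Simply concatenating the answers ignores discourse markers, aggregation, and information structure, which is where an actual generation component would have to do its work.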
The project proceeds in two phases. In the first phase (corpus annotation), a natural-language corpus of newspaper articles is compiled and annotated with QUD-tree structures that represent the articles' discourse structure and content. In the second phase (Natural Language Generation), texts are generated from these QUD-tree structures, and the similarity of the generated texts to the originals is evaluated. Special attention is given to discourse relations and their explicit discourse markers, non-at-issue content, evaluative and expressive adverbs, topic and focus, and sentence aggregation.
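The evaluation step of the second phase can be illustrated with a simple token-level comparison between an original and a generated text. The function `token_similarity` and the choice of metric are assumptions made for illustration only; the similarity measure the project actually uses is not specified here.

```python
from difflib import SequenceMatcher


def token_similarity(original: str, generated: str) -> float:
    """Return a ratio in [0, 1] based on matching token subsequences.

    Illustrative stand-in metric: whitespace tokenization, case-insensitive.
    """
    a = original.lower().split()
    b = generated.lower().split()
    return SequenceMatcher(None, a, b).ratio()


# Invented example pair: one word differs between the two texts.
score = token_similarity(
    "The council approved the budget",
    "The council passed the budget",
)
```

A surface measure of this kind rewards identical wording only; it would not credit a generated text that expresses the same discourse relation with a different marker.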