Discourse processing is a suite of Natural Language Processing (NLP) tasks that uncover linguistic structures in texts at several levels, which can support many text mining applications. This involves identifying the topic structure, the coherence structure, the coreference structure, and, for conversational discourse, the conversation structure. Taken together, these structures can inform text summarization, essay scoring, sentiment analysis, machine translation, information extraction, question answering, and thread recovery. The tutorial starts with an overview of basic concepts in discourse analysis: monologue vs. conversation, synchronous vs. asynchronous conversation, and key linguistic structures. It then covers traditional machine learning methods along with the most recent work using deep learning, and compares their performance on benchmark datasets. For each discourse structure we describe, we show its applications in downstream text mining tasks.
Discourse Processing and Its Applications in Text Mining
Shafiq Joty, Giuseppe Carenini, Raymond Ng, and Gabriel Murray. In IEEE International Conference on Data Mining: Tutorial Abstracts (ICDM'18), pages 1-2, 2018.