Resources
Demos of LT applications
Syntactic Complexity Sign tagger
This finetuned BERT model classifies signs of syntactic complexity more accurately than the CRF model presented in Dornescu et al. (2013). In this setting, signs of syntactic complexity are a predefined set of conjunctions, relative pronouns, and punctuation marks. The tagger classifies them according to the scheme described by Evans and Orasan (2013). There are three broad groups of signs: coordinators, left boundaries of subordinate clauses, and right boundaries of subordinate clauses. The classification scheme is fine-grained, with numerous subclasses in each of these three broad groups. The classifier was trained on the annotated corpus listed below (see Signs of syntactic complexity). Sign tagging is an essential part of the sentence analysis used in our approach to sentence rewriting (Evans et al., 2014).
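Before any sign can be classified, the candidate tokens have to be located. The sketch below shows only that first step: it scans a sentence for members of a small, assumed sign inventory (the punctuation marks, coordinators, complementiser, and wh-words given as examples on this page) and merges a punctuation mark followed by a lexical sign into a single bigram sign. The function name `candidate_signs`, the regular-expression tokeniser, and the inventory itself are illustrative assumptions, not the tagger's actual implementation; the fine-grained classification of each candidate is done by the finetuned BERT model and is not reproduced here.

```python
import re

# Assumed, illustrative subset of signs taken from the examples on this page;
# the tagger's full inventory is not reproduced here.
PUNCTUATION_SIGNS = {",", ";", ":"}
LEXICAL_SIGNS = {"and", "but", "or", "that", "what", "when", "where",
                 "which", "while", "who"}

# Naive tokeniser: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def candidate_signs(sentence):
    """Return (token_index, sign) pairs for candidate signs in a sentence.

    A punctuation sign immediately followed by a lexical sign is merged
    into one bigram sign such as ", which". Lexical signs are lowercased.
    """
    tokens = TOKEN_RE.findall(sentence)
    signs = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else None
        if tok in PUNCTUATION_SIGNS and nxt in LEXICAL_SIGNS:
            # Bigram sign: punctuation mark + lexical sign.
            signs.append((i, f"{tok} {nxt}"))
            i += 2
        elif tok in PUNCTUATION_SIGNS or tok.lower() in LEXICAL_SIGNS:
            # Single-token sign.
            signs.append((i, tok.lower()))
            i += 1
        else:
            i += 1
    return signs


print(candidate_signs("The patient, who was tired, rested and slept."))
# [(2, ', who'), (6, ','), (8, 'and')]
```

Each returned candidate would then be passed to the classifier, which assigns it a coordinating or bounding label. Merging bigrams first mirrors the page's observation that sequences such as ", which" act as single signs.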
Public Google Colab Notebooks
Text Simplification
This notebook allows you to simplify syntactically complex sentences. The method uses three finetuned BERT models: one to identify syntactic coordinators and the boundaries of subordinate constituents, one to identify the spans of compound clauses, and one to identify the spans of complex syntactic constituents (those containing subordinate clauses) in input sentences. Sentences containing complex NPs and compound clauses are converted into sequences of simpler sentences. Each simple sentence contains at most one conjoin of a compound clause, the matrix NP of a complex NP, or one predication of the subordinate clause modifying a complex NP in the original sentence. The approach is based on the one presented in Evans (2020), but the finetuned BERT models identify complex NPs and compound clauses more accurately.
Sign Tagger
Signs are lexical-punctuational markers of syntactic complexity. The sign tagger identifies the coordinating or bounding functions of items such as punctuation marks (commas, semicolons, and colons), coordinators (and, but, or), complementisers (that), relative and wh- pronouns (what, when, where, which, while, who), and bigrams consisting of a punctuation mark followed by one of the lexical signs of syntactic complexity (", and", "; but", ", which", ", who", ", when", etc.). This notebook allows you to finetune a BERT model to classify signs of syntactic complexity with respect to an annotation scheme based on the one presented in Evans and Orasan (2013) and Chapter 2 of Evans (2020).
Annotated corpora
Signs of syntactic complexity
This resource comprises three collections of text from the genres/domains of news, patient healthcare information, and literature. In each case, a subset of signs of syntactic complexity has been annotated with information about their syntactic linking and bounding functions. The subset includes conjunctions, complementisers, wh-words, and punctuation marks. From this page, you can access the annotation scheme, the annotation guidelines, and the corpus.
Front page
General information about me.
Research projects
Information about my previous and current research activities.
Publications
Bibliographic information and electronic versions of my research papers and technical reports.