CAPTNS

Title: Computer Assisted Patient Note Scoring
Funding organisation: National Board of Medical Examiners
Ref: CAPTNS
Period: February 2010 - June 2011; ongoing since February 2015
Representative publication: Brief reference in NBME Annual Report (2012)

This project is concerned with the automatic assessment of examinee responses to the Step 2 Clinical Skills test of the United States Medical Licensing Examination (USMLE). These responses are patient notes written by examinees after interacting with actors playing the role of patients. The automatic scoring of these notes built on expertise gained in the CAID project, adapting it to noisy texts that contain a high level of linguistic variation in spelling, abbreviation, and clinical "shorthand". This work was presented at the 2012 meeting of the National Council on Measurement in Education, held in Vancouver, Canada.

FIRST

Title: A Flexible Interactive Reading Support Tool
Funding organisation: European Commission (FP7-ICT)
Ref: FP7-287607
Period: October 2011 - September 2014
Representative publication: here

The goal of this project was to develop language technology that converts input documents into a more accessible form for readers with autistic spectrum disorder. The conversion removes obstacles to reading comprehension such as structural complexity (including long and complex sentences) and semantic ambiguity (including anaphora and figurative language). The text of the converted documents is supplemented with illustrative images, indicative summaries, document navigation aids, and pre-reading tasks intended to improve reading comprehension. The system can be personalised to the needs of individual users.

The software developed in the FIRST project is called OpenBook.

Machine Learning for the Study of Language Change

Title: Machine Learning for the Study of Language Change
Funding organisation: University of Wolverhampton
Ref: NA
Period: December 2012 - March 2013
Representative publication: doi: 10.1007/978-3-642-39593-2_24

In this work, a machine learning approach was used to identify the linguistic features that contribute to the accurate assignment of texts to categories representing different historical periods. The features that best discriminate between these categories are taken to be salient for studies of language change. Experiments were conducted on the British portion of the ‘Brown family’ of corpora, using 30 different stylistic features. The performance of the classifier was evaluated with feature selection based on the Mann-Whitney U test and on the CfsSubsetEval attribute selection algorithm.
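
Feature selection with the Mann-Whitney U test can be illustrated with a minimal Python sketch. It tests each stylistic feature for a difference between texts from two periods and keeps the features whose two-sided p-value falls below a threshold. This is an illustration under stated assumptions, not the code used in the study: the function names, the feature names and values, the threshold alpha = 0.05, and the use of the normal approximation without tie correction are all assumptions, and the CfsSubsetEval step (a Weka attribute evaluator) is not reproduced.

```python
import math
from typing import Dict, List, Sequence, Tuple


def mann_whitney_u(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (U for xs, two-sided p-value) using the normal approximation.

    Tied values receive average ranks; the variance is NOT tie-corrected
    (a simplifying assumption for this sketch).
    """
    n1, n2 = len(xs), len(ys)
    # Pair each pooled value with its position so ranks can be mapped back.
    pooled = sorted((value, idx) for idx, value in enumerate(list(xs) + list(ys)))
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        i = j + 1
    rank_sum_x = sum(ranks[:n1])  # the first n1 pooled positions belong to xs
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


def select_features(
    period_a: Dict[str, List[float]],
    period_b: Dict[str, List[float]],
    alpha: float = 0.05,
) -> List[str]:
    """Keep features whose distributions differ significantly between periods."""
    selected = []
    for name, values_a in period_a.items():
        _, p = mann_whitney_u(values_a, period_b[name])
        if p < alpha:
            selected.append(name)
    return sorted(selected)


# Hypothetical per-text feature values for two periods (illustrative only).
period_a = {
    "mean_sentence_length": [10, 11, 12, 13, 14, 15, 16, 17],
    "passives_per_1000_words": [1, 3, 5, 7, 9, 11, 13, 15],
}
period_b = {
    "mean_sentence_length": [20, 21, 22, 23, 24, 25, 26, 27],
    "passives_per_1000_words": [2, 4, 6, 8, 10, 12, 14, 16],
}
print(select_features(period_a, period_b))  # ['mean_sentence_length']
```

Here the fully separated feature is kept (p is roughly 0.0008), while the interleaved one is discarded (p is roughly 0.67). A univariate filter of this kind scores each feature independently; a subset evaluator such as CfsSubsetEval additionally considers redundancy between features, which is why the two can select different feature sets.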

CAID

Title: Computer Aided Item Development
Funding organisation: National Board of Medical Examiners
Ref: CAID
Period: March 2007 - March 2011
Representative publication: doi: 10.1093/llc/fqr034

My involvement in this project concerned information extraction from clinical assessment items. The goal was to populate a database with information about clinical findings and the symptoms, anatomical locations, underlying body systems, and qualifying information associated with them. This research motivated my development of a systematic approach to text simplification.

BiRD

Title: Building Resource Databases for Researchers
Funding organisation: Economic and Social Research Council
Ref: RES-000-23-0010
Period: September 2003 - September 2006
Representative publication: doi: 10.1093/llc/fqm010

This project involved information extraction from email messages and specialised websites. The goal was to populate a database with information about employment vacancies, forthcoming conferences, and software and resources relevant to the field of computational linguistics. The experience of working on this project motivated my development of a method for named entity recognition in the open domain.

NERO

Title: Named Entity Recognition in the Open Domain
Funding organisation: University of Wolverhampton
Ref: NA
Period: August 1999 - July 2003
Representative publication: here

Named entity recognition systems often exploit resources specific to particular types of named entity. This makes them ineffective in settings where the concepts or types of entity of interest are not known in advance. In this project, patterns proposed by Hearst (1992) are submitted as Google queries to identify the hypernyms of all named entities occurring in a document. The hypernyms are clustered by taxonomic similarity into general classes, which correspond to particular concepts or types of entity. As a result, the system is able to identify the specific concepts most likely to be relevant in a given document. The hypernym collection patterns, together with elements of the clusters, can then be used to tag the identified named entities accordingly.
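
The hypernym collection step can be illustrated with a minimal Python sketch. It builds quoted search queries from a few Hearst-style templates and extracts single-word hypernym candidates from text snippets with regular expressions. This is an assumption-laden illustration, not the NERO implementation: the template list, the function names, and the sample snippets are hypothetical, no search engine is queried (snippets are passed in as plain strings), multi-word hypernyms are ignored, and the clustering of hypernyms by taxonomic similarity is not shown.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def hearst_patterns(entity: str) -> List[Tuple[str, str]]:
    """Return hypothetical (search query, extraction regex) pairs for one entity.

    The templates loosely follow lexico-syntactic patterns described by
    Hearst (1992); each regex captures a single-word hypernym candidate.
    """
    e = re.escape(entity)
    return [
        (f'"such as {entity}"', rf"(\w+),? such as {e}\b"),
        (f'"including {entity}"', rf"(\w+),? including {e}\b"),
        (f'"especially {entity}"', rf"(\w+),? especially {e}\b"),
        (f'"{entity} and other"', rf"\b{e},? and other (\w+)"),
        (f'"{entity} or other"', rf"\b{e},? or other (\w+)"),
    ]


def collect_hypernyms(entity: str, snippets: Iterable[str]) -> Counter:
    """Count hypernym candidates for `entity` found in search-result snippets."""
    regexes = [re.compile(rx, re.IGNORECASE) for _, rx in hearst_patterns(entity)]
    counts: Counter = Counter()
    for snippet in snippets:
        for regex in regexes:
            # Each regex has one capture group, so findall returns strings.
            for hypernym in regex.findall(snippet):
                counts[hypernym.lower()] += 1
    return counts


# Hypothetical snippets standing in for search results (no search API is called).
snippets = [
    "European cities such as Paris attract many tourists.",
    "Paris and other cities hosted the event.",
    "Paris, and other capitals, were visited.",
]
print(collect_hypernyms("Paris", snippets).most_common(2))
# [('cities', 2), ('capitals', 1)]
```

Aggregating counts over many snippets gives a rough ranking of candidate types for each entity. Surface patterns are noisy (a snippet such as "museums, including Paris venues" yields the spurious candidate "museums"), so in an approach like the one described above the candidates would be grouped into general classes rather than used directly.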

MARS

Title: Mitkov's Anaphora Resolution System
Funding organisation: University of Wolverhampton
Ref: NA
Period: October 1998 - ongoing
Representative publication: doi: 10.1007/3-540-45715-1_15

This project concerns the implementation, improvement, optimisation, and evaluation of Mitkov's (1998) knowledge-poor approach to anaphora resolution. This research motivated the development of systems to classify the function of the pronoun "it" (Evans, 2001) and to detect the animacy of noun phrases in English (Orasan and Evans, 2007).
