Text Mining

An alert reader will make connections between seemingly unrelated facts to generate new ideas or hypotheses. However, the burgeoning of published text means that even the most avid reader cannot hope to keep up with all the reading in a field, let alone adjacent fields. Nuggets of insight or new knowledge are at risk of languishing undiscovered in the literature.

The Text Mining group offers solutions to the problem of data deluge by replacing or supplementing the human reader with automatic systems undeterred by the text explosion. We develop software that analyses large collections of documents to discover previously unknown information. The information might be relationships or patterns that are buried in the document collection and which would otherwise be extremely difficult, if not impossible, to discover.

We apply our novel research methods to areas such as bioinformatics, systems biology, systems medicine, drug discovery, clinical trials, medical records, Twitter analysis, social media and publishing.

We welcome students who want to study for an MPhil or PhD at the intersection of Natural Language Processing and Text Mining. We actively collaborate with other academic groups within the School and with other Faculties.




A longstanding achievement of our text mining research is the National Centre for Text Mining (NaCTeM), the first publicly funded text mining centre in the world. We have developed several text analytics tools and search services, including the following:

  • TerMine - a Term Management System which identifies key phrases in text.
  • AcroMine - an acronym dictionary used to find distinct expanded forms of acronyms from MEDLINE.
  • FACTA+ - a MEDLINE search engine for finding and visualising direct and indirect associations between concepts.
  • Europe PubMed Central - the European version of the PubMed Central repository, developed in collaboration with the European Bioinformatics Institute (EBI) and the British Library. NaCTeM is applying text mining solutions to enhance information retrieval and knowledge discovery.
  • Interoperable text mining platforms: U-Compare and Argo, both of which build upon the Unstructured Information Management Architecture (UIMA) to offer graphical environments that facilitate the rapid development of text mining workflows.
  • ASCOT - a search application that helps users to narrow down their search of clinical trial data efficiently and assists in the creation of new protocols. See demonstration video.
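
To make the idea of direct and indirect associations between concepts more concrete, the following is a minimal sketch in Python. It is not the method used by FACTA+ or any other NaCTeM tool; the corpus, the concept names and the function names are all assumptions invented for illustration. It treats two concepts as directly associated when they appear in the same document, and as indirectly associated when they never appear together but share at least one linking concept.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus (invented for illustration): each document is
# reduced to the set of concepts it mentions.
DOCS = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"blood viscosity", "Raynaud's syndrome"},
    {"platelet aggregation", "Raynaud's syndrome"},
    {"aspirin", "platelet aggregation"},
]


def direct_associations(docs):
    """Map each concept to the concepts it co-occurs with in some document."""
    neighbours = defaultdict(set)
    for concepts in docs:
        for a, b in combinations(sorted(concepts), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def indirect_associations(docs, query):
    """Return {target: linking concepts} for targets that never co-occur
    with `query` directly but share at least one neighbour with it."""
    neighbours = direct_associations(docs)
    direct = neighbours.get(query, set())
    indirect = defaultdict(set)
    for link in direct:
        for target in neighbours[link]:
            if target != query and target not in direct:
                indirect[target].add(link)
    return dict(indirect)


if __name__ == "__main__":
    for target, links in sorted(indirect_associations(DOCS, "fish oil").items()):
        print(target, "via", sorted(links))
```

A real system would first have to recognise and normalise concepts in free text and would rank candidate associations statistically instead of listing every shared neighbour; the sketch only shows why a link can exist in a collection even though no single document states it.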

For a full list of NaCTeM’s current and past projects, visit the NaCTeM website.



Affiliated group members

Professor Sophia Ananiadou


Sophia Ananiadou is a Professor in Computer Science. She is the PI of several projects, including the National Centre for Text Mining, Europe PubMed Central, and PathText. Prof Ananiadou is a founding member of the Special Interest Group on Biomedical Natural Language Processing (SIGBIOMED). She has been involved in the organisation of the BioNLP workshops and associated shared tasks since 2002.

Research interests: Advanced NLP techniques for biomedical and clinical text mining, bioinformatics, search systems based on semantic metadata, event extraction (EventMine), discourse analytics (negation, causality), large-scale terminological resource development (BioLexicon), annotated corpus creation (meta-knowledge, anatomical entities) and interoperable text mining platforms (U-Compare, Argo).

Awards and achievements: IBM Innovation Award for interoperability in text mining in three consecutive years (2006, 2007, 2008), Daiwa Adrian Prize for her work in biomedical knowledge management, Japan Trust Award (1997). In 2010, her team achieved the best performance in the protein-protein interaction (PPI) challenge of BioCreAtIvE III.


Prof Sophia Ananiadou's page.