Main Page

From HealthNLP-ShaRe
Revision as of 12:04, 3 June 2014 by AMYVOGEL

Welcome to the ShARe Project

Welcome to the Shared Annotated Resources (ShARe) project.

Much of the clinical information required for accurate clinical research, active decision support, and broad-coverage surveillance is locked in free text in the electronic medical record (EMR). The only feasible way to leverage this information for translational science is to extract and encode it using natural language processing (NLP). Over the last two decades, several research groups have developed NLP tools for clinical notes, but a major bottleneck preventing progress in clinical NLP is the lack of standard, annotated data sets for training and evaluating NLP applications. Without such standards, NLP applications proliferate in isolation: researchers cannot train different algorithms on standard annotations, share and integrate NLP modules, or compare performance. We propose to develop standards and infrastructure that enable technology to extract scientific information from textual medical records, and we propose this research as a collaborative effort involving NLP experts across the U.S.

To accomplish this goal, we will address three specific aims, each with a set of subaims:

Aim 1: Extend existing standards and develop a new consensus annotation schema for annotating clinical text in a way that is interoperable, extensible and usable

  • Develop annotation schemas for the linguistic and clinical annotations
  • Determine the degree of reliance on clinical terminologies and ontological knowledge
  • Develop annotation guidelines for the linguistic and clinical annotations

Aim 2: Develop and evaluate a manual annotation methodology that is efficient and accurate, then apply the methodology to annotate a set of publicly available clinical texts

  • Establish an infrastructure for collecting annotations of clinical text
  • Develop an efficient methodology for acquiring accurate annotations
  • Annotate and evaluate the final annotation set

Aim 3: Develop a publicly available toolkit for automatically annotating clinical text and conduct a shared evaluation of the toolkit, using evaluation metrics that are multidimensional and flexible

  • Build a starter NLP toolkit using the Mayo NLP System
  • Design evaluation metrics for comparing automated annotations against the annotated corpus. Apply standard evaluation methods and develop new metrics that address the complexities of evaluating textual judgments, such as the absence of a true gold standard and the need to compare frame-based annotations
  • Organize a multi-track shared evaluation of clinical NLP systems
  • Dissemination plan

Funding

The project described here is supported by Grant Number R01GM090187 from the National Institute of General Medical Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences. The project period is September 2010 to June 2014.

Who We Are