Main Page

Welcome to the ShARe Project

Welcome to the Shared Annotated Resources (ShARe) project.

Much of the clinical information required for accurate clinical research, active decision support, and broad-coverage surveillance is locked in text files in the electronic medical record (EMR). The only feasible way to leverage this information for translational science is to extract and encode it using natural language processing (NLP). Over the last two decades, several research groups have developed NLP tools for clinical notes, but a major bottleneck preventing progress in clinical NLP is the lack of standard, annotated data sets for training and evaluating NLP applications. Without such standards, individual NLP applications proliferate, but there is no way to train different algorithms on the same annotations, share and integrate NLP modules, or compare performance across systems. We propose to develop standards and infrastructure that enable technology to extract scientific information from textual medical records, and we propose to carry out this research as a collaborative effort involving NLP experts across the U.S.

To accomplish this goal, we will address three specific aims, each with a set of sub-aims:

Aim 1: Extend existing standards and develop a new consensus annotation schema for annotating clinical text in a way that is interoperable, extensible and usable

Develop annotation schemas for the linguistic and clinical annotations

Determine the reliance on clinical terminologies and ontological knowledge

Develop annotation guidelines for the linguistic and clinical annotations

Aim 2: Develop and evaluate a manual annotation methodology that is efficient and accurate, then apply the methodology to annotate a set of publicly available clinical texts

Establish an infrastructure for collecting annotations of clinical text

Develop an efficient methodology for acquiring accurate annotations

Annotate and evaluate the final annotation set

Aim 3: Develop a publicly available toolkit for automatically annotating clinical text and perform a shared evaluation of the toolkit, using evaluation metrics that are multidimensional and flexible

Incorporate modules in Apache cTAKES using the Mayo NLP System

Design evaluation metrics for comparing automated annotations against the annotated corpus. Apply standard evaluation methods and develop new evaluation metrics that address the complexities of evaluating textual judgments, including the absence of a true gold standard and the need to compare frame-based annotations

Organize a multi-track shared evaluation of clinical NLP systems

Dissemination plan

Funding

The project described is supported by Grant Number R01GM090187 from the National Institute of General Medical Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences. The project period is September 2010 to June 2014.

Who We Are

Columbia University

Noémie Elhadad (PI) [1]
Amy Vogel
Sharon Lipsky-Gorman

Boston Children's Hospital/Harvard Medical School

Guergana Savova (PI) [2]
Sameer Pradhan [3]
David Harris
Glenn Zaramba
Dmitriy Dligach

University of Utah

Wendy Chapman (PI) [4]
Brett South

University of Colorado

Martha Palmer
Tim O'Gorman
Philip Ogren

University of California San Diego

Danielle Mowery
Sumithra Velupillai

In collaboration with

John Pestian (Cincinnati Children's Hospital Medical Center)
James Pustejovsky (Brandeis University)
Mark Mandel (University of Pennsylvania)
Stephane Meystre (University of Utah)

Advisory Board

Michael Becich
Christopher Chute
Carol Friedman (Columbia University)
George Hripcsak (Columbia University)
Lawrence Hunter
Isaac Kohane (Boston Children's Hospital/Harvard Medical School)
Lynette Hirschman
Martha Palmer (University of Colorado)

Publications and Presentations Crediting ShARe

2014

Meystre, Stephane; Boonsirisumpun, Narong; Elhadad, Noemie; Savova, Guergana; Chapman, Wendy. 2014. Poster: Standards-based data model for clinical documents and information in the Shared Annotated Resources (ShARe) project. AMIA Summit on Clinical Research Informatics, San Francisco, CA.
Mowery, Danielle L; Franc, Daniel; Ashfaq, Shazia; Zamora, Tania; Cheng, Eric; Chapman, Wendy W; Chapman, Brian E. (2014). Developing a Knowledge Base for Detecting Carotid Stenosis with pyConText. AMIA Symp Proc.
Pradhan, Sameer; Elhadad, Noemie; South, Brett; Martinez, David; Christensen, Lee; Vogel, Amy; Suominen, Hanna; Chapman, Wendy; Savova, Guergana. (2014). Evaluating the state of the art in disorder recognition and normalization of the clinical narrative. J Am Med Inform Assoc. [5]
Savova, Guergana; Pradhan, Sameer; Palmer, Martha; Styler, Will; Chapman, Wendy; Elhadad, Noemie. (in press). Annotating the clinical text - MiPACQ, ShARe, SHARPn and THYME corpora. In Handbook of Linguistic Annotations. Ed. James Pustejovsky and Nancy Ide. Springer.
South, Brett R; Mowery, Danielle L; Suo, Ying; Ferrández, Oscar; Meystre, Stephane M; Chapman, Wendy W. (2014). Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text. J Biomed Inform. Special Issue: Medical Privacy. [6]
South, Brett R; Mowery, Danielle L; Leng, Jianwei; Meystre, Stephane M; Chapman, Wendy W. (2014). A System Usability Study Assessing a Machine-Assisted Interactive Interface to Support Annotation of Protected Health Information in Clinical Texts. AMIA Symp Proc.
Velupillai, Sumithra; Mowery, Danielle L; Christensen, Lee; Elhadad, Noemie; Pradhan, Sameer; Savova, Guergana; Chapman, Wendy W. (2014). Disease/Disorder Semantic Template Filling – Information Extraction Challenge in the ShARe/CLEF eHealth Evaluation Lab 2014. AMIA Symp Proc.

2013

Chapman, Wendy; Denny, Joshua; Haug, Peter; Meystre, Stephane; Patrick, Jon; Savova, Guergana; Solti, Imre; Uzuner, Ozlem; Xu, Hua. 2013. Panel: Natural language processing working group pre-symposium. AMIA Fall Annual Symposium.
Dligach, Dmitriy; Bethard, Steven; Becker, Lee; Miller, Timothy; Savova, Guergana. 2013. Discovering body site and severity modifiers in clinical texts. Journal of the American Medical Informatics Association. doi:10.1136/amiajnl-2013-001766. [7]
Dligach, Dmitriy; Bethard, Steven; Becker, Lee; Miller, Timothy; Savova, Guergana. 2013. Discovering body site and severity modifiers in clinical texts. AMIA Fall Annual Symposium.
Mowery, Danielle L; South, Brett R; Murtola, Laura-Maria; Salanterä, Sanna; Martinez, David; Suominen, Hanna; Elhadad, Noemie; Pradhan, Sameer; Savova, Guergana; Chapman, Wendy W. (2013). Task 2: ShARe/CLEF eHealth evaluation lab 2013. CLEF Proc. Valencia, Spain. [8]
Mowery, Danielle L; South, Brett R; Leng, Jianwei; Murtola, Laura-Maria; Danielsson-Ojala, Rita; Salanterä, Sanna; Chapman, Wendy W. (2013). Creating a reference standard of acronym and abbreviation annotations for the ShARe/CLEF eHealth challenge 2013. AMIA Symp Proc. Washington, DC.
Pradhan, Sameer; Elhadad, Noemie; South, Brett; Martinez, David; Christensen, Lee; Vogel, Amy; Suominen, Hanna; Chapman, Wendy; Savova, Guergana. 2013. Task 1: ShARe/CLEF eHealth Evaluation Lab 2013. Proceedings of the ShARe/CLEF Evaluation Lab 2013. [9]
Savova, Guergana; Chapman, Wendy; Elhadad, Noemie; Palmer, Martha. 2013. Panel: Shared resources, shared code, and shared activities in clinical natural language processing. AMIA Fall Annual Symposium.
Zhang, Shaodian; Elhadad, Noemie. 2013. Unsupervised Biomedical Named Entity Recognition: Experiments with Clinical and Biological Texts. Journal of Biomedical Informatics. 46(6): 1088-1098. [10]
Suominen, Hanna; Salanterä, Sanna; Velupillai, Sumithra; Chapman, Wendy W; Savova, Guergana; Elhadad, Noemie; Pradhan, Sameer; South, Brett R; Mowery, Danielle L; Leveling, Johannes; Kelly, Liadh; Goeuriot, Lorraine; Martinez, David; Zuccon, Guido. (2013). Overview of the ShARe/CLEF eHealth evaluation lab 2013. Springer LNCS. [11]

2012

Savova, Guergana. 2012. Shared Annotated Resources for the Clinical Domain. Invited presentation at the Natural Language Processing Working Group Pre-Symposium – doctoral consortium and a data workshop. AMIA Fall Symposium, Nov. 2012, Chicago, IL
Savova, Guergana; Chapman, Wendy; Elhadad, Noemie. 2012. Shared Annotated Resources for the Clinical Domain. Invited presentation at the Natural Language Processing (NLP) Annotation workshop collocated with the 2nd annual IEEE International Conference on Healthcare Informatics, Imaging and Systems Biology, Sept. 2012. San Diego, CA

Shared NLP Tasks

ShARe/CLEF eHealth 2013: http://sites.google.com/site/shareclefehealth/
ShARe/CLEF eHealth 2014 (in collaboration with the THYME project): http://clefehealth2014.dcu.ie/task-2
SemEval 2014 Analysis of Clinical Text Task 7 (in collaboration with the THYME project): http://alt.qcri.org/semeval2014/task7/
SemEval 2015 Analysis of Clinical Text Task 14 (in collaboration with the THYME project): http://alt.qcri.org/semeval2015/task14/

Getting Access to the ShARe Corpus and Gold Standard Annotations

The ShARe corpus consists of de-identified clinical free-text notes from the MIMIC II database, version 2.5 (mimic.physionet.org). The notes were authored in the ICU setting; note types include discharge summaries, ECG reports, echo reports, and radiology reports (for more information about the MIMIC II database, please see the MIMIC User Guide: http://mimic.physionet.org/UserGuide/UserGuide.pdf).

Obtain a human subjects training certificate. If you do not have a certificate, you can take the CITI training course (http://www.citiprogram.org/Default.asp) or the NIH training course (http://phrp.nihtraining.com/users/login.php).
Go to the PhysioNet site: http://physionet.org/mimic2/mimic2_access.shtml
Click the link for “creating a PhysioNetWorks account” (near the middle of the page; http://physionet.org/pnw/login) and follow the instructions.
Go to this site and accept the terms of the data use agreement (DUA): http://physionet.org/works/MIMICIIClinicalDatabase/access.shtml
You will receive an email telling you to fill in your information on the DUA and email it back with your human subjects training certificate.
