Latest revision as of 12:23, 2 December 2015

Welcome to the ShARe Project

Welcome to the Shared Annotated Resources (ShARe) project.

Much of the clinical information required for accurate clinical research, active decision support, and broad-coverage surveillance is locked in the text files of an electronic medical record (EMR). The only feasible way to leverage this information for translational science is to extract and encode it using natural language processing (NLP). Over the last two decades, several research groups have developed NLP tools for clinical notes, but a major bottleneck preventing progress in clinical NLP is the lack of standard, annotated data sets for training and evaluating NLP applications. Without such standards, individual NLP applications abound without the ability to train different algorithms on standard annotations, share and integrate NLP modules, or compare performance. We propose to develop standards and infrastructure that enable technology to extract scientific information from textual medical records, and we propose this research as a collaborative effort involving NLP experts across the U.S.

To accomplish this goal, we will address three specific aims, each with a set of sub-aims:

Aim 1: Extend existing standards and develop a new consensus annotation schema for annotating clinical text in a way that is interoperable, extensible and usable

  • Develop annotation schemas for the linguistic and clinical annotations
  • Determine the reliance on clinical terminologies and ontological knowledge
  • Develop annotation guidelines for the linguistic and clinical annotations
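To make the idea of a consensus schema concrete, a frame-based clinical annotation can be sketched as a small data structure: a (possibly discontiguous) text mention, a normalized concept code, and modifier slots. The slot names and the concept identifier below are purely illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Span:
    start: int  # character offset where the mention begins
    end: int    # character offset just past the end of the mention

@dataclass
class DisorderFrame:
    """Illustrative frame: mention spans plus a normalized concept
    code and modifier slots (names here are hypothetical)."""
    spans: List[Span]                # possibly discontiguous mention spans
    cui: str                         # e.g. a UMLS-style concept identifier
    negation: str = "no"             # modifier slot: is the disorder negated?
    severity: Optional[str] = None   # modifier slot: severity, if stated
    body_site: Optional[str] = None  # modifier slot: anatomical site

# Example: a mention normalized to an illustrative concept code
frame = DisorderFrame(
    spans=[Span(10, 22)],
    cui="C0020538",  # illustrative code; real guidelines govern normalization
    severity="moderate",
)
```

Representing each annotation as an explicit frame of typed slots is what makes the schema extensible: new modifiers can be added without disturbing existing annotations.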

Aim 2: Develop and evaluate a manual annotation methodology that is efficient and accurate, then apply the methodology to annotate a set of publicly available clinical texts

  • Establish an infrastructure for collecting annotations of clinical text
  • Develop an efficient methodology for acquiring accurate annotations
  • Annotate and evaluate the final annotation set

Aim 3: Develop a publicly available toolkit for automatically annotating clinical text and perform a shared evaluation to evaluate the toolkit, using evaluation metrics that are multidimensional and flexible

  • Incorporate modules in Apache cTAKES using the Mayo NLP System
  • Design evaluation metrics for comparing automated annotations against the annotated corpus. Apply standard evaluation methods and develop new evaluation metrics for addressing complexities in evaluation from textual judgments, including no true gold standard and ways to compare frame-based annotations
  • Organize a multi-track shared evaluation of clinical NLP systems
  • Dissemination plan
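The metric-design sub-aim above can be illustrated with a minimal sketch of span-level evaluation, assuming annotations are simple (start, end) character-offset pairs. This is not the project's actual metric suite: "strict" matching requires exact offsets, while "relaxed" matching counts any character overlap, one common way to soften comparisons when there is no true gold standard.

```python
# Sketch of span-level precision/recall/F1 against a reference standard.
# Spans are half-open (start, end) character-offset pairs; this is an
# illustrative assumption, not the ShARe evaluation implementation.

def overlaps(a, b):
    """True if half-open spans a and b share at least one character."""
    return a[0] < b[1] and b[0] < a[1]

def evaluate(system, reference, strict=True):
    if strict:
        tp = len(set(system) & set(reference))   # exact-offset matches only
    else:
        tp = sum(1 for s in system               # any overlap counts
                 if any(overlaps(s, r) for r in reference))
    precision = tp / len(system) if system else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

reference = [(0, 5), (10, 18)]
system = [(0, 5), (11, 16), (30, 34)]
print(evaluate(system, reference, strict=True))   # strict: P=1/3, R=1/2, F1=0.4
print(evaluate(system, reference, strict=False))  # relaxed: P=2/3, R=1, F1=0.8
```

Reporting both strict and relaxed scores is one way to make an evaluation multidimensional: the gap between the two quantifies how much disagreement is about exact boundaries rather than about whether a mention was found at all.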

Funding

The project described is supported by Grant Number R01GM090187 from the National Institute of General Medical Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences. The project period is September 2010 to June 2014.

Who We Are

Columbia University

  • Noémie Elhadad (PI): http://people.dbmi.columbia.edu/noemie
  • Amy Vogel
  • Sharon Lipsky-Gorman

Boston Children's Hospital/Harvard Medical School

  • Guergana Savova (PI): http://www.chip.org/guergana-savova
  • Sameer Pradhan: http://chip.org/sameer-pradhan
  • David Harris
  • Glenn Zaramba
  • Dmitriy Dligach

University of Utah

  • Wendy Chapman (PI): http://medicine.utah.edu/faculty/mddetail.php?facultyID=u0073209
  • Brett South

University of Colorado

  • Martha Palmer
  • Tim O'Gorman
  • Philip Ogren

University of California San Diego

  • Danielle Mowery
  • Sumithra Velupillai

In collaboration with

  • John Pestian (Cincinnati Children's Hospital Medical Center)
  • James Pustejovsky (Brandeis University)
  • Mark Mandel (University of Pennsylvania)
  • Stephane Meystre (University of Utah)


Advisory Board

  • Michael Becich
  • Christopher Chute
  • Carol Friedman (Columbia University)
  • George Hripcsak (Columbia University)
  • Lawrence Hunter
  • Isaac Kohane (Boston Children's Hospital/Harvard Medical School)
  • Lynette Hirschman
  • Martha Palmer (University of Colorado)

Publications and Presentations Crediting ShARe

2014

  • Meystre, Stephane; Boonsirisumpun, Narong; Elhadad, Noemie; Savova, Guergana; Chapman, Wendy. 2014. Poster: Standards-based data model for clinical documents and information in the Shared Annotated Resources (ShARe) project. AMIA Summit on Clinical Research Informatics, San Francisco, CA.
  • Mowery, Danielle L; Franc, Daniel; Ashfaq, Shazia; Zamora, Tania; Cheng, Eric; Chapman, Wendy W; Chapman, Brian E. (2014). Developing a Knowledge Base for Detecting Carotid Stenosis with pyConText. AMIA Symp Proc.
  • Pradhan, Sameer; Elhadad, Noemie; South, Brett; Martinez, David; Christensen, Lee; Vogel, Amy; Suominen, Hanna; Chapman, Wendy; Savova, Guergana. (2014). Evaluating the state of the art in disorder recognition and normalization of the clinical narrative. J Am Med Inform Assoc. http://jamia.bmj.com/content/early/2014/08/21/amiajnl-2013-002544.abstract
  • Savova, Guergana; Pradhan, Sameer; Palmer, Martha; Styler, Will; Chapman, Wendy; Elhadad, Noemie. (in press). Annotating the clinical text - MiPACQ, ShARe, SHARPn and THYME corpora. In Handbook of Linguistic Annotations. Ed. James Pustejovsky and Nancy Ide. Springer.
  • South, Brett R; Mowery, Danielle L; Suo, Ying; Ferrández, Oscar; Meystre, Stephane M; Chapman, Wendy W. (2014). Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text. J Biomed Inform. Special Issue: Medical Privacy. http://www.ncbi.nlm.nih.gov/pubmed/24859155
  • South, Brett R; Mowery, Danielle L; Leng, Jianwei; Meystre, Stephane M; Chapman, Wendy W. (2014). A System Usability Study Assessing a Machine-Assisted Interactive Interface to Support Annotation of Protected Health Information in Clinical Texts. AMIA Symp Proc.
  • Velupillai, Sumithra; Mowery, Danielle L; Christensen, Lee; Elhadad, Noemie; Pradhan, Sameer; Savova, Guergana; Chapman, Wendy W. Disease/Disorder Semantic Template Filling – Information Extraction Challenge in the ShARe/CLEF eHealth Evaluation Lab 2014. AMIA Symp Proc. 2014

2013

  • Chapman, Wendy; Denny, Joshua; Haug, Peter; Meystre, Stephane; Patrick, Jon; Savova, Guergana; Solti, Imre; Uzuner, Ozlem; Xu, Hua. 2013. Panel: Natural language processing working group pre-symposium. AMIA Fall Annual Symposium.
  • Dligach, Dmitriy; Bethard, Steven; Becker, Lee; Miller, Timothy; Savova, Guergana. 2013. Discovering body site and severity modifiers in clinical texts. Journal of the American Medical Informatics Association. doi:10.1136/amiajnl-2013-001766. http://jamia.bmj.com/content/early/2013/10/03/amiajnl-2013-001766.full
  • Dligach, Dmitriy; Bethard, Steven; Becker, Lee; Miller, Timothy; Savova, Guergana. 2013. Discovering body site and severity modifiers in clinical texts. AMIA Fall Annual Symposium.
  • Mowery, Danielle L; South, Brett R; Murtola, Laura-Maria; Salanterä, Sanna; Martinez David; Suominen, Hanna; Elhadad, Noemie; Pradhan, Sameer; Savova Guergana; Chapman, Wendy W. Task 2: ShARe/CLEF eHealth evaluation lab 2013. CLEF Proc. Valencia, Spain. 2013. http://www.nicta.com.au/pub?doc=7265&filename=nicta_publication_7265.pdf
  • Mowery, Danielle L; South, Brett R; Leng, Jianwei; Murtola, Laura-Maria; Danielsson-Ojala, Rita; Salanterä, Sanna; Chapman, Wendy W. Creating a reference standard of acronym and abbreviation annotations for the ShARe/CLEF eHealth challenge 2013. AMIA Symp Proc. Washington, DC. 2013.
  • Pradhan, Sameer; Elhadad, Noemie; South, Brett; Martinez, David; Christensen, Lee; Vogel, Amy; Suominen, Hanna; Chapman, Wendy, and Savova, Guergana. 2013. Task 1: ShARe/CLEF eHealth Evaluation Lab 2013. Proceedings of the ShARE/CLEF Evaluation Lab 2013. http://www.nicta.com.au/pub?doc=7264&filename=nicta_publication_7264.pdf
  • Savova, Guergana; Chapman, Wendy; Elhadad, Noemie; Palmer, Martha. 2013. Panel: Shared resources, shared code, and shared activities in clinical natural language processing. AMIA Fall Annual Symposium.
  • Zhang, Shaodian and Elhadad, Noemie. 2013. Unsupervised Biomedical Named Entity Recognition: Experiments with Clinical and Biological Texts. Journal of Biomedical Informatics. 46(6): 1088-1098. http://www.ncbi.nlm.nih.gov/pubmed/23954592
  • Suominen, Hanna; Salanterä, Sanna; Velupillai, Sumithra; Chapman, Wendy W; Savova, Guergana; Elhadad, Noemie; Pradhan, Sameer; South, Brett R; Mowery, Danielle L; Leveling, Johannes; Kelly, Liadh; Goeuriot, Lorraine; Martinez, David; Zuccon, Guido. Overview of the ShARe/CLEF eHealth evaluation lab 2013. Springer LNCS. http://link.springer.com/chapter/10.1007%2F978-3-642-40802-1_24

2012

  • Savova, Guergana. 2012. Shared Annotated Resources for the Clinical Domain. Invited presentation at the Natural Language Processing Working Group Pre-Symposium – doctoral consortium and a data workshop. AMIA Fall Symposium, Nov. 2012, Chicago IL
  • Savova, Guergana; Chapman, Wendy; Elhadad, Noemie. 2012. Shared Annotated Resources for the Clinical Domain. Invited presentation at the Natural Language Processing (NLP) Annotation workshop collocated with the 2nd annual IEEE International Conference on Healthcare Informatics, Imaging and Systems Biology, Sept. 2012. San Diego, CA

Shared NLP Tasks

Getting Access to the ShARe Corpus and Gold Standard Annotations

The ShARe corpus consists of deidentified clinical free-text notes from the MIMIC II database, version 2.5 (mimic.physionet.org). Notes were authored in the ICU setting and note types include discharge summaries, ECG reports, echo reports, and radiology reports (for more information about the MIMIC II database, please see the MIMIC User Guide: http://mimic.physionet.org/UserGuide/UserGuide.pdf).

  1. Obtain a human subjects training certificate. If you do not have a certificate, you can take the CITI training course (http://www.citiprogram.org/Default.asp) or the NIH training course (http://phrp.nihtraining.com/users/login.php).
  2. Go to the Physionet site: http://physionet.org/mimic2/mimic2_access.shtml
  3. Click on the link for “creating a PhysioNetWorks account” (near middle of page) (http://physionet.org/pnw/login) and follow the instructions.
  4. Go to this site and accept the terms of the DUA: http://physionet.org/works/MIMICIIClinicalDatabase/access.shtml
  5. You will receive an email telling you to fill in your information on the DUA and email it back with your human subjects training certificate.