Main Page

From HealthNLP-Cancer

Revision as of 14:20, 26 July 2017 by Guergana

1 Public Site
2 Welcome to the Cancer Deep Phenotype Extraction (DeepPhe) project
3 Who We Are
4 Funding
5 Publications and presentations crediting DeepPhe
6 DeepPhe Software
7 DeepPhe Gold Set
8 Qualitative Interviews
9 Project materials/ WIKIs to tasks
10 Communication
11 Scrum Sprints
12 Meeting Notes
13 Licensing
14 Contact
15 Getting started

Public Site

Please visit our Cancer Deep Phenotype (DeepPhe) public site at http://deepphe.healthnlp.org.

Welcome to the Cancer Deep Phenotype Extraction (DeepPhe) project

Welcome to the Cancer Deep Phenotype Extraction (DeepPhe) project.

Cancer is a genomic disease with enormous heterogeneity in its behavior. In the past, our methods for categorization, prediction of outcome, and treatment selection have relied largely on a morphologic classification of cancer. New technologies, however, are fundamentally reframing our views of cancer initiation, progression, metastasis, and response to treatment, moving us towards a molecular classification of cancer. This transformation depends not only on our ability to deeply investigate the cancer genome, but also on our ability to link specific molecular changes to specific tumor behaviors. As sequencing costs continue to decline faster than Moore’s law, a torrent of cancer genomic data is looming. However, our ability to investigate the cancer genome is outpacing our ability to correlate its changes with the phenotypes they produce. Translational investigators seeking to associate specific genetic, epigenetic, and systems changes with particular tumor behaviors lack access to detailed observable traits of the cancer (the so-called ‘deep phenotype’), and this lack of access has become a major barrier to research.

We propose the advanced development and extension of a software platform for performing deep phenotype extraction directly from the medical records of patients with cancer, with the goal of enabling translational cancer research and precision medicine. The work builds on previous informatics research and software development efforts by groups at Boston Children’s Hospital and the University of Pittsburgh, both individually and together. Multiple software projects developed by our groups (some initially funded by NCI) that have already passed the initial prototyping and pilot development phase (eMERGE, THYME, TIES, ODIE, Apache cTAKES) will be combined and extended to produce an advanced software platform for accelerating cancer research. Previous work in a number of NIH-funded translational science initiatives (e.g., Electronic Medical Records and Genomics (eMERGE), Pharmacogenomics Research Network (PGRN), SHARPn, i2b2) has already demonstrated the benefits of these methodologies. However, to date these initiatives have focused exclusively on selected non-cancer phenotypes, with the goal of dichotomizing patients for a particular phenotype of interest (for example, Type II Diabetes, Rheumatoid Arthritis, or Multiple Sclerosis). In contrast, our proposed work focuses on extracting and representing multiple phenotype features for individual patients to build a cancer phenotype model that relates observable traits over time for each patient.

Our first four specific aims, the development aims, significantly extend the capability of our current software, focusing on challenging problems in biomedical information extraction. These aims support the development and evaluation of novel methods for cancer deep phenotype extraction:

Specific Aim 1: Develop methods for extracting phenotypic profiles. Extract patients’ deep phenotypes and their attributes, such as general modifiers (negation, uncertainty, subject) and cancer-specific characteristics (e.g., grade, invasion, lymph node involvement, metastasis, size, stage).

Specific Aim 2: Extract gene/protein mentions and their variants from the clinical narrative.

Specific Aim 3: Create a longitudinal representation of the disease process and its resolution. Link phenotypes, treatments, and outcomes in temporal associations to create a longitudinal abstraction of the disease.

Specific Aim 4: Extract discourse containing explanations, speculations, and hypotheses to support explorations of causality.

Our last two specific aims, the implementation aims, focus on the design of the software to support the cancer research community, ensuring the usability and utility of our software. These aims support the design, dissemination, and sharing of the products of this work to maximize its impact on cancer research:

Specific Aim 5: Design and implement a computational platform for deep phenotype discovery and analytics for translational investigators, including integrative visual analytics.

Specific Aim 6: Advance translational research in the driving cancer biology research projects in breast cancer, ovarian cancer, and melanoma. Include the research community throughout the design and evaluation of the platform. Disseminate the software freely.

Impact: The proposed work will produce novel methods for extracting detailed phenotype information directly from the electronic medical record (EMR), the major source of such data for patients with cancer. The extracted phenotypes will be used in three ongoing translational studies with a precision medicine focus. Dissemination of the software will enhance the ability of cancer researchers to abstract meaningful clinical data for translational research. If successful, systematic capture and representation of these phenotypes from EMR data could later be used to drive clinical genomic decision support.

Who We Are

Boston Children’s Hospital/Harvard Medical School
- Guergana Savova (MPI)
- Dmitriy Dligach
- Timothy Miller
- Sean Finan
- David Harris
- Chen Lin

University of Pittsburgh
- Rebecca Crowley Jacobson (MPI)
- Harry Hochheiser
- Roger Day
- Adrian Lee
- Robert Edwards
- John Kirkwood
- Kevin Mitchell
- Eugene Tseytlin
- Girish Chavan
- Liz Legowski (through Jan 2015)
- Melissa Castine

Funding

The project described here is supported by Grant Number 1U24CA184407-01 from the National Cancer Institute at the US National Institutes of Health. This work is part of the NCI's Informatics Technology for Cancer Research (ITCR) Initiative (http://itcr.nci.nih.gov/). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

The project period is May 2014 to April 2019.

Publications and presentations crediting DeepPhe

Hochheiser H; Jacobson R; Washington N; Denny J; Savova G. 2015. Natural language processing for phenotype extraction: challenges and representation. AMIA Annual Symposium, San Francisco, CA, Nov 2015.
Dligach D; Miller T; Savova GK. 2015. Semi-supervised learning for phenotyping tasks. AMIA Annual Symposium, San Francisco, CA, Nov 2015.
Lin C; Dligach D; Miller T; Bethard S; Savova G. 2015. Layered temporal modeling for the clinical domain. Journal of the American Medical Informatics Association. http://jamia.oxfordjournals.org/content/early/2015/10/31/jamia.ocv113
Lin C; Miller T; Dligach D; Bethard S; Savova G. 2016. Improving temporal relation extraction with training instance augmentation. BioNLP Workshop at the Association for Computational Linguistics conference, Berlin, Germany, Aug 2016.
Miller TA; Finan S; Dligach D; Savova G. 2015. Robust sentence segmentation for clinical text. Abstract presented at the AMIA Annual Symposium, San Francisco, CA, 2015.
Hochheiser H; Castine M; Harris D; Savova G; Jacobson R. 2016. An information model for cancer phenotypes. BMC Medical Informatics and Decision Making. https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-016-0358-4
Hartzell E; Lin C. 2016. Enhancing clinical temporal relation discovery with syntactic embeddings from GloVe. International Conference on Intelligent Biology and Medicine (ICIBM 2016), Houston, TX, Dec 2016.
Dligach D; Miller T; Lin C; Bethard S; Savova G. 2017. Neural temporal relation extraction. European Chapter of the Association for Computational Linguistics (EACL 2017), Valencia, Spain, April 3-7, 2017.
Lin C; Miller T; Dligach D; Bethard S; Savova G. 2017. Representations of time expressions for temporal relation extraction with convolutional neural networks. BioNLP Workshop at the Association for Computational Linguistics conference, Vancouver, Canada, Aug 4, 2017.
Miller TA; Dligach D; Lin C; Bethard S; Savova G. 2016. Feature portability in cross-domain clinical coreference. Abstract presented at the AMIA Annual Symposium, Chicago, IL, 2016.
Miller TA; Bethard S; Amiri H; Savova G. 2017. Unsupervised domain adaptation for clinical negation detection. Proceedings of the 16th Workshop on Biomedical Natural Language Processing, 2017.
Miller TA; Dligach D; Bethard S; Lin C; Savova G. 2017. Towards generalizable entity-centric coreference resolution. Journal of Biomedical Informatics, 69:251-258.

DeepPhe Software

The DeepPhe system will be available as part of Apache cTAKES at http://ctakes.apache.org/. It is also available at https://github.com/DeepPhe/DeepPhe.

DeepPhe software components will also be deployed in the TIES Software System (http://ties.pitt.edu/), which supports sharing of and access to deidentified, NLP-processed data linked to tissue, and which is deployed as part of the TIES Cancer Research Network (TCRN) across multiple US cancer centers.

DeepPhe software development will be coordinated according to the project's software development policies.

DeepPhe software documentation for developers is available in

DeepPhe Gold Set

Process for Deidentification of Source Documents.
Process for Selection of Gold Set Source Documents.
DeepPhe UPMC Training/Development/Test splits
- training set:
  - all documents for Breast Cancer patients 03, 11, 92, 93 for a total of 48 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev)
  - all development documents for Melanoma patients 05, 06, 18, 19, 25, 28, 30, 33, 34, 42, for a total of 233 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma)
- development set:
  - all documents for Breast Cancer patients 02, 21 for a total of 42 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev)
  - all development documents for Melanoma patients 07, 32, 43 for a total of 215 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma)
- test set:
  - all documents for Breast Cancer patients 01, 16 for a total of 41 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev); for phenotype-level testing, use \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedTest\
  - all testing documents for Melanoma patients 02, 03, 11, 12, 14, 16, 24, 27, 41, 44 for a total of 229 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma)
- Use the training set for developing the algorithms and the development set for reporting results and error analysis. The test set is reserved for the final evaluation reported in publications.
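The split rules above imply one property worth checking mechanically: within each cancer type, no patient should appear in more than one of the training, development, and test sets. The following is a minimal Python sketch of such a check, assuming patient numbers are local to each dataset (breast patient 02 and melanoma patient 02 are treated as different patients). The names `Split`, `UPMC_SPLITS`, `check_disjoint`, and `document_totals` are hypothetical and not part of the DeepPhe code base; the sketch encodes only the IDs and counts listed above and does not read the BCH share.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Split:
    """One partition of a corpus: its patient IDs and expected document count."""
    patients: FrozenSet[str]
    n_documents: int


# Hypothetical in-code copy of the UPMC splits listed above. Patient IDs and
# document counts come from this page; the data structure itself is illustrative.
UPMC_SPLITS: Dict[str, Dict[str, Split]] = {
    "breast": {
        "train": Split(frozenset({"03", "11", "92", "93"}), 48),
        "dev": Split(frozenset({"02", "21"}), 42),
        "test": Split(frozenset({"01", "16"}), 41),
    },
    "melanoma": {
        "train": Split(frozenset({"05", "06", "18", "19", "25",
                                  "28", "30", "33", "34", "42"}), 233),
        "dev": Split(frozenset({"07", "32", "43"}), 215),
        "test": Split(frozenset({"02", "03", "11", "12", "14",
                                 "16", "24", "27", "41", "44"}), 229),
    },
}


def check_disjoint(splits: Dict[str, Dict[str, Split]]) -> List[Tuple[str, str, str, List[str]]]:
    """List every (cancer type, split A, split B, shared IDs) patient leak.

    Patient IDs are compared only within one cancer type, assuming the
    numbering is local to each dataset.
    """
    leaks = []
    for cancer, parts in splits.items():
        names = sorted(parts)
        # Compare each unordered pair of splits once.
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                shared = parts[a].patients & parts[b].patients
                if shared:
                    leaks.append((cancer, a, b, sorted(shared)))
    return leaks


def document_totals(splits: Dict[str, Dict[str, Split]]) -> Dict[str, int]:
    """Sum the expected document counts per cancer type across all splits."""
    return {cancer: sum(s.n_documents for s in parts.values())
            for cancer, parts in splits.items()}


if __name__ == "__main__":
    print(check_disjoint(UPMC_SPLITS))   # -> [] (no patient is in two splits)
    print(document_totals(UPMC_SPLITS))  # -> {'breast': 131, 'melanoma': 677}
```

A non-empty result from `check_disjoint` names the cancer type, the two splits, and the shared patient IDs. Keeping the split as data rather than only as directory conventions makes such leaks easy to catch, and the per-type totals can be compared against the file counts in the corresponding directories.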
SEER Project Train/Dev/Test Splits
Clinical Genomics Gold Set

Qualitative Interviews

Project materials/ WIKIs to tasks

Liquid Planner link (project management): https://app.liquidplanner.com/space/26220/dashboard
Templates for describing stakeholders.
Software development policies and repositories.
Data Repository and Policies.
Adopted Standards and Conventions for NLP annotations (task 1.4.2)
Gold Set Selection
Entity Mention and Template Evaluation Statistics
Phenotype Evaluation Statistics
Modeling
- Phenotyping Rules
- Breast Cancer Model
- Melanoma Model
- Ovarian Cancer Model
- Cancer phenotype modeling notes
- Layered cancer phenotyping
  - Episode modeling
- FHIR modeling
- Domain Modeling Notes/Questions
  - Breast Cancer Domain Notes/Questions
- Validation of models with domain experts
- Competency questions to be used for validation of models.
- Representations of the models.
- Historical pages
  - CEM Cancer phenotype models: models describing the original CEM Models
- Value decomposition issues https://docs.google.com/document/d/1riAHoLRdEmp4Ah9Z8NXN-ABkcAW9nnfNXQ5_md5rgYs/edit

Presentations

How to effectively use LiquidPlanner for DeepPhe: https://www.dropbox.com/s/1f6nkhx3yxh4v9q/LiquidPlanner%20for%20Deep-Phe.pptx
DeepPhe Rule Driven Architectures: https://www.dropbox.com/s/hl70zkvjs1ftt5a/DeepPhe%20Rule%20Driven%20Architectures.pptx

Communication

Bi-weekly team meetings
Tools we use for communication are listed in our Communications Plan.

Scrum Sprints

Meeting Notes

August 27, 2015 Research Meeting
August 3, 2015 Modeling Meeting
July 20, 2015 Modeling Meeting
July 7, 2015 Bi-weekly team meeting
July 1, 2015 Scrum Sprint - 1
June 26, 2015 Software architecture meeting
June 23, 2015 Bi-weekly team meeting
June 9, 2015 Bi-weekly team meeting
May 12, 2015 Team meeting: DeepPhe demo
May 5, 2015 Team meeting: DeepPhe demo
April 28, 2015 Bi-weekly team meeting
April 13, 2015 Bi-weekly team meeting
March 17, 2015 Bi-weekly team meeting
February 23, 2015 Model prioritization meeting
February 17, 2015 Bi-weekly team meeting
February 3, 2015 Bi-weekly team meeting
January 28, 2015 BCH team meeting
January 20, 2015 Bi-weekly team meeting
January 6, 2015 Bi-weekly team meeting
December 9, 2014 BCH team meeting
December 9, 2014 Bi-weekly team meeting
November 20, 2014 BCH team meeting
November 11, 2014 Bi-weekly team meeting
November 11, 2014 BCH team meeting
November 4, 2014 BCH team meeting
November 3, 2014 PI meeting
October 27, 2014 Bi-weekly team meeting: Avillach's presentation on tranSMART, cTAKES and PCORI
October 14, 2014 Bi-weekly team meeting: agenda and notes
September 30, 2014 Bi-weekly team meeting: agenda and notes
September 2, 2014 Bi-weekly team meeting: agenda and notes
August 19, 2014 Bi-weekly team meeting: agenda and notes
August 5, 2014 Bi-weekly team meeting: agenda and notes
July 22, 2014 Bi-weekly team meeting: agenda and notes
July 15, 2014 Bi-weekly team meeting: agenda and notes
July 10, 2014 Hochheiser visit to Savova group
June 24, 2014 Bi-weekly team meeting: agenda and notes
June 10, 2014 Bi-weekly team meeting: agenda and notes
June 3, 2014 All hands kick-off meeting
May 8, 2014 NCIP collaboration with UT (Bernstam/Xu)

Licensing

Licensing policies for DeepPhe software and ontological models.

Contact

If you need assistance or have further questions about the project, feel free to e-mail Guergana Savova at Guergana.Savova@childrens.harvard.edu or Rebecca Crowley Jacobson at rebeccaj@pitt.edu.

Getting started

Consult the User's Guide for information on using the wiki software.

Retrieved from "https://healthnlp.hms.harvard.edu/cancer/wiki/index.php?title=Main_Page&oldid=2553"
