Difference between revisions of "Main Page"

From HealthNLP-Cancer
(Scrum Sprints)
(131 intermediate revisions by 8 users not shown)
Line 1: Line 1:
 +
== Public Site ==
 +
'''Please visit our Cancer Deep Phenotype (DeepPhe) ''public site'' at [http://healthnlp.hms.harvard.edu/deepphe/wiki http://deepphe.healthnlp.org].'''
 +
 +
  
 
== Welcome to the Cancer Deep Phenotype Extraction (DeepPhe) project ==
Line 9: Line 13:
 
Our first four development specific aims significantly extend the capability of our current software, focusing on challenging problems in biomedical information extraction. These aims support the development and evaluation of novel methods for cancer deep phenotype extraction:
  
Specific Aim 1: Develop methods for extracting phenotypic profiles. Extract patients’ deep phenotypes and their attributes, such as general modifiers (negation, uncertainty, subject) and cancer-specific characteristics (e.g., grade, invasion, lymph node involvement, metastasis, size, stage).
+
'''Specific Aim 1''': Develop methods for extracting phenotypic profiles. Extract patients’ deep phenotypes and their attributes, such as general modifiers (negation, uncertainty, subject) and cancer-specific characteristics (e.g., grade, invasion, lymph node involvement, metastasis, size, stage).
  
Specific Aim 2: Extract gene/protein mentions and their variants from the clinical narrative
+
'''Specific Aim 2''': Extract gene/protein mentions and their variants from the clinical narrative
  
Specific Aim 3: Create longitudinal representation of disease process and its resolution. Link phenotypes, treatments and outcomes in temporal associations to create a longitudinal abstraction of the disease
+
'''Specific Aim 3''': Create longitudinal representation of disease process and its resolution. Link phenotypes, treatments and outcomes in temporal associations to create a longitudinal abstraction of the disease
  
Specific Aim 4: Extract discourses containing explanations, speculations, and hypotheses, to support explorations of causality
+
'''Specific Aim 4''': Extract discourses containing explanations, speculations, and hypotheses, to support explorations of causality
  
 
Our last two implementation specific aims focus on the design of the software to support the cancer research community, ensuring the usability and utility of our software. These aims support the design, dissemination and sharing of the products of this work to maximize impact on cancer research:
  
Specific Aim 5: Design and implement a computational platform for deep phenotype discovery and analytics for translational investigators, including integrative visual analytics.
+
'''Specific Aim 5''': Design and implement a computational platform for deep phenotype discovery and analytics for translational investigators, including integrative visual analytics.
  
Specific Aim 6: Advance translational research in driving cancer biology research projects in breast cancer, ovarian cancer, and melanoma. Include research community throughout the design of the platform and its evaluation. Disseminate freely available software.
+
'''Specific Aim 6''': Advance translational research in driving cancer biology research projects in breast cancer, ovarian cancer, and melanoma. Include research community throughout the design of the platform and its evaluation. Disseminate freely available software.
  
 
Impact: The proposed work will produce novel methods for extracting detailed phenotype information directly from the EMR, the major source of such data for patients with cancer. Extracted phenotypes will be used in three ongoing translational studies with a precision medicine focus. Dissemination of the software will enhance the ability of cancer researchers to abstract meaningful clinical data for translational research. If successful, systematic capture and representation of these phenotypes from EMR data could later be used to drive clinical genomic decision support.
 
 
  
 
== Who We Are ==
* Boston Children's Hospital/Harvard Medical School
** Guergana Savova (MPI)
+
** Guergana Savova (PI)
** Dmitriy Dligach
+
 
** Timothy Miller
** Sean Finan
** David Harris
** Chen Lin
 +
** James Masanz
 +
** past members -- Dmitriy Dligach (currently faculty at Loyola University, Chicago), Pei Chen
  
 
* University of Pittsburgh
** Rebecca Crowley Jacobson (MPI)
+
** Harry Hochheiser (site PI)
** Harry Hochheiser
+
** Olga Medvedeva
** Roger Day
+
** Mike Davis
** Adrian Lee
+
** Zhou Yuan
** Robert Edwards
+
** past members - through June 2017: Rebecca Crowley Jacobson (MPI), Roger Day, Adrian Lee, Robert Edwards, John Kirkwood, Kevin Mitchell, Eugene Tseytlin, Girish Chavan, Melissa Castine; Liz Legowski (through Jan 2015)
** John Kirkwood
+
 
** Kevin Mitchell
+
* Vanderbilt University
** Eugene Tseytlin
+
** Jeremy Warner (site PI)
** Girish Chavan
+
** Alicia Beeghly-Fadiel
** Liz Legowski (through Jan 2015)
+
 
** Melissa Castine
+
* Dana-Farber Cancer Institute
 +
** Elizabeth Buchbinder
  
 
== Funding ==
Line 57: Line 61:
  
 
== Publications and presentations crediting DeepPhe ==
* Hochheiser H; Jacobson R; Washington N; Denny J; Savova G. 2015. Natural language processing for phenotype extraction: challenges and representation. AMIA Annual Symposium. Nov 2015, San Francisco, CA.
+
# Hochheiser H; Jacobson R; Washington N; Denny J; Savova G. 2015. Natural language processing for phenotype extraction: challenges and representation. AMIA Annual Symposium. Nov 2015, San Francisco, CA.
* Dmitriy Dligach, Timothy Miller, Guergana K. Savova. 2015. Semi-supervised Learning for Phenotyping Tasks. AMIA Annual Symposium. Nov 2015, San Francisco, CA.
+
# Dmitriy Dligach, Timothy Miller, Guergana K. Savova. 2015. Semi-supervised Learning for Phenotyping Tasks. AMIA Annual Symposium. Nov 2015, San Francisco, CA.
* Chen, Lin; Dligach, Dmitriy; Miller, Timothy; Bethard, Steven; Savova, Guergana. 2015. Layered temporal modeling for the clinical domain. Journal of the American Medical Informatics Association. http://jamia.oxfordjournals.org/content/early/2015/10/31/jamia.ocv113
+
# Lin, Chen; Dligach, Dmitriy; Miller, Timothy; Bethard, Steven; Savova, Guergana. 2015. Layered temporal modeling for the clinical domain. Journal of the American Medical Informatics Association. http://jamia.oxfordjournals.org/content/early/2015/10/31/jamia.ocv113
 
+
# Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2016. Improving Temporal Relation Extraction with Training Instance Augmentation. BioNLP workshop at the Association for Computational Linguistics conference. Berlin, Germany, Aug 2016
 
+
# Timothy A. Miller, Sean Finan, Dmitriy Dligach, Guergana Savova. Robust Sentence Segmentation for Clinical Text. Abstract presented at the Annual Symposium of the American Medical Informatics Association, San Francisco, CA, 2015.
 +
# Hochheiser, Harry; Castine, Melissa; Harris, David; Savova, Guergana; Jacobson, Rebecca. 2016. An Information Model for Cancer Phenotypes. BMC Medical Informatics and Decision Making. https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-016-0358-4
 +
# Ethan Hartzell, Chen Lin. 2016. Enhancing Clinical Temporal Relation Discovery with Syntactic Embeddings from GloVe. International Conference on Intelligent Biology and Medicine (ICIBM 2016). December 2016, Houston, Texas, USA
 +
# Dligach, Dmitriy; Miller, Timothy; Lin, Chen; Bethard, Steven; Savova, Guergana. 2017. Neural temporal relation extraction. European Chapter of the Association for Computational Linguistics (EACL 2017). April 3-7, 2017. Valencia, Spain.
 +
# Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2017. Representations of Time Expressions for Temporal Relation Extraction with Convolutional Neural Networks. BioNLP workshop at the Association for Computational Linguistics conference. Vancouver, Canada, Friday August 4, 2017
 +
# Timothy A. Miller,  Dmitriy Dligach, Chen Lin, Steven Bethard, Guergana Savova. Feature Portability in Cross-domain Clinical Coreference. Abstract presented at the Annual Symposium of the American Medical Informatics Association, Chicago, IL, 2016.
 +
# Castro SM, Tseytlin E, Medvedeva O, Mitchell K, Visweswaran S, Bekhuis T, Jacobson RS. 2017. Automated annotation and classification of BI-RADS assessment from radiology reports. J Biomed Inform. 2017 May;69:177-187. doi: 10.1016/j.jbi.2017.04.011. PMID: 28428140; PMCID: PMC5706448 [Available on 2018-05-01]
 +
# Timothy A. Miller, Steven Bethard, Hadi Amiri, Guergana Savova. Unsupervised Domain Adaptation for Clinical Negation Detection. Proceedings of the 16th Workshop on Biomedical Natural Language Processing. 2017.
 +
# Timothy A. Miller, Dmitriy Dligach, Steven Bethard, Chen Lin, and Guergana Savova. Towards generalizable entity-centric coreference resolution. Journal of Biomedical Informatics, 69; 251-258. 2017.
 +
# Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2017. Representations of Time Expressions for Temporal Relation Extraction with Convolutional Neural Networks. BioNLP workshop at the Association for Computational Linguistics conference. Vancouver, Canada, Friday August 4, 2017
 +
# Miller, T; Bethard, S; Amiri, H; Savova, G. 2017. Unsupervised Domain Adaptation for Clinical Negation Detection. BioNLP workshop at the Association for Computational Linguistics conference. Vancouver, Canada, Friday August 4, 2017
 +
# Savova, G., Tseytlin, E., Finan, S., Castine, M., Miller, T., Medvedeva, O., Harris, D., Hochheiser, H., Lin, C., Chavan, G., Jacobson R. 2017. DeepPhe: A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Cancer Research 77(21), November 2017. DOI: 10.1158/0008-5472.CAN-17-0615.
 +
# Savova, G., Tseytlin, E., Finan, S., Castine, M., Miller, T., Medvedeva, O., Harris, D., Hochheiser, H., Lin, C., Chavan, G., Jacobson R. 2017. DeepPhe - A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Annual Symposium of the American Medical Informatics Association (AMIA). Nov 2017. Washington DC.
 +
# Savova, G; Miller, T. 2018. DeepPhe and Extraction of Oncology Patient Phenotypes from Unstructured Text Using NLP and Other AI Tools. Presentation to Dana-Farber Cancer Institute. January 24, 2018. Boston, MA.
 +
# Warner, Jeremy. 2018. Improving Cancer Diagnosis and Care: Patient Access to Oncologic Imaging and Pathology Expertise and Technologies. The National Cancer Policy Forum of the National Academies of Sciences, Engineering, and Medicine. http://www.nationalacademies.org/hmd/Activities/Disease/NCPF/2018-FEB-12/Videos/Session%204%20Videos/32%20Warner.aspx
  
 
== DeepPhe Software ==
Line 76: Line 94:
 
== DeepPhe Gold Set ==
* [[Deidentification_Process | Process for Deidentification of Source Documents]].
 +
* [[Public_Deidentification_Process | Process for Deidentification of Source Documents]].
 +
* [[Public:Deidentification_Process | Process for Deidentification of Source Documents]].
 
* [[Gold_Set_Selection | Process for Selection of Gold Set Source Documents]].
* Training/Development/Test splits
+
* DeepPhe UPMC Training/Development/Test splits
** training set: all documents for Breast Cancer patients 03, 11, 92, 93 for a total of 48 documents
+
** training set:  
** development set: all documents for Breast Cancer patients 02, 21 for a total of 42 documents
+
*** all documents for Breast Cancer patients 03, 11, 92, 93 for a total of 48 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev\DeepPhe Gold Phenotype Annotations_v2.xlsm
** test set: all documents for Breast Cancer patients 01, 16 for a total of 41 documents
+
*** all documents for Breast Cancer patients extended 04,05,06,09,10,12,13,14,18,19,20,22,23,26,27,30,31,32,33,34,35,40,41,42,43,38,39,46,47 for a total of 954 documents  (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev\DeepPhe Gold Phenotype Annotations_v2.xlsm
 +
*** all documents for Melanoma patients 05, 06, 18, 19, 25, 28, 30, 33, 34, 42, for a total of 233 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma\trainSet\DeepPhe DevSet Phenotype Annotations.xlsm
 +
*** all documents for Ovarian Cancer patients 3, 4, 7, 8, 12, 13, 16, 17, 18, 20, 24, 25, 26, 27, 30, 31, 32, 34, 37, 38, 41, 42, 43, 44, 46, 48 for a total of 1675 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\trainSet); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\trainSet\DeepPhe_ovCa_Train_Set_Phenotype_Annotations_GOLD.xlsm
 +
** development set:  
 +
*** all documents for Breast Cancer patients 02, 21 for a total of 42 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev\DeepPhe Gold Phenotype Annotations_v2.xlsm
 +
*** all documents for Breast Cancer patients extended 15,17,28,29,36,37,44,45,07,08,24,25 for a total of 416 documents  (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev\DeepPhe Gold Phenotype Annotations_v2.xlsm
 +
*** all documents for Melanoma patients 07, 32, 43 for a total of 215 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma\devSet); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma\devSet\DeepPhe DevSet Phenotype Annotations.xlsm
 +
*** all documents for Ovarian Cancer patients 9, 11, 19, 28, 29, 35, 39, 47 for a total of 562 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\devSet); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\devSet\DeepPhe_ovCa_Dev_Set_Phenotype_Annotations_GOLD.xlsm
 +
** test set:  
 +
*** all documents for Breast Cancer patients 01 (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedTest); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedTest\DeepPhe Test Phenotype Annotations v2.xlsm
 +
*** all documents for Breast Cancer extended for patients 01, 02, 63, 76, 100, 101, 104, 106, 109, 111, 114, 115, 117, 118, 119, 120, 121, 123, 125, 126, 129, 130, 132, 136, 137, 138, 142, 143, 155, 156, 158, 174, 181, 189, 197 for phenotyping level testing use (\\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedTest\); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedTest\DeepPhe Test Phenotype Annotations v2.xlsm
 +
*** all documents for Melanoma patients 02, 03, 11, 12, 14, 16, 24, 27, 41, 44 for a total of 229 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma\testSet); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma\testSet\DeepPhe TestSet Phenotype Annotations.xlsm
 +
*** all documents for Ovarian Cancer patients 15, 21, 33, 36, 40, 45, 49, 50 for a total of 559 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\testSet); gold annotations are in \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\testSet\DeepPhe_ovCa_Test_Set_Phenotype_Annotations_GOLD.xlsm
 
** use the training set for developing the algorithms and the development set to report results and error analysis. The test set will be used only for the final evaluation to go in publications.
 
+
* [[SEER_Project_Splits| SEER Project Train/Dev/Test Splits]]
 
+
* [[Clinical Genomics Gold Set | Clinical Genomics Gold Set ]]
  
 
== Qualitative Interviews ==
Line 100: Line 132:
 
* [[Adopted Standards and Conventions for NLP annotations | Adopted Standards and Conventions for NLP annotations (task 1.4.2)]]
* [[Gold_Set_Selection | Gold Set Selection]]
 +
* [[Entity mention and Template Evaluation Statistics | Entity Mention and Template Evaluation Statistics]]
 +
* [[Phenotype Evaluation Statistics | Phenotype Evaluation Statistics (with DeepPhe v1)]]
 +
* [[Phenotype Evaluation Statistics (with DeepPhe v2) | Phenotype Evaluation Statistics (with DeepPhe v2)]]
 
* Modeling
**[[Phenotyping Rules]]
 +
** [[Breast Cancer Model]]
 +
** [[Melanoma Model]]
 +
** [[Ovarian Cancer Model]]
 
** [[Cancer_phenotype_modeling_notes| Cancer phenotype modeling]] notes
** [[Layered cancer phenotyping]]
Line 132: Line 170:
 
* [[ Gold_standard_annotations| Gold standard annotations]]
* [[ Licensing| Licensing]]
* [[Year2_Goals| Year 2 goals]]
 
 
* [[Research_coreference| RESEARCH: Coreference]]
* [[Research_relations| RESEARCH: Relation extraction]]
Line 138: Line 175:
 
* [[Research_hci| RESEARCH: Human-Computer interaction]]
* [[Research_birads| RESEARCH: BiRADS]]
 
+
* [[Demo_June_2016| Demo in June 2016]]
 +
* [[SEER_Project_Tech_Req| SEER Project Technical Requirements]]
 +
* [[SEER_Project_Splits| SEER Project Train/Dev/Test Splits]]
 +
* [[Paper_ideas_2016| Paper Ideas 2016]]
 +
* [[Year2_Goals| Year 2 goals (May 2015-April 2016)]]
 +
* [[Year3_Goals| Year 3 goals and Publication Ideas (May 2016-April 2017)]]
 +
* [[Year4_Goals| Year 4 goals (May 2017-April 2018)]]
 +
* [[Year5_Goals| Year 5 goals (May 2018-April 2019)]]
  
  
Line 164: Line 208:
 
*[[ScrumSprint_7 | Sprint 7]]
*[[ScrumSprint_8 | Sprint 8]]
*[[ScrumSprint_9 | Sprint 9, Feb 9 - March 8, 2016]]
+
*[[ScrumSprint_9 | Sprint 9, Feb 9 - March 15, 2016]]
 +
*[[ScrumSprint_10 | Sprint 10, March 15 - April 12, 2016]]
 +
*[[ScrumSprint_11 | Sprint 11, April 13 - May 10, 2016]]
 +
*[[ScrumSprint_12 | Sprint 12, May 11 - June 7, 2016]]
 +
*[[ScrumSprint_13 | Sprint 13, June 26 - July 26, 2016]]
 +
*[[ScrumSprint_14 | Sprint 14, July 26 - August 30, 2016]]
 +
*[[ScrumSprint_15 | Sprint 15, August 31 - September 27, 2016]]
 +
*[[ScrumSprint_16 | Sprint 16, September 27 - October 25, 2016]]
 +
*[[ScrumSprint_17 | Sprint 17, October 25 -- November 29, 2016]]
 +
*[[ScrumSprint_18 | Sprint 18, November 30, 2016 -- January 3, 2017]]
 +
*[[ScrumSprint_19 | Sprint 19, January 3 -- January 31, 2017]]
 +
*[[ScrumSprint_20 | Sprint 20, February 1 - February 28, 2017]]
 +
*[[ScrumSprint_21 | Sprint 21, March 1 - April 4, 2017]]
 +
*[[ScrumSprint_22 | Sprint 22, April 5 - April 26, 2017]]
 +
*[[ScrumSprint_23 | Sprint 23, April 26 - May 24, 2017]]
 +
*[[ScrumSprint_24 | Sprint 24, June 6 - July 11, 2017]]
 +
*[[ScrumSprint_25 | Sprint 25, July 12 - Aug 16, 2017]]
 +
*[[ScrumSprint_26 | Sprint 26, Aug 17 - Sept 20, 2017]]
 +
*[[ScrumSprint_27 | Sprint 27, Sept 21 - Oct 18, 2017]]
 +
*[[ScrumSprint_28 | Sprint 28, Oct 19 - Nov 15, 2017]]
 +
*[[ScrumSprint_29 | Sprint 29, Nov 15 - Dec 13, 2017]]
 +
*[[ScrumSprint_30 | Sprint 30, Dec 13, 2017 - Jan 17, 2018]]
 +
*[[ScrumSprint_31 | Sprint 31, Jan 17 - Feb 14, 2018]]
 +
*[[ScrumSprint_32 | Sprint 32, Feb 15 - March 14, 2018]]
 +
*[[ScrumSprint_33 | Sprint 33, March 14 - April 11, 2018]]
 +
*[[ScrumSprint_34 | Sprint 34, April 12 - May 9, 2018]]
 +
*[[ScrumSprint_35 | Sprint 35, May 10 - June 6, 2018]]
 +
*[[ScrumSprint_36 | Sprint 36, June 6 - July 11, 2018]]
  
 
== Meeting Notes ==
 +
*[[Ontology_and_Rules_DeepPhe_Meeting_01_25_2018| January 25, 2018]] Rules and Ontology Development Meeting
 +
*[[Ontology_and_Rules_DeepPhe_Meeting_01_18_2018| January 18, 2018]] Rules and Ontology Development Meeting
 +
*[[Ontology_and_Rules_DeepPhe_Meeting_01_11_2018| January 11, 2018]] Rules and Ontology Development Meeting
 +
*[[Ontology_and_Rules_DeepPhe_Meeting_01_05_2018| January 5, 2018]] Rules and Ontology Development Meeting
 +
*[[Ontology_and_Rules_DeepPhe_Meeting_12_21_2017| December 21, 2017]] Rules and Ontology Development Meeting
 +
*[[Ontology_and_Rules_DeepPhe_Meeting_12_14_2017| December 14, 2017]] Rules and Ontology Development Meeting
 +
*[[Ontology_and_Rules_DeepPhe_Meeting_11_16_2017| November 16, 2017]] Rules and Ontology Development Meeting
 +
*[[Ontology_and_Rules_DeepPhe_Meeting_11_09_2017| November 9, 2017]] Rules and Ontology Development Meeting
 +
*[[Ontology_and_Rules_DeepPhe_Meeting_11_02_2017| November 2, 2017]] Rules and Ontology Development Meeting
 +
*[[Ontology_and_Rules_DeepPhe_Meeting_24_17_2017| October 24, 2017]] Rules and Ontology Development Meeting
 +
*[[Ontology_and_Rules_DeepPhe_Meeting_10_17_2017| October 17, 2017]] Rules and Ontology Development Meeting
 +
*[[Ontology_and_Rules_DeepPhe_Meeting_10_12_2017| October 12, 2017]] Rules and Ontology Development Meeting
 +
*[[Melanoma_Rules_DeepPhe_Meeting_10_05_2017| October 5, 2017]] Melanoma Rules and Ontology Meeting
 +
*[[Melanoma_Rules_DeepPhe_Meeting_09_28_2017| September 28, 2017]] Melanoma Rules and Ontology Meeting
 +
*[[Melanoma_Rules_DeepPhe_Meeting_09_14_2017| September 14, 2017]] Melanoma Rules and Ontology Meeting
 +
*[[Melanoma_Rules_DeepPhe_Meeting_09_07_2017| September 7, 2017]] Melanoma Rules and Ontology Meeting
 +
*[[Melanoma_Rules_DeepPhe_Meeting_08_24_2017| August 24, 2017]] Melanoma Rules Meeting
 +
*[[Melanoma_Rules_DeepPhe_Meeting_08_17_2017| August 17, 2017]] Melanoma Rules Meeting
 +
*[[Melanoma_Rules_DeepPhe_Meeting_08_10_2017| August 10, 2017]] Melanoma Rules Meeting
 +
*[[MelanomaRules_DeepPhe_Meeting_08032017| August 3, 2017]] Melanoma Rules Meeting
 
*[[Research_DeepPhe_Meeting_08272015| August 27, 2015]] Research Meeting
*[[Modeling_DeepPhe_Meeting_08032015| August 3, 2015]] Modeling Meeting
Line 214: Line 305:
  
 
== Contact ==
If you need assistance and/or if you have questions about the project, feel free to send e-mail to Guergana.Savova at childrens dot harvard dot edu or to Rebecca Crowley Jacobson at rebeccaj at pitt dot edu
+
If you need assistance or have further questions about the project, feel free to e-mail Guergana Savova at Guergana.Savova@childrens.harvard.edu or Rebecca Crowley Jacobson at rebeccaj@pitt.edu.
 
+
 
+
  
 
== Getting started ==

Revision as of 09:57, 5 June 2018

Public Site

Please visit our Cancer Deep Phenotype (DeepPhe) public site at http://deepphe.healthnlp.org.


Welcome to the Cancer Deep Phenotype Extraction (DeepPhe) project

Welcome to the Cancer Deep Phenotype Extraction (DeepPhe) project.

Cancer is a genomic disease, with enormous heterogeneity in its behavior. In the past, our methods for categorization, prediction of outcome, and treatment selection have relied largely on a morphologic classification of cancer. But new technologies are fundamentally reframing our views of cancer initiation, progression, metastasis, and response to treatment, moving us towards a molecular classification of cancer. This transformation depends not only on our ability to deeply investigate the cancer genome, but also on our ability to link these specific molecular changes to specific tumor behaviors. As sequencing costs continue to decline at a supra-Moore’s law rate, a torrent of cancer genomic data is looming. However, our ability to deeply investigate the cancer genome is outpacing our ability to correlate these changes with the phenotypes that they produce. Translational investigators seeking to associate specific genetic, epigenetic, and systems changes with particular tumor behaviors lack access to detailed observable traits about the cancer (the so-called ‘deep phenotype’), which has now become a major barrier to research.

We propose the advanced development and extension of a software platform for performing deep phenotype extraction directly from medical records of patients with cancer, with the goal of enabling translational cancer research and precision medicine. The work builds on previous informatics research and software development efforts from Boston Children’s Hospital and University of Pittsburgh groups, both individually and together. Multiple software projects developed by our groups (some initially funded by NCI) that have already passed the initial prototyping and pilot development phase (eMERGE, THYME, TIES, ODIE, Apache cTAKES) will be combined and extended to produce an advanced software platform for accelerating cancer research. Previous work in a number of NIH-funded translational science initiatives has already demonstrated the benefits of these methodologies (e.g., Electronic Medical Record and Genomics (eMERGE), PharmacoGenomics Research Network (PGRN), SHARPn, i2b2). However, to date these initiatives have focused exclusively on select non-cancer phenotypes and have had the goal of dichotomizing patients for a particular phenotype of interest (for example, Type II Diabetes, Rheumatoid Arthritis, or Multiple Sclerosis). In contrast, our proposed work focuses on extracting and representing multiple phenotype features for individual patients to build a cancer phenotype model that relates observable traits over time.

Our first four development specific aims significantly extend the capability of our current software, focusing on challenging problems in biomedical information extraction. These aims support the development and evaluation of novel methods for cancer deep phenotype extraction:

Specific Aim 1: Develop methods for extracting phenotypic profiles. Extract patients’ deep phenotypes and their attributes, such as general modifiers (negation, uncertainty, subject) and cancer-specific characteristics (e.g., grade, invasion, lymph node involvement, metastasis, size, stage).

Specific Aim 2: Extract gene/protein mentions and their variants from the clinical narrative

Specific Aim 3: Create longitudinal representation of disease process and its resolution. Link phenotypes, treatments and outcomes in temporal associations to create a longitudinal abstraction of the disease

Specific Aim 4: Extract discourses containing explanations, speculations, and hypotheses, to support explorations of causality

Our last two implementation specific aims focus on the design of the software to support the cancer research community, ensuring the usability and utility of our software. These aims support the design, dissemination and sharing of the products of this work to maximize impact on cancer research:

Specific Aim 5: Design and implement a computational platform for deep phenotype discovery and analytics for translational investigators, including integrative visual analytics.

Specific Aim 6: Advance translational research in driving cancer biology research projects in breast cancer, ovarian cancer, and melanoma. Include research community throughout the design of the platform and its evaluation. Disseminate freely available software.

Impact: The proposed work will produce novel methods for extracting detailed phenotype information directly from the EMR, the major source of such data for patients with cancer. Extracted phenotypes will be used in three ongoing translational studies with a precision medicine focus. Dissemination of the software will enhance the ability of cancer researchers to abstract meaningful clinical data for translational research. If successful, systematic capture and representation of these phenotypes from EMR data could later be used to drive clinical genomic decision support.

Who We Are

  • Boston Children's Hospital/Harvard Medical School
    • Guergana Savova (PI)
    • Timothy Miller
    • Sean Finan
    • David Harris
    • Chen Lin
    • James Masanz
    • past members -- Dmitriy Dligach (currently faculty at Loyola University, Chicago), Pei Chen
  • University of Pittsburgh
    • Harry Hochheiser (site PI)
    • Olga Medvedeva
    • Mike Davis
    • Zhou Yuan
    • past members - through June 2017: Rebecca Crowley Jacobson (MPI), Roger Day, Adrian Lee, Robert Edwards, John Kirkwood, Kevin Mitchell, Eugene Tseytlin, Girish Chavan, Melissa Castine; Liz Legowski (through Jan 2015)
  • Vanderbilt University
    • Jeremy Warner (site PI)
    • Alicia Beeghly-Fadiel
  • Dana-Farber Cancer Institute
    • Elizabeth Buchbinder

Funding

The project described is supported by Grant Number 1U24CA184407-01 from the National Cancer Institute at the US National Institutes of Health. This work is part of the NCI's Informatics Technology for Cancer Research (ITCR) Initiative (http://itcr.nci.nih.gov/). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

The project period is May 2014 - April 2019.


Publications and presentations crediting DeepPhe

  1. Hochheiser H; Jacobson R; Washington N; Denny J; Savova G. 2015. Natural language processing for phenotype extraction: challenges and representation. AMIA Annual Symposium. Nov 2015, San Francisco, CA.
  2. Dmitriy Dligach, Timothy Miller, Guergana K. Savova. 2015. Semi-supervised Learning for Phenotyping Tasks. AMIA Annual Symposium. Nov 2015, San Francisco, CA.
  3. Lin, Chen; Dligach, Dmitriy; Miller, Timothy; Bethard, Steven; Savova, Guergana. 2015. Layered temporal modeling for the clinical domain. Journal of the American Medical Informatics Association. http://jamia.oxfordjournals.org/content/early/2015/10/31/jamia.ocv113
  4. Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2016. Improving Temporal Relation Extraction with Training Instance Augmentation. BioNLP workshop at the Association for Computational Linguistics conference. Berlin, Germany, Aug 2016
  5. Timothy A. Miller, Sean Finan, Dmitriy Dligach, Guergana Savova. Robust Sentence Segmentation for Clinical Text. Abstract presented at the Annual Symposium of the American Medical Informatics Association, San Francisco, CA, 2015.
  6. Hochheiser, Harry; Castine, Melissa; Harris, David; Savova, Guergana; Jacobson, Rebecca. 2016. An Information Model for Cancer Phenotypes. BMC Medical Informatics and Decision Making. https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-016-0358-4
  7. Ethan Hartzell, Chen Lin. 2016. Enhancing Clinical Temporal Relation Discovery with Syntactic Embeddings from GloVe. International Conference on Intelligent Biology and Medicine (ICIBM 2016). December 2016, Houston, Texas, USA
  8. Dligach, Dmitriy; Miller, Timothy; Lin, Chen; Bethard, Steven; Savova, Guergana. 2017. Neural temporal relation extraction. European Chapter of the Association for Computational Linguistics (EACL 2017). April 3-7, 2017. Valencia, Spain.
  9. Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2017. Representations of Time Expressions for Temporal Relation Extraction with Convolutional Neural Networks. BioNLP workshop at the Association for Computational Linguistics conference. Vancouver, Canada, Friday August 4, 2017
  10. Timothy A. Miller, Dmitriy Dligach, Chen Lin, Steven Bethard, Guergana Savova. Feature Portability in Cross-domain Clinical Coreference. Abstract presented at the Annual Symposium of the American Medical Informatics Association, Chicago, IL, 2016.
  11. Castro SM, Tseytlin E, Medvedeva O, Mitchell K, Visweswaran S, Bekhuis T, Jacobson RS. 2017. Automated annotation and classification of BI-RADS assessment from radiology reports. J Biomed Inform. 2017 May;69:177-187. doi: 10.1016/j.jbi.2017.04.011. PMID: 28428140; PMCID: PMC5706448 [Available on 2018-05-01]
  12. Timothy A. Miller, Steven Bethard, Hadi Amiri, Guergana Savova. Unsupervised Domain Adaptation for Clinical Negation Detection. Proceedings of the 16th Workshop on Biomedical Natural Language Processing. 2017.
  13. Timothy A. Miller, Dmitriy Dligach, Steven Bethard, Chen Lin, and Guergana Savova. Towards generalizable entity-centric coreference resolution. Journal of Biomedical Informatics, 69; 251-258. 2017.
  14. Miller, T; Bethard, S; Amiri, H; Savova, G. 2017. Unsupervised Domain Adaptation for Clinical Negation Detection. BioNLP workshop at the Association for Computational Linguistics conference. Vancouver, Canada, Friday August 4, 2017
  15. Savova, G., Tseytlin, E., Finan, S., Castine, M., Miller, T., Medvedeva, O., Harris, D., Hochheiser, H., Lin, C., Chavan, G., Jacobson R. 2017. DeepPhe: A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Cancer Research 77(21), November 2017. DOI: 10.1158/0008-5472.CAN-17-0615.
  16. Savova, G., Tseytlin, E., Finan, S., Castine, M., Miller, T., Medvedeva, O., Harris, D., Hochheiser, H., Lin, C., Chavan, G., Jacobson R. 2017. DeepPhe - A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Annual Symposium of the American Medical Informatics Association (AMIA). Nov 2017. Washington DC.
  17. Savova, G; Miller, T. 2018. DeepPhe and Extraction of Oncology Patient Phenotypes from Unstructured Text Using NLP and Other AI Tools. Presentation to Dana Farber Cancer Institute. January 24 2018. Boston, MA.
  18. Warner, Jeremy. 2018. Improving Cancer Diagnosis and Care: Patient Access to Oncologic Imaging and Pathology Expertise and Technologies. the National Cancer Policy Forum of the National Academies of Sciences, Engineering, and Medicine. http://www.nationalacademies.org/hmd/Activities/Disease/NCPF/2018-FEB-12/Videos/Session%204%20Videos/32%20Warner.aspx

DeepPhe Software

The DeepPhe system will be available as part of Apache cTAKES at http://ctakes.apache.org/. It is also available at https://github.com/DeepPhe/DeepPhe.

DeepPhe software components will also be deployed in the TIES Software System for sharing and accessing deidentified NLP-processed data with tissue (http://ties.pitt.edu/), which is deployed as part of the TIES Cancer Research Network (TCRN) across multiple US cancer centers.

DeepPhe software development will be coordinated as per software development policies.

DeepPhe software documentation for developers is available in

DeepPhe Gold Set

  • Process for Deidentification of Source Documents.
  • Process for Selection of Gold Set Source Documents.
  • DeepPhe UPMC Training/Development/Test splits
    • training set:
      • all documents for Breast Cancer patients 03, 11, 92, 93 for a total of 48 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev\DeepPhe Gold Phenotype Annotations_v2.xlsm
      • all documents for Breast Cancer patients extended 04,05,06,09,10,12,13,14,18,19,20,22,23,26,27,30,31,32,33,34,35,40,41,42,43,38,39,46,47 for a total of 954 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev\DeepPhe Gold Phenotype Annotations_v2.xlsm
      • all documents for Melanoma patients 05, 06, 18, 19, 25, 28, 30, 33, 34, 42 for a total of 233 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma\trainSet\DeepPhe DevSet Phenotype Annotations.xlsm
      • all documents for Ovarian Cancer patients 3, 4, 7, 8, 12, 13, 16, 17, 18, 20, 24, 25, 26, 27, 30, 31, 32, 34, 37, 38, 41, 42, 43, 44, 46, 48 for a total of 1675 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\trainSet); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\trainSet\DeepPhe_ovCa_Train_Set_Phenotype_Annotations_GOLD.xlsm
    • development set:
      • all documents for Breast Cancer patients 02, 21 for a total of 42 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev\DeepPhe Gold Phenotype Annotations_v2.xlsm
      • all documents for Breast Cancer patients extended 15,17,28,29,36,37,44,45,07,08,24,25 for a total of 416 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev\DeepPhe Gold Phenotype Annotations_v2.xlsm
      • all documents for Melanoma patients 07, 32, 43 for a total of 215 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma\devSet); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma\devSet\DeepPhe DevSet Phenotype Annotations.xlsm
      • all documents for Ovarian Cancer patients 9, 11, 19, 28, 29, 35, 39, 47 for a total of 562 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\devSet); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\devSet\DeepPhe_ovCa_Dev_Set_Phenotype_Annotations_GOLD.xlsm
    • test set:
      • all documents for Breast Cancer patients 01 (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedTest); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedTest\DeepPhe Test Phenotype Annotations v2.xlsm
      • all documents for Breast Cancer extended for patients 01, 02, 63, 76, 100, 101, 104, 106, 109, 111, 114, 115, 117, 118, 119, 120, 121, 123, 125, 126, 129, 130, 132, 136, 137, 138, 142, 143, 155, 156, 158, 174, 181, 189, 197 for phenotyping level testing use (\\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedTest\); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedTest\DeepPhe Test Phenotype Annotations v2.xlsm
      • all documents for Melanoma patients 02, 03, 11, 12, 14, 16, 24, 27, 41, 44 for a total of 229 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma\testSet); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma\testSet\DeepPhe TestSet Phenotype Annotations.xlsm
      • all documents for Ovarian Cancer patients 15, 21, 33, 36, 40, 45, 49, 50 for a total of 559 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\testSet); gold annotations are in \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\testSet\DeepPhe_ovCa_Test_Set_Phenotype_Annotations_GOLD.xlsm
    • Use the training set to develop the algorithms, and the development set to report results and run error analysis. The test set is reserved for the final evaluation reported in publications.
  • SEER Project Train/Dev/Test Splits
  • Clinical Genomics Gold Set
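
The UPMC splits above are made at the patient level, so they only hold up if no patient appears in more than one set. Below is a minimal sketch of that check, using the melanoma and ovarian patient IDs exactly as listed above. The `SPLITS` layout and the `find_overlaps` helper are illustrative names and are not part of the DeepPhe software. The breast cancer lists are left out because they mix the base and extended cohorts, and this page does not say whether the two share a numbering scheme.

```python
from itertools import combinations

# Patient IDs copied from the melanoma and ovarian split lists on this page.
# The dictionary layout is an assumption made for this sketch.
SPLITS = {
    "melanoma": {
        "train": {5, 6, 18, 19, 25, 28, 30, 33, 34, 42},
        "dev": {7, 32, 43},
        "test": {2, 3, 11, 12, 14, 16, 24, 27, 41, 44},
    },
    "ovarian": {
        "train": {3, 4, 7, 8, 12, 13, 16, 17, 18, 20, 24, 25, 26, 27,
                  30, 31, 32, 34, 37, 38, 41, 42, 43, 44, 46, 48},
        "dev": {9, 11, 19, 28, 29, 35, 39, 47},
        "test": {15, 21, 33, 36, 40, 45, 49, 50},
    },
}


def find_overlaps(splits):
    """Return {(cohort, split_a, split_b): shared_ids} for each pair of
    splits within a cohort that has at least one patient in common."""
    overlaps = {}
    for cohort, parts in splits.items():
        for (name_a, ids_a), (name_b, ids_b) in combinations(parts.items(), 2):
            shared = ids_a & ids_b
            if shared:
                overlaps[(cohort, name_a, name_b)] = shared
    return overlaps


if __name__ == "__main__":
    problems = find_overlaps(SPLITS)
    if problems:
        for (cohort, a, b), shared in problems.items():
            print(f"{cohort}: {a}/{b} share patients {sorted(shared)}")
    else:
        print("No patient appears in more than one split.")
```

Running the same check after any change to the patient lists helps keep test-set patients out of training and development.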

Qualitative Interviews


Project materials/ WIKIs to tasks


Presentations

Communication


Scrum Sprints

Meeting Notes

Licensing

Licensing policies for DeepPhe software and ontological models.


Contact

If you need assistance or have further questions about the project, feel free to e-mail Guergana Savova at Guergana.Savova@childrens.harvard.edu or Rebecca Crowley Jacobson at rebeccaj@pitt.edu.
