Difference between revisions of "Main Page"

From HealthNLP-Cancer
(Who We Are)
(Scrum Sprints)
(48 intermediate revisions by 4 users not shown)
Line 1: Line 1:
== Public Site ==
+
<!-- == Public Site ==
'''Please visit our Cancer Deep Phenotype (DeepPhe) ''public site'' at [http://healthnlp.hms.harvard.edu/deepphe/wiki http://deepphe.healthnlp.org].'''
+
'''Please visit our Cancer Deep Phenotype (DeepPhe) ''public site'' at [http://healthnlp.hms.harvard.edu/deepphe/wiki http://deepphe.healthnlp.org].''' -->
 
+
  
  
 
== Welcome to the Cancer Deep Phenotype Extraction (DeepPhe) project ==
 
== Welcome to the Cancer Deep Phenotype Extraction (DeepPhe) project ==
Welcome to the Cancer Deep Phenotype Extraction (DeepPhe) project.
+
Our goal is to develop novel methods for information extraction to facilitate automatic/unsupervised/minimally supervised extraction of specific discrete cancer-related data from various types of unstructured electronic medical records.
 +
 
 +
<!-- Welcome to the Cancer Deep Phenotype Extraction (DeepPhe) project.
  
 
Cancer is a genomic disease with enormous heterogeneity in its behavior. In the past, our methods for categorization, prediction of outcome, and treatment selection have relied largely on a morphologic classification of cancer. But new technologies are fundamentally reframing our views of cancer initiation, progression, metastasis, and response to treatment, moving us towards a molecular classification of cancer. This transformation depends not only on our ability to deeply investigate the cancer genome, but also on our ability to link these specific molecular changes to specific tumor behaviors. As sequencing costs continue to decline at a supra-Moore’s law rate, a torrent of cancer genomic data is looming. However, our ability to deeply investigate the cancer genome is outpacing our ability to correlate these changes with the phenotypes that they produce. Translational investigators seeking to associate specific genetic, epigenetic, and systems changes with particular tumor behaviors lack access to detailed observable traits about the cancer (the so-called ‘deep phenotype’), and this lack has now become a major barrier to research.
 
Cancer is a genomic disease with enormous heterogeneity in its behavior. In the past, our methods for categorization, prediction of outcome, and treatment selection have relied largely on a morphologic classification of cancer. But new technologies are fundamentally reframing our views of cancer initiation, progression, metastasis, and response to treatment, moving us towards a molecular classification of cancer. This transformation depends not only on our ability to deeply investigate the cancer genome, but also on our ability to link these specific molecular changes to specific tumor behaviors. As sequencing costs continue to decline at a supra-Moore’s law rate, a torrent of cancer genomic data is looming. However, our ability to deeply investigate the cancer genome is outpacing our ability to correlate these changes with the phenotypes that they produce. Translational investigators seeking to associate specific genetic, epigenetic, and systems changes with particular tumor behaviors lack access to detailed observable traits about the cancer (the so-called ‘deep phenotype’), and this lack has now become a major barrier to research.
Line 23: Line 24:
 
Advance translational research in driving cancer biology research projects in breast cancer, ovarian cancer, and melanoma. Include the research community throughout the design of the platform and its evaluation. Disseminate freely available software.
 
Advance translational research in driving cancer biology research projects in breast cancer, ovarian cancer, and melanoma. Include the research community throughout the design of the platform and its evaluation. Disseminate freely available software.
  
Impact: The proposed work will produce novel methods for extracting detailed phenotype information directly from the EMR, the major source of such data for patients with cancer. Extracted phenotypes will be used in three ongoing translational studies with a precision medicine focus. Dissemination of the software will enhance the ability of cancer researchers to abstract meaningful clinical data for translational research. If successful, systematic capture and representation of these phenotypes from EMR data could later be used to drive clinical genomic decision support.
+
Impact: The proposed work will produce novel methods for extracting detailed phenotype information directly from the EMR, the major source of such data for patients with cancer. Extracted phenotypes will be used in three ongoing translational studies with a precision medicine focus. Dissemination of the software will enhance the ability of cancer researchers to abstract meaningful clinical data for translational research. If successful, systematic capture and representation of these phenotypes from EMR data could later be used to drive clinical genomic decision support. -->
 +
 
  
 
== Who We Are ==
 
== Who We Are ==
 
* Boston Children's Hospital/Harvard Medical School
 
* Boston Children's Hospital/Harvard Medical School
** Guergana Savova (PI)
+
** Guergana Savova (MPI)
 
** Timothy Miller
 
** Timothy Miller
 
** Sean Finan
 
** Sean Finan
Line 35: Line 37:
  
 
* University of Pittsburgh
 
* University of Pittsburgh
** Harry Hochheiser (site PI)
+
** Harry Hochheiser (MPI)
 
** Zhou Yuan
 
** Zhou Yuan
 
** past members - through June 2017: Rebecca Crowley Jacobson (MPI), Roger Day, Adrian Lee, Robert Edwards, John Kirkwood, Kevin Mitchell, Eugene Tseytlin, Girish Chavan, Melissa Castine; Liz Legowski (through Jan 2015), Olga Medvedeva, Mike Davis
 
** past members - through June 2017: Rebecca Crowley Jacobson (MPI), Roger Day, Adrian Lee, Robert Edwards, John Kirkwood, Kevin Mitchell, Eugene Tseytlin, Girish Chavan, Melissa Castine; Liz Legowski (through Jan 2015), Olga Medvedeva, Mike Davis
  
 
* Vanderbilt University
 
* Vanderbilt University
** Jeremy Warner (site PI)
+
** Jeremy Warner (MPI)
 
** Alicia Beeghly-Fadiel
 
** Alicia Beeghly-Fadiel
  
 
* Dana-Farber Cancer Institute
 
* Dana-Farber Cancer Institute
 
** Elizabeth Buchbinder
 
** Elizabeth Buchbinder
 +
 +
* Kentucky Cancer Registry
 +
** Eric Durbin (MPI)
 +
** Isaac Hands
 +
** Jong Jeong
 +
** Ramakanth (Rama) Kavuluru
 +
** David Rust
 +
** Lisa Witt
  
 
== Funding ==
 
== Funding ==
 
The project described is supported by the National Cancer Institute at the US National Institutes of Health. It is part of the NCI's Informatics Technology for Cancer Research (ITCR) Initiative (http://itcr.nci.nih.gov/). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
 
The project described is supported by the National Cancer Institute at the US National Institutes of Health. It is part of the NCI's Informatics Technology for Cancer Research (ITCR) Initiative (http://itcr.nci.nih.gov/). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
  
== Publications and presentations crediting DeepPhe ==
+
 
# Hochheiser H; Jacobson R; Washington N; Denny J; Savova G. 2015. Natural language processing for phenotype extraction: challenges and representation. AMIA Annual Symposium. Nov 2015, San Francisco, CA.
+
== Publications and presentations ==
 +
Peer-reviewed publications:
 +
<ul>
 +
  <ol>
 +
<li> Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Amiri, Hadi; Bethard, Steven and Savova, Guergana. 2018. Self-training improves Recurrent Neural Networks performance for Temporal Relation Extraction. LOUHI 2018: The Ninth International Workshop on Health Text Mining and Information Analysis. Oct 31-Nov 1, 2018. Brussels, Belgium.
 +
https://aclanthology.coli.uni-saarland.de/papers/W18-5619/w18-5619 </li>
 +
 
 +
<li>Malty, Andrew M., Jain, Sandeep K., Yang, Peter C., Harvey, Krysten, Warner, Jeremy L. Computerized approach to creating a systematic ontology of hematology/oncology regimens. JCO Clinical Cancer Informatics. 2018 May 11.
 +
http://ascopubs.org/doi/full/10.1200/CCI.17.00142 </li>
 +
<li>Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Lin, Chen; Savova, Guergana. 2017. Towards Generalizable Entity-Centric Clinical Coreference Resolution. Journal of Biomedical Informatics. Vol. 69, May 2017, pp. 251-258. https://doi.org/10.1016/j.jbi.2017.04.015; http://www.sciencedirect.com/science/article/pii/S1532046417300850</li>
 +
<li>Castro SM, Tseytlin E, Medvedeva O, Mitchell K, Visweswaran S, Bekhuis T, Jacobson RS. 2017. Automated annotation and classification of BI-RADS assessment from radiology reports. J Biomed Inform. 2017 May;69:177-187. doi: 10.1016/j.jbi.2017.04.011. PMID: 28428140; PMCID: PMC5706448 [Available on 2018-05-01] DOI:10.1016/j.jbi.2017.04.011
 +
https://www.sciencedirect.com/science/article/pii/S1532046417300813 </li>
 +
<li>Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2017. Representations of Time Expressions for Temporal Relation Extraction with Convolutional Neural Networks. BioNLP workshop at the Association for Computational Linguistics conference. Vancouver, Canada, Friday August 4, 2017.
 +
https://aclanthology.coli.uni-saarland.de/papers/W17-2341/w17-2341 </li>
 +
<li>Miller, T; Bethard, S; Amiri, H; Savova, G. 2017. Unsupervised Domain Adaptation for Clinical Negation Detection. BioNLP workshop at the Association for Computational Linguistics conference. Vancouver, Canada, Friday August 4, 2017
 +
https://aclanthology.coli.uni-saarland.de/papers/W17-2320/w17-2320 </li>
 +
<li>Savova, G., Tseytlin, E., Finan, S., Castine, M., Miller, T., Medvedeva, O., Harris, D., Hochheiser, H., Lin, C., Chavan, G., Jacobson R. 2017. DeepPhe - A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Annual Symposium of the American Medical Informatics Association (AMIA). Nov 2017. Washington DC
 +
https://amia2017.zerista.com/event/member/389439 </li>
 +
<li>Savova, G., Tseytlin, E., Finan, S., Castine, M., Miller, T., Medvedeva, O., Harris, D., Hochheiser, H., Lin, C., Chavan, G., Jacobson R. 2017. DeepPhe: A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Cancer Research 77(21), November 2017 DOI: 10.1158/0008-5472.CAN-17-0615.
 +
https://www.ncbi.nlm.nih.gov/pubmed/29092954</li>
 +
<li>Dligach, Dmitriy; Miller, Timothy; Lin, Chen; Bethard, Steven; Savova, Guergana. 2017. Neural temporal relation extraction. European Chapter of the Association for Computational Linguistics (EACL 2017). April 3-7, 2017. Valencia, Spain.
 +
https://aclanthology.coli.uni-saarland.de/papers/E17-2118/e17-2118 </li>
 +
<li>Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2016. Improving Temporal Relation Extraction with Training Instance Augmentation. BioNLP workshop at the Association for Computational Linguistics conference. Berlin, Germany, Aug 2016
 +
https://aclanthology.coli.uni-saarland.de/papers/W16-2914/w16-2914 </li>
 +
<li>Hochheiser, Harry; Castine, Melissa; Harris, David; Savova, Guergana; Jacobson, Rebecca. 2016. An Information Model for Computable Cancer Phenotypes. BMC Medical Informatics and Decision Making.
 +
https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-016-0358-4
 +
https://www.ncbi.nlm.nih.gov/pubmed/27629872 </li>
 +
<li>Ethan Hartzell, Chen Lin. 2016. Enhancing Clinical Temporal Relation Discovery with Syntactic Embeddings from GloVe. International Conference on Intelligent Biology and Medicine (ICIBM 2016). Medical Informatics Thematic Track. December 2016, Houston, Texas, USA</li>
 +
<li>Dmitriy Dligach, Timothy Miller, Guergana K. Savova. 2015. Semi-supervised Learning for Phenotyping Tasks. AMIA Annual Symposium. Nov 2015, San Francisco, CA.
 +
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4765699/ </li>
 +
<li>Lin, Chen; Dligach, Dmitriy; Miller, Timothy; Bethard, Steven; Savova, Guergana. 2015. Multilayered temporal modeling for the clinical domain. Journal of the American Medical Informatics Association. 2016 Mar;23(2):387-95. doi: 10.1093/jamia/ocv113
 +
https://www.ncbi.nlm.nih.gov/pubmed/26521301 </li>
 +
</ol>
 +
</ul>
 +
 
 +
Peer-reviewed other:
 +
<ul>
 +
  <ol>
 +
<li>Beeghly-Fadiel, Alicia; Warner, Jeremy; Finan, Sean; Masanz, James;  Hochheiser, Harry; Savova, Guergana. (under review). Deep Phenotype Extraction to Facilitate Cancer Research: Extending DeepPhe to Ovarian Cancer. American Association for Cancer Research (AACR) 2019. March 29-April 3, 2019. Atlanta, GA.</li>
 +
<li>Yuan, Zhou; Finan, Sean; Warner, Jeremy; Savova, Guergana; Hochheiser, Harry. 2018. Toward Longitudinal Visual Analytics for Cancer Patient Trajectories Extracted from Clinical Text. 2018 Workshop on Visual Analytics and Healthcare, Demonstration Presentation. AMIA 2018, Nov 3-7, 2018. San Francisco, CA.</li>
 +
<li>Chen Lin, Timothy A. Miller, Hadi Amiri, David Harris, Samuel M. Rubinstein, Jeremy Warner, Guergana K. Savova, Ph.D. 2018. Classification of electronic medical records of breast cancer and melanoma patients into clinical episodes. 30th Anniversary AACR Special Conference Convergence: Artificial Intelligence, Big Data, and Prediction of Cancer. Oct 14-17, 2018. Newport, RI, USA.</li>
 +
<li>Warner, Jeremy; Elhadad, Noemie; Bastarache, Lisa; Gotz, David; Savova, Guergana. 2018. Panel - Didactic: Computable Longitudinal Patient Trajectories. Annual Symposium of the American Medical Informatics Association. November, 2018. San Francisco, CA. (peer-reviewed panel)</li>
 +
<li>Savova G, Tseytlin E, Finan S, Castine M, Miller T, Medvedeva O, Harris D, Hochheiser H, Lin C, Chavan G, Warner JL, Jacobson R. DeepPhe – a natural language processing system for extracting cancer phenotypes from clinical records. Annual conference of the North American Association of Central Cancer Registries (NAACCR). Pittsburgh, PA.</li>
 +
<li>Warner JL, Harris D, Rubinstein S, Finan S, Lin C, Miller T, Amiri H, Hochheiser H, Savova G. Capturing high-resolution temporal cancer phenotypes using DeepPhe. Annual conference of the North American Association of Central Cancer Registries (NAACCR). Pittsburgh, PA.</li>
 +
<li>Yang PC, Malty A, Jain SK, Harvey K, Finan S, Warner JL. 2018. A Comprehensive Ontology of Hematology/Oncology Regimens. Annual conference of the North American Association of Central Cancer Registries (NAACCR). Pittsburgh, PA.</li>
 +
<li>Hochheiser H; Jacobson R; Washington N; Denny J; Savova G. 2015. Natural language processing for phenotype extraction: challenges and representation. AMIA Annual Symposium. Nov 2015, San Francisco, CA. (peer-reviewed panel)</li>
 +
</ol>
 +
</ul>
 +
 
 +
Invited presentations:
 +
<ul>
 +
  <ol>
 +
<li>Savova, Guergana. 2019. Cancer Deep Phenotype Extraction from Electronic Medical Records. Molecular Med Tri-con. March 10-15, 2019. San Francisco, CA, USA </li>
 +
<li>Savova G. 2018. Software and Research Challenges for Clinical NLP. Dana Farber Cancer Institute; 2018 October; Boston, MA, USA. </li>
 +
<li>Savova, Guergana. 2018. Cancer Deep Phenotype Extraction from Electronic Medical Records (DeepPhe). College of American Pathologists Pathology Electronic Reporting meeting (CAP PERT). July 29, 2018. Montreal, QC, CA. </li>
 +
<li>Warner, Jeremy. 2018. A Comprehensive Ontology of Hematology/Oncology Regimens. College of American Pathologists Pathology Electronic Reporting meeting (CAP PERT). July 29, 2018. Montreal, QC, CA. </li>
 +
<li>Savova, G; Miller, T. 2018. DeepPhe and Extraction of Oncology Patient Phenotypes from Unstructured Text Using NLP and Other AI Tools. Presentation to Dana Farber Cancer Institute. January 24 2018. Boston, MA.</li>
 +
<li>Warner, Jeremy. 2017. Supporting cancer registries through automated extraction of pathology and chemotherapy regimen information. CDC/NCI/FDA/VA Clinical Natural Language Processing Workshop. Atlanta, GA. </li>
 +
<li>Savova, Guergana. 2017. Select Applications of Natural Language Processing in Biomedicine. Natural Language Processing Symposium, Boston University, Boston, MA. November, 2017. </li>
 +
<li>Jacobson, Rebecca. 2017. Invited presentation at Ohio State University James Cancer Center Grand Rounds, January 20th, 2017</li>
 +
<li>Jacobson, Rebecca. 2017. Invited presentation at Case Western University Comprehensive Cancer Center Seminar Series, March 10th, 2017</li>
 +
<li>Jacobson, Rebecca. 2016. Invited presentation of cTAKES and DeepPhe to NCI in January, 2016. Gaithersburg, MD</li>
 +
<li>Jacobson, Rebecca. 2016. Invited presentation in CBIIT Speaker Series, February 17, 2016. Gaithersburg, MD</li>
 +
<li>Jacobson, Rebecca. 2016. Invited presentation at University of Pittsburgh Cancer Informatics (UPCI) External Advisory Board, March 8, 2016</li>
 +
<li>Finan, Sean. 2016. cTAKES/deepPhe presentation at the ITCR workshop at CI4CC in Napa, CA</li>
 +
<li>Jacobson, Rebecca. 2016. Invited presentation at SEER PI meeting in New Mexico, March 16, 2016</li>
 +
<li>Jacobson, Rebecca. 2016. Invited presentation at University of Michigan Department of Learning Health Sciences, April 6th, 2016</li>
 +
<li>Jacobson, Rebecca. 2016. Invited presentation at Pathology Informatics 2016, Pittsburgh PA, May 24th, 2016</li>
 +
<li>Jacobson, Rebecca. 2016. Invited presentation at University of Pittsburgh Cancer Institute Scientific Retreat, Greensburg, PA, June 16th, 2016</li>
 +
<li>Jacobson, Rebecca and Savova, Guergana. 2016. Invited presentation at SEER meeting in Gaithersburg, MD, December 10, 2016</li>
 +
<li>Jacobson, Rebecca and Savova, Guergana. Invited presentation of cTAKES/DeepPhe to NCI in October, 2015</li>
 +
</ol>
 +
</ul>
 +
 
 +
Other:
 +
<ul>
 +
  <ol>
 +
<li>Interview with Uduak Thomas of the GenomeWeb magazine. May 16, 2014. https://www.genomeweb.com/informatics/upitt-bch-team-use-696k-grant-develop-nlp-based-tools-extract-phenotype-data-emr#.W3HF1NJKi70 </li>
 +
<li>Project website: cancer.healhnlp.org </li>
 +
<li>Github repository: https://github.com/DeepPhe </li>
 +
<li>Listed on the ITCR website, Tools: https://itcr.cancer.gov/informatics-tools </li>
 +
</ol>
 +
</ul>
 +
 
 +
<!--# Hochheiser H; Jacobson R; Washington N; Denny J; Savova G. 2015. Natural language processing for phenotype extraction: challenges and representation. AMIA Annual Symposium. Nov 2015, San Francisco, CA.
 
# Dmitriy Dligach, Timothy Miller, Guergana K. Savova. 2015. Semi-supervised Learning for Phenotyping Tasks. AMIA Annual Symposium. Nov 2015, San Francisco, CA.
 
# Dmitriy Dligach, Timothy Miller, Guergana K. Savova. 2015. Semi-supervised Learning for Phenotyping Tasks. AMIA Annual Symposium. Nov 2015, San Francisco, CA.
 
# Lin, Chen; Dligach, Dmitriy; Miller, Timothy; Bethard, Steven; Savova, Guergana. 2015. Layered temporal modeling for the clinical domain. Journal of the American Medical Informatics Association. http://jamia.oxfordjournals.org/content/early/2015/10/31/jamia.ocv113
 
# Lin, Chen; Dligach, Dmitriy; Miller, Timothy; Bethard, Steven; Savova, Guergana. 2015. Layered temporal modeling for the clinical domain. Journal of the American Medical Informatics Association. http://jamia.oxfordjournals.org/content/early/2015/10/31/jamia.ocv113
Line 68: Line 162:
 
# Savova, G., Tseytlin, E., Finan, S., Castine, M., Miller, T., Medvedeva, O., Harris, D., Hochheiser, H., Lin, C., Chavan, G., Jacobson R. 2017. DeepPhe - A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Annual Symposium of the American Medical Informatics Association (AMIA). Nov 2017. Washington DC.
 
# Savova, G., Tseytlin, E., Finan, S., Castine, M., Miller, T., Medvedeva, O., Harris, D., Hochheiser, H., Lin, C., Chavan, G., Jacobson R. 2017. DeepPhe - A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Annual Symposium of the American Medical Informatics Association (AMIA). Nov 2017. Washington DC.
 
# Savova, G; Miller, T. 2018. DeepPhe and Extraction of Oncology Patient Phenotypes from Unstructured Text Using NLP and Other AI Tools. Presentation to Dana Farber Cancer Institute. January  24 2018. Boston, MA.
 
# Savova, G; Miller, T. 2018. DeepPhe and Extraction of Oncology Patient Phenotypes from Unstructured Text Using NLP and Other AI Tools. Presentation to Dana Farber Cancer Institute. January  24 2018. Boston, MA.
# Warner, Jeremy. 2018. Improving Cancer Diagnosis and Care: Patient Access to Oncologic Imaging and Pathology Expertise and Technologies. the National Cancer Policy Forum of the National Academies of Sciences, Engineering, and Medicine. http://www.nationalacademies.org/hmd/Activities/Disease/NCPF/2018-FEB-12/Videos/Session%204%20Videos/32%20Warner.aspx
+
# Warner, Jeremy. 2018. Improving Cancer Diagnosis and Care: Patient Access to Oncologic Imaging and Pathology Expertise and Technologies. the National Cancer Policy Forum of the National Academies of Sciences, Engineering, and Medicine. http://www.nationalacademies.org/hmd/Activities/Disease/NCPF/2018-FEB-12/Videos/Session%204%20Videos/32%20Warner.aspx -->
  
 
== DeepPhe Software ==
 
== DeepPhe Software ==
The DeepPhe system will be available as part of Apache cTAKES at http://ctakes.apache.org/. It is now available at https://github.com/DeepPhe/DeepPhe-Release and https://github.com/DeepPhe/DeepPhe-Viz.
+
<!-- The DeepPhe system will be available as part of Apache cTAKES at http://ctakes.apache.org/. It is now available at https://github.com/DeepPhe/DeepPhe-Release and https://github.com/DeepPhe/DeepPhe-Viz.
  
 
DeepPhe software components will also be deployed in the TIES Software System (http://ties.pitt.edu/) for sharing and accessing deidentified NLP-processed data with tissue, which is deployed as part of the TIES Cancer Research Network (TCRN) across multiple US Cancer Centers.
 
DeepPhe software components will also be deployed in the TIES Software System (http://ties.pitt.edu/) for sharing and accessing deidentified NLP-processed data with tissue, which is deployed as part of the TIES Cancer Research Network (TCRN) across multiple US Cancer Centers.
  
DeepPhe software development will be coordinated as per [[DeepPhe_code_repositories_and_policies | software development policies]].
+
DeepPhe software development will be coordinated as per [[DeepPhe_code_repositories_and_policies | software development policies]]. -->
  
DeepPhe software documentation for developers is available in  
+
DeepPhe release is available in  
* [[ctakes_api | ctakes api]]
+
<!--* [[ctakes_api | ctakes api]]-->
 
* [https://github.com/DeepPhe/DeepPhe-Release code]
 
* [https://github.com/DeepPhe/DeepPhe-Release code]
 +
  
 
== DeepPhe Gold Set ==
 
== DeepPhe Gold Set ==
Line 86: Line 181:
 
* [[Public:Deidentification_Process | Process for Deidentification of Source Documents]].
 
* [[Public:Deidentification_Process | Process for Deidentification of Source Documents]].
 
* [[Gold_Set_Selection | Process for Selection of Gold Set Source Documents]].
 
* [[Gold_Set_Selection | Process for Selection of Gold Set Source Documents]].
* DeepPhe UPMC Training/Development/Test splits
+
* DeepPhe Training/Development/Test splits
 
** training set:  
 
** training set:  
 
*** all documents for Breast Cancer patients 03, 11, 92, 93 for a total of 48 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev\DeepPhe Gold Phenotype Annotations_v2.xlsm
 
*** all documents for Breast Cancer patients 03, 11, 92, 93 for a total of 48 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev\DeepPhe Gold Phenotype Annotations_v2.xlsm
Line 105: Line 200:
 
* [[SEER_Project_Splits| SEER Project Train/Dev/Test Splits]]
 
* [[SEER_Project_Splits| SEER Project Train/Dev/Test Splits]]
 
* [[Clinical Genomics Gold Set | Clinical Genomics Gold Set ]]
 
* [[Clinical Genomics Gold Set | Clinical Genomics Gold Set ]]
 +
  
 
== Qualitative Interviews ==
 
== Qualitative Interviews ==
Line 111: Line 207:
 
* [[Media:Cd-quick-intro-201408110918.pdf|Contextual Design Notes]]
 
* [[Media:Cd-quick-intro-201408110918.pdf|Contextual Design Notes]]
 
* [[Informant_Interviews | Notes on interviews with informants]]
 
* [[Informant_Interviews | Notes on interviews with informants]]
 
  
  
 
== Project materials/ WIKIs to tasks ==
 
== Project materials/ WIKIs to tasks ==
* Liquid Planner link (project management): https://app.liquidplanner.com/space/26220/dashboard
+
* [[Archive]]
* [[Stakeholder_templates | Templates]] for describing stakeholders.
+
*[[UG3 Technical Details]]
* [[DeepPhe_code_repositories_and_policies | Software development policies and repositories]].
+
* [[Deep_Phe_data_repository_and_policies | Data Repository and Policies]].
+
* [[Adopted Standards and Conventions for NLP annotations | Adopted Standards and Conventions for NLP annotations (task 1.4.2)]]
+
* [[Gold_Set_Selection | Gold Set Selection]]
+
* [[Entity mention and Template Evaluation Statistics | Entity Mention and Template Evaluation Statistics]]
+
* [[Phenotype Evaluation Statistics | Phenotype Evaluation Statistics (with DeepPhe v1)]]
+
* [[Phenotype Evaluation Statistics (with DeepPhe v2) | Phenotype Evaluation Statistics (with DeepPhe v2)]]
+
* Modeling
+
**[[Phenotyping Rules]]
+
** [[Breast Cancer Model]]
+
** [[Melanoma Model]]
+
** [[Ovarian Cancer Model]]
+
** [[Cancer_phenotype_modeling_notes| Cancer phenotype modeling]] notes
+
** [[Layered cancer phenotyping]]
+
***[[Episode modeling]]
+
** [http://www.hl7.org/implement/standards/fhir/ FHIR]  modeling
+
***[[FHIR_Cancer_examples| FHIR Cancer examples]]
+
***[[FHIR_General_questions| General questions regarding FHIR usage]]
+
***[[FHIR_Unresolved| Unresolved Questions to be addressed in FHIR models]]
+
***[[FHIR and RDF]]
+
***[[FHIR Value Sets]]
+
** Domain Modeling Notes/Questions
+
*** [[Breast Cancer Domain Notes/Questions]]
+
** [[Cancer_phenotype_model_validation | Validation of models with domain experts]]
+
** [[Comptency_questions|Competency questions]] to be used for validation of models.
+
** [[Episode_questions| Analysis tasks potentially requiring episode labels]]
+
** [[Representational_issues| Representations]] of the models.
+
** Historical pages
+
*** [[CEM_Cancer_phenotype_models| CEM Cancer phenotype models]]: models describing the original CEM Models
+
** [[ValueDecomposition_issues| Value decomposition issues]] https://docs.google.com/document/d/1riAHoLRdEmp4Ah9Z8NXN-ABkcAW9nnfNXQ5_md5rgYs/edit
+
 
+
* [[Visual Analytics]]
+
**[[User_Stories | User Stories]]
+
* [[Informant Interviews]]
+
** [[User Challenges]]
+
* [[Technical Infrastructure]]
+
* [[ Deep_learing | Deep Learning]]
+
* [[ Cross_document_coreference | Cross document coreference]]
+
* [[ Summarization_Logic | Summarization Logic]]
+
* [[ Architecture | Architecture]]
+
* [[ SoftwareBestPractices | Software Best Practices]]
+
* [[ Gold_standard_annotations| Gold standard annotations]]
+
* [[ Licensing| Licensing]]
+
* [[Research_coreference| RESEARCH: Coreference]]
+
* [[Research_relations| RESEARCH: Relation extraction]]
+
* [[Research_temporality| RESEARCH: Temporal relations]]
+
* [[Research_hci| RESEARCH: Human-Computer interaction]]
+
* [[Research_birads| RESEARCH: BiRADS]]
+
* [[Demo_June_2016| Demo in June 2016]]
+
* [[SEER_Project_Tech_Req| SEER Project Technical Requirements]]
+
* [[SEER_Project_Splits| SEER Project Train/Dev/Test Splits]]
+
* [[Paper_ideas_2016| Paper Ideas 2016]]
+
* [[Year2_Goals| Year 2 goals (May 2015-April 2016)]]
+
* [[Year3_Goals| Year 3 goals and Publication Ideas (May 2016-April 2017)]]
+
* [[Year4_Goals| Year 4 goals (May 2017-April 2018)]]
+
* [[Year5_Goals| Year 5 goals (May 2018-April 2019)]]
+
 
+
 
+
'''Presentations'''
+
 
+
* How to effectively use LiquidPlanner for DeepPhe: https://www.dropbox.com/s/1f6nkhx3yxh4v9q/LiquidPlanner%20for%20Deep-Phe.pptx
+
* DeepPhe Rule Driven Architectures: https://www.dropbox.com/s/hl70zkvjs1ftt5a/DeepPhe%20Rule%20Driven%20Architectures.pptx
+
  
 
== Communication ==
 
== Communication ==
 
* Weekly team meetings
 
* Weekly team meetings
 
* Tools we use for communication are listed in our [[Communications_Plan |Communications Plan ]].
 
* Tools we use for communication are listed in our [[Communications_Plan |Communications Plan ]].
 +
  
 
== Scrum Sprints ==
 
== Scrum Sprints ==
*[[OurScrumProcess | Our Scrum Process]]
+
* [[Previous sprints]]
* [https://trello.com/ Sprint Story Boards]
+
 
* [https://docs.google.com/forms/d/1ecotVLQFwGt7ykif8P0IoAtpFl5JiHHW9s4uEXaNQ90/viewform Standup Form]
+
*[[Year1_goals_DeepPheCR | Goals DeepPhe-CR July 2019 - June 2020]]
*[[ScrumSprint_1 | Sprint 1]]
+
*[[ScrumSprint_DeepPheCR | Sprint 1 DeepPhe-CR, August 2019]]
*[[ScrumSprint_2 | Sprint 2]]
+
*[[ScrumSprint_DeepPheCR_2 | Sprint 2 DeepPhe-CR, Sept 19 - Oct 17, 2019]]
*[[ScrumSprint_3 | Sprint 3]]
+
*[[ScrumSprint_DeepPheCR_3 | Sprint 3 DeepPhe-CR, Oct 18 - Nov 14, 2019]]
*[[ScrumSprint_4 | Sprint 4]]
+
*[[ScrumSprint_DeepPheCR_4 | Sprint 4 DeepPhe-CR, Nov 14 - Jan 9, 2020]]
*[[ScrumSprint_5 | Sprint 5]]
+
*[[ScrumSprint_DeepPheCR_5 | Sprint 5 DeepPhe-CR, Jan 9 - Feb 6, 2020]]
*[[ScrumSprint_6 | Sprint 6]]
+
*[[ScrumSprint_7 | Sprint 7]]
+
*[[ScrumSprint_8 | Sprint 8]]
+
*[[ScrumSprint_9 | Sprint 9, Feb 9 - March 15, 2016]]
+
*[[ScrumSprint_10 | Sprint 10, March 15 - April 12, 2016]]
+
*[[ScrumSprint_11 | Sprint 11, April 13 - May 10, 2016]]
+
*[[ScrumSprint_12 | Sprint 12, May 11 - June 7, 2016]]
+
*[[ScrumSprint_13 | Sprint 13, June 26 - July 26, 2016]]
+
*[[ScrumSprint_14 | Sprint 14, July 26 - August 30, 2016]]
+
*[[ScrumSprint_15 | Sprint 15, August 31 - September 27, 2016]]
+
*[[ScrumSprint_16 | Sprint 16, September 27 - October 25, 2016]]
+
*[[ScrumSprint_17 | Sprint 17, October 25 -- November 29, 2016]]
+
*[[ScrumSprint_18 | Sprint 18, November 30, 2016 -- January 3, 2017]]
+
*[[ScrumSprint_19 | Sprint 19, January 3 -- January 31, 2017]]
+
*[[ScrumSprint_20 | Sprint 20, February 1 - February 28, 2017]]
+
*[[ScrumSprint_21 | Sprint 21, March 1 - April 4, 2017]]
+
*[[ScrumSprint_22 | Sprint 22, April 5 - April 26, 2017]]
+
*[[ScrumSprint_23 | Sprint 23, April 26 - May 24, 2017]]
+
*[[ScrumSprint_24 | Sprint 24, June 6 - July 11, 2017]]
+
*[[ScrumSprint_25 | Sprint 25, July 12 - Aug 16, 2017]]
+
*[[ScrumSprint_26 | Sprint 26, Aug 17 - Sept 20, 2017]]
+
*[[ScrumSprint_27 | Sprint 27, Sept 21 - Oct 18, 2017]]
+
*[[ScrumSprint_28 | Sprint 28, Oct 19 - Nov 15, 2017]]
+
*[[ScrumSprint_29 | Sprint 29, Nov 15 - Dec 13, 2017]]
+
*[[ScrumSprint_30 | Sprint 30, Dec 13, 2017 - Jan 17, 2018]]
+
*[[ScrumSprint_31 | Sprint 31, Jan 17 - Feb 14, 2018]]
+
*[[ScrumSprint_32 | Sprint 32, Feb 15 - March 14, 2018]]
+
*[[ScrumSprint_33 | Sprint 33, March 14 - April 11, 2018]]
+
*[[ScrumSprint_34 | Sprint 34, April 12 - May 9, 2018]]
+
*[[ScrumSprint_35 | Sprint 35, May 10 - June 6, 2018]]
+
*[[ScrumSprint_36 | Sprint 36, June 6 - July 11, 2018]]
+
*[[ScrumSprint_37 | Sprint 37, July 12 - August 15, 2018]]
+
*[[ScrumSprint_38 | Sprint 38, Aug 16 - Sept 12, 2018]]
+
*[[ScrumSprint_39 | Sprint 39, Sept 13 - Oct 10, 2018]]
+
*[[ScrumSprint_40 | Sprint 40, Oct 11 - Nov 14, 2018]]
+
*[[ScrumSprint_41 | Sprint 41, Nov 15 - Dec 12, 2018]]
+
*[[ScrumSprint_42 | Sprint 42, Dec 13, 2018 - Jan 9, 2019]]
+
*[[ScrumSprint_43 | Sprint 43, Jan 10 - Feb 6, 2019]]
+
*[[ScrumSprint_44 | Sprint 44, Feb 7 - March 6, 2019]]
+
*[[ScrumSprint_45 | Sprint 45, March 7 - April 3, 2019]]
+
  
 
== Meeting Notes ==
* [[Meeting notes]]
*[[Ontology_and_Rules_DeepPhe_Meeting_01_25_2018| January 25, 2018]] Rules and Ontology Development Meeting
*[[Ontology_and_Rules_DeepPhe_Meeting_01_18_2018| January 18, 2018]] Rules and Ontology Development Meeting
*[[Ontology_and_Rules_DeepPhe_Meeting_01_11_2018| January 11, 2018]] Rules and Ontology Development Meeting
*[[Ontology_and_Rules_DeepPhe_Meeting_01_05_2018| January 5, 2018]] Rules and Ontology Development Meeting
*[[Ontology_and_Rules_DeepPhe_Meeting_12_21_2017| December 21, 2017]] Rules and Ontology Development Meeting
*[[Ontology_and_Rules_DeepPhe_Meeting_12_14_2017| December 14, 2017]] Rules and Ontology Development Meeting
*[[Ontology_and_Rules_DeepPhe_Meeting_11_16_2017| November 16, 2017]] Rules and Ontology Development Meeting
*[[Ontology_and_Rules_DeepPhe_Meeting_11_09_2017| November 9, 2017]] Rules and Ontology Development Meeting
*[[Ontology_and_Rules_DeepPhe_Meeting_11_02_2017| November 2, 2017]] Rules and Ontology Development Meeting
*[[Ontology_and_Rules_DeepPhe_Meeting_24_17_2017| October 24, 2017]] Rules and Ontology Development Meeting
*[[Ontology_and_Rules_DeepPhe_Meeting_10_17_2017| October 17, 2017]] Rules and Ontology Development Meeting
*[[Ontology_and_Rules_DeepPhe_Meeting_10_12_2017| October 12, 2017]] Rules and Ontology Development Meeting
*[[Melanoma_Rules_DeepPhe_Meeting_10_05_2017| October 5, 2017]] Melanoma Rules and Ontology Meeting
*[[Melanoma_Rules_DeepPhe_Meeting_09_28_2017| September 28, 2017]] Melanoma Rules and Ontology Meeting
*[[Melanoma_Rules_DeepPhe_Meeting_09_14_2017| September 14, 2017]] Melanoma Rules and Ontology Meeting
*[[Melanoma_Rules_DeepPhe_Meeting_09_07_2017| September 7, 2017]] Melanoma Rules and Ontology Meeting
*[[Melanoma_Rules_DeepPhe_Meeting_08_24_2017| August 24, 2017]] Melanoma Rules Meeting
*[[Melanoma_Rules_DeepPhe_Meeting_08_17_2017| August 17, 2017]] Melanoma Rules Meeting
*[[Melanoma_Rules_DeepPhe_Meeting_08_10_2017| August 10, 2017]] Melanoma Rules Meeting
*[[MelanomaRules_DeepPhe_Meeting_08032017| August 3, 2017]] Melanoma Rules Meeting
*[[Research_DeepPhe_Meeting_08272015| August 27, 2015]] Research Meeting
*[[Modeling_DeepPhe_Meeting_08032015| August 3, 2015]] Modeling Meeting
*[[Modeling_DeepPhe_Meeting_07202015| July 20, 2015]] Modeling Meeting
*[[DeepPhe_Meeting_07072015 | July 7, 2015]] Bi-weekly team meeting
*[[DeepPhe_Meeting_07012015 | July 1, 2015]] Scrum Sprint - 1
*[[DeepPhe_Meeting_06262015 | June 26, 2015]] Software architecture meeting
*[[DeepPhe_Meeting_06232015 | June 23, 2015]] Bi-weekly team meeting
*[[DeepPhe_Meeting_06092015 | June 9, 2015]] Bi-weekly team meeting
*[[DeepPhe_Meeting_05122015 | May 12, 2015]] Team meeting:DeepPhe demo
*[[DeepPhe_Meeting_05052015 | May 5, 2015]] Team meeting:DeepPhe demo
*[[DeepPhe_Meeting_04282015 | April 28, 2015]] Bi-weekly team meeting
*[[DeepPhe_Meeting_04132015 | April 13, 2015]] Bi-weekly team meeting
*[[DeepPhe_Meeting_03172015 | March 17, 2015]] Bi-weekly team meeting
*[[DeepPhe_Meeting_02232015 | February 23, 2015]] Model prioritization meeting
*[[DeepPhe_Meeting_02172015 | February 17, 2015]] Bi-weekly team meeting
*[[DeepPhe_Meeting_02032015 | February 3, 2015]] Bi-weekly team meeting
*[[BCH_DeepPhe_Meeting_01282015 | January 28, 2015]] BCH team meeting
*[[DeepPhe_Meeting_01202015 | January 20, 2015]] Bi-weekly team meeting
*[[DeepPhe_Meeting_01062015 | January 6, 2015]] Bi-weekly team meeting
*[[DeepPhe_Meeting_12092014 | December 9, 2014]] BCH team meeting
*[[DeepPhe_Meeting_12072014_a | December 9, 2014]] Bi-weekly team meeting
*[[BCH_DeepPhe_Meeting_11202014 | November 20, 2014]] BCH team meeting
*[[DeepPhe_Meeting_11112014_a | November 11, 2014]] Bi-weekly team meeting
*[[DeepPhe_Meeting_11112014 | November 11, 2014]] BCH team meeting
*[[DeepPhe_Meeting_11042014 | November 4, 2014]] BCH team meeting
*[[DeepPhe_Meeting_11032014 | November 3, 2014]] PI meeting
*[[DeepPhe_Meeting_10272014 | October 27, 2014]] Bi-weekly team meeting: Avillach's presentation on tranSMART, cTAKES and PCORI
*[[DeepPhe_Meeting_10142014 | October 14, 2014]] Bi-weekly team meeting: agenda and notes
*[[DeepPhe_Meeting_09302014 | September 30, 2014]] Bi-weekly team meeting: agenda and notes
*[[DeepPhe_Meeting_09022014 | September 2, 2014]] Bi-weekly team meeting: agenda and notes
*[[DeepPhe_Meeting_08192014 | August 19, 2014]] Bi-weekly team meeting: agenda and notes
*[[DeepPhe_Meeting_08052014 | August 5, 2014]] Bi-weekly team meeting: agenda and notes
*[[DeepPhe_Meeting_07222014 | July 22, 2014]] Bi-weekly team meeting: agenda and notes
*[[DeepPhe_Meeting_07162014 | July 15, 2014]] Bi-weekly team meeting: agenda and notes
*[[DeepPhe_Harvard_Meeting_07102914 | July 10, 2014]] Hochheiser visit to Savova group
*[[DeepPhe_Meeting_06242014 | June 24, 2014]] Bi-weekly team meeting: agenda and notes
*[[DeepPhe_Meeting_06102014 | June 10, 2014]] Bi-weekly team meeting: agenda and notes
*[[DeepPhe_Meeting_05272014 | June 3, 2014]] All hands kick-off meeting
*[[DeepPhe_Meeting_05082014 | May 08, 2014]] NCIP collaboration with UT (Bermstram/Xu)
  
== Licensing ==

[[Licensing| Licensing policies]] for DeepPhe software and ontological models.

== Contact ==

If you need assistance or if you have further questions about the project, contact us at the [https://groups.google.com/forum/#!forum/deepphe DeepPhe group].

== Getting started ==

Revision as of 18:25, 8 January 2020


Welcome to the Cancer Deep Phenotype Extraction (DeepPhe) project

Our goal is to develop novel methods for information extraction to facilitate automatic/unsupervised/minimally supervised extraction of specific discrete cancer-related data from various types of unstructured electronic medical records.


Who We Are

  • Boston Children's Hospital/Harvard Medical School
    • Guergana Savova (MPI)
    • Timothy Miller
    • Sean Finan
    • David Harris
    • Chen Lin
    • past members -- Dmitriy Dligach (currently faculty at Loyola University, Chicago), Pei Chen, James Masanz
  • University of Pittsburgh
    • Harry Hochheiser (MPI)
    • Zhou Yuan
    • past members - through June 2017: Rebecca Crowley Jacobson (MPI), Roger Day, Adrian Lee, Robert Edwards, John Kirkwood, Kevin Mitchell, Eugene Tseytlin, Girish Chavan, Melissa Castine; Liz Legowski (through Jan 2015), Olga Medvedeva, Mike Davis
  • Vanderbilt University
    • Jeremy Warner (MPI)
    • Alicia Beeghly-Fadiel
  • Dana-Farber Cancer Institute
    • Elizabeth Buchbinder
  • Kentucky Cancer Registry
    • Eric Durbin (MPI)
    • Isaac Hands
    • Jong Jeong
    • Ramakanth (Rama) Kavuluru
    • David Rust
    • Lisa Witt

Funding

This project is supported by the National Cancer Institute at the US National Institutes of Health. It is part of the NCI's Informatics Technology for Cancer Research (ITCR) Initiative (http://itcr.nci.nih.gov/). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.


Publications and presentations

Peer-reviewed publications:

    1. Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Amiri, Hadi; Bethard, Steven and Savova, Guergana. 2018. Self-training improves Recurrent Neural Networks performance for Temporal Relation Extraction. LOUHI 2018: The Ninth International Workshop on Health Text Mining and Information Analysis. Oct 31-Nov 1, 2018. Brussels, Belgium. https://aclanthology.coli.uni-saarland.de/papers/W18-5619/w18-5619
    2. Malty, Andrew M., Jain, Sandeep K., Yang, Peter C., Harvey, Krysten, Warner, Jeremy L. Computerized approach to creating a systematic ontology of hematology/oncology regimens. JCO Clinical Cancer Informatics. 2018 May 11. http://ascopubs.org/doi/full/10.1200/CCI.17.00142
    3. Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Lin, Chen; Savova, Guergana. 2017. Towards Generalizable Entity-Centric Clinical Coreference Resolution. Journal of Biomedical Informatics. Vol. 69, May 2017, pp. 251-258. https://doi.org/10.1016/j.jbi.2017.04.015; http://www.sciencedirect.com/science/article/pii/S1532046417300850
    4. Castro SM, Tseytlin E, Medvedeva O, Mitchell K, Visweswaran S, Bekhuis T, Jacobson RS. 2017. Automated annotation and classification of BI-RADS assessment from radiology reports. J Biomed Inform. 2017 May;69:177-187. doi: 10.1016/j.jbi.2017.04.011. PMID: 28428140; PMCID: PMC5706448. https://www.sciencedirect.com/science/article/pii/S1532046417300813
    5. Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2017. Representations of Time Expressions for Temporal Relation Extraction with Convolutional Neural Networks. BioNLP workshop at the Association for Computational Linguistics conference. Vancouver, Canada, Friday August 4, 2017. https://aclanthology.coli.uni-saarland.de/papers/W17-2341/w17-2341
    6. Miller, T; Bethard, S; Amiri, H; Savova, G. 2017. Unsupervised Domain Adaptation for Clinical Negation Detection. BioNLP workshop at the Association for Computational Linguistics conference. Vancouver, Canada, Friday August 4, 2017 https://aclanthology.coli.uni-saarland.de/papers/W17-2320/w17-2320
    7. Savova, G., Tseytlin, E., Finan, S., Castine, M., Miller, T., Medvedeva, O., Harris, D., Hochheiser, H., Lin, C., Chavan, G., Jacobson R. 2017. DeepPhe - A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Annual Symposium of the American Medical Informatics Association (AMIA). Nov 2017. Washington DC https://amia2017.zerista.com/event/member/389439
    8. Savova, G., Tseytlin, E., Finan, S., Castine, M., Miller, T., Medvedeva, O., Harris, D., Hochheiser, H., Lin, C., Chavan, G., Jacobson R. 2017. DeepPhe: A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Cancer Research 77(21), November 2017 DOI: 10.1158/0008-5472.CAN-17-0615. https://www.ncbi.nlm.nih.gov/pubmed/29092954
    9. Dligach, Dmitriy; Miller, Timothy; Lin, Chen; Bethard, Steven; Savova, Guergana. 2017. Neural temporal relation extraction. European Chapter of the Association for Computational Linguistics (EACL 2017). April 3-7, 2017. Valencia, Spain. https://aclanthology.coli.uni-saarland.de/papers/E17-2118/e17-2118
    10. Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2016. Improving Temporal Relation Extraction with Training Instance Augmentation. BioNLP workshop at the Association for Computational Linguistics conference. Berlin, Germany, Aug 2016 https://aclanthology.coli.uni-saarland.de/papers/W16-2914/w16-2914
    11. Hochheiser, Harry; Castine, Melissa; Harris, David; Savova, Guergana; Jacobson, Rebecca. 2016. An Information Model for Computable Cancer Phenotypes. BMC Medical Informatics and Decision Making. https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-016-0358-4 https://www.ncbi.nlm.nih.gov/pubmed/27629872
    12. Ethan Hartzell, Chen Lin. 2016. Enhancing Clinical Temporal Relation Discovery with Syntactic Embeddings from GloVe. International Conference on Intelligent Biology and Medicine (ICIBM 2016). Medical Informatics Thematic Track. December 2016, Houston, Texas, USA
    13. Dmitriy Dligach, Timothy Miller, Guergana K. Savova. 2015. Semi-supervised Learning for Phenotyping Tasks. AMIA Annual Symposium. Nov 2015, San Francisco, CA. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4765699/
    14. Lin, Chen; Dligach, Dmitriy; Miller, Timothy; Bethard, Steven; Savova, Guergana. 2015. Multilayered temporal modeling for the clinical domain. Journal of the American Medical Informatics Association. 2016 Mar;23(2):387-95. doi: 10.1093/jamia/ocv113 https://www.ncbi.nlm.nih.gov/pubmed/26521301

Peer-reviewed other:

    1. Beeghly-Fadiel, Alicia; Warner, Jeremy; Finan, Sean; Masanz, James; Hochheiser, Harry; Savova, Guergana. (under review). Deep Phenotype Extraction to Facilitate Cancer Research: Extending DeepPhe to Ovarian Cancer. American Association for Cancer Research (AACR) 2019. March 29-April 3, 2019. Atlanta, GA.
    2. Yuan, Zhou; Finan, Sean; Warner, Jeremy; Savova, Guergana; Hochheiser, Harry. 2018. Toward Longitudinal Visual Analytics for Cancer Patient Trajectories Extracted from Clinical Text. 2018 Workshop on Visual Analytics and Healthcare, Demonstration Presentation. AMIA 2018, Nov 3-7, 2018. San Francisco, CA.
    3. Chen Lin, Timothy A. Miller, Hadi Amiri, David Harris, Samuel M. Rubinstein, Jeremy Warner, Guergana K. Savova, Ph.D. 2018. Classification of electronic medical records of breast cancer and melanoma patients into clinical episodes. 30th Anniversary AACR Special Conference Convergence: Artificial Intelligence, Big Data, and Prediction of Cancer. Oct 14-17, 2018. Newport, RI, USA.
    4. Warner, Jeremy; Elhadad, Noemie; Bastarache, Lisa; Gotz, David; Savova, Guergana. 2018. Panel - Didactic: Computable Longitudinal Patient Trajectories. Annual Symposium of the American Medical Informatics Association. November, 2018. San Francisco, CA. (peer-reviewed panel)
    5. Savova G, Tseytlin E, Finan S, Castine M, Miller T, Medvedeva O, Harris D, Hochheiser H, Lin C, Chavan G, Warner JL, Jacobson R. DeepPhe – a natural language processing system for extracting cancer phenotypes from clinical records. Annual conference of the North American Association of Central Cancer Registries (NAACCR). Pittsburgh, PA.
    6. Warner JL, Harris D, Rubinstein S, Finan S, Lin C, Miller T, Amiri H, Hochheiser H, Savova G. Capturing high-resolution temporal cancer phenotypes using DeepPhe. Annual conference of the North American Association of Central Cancer Registries (NAACCR). Pittsburgh, PA.
    7. Yang PC, Malty A, Jain SK, Harvey K, Finan S, Warner JL. 2018. A Comprehensive Ontology of Hematology/Oncology Regimens. Annual conference of the North American Association of Central Cancer Registries (NAACCR). Pittsburgh, PA.
    8. Hochheiser H; Jacobson R; Washington N; Denny J; Savova G. 2015. Natural language processing for phenotype extraction: challenges and representation. AMIA Annual Symposium. Nov 2015, San Francisco, CA. (peer-reviewed panel)

Invited presentations:

    1. Savova, Guergana. 2019. Cancer Deep Phenotype Extraction from Electronic Medical Records. Molecular Med Tri-con. March 10-15, 2019. San Francisco, CA, USA
    2. Savova G. 2018. Software and Research Challenges for Clinical NLP. Dana Farber Cancer Institute; 2018 October; Boston, MA, USA.
    3. Savova, Guergana. 2018. Cancer Deep Phenotype Extraction from Electronic Medical Records (DeepPhe). College of American Pathologists Pathology Electronic Reporting meeting (CAP PERT). July 29, 2018. Montreal, QC, CA.
    4. Warner, Jeremy. 2018. A Comprehensive Ontology of Hematology/Oncology Regimens. College of American Pathologists Pathology Electronic Reporting meeting (CAP PERT). July 29, 2018. Montreal, QC, CA.
    5. Savova, G; Miller, T. 2018. DeepPhe and Extraction of Oncology Patient Phenotypes from Unstructured Text Using NLP and Other AI Tools. Presentation to Dana Farber Cancer Institute. January 24 2018. Boston, MA.
    6. Warner, Jeremy. 2017. Supporting cancer registries through automated extraction of pathology and chemotherapy regimen information. CDC/NCI/FDA/VA Clinical Natural Language Processing Workshop. Atlanta, GA.
    7. Savova, Guergana. 2017. Select Applications of Natural Language Processing in Biomedicine. Natural Language Processing Symposium, Boston University, Boston, MA. November, 2017.
    8. Jacobson, Rebecca. 2017. Invited presentation at Ohio State University James Cancer Center Grand Rounds, January 20th, 2017
    9. Jacobson, Rebecca. 2017. Invited presentation at Case Western University Comprehensive Cancer Center Seminar Series, March 10th, 2017
    10. Jacobson, Rebecca. 2016. Invited presentation of cTAKES and DeepPhe to NCI in January, 2016. Gaithersburg, MD
    11. Jacobson, Rebecca. 2016. Invited presentation in CBIIT Speaker Series, February 17, 2016. Gaithersburg, MD
    12. Jacobson, Rebecca. 2016. Invited presentation at University of Pittsburgh Cancer Informatics (UPCI) External Advisory Board, March 8, 2016
    13. Finan, Sean. 2016. cTAKES/deepPhe presentation at the ITCR workshop at CI4CC in Napa, CA
    14. Jacobson, Rebecca. 2016. Invited presentation at SEER PI meeting in New Mexico, March 16, 2016
    15. Jacobson, Rebecca. 2016. Invited presentation at University of Michigan Department of Learning Health Sciences, April 6th, 2016
    16. Jacobson, Rebecca. 2016. Invited presentation at Pathology Informatics 2016, Pittsburgh PA, May 24th, 2016
    17. Jacobson, Rebecca. 2016. Invited presentation at University of Pittsburgh Cancer Institute Scientific Retreat, Greensburg, PA, June 16th, 2016
    18. Jacobson, Rebecca and Savova, Guergana. 2016. Invited presentation at SEER meeting in Gaithersburg, MD, December 10, 2016
    19. Jacobson, Rebecca and Savova, Guergana. Invited presentation of cTAKES/DeepPhe to NCI in October, 2015

Other:


DeepPhe Software

DeepPhe release is available in


DeepPhe Gold Set

  • Process for Deidentification of Source Documents.
  • Process for Selection of Gold Set Source Documents.
  • DeepPhe Training/Development/Test splits
    • training set:
      • all documents for Breast Cancer patients 03, 11, 92, 93 for a total of 48 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev\DeepPhe Gold Phenotype Annotations_v2.xlsm
      • all documents for Breast Cancer patients extended 04,05,06,09,10,12,13,14,18,19,20,22,23,26,27,30,31,32,33,34,35,40,41,42,43,38,39,46,47 for a total of 954 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev\DeepPhe Gold Phenotype Annotations_v2.xlsm
      • all documents for Melanoma patients 05, 06, 18, 19, 25, 28, 30, 33, 34, 42 for a total of 233 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma\trainSet\DeepPhe DevSet Phenotype Annotations.xlsm
      • all documents for Ovarian Cancer patients 3, 4, 7, 8, 12, 13, 16, 17, 18, 20, 24, 25, 26, 27, 30, 31, 32, 34, 37, 38, 41, 42, 43, 44, 46, 48 for a total of 1675 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\trainSet); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\trainSet\DeepPhe_ovCa_Train_Set_Phenotype_Annotations_GOLD.xlsm
    • development set:
      • all documents for Breast Cancer patients 02, 21 for a total of 42 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev\DeepPhe Gold Phenotype Annotations_v2.xlsm
      • all documents for Breast Cancer patients extended 01,15,16,17,28,29,36,37,44,45,07,08,24,25 for a total of 457 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedDev\DeepPhe Gold Phenotype Annotations_v2.xlsm
      • all documents for Melanoma patients 07, 32, 43 for a total of 215 documents, of which only 211 were processed (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma\devSet); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma\devSet\DeepPhe DevSet Phenotype Annotations.xlsm
      • all documents for Ovarian Cancer patients 9, 11, 19, 28, 29, 35, 39, 47 for a total of 562 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\devSet); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\devSet\DeepPhe_ovCa_Dev_Set_Phenotype_Annotations_GOLD.xlsm
    • test set:
      • all documents for Breast Cancer patients 01 (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedTest); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedTest\DeepPhe Test Phenotype Annotations v2.xlsm
      • all documents for Breast Cancer extended for patients 01, 02, 09,10,12,15,17,18,19,20,23,24,27,32,36,39,44,63, 76, 100, 101, 104, 106, 109, 111, 114, 115, 117, 118, 119, 120, 121, 123, 125, 126, 129, 130, 132, 136, 137, 138, 142, 143, 155, 156, 158, 174, 181, 189, 197 for phenotyping level testing use (\\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedTest\); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\breast\UPMCextendedTest\DeepPhe Test Phenotype Annotations v2.xlsm
      • all documents for Melanoma patients 02, 03, 11, 12, 14, 16, 24, 27, 41, 44 for a total of 229 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma\testSet); gold annotations are \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\melanoma\testSet\DeepPhe TestSet Phenotype Annotations.xlsm
      • all documents for Ovarian Cancer patients 15, 21, 33, 36, 40, 45, 49, 50 for a total of 559 documents (in BCH \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\testSet); gold annotations are in \\rc-fs\chip-nlp\Public\DeepPhe\DeepPheDatasets\ovarian\final_dataset\testSet\DeepPhe_ovCa_Test_Set_Phenotype_Annotations_GOLD.xlsm
    • Use the training set to develop the algorithms and the development set to report results and perform error analysis. The test set is used only for the final evaluation reported in publications.
  • SEER Project Train/Dev/Test Splits
  • Clinical Genomics Gold Set
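
The Training/Development/Test splits above are defined per patient, so no patient should appear in more than one split. A minimal sketch of that sanity check follows, using the Melanoma patient IDs listed above. The names `MELANOMA_SPLITS`, `overlapping_patients` and `split_for` are hypothetical and introduced only for illustration; they are not part of the DeepPhe software.

```python
from itertools import combinations

# Melanoma patient IDs per split, copied from the list above.
# The dictionary layout is an assumption made for this sketch.
MELANOMA_SPLITS = {
    "train": {"05", "06", "18", "19", "25", "28", "30", "33", "34", "42"},
    "dev": {"07", "32", "43"},
    "test": {"02", "03", "11", "12", "14", "16", "24", "27", "41", "44"},
}


def overlapping_patients(splits):
    """Return {(split_a, split_b): shared_ids} for each pair of splits sharing a patient."""
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(splits.items(), 2):
        shared = ids_a & ids_b
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


def split_for(patient_id, splits):
    """Return the name of the split containing patient_id, or None if it is in no split."""
    for name, ids in splits.items():
        if patient_id in ids:
            return name
    return None


if __name__ == "__main__":
    # An empty dict means the splits are disjoint at the patient level.
    print(overlapping_patients(MELANOMA_SPLITS))
    print(split_for("32", MELANOMA_SPLITS))
```

Running the same check before evaluation would help confirm that documents from a test-set patient are never used while developing the algorithms.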


Qualitative Interviews


Project materials/ WIKIs to tasks

Communication


Scrum Sprints

Meeting Notes


Contact

If you need assistance or if you have further questions about the project, contact us at the DeepPhe group.


Getting started

Consult the User's Guide for information on using the wiki software.