- 1 Welcome to the Cancer Deep Phenotype Extraction (DeepPhe) project
- 2 Who We Are
- 3 Funding
- 4 Publications and presentations
- 5 DeepPhe Software
- 6 DeepPhe Gold Set
- 7 Qualitative Interviews
- 8 Project materials/ WIKIs to tasks
- 9 Communication
- 10 Scrum Sprints
- 11 Meeting Notes
- 12 Contact
- 13 Getting started
Welcome to the Cancer Deep Phenotype Extraction (DeepPhe) project
Our goal is to develop novel methods for information extraction to facilitate automatic/unsupervised/minimally supervised extraction of specific discrete cancer-related data from various types of unstructured electronic medical records.
Who We Are
- Boston Children's Hospital/Harvard Medical School
- Guergana Savova (MPI)
- Timothy Miller
- Sean Finan
- David Harris
- Chen Lin
- past members -- Dmitriy Dligach (currently faculty at Loyola University, Chicago), Pei Chen, James Masanz
- University of Pittsburgh
- Harry Hochheiser (MPI)
- Zhou Yuan
- past members - through June 2017: Rebecca Crowley Jacobson (MPI), Roger Day, Adrian Lee, Robert Edwards, John Kirkwood, Kevin Mitchell, Eugene Tseytlin, Girish Chavan, Melissa Castine; Liz Legowski (through Jan 2015), Olga Medvedeva, Mike Davis
- Vanderbilt University
- Jeremy Warner (MPI)
- Alicia Beeghly-Fadiel
- Dana-Farber Cancer Institute
- Elizabeth Buchbinder
- Kentucky Cancer Registry
- Eric Durbin (MPI)
- Isaac Hands
- Jong Jeong
- Ramakanth (Rama) Kavuluru
- David Rust
- Lisa Witt
Funding
The project described is supported by the National Cancer Institute at the US National Institutes of Health. It is part of the NCI's Informatics Technology for Cancer Research (ITCR) Initiative (http://itcr.nci.nih.gov/). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Publications and presentations
- Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Amiri, Hadi; Bethard, Steven and Savova, Guergana. 2018. Self-training improves Recurrent Neural Networks performance for Temporal Relation Extraction. LOUHI 2018: The Ninth International Workshop on Health Text Mining and Information Analysis. Oct 31-Nov 1, 2018. Brussels, Belgium. https://aclanthology.coli.uni-saarland.de/papers/W18-5619/w18-5619
- Malty, Andrew M., Jain, Sandeep K., Yang, Peter C., Harvey, Krysten, Warner, Jeremy L. Computerized approach to creating a systematic ontology of hematology/oncology regimens. JCO Clinical Cancer Informatics. 2018 May 11. http://ascopubs.org/doi/full/10.1200/CCI.17.00142
- Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Lin, Chen; Savova, Guergana. 2017. Towards Generalizable Entity-Centric Clinical Coreference Resolution. Journal of Biomedical Informatics. Vol. 69, May 2017, pp. 251-258. https://doi.org/10.1016/j.jbi.2017.04.015; http://www.sciencedirect.com/science/article/pii/S1532046417300850
- Castro SM, Tseytlin E, Medvedeva O, Mitchell K, Visweswaran S, Bekhuis T, Jacobson RS. 2017. Automated annotation and classification of BI-RADS assessment from radiology reports. J Biomed Inform. 2017 May;69:177-187. doi: 10.1016/j.jbi.2017.04.011. PMID: 28428140; PMCID: PMC5706448. https://www.sciencedirect.com/science/article/pii/S1532046417300813
- Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2017. Representations of Time Expressions for Temporal Relation Extraction with Convolutional Neural Networks. BioNLP workshop at the Association for Computational Linguistics conference. Vancouver, Canada, Friday August 4, 2017. https://aclanthology.coli.uni-saarland.de/papers/W17-2341/w17-2341
- Miller, T; Bethard, S; Amiri, H; Savova, G. 2017. Unsupervised Domain Adaptation for Clinical Negation Detection. BioNLP workshop at the Association for Computational Linguistics conference. Vancouver, Canada, Friday August 4, 2017. https://aclanthology.coli.uni-saarland.de/papers/W17-2320/w17-2320
- Savova, G., Tseytlin, E., Finan, S., Castine, M., Miller, T., Medvedeva, O., Harris, D., Hochheiser, H., Lin, C., Chavan, G., Jacobson R. 2017. DeepPhe - A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Annual Symposium of the American Medical Informatics Association (AMIA). Nov 2017. Washington, DC. https://amia2017.zerista.com/event/member/389439
- Savova, G., Tseytlin, E., Finan, S., Castine, M., Miller, T., Medvedeva, O., Harris, D., Hochheiser, H., Lin, C., Chavan, G., Jacobson R. 2017. DeepPhe: A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Cancer Research 77(21), November 2017. DOI: 10.1158/0008-5472.CAN-17-0615. https://www.ncbi.nlm.nih.gov/pubmed/29092954
- Dligach, Dmitriy; Miller, Timothy; Lin, Chen; Bethard, Steven; Savova, Guergana. 2017. Neural temporal relation extraction. European Chapter of the Association for Computational Linguistics (EACL 2017). April 3-7, 2017. Valencia, Spain. https://aclanthology.coli.uni-saarland.de/papers/E17-2118/e17-2118
- Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2016. Improving Temporal Relation Extraction with Training Instance Augmentation. BioNLP workshop at the Association for Computational Linguistics conference. Berlin, Germany, Aug 2016. https://aclanthology.coli.uni-saarland.de/papers/W16-2914/w16-2914
- Hochheiser, Harry; Castine, Melissa; Harris, David; Savova, Guergana; Jacobson, Rebecca. 2016. An Information Model for Computable Cancer Phenotypes. BMC Medical Informatics and Decision Making. https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-016-0358-4 https://www.ncbi.nlm.nih.gov/pubmed/27629872
- Ethan Hartzell, Chen Lin. 2016. Enhancing Clinical Temporal Relation Discovery with Syntactic Embeddings from GloVe. International Conference on Intelligent Biology and Medicine (ICIBM 2016). Medical Informatics Thematic Track. December 2016, Houston, Texas, USA
- Dmitriy Dligach, Timothy Miller, Guergana K. Savova. 2015. Semi-supervised Learning for Phenotyping Tasks. AMIA Annual Symposium. Nov 2015, San Francisco, CA. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4765699/
- Lin, Chen; Dligach, Dmitriy; Miller, Timothy; Bethard, Steven; Savova, Guergana. 2015. Multilayered temporal modeling for the clinical domain. Journal of the American Medical Informatics Association. 2016 Mar;23(2):387-95. doi: 10.1093/jamia/ocv113. https://www.ncbi.nlm.nih.gov/pubmed/26521301
- Beeghly-Fadiel, Alicia; Warner, Jeremy; Finan, Sean; Masanz, James; Hochheiser, Harry; Savova, Guergana. (under review). Deep Phenotype Extraction to Facilitate Cancer Research: Extending DeepPhe to Ovarian Cancer. American Association for Cancer Research (AACR) 2019. March 29-April 3, 2019. Atlanta, GA.
- Yuan, Zhou; Finan, Sean; Warner, Jeremy; Savova, Guergana; Hochheiser, Harry. 2018. Toward Longitudinal Visual Analytics for Cancer Patient Trajectories Extracted from Clinical Text. 2018 Workshop on Visual Analytics and Healthcare, Demonstration Presentation. AMIA 2018, Nov 3-7, 2018. San Francisco, CA.
- Chen Lin, Timothy A. Miller, Hadi Amiri, David Harris, Samuel M. Rubinstein, Jeremy Warner, Guergana K. Savova, Ph.D. 2018. Classification of electronic medical records of breast cancer and melanoma patients into clinical episodes. 30th Anniversary AACR Special Conference Convergence: Artificial Intelligence, Big Data, and Prediction of Cancer. Oct 14-17, 2018. Newport, RI, USA.
- Warner, Jeremy; Elhadad, Noemie; Bastarache, Lisa; Gotz, David; Savova, Guergana. 2018. Panel - Didactic: Computable Longitudinal Patient Trajectories. Annual Symposium of the American Medical Informatics Association. November, 2018. San Francisco, CA. (peer-reviewed panel)
- Savova G, Tseytlin E, Finan S, Castine M, Miller T, Medvedeva O, Harris D, Hochheiser H, Lin C, Chavan G, Warner JL, Jacobson R. DeepPhe – a natural language processing system for extracting cancer phenotypes from clinical records. Annual conference of the North American Association of Central Cancer Registries (NAACCR). Pittsburgh, PA.
- Warner JL, Harris D, Rubinstein S, Finan S, Lin C, Miller T, Amiri H, Hochheiser H, Savova G. Capturing high-resolution temporal cancer phenotypes using DeepPhe. Annual conference of the North American Association of Central Cancer Registries (NAACCR). Pittsburgh, PA.
- Yang PC, Malty A, Jain SK, Harvey K, Finan S, Warner JL. 2018. A Comprehensive Ontology of Hematology/Oncology Regimens. Annual conference of the North American Association of Central Cancer Registries (NAACCR). Pittsburgh, PA.
- Hochheiser H; Jacobson R; Washington N; Denny J; Savova G. 2015. Natural language processing for phenotype extraction: challenges and representation. AMIA Annual Symposium. Nov 2015, San Francisco, CA. (peer-reviewed panel)
- Savova, Guergana. 2019. Cancer Deep Phenotype Extraction from Electronic Medical Records. Molecular Med Tri-con. March 10-15, 2019. San Francisco, CA, USA
- Savova G. 2018. Software and Research Challenges for Clinical NLP. Dana Farber Cancer Institute; 2018 October; Boston, MA, USA.
- Savova, Guergana. 2018. Cancer Deep Phenotype Extraction from Electronic Medical Records (DeepPhe). College of American Pathologists Pathology Electronic Reporting meeting (CAP PERT). July 29, 2018. Montreal, QC, Canada.
- Warner, Jeremy. 2018. A Comprehensive Ontology of Hematology/Oncology Regimens. College of American Pathologists Pathology Electronic Reporting meeting (CAP PERT). July 29, 2018. Montreal, QC, Canada.
- Savova, G; Miller, T. 2018. DeepPhe and Extraction of Oncology Patient Phenotypes from Unstructured Text Using NLP and Other AI Tools. Presentation to Dana Farber Cancer Institute. January 24 2018. Boston, MA.
- Warner, Jeremy. 2017. Supporting cancer registries through automated extraction of pathology and chemotherapy regimen information. CDC/NCI/FDA/VA Clinical Natural Language Processing Workshop. Atlanta, GA.
- Savova, Guergana. 2017. Select Applications of Natural Language Processing in Biomedicine. Natural Language Processing Symposium, Boston University, Boston, MA. November, 2017.
- Jacobson, Rebecca. 2017. Invited presentation at Ohio State University James Cancer Center Grand Rounds, January 20th, 2017
- Jacobson, Rebecca. 2017. Invited presentation at Case Western University Comprehensive Cancer Center Seminar Series, March 10th, 2017
- Jacobson, Rebecca. 2016. Invited presentation of cTAKES and DeepPhe to NCI in January, 2016. Gaithersburg, MD
- Jacobson, Rebecca. 2016. Invited presentation in CBIIT Speaker Series, February 17, 2016. Gaithersburg, MD
- Jacobson, Rebecca. 2016. Invited presentation at University of Pittsburgh Cancer Informatics (UPCI) External Advisory Board, March 8, 2016
- Finan, Sean. 2016. cTAKES/deepPhe presentation at the ITCR workshop at CI4CC in Napa, CA
- Jacobson, Rebecca. 2016. Invited presentation at SEER PI meeting in New Mexico, March 16, 2016
- Jacobson, Rebecca. 2016. Invited presentation at University of Michigan Department of Learning Health Sciences, April 6th, 2016
- Jacobson, Rebecca. 2016. Invited presentation at Pathology Informatics 2016, Pittsburgh PA, May 24th, 2016
- Jacobson, Rebecca. 2016. Invited presentation at University of Pittsburgh Cancer Institute Scientific Retreat, Greensburg, PA, June 16th, 2016
- Jacobson, Rebecca and Savova, Guergana. 2016. Invited presentation at SEER meeting in Gaithersburg, MD, December 10, 2016
- Jacobson, Rebecca and Savova, Guergana. Invited presentation of cTAKES/DeepPhe to NCI in October, 2015
- Interview with Uduak Thomas of the GenomeWeb magazine. May 16, 2014. https://www.genomeweb.com/informatics/upitt-bch-team-use-696k-grant-develop-nlp-based-tools-extract-phenotype-data-emr#.W3HF1NJKi70
DeepPhe Software
- Project website: cancer.healthnlp.org
- Github repository: https://github.com/DeepPhe
- Listed on the ITCR website, Tools: https://itcr.cancer.gov/informatics-tools
DeepPhe release is available in
DeepPhe Gold Set
- Process for Deidentification of Source Documents.
- Process for Selection of Gold Set Source Documents.
- DeepPhe Training/Development/Test splits
- training set:
- development set:
- test set:
- Use the training set for developing the algorithms and the development set to report results and error analysis. The test set will be used only for the final evaluation to go in publications.
- SEER Project Train/Dev/Test Splits
- Clinical Genomics Gold Set
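The training/development/test protocol listed above can be illustrated with a minimal sketch. This is not the project's actual splitting procedure, which is not described on this page. It assumes that documents are grouped by patient so that all notes for one patient land in the same split, and that assignment is driven by a hash of the patient ID. The function names, the ID format, and the 10% dev and test fractions are all hypothetical.

```python
import hashlib


def assign_split(patient_id: str, dev_fraction: float = 0.1,
                 test_fraction: float = 0.1) -> str:
    """Deterministically map a patient ID to 'train', 'dev', or 'test'.

    Hypothetical helper: hashing the ID keeps the assignment stable
    across runs without storing a random seed.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    # Scale the first 4 bytes of the hash into the half-open range [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + dev_fraction:
        return "dev"
    return "train"


def split_documents(docs):
    """Group (patient_id, doc_id) pairs by split.

    Every document of a given patient goes to the same split, so the
    held-out test set shares no patients with train or dev.
    """
    splits = {"train": [], "dev": [], "test": []}
    for patient_id, doc_id in docs:
        splits[assign_split(patient_id)].append(doc_id)
    return splits
```

Under this scheme, algorithms would be developed on `splits["train"]`, results and error analysis reported on `splits["dev"]`, and `splits["test"]` left untouched until the final evaluation.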
Qualitative Interviews
- Detailed Stakeholder Descriptions.
- Interview Protocol
- Contextual Design Notes
- Notes on interviews with informants
Project materials/ WIKIs to tasks
- Weekly team meetings
- Tools we use for communication are listed in our Communications Plan.
Scrum Sprints
- Goals DeepPhe-CR July 2019 - June 2020
- Sprint 1 DeepPhe-CR, August 2019
- Sprint 2 DeepPhe-CR, Sept 19 - Oct 17, 2019
Contact
If you need assistance or if you have further questions about the project, contact us at the DeepPhe group.
Getting started
Consult the User's Guide for information on using the wiki software.