User Personae
Presented here is a series of stakeholder or user descriptions, referred to here as personae, that informed preliminary development of the cancer models.
Contents
- 1 Translational Scientist with “Dry Bench” Bioinformatics skills
- 2 Clinical Translational Scientist
- 3 Population Health Scientist/Health Care Outcomes Analyst
- 4 Information Broker
- 5 Informatics Researcher
- 6 NLP Developers
- 7 Domain Specific Application Developers
- 8 Integrative Cancer Biologists and Modelers
Translational Scientist with “Dry Bench” Bioinformatics skills
Background
- PhD-trained scientist in a wide range of fields relevant to cancer (e.g. genetics, pharmacology, molecular biology, immunology)
- Analytically trained and familiar with statistical methods, including genomics/bioinformatics.
- Unfamiliar with Natural Language Processing (NLP) Concepts
- Unfamiliar with NLP tools and resources
- Limited familiarity with OO programming languages
- Familiar with text manipulation languages (e.g. Python, Perl, Ruby)
Premise/Story
Cancer biologists are unraveling the genomic and molecular changes that drive tumors towards specific behaviors such as progression and metastasis. Identifying these molecular drivers will require information about the specific cancer behaviors that they produce. This class of users will examine data for case finding and to classify cases based on outcome.
Expectations
- Population-level statistics, summarization, and comparisons.
- Graphical displays, including bar charts, error bars, etc.
- Inferential statistics
- Export to statistical software (SAS, SPSS, RapidMiner, R)
Information needs
- Demographic data
- Treatment data
- Disease progression, metastasis and other outcomes (e.g. RECIST criteria)
- Available biomarkers and other clinical molecular information not in structured format (e.g. Oncotype Scores)
Current tools and limitations
- Mac desktop, Linux and Windows computing
- Some familiarity with DBMS and data management principles
- Knowledge and use of statistical software (e.g. SAS, SPSS, RapidMiner, R), but time required to extract and format data is substantial.
- Routine access to PHI clinical text for work, able to interpret clinical text reports, but in-depth review is too error-prone and time-consuming.
Clinical Translational Scientist
Background
- MD-, DrPH-, or RN-trained scientist in a wide range of clinical specialties (e.g. oncology, surgery, medicine, pathology, epidemiology)
- Expert understanding of clinical oncology, cancer therapeutics, and patient management
- Familiar with statistical methods, including genomics/bioinformatics, but typically relies on statisticians, bioinformaticists and other collaborators and colleagues for analysis.
- Unfamiliar with NLP Concepts
- Unfamiliar with NLP tools and resources
- Unfamiliar with OO programming languages
- Unfamiliar with text manipulation languages (e.g. Python, Perl, Ruby)
Premise/story
This class of users is interested in extracting phenotype features from a set of documents, often for correlating specific features, treatments, and/or outcomes with molecular characterizations of tumors. They may also be interested in risk factors and other co-morbidities that provide insight into cancer biology (e.g. immune function). They care more about the quality of the phenotype information, and its effect on the quality of the scientific conclusions to be drawn, than about the details of the phenotype extraction methods.
Expectations
- Basic inferential statistics, summarization, and comparisons.
- Graphical displays, including bar charts, error bars, etc.
Information needs
- Demographic data
- Treatment data
- Disease progression, metastasis and other outcomes (e.g. RECIST criteria)
- Available biomarkers and other clinical molecular information not in structured format (e.g. Oncotype Scores)
Current tools and limitations
- Enterprise Windows computing
- Limited familiarity with DBMS and data management principles
- Use of statistical software (e.g. SAS, SPSS, RapidMiner, R), but time required to extract and format data is substantial.
- Routine access to PHI clinical text for work, able to interpret clinical text reports. Most able to interpret complex temporal and other relations in text, but in-depth review is too error-prone and time-consuming.
Population Health Scientist/Health Care Outcomes Analyst
Background
- Broad possibilities, including MD, PhD, MPH/DrPH, MBA.
- Analytically trained and familiar with statistical methods, but not necessarily in genomics/bioinformatics.
- Unfamiliar with NLP Concepts
- Unfamiliar with NLP tools and resources
- Unfamiliar with OO programming languages
- Unfamiliar with text manipulation languages (e.g. Python, Perl, Ruby)
- Routine access to PHI clinical text for work, easily interprets clinical text reports
- Some familiarity with DBMS and data management principles
- Cares more about accuracy of results, confidence in results, summary information, pointers to WHY particular calls were made (chain of evidence) than the details of the implementation.
Premise/story
Cancer care has significant implications in terms of costs of care, effectiveness of different treatments, and values assigned to different outcomes. This class of users will examine data to study efficacy of treatment regimes across different patient groups, to identify factors that might influence costs or improve outcomes, and to otherwise understand how to allocate limited resources to optimize outcomes.
Expectations
- Population-level statistics, summarization, and comparisons.
- Graphical displays, including bar charts, error bars, etc.
- Basic inferential statistics
- Export to statistical software (SAS, SPSS, S+, RapidMiner, R)
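The summarization and export expectations above can be illustrated with a small example. This is only a minimal sketch, assuming case data is already available as simple records; the field names (`patient_id`, `treatment`, `outcome`) and the sample values are hypothetical and do not come from the cancer models. CSV is used here because it is a plain-text format that statistical packages can generally import.

```python
import csv
import io
from collections import Counter

# Hypothetical case records; field names and values are illustrative only.
cases = [
    {"patient_id": "P001", "treatment": "chemotherapy", "outcome": "progression"},
    {"patient_id": "P002", "treatment": "chemotherapy", "outcome": "stable"},
    {"patient_id": "P003", "treatment": "radiation", "outcome": "stable"},
]

FIELDS = ["patient_id", "treatment", "outcome"]


def export_csv(rows, handle):
    """Write one CSV row per case, with a header row, to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def outcome_counts(rows):
    """Count cases per (treatment, outcome) pair as a population-level summary."""
    return Counter((row["treatment"], row["outcome"]) for row in rows)


# Export to an in-memory buffer; a real export would open a file instead.
buffer = io.StringIO()
export_csv(cases, buffer)
csv_text = buffer.getvalue()

counts = outcome_counts(cases)
for (treatment, outcome), n in sorted(counts.items()):
    print(treatment, outcome, n)
```

The resulting CSV text could be written to disk and loaded into the analyst's preferred statistical software, while the counts give a quick comparison across treatment groups.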
Information needs
- Demographic data
- Treatment data
- Disease progress
- Outcomes
- Treatment context: physician, ward, etc.
Current tools and limitations
- Windows-based enterprise computing
- Some familiarity with DBMS and data management principles
- Knowledge and use of statistical software (e.g. SAS, SPSS, S+, RapidMiner), but time required to extract and format data is substantial.
- Routine access to PHI clinical text for work, able to interpret clinical text reports, but in-depth review is too error-prone and time-consuming.
Information Broker
Background
- Unfamiliar with NLP Concepts
- Unfamiliar with NLP tools and resources
- Unfamiliar with OO programming languages
- Unfamiliar with text manipulation languages (e.g. Python, Perl, Ruby)
- Familiar with DBMS and data management principles
- Limited or no ability to interpret clinical text reports
Premise/story
This class comprises users who are employed at medical research institutions, educational institutions and software companies interested in the healthcare domain. Their daily job involves oversight of data storage solutions. They work with other user groups to identify data requirements and design data storage format, media, and access. They may be open to using existing tools and methods as part of their solutions, but prefer to stick to their established tools and types of filesystems, shares, databases, etc., as well as their currently used methods for storage and retrieval. Their integration would be limited to accepting NLP output and storing it, possibly in a modified form, or providing mechanisms for NLP output providers to do so themselves. The Information Broker may act as a middleman between NLP output providers and NLP output end users. Success, by the Information Broker's definition, is a simple and consistent tool or method for the migration of NLP output data to a store from which that data is easily accessible by end users.
Expectations
- Documentation about the types and formats of NLP output.
- NLP output must be easy to integrate into their own data storage formats.
Information needs
- Information needs are dictated by the end users.
Information constraints
- The Information Broker should be able to work with a wide variety of standard data types and formats.
- The Information Broker should not be expected to work with any data type and format not of their choosing.
Current tools and limitations
- May not have consistent or in-depth communication with end users.
- May not have a complete understanding of what types of data end users may desire.
- Wide array of system environments, databases and tools may be used.
- Lack of standard NLP output data types makes it hard to create a universal storage format.
- Lack of standard NLP output data formats makes it hard to identify or create universal tools and methods for NLP data consumption.
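The Information Broker's goal of migrating NLP output to a store that end users can query might look roughly like the following. This is a minimal sketch under stated assumptions: the record fields (`patient_id`, `concept`, `value`), the table name `nlp_findings`, and the choice of SQLite are all hypothetical and are not taken from any real NLP system's output format.

```python
import sqlite3

# Hypothetical NLP output records; the field names are illustrative only.
records = [
    {"patient_id": "P001", "concept": "metastasis", "value": "present"},
    {"patient_id": "P002", "concept": "metastasis", "value": "absent"},
]


def store_nlp_output(conn, rows):
    """Create a simple findings table if needed and insert one row per record."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nlp_findings ("
        "patient_id TEXT, concept TEXT, value TEXT)"
    )
    # Named placeholders are filled from each record's dictionary keys.
    conn.executemany(
        "INSERT INTO nlp_findings VALUES (:patient_id, :concept, :value)",
        rows,
    )
    conn.commit()


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
store_nlp_output(conn, records)

count = conn.execute("SELECT COUNT(*) FROM nlp_findings").fetchone()[0]
print(count)
```

In practice the broker would substitute their established database and schema; the point is only that a documented, consistent record shape keeps the loading step short.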
Informatics Researcher
Background
- Familiar with NLP Concepts
- Familiar with NLP tools and resources
- Familiar with OO programming languages, but typically not expert
- Familiar with text manipulation languages (e.g. Python, Perl, Ruby), but typically not expert
- Limited or no ability to interpret clinical text reports
- Familiar with basic statistical analyses
Premise/story
This class comprises users who are employed at medical research institutions, educational institutions, companies in the medical industry and software companies interested in the healthcare domain. They are interested in research and development in the NLP domain, which may or may not be their full-time job. Specific interests will vary from NLP efficiency to output coverage and accuracy. Their goal is to run software as a finished product in order to develop and test new applications of technology, and they are more interested in assembling workflows and changing parameters than in modifying and extending code.
Expectations
- Documentation about the types and formats of NLP input and output.
- Documentation about the NLP software and model implementation.
- Documentation about NLP system tests and results.
- Documentation about the available NLP software configuration parameters.
- NLP software is easy to install, configure, and run "out of the box".
Information needs
- UMLS type Named Entities
- Temporal Expressions
- Temporal Events
- Temporal Relations
Information constraints
- The Informatics Researcher must be able to utilize at least one standard cTAKES data format: UIMA CAS, XMI file.
Current tools and limitations
- Wide array of system environments and tools may be used.
- May not have experience with cTAKES or UIMA.
NLP Developers
Background
- 2-3 years of experience working as a programmer, at least 6 months in the NLP domain.
- Familiar with NLP Concepts
- Familiar with NLP tools and resources
- Familiar with programming languages
- Limited or no ability to interpret clinical text reports
- Familiar with basic statistical analyses, particularly with respect to evaluation of NLP models.
Premise/story
This class of users is employed at medical research institutions, educational institutions and software companies interested in the healthcare domain. Their daily job involves writing NLP algorithms and systems to extract information from free text. They work with other user groups, such as domain scientists, to identify the data source and the type of information that needs extraction. To make their lives easier, they are willing to get under the hood: to modify and extend existing software, and to modify algorithms and extraction targets to suit their needs.
Expectations
- Open source software
- Documentation about the features and limitations of the tools/libraries.
- Tutorials and instructions on configuring/customizing the software to their needs.
- Software must be easy to integrate into their own code.
- Tutorials and code samples on extending the software.
Current tools and limitations
- GATE, UIMA, various NLP algorithms and libraries.
- Lack of standardization makes it hard to integrate and use different tools and libraries with each other.
Domain Specific Application Developers
Background
- 2-3 years of experience working as a programmer.
- Unfamiliar with NLP Concepts
- Unfamiliar with NLP tools and resources
- Familiar with programming languages
- Limited or no ability to interpret clinical text reports
Premise/story
This class of users is employed at medical research institutions, educational institutions and software companies interested in the healthcare domain. Their daily job involves writing code for software applications/tools. They work with other user groups to identify the requirements and design software solutions. They are open to using existing tools/libraries as part of their solutions. Their integration would be limited to using the NLP modules as black boxes and writing input and output translators to plug these modules into their solutions.
Expectations
- Documentation about the features and limitations of the tools/libraries.
- Tutorials and instructions on configuring/customizing the software to their needs.
- Software must be easy to integrate into their own code.
Current tools and limitations
- Wide array of IDEs and libraries may be used.
- Lack of standardization makes it hard to integrate and use different tools and libraries with each other.
Integrative Cancer Biologists and Modelers
Background
- MD, PhD, MPH/DrPH.
- Broad knowledge of cancer biology
- Analytically trained and familiar with statistical and modeling methods
- Some knowledge of genomics/bioinformatics data and analysis methods.
- Familiar with OO and other programming languages
- Unfamiliar with NLP Concepts
- Unfamiliar with NLP tools and resources
- May not be familiar with text manipulation languages (e.g. Python, Perl, Ruby)
- May not be familiar with DBMS and data management principles
- Cares more about accuracy of results, confidence in results, summary information, pointers to WHY particular calls were made (chain of evidence) than the details of the implementation.
- Limited or no ability to interpret clinical text reports
Premise/story
The user (or team) could tackle questions about the evolutionary history of tumors in response to treatment. Metastasis is usually the killer of cancer patients. Usually treatment resistance mechanisms appear and tumors become ever more refractory. Thus a combination of systems biology and tumor population dynamics modeling could address important treatment questions, if detailed longitudinal clinical data are coupled to tumor genomics.
Expectations
- Ability to download or otherwise access data in a form suitable for processing.
Information needs
- Demographic, clinico-pathologic and comorbidity data
- Detailed treatment data
- Detailed disease progression data, including censoring dates
- Outcomes
- Genomic and other bioinformatics data when available.