User Personae

This page presents a series of stakeholder or user descriptions, referred to here as personae, which informed preliminary development of the cancer models.

1 Translational Scientist with “Dry Bench” Bioinformatics skills
2 Clinical Translational Scientist
3 Population Health Scientist/Health Care Outcomes Analyst
4 Information Broker
5 Informatics Researcher
6 NLP Developers
7 Domain Specific Application Developers
8 Integrative Cancer Biologists and Modelers

Translational Scientist with “Dry Bench” Bioinformatics skills

Background

PhD-trained scientist in a wide range of fields relevant to cancer (e.g. genetics, pharmacology, molecular biology, immunology)
Analytically trained and familiar with statistical methods, including genomics/bioinformatics.
Unfamiliar with Natural Language Processing (NLP) Concepts
Unfamiliar with NLP tools and resources
Limited familiarity with OO programming languages
Familiar with text manipulation languages (e.g. Python, Perl, Ruby)

Premise/Story

Cancer biologists are unraveling the genomic and molecular changes that drive tumors towards specific behaviors such as progression and metastasis. Identifying these molecular drivers will require information about the specific cancer behaviors that they produce. This class of users will examine data for case finding and to classify cases based on outcome.

Expectations

Population-level statistics, summarization, and comparisons.
Graphical displays, including bar charts, error bars, etc.
Inferential statistics
Export to statistical software (SAS, SPSS, RapidMiner, R)

Information needs

Demographic data
Treatment data
Disease progression, metastasis and other outcomes (e.g. RECIST criteria)
Available biomarkers and other clinical molecular information not in structured format (e.g. Oncotype Scores)

Current tools and limitations

Mac desktop, Linux and Windows computing
Some familiarity with DBMS and data management principles
Knowledge and use of statistical software (e.g. SAS, SPSS, RapidMiner, R), but time required to extract and format data is substantial.
Routine access to PHI clinical text for work, able to interpret clinical text reports, but in-depth review is too error-prone and time-consuming.
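
As a rough illustration of the "export to statistical software" expectation for this persona, the sketch below writes per-patient summaries to CSV, a format that SAS, SPSS, RapidMiner, and R can all import. The record fields, the sample values, and the helper names `write_csv` and `outcome_counts` are assumptions made for this example; they are not part of the cancer models or of any existing tool.

```python
import csv
import io
from collections import Counter

# Hypothetical per-patient summaries; field names and values are illustrative only.
PATIENTS = [
    {"patient_id": "P001", "age": 54, "treatment": "tamoxifen", "outcome": "progressive disease"},
    {"patient_id": "P002", "age": 61, "treatment": "letrozole", "outcome": "stable disease"},
    {"patient_id": "P003", "age": 47, "treatment": "tamoxifen", "outcome": "progressive disease"},
]
FIELDS = ["patient_id", "age", "treatment", "outcome"]


def write_csv(records, fields, handle):
    """Write one header row plus one row per patient to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)


def outcome_counts(records):
    """Population-level summary: number of patients per outcome category."""
    return Counter(record["outcome"] for record in records)


# An in-memory buffer stands in for a file on disk.
buf = io.StringIO()
write_csv(PATIENTS, FIELDS, buf)
print(buf.getvalue())
print(outcome_counts(PATIENTS))
```

A file written this way could then be loaded with, for example, `read.csv` in R, which would remove much of the manual extraction and formatting effort described above.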

Clinical Translational Scientist

Background

MD-, DrPH-, or RN-trained scientist in a wide range of clinical specialties (e.g. oncology, surgery, medicine, pathology, epidemiology)
Expert understanding of clinical oncology, cancer therapeutics, and patient management
Familiar with statistical methods, including genomics/bioinformatics, but typically relies on statisticians, bioinformaticists and other collaborators and colleagues for analysis.
Unfamiliar with NLP Concepts
Unfamiliar with NLP tools and resources
Unfamiliar with OO programming languages
Unfamiliar with text manipulation languages (e.g. Python, Perl, Ruby)

Premise/story

This class of users is interested in extracting phenotype features from a set of documents, often for correlating specific features, treatments, and/or outcomes with molecular characterizations of tumors. They may also be interested in risk factors and other co-morbidities that provide insight into cancer biology (e.g. immune function). They care more about the quality of the phenotype information, and its bearing on the quality of the scientific conclusions to be developed, than about the details of the phenotype extraction methods.

Expectations

Basic inferential statistics, summarization, and comparisons.
Graphical displays, including bar charts, error bars, etc.

Information needs

Demographic data
Treatment data
Disease progression, metastasis and other outcomes (e.g. RECIST criteria)
Available biomarkers and other clinical molecular information not in structured format (e.g. Oncotype Scores)

Current tools and limitations

Enterprise Windows computing
Limited familiarity with DBMS and data management principles
Use of statistical software (e.g. SAS, SPSS, RapidMiner, R), but time required to extract and format data is substantial.
Routine access to PHI clinical text for work, able to interpret clinical text reports. Most able to interpret complex temporal and other relations in text, but in-depth review is too error-prone and time-consuming.

Population Health Scientist/Health Care Outcomes Analyst

Background

Broad possibilities, including MD, PhD, MPH/DrPH, MBA.
Analytically trained and familiar with statistical methods, but not necessarily in genomics/bioinformatics.
Unfamiliar with NLP Concepts
Unfamiliar with NLP tools and resources
Unfamiliar with OO programming languages
Unfamiliar with text manipulation languages (e.g. Python, Perl, Ruby)
Routine access to PHI clinical text for work, easily interprets clinical text reports
Some familiarity with DBMS and data management principles
Cares more about accuracy of results, confidence in results, summary information, pointers to WHY particular calls were made (chain of evidence) than the details of the implementation.

Premise/story

Cancer care has significant implications in terms of costs of care, effectiveness of different treatments, and values assigned to different outcomes. This class of users will examine data to study the efficacy of treatment regimens across different patient groups, to identify factors that might influence costs or improve outcomes, and otherwise to understand how to allocate limited resources to optimize outcomes.

Expectations

Population-level statistics, summarization, and comparisons.
Graphical displays, including bar charts, error bars, etc.
Basic inferential statistics
Export to statistical software (SAS, SPSS, S+, RapidMiner, R)

Information needs

Demographic data
Treatment data
Disease progression
Outcomes
Treatment context: physician, ward, etc.

Current tools and limitations

Windows-based enterprise computing
Some familiarity with DBMS and data management principles
Knowledge and use of statistical software (e.g. SAS, SPSS, S+, RapidMiner), but time required to extract and format data is substantial.
Routine access to PHI clinical text for work, able to interpret clinical text reports, but in-depth review is too error-prone and time-consuming.

Information Broker

Background

Unfamiliar with NLP Concepts
Unfamiliar with NLP tools and resources
Unfamiliar with OO programming languages
Unfamiliar with text manipulation languages (e.g. Python, Perl, Ruby)
Familiar with DBMS and data management principles
Limited or no ability to interpret clinical text reports

Premise/story

This class comprises users employed at medical research institutions, educational institutions and software companies interested in the healthcare domain. Their daily job involves oversight of data storage solutions. They work with other user groups to identify data requirements and to design data storage formats, media, and access. They may be open to using existing tools and methods as part of their solutions, but prefer to stick with their established tools, their existing types of filesystems, shares, databases, etc., and their current methods for storage and retrieval. Their integration would be limited to accepting NLP output and storing it, possibly in a modified form, or providing mechanisms for NLP output providers to do so themselves. The Information Broker may act as a middleman between NLP output providers and NLP output end users. Success, by the Information Broker's definition, is a simple and consistent tool or method for migrating NLP output data to a store from which end users can easily access it.

Expectations

Documentation about the types and formats of NLP output.
NLP output must be easy to integrate into their own data storage formats.

Information needs

Information needs are dictated by the end users.

Information constraints

The Information Broker should be able to work with a wide variety of standard data types and formats.
The Information Broker should not be expected to work with any data type or format not of their choosing.

Current tools and limitations

May not have consistent or in-depth communication with end users.
May not have a complete understanding of what types of data end users may desire.
Wide array of system environments, databases and tools may be used.
Lack of standard NLP output data types makes it hard to create a universal storage format.
Lack of standard NLP output data formats makes it hard to identify or create universal tools and methods for NLP data consumption.
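
To make the Information Broker's notion of success more concrete, the sketch below loads NLP output into a relational store and then runs the kind of simple query an end user might ask of it. Everything in it is an assumption made for illustration: the JSON-lines layout, the field names, the `mention` table, and the helpers `load_mentions` and `patients_with` are hypothetical and do not describe an actual cTakes or UIMA output format.

```python
import json
import sqlite3

# Hypothetical NLP output, one JSON object per extracted mention.
# Field names are illustrative assumptions, not a real NLP output schema.
SAMPLE_OUTPUT = """\
{"patient_id": "P001", "doc_id": "D17", "type": "Metastasis", "text": "liver metastasis", "begin": 120, "end": 136}
{"patient_id": "P001", "doc_id": "D17", "type": "Treatment", "text": "tamoxifen", "begin": 210, "end": 219}
{"patient_id": "P002", "doc_id": "D03", "type": "Metastasis", "text": "bone mets", "begin": 45, "end": 54}
"""


def load_mentions(conn, jsonl_text):
    """Create the mention table if needed and insert one row per JSON line."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS mention (
               patient_id TEXT, doc_id TEXT, mention_type TEXT,
               mention_text TEXT, begin_offset INTEGER, end_offset INTEGER)"""
    )
    rows = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rec = json.loads(line)
        rows.append((rec["patient_id"], rec["doc_id"], rec["type"],
                     rec["text"], rec["begin"], rec["end"]))
    conn.executemany("INSERT INTO mention VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def patients_with(conn, mention_type):
    """Return the sorted IDs of patients with at least one mention of the given type."""
    cur = conn.execute(
        "SELECT DISTINCT patient_id FROM mention "
        "WHERE mention_type = ? ORDER BY patient_id",
        (mention_type,),
    )
    return [row[0] for row in cur.fetchall()]


# An in-memory database stands in for the broker's own store.
conn = sqlite3.connect(":memory:")
loaded = load_mentions(conn, SAMPLE_OUTPUT)
print(loaded, patients_with(conn, "Metastasis"))
```

SQLite is used here only because it ships with Python; a broker would more likely map the same fields onto whatever database or file share they already maintain.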

Informatics Researcher

Background

Familiar with NLP Concepts
Familiar with NLP tools and resources
Familiar with OO programming languages, but typically not expert
Familiar with text manipulation languages (e.g. Python, Perl, Ruby), but typically not expert
Limited or no ability to interpret clinical text reports
Familiar with basic statistical analyses

Premise/story

This class comprises users employed at medical research institutions, educational institutions, companies in the medical industry and software companies interested in the healthcare domain. They are interested in research and development in the NLP domain, which may or may not be their full-time job. Specific interests will vary from NLP efficiency to output coverage and accuracy. Their goal is to run software as a finished product in order to develop and test new applications of technology, and they are more interested in assembling workflows and changing parameters than in modifying and extending code.

Expectations

Documentation about the types and formats of NLP input and output.
Documentation about the NLP software and model implementation.
Documentation about NLP system tests and results.
Documentation about the available NLP software configuration parameters.
NLP software is easy to install, configure, and run "out of the box".

Information needs

UMLS type Named Entities
Temporal Expressions
Temporal Events
Temporal Relations

Information constraints

The Informatics Researcher must be able to utilize at least one standard cTAKES data format: UIMA CAS or XMI file.

Current tools and limitations

Wide array of system environments and tools may be used.
May not have experience with cTAKES or UIMA.

NLP Developers

Background

2-3 years of experience working as a programmer, at least 6 months in the NLP domain.
Familiar with NLP Concepts
Familiar with NLP tools and resources
Familiar with programming languages
Limited or no ability to interpret clinical text reports
Familiar with basic statistical analyses, particularly with respect to evaluation of NLP models.

Premise/story

This class of users is employed at medical research institutions, educational institutions and software companies interested in the healthcare domain. Their daily job involves writing NLP algorithms and systems to extract information from free text. They work with other user groups, such as domain scientists, to identify the data source and the type of information that needs extraction. To make their lives easier, they are willing to get under the hood: to modify and extend existing software, and to modify algorithms and extraction targets to suit their needs.

Expectations

Open source software
Documentation about the features and limitations of the tools/libraries.
Tutorials and instructions on configuring/customizing the software to their needs.
Software must be easy to integrate into their own code.
Tutorials and code samples on extending the software.

Current tools and limitations

GATE, UIMA, various NLP algorithms and libraries.
Lack of standardization makes it hard to integrate and use different tools and libraries with each other.

Domain Specific Application Developers

Background

2-3 years of experience working as a programmer.
Unfamiliar with NLP Concepts
Unfamiliar with NLP tools and resources
Familiar with programming languages
Limited or no ability to interpret clinical text reports

Premise/story

This class of users is employed at medical research institutions, educational institutions and software companies interested in the healthcare domain. Their daily job involves writing code for software applications/tools. They work with other user groups to identify the requirements and design software solutions. They are open to using existing tools/libraries as part of their solutions. Their integration would be limited to using the NLP modules as black boxes and writing input and output translators to plug these modules into their solutions.

Expectations

Documentation about the features and limitations of the tools/libraries.
Tutorials and instructions on configuring/customizing the software to their needs.
Software must be easy to integrate into their own code.

Current tools and limitations

Wide array of IDEs and libraries may be used.
Lack of standardization makes it hard to integrate and use different tools and libraries with each other.

Integrative Cancer Biologists and Modelers

Background

MD, PhD, MPH/DrPH.
Broad knowledge of cancer biology
Analytically trained and familiar with statistical and modeling methods
Some knowledge of genomics/bioinformatics data and analysis methods.
Familiar with OO and other programming languages
Unfamiliar with NLP Concepts
Unfamiliar with NLP tools and resources
May not be familiar with text manipulation languages (e.g. Python, Perl, Ruby)
May not be familiar with DBMS and data management principles
Cares more about accuracy of results, confidence in results, summary information, pointers to WHY particular calls were made (chain of evidence) than the details of the implementation.
Limited or no ability to interpret clinical text reports

Premise/story

The user (or team) could tackle questions about the evolutionary history of tumors in response to treatment. Metastasis is usually what kills cancer patients, and treatment resistance mechanisms usually appear, making tumors ever more refractory. Thus a combination of systems biology and tumor population dynamics modeling could address important treatment questions, if detailed longitudinal clinical data are coupled to tumor genomics.

Expectations

Ability to download or otherwise access data in a form suitable for processing.

Information needs

Demographic, clinico-pathologic and comorbidity data
Detailed treatment data
Detailed disease progression data, including censoring dates
Outcomes
Genomic and other bioinformatics data when available.
