Data Sets
Some data sets require separate Data Use Agreements in addition to the Membership Agreement. For the 2017 Membership Year, these data sets are ShARe (requires a Data Use Agreement with the MIMIC/PhysioNet initiative) and THYME (requires a Data Use Agreement with Mayo Clinic). The Data Use Agreements are required to obtain the text files; obtaining the standalone gold annotations does not require them. The Center staff will guide each member candidate through the Data Use Agreement process but cannot guarantee the outcome.
When using a data set, please cite the associated papers and acknowledge the hNLP Center.
2017
2017 Data set | Annotation formats | Documentation |
---|---|---|
CCHMC ICD-9 radiology corpus | Text | A Shared Task Involving Multi-label Classification of Clinical Free Text |
ShARe disorders corpus | Knowtator Pipe-Delimited | SemEval-2015 Task 14: Analysis of Clinical Text |
THYME corpus | Anafora | (1) Temporal Annotation in the Clinical Domain (2) SemEval-2016 Task 12: Clinical TempEval |