Washington, DC—Researchers have customized a software suite to automatically determine the disease severity in patients with rheumatoid arthritis (RA) based on information from their electronic medical records (EMRs). This is a solid first step toward streamlining and individualizing diagnosis and treatment of this condition, according to the investigators.
“The system we created will be used by our rheumatologist collaborators to correlate drug response, as measured by change in disease activity, with genotyping studies for the purpose of individualized medicine,” said Chen Lin, MA, a member of the Natural Language Processing Lab of Boston Children’s Hospital Informatics Program, after presenting information on the project.
The work was funded by the National Institutes of Health through the Pharmacogenomics Research Network (PGRN) and was also published in PLoS One (Lin C, et al. 2013; 8(8):e69932). Mr Lin and colleagues, including Timothy Miller, PhD, Guergana Savova, PhD, and Brigham and Women’s Hospital rheumatologist Elizabeth Karlson, MD, used the Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) as the framework for their project. The open-source software uses natural language processing methods to extract information from EMRs.
The challenge the team sought to address was that physicians often do not code disease activity in structured fields in patients’ EMRs, opting instead to write it in the clinical narrative. In addition, laboratory values are not measured at every visit, observed Mr Lin. That makes it difficult to produce an accurate automated report on each patient’s disease status.
To surmount these challenges, the researchers used “automatic feature development” to program cTAKES to filter out all irrelevant information in the written clinical notes from the EMRs, thereby boosting the underlying algorithm’s accuracy and efficiency. They also assigned 1 code to all terms used to describe RA in the notes; as a result, ‘rheumatoid arthritis,’ ‘Rheumatoid Arthritis,’ ‘ARTHRITIS RHEUMATOID,’ and ‘RA’ were all treated by the program as RA.
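The idea of collapsing several surface forms into a single code can be illustrated with a small Python sketch. This is not the cTAKES pipeline or the team's implementation; the code string `RA_CONCEPT` and the function name `normalize_ra_mentions` are hypothetical, and only the four variants quoted above are handled.

```python
import re

# Hypothetical concept code; the article does not say which code the team used.
RA_CODE = "RA_CONCEPT"

# Long forms quoted in the article, matched regardless of capitalization.
_LONG_FORMS = re.compile(
    r"\b(?:rheumatoid\s+arthritis|arthritis\s+rheumatoid)\b",
    re.IGNORECASE,
)
# The abbreviation is matched only in upper case, as a whole word.
_ABBREVIATION = re.compile(r"\bRA\b")


def normalize_ra_mentions(text: str) -> str:
    """Replace every known RA surface form in `text` with a single code."""
    text = _LONG_FORMS.sub(RA_CODE, text)
    return _ABBREVIATION.sub(RA_CODE, text)


if __name__ == "__main__":
    note = "Hx of Rheumatoid Arthritis. RA stable on current regimen."
    print(normalize_ra_mentions(note))
```

A production system would map terms through a clinical vocabulary rather than a hand-written pattern; the sketch only shows why the four spellings end up as one feature.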
Another innovative approach employed by the team was to “train” the program to use information from structured Disease Activity Score 28 (DAS28) values, based on C-reactive protein and/or erythrocyte sedimentation rate levels, to classify each patient’s disease activity as being either low/remission or moderate/high. To do this, they ran data through the program from 2792 notes in the EMRs of 852 patients who had participated in the Brigham and Women’s Hospital Rheumatoid Arthritis Sequential Study (Iannaccone CK, et al. Rheumatology. 2011;50:40-46).
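How a numeric DAS28 value might be turned into one of the two training labels can be sketched as follows. The article does not report the cutoff the team used; the value 3.2 below is an assumption taken from the commonly cited DAS28 categories (low activity at or below 3.2, moderate activity above it), and the function name `das28_to_label` is hypothetical.

```python
# Assumed cutoff: the article does not state it. 3.2 is the commonly cited
# boundary between low and moderate disease activity on the DAS28 scale.
ASSUMED_CUTOFF = 3.2


def das28_to_label(das28: float, cutoff: float = ASSUMED_CUTOFF) -> str:
    """Map a DAS28 value to one of the two classes named in the article."""
    if das28 < 0:
        raise ValueError("DAS28 cannot be negative")
    return "low/remission" if das28 <= cutoff else "moderate/high"


if __name__ == "__main__":
    for score in (2.1, 3.2, 4.5):
        print(score, das28_to_label(score))
```

Each note paired with a DAS28 value would then carry one of these two labels as its training target.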
After the resulting fine-tuning, the team next tested the program on a set of 1749 notes from another 821 patients’ visits to rheumatology clinics at Brigham and Women’s Hospital. This time, the disease-activity label of each note was derived from not only the DAS28 scores but also the patients’ assessment of activity and the treating physicians’ estimation of the number of swollen and tender joints. The program’s performance in determining patients’ disease activity, as measured by the F1 score, was 78.9%.
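For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall for a chosen class. The Python sketch below computes it for a two-class labeling task; the labels in the usage example are toy data invented for illustration, not results from the study, and the article does not say how the team averaged the score across classes.

```python
from typing import Sequence


def f1_score(gold: Sequence[str], predicted: Sequence[str], positive: str) -> float:
    """Return the F1 score for the `positive` class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == positive and p == positive)
    false_pos = sum(1 for g, p in pairs if g != positive and p == positive)
    false_neg = sum(1 for g, p in pairs if g == positive and p != positive)
    if true_pos == 0:
        return 0.0  # Precision or recall is zero (or undefined).
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    gold = ["moderate/high", "moderate/high", "low/remission", "low/remission"]
    pred = ["moderate/high", "low/remission", "low/remission", "moderate/high"]
    print(f1_score(gold, pred, positive="moderate/high"))
```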
In the third and final step of testing, the researchers had the program analyze 344 randomly selected notes, none of which contained laboratory values, from the EMRs of another 344 patients with RA. Three rheumatologists separately reviewed all of the notes and made determinations of patients’ disease activity. The overall accuracy of the algorithm was 83.1%. Compared with the rheumatologists’ determinations, most of the program’s disease activity misclassifications fell in the moderate and low disease activity categories, which accounted for 20% and 62% of the total errors, respectively.