Development of deep learning algorithms to categorize free-text notes pertaining to diabetes: convolution neural networks achieve higher accuracy than support vector machines
Abstract
Health professionals can use natural language processing (NLP) technologies when reviewing electronic health records (EHR). Machine learning free-text classifiers can help them identify problems and make critical decisions. We aim to develop deep learning neural network algorithms that identify EHR progress notes pertaining to diabetes and validate the algorithms at two institutions. The data used are 2,000 EHR progress notes retrieved from patients with diabetes and all notes were annotated manually as diabetic or non-diabetic. Several deep learning classifiers were developed, and their performances were evaluated with the area under the ROC curve (AUC). The convolutional neural network (CNN) model with a separable convolution layer accurately identified diabetes-related notes in the Brigham and Womens Hospital testing set with the highest AUC of 0.975. Deep learning classifiers can be used to identify EHR progress notes pertaining to diabetes. In particular, the CNN-based classifier can achieve a higher AUC than an SVM-based classifier.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.