Using Natural Language Processing to Make Content Recommendations and Identify Knowledge Gaps

May 16, 2016 10:55 AM – 12:25 PM

Pedro Teixeira¹, PhD, MS; Tao Le², MD, MHS; Toufeeq Ahmed¹, PhD, MS
¹Vanderbilt School of Medicine, ²USMLE-Rx

Medical students are required to efficiently learn and apply a significant amount of basic and clinical science knowledge. We are developing innovative tools using natural language processing (NLP) and machine learning techniques to help individual medical learners and schools rapidly connect to relevant content and identify competency gaps.

We used Wikipedia to build an extensive medical concept vocabulary (70,799 medical concepts) that supports NLP and machine learning tasks for filtering and content recommendation. We will demonstrate two NLP systems that annotate content with concepts and index it. The first is based on the open-source Apache cTAKES, which uses the Unified Medical Language System (UMLS). The second is a fast, custom dictionary-based annotator that uses the Wikipedia-derived medical vocabulary to tag medical concepts. Both return standard JSON output containing the original content and the concepts found in the text, each with its position and a match value or relevance score. The extracted concepts are used to present resources rapidly or to track competency-based progress across topics. Learners are presented with direct links to high-value concepts represented in Wikipedia or PubMed articles. We will discuss usage data, current application scenarios, and future directions for development.
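
To make the dictionary-based approach concrete, the following is a minimal Python sketch of how such an annotator might work. It is not the authors' implementation: the three-term vocabulary, the function name annotate, the JSON field names, and the constant score of 1.0 are all assumptions chosen for illustration. It only shows the general idea of matching vocabulary terms in text and returning the original content with each concept's position and a score.

```python
import json
import re

# Hypothetical stand-in for the Wikipedia-derived vocabulary described above.
# Keys are lowercase surface terms; values are illustrative concept identifiers.
VOCABULARY = {
    "myocardial infarction": "Myocardial_infarction",
    "infarction": "Infarction",
    "aspirin": "Aspirin",
}


def annotate(text, vocabulary=VOCABULARY):
    """Tag vocabulary terms found in text and return a JSON-serializable dict."""
    # Sort longer terms first so that "myocardial infarction" is preferred
    # over the shorter term "infarction" when both could match.
    terms = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    concepts = []
    for match in pattern.finditer(text):
        concepts.append({
            "concept": vocabulary[match.group(1).lower()],
            "matched_text": match.group(1),
            "start": match.start(1),  # character offset where the match begins
            "end": match.end(1),      # character offset just past the match
            "score": 1.0,             # placeholder: exact matches only (assumption)
        })
    return {"content": text, "concepts": concepts}


if __name__ == "__main__":
    result = annotate("Aspirin is given after a myocardial infarction.")
    print(json.dumps(result, indent=2))
```

A single compiled regular expression is used here only for brevity. A vocabulary of tens of thousands of terms would more plausibly rely on a trie or a similar multi-pattern matching structure, and a real relevance score would need a definition that the abstract does not give.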