Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Cognitive models of student knowledge drive many of the instructional decisions that advanced educational systems make, including how to organize instructional messages, how to sequence topics, and which problems to select in a curriculum. Developing adaptive educational content is expensive in both time and money: it requires subject experts, cognitive scientists, and programmers, and the resulting student models often do not fit the data generated by students using the system. Data mining techniques can suggest improvements to these models, which can make student learning more efficient and significantly reduce the time students need to learn skills. This research is enabled by DataShop, the world’s largest open repository of transactional educational data collected from online learning courses, intelligent tutors, educational games, and simulations. The data are fine-grained, with student actions recorded roughly every 10 seconds, and longitudinal, spanning semester-long or yearlong courses. As of October 2012, DataShop stores almost 400 datasets containing over 90 million student actions, which equates to over 200,000 student hours of data. Most student actions are “coded,” meaning they are not only graded as correct or incorrect but also categorized in terms of the hypothesized competencies, or knowledge components, needed to perform that action. DataShop allows researchers to import data in order to use the provided analysis tools and to export data from the repository for additional analysis. Researchers have analyzed these data to better understand student cognitive and affective states, and the results have been used to redesign instruction and demonstrably improve student learning.
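To make the idea of "coded" student actions concrete, the following is a minimal sketch, not DataShop's actual export format or analysis code. The `Transaction` schema, the knowledge component names, and the sample records are all hypothetical and invented for illustration. The sketch computes an error rate per knowledge component, the kind of summary one might inspect when checking whether a student model fits the data, and it checks the rough arithmetic relating action counts to student hours under the assumption of one action every 10 seconds.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Transaction:
    """One coded student action (hypothetical schema, not DataShop's format)."""
    student: str
    step: str
    correct: bool
    knowledge_components: Tuple[str, ...]


def error_rate_by_kc(transactions: Iterable[Transaction]) -> Dict[str, float]:
    """Return the fraction of incorrect actions for each knowledge component."""
    attempts = defaultdict(int)
    errors = defaultdict(int)
    for t in transactions:
        # An action may be tagged with more than one knowledge component.
        for kc in t.knowledge_components:
            attempts[kc] += 1
            if not t.correct:
                errors[kc] += 1
    return {kc: errors[kc] / attempts[kc] for kc in attempts}


def approx_student_hours(n_actions: int, seconds_per_action: float = 10.0) -> float:
    """Rough hours of data, assuming one action every `seconds_per_action` seconds."""
    return n_actions * seconds_per_action / 3600


# Made-up records, for illustration only.
SAMPLE = [
    Transaction("s1", "step1", True, ("find-slope",)),
    Transaction("s1", "step2", False, ("find-slope", "find-intercept")),
    Transaction("s2", "step1", False, ("find-slope",)),
    Transaction("s2", "step2", True, ("find-intercept",)),
]

if __name__ == "__main__":
    print(error_rate_by_kc(SAMPLE))
    print(approx_student_hours(90_000_000))
```

Under the 10-second assumption, 90 million actions correspond to roughly 250,000 hours, which is consistent with the "over 200,000 student hours" figure above.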