Beyond the Hype of Big Data in Education

Authors:: Rasil Warnakulasooriya and Adam Black
Published:: Tuesday, September 4, 2018
Columns:: Industry Insights

Eliciting useful and reliable insights from educational data is part science, part art. Drawing on decades of experience analyzing data on the performance of educational technologies, Macmillan Learning has developed a playbook of best practices to help you succeed in your own educational data analysis.

The promise of big data is the talk of the town these days. However, does big data in education mean big insights for educators and edtech developers? While there is potential for a wealth of illuminating and practical insights into teaching and learning that can be derived from data, that potential can only be realized by asking the right questions, creatively tackling noisy and incomplete data, and recognizing the limitations of any analysis. Simply put, gaining useful and reliable insights from educational data—big or small—is part science and part art.

Why should educators and technologists be cautious about the promise of big data? We have identified four main challenges to deriving insights from educational data analytics at scale, and together they call for measured enthusiasm: high variability among educational settings, which limits the data available in any given context; the fact that educational technologies capture only part of the teaching and learning experience; variability of educational content in both type and effectiveness; and the complexity and variability of conceptual domains (here, the term conceptual domains means distinct disciplines, such as English composition, economics, or physics, as well as conceptual differences that may exist within a given discipline). These limitations also pose significant challenges to using artificial intelligence techniques in education.

Although educational settings now generate data in large volumes, at high velocity, and with greater variety, the limitations above mean that analysts and educators can easily drown in a sea of information that may obscure actionable insights or fail to address the most pertinent questions facing learners, teachers, and policy makers.

So how can we gain reliable insights from educational data to help advance teaching and learning? Based on decades of experience analyzing data from educational technologies, we have developed a playbook of best practices for successful educational data mining. In summary, we recommend that data analysts:

Explore data from multiple angles and mine data conditionally to find the most appropriate form of analysis that fits the research goal.
Visualize data at every stage of exploration, as much as possible, before conducting further analyses or drawing conclusions. Visualization helps guard against unwarranted conclusions.
Avoid foregone conclusions: that is, do not go looking for data to support a conclusion already reached. Instead, draw conclusions that emerge from the data and stay within the assumptions and constraints of the analyses.
Understand how users engage with learning applications before conducting analyses. Avoid treating applications as black boxes and data as context-free; knowing the context in which data were generated helps analysts draw the most appropriate conclusions. User experience and educational impact research can complement the platform data available to analysts.
Do not reject outlier data without due consideration. Outliers can lead to some of the most surprising insights.
Use point statistics cautiously: applied blindly, they can hide the "signal" within the noise. In other words, the median isn't the message.¹
Isolate and adjust for confounding factors within data as much as possible. Confounding factors pose significant challenges to reaching the most appropriate conclusions from an analysis. However, forming hypotheses about possible confounding factors and analyzing the data to check their effects helps clarify how they influence the observed data.
Safeguard against overfitting statistical models. One goal of data analytics is to generalize findings from a specific data set to future data. A model tuned to explain the current data set extremely well, including its random noise (that is, a model that overfits the data), will typically fail to explain future data well. Statistical models should therefore strike a careful balance between model fit and generalizability.
Strive to understand predictive and machine learning algorithms and their performance in a wide variety of educational circumstances. Never treat these algorithms as black boxes with data as mere inputs and results as mere outputs. Analysts working in education have a responsibility to avoid biases, whether explicit or hidden, and to be careful about any predictions and recommendations they make.
Strive to obtain high-quality data from product applications, because improving student and instructor outcomes depends on it. A learning application captures higher-quality data when that data reflects user engagement with the application at a level of granularity sufficient for the research questions at hand.
Align statistical thinking with scientific thinking. For example, look for repeated patterns in the data and for the educational and practical significance of the findings rather than relying on statistical significance tests alone.
Consider how learning analytics can contribute to foundational empirical research in teaching and learning instead of limiting its use to simply summarizing observations, testing hypotheses, or developing algorithms (predictive or otherwise).
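To make the advice on point statistics and confounding factors concrete, here is a minimal sketch in Python built on made-up numbers. The homework tool, the "weak"/"strong" prior-preparation strata, and every count are hypothetical, not results from any real course. In aggregate, students who used the hypothetical tool appear to pass less often, yet within each preparation stratum they pass more often, because tool users are disproportionately weakly prepared (Simpson's paradox). Direct standardization, one common way to adjust for a measured confounder, is included as an illustration rather than as the authors' method.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """One (group, stratum) cell of a hypothetical course data set."""
    used_tool: bool    # hypothetical treatment: used a homework tool or not
    preparation: str   # hypothetical confounder: "weak" or "strong" prior preparation
    passed: int        # students in the cell who passed
    enrolled: int      # students in the cell


# Made-up counts chosen to illustrate Simpson's paradox; not real data.
CELLS = [
    Cell(True, "weak", 60, 100),
    Cell(True, "strong", 18, 20),
    Cell(False, "weak", 10, 20),
    Cell(False, "strong", 85, 100),
]


def pass_rate(cells):
    """Pooled pass rate over a list of cells."""
    return sum(c.passed for c in cells) / sum(c.enrolled for c in cells)


def aggregate_rates(cells):
    """One point statistic per group, ignoring prior preparation."""
    return {flag: pass_rate([c for c in cells if c.used_tool == flag])
            for flag in (True, False)}


def stratified_rates(cells):
    """Pass rate for each (group, stratum), holding preparation fixed."""
    return {(c.used_tool, c.preparation): c.passed / c.enrolled for c in cells}


def standardized_rates(cells):
    """Direct standardization: weight each stratum's rate by its share of all students."""
    strata = sorted({c.preparation for c in cells})
    total = sum(c.enrolled for c in cells)
    weights = {s: sum(c.enrolled for c in cells if c.preparation == s) / total
               for s in strata}
    rates = stratified_rates(cells)
    return {flag: sum(weights[s] * rates[(flag, s)] for s in strata)
            for flag in (True, False)}


if __name__ == "__main__":
    agg = aggregate_rates(CELLS)
    print(f"aggregate: tool {agg[True]:.3f}, no tool {agg[False]:.3f}")
    strat = stratified_rates(CELLS)
    for s in ("weak", "strong"):
        print(f"{s:>6} prep: tool {strat[(True, s)]:.3f}, no tool {strat[(False, s)]:.3f}")
    std = standardized_rates(CELLS)
    print(f"standardized: tool {std[True]:.3f}, no tool {std[False]:.3f}")
```

With these counts the aggregate pass rates are 0.65 (tool) versus about 0.79 (no tool), while the tool group leads in both strata and in the standardized rates. Standardization adjusts only for confounders that were measured; it cannot rule out unmeasured ones.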
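The advice to weigh educational and practical significance rather than relying on significance tests alone can be sketched in a few lines of Python. The scenario is hypothetical: two versions of an assignment, 20,000 students each, with made-up mean exam scores of 75.0 and 75.5 and a standard deviation of 10. With samples this large, a half-point difference produces a very small p-value, yet the standardized effect size (Cohen's d) is only 0.05, far below the 0.2 that Cohen proposed as a rough benchmark for a "small" effect. The large-sample z-test is a normal approximation chosen for simplicity, not a test the authors prescribe.

```python
import math


def two_sample_z(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Large-sample z statistic and two-sided p-value for a difference in means."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_b - mean_a) / standard_error
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_b - mean_a) / math.sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical summary statistics: (mean, standard deviation, sample size).
    a = (75.0, 10.0, 20_000)
    b = (75.5, 10.0, 20_000)
    args = (a[0], b[0], a[1], b[1], a[2], b[2])
    z, p = two_sample_z(*args)
    d = cohens_d(*args)
    print(f"z = {z:.2f}, p = {p:.1e}, Cohen's d = {d:.3f}")
```

A small d does not by itself make a finding worthless; whether a 0.05-standard-deviation gain matters depends on cost, context, and whether the pattern repeats across courses and terms.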

In the paper Beyond the Hype of Big Data in Education [https://www.macmillanlearning.com/Catalog/uploadedFiles/Beyond-the-Hype-of-Big-Data-in-Education.pdf], we provide more details for these guidelines, illustrate how they can be applied to a variety of practical examples, and share surprising and useful insights that can be derived through learning analytics.

We are entering an exciting age in which learning analytics, practiced with rigor and awareness, has the potential to contribute substantially to improving education. It sits at the cutting edge of an interdisciplinary field that bridges research in teaching and learning, cognitive science, educational impact research, user-centered design, machine learning, and artificial intelligence.

Presciently, more than a decade and a half ago, the authors of the article "The Frontier of Web-Based Instruction" observed that the "question of how web-based education affects teaching and learning remains largely unanswered, and the terrain of online learning remains largely unmapped."²

We believe that to a large degree this still holds true, but exciting innovations in edtech (and the data they generate) are providing novel and increasingly rich opportunities to research the processes of learning and teaching, provided that data analysis is sound and transparent about the limits of the conclusions we draw. Only through such an approach can we derive practical insights that directly help educators design and deliver their courses and help edtech developers create more empathetic, effective solutions for students, instructors, and others in education.

We look forward to the journey ahead.

Notes

1. Stephen Jay Gould, "The Median Isn't the Message," in Bully for Brontosaurus: Reflections in Natural History (New York: W.W. Norton & Company, 1991).
2. Coral Mitchell, Tony DiPetta, and James Kerr, "The Frontier of Web-Based Instruction," Education and Information Technologies 6, no. 2 (2001): 105–121.

Rasil Warnakulasooriya is Vice President of Analytics and Data at Macmillan Learning.

Adam Black is Chief Strategy and Learning Officer at Macmillan Learning.

ParentTopics:: Big Data