(Originally posted in a slightly altered form at e-Literate)
A year or so ago I mentioned on my blog Hapgood that there were some oddities in data from Purdue's Course Signals results, results that have formed some of the basis for the strong version of learning analytics optimism. At that time I proposed that a substantial portion of the observed effect of Course Signals might be coming from a specific type of statistical error.
When Purdue updated those results a few weeks ago, I pointed back to those issues (still unresolved), expecting the critique to die the slow death it had last year. But things had changed. Michael Feldstein at e-Literate took up the issue, and pushed for a response to it. More people got involved. At this point the issue has reached the pages of both Inside Higher Ed and Times Higher Education.
If you are considering any analytics software at your institution, you need to be familiar with this issue, not so much because of the issues it raises with Course Signals, but because it highlights the sorts of questions we are not asking enough when presented with efficacy research and case studies.
So here's a quick summary to get you up to speed.
What is Course Signals? Why is it important?
Course Signals is a software product developed at Purdue University to increase student success through the use of analytics to alert faculty, students, and staff to potential problems. Using a formula that takes into account a variety of predictors and current behaviors (e.g. previous GPA, attendance, running scores), Course Signals can help spot potential academic problems before traditional methods might. That formula labels student status in a given course according to a green-yellow-red scheme that clearly indicates whether students are in danger of the dreaded DWIF (dropping out, withdrawing, getting an incomplete, or failing).
While the product is used to improve in-class student performance, it is most often discussed in a larger frame, as a product that increases long-term student success. It has won prestigious awards for its approach to retention, and it is particularly important in the analytics field: its reported ability to increase retention by 21% makes it one of the most effective interventions out there, and suggests that technological solutions to student success can significantly outperform more traditional measures.
What problems were found in the data supporting the retention effects?
Purdue had been claiming that taking classes using Course Signals (CS) technology led to better retention. Several anomalies in the data led to the discovery that the experiment may suffer from a "reverse-causality" problem.
One such anomaly was an odd "dose-response" curve. With many effective interventions, as exposure to the intervention increases, the desired benefit increases as well. In the recent Purdue data, taking one Course Signals-enhanced course was shown to have a very slight negative effect, while taking two had a very strong benefit.
The story became even more complex when older data was examined. Early in the program taking one CS-enhanced course had a very substantial impact on retention, nearly equal to taking two CS-enhanced classes. But as the program expanded over the years, taking one CS-enhanced class started to show no impact at all. This behavior is not consistent with Course Signals causing higher retention.
I hypothesized a simple model to explain this shift: rather than students taking more CS courses retaining at a higher rate, what was really happening was that the students who dropped out mid-year were taking fewer CS classes because they were taking fewer classes, period. In other words, the retention/CS link existed, but not in a meaningful way. Unlike the Purdue model, where taking CS-enhanced courses caused retention, this "reverse-causality" model explained why, as participation expanded, taking one CS-enhanced course might move from being a strong predictor to having no predictive force at all.
Michael Feldstein picked up on this analysis, and prodded the Purdue team for a response. When no response came, Alfred Essa, head of R & D and Analytics at McGraw-Hill, took my "back-of-the-envelope" model, and built it out into a full-fledged simulation. The simulation confirmed the reverse-causality model explained the data anomalies very well, much better than Purdue's causal model. Purdue's response to the simulation did not address the serious issues raised.
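To make the reverse-causality model concrete, here is a toy simulation in Python. It is a minimal sketch written for illustration, not Essa's simulation and not based on Purdue's data. Every name and number in it (simulate, p_cs, p_dropout, courses_per_term, the 0.2 rates) is a made-up assumption. The key point is that Course Signals has no effect on retention in this model: whether a student drops out is decided first, and students who drop out mid-year simply take half as many courses.

```python
import random

def simulate(n_students=100_000, p_cs=0.2, p_dropout=0.2,
             courses_per_term=4, seed=0):
    """Return apparent retention rate by number of CS-enhanced courses (0, 1, 2+).

    All parameters are hypothetical. Course Signals has NO causal effect here:
    dropout is decided independently, and dropouts take one term of courses
    instead of two, so they have fewer chances to land in a CS-enhanced course.
    """
    rng = random.Random(seed)
    counts = {}  # bucket -> (retained, total)
    for _ in range(n_students):
        drops_out = rng.random() < p_dropout
        n_courses = courses_per_term if drops_out else 2 * courses_per_term
        # Each course is CS-enhanced with probability p_cs, independent of the student.
        n_cs = sum(rng.random() < p_cs for _ in range(n_courses))
        bucket = min(n_cs, 2)  # group students as 0, 1, or 2+ CS courses
        retained, total = counts.get(bucket, (0, 0))
        counts[bucket] = (retained + (not drops_out), total + 1)
    return {k: r / t for k, (r, t) in sorted(counts.items())}

if __name__ == "__main__":
    for p_cs in (0.05, 0.2, 0.4):
        rates = simulate(p_cs=p_cs)
        print(p_cs, {k: round(v, 3) for k, v in rates.items()})
```

With these made-up defaults, apparent retention rises with the number of CS courses even though the model gives Course Signals no effect at all. Raising p_cs (a rough stand-in for the program expanding) lowers the apparent retention of the one-course group, which is the direction of the shift described above.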
Does this mean Course Signals does not work?
It depends. Purdue has yet to respond to the new information in any meaningful way, and until they either release revised estimates that control for this effect or release their data for third-party analysis, we don't know the full story. Additionally, there are some course-level effects seen in early Signals testing that will be unaffected by the issue.
However, Purdue's recent response to Inside Higher Ed indicates that they did not control for the reverse-causality issue at all. If this is true, then the likelihood is that the retention impact of Course Signals will be positive, but significantly below the 21% they have been claiming.
But positive impact is good, right?
Not really. The great insight regarding educational interventions of the past decade or so is what we might term "Hattie's Law", after researcher John Hattie. Most educational interventions have some effect. Doing something is usually better than doing nothing. The question that administrators face is not which interventions "work", but which interventions "work better than average."
At a 21% impact on retention, Course Signals was clearly in the "better than average" category, and its unparalleled dominance in that area suggested that the formula and approach embraced by Course Signals formed the best possible path forward.
Halve that impact and everything changes. Peer coaching models such as InsideTrack have shown impact in the 10-15% range. Increased student aid has shown moderate impact, as have streamlined registration and course access initiatives.
Additionally, other analytics packages exist that have taken a different route than Course Signals. Up until now, they have lived in the shadow of Purdue's success. If CS impact is shown to be significantly reduced, it may be time to give those approaches a second look.
What is unaffected by the new analysis?
Until Purdue fixes and reruns their analysis, it is hard to know what the effects might be. However, there were a number of claims Purdue made that were not based on longitudinal analysis, and these should stand. For instance, students in Course Signals do tend to get more A's and fewer F's, and that data would be unaffected by this issue.
While that's good, it's not the major intent of at least some institutions interested in the system. What makes systems like this particularly attractive is their ability to pay for themselves over time by increasing retention.
There remains a question as to how a system that boosts grades could fail to boost retention. There are a couple of potential hypotheses. First of all, it is quite possible that when the numbers are rerun there will still be a significant, though reduced, retention effect, and that reduced effect is still congruent with the better scores.
Alternately, it could be that students in Course Signals courses score highly in Course Signals-enhanced courses, but at the expense of other courses. My daughter's math teacher has a very strict policy on math homework which has whipped her into shape in that class, but this means she often delays studying for other things. Students with finite time resources can rearrange their time, but not always expand it.
Finally, for some nontrivial number of students, retention problems are not due to grades. Not to push the reverse-causality logic too far, but for some students low grades could be a sign of financial or domestic difficulty; fixing the grade would not address the larger problem.
What are the larger cultural implications?
As Michael Feldstein has outlined in a different post, there are major cultural implications to this error, ones which partially indict the research analytics community's approach to research. To my knowledge, the study was never peer-reviewed outside of its inclusion in conference proceedings, but it is one of the most referenced studies in learning analytics.
One small issue is how the paper was accepted into the LAK conference given these errors. But that issue is minuscule compared to the larger questions raised about the broader edtech environment. No community should rely solely on peer review when it comes to vetting research results, and the cultural implications of an error like this going undetected this long in a community of data analysts will be the subject of a future post. For the moment we are still waiting for Purdue to engage honestly with the critique, and re-run their numbers after controlling for this effect. Hopefully that will happen later this week.