Retention and Intention in Massive Open Online Courses: In Depth

Authors:: Daphne Koller, Andrew Ng, Tom Do and Zhenghao Chen
Published:: Monday, June 3, 2013
Collection:: Editors' Pick

Retention in MOOCs should be considered in the context of learner intent, especially given the varied backgrounds and motivations of students who choose to enroll. When viewed in the appropriate context, the apparently low retention in MOOCs is often reasonable.

The MOOC "Retention Problem"

In 2012, the typical Coursera massive open online course (MOOC) enrolled between 40,000 and 60,000 students, of whom 50 to 60 percent returned for the first lecture. In classes with required programming or peer-graded assignments, around 15 to 20 percent of lecture-watchers submitted an assignment for grading. Of this group, approximately 45 percent successfully completed the course and earned a Statement of Accomplishment. In total, roughly 5 percent of students who signed up for a Coursera MOOC earned a credential signifying official completion of the course.
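
As a rough consistency check, the stage-to-stage rates above can be multiplied through the funnel. The Python sketch below treats the quoted ranges as exact low and high bounds (an assumption; they are rounded aggregates across many courses), and the name `FUNNEL_STAGES` is illustrative. The product comes out at roughly 3 to 5 percent of enrollees, broadly consistent with the overall figure of about 5 percent.

```python
# Rough 2012 Coursera funnel built from the stage-to-stage percentages quoted
# above, as (stage, low, high). Each rate is a fraction of the previous stage.
FUNNEL_STAGES = [
    ("watched the first lecture", 0.50, 0.60),
    ("submitted a graded assignment", 0.15, 0.20),
    ("earned a Statement of Accomplishment", 0.45, 0.45),
]


def cumulative_funnel(stages):
    """Return (stage, low, high) as fractions of the original enrollment."""
    low = high = 1.0
    rows = []
    for name, low_rate, high_rate in stages:
        low *= low_rate
        high *= high_rate
        rows.append((name, low, high))
    return rows


if __name__ == "__main__":
    for name, low, high in cumulative_funnel(FUNNEL_STAGES):
        print(f"{name}: {low:.1%} to {high:.1%} of enrollees")
```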

For educators used to thinking about student attrition in a traditional university setting, the "retention funnel" in a MOOC can cause considerable alarm. To a professor accustomed to an audience of committed, paying students in a brick-and-mortar classroom, the image of continuously emptying lecture halls, where only one in every 20 students remains to the end, is understandably frightening. But is this really the appropriate framework for thinking about student success in MOOCs?

Proponents of MOOCs often point to various compensatory factors in favor of online courses. These factors range from financial considerations, such as the rising cost of higher education and the low marginal cost of repeated offerings of MOOCs, to scale considerations: graduating even 5 percent of 100,000 students in a MOOC provides many instructors with substantially greater reach than an entire lifetime of teaching in a conventional classroom. While valid, these perspectives still do not directly address the concerns regarding low completion rates in MOOCs and their implications for the viability of high-quality online education.

In this article, we examine the issue of retention in online courses. We argue that retention in MOOCs should be considered carefully in the context of learner intent, especially given the varied backgrounds and motivations of students who choose to enroll. When viewed in the appropriate context, retention in MOOCs is often quite reasonable. Moreover, this perspective helps us identify and understand the value that the "non-completing" population obtains from MOOCs, which in turn can help us give them, as well as the "completers," the learning experience best suited to their needs.

Understanding Learner Intent

The vast majority of students who enroll in traditional university classes enter with the explicit intent of earning a credential. When students do not receive a credential, either the student, the system, or some combination of both has failed. MOOCs, however, cater to a substantially more diverse audience. Some students enroll on a whim, to see what a course is about, to figure out whether a particular topic might be worth pursuing, or out of curiosity about online education in general. Other students sign up for a handful of classes with the idea of shopping around to find a good fit. Still others enroll in a MOOC in much the same way that one might "bookmark" an interesting web page for future reference. The average Coursera student enrolls in four courses, and roughly 40 percent of all students have at least two courses running simultaneously. Furthermore, most Coursera classes involve a substantial time commitment, with estimated workloads usually ranging from 5 to 15 hours per week. Since there is no financial cost or other barrier to entry, there is little reason to believe that even a majority of the students who enroll in a MOOC intend to complete the class.

Observing how students participate in online classes can reveal their intent. While some students engage with course content in ways that defy grouping, most exhibit behaviors that fall into clear categories reflecting differences in motivation and intention. The most obvious distinction is between "browsers" and "committed learners." Some browsers sign up for a class during a burst of interest but never show up for the first class; others browse for a week or two before disengaging.

Committed learners, who tend to stay engaged throughout most or all of a class, can be divided into at least three partially overlapping groups: passive participants, active participants, and community contributors:

Passive participants engage with a MOOC predominantly through watching lecture videos, have limited participation on course forums, and typically attempt few assignments and quizzes (but may interact with in-video questions as needed to progress through the video content).
Active participants engage in course content by completing homework assignments, quizzes, exams, and time-intensive programming or peer-graded assessments; they include the subset of "course completers" who do all the work necessary to earn a Statement of Accomplishment.
Community contributors also actively participate in courses, but their specific means of interaction is through generation of new content, such as engaging in forum discussions or contributing foreign language subtitles.

As their varied behaviors show, these three groups of students clearly have different goals for their MOOC experiences (see Figure 1). For example, passive participants typically have little need for the external validation of a Statement of Accomplishment in order to derive value from a MOOC. But even within a single group, such as the active participants, subgroups may behave differently. Course completers do all the work needed to earn a Statement of Accomplishment, but we have also observed "low-intensity" active participants who reduce their own workload, for example by attempting quizzes and homework but not the longer, in-depth assignments (see Figure 1c). These individuals are self-motivated learners who use quizzes and homework as formative assessments, independent of earning a credential.

Figure 1. Density maps illustrating patterns of activity among students on Coursera, aggregated across 86 Coursera classes.

The plots in figure 1 show the relationships between (a) lecture watching and assignment completion, (b) lecture watching and quiz taking, (c) quiz taking and assignment completion, and (d) forum activity and assignment completion. The axes in the various plots refer to the proportion of lectures, assessments, or quizzes completed per student in a course; for forum behavior, activity is measured using a normalized version of the "h-index" based on forum voting. In all plots, colors indicate the number of students with a particular activity profile on a logarithmic scale, after binning each axis into deciles. In (a) and (b), the red/yellow bands on the left correspond to students who watch some or all of the lectures but do not attempt assignments or quizzes. The brighter red cluster at the top left represents passive participants. The faint diagonal bands from the bottom left to the top right indicate students who start the class and drop off in the middle. And the cluster at the top right represents active participants — students who watched all or most of the lectures and did almost all of the quizzes/assignments; it is only this last cluster that might correspond to course completers. The faint green vertical stripes in (a) indicate individuals who only attempt a subset of the assignments in the class, but may go on to watch some or all of the lecture videos. In (c), the top-right and top-left clusters represent course completers and low-intensity active participants, respectively. In (d), the community contributors are mostly distributed along the right edge of the density map (representing individuals who finished the course), but some fairly active contributors, such as those along the left edge, only attempt a subset of the assignments.
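
The binning and scaling described for figure 1 can be sketched in a few lines of Python. This is an illustration, not the authors' pipeline: the helper names are hypothetical, each student is assumed to be summarized as a pair of completion fractions in [0, 1], and because the article does not say how its forum h-index was normalized, only a plain h-index is computed, assuming it is taken over a student's posts and the votes each post received.

```python
import math
from collections import Counter


def h_index(votes_per_post):
    """Plain h-index: largest h with h posts receiving at least h votes each."""
    h = 0
    for rank, votes in enumerate(sorted(votes_per_post, reverse=True), start=1):
        if votes < rank:
            break
        h = rank
    return h


def decile_bin(fraction):
    """Map a completion fraction in [0, 1] to a decile bin 0..9."""
    # min() keeps a perfect score of 1.0 in the top bin instead of a bin 10.
    return min(int(fraction * 10), 9)


def log_density(profiles):
    """Count students per (x decile, y decile) cell and return log10 counts.

    profiles: iterable of (x, y) fractions per student, e.g. (share of
    assignments attempted, share of lectures watched).
    Empty cells are simply absent, since log10(0) is undefined.
    """
    counts = Counter((decile_bin(x), decile_bin(y)) for x, y in profiles)
    return {cell: math.log10(n) for cell, n in counts.items()}
```

For example, with assignments on the x-axis and lectures on the y-axis, a student who watched every lecture but attempted no assignments falls in cell (0, 9), the top-left region where passive participants cluster in panel (a).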

Defining Retention in the Context of Intent

In most discussions of MOOCs to date, student retention (also called the completion rate) has been defined as the fraction of initially enrolled students who finish a course to the standards specified by the instructor. Completion rates provide a convenient metric for comparing a broad range of MOOCs. For all their convenience, however, completion rates taken at face value can give a misleading picture of an online course's health, because they fail to capture the diversity of goals and engagement patterns among MOOC students.

For retention metrics to be useful, they must be defined and interpreted with the learner's goals in mind. Passive lecture watchers, for example, may go through an entire course without ever touching an assessment, yet often derive substantial value from a MOOC without contributing to completion-based measures of retention. Lectures in typical MOOCs also differ substantially from those delivered in a standard lecture hall, in that students interact continuously with the material through frequent in-video quizzes. Furthermore, the online format allows professors to pack more material into a 10-minute lecture segment, knowing that students can re-watch videos multiple times. Across a range of Coursera classes, students watch a typical lecture video 1.7 times on average, and one in every 10 students watches videos more than 2.7 times on average. Given that the number of students who watch at least 90 percent of the available lecture videos is around twice the number who earn a Statement of Accomplishment, this important subpopulation cannot be ignored.

Of course, even within the group of passive lecture watchers, students may vary considerably in their level of commitment to a class. For example, figure 2 depicts the decline in lecture video watching over the duration of the University of Pennsylvania's "Modern and Contemporary American Poetry" class, taught by Professor Al Filreis. If attrition in lecture watching were primarily due to random life events, such as an unexpected business trip or a crisis at work, then the proportion of students watching at least a given number of hours of lecture video would roughly follow an exponential distribution, with students leaving the course at random at a fixed rate.
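
In symbols (our notation, not the article's), let N_0 be the number of students who start watching and S(x) the number who watch at least x hours of lecture video. A fixed hourly dropout rate gives a single exponential curve governed by one hourly retention rate r, while the two-component mixture discussed below splits students into a high-retention group (proportion π, hourly rate r_H) and a low-retention group (hourly rate r_L):

```latex
\begin{align*}
\text{single exponential:}\quad S(x) &= N_0\, e^{-\lambda x} = N_0\, r^{x}, \qquad r = e^{-\lambda},\\
\text{two-component mixture:}\quad S(x) &= N_0 \left[ \pi\, r_H^{x} + (1 - \pi)\, r_L^{x} \right].
\end{align*}
```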

Figure 2. Lecture-based retention from "Modern and Contemporary American Poetry" (September 2012).

The jagged red line in figure 2 shows the observed drop-off in lecture watching over the duration of the course: for each x-coordinate, the y-coordinate is the number of students who watched at least x hours of lecture video. The blue and green lines show fits based on a single exponential distribution and a two-component mixture of exponential distributions, respectively. The dotted green line depicts the high-retention group of students from the mixture model. The mixture model's much closer fit, compared to the single exponential, is characteristic of nearly all Coursera classes.
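
The jagged red curve is an empirical survival count, and the fitted curves are the single-exponential and mixture formulas evaluated on the same grid of hours. A minimal Python sketch, assuming per-student hours watched are available as a list; the function names are hypothetical:

```python
def survival_counts(hours_watched, grid):
    """Empirical curve: for each x in grid, students who watched >= x hours."""
    return [sum(1 for h in hours_watched if h >= x) for x in grid]


def exponential_curve(n0, r, grid):
    """Single exponential: n0 * r**x, with r the hourly retention rate."""
    return [n0 * r ** x for x in grid]


def mixture_curve(n0, pi_high, r_high, r_low, grid):
    """Two-component mixture of exponentials, weighted by group share."""
    return [n0 * (pi_high * r_high ** x + (1 - pi_high) * r_low ** x)
            for x in grid]
```

Plotting `survival_counts` against the grid, alongside either model curve with fitted parameters, gives the kind of comparison shown in figure 2.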

In practice, we have found that single exponential distributions fit the observed data poorly. In contrast, in essentially all Coursera classes, a two-component mixture of exponential distributions (in which each student is assumed to be drawn at random from either a high-retention or a low-retention population) models the actual drop-off in lecture watching very well. Across 40 Coursera classes, the fraction of students inferred to come from each population varies, as do, to some degree, the retention rates in the low-retention population. Among students in the high-retention group, however, retention rates are quite consistent across classes, with the median class achieving a retention rate of 92 percent per hour of lecture video (see figure 3).
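
The article does not say how the mixture parameters were estimated. One standard option, shown here only as an illustration, is expectation-maximization (EM) on each student's total hours watched: the E-step assigns each student a probability of belonging to the high-retention group, and the M-step re-estimates the group share and each group's exponential decay rate in closed form. This Python sketch simplifies in ways worth stating plainly: every value is treated as an observed dropout time, although in real data students who watch every lecture are right-censored at the course's total video length, and the demo parameters are made up for testing.

```python
import math
import random


def fit_exponential_mixture(hours, max_iter=500, tol=1e-9):
    """EM fit of a two-component exponential mixture to hours watched.

    Returns (pi_high, r_high, r_low): the estimated share of students in the
    high-retention group and each group's hourly retention rate exp(-lambda).
    Simplification: right-censoring at the end of the course is ignored.
    """
    n = len(hours)
    mean = sum(hours) / n
    # Start the "high" group decaying slowly and the "low" group quickly.
    pi, lam_hi, lam_lo = 0.5, 0.5 / mean, 2.0 / mean
    for _ in range(max_iter):
        # E-step: probability that each student belongs to the slow-decay group.
        resp = []
        for x in hours:
            log_hi = math.log(pi) + math.log(lam_hi) - lam_hi * x
            log_lo = math.log(1.0 - pi) + math.log(lam_lo) - lam_lo * x
            d = log_lo - log_hi
            # Numerically stable form of 1 / (1 + exp(d)).
            if d > 0:
                e = math.exp(-d)
                resp.append(e / (1.0 + e))
            else:
                resp.append(1.0 / (1.0 + math.exp(d)))
        # M-step: closed-form weighted maximum-likelihood updates.
        w_hi = sum(resp)
        new_pi = min(max(w_hi / n, 1e-9), 1.0 - 1e-9)
        new_hi = w_hi / sum(g * x for g, x in zip(resp, hours))
        new_lo = (n - w_hi) / sum((1.0 - g) * x for g, x in zip(resp, hours))
        change = abs(new_pi - pi) + abs(new_hi - lam_hi) + abs(new_lo - lam_lo)
        pi, lam_hi, lam_lo = new_pi, new_hi, new_lo
        if change < tol:
            break
    if lam_hi > lam_lo:  # label the slower-decaying group as "high retention"
        pi, lam_hi, lam_lo = 1.0 - pi, lam_lo, lam_hi
    return pi, math.exp(-lam_hi), math.exp(-lam_lo)


if __name__ == "__main__":
    # Synthetic demo with made-up parameters: 30% of students in a group with
    # 92% hourly retention, the rest in a group with 40% hourly retention.
    rng = random.Random(0)
    hours = [
        rng.expovariate(-math.log(0.92)) if rng.random() < 0.3
        else rng.expovariate(-math.log(0.40))
        for _ in range(5000)
    ]
    pi_high, r_high, r_low = fit_exponential_mixture(hours)
    print(f"high-retention share {pi_high:.2f}, "
          f"hourly retention {r_high:.3f} (high) vs {r_low:.3f} (low)")
```

Each group's hourly retention rate is reported as exp(-λ), matching r in the survival curves. A censoring-aware version would use the survival function instead of the density in the E-step for students who reached the end of the course, and count only observed dropouts in each rate's numerator.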

Figure 3. Variation of two-component mixture model parameters across a collection of 40 Coursera classes.

For each class in figure 3, three model parameters are estimated: the hourly retention rate in the high-retention group, the hourly retention rate in the low-retention group, and the relative proportion of individuals in each group. In (a) and (b), each point represents a single class. In most classes, students in the high-retention group have hourly retention rates between 90 and 95 percent. Colors indicate whether a class contains programming assignments, peer-graded assessments, or quizzes only.

There are also significant variations in retention rates across different types of courses. For example, figure 3 shows that classes with programming assignments have significantly higher retention rates (p < 0.01) in the highly committed population than classes with other assignment structures. A similar but marginally nonsignificant effect (p = 0.077) appears when the analysis is restricted to computer-science classes with and without programming assignments. Furthermore, classes with programming assignments have a significantly higher estimated proportion of highly committed students than classes without them, both overall and among computer-science classes only (p < 0.01 and p = 0.031, respectively). Classes with peer grading, on the other hand, show a broader spectrum of hourly retention rates in both subpopulations, and these rates tend to be correlated between the two populations, suggesting that peer-grading classes vary more in their effectiveness at engaging students on an online platform.

Given that the typical Coursera course has roughly 8 to 9 hours of lecture video per month, the hourly lecture retention rates in the high-retention group translate to real-world monthly lecture retention rates of around 40 to 50 percent. (In fact, in the case of "Modern and Contemporary American Poetry," an hourly lecture retention rate of 93 percent among highly committed students translates to a monthly lecture retention rate of around 50 to 55 percent.) By comparison, for many mobile applications, monthly retention rates of 40 to 60 percent are common. In this sense, it is heartening and even surprising to see that MOOCs can attain such high lecture-retention rates, considering the substantial amount of attention and effort needed to keep up with an online course compared to an online social game (47 percent monthly retention) or social networking application (53 percent monthly retention). As a more direct comparison, only 55 percent of individuals maintain their New Year's resolutions over the first month, and resolutions, like online classes, can vary in the amount of effort and commitment required.
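
The conversion works because each lecture hour is treated as an independent chance to stay, so a monthly rate is the hourly rate compounded over the month's lecture hours. A quick Python check using the 8-to-9-hour range and the hourly rates mentioned in this section (the function name is ours):

```python
def monthly_retention(hourly_rate, lecture_hours_per_month):
    """Compound a per-lecture-hour retention rate over a month of video."""
    return hourly_rate ** lecture_hours_per_month


if __name__ == "__main__":
    # Hourly rates from this section; 8 to 9 lecture hours per month.
    for hourly in (0.90, 0.92, 0.93, 0.95):
        low = monthly_retention(hourly, 9)  # more hours, more chances to drop
        high = monthly_retention(hourly, 8)
        print(f"hourly {hourly:.0%}: monthly {low:.0%} to {high:.0%}")
```

For the median hourly rate of 92 percent this gives about 47 to 51 percent per month, and for the poetry class's 93 percent about 52 to 56 percent, consistent with the ranges quoted above.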

Because student intent varies, it is important to study completion rates among students who actually begin a course intending to complete it. In general, students do not declare their intent at the outset, which makes this rate difficult to estimate. To gauge intent, Stanford Professor Kristin Sainani asked students in her "Writing in the Sciences" course to fill out a pre-course survey about how much effort they planned to put into the course. Roughly 63 percent of respondents planned to do all the work necessary to earn a Statement of Accomplishment. Since only around one-third of registered students filled out the survey, and those who did were likely among the more engaged, clearly not all students entering the course were highly invested in finishing. But among students who intended to finish, roughly 24 percent successfully completed the course, compared to fewer than 2 percent in the remaining population of registered students.

An even more compelling indicator of intent can be found in Coursera's recently announced Signature Track. This optional program, available in selected MOOCs, gives students a way to earn a more official credential for their accomplishments through keystroke-biometric and photo-based identity verification. Students enroll in Signature Track early in the course (week 2 or 3), pay a fee ($30–$100) for the identity-verification services, and earn an identity-verified, university-branded credential if they pass the course. Signing up for Signature Track is a clear statement that a student intends to complete the course and earn a credential. In the first Signature Track class, "Nutrition for Health Promotion and Disease Prevention," taught by Professor Katie Ferraro of the University of California, San Francisco, the completion rate among paying Signature Track students was 74 percent, compared to 9 percent in the non-Signature Track population (figure 4). Moreover, among students who indicated a strong intent to finish in a survey administered one month into the course (after the Signature Track signup deadline), completion rates were higher in the paying group (96 percent vs. 84 percent, p = 0.0009), suggesting that having a financial stake may provide an additional incentive to finish.

Figure 4. Completion rates of "Nutrition for Health Promotion and Disease Prevention" (January 2013).

In figure 4, "highly committed" refers to students who indicated a high level of commitment to finishing the course and intended to watch all lectures and complete all assignments. Paying Signature Track students (in blue) show a markedly higher hourly completion rate than non-Signature Track students (in red).

The Relevance of Retention in MOOCs

We have argued that retention within MOOCs must always be considered in the context of student intent to have real meaning. At an even higher level, however, one might ask whether retention is even the right metric by which to measure success in a MOOC.

For students whose explicit goal is to earn a credential, retention-based metrics probably provide a reasonable proxy for course success, although attainable retention is capped to some extent by the natural rate at which students disengage from any online activity, owing to unplanned life events, shifting personal priorities, and the low friction involved in stepping out of an online course. While we believe that retention rates among MOOC students committed to completing the course are already quite high, we also believe it is important to keep striving to improve retention among highly committed learners, whether by exploring new and more engaging forms of pedagogy, taking advantage of social networking to help students maintain interest in lifelong learning, or adapting existing MOOCs to better fit the needs of working adult learners.

At the same time, in discussing MOOC retention rates, it is important to keep the bigger picture in mind. In traditional college courses, lack of retention is a serious problem. Currently, almost half of the students who begin college at two- or four-year brick-and-mortar institutions fail to earn a degree within six years, according to a report by the National Commission on Higher Education Attainment (via the New York Times). In some colleges, the completion rate is close to 15 percent. With average annual undergraduate tuition, room, and board estimated at $13,600 at public institutions, $36,300 at private not-for-profit institutions, and $23,500 at private for-profit institutions, students who drop out have invested a considerable amount of money with little return. In many of these cases, there are also significant costs to taxpayers, whether through state support for public institutions or through financial aid. Non-completing students face reduced chances of getting a good job and hence considerably lower chances of repaying their loans.

By contrast, students who enroll for free in a MOOC and do not complete the course incur no financial cost to themselves or to taxpayers. Some critics rightly point to the cost in student time; however, given how much time people spend on activities such as watching television, "wasting" time on education, even as a non-completing student, seems harmless. Indeed, enrolling in a free online class is comparable to checking out a book from a public library: it would be absurd to measure a book's success strictly by the proportion of borrowers who read it cover-to-cover within the standard loan period. Some people might read a few chapters of a nonfiction book and stop once they have the information they need. Others might read more deliberately and renew the book a few times before finishing. In neither case would most people consider the lack of completion or the extra time taken a waste or a failure of the book.

The ease of non-completion in MOOCs can be viewed as an opportunity for risk-free exploration. Think, for example, of a high school student trying to decide where to go to college. The student can explore different topics that she finds intriguing and pick the ones that are a good match for her interests and skills. At the same time, she can try courses at different levels of difficulty, perhaps finding that she is capable of more than she thought, and aim for a more selective school than she originally intended. Thus, MOOCs may help alleviate the problem of college under-matching (addressed in Crossing the Finish Line: Completing College at America's Public Universities), in which students attend less-selective colleges than their skills would allow, an issue that particularly affects the success of minority and first-generation college students. Interventions that aim to increase retention by restricting students' freedom to try out multiple classes ultimately do little to improve the true social impact of an online course, even though they may raise retention rates by artificially selecting only the most dedicated students.

Ultimately, though, it is important to recognize that retention is only one of many factors underlying success in MOOCs and, arguably, far from the most important for many students. The goal of education is to provide students with the skills they need to achieve their own life goals, not to retain individuals in a classroom. Given the broad range of motivations in the population of students who participate in MOOCs, the true challenge of online education will be to identify what students want to get from their virtual classroom experience and help them achieve those goals.

ParentTopics:: Massive Open Online Course (MOOC), Online Learning, Student Retention