Valid arguments exist for students to control data about themselves, and similarly plausible arguments suggest that the institution can claim ownership. This article explores both perspectives. To avoid win-lose solutions, institutions acting as "information fiduciaries" can reap the benefits of analyzing student data while respecting student rights.
At the University of Wisconsin–Madison, Kyle M. L. Jones is a fifth-year doctoral candidate in the School of Library and Information Studies, John C. Thomson Jr. is an instructional technology consultant for Learning Technology and Distance Education, and Kim Arnold is an evaluation consultant and lead for the Learning Analytics Initiative.
Since its advent, the Internet has fostered construction of a knowledge-based economy. From the time we wake to the moment we fall asleep, we interact with a staggering amount of information. By adopting more information technology into learning, work, and life practices, colleges and universities address the changing needs of society. Some observers posit a paradigm shift in education, with Moore's law coming to fruition in such a way that all aspects of the teaching and learning experience can be influenced by the data that speeds through our laptops, tablets, smartphones, and peripheral devices, as well as our sensor-enabled physical infrastructures.[1]
Like their counterparts in commerce, government, and research, colleges and universities have begun to mine the data available to them to create actionable information and new insights about their students with the goal of increasing institutional efficiencies, effectiveness, and success. Institutional actors ask various questions of their data, such as:
- What can we learn about student learning processes?
- What is the best path for students to earn their degree?
- What resources do students use?
- How do students use their social network to aid learning?
- How can we improve educational software to meet a diversity of student needs?
Institutions have the potential to harness data created within the mesh of their information systems and mine it for potentially powerful insights to drive the future of higher education. Yet, many serious issues regarding data use could harm institutional reputations if left unaddressed: privacy, surveillance, morality, ethics, and legal and expected rights regarding personally identifiable data. Wrapped up in these concerns is one of particular interest to us: data ownership. That is, who owns the data used in large-scale analytic processes on college campuses, especially when the data is personally identifiable at the student level and used to enact change in an individual student's life?
Extant work discusses data analytics and issues of privacy (for example, see Richards and King[2] and Tene and Polonetsky[3]), and some authors have contextualized their conversations in regard to higher education (for example, see Johnson[4] and Pardo and Siemens[5]), but technological progress and social enthusiasm for data analytics continue to outpace these concerns. Higher education must mind the gap: these problems are political, legal, and social land mines for institutions that continue to push forward without addressing them.
One such land mine is data ownership. The success of data analytics depends on institutions first accessing, curating, harvesting, and controlling rapidly multiplying sources of data. Lack of control over the data might compromise the integrity of data-driven initiatives, and innovation might wane. But valid arguments exist for students to control data about themselves, in essence granting them an ownership right. Similarly, plausible arguments suggest that the institution can claim such a right.
We explore both perspectives here to highlight the complexity of this data ownership problem and show the plausibility of each party's claim to the data. Our exploration offers few answers: each institution's goals, values, and norms will dictate how it addresses the question of data ownership, as will its legal department's interpretation of applicable law.
Our goal, then, is to jump-start the conversation and provide talking points as institutions consider urgent concerns regarding data ownership. To that end, we encourage institutions to
- ask questions about how student data is defined,
- consider the context around student data creation, and
- think about who actually creates student data.
To begin this conversation, consider the following day in the life of a fictional undergraduate:
Amy begins her day by using her campus-issued student ID number and password to log in to the single sign-on (SSO) system for web-based campus services, so she can check her e-mail. Next, she navigates to the library's integrated library system (ILS) to renew some overdue books, to look at other online resources for her English paper, and to download a few articles from research databases (which track what she accesses and downloads). Before walking to her biology class, she grabs breakfast at the student union (using her student ID card to pay), connects to the student union's wireless network, and accesses the discussion board for her online composition course in the learning management system (LMS). When she walks into class, a radio frequency identification (RFID) reader records the frequency from the RFID tag in her student ID; she's automatically marked as present in the course. Later in the day, Amy makes an online appointment with campus health services, joins the upcoming intramural tennis league through the recreation department's self-hosted website, and updates her academic profile in the student information system (SIS) to look for new grants or financial aid packages.
Amy's day reveals the breadth and depth of data that can be captured about a student. The data itself can capture a great deal about her academic history and current progress, financial situation, personal demographics, and health. The metadata captures when a student interacts with a system and grabs geolocation information using a student's Internet protocol (IP) address and the location of a sensor, like that of the RFID reader and wireless hotspot. When these types and sources of data record behavioral actions, like what library resources she uses (or fails to use), with whom she communicates (or ignores), and where she goes on campus (or doesn't), campus information systems cast a wide net and gather a remarkable amount of identifiable data.
A Student's Point of View
If Amy knew about the personal information explicitly and implicitly available to the institution, she might become nervous about potential misuse. She might not have known of her institution's capability to capture all of that personally identifiable data, or to aggregate it into a central data warehouse, archiving it indefinitely. Moreover, she might have known that her campus deployed learning analytics applications to help her with her composition course, but not that the apps included sensitive racial, socio-economic, and historical data about her to predict her success and suggest resources for her use. She could claim that her privacy had been invaded and that the institution had taken its data-driven processes too far, regardless of its intentions. Because the data is about her and often created by her as she uses information systems, she might claim that she should own it and control how it can and should be used.
An Institution's Point of View
Information technology and the knowledge-based economy it supports drive organizational and business practices, including in higher education. Amy's data is key not only to her success as a student but also to institutional success. The institution can plausibly claim that, without Amy's data and that of her peers, it could not successfully provide services, resources, and programs, educationally oriented or otherwise. Ultimately, the institution could not compete against its peer institutions should Amy and others claim ownership of their data and restrict its access and use. Denying students an ownership right to data about them would cement institutional access to a growing breadth and depth of useful data now and in the future.
Whose Perspective Prevails?
Since both students and institutions have plausible arguments for ownership of student data, who should own it? We need to explore in more depth why students and institutions both have valid ownership claims. We begin by asking questions about what defines student data and what rights those definitions might determine.
Defining Student Data
Data created in enterprise-level technology is sometimes created by students and at other times created about students. It is sometimes provided by students and at other times it is automatically recorded by systems. Some data is purposefully created to be "student data"; other times it simply exists as "digital exhaust," a byproduct of information systems.[6] Whether the data is created to be student data may influence who can claim ownership of it.
Amy created data about herself in providing it to information systems like the LMS and the SIS. Her instructors create data about her as they enter grades into the same LMS; the library does the same, tracking the books she has checked out. When Amy entered her biology classroom, the attendance system automatically noted her presence in its database; the Wi-Fi network might do the same, logging her IP address when she connected to the network.
Some of the data about Amy was purposefully created. She meant to enter information about herself into the SIS, and her instructors attached her grades to her username in the LMS. But the data created by the Wi-Fi network is digital exhaust. Most institutions do not track individual students as they move from one access point to another, yet the data exists.
Looking at purposefully created student data, it's easy to see why students might feel a sense of ownership. That sense may be weaker for digital exhaust, but such data should not be dismissed as less valuable, less revealing, or less intrusive. It might not matter what Amy does on the Wi-Fi network, but a map of when, where, and for how long she connected to a given hotspot (the metadata) can provide insight into her personal behaviors, which is data she might not want her institution (or anyone else) to have access to.
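To make the point about metadata concrete, here is a minimal sketch of how a handful of connection records could be turned into a behavioral profile. The log format, the access point names, and the `dwell_minutes` helper are all hypothetical and are not drawn from any particular campus system; the sketch only assumes that a network records who connected, where, and when.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical Wi-Fi association log: (student_id, access_point, connect, disconnect).
# No content of Amy's traffic is recorded here, only metadata.
LOG = [
    ("amy", "student-union", "2014-09-02T08:05", "2014-09-02T08:40"),
    ("amy", "biology-hall", "2014-09-02T09:00", "2014-09-02T09:50"),
    ("amy", "student-union", "2014-09-03T08:10", "2014-09-03T08:30"),
]


def dwell_minutes(log, student_id):
    """Total minutes one student spent connected at each access point."""
    totals = defaultdict(int)
    for sid, access_point, start, end in log:
        if sid != student_id:
            continue
        delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        totals[access_point] += int(delta.total_seconds() // 60)
    return dict(totals)


profile = dwell_minutes(LOG, "amy")
print(profile)
```

Even this toy aggregation shows where a student spends her mornings, which is why digital exhaust can be as revealing as data she entered on purpose.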
Statutory definitions of student records can also provide a framework for understanding student data. Whether personally identifiable data about a student becomes part of her education record is determined under the Family Educational Rights and Privacy Act (FERPA), the statute that protects student privacy and affords students certain rights in the United States. Section 99.3 defines an education record as "directly related to a student" and "maintained by an educational agency," such as a college or university. Six unique exceptions exempt some types of data or information from becoming part of a student's record, but a literal reading of §99.3 presumes that all personally identifiable data goes into a student's education record and thus is subject to student inspection, review, and requests for amendment should she believe the data gathered and used is inaccurate, misleading, or in violation of her privacy rights.
Our personal perspective aligns with the literal reading: if data is personally identifiable and attributable to an individual student, it is a part of her record. Others might interpret FERPA's definition of an education record differently, a salient point recently made in a special report to the Obama administration.[7] Whether "purposefully" created or simply data exhaust, if the data is personally identifiable, students have legal rights related to it and plausible grounds to claim ownership of it.
Student Data in Context
Since institutions will interpret FERPA differently, it's not the best benchmark; evaluating claims to ownership may be better informed by examining student data in context. Context helps us understand the roles, activities, norms, and values that give life to student data and bound its use in practice.[8] We look to roles to determine the types of actors expected to come in contact with the data; we look to activities to explain how actors use the data; we look to norms to guide appropriate action (and determine conflicting action); and we identify values that explain to what ends the data may be used as a means. Context serves as an analytic tool to examine whether the creation of student data is bound by definable circumstances and expected outcomes.
While seemingly straightforward, context can be contested. Take, for example, an online course employing learning analytics. At a fundamental level, the course is simply a context for teaching and learning; as such, the instructor may claim ownership rights over the learning analytics–created student data and analyze it to improve her practices. Yet, the course is ultimately part of a department's curriculum; at the departmental level, the director may claim the data to improve programmatic outcomes. A higher education institution is an assemblage of various departments, so the university may assert its ownership rights over student data for multiple purposes.
On the other hand, consider ingress and egress data created by RFID-enabled student IDs. We might assign ownership rights in this case to facilities management or campus security, whose staffs analyze the systems that gather and use this kind of data for their practices. Similarly, student admissions applications are created for a particular purpose under specific circumstances, and the resulting data is clearly contextualized.
However, in the era of big data, student data may be reworked and recontextualized repeatedly. The data harvested from an online course may be contextualized as a sort of formative assessment as the instructor examines her curriculum and practice. From the student's perspective, the same data might be nothing more than derivative dust created as she completes her assignments. Similarly, a diverse group of stakeholders might remix student data gathered and created by admissions departments: the student sees it as a necessary step for admittance to the chosen institution; the admissions office uses the data to help determine whether to invite the student to enroll; and student success professionals use the data to develop models that determine if a student needs special resources and programs to succeed.
Clearly, determining the context of creation is difficult. But, if we return to the core elements of a context (roles, activities, norms, and values) and treat them carefully, we can identify the central purposes for the data's existence, those who are expected to come in contact with it, and the services it is supposed to support before it is put to new analytical ends. This allows us to weigh ownership claims.
Let's return to Amy's story and use context as an analytic tool. When Amy used her student ID to pay for her breakfast, the data created by the swipe (cost, what she purchased, and associated metadata) was always intended to support the institution's administrative functions (e.g., successful cafeteria service) and not to improve Amy's diet. The institution should own this data.
Now, should the institution begin to use analytics to track what Amy eats and suggest a new dietary regime (a plausible program given some analytic aims), some key roles, activities, norms, and values around that data have changed. At this point, Amy could argue that her personally identifiable data is being used in ways she deems unacceptable. So, to restrict use of her student ID swipe card data, she claims ownership of it.
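The comparison in Amy's breakfast example can be sketched as a simple check of a proposed data use against the context in which the data was created. This is only an illustration under stated assumptions: the `Context` class, the `shifted_elements` helper, and every role, activity, norm, and value listed are hypothetical, and real contextual analysis involves judgment that set comparison cannot capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """The four elements of a context, each as a set of labels."""
    roles: frozenset
    activities: frozenset
    norms: frozenset
    values: frozenset


def shifted_elements(original, proposed):
    """Name each element where the proposed use adds something
    that was not part of the original context of creation."""
    changed = []
    for name in ("roles", "activities", "norms", "values"):
        if not getattr(proposed, name) <= getattr(original, name):
            changed.append(name)
    return changed


# Hypothetical context in which the ID swipe data was created.
cafeteria = Context(
    roles=frozenset({"cashier", "dining services"}),
    activities=frozenset({"process payment", "plan inventory"}),
    norms=frozenset({"purchases stay within dining services"}),
    values=frozenset({"efficient cafeteria service"}),
)

# Hypothetical new use: analytics that suggest a dietary regime.
diet_analytics = Context(
    roles=frozenset({"dining services", "wellness analyst"}),
    activities=frozenset({"process payment", "recommend diet"}),
    norms=frozenset({"purchases stay within dining services"}),
    values=frozenset({"efficient cafeteria service", "student health"}),
)

changes = shifted_elements(cafeteria, diet_analytics)
print(changes)
```

Any element returned by the check marks a point where the new use departs from the original context, and so a point where a student like Amy might reasonably contest the institution's claim.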
The Creator of Student Data: Student or Institution?
Perhaps the context from which data is derived for analytic purposes is too complicated, or the purposes for the data are so multifaceted that it would be nigh impossible to separate and prioritize particular values. If defining student data and examining its context do not help clarify ownership claims, what next? Consider who created the student data.
We recognize some validity in the argument that the student creates the data. At times the student enters personal data into forms or actively provides it in other ways. Additionally, we might construe observational data about the student as she interacts with information systems as created by the student, for how would the data exist if not for her movements through these information spaces? Without the student, there is no student data.
The same could be said about the system, however: without the technologies to capture the student's inputs, there is no student data. For example, click-through data is not the same as reporting gender on a form. In essence, the system creates the data by providing a structured environment in which the student interacts, and then records those movements-as-data. The systems are installed, maintained, and controlled by the institution. The institution frames, by virtue of what it values, the data the systems collect; therefore, it "creates" the data.
None of the three views of student data ownership examined here is definitive: student data can be defined in a number of ways, contexts will be contested, and who creates the data is open to debate. Nonetheless, each view provides a different angle from which to argue data ownership.
Student Owner Justification
Student data gathering and use practices have existed in higher education since the emergence of popular computing technology in the 1970s. No longer is the data used simply to maintain system functions, however, or to create and maintain a student's academic record. Now, data about students can be used to create highly personalized learning environments, customized interventions, and predictions of their future success and failure — the data is personal.
Students have strong grounds for claiming ownership of their identifiable data, especially when it can be or already is used to influence their academic, professional, and personal lives. The following concerns may spur students' desire to own their personal data:
- Serious privacy concerns. Whether an instructor thinks she has the right to mine the details of a student's record or an institution creates algorithms to predict academic success, the student may feel that these practices intrude into her private life, especially if the data comes from a map of her interactions with Wi-Fi hotspots and her comings and goings from campus buildings.
- Inability to escape data collection. As they become more aware of the data gathered and its end purposes, students may change their behaviors: taking fewer risks and conforming to practices that the system rewards instead of those potentially leading to intellectual stimulation, growth, and engagement with their local community and society at large. The thicker the veil of data becomes, the less able the student may be to express "meaningful" autonomy.[9]
- Mistrust about data use. Students might not trust the institution to use data about them appropriately. Public revelations of data breaches and intrusive surveillance have sensitized students to the potentially damaging effects of access to their personally identifiable data.[10] As institutions of all types (commercial, governmental, academic, and otherwise) gain access to and create increasing amounts of data about students, gaining their trust could become increasingly difficult.
Institutional Justification for Data Ownership
The overarching purpose of higher education is to educate the next generation of a nation's citizens. In the United States, student data use has grown with increased access to it from information technology systems. As a result, we now see student data as central in teaching, learning, and research practices and as a means to successfully educate students. To deny the institution ownership rights over student data, and possibly access to the data, would affect the quality of educational services provided to students.
Data has also become a principal element in institutional accountability. Accreditation and reaccreditation practices often rely on reports based on student data. As costs in American higher education continue to rise, students, parents, administrative stakeholders, and politicians have ratcheted up pressures to prove an institution's effectiveness and efficiency by calling for and examining data often tied directly to students and their academic progress. If an institution were to lose access to student data, then these important concerns and questions might go unanswered or be answered only with partial data.
Merging Perspectives: A Fiduciary Responsibility
We argue that neither the students' nor the institution's perspective should win out. It seems optimal to merge their perspectives: for students and institutions to share the opportunities, obligations, and responsibilities of data ownership. A shared ownership model would support the institution's data needs, protect students' privacy, and inform individual students about personally identifiable data use on campus and what rights they have to it.
Any ownership framework that denies institutions access to data about students, especially related to their academic progress, fails to address current pressures and institutional obligations to report to stakeholders. If students exercising a data ownership right remove particular sources of data from the information stream, a college may be unable to effectively serve its students and legal and financial stakeholders. Clearly, a balance must respect students' wishes concerning privacy and the institution's needs. Jack Balkin's idea of an information fiduciary role seems especially relevant here.[11]
Information fiduciaries (F) are parties (groups, companies, organizations, institutions, etc.) entrusted by a beneficiary (B) to take care of B's personal information. F acts in the interest of B, is loyal to B, and places B's interests above its own.
Where higher education institutions act as information fiduciaries and students are the beneficiaries, colleges and universities may still use the personal information of students to, say, report to accreditation committees, since those reports ultimately and clearly support educational practices that redound to the student's benefit. But an institution that pursues data analytics with no clear benefit for the student, or with the potential to harm the student, would be misusing its fiduciary power.
Information fiduciaries must also account for how they use the beneficiary's information; this is especially pertinent with the mission creep of "academic data." For example, if an institution makes broad use claims over most personally identifiable data to "help inform the education" of a given student, the action is suspect. To respect their duties as information fiduciaries, institutions would need to disclose exactly how such data use benefits the student.
A student aware of how her information is used can maximize her autonomy and make informed choices about the data's use. Informed choices help protect privacy, and students should have as many choices as possible regarding the data obtained, its analysis, the purposes it serves, and who has access to it. If an institution lawfully claims ownership over student data, then students have little informed choice. FERPA affords colleges and universities nearly unfettered access to and use of student data, yet "FERPA is the floor, not the ceiling."[12] So, institutions would do well as information fiduciaries to incorporate additional protections, rights, and opportunities for student choice about data use in their technologies, policies, and data governance practices.
Data analytics should be communicated. We know from personal teaching experience how surprised students are when made aware mid-semester of some of the tracking and predictive capabilities of learning analytics technologies in learning management systems. Yet, when we inform students in our syllabi and in dialogue at the beginning of the semester about the analytical tools and the instructional purposes to which data will be put, students express less concern and more willingness to participate.
Institutions should provide similar information and host conversations about their data analytics. We do not agree with those who posit that transparency will resolve concerns related to the use and analysis of personally identifiable student data. Transparency is often a one-way communication stream; we argue for conversation, participation, and a willingness to hear concerns. Dialogue helps explain the value of using personally identifiable student data and provides an opportunity for institutions to clearly describe the students' rights to it, especially when those rights involve opt-in or opt-out choices and data management opportunities (e.g., via data dashboards). Students could hear from whom and about whom the data is derived, and institutions could fulfill their fiduciary responsibilities to maintain students' trust. Equal, inclusive conversation may alleviate concerns and encourage students to participate in analytic practices, in effect making them feel a sense of ownership in data analytics on campus.
In this article we have provided talking points for institutions to use as they address the issue of student data ownership. As the information technology landscape becomes more complex on college campuses, and as student data arguably becomes an institution's most valuable resource, this "simple" question — who owns the data? — demands serious consideration. The idea of treating a university or college as an information fiduciary holds promise, and we encourage readers to interrogate this concept further as they develop ownership rights and policies at their institutions.
Notes
1. George Anders, "Moore's Law Touches Education at Last – to Techies' Delight," Forbes, April 16, 2014.
2. Neil M. Richards and Jonathan H. King, "Three Paradoxes of Big Data," Stanford Law Review Online, Vol. 66, No. 41 (September 3, 2013): 41–46.
3. Omer Tene and Jules Polonetsky, "Big Data for All: Privacy and User Control in the Age of Analytics," Northwestern Journal of Technology and Intellectual Property, Vol. 11, No. 5 (2013): 239–273.
4. Jeffrey Alan Johnson, "Ethics of Data Mining and Predictive Analytics in Higher Education," paper presented at the Association for Institutional Research Annual Forum, May 19–22, 2013, Long Beach, California; doi: 10.2139/ssrn.2156058.
5. Abelardo Pardo and George Siemens, "Ethical and Privacy Principles for Learning Analytics," British Journal of Educational Technology, Vol. 45, No. 3 (May 2014): 438–450; doi: 10.1111/bjet.12152.
6. See more about information and data exhaust in Stan Davis and Bill Davidson, 2020 Vision: Transform Your Business Today to Succeed in Tomorrow's Economy (New York, NY: Fireside, 1991); and Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think (Boston, MA: Mariner Books, 2014).
7. President's Council of Advisors on Science and Technology, Big Data and Privacy: A Technological Perspective, Executive Office of the President, May 2014.
8. Helen Nissenbaum, Privacy in Context: Technology, Policy, and the Integrity of Social Life (Stanford, CA: Stanford University Press, 2010).
9. C. Edwin Baker, "Autonomy and Informational Privacy, or Gossip: The Central Meaning of the First Amendment," Social Philosophy and Policy Foundation, 2004.
10. Charles Duhigg, "How Companies Learn Your Secrets," New York Times, February 16, 2012; Glenn Greenwald, "NSA Collecting Phone Records of Millions of Verizon Customers Daily," The Guardian, June 5, 2013; and Patrick Svitek and Nick Anderson, "University of Maryland Computer Security Breach Exposes 300,000 Records," Washington Post, February 19, 2014.
11. Jack Balkin, "Information Fiduciaries in the Digital Age," Balkinization blog, March 5, 2014.
12. Family Educational Rights and Privacy Act, 34 C.F.R. § 99 (2012).