Policies and Practices: How to Improve Data Classification in Higher Education


As colleges and universities collect and use more data for more purposes, understanding how to organize and safeguard it becomes critically important.

Triangle divided into 4 levels. Each level is a different color.
Credit: nkrumah frederick / Shutterstock.com © 2025

Data classification is a critical element of many activities at colleges and universities. It helps guide institutions in handling data appropriately, including data governance and access decisions, cybersecurity risk assessment, research data management, endpoint security management, remote work approvals, records retention, business risk management, and compliance with various data regulations (e.g., FERPA, HIPAA). But are colleges and universities using this structure effectively? How are we training those applying data classification? What is the current state of the art for data classification policy and implementation?

At the University of Wisconsin–Madison, we set out to understand how our policy and practices aligned, where there was confusion, and how to better implement data classification for our campus. Our mixed methods approach involved a campuswide survey (536 successful responses); a gap analysis comparing our existing campus data classification policy to the policies at peer institutions; and an environmental scan of external practices in the academic, government, and commercial sectors (e.g., Google, Microsoft, Amazon). The process we followed for evaluating and modernizing data classification policy, standards, and practices can serve as a model for other institutions examining their data classification landscape.

How Do Colleges and Universities Classify Data?

Colleges and universities take a range of approaches to data classification, with three-, four-, and five-tier schemas all in use, as shown in table 1. UW–Madison's Data Classification Policy (UW-504) uses a typical four-tier schema, with levels of Restricted, Sensitive, Internal, and Public.

Table 1. Data Classification Policies at Several U.S. Institutions

| Tiers | Institution (Policy) | Classification Levels |
|---|---|---|
| 3 | Northern Kentucky University (Data Governance) | Public/Private/Confidential |
| 3 | Purdue University (Data Classification and Handling Procedures) | Public/Sensitive/Restricted |
| 3 | Stanford University (Risk Classifications) | Low/Moderate/High (non-PHI, PHI) |
| 3 | University of Colorado (Data Classification) | Public/Confidential/Highly Confidential |
| 3 | University of Delaware (University Information Classifications) | Low/Moderate/High |
| 3 | University of Minnesota (Data Security Classification) | Public/Private-Restricted/Private-Highly Restricted |
| 3 | University of Nebraska (Risk Classification and Minimum Security Standards) | Low Risk/Medium Risk/High Risk |
| 3 | University of Texas at Austin (Data Classification Standard) | Published/Controlled/Confidential |
| 3 | Weill Cornell Medicine (Data Classification) | Low/Moderate/High |
| 3 | Yale University (Data Classification Procedure) | Low/Moderate/High |
| 4 | Indiana University (Data Classifications) | Public/University-Internal/Restricted/Critical |
| 4 | The Ohio State University (Institutional Data Policy Calculator) | S1: Public/S2: Internal/S3: Private/S4: Restricted |
| 4 | Georgia Institute of Technology (Data Security Classification Handbook) | Public Use/Internal Use/Sensitive/Highly Sensitive |
| 4 | Northwestern University (Data Classification Policy) | Level 1/Level 2/Level 3/Level 4 |
| 4 | Penn State (Information Classification Decision Tool) | Low/Moderate/High/Restricted |
| 4 | University of California (Classification of Information and IT Resources) | P1 Minimal/P2 Low/P3 Moderate/P4 High |
| 4 | University of Illinois Chicago (Data Classifications) | Public/Internal/Sensitive/High Risk |
| 4 | University of Iowa (Data Classification Guidelines) | Public/University-Internal/Restricted/Critical |
| 4 | University of Michigan (U-M Data Classification Levels) | Low/Moderate/High/Restricted |
| 4 | University of Maryland (Data Classification Standard) | Low/Moderate/High/Restricted (data with compliance burdens, PCI/HIPAA/etc.) |
| 4 | University of South Carolina (Data and Information Governance Policy) | Public Information/Internal Use/Confidential/Restricted |
| 4 | University of Wisconsin–Madison (Data Classification) | Public/Internal/Sensitive/Restricted |
| 5 | Harvard University (Data Classification) | L1/L2/L3/L4/L5 |


Many academic institutions apply their data classification schemas in service of a range of institutional functions. At UW–Madison, for example, we use data classification in the following ways:

  • Data governance: UW–Madison's Institutional Data Governance Program, coordinated by a unit reporting to the provost, gives authority for data in more than a dozen institutional data domains (e.g., finance, human resources, teaching and learning) to designated Data Trustees and Data Stewards. Data Stewards are responsible for applying data classification labels for data in their domain, per UW–Madison's Institutional Data Policy (UW-523).
  • Cybersecurity risk assessment: Under UW–Madison's Cybersecurity Risk Management Policy (UW-503), the Office of Cybersecurity (OC), based in the Division of Information Technology (DoIT), performs risk reviews as part of "the mandatory process for managing the cybersecurity risk associated with all information systems of any kind that store or process data used to accomplish university research, teaching and learning, or administration." Data classification is one of several inputs OC uses to determine asset risk level.
  • Data security management: The classification of data may apply to all aspects of data security management, including collection, storage, and retention. For example, data classification is one of the inputs UW–Madison's Data Storage Finder Tool (adapted from Cornell's Finder Module available via GitHub) considers in recommending storage platforms. In another example, data classification has figured into discussions about possible configurations of Smart Access, UW–Madison's proposed version of Zero Trust Network Access.
  • Cloud data storage: As part of DoIT, UW–Madison's Cloud Team manages data storage platforms such as Amazon Web Services, Google Cloud, and Microsoft Azure. Responsible and compliant management of these platforms requires classification of the data they hold. Data classification is often driven by self-assessment. However, classification may also be driven by institutional partners or the cloud providers themselves.
  • Research data management: Data classification is a component of UW–Madison's research Data Stewardship, Access, and Retention Policy (UW-4032), helping guide storage, management, and access decisions. As a large research institution, UW–Madison manages petabytes of research data, and the responsibilities for stewardship rest mainly on the principal investigators and college deans.
  • System-level policies: Public institutions that fall within statewide higher education systems may be subject to system-level data classification policies. For instance, when the University of Wisconsin System Administration adopted a data classification policy (UW System Administrative Policy 1031), it conflicted with the one already in use at UW–Madison, requiring development of crosswalks and other new guidance documents.
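The crosswalks mentioned above can be encoded and checked mechanically. The following Python sketch is illustrative only: the three-tier target scheme and every mapping in it are hypothetical, not the levels of UW System Administrative Policy 1031 or any official UW–Madison crosswalk. It treats each scheme as an ordered set of levels and flags any source level that is unmapped or that lands below a less sensitive level.

```python
from enum import IntEnum


class MadisonLevel(IntEnum):
    """UW–Madison's four tiers (UW-504), ordered least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    SENSITIVE = 3
    RESTRICTED = 4


class TargetLevel(IntEnum):
    """A hypothetical three-tier target scheme, like several in table 1."""
    LOW = 1
    MODERATE = 2
    HIGH = 3


# Hypothetical crosswalk for illustration only; the real mapping is a
# governance decision, not something to infer from level names.
CROSSWALK = {
    MadisonLevel.PUBLIC: TargetLevel.LOW,
    MadisonLevel.INTERNAL: TargetLevel.MODERATE,
    MadisonLevel.SENSITIVE: TargetLevel.MODERATE,
    MadisonLevel.RESTRICTED: TargetLevel.HIGH,
}


def check_crosswalk(crosswalk):
    """Return a list of problems: source levels with no mapping, or a more
    sensitive source level mapped below a less sensitive one."""
    problems = [f"{level.name} is not mapped"
                for level in MadisonLevel if level not in crosswalk]
    mapped = sorted(crosswalk)  # IntEnum members sort by sensitivity
    for lower, higher in zip(mapped, mapped[1:]):
        if crosswalk[higher] < crosswalk[lower]:
            problems.append(f"{higher.name} maps below {lower.name}")
    return problems


print(check_crosswalk(CROSSWALK))               # []
print(CROSSWALK[MadisonLevel.SENSITIVE].name)   # MODERATE
```

Because four levels are folded into three, at least two source levels must share a target (here, Internal and Sensitive both become Moderate), so the crosswalk alone cannot be reversed. That loss of distinction is one reason companion guidance documents are needed.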

How Does Policy Align with Practice?

With such varied business uses for data classification, it's important to consider how policy does or does not align with practice. Any gap in accurately classifying data potentially hinders the ability to effectively protect data, creates risks to the institution, and could damage the trust of individuals associated with institutional data. Our investigation shows that a nuanced, up-to-date approach to data classification is needed, enhanced with training, tools, and awareness that may not exist today.

For a fuller picture of current understanding and use of data classifications at UW–Madison, we conducted an online survey of faculty and staff and held selective follow-up interviews. More than 500 respondents, representing 124 UW–Madison departments and units, told us about their familiarity with, confidence in using and interpreting, and understanding of resources related to data classification policy. The results confirmed a need for clarification—in terms of examples and authority—to enable policy use and shared understanding.

Although a large majority (86%) of faculty and staff who work in areas that handle data regulated by HIPAA and FERPA expressed confidence in their ability to classify their data in those areas, 33% of all survey respondents were not familiar with data classifications at all. Others reported pervasive uncertainty about whom to turn to with data classification questions. For example, survey respondents identified 16 different potential sources of information about data classification, including these:

  • An internet / knowledge base search
  • A supervisor
  • Department IT
  • Institutional data stewards
  • Several central offices, including institutional research, research compliance (e.g., IRB, HIPAA, Honest Broker), cybersecurity, and central IT

Many of the survey respondents who volunteered to participate in follow-up interviews said they are not sure who is responsible for data classification or where to go for answers to data classification questions. The broad themes raised include the following:

  • What is the purpose of data classification policy? How does it help faculty and staff use data in their roles?
  • Who has responsibility for classifying data? For research data, is this the role of the principal investigator (PI)? Do PIs have the training and knowledge they need to fill a role traditionally filled by data stewards? If not, who can answer PIs' questions about classifying research data?
  • How can employees find documentation of institutional data classifications assigned by Data Stewards in HR, finance, and other domains?
  • How can staff get timely help from a Data Steward when a data classification is needed?

Finally, participants said that data classification policy is not specific enough to help them do their jobs. Without additional guidance that uses terminology and examples familiar to them, UW–Madison faculty and staff are not sure how to apply policy to the day-to-day activities they perform in their roles and areas.

What Can Higher Education Learn from Other Sectors?

Looking beyond academia, several guidelines from the National Institute of Standards and Technology (NIST) are frequently used and referenced by the commercial and government sectors:

  • "Standards for Security Categorization of Federal Information and Information Systems" (i.e., Low, Moderate, High) (FIPS Publication 199)
  • "Risk Management Framework for Information Systems and Organizations: A System Life Cycle Approach for Security and Privacy" (SP 800-37)
  • "Protecting Controlled Unclassified Information in Nonfederal Systems and Organizations" (SP 800-171)
  • "Zero Trust Architecture" (SP 800-207)
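To make the FIPS 199 Low/Moderate/High scheme concrete: FIPS 199 rates potential impact separately for confidentiality, integrity, and availability, and a system takes, for each objective, the highest rating among the information types it holds. FIPS 200 then uses the highest of the three (the "high-water mark") as the system's overall impact level. The Python sketch below illustrates that logic; the information types and their ratings are hypothetical, and the standard's "not applicable" value is left out for brevity.

```python
from enum import IntEnum


class Impact(IntEnum):
    """FIPS 199 potential-impact values (NOT APPLICABLE omitted for simplicity)."""
    LOW = 1
    MODERATE = 2
    HIGH = 3


OBJECTIVES = ("confidentiality", "integrity", "availability")


def system_category(info_types):
    """For each security objective, take the highest impact across all
    information types the system stores or processes (FIPS 199 style)."""
    return {obj: max(t[obj] for t in info_types) for obj in OBJECTIVES}


def overall_impact(category):
    """High-water mark across the three objectives (as FIPS 200 uses it)."""
    return max(category.values())


# Hypothetical information types and ratings on a hypothetical system.
directory_info = {"confidentiality": Impact.LOW,
                  "integrity": Impact.LOW,
                  "availability": Impact.LOW}
grades = {"confidentiality": Impact.MODERATE,
          "integrity": Impact.MODERATE,
          "availability": Impact.LOW}

sc = system_category([directory_info, grades])
print({obj: level.name for obj, level in sc.items()})
print(overall_impact(sc).name)  # MODERATE
```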

Companies that manage large amounts of data typically recommend three levels of classification (see table 2). In addition, in its policy document, AWS notes that for "more complex data environments or varied data types," supplemental tagging or labeling techniques can be "helpful without adding complexity with more tiers." For example, an organization that handles student data might assign a "moderate" risk to such information and then attach such supplemental tags as "FERPA," "training," or "personally identifiable information" to those data records. The storage provider Box allows up to 25 total classification labels. That total is consistent with Microsoft's recommendation of "no more than five top-level parent labels, each with five sub-labels (25 total) to keep the user interface (UI) manageable."
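As a minimal sketch of the approach AWS describes (a small number of tiers plus supplemental tags), the Python below limits each classification to one of three levels and pushes extra detail into a set of tags. The class and function names, the catalog entries, and the "lab safety roster" data set are hypothetical; only the "moderate" level and the FERPA, training, and personally identifiable information tags come from the example in the text.

```python
from dataclasses import dataclass

LEVELS = ("low", "moderate", "high")  # a three-tier base scheme


@dataclass(frozen=True)
class Classification:
    """A single base level plus free-form supplemental tags."""
    level: str
    tags: frozenset = frozenset()

    def __post_init__(self):
        # Reject anything outside the three tiers; nuance goes in tags instead.
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")


def with_tag(catalog, tag):
    """Names of data sets carrying a given tag, regardless of level."""
    return [name for name, c in catalog.items() if tag in c.tags]


# Hypothetical catalog entries; the tags echo the example in the text.
catalog = {
    "student records": Classification(
        "moderate",
        frozenset({"FERPA", "training", "personally identifiable information"})),
    "course catalog": Classification("low"),
    "lab safety roster": Classification("moderate", frozenset({"training"})),
}

print(with_tag(catalog, "FERPA"))     # ['student records']
print(with_tag(catalog, "training"))  # ['student records', 'lab safety roster']
```

Queries such as "everything tagged FERPA" then work across levels without adding a fourth or fifth tier.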

Table 2. Data Classification Capabilities by Select Cloud Storage and Security Providers

| Company | Policy Document | Recommended Number of Levels | Additional Information |
|---|---|---|---|
| Google | Sensitivity and data risk levels | 3 (Low, Moderate, High) | N/A |
| Amazon | Data classification models and schemes | 3 | Labeling for added complexity |
| Microsoft | Data classification & sensitivity label taxonomy | 3–5 | Up to five sub-labels for each parent label |
| Box | Classification Labels | N/A | Limit of 25 labels |
| Proofpoint | Types of Data Classification | 4 (Public, Internal-Only, Confidential, Restricted) | N/A |
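A label taxonomy can be checked against limits like those in table 2 before it is rolled out. The Python sketch below assumes a taxonomy expressed as parent labels mapped to lists of sub-labels and applies Microsoft's recommended maximum of five parents with five sub-labels each, plus a 25-label ceiling like Box's. How any particular platform counts labels toward its limit is an assumption here (each sub-label, or a parent with no sub-labels, counts as one selectable label), and the sub-labels themselves are hypothetical.

```python
def selectable_labels(taxonomy):
    """Labels a user can actually apply: each sub-label as 'Parent/Sub', or the
    parent itself when it has no sub-labels. (How a given platform counts
    labels toward its limit is an assumption here.)"""
    labels = []
    for parent, subs in taxonomy.items():
        if subs:
            labels.extend(f"{parent}/{sub}" for sub in subs)
        else:
            labels.append(parent)
    return labels


def check_taxonomy(taxonomy, max_parents=5, max_subs=5, max_total=25):
    """Flag taxonomies that exceed the parent, sub-label, or total limits."""
    problems = []
    if len(taxonomy) > max_parents:
        problems.append(f"{len(taxonomy)} parent labels (limit {max_parents})")
    for parent, subs in taxonomy.items():
        if len(subs) > max_subs:
            problems.append(f"{parent}: {len(subs)} sub-labels (limit {max_subs})")
    total = len(selectable_labels(taxonomy))
    if total > max_total:
        problems.append(f"{total} selectable labels (limit {max_total})")
    return problems


# Hypothetical taxonomy built on UW–Madison's four tiers; the sub-labels are
# illustrative, not an actual campus assignment.
taxonomy = {
    "Public": [],
    "Internal": [],
    "Sensitive": ["Student records", "Personnel"],
    "Restricted": ["Health", "Payment card"],
}

print(len(selectable_labels(taxonomy)))  # 6
print(check_taxonomy(taxonomy))          # []
```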

Finally, a new NIST Internal Report, IR 8496: Data Classification Concepts and Considerations for Improving Data Protection, argues that "[o]rganizations should define their data classification policies in such a way that all affected parties, including external parties who share or receive data assets, have a common understanding of them." The report suggests that automated data classification labeling techniques may be useful, and it makes the case for specific rather than generic classifications: "classifying a data asset only as 'sensitive data' typically does not provide enough information to identify all the data protection requirements for that data asset, since many types of data are considered sensitive. Classifying a data asset as 'PHI' instead of 'sensitive data' enables more fine-grained protection policies, such as preventing certain types of PHI from being sent to certain business partners."
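The NIST example can be sketched in a few lines of Python to show why a specific class such as PHI supports finer-grained rules than a generic "sensitive" label. The partner names, the blocked-class table, and the default-deny choice for unlisted partners are all hypothetical assumptions for illustration, not NIST guidance or an actual institutional rule set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    classes: frozenset  # e.g., {"sensitive"} or {"sensitive", "PHI"}


# Hypothetical sharing rules: for each known partner, the classes it must not
# receive. Partners not listed are denied by default.
BLOCKED_FOR_PARTNER = {
    "analytics-partner": frozenset({"PHI"}),
    "payroll-vendor": frozenset({"PHI", "FERPA"}),
}


def may_share(asset, partner):
    if partner not in BLOCKED_FOR_PARTNER:
        return False  # default deny for unknown partners
    # Allowed only if the asset carries none of the partner's blocked classes.
    return not (asset.classes & BLOCKED_FOR_PARTNER[partner])


visit_notes = Asset("clinic visit notes", frozenset({"sensitive", "PHI"}))
personnel_file = Asset("personnel file", frozenset({"sensitive"}))

print(may_share(visit_notes, "analytics-partner"))     # False
print(may_share(personnel_file, "analytics-partner"))  # True
print(may_share(personnel_file, "unknown-partner"))    # False
```

If both assets carried only "sensitive", no rule keyed on classifications could tell them apart: the analytics partner would have to be denied both or allowed both.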

Conclusions: How Can Higher Ed Improve Data Classification Usage?

As one survey respondent said, "People do not organize their lives around data classifications." Throughout this discovery effort, we realized that our data classification policy and practices deserve a fresh look. Modernization may be crucial to reducing the ongoing risks related to data management. The survey pointed to several ways we might improve:

  1. Focus on roll-out and refresh: Simply publishing a policy is not enough. To fully implement data classification, a college or university must provide clear, consistent, and timely guidance on how to apply data classification and how its various uses fit together.
    1. Provide examples: Offer use-case scenarios that apply to different units and types of work, different systems, and different responsibilities to help staff apply the policy.
    2. Remove barriers and simplify decision-making: Build data classification into campus processes and infrastructure operations. Improve awareness of the policy across campus by integrating data classification into systems where possible and by developing strategic communications plans.
    3. Offer training: Incorporate data classification concepts into existing trainings, including employee onboarding or cybersecurity awareness, to help tie the policy to day-to-day work.
  2. Identify points of contact: Combinations of data (e.g., parking transactions and employee record data) can blur the boundaries between classifications. Although a user might easily classify data that contains a single data element in isolation, classifying data that contains that element in combination with one or more other elements can be more difficult. Provide clear pathways for asking questions to the data experts on campus.
  3. Allow for nuance: Classifiers, tags, or labels used alongside a data classification schema can help users identify data types and applicable regulations, giving a fuller understanding of specific data.
  4. Address differences among data types: Your institution may govern various types of data differently. Be clear about how your data classification applies in different situations. Administrative data created by institutional business functions may need to be handled differently from data generated from research activities. Both may need to be handled differently from data collected in health clinics and other HIPAA-regulated units.
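The combination problem described in item 2 can be made explicit with element-level classifications plus combination rules. In the hypothetical Python sketch below, a data set starts at the level of its most sensitive element and is then raised by any rule whose elements all appear together. The element names, their levels, and the parking/employee rule are illustrative assumptions loosely based on the example in item 2, not UW–Madison classifications. Unknown elements raise an error rather than defaulting to a level, which is where a clear point of contact would come in.

```python
from enum import IntEnum


class Level(IntEnum):
    """UW–Madison's four tiers, least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    SENSITIVE = 3
    RESTRICTED = 4


# Hypothetical element-level classifications for illustration only.
ELEMENT_LEVELS = {
    "parking_zone": Level.INTERNAL,
    "parking_transaction_time": Level.INTERNAL,
    "employee_id": Level.INTERNAL,
    "home_address": Level.SENSITIVE,
}

# Hypothetical combination rules: when every element in the set appears
# together, the data set is classified at least at the given level.
COMBINATION_RULES = [
    (frozenset({"parking_transaction_time", "employee_id"}), Level.SENSITIVE),
]


def classify(elements):
    elements = set(elements)
    unknown = elements - set(ELEMENT_LEVELS)
    if unknown:
        # No element-level rule: route the question to a data steward.
        raise KeyError(f"unclassified elements: {sorted(unknown)}")
    # Start from the most sensitive single element (high-water mark).
    level = max((ELEMENT_LEVELS[e] for e in elements), default=Level.PUBLIC)
    # Then apply any combination rule whose elements are all present.
    for combo, floor in COMBINATION_RULES:
        if combo <= elements:
            level = max(level, floor)
    return level


print(classify(["parking_zone", "employee_id"]).name)              # INTERNAL
print(classify(["parking_transaction_time", "employee_id"]).name)  # SENSITIVE
```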

Developing a better understanding of the issues within the institution is a first step toward addressing them in a way that not only overcomes current challenges but also lays the groundwork for the development and successful implementation of a new data classification policy.

Acknowledgments

The authors are deeply grateful to colleagues at the University of Wisconsin–Madison who served on the campus working group for this research and provided invaluable insights and contributions for this article: Cameron Cook, Nathaniel DeLano, Amy Diestler, Sarah Grimm, Phil Hull, Steffanie Johnson, Dharvesh Naraine, Jack Talaska, Sara Tate-Pederson, Dan Voeks, and Sue Weier.


Lisa Johnston is Director, Data Governance, at the University of Wisconsin–Madison.

Heather Johnston is IT Policy Writer & Analyst at the University of Wisconsin–Madison.

© 2025 Lisa Johnston and Heather Johnston. The content of this work is licensed under a Creative Commons BY-NC-SA 4.0 International License.