This article presents two problems, a confession, and one clear solution for leaders who are grappling with how to properly assess learners in the age of AI.
The following scenario is unfolding in academic department meetings across the country: A faculty member who is reviewing a learner's written formative assessment believes something is off. Perhaps the style doesn't match the student's earlier work or incorporates a cadence or structure that seems out of place for the assignment type. The stakes are high as the talk turns to whether generative artificial intelligence (AI) was used. Another instructor raises concerns over the accuracy and bias of AI-detection tools. The conversation stalls, and the issue remains unresolved.
AI is rapidly transforming the classroom, and assessment is among the areas most affected. Old assessment methods fall short, and new tools that purport to detect AI-generated text raise concerns of their own. What is the path forward? Higher education technology leaders should empower instructors to develop authentic assessment methods that are facilitated by technology—including AI. While these leaders are tasked with setting up instructors to succeed, the path ahead has never looked so uncertain.
AI Is Changing Everything
The landscape instructors face is transforming at an alarming pace. The proliferation of generative AI tools like ChatGPT has opened the door to misuse by learners. Even the concept of "misuse" is a gray area, as few institutions have laid out comprehensive policies around the use of AI tools by learners, instructors, or staff, and the line delineating what constitutes appropriate use has yet to be established. A recent survey of students, instructors, and administrators found that 51 percent of students would continue to use generative AI tools even if such tools were prohibited by their instructors or institutions.Footnote1 AI plagiarism—the use of generative AI to produce content that someone submits as their own work for assessment tasks—represents a real challenge across institutions. In an effort to solve the problem, IT and institutional leaders are grappling with what amounts to an arms race, procuring tools that claim to use AI to detect AI-generated plagiarism. If only it were that easy.
New Tools, New Concerns
A growing body of research is casting doubt on the efficacy of AI detection, with two key issues emerging: accuracy and bias. First, how accurately can AI-generated content be detected? Five computer scientists from the University of Maryland recently conducted a study in which they concluded emphatically that AI-generated text cannot be reliably detected and that simple paraphrasing is sufficient to evade detection.Footnote2 A separate study of fourteen AI detectors, conducted by academic researchers in six countries, found that the accuracy of these tools ranges from 33 to 81 percent, depending on the provider and the methodology used.Footnote3
Second, are the current iterations of AI-detection tools creating new issues by inadvertently introducing bias? The data that AI models are trained on is scraped from the internet, where the content is predominantly written in English. Stanford researchers evaluated whether this might lead to challenges in identifying whether Test of English as a Foreign Language (TOEFL) essays were AI-generated. Indeed, the researchers found that more than half of TOEFL essays were incorrectly classified as AI-generated.Footnote4
These challenges do not end at the edge of campus. As an education technology product leader, my team and I evaluated whether AI detection fits our solution ecosystem, and we grappled with these same questions. The framework that guided our evaluation, and that continues to guide our product development, is grounded in principles of inclusivity and accessibility and holds that academic integrity policies should level the playing field for every student. Fairness matters. In evaluating whether our company should pursue AI-detection capabilities, we conducted beta testing with a cohort of clients. The results were sobering. The participants had low confidence in the ability of AI detectors to correctly identify AI-generated content, with 80 percent of respondents believing that AI detectors are, at best, correct only "sometimes." Our evaluation also mirrored the published research on bias: samples written by students who speak English as a second language and by students with autism spectrum disorder were incorrectly identified as AI-generated content. We concluded that it would not be ethical to employ AI-detection tools at this point in their development.
The Way Forward with AI: Humans
Assessment is still a critical pedagogical element, but it's time to think about it differently. In its simplest form, authentic assessment moves away from measuring accrued knowledge to focus on the practical application of skills, prioritizing complex tasks over binary right-or-wrong questions. In the age of generative AI, authentic assessment is more important than ever. Injecting personal perspectives, critical thinking, and self-reflection in a way that appears genuine is much more difficult for generative AI technologies than it is for humans. For instance, authentic assessment in a business course may mean holding a mock negotiation in which students actively demonstrate their comprehension of the material. Authentic assessment isn't new. Educators have long recognized its benefits, including the ability to provide a clear connection between coursework and careers through the application of knowledge.
AI as a Means to Inspire Instructors
Authentic assessment demands time, a resource that is in short supply for instructors. Anthology believes AI can make a real, tangible impact today and help meet the challenge of creating more time for instructors. Learning technology infused with AI tools can reduce the time needed to complete administrative and production tasks like creating courses, enabling educators to spend more time teaching and working with students. Peer assessment and group work require a high level of authenticity, and tools that seamlessly support group work are another example of how technology can help instructors be more efficient.
Learning technology that reduces the administrative burden is critical for empowering instructors to rethink their assessment methods. There's no way around it: authentic assessment requires an investment of time on the part of the instructor. Freeing time that would otherwise be spent on repetitive or lower-value tasks to develop, test, and implement authentic assessment is the path forward. While the landscape may be transforming rapidly, instructors remain an institution's most valuable resource in the classroom. Combating the challenge that generative AI poses in evaluating learners starts with doubling down on the human element and adopting a proactive approach. The wait for solutions that can accurately identify AI plagiarism while avoiding serious ethical concerns might be long. Instead, institutional leaders need to embrace AI as part of the larger landscape and develop policies and approaches that use it to assess learners more authentically. For a more in-depth vision, download Anthology's whitepaper on an ethical path forward for education.
- Louis NeJame et al., "Generative AI in Higher Education: From Fear to Experimentation, Embracing AI's Potential," Blog + Higher Ed (blog), Tyton Partners, April 25, 2023.
- Vinu Sankar Sadasivan et al., "Can AI-Generated Text Be Reliably Detected?" (unpublished manuscript, June 28, 2023), PDF file.
- Deborah Weber-Wulff et al., "Testing of Detection Tools for AI-Generated Text" (unpublished manuscript, July 10, 2023), PDF file.
- Weixin Liang et al., "GPT Detectors Are Biased Against Non-Native English Writers," Patterns 4, no. 7 (July 2023): 100779.
JD White is Chief Product Officer at Anthology.
© 2023 Anthology.