Using the voice capabilities in ChatGPT to enrich interactive learning experiences could signal a new era in education and course design.
In language education, integrating artificial intelligence (AI) is no longer just a possibility; it is a reality unfolding in classrooms worldwide. ChatGPT, renowned for its exceptional text-generation capabilities, recently integrated voice-based interaction, allowing for real-time audio-responsive communication.[1] This article chronicles the integration of ChatGPT into a conventional language-teaching framework: specifically, an advanced English language seminar course taught at a university in Tokyo during the fall 2023 term. This work illustrates the transformative impact AI can have on traditional educational practices and is meant to serve as a guide for educators.
Voice Capabilities in ChatGPT: An Overview
Recent advancements in generative AI, particularly near real-time voice interaction capabilities, necessitate a pivotal shift in language education methodologies. This interactive technology simulates realistic conversations akin to those students would have with native speakers of the target language.
ChatGPT is a sophisticated conversational language model known for generating human-like text. OpenAI publicly released ChatGPT on November 30, 2022.[2] Soon after, various text-to-speech applications emerged, enabling the text generated by ChatGPT to be converted into spoken words. These early applications may have suffered from latency and unnatural speech patterns, which disrupted the perception of a natural conversational flow.
On September 25, 2023, OpenAI announced the rollout of a built-in voice feature for developers, enterprise users, and "Plus" users. The feature relies on Whisper, OpenAI's open-source speech-recognition system made publicly available in September 2022, to transcribe spoken input, and on a text-to-speech model to generate the spoken replies. This integration provided seamless, rapid responses to prompts, improving the conversational experience. On November 21, 2023, OpenAI made these voice interaction features freely available to all mobile app users, regardless of their subscription status.[3]
Like Apple's voice-enabled virtual assistant Siri, ChatGPT lets users choose among several voices. It offers five options: Juniper, Sky, Cove, Ember, and Breeze. Each voice has distinct assertiveness levels and generational nuances. The voices can be sampled on the OpenAI website without an account.[4] These AI-generated voices are consistent across different languages, allowing users to simulate conversations with bilingual or multilingual speakers. The ability to maintain a consistent voice regardless of language is a notable improvement over traditional language-learning materials, which typically do not offer this type of experience. Users can switch voices at any time according to their preferences.
Pilot Study Design
As the instructor of an advanced English language seminar course at a Tokyo university, I conducted a pilot study in October and November 2023 to explore the potential of using voice-enabled AI tools, specifically those that enable real-time audio-responsive interactions, to enhance speaking and listening activities during language learning. Given the absence of existing models, I carefully designed the study to integrate these emerging capabilities into an educational context. As a part of this study, I also explored the applicability of such tools in other fields and disciplines.
In-Class Implementation of ChatGPT
For this pilot study, ChatGPT was integrated into a seminar course designed to elevate students' academic English proficiency. This course was structured around a globally recognized textbook and was ideal for the innovative integration of ChatGPT.
Five students were enrolled in the course. The small class size and the availability of a single iPad with a paid ChatGPT account facilitated a focused study setting. The students had already used ChatGPT for scriptwriting assignments, but using the voice capabilities in ChatGPT added a new dimension to the learning experience.
The pilot study was divided into two phases: a pre-pilot phase and a main pilot phase. Both phases were strategically timed within the semester to inject a refreshing element into the course. During the pre-pilot phase, the functionality of the technology (iPad setup, internet connectivity, and audio volume) and the feasibility of the pilot design were tested. Positive student reactions and feedback confirmed the feasibility of the approach, allowing me to proceed confidently to the main pilot phase. The main pilot phase included an online follow-up questionnaire aimed at gathering detailed student impressions and evaluations.[5]
Figure 1 illustrates the iPad setup in the seminar classroom, showing how ChatGPT was integrated into the teaching environment. The figure details the positioning and use of the iPad as a central hub for students' interactions with ChatGPT. Importantly, with the volume of the iPad set to the highest level, all students could hear clearly and interact with ChatGPT without any auditory issues.
During the pilot study, I used ChatGPT in two distinct ways: (1) to create mini-lecture scripts with comprehension questions before the class, and (2) to facilitate Q&A voice interactions with students during the class. The mini-lectures were crafted around specific topics (see table 1). My prompts asked ChatGPT to create about four hundred words of lecture content and five comprehension questions formatted like those in standardized listening tests. Given the exploratory nature of the pilot study, the selected material was kept easily accessible (slightly less challenging than the typical course content) to minimize external problems and make it easier to identify areas for improvement related to the use of ChatGPT.
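The prompt pattern described above can be sketched as a small template function. This is only an illustration: the function name, the parameter names, and the exact wording of the prompt (including the phrase "advanced learners of English") are assumptions, not the prompts actually used in the pilot.

```python
def build_lecture_prompt(topic: str, word_count: int = 400, num_questions: int = 5) -> str:
    """Compose a request for a mini-lecture script plus comprehension questions.

    The wording is hypothetical; only the target length (about 400 words) and
    the number of questions (five) come from the study description.
    """
    return (
        f"Write a mini-lecture of about {word_count} words on the topic "
        f'"{topic}" for advanced learners of English. '
        f"Then write {num_questions} comprehension questions formatted like "
        "those in standardized listening tests."
    )


# Example: the topic used in the main pilot phase (see table 1).
prompt = build_lecture_prompt("The Science of First Impressions")
print(prompt)
```

Keeping the length and question count as parameters makes it easy to reuse the same pattern at other difficulty levels or in other subjects.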
Table 1 outlines the framework and content used in the pilot study, including an overview of the course structure and material. The seminar class met fifteen times over the semester. The pilot study was included in two course sessions.
Phase | Week | Audio Lecture Topics |
---|---|---|
Pre-pilot | 8 | The Fundamentals of a Balanced Diet (approximately 4 minutes) |
Pilot | 11 | The Science of First Impressions (approximately 6 minutes) |
Figure 2 illustrates the progression of the study design, emphasizing enhancements made to utilize the voice interaction capabilities of ChatGPT more effectively and enhance the learning experience. In both phases, students listened to a mini-lecture, responded to comprehension questions in print enhanced with audio, and then participated in a Q&A session with ChatGPT. They also engaged in self-study with the lecture script.[6]
During the main pilot phase, the process was further enriched. Students listened to the lectures twice, and the Q&A sessions were conducted entirely through voice interaction, including a review of correct answers by ChatGPT. These refinements aimed to enhance the students' listening and speaking practice and maximize engagement and learning effectiveness using AI-assisted methods.
The "Breeze" voice option was selected to enhance student engagement with ChatGPT during the study. Its youthful tone was well suited to the students' ages, fostering camaraderie and relatability. However, the version of ChatGPT used during the pilots could not read previous posts aloud. To work around this limitation, NaturalReader, a free AI voice generator, was used to convert the mini-lecture text generated by ChatGPT into audio for listening purposes.[7] The "Ryan" voice option at "normal" speed (180 words per minute) was selected for clarity. Students' feedback during a preliminary phase affirmed this choice. Although the voice sounded marginally artificial, using it consistently ensured a coherent auditory experience. The students did not report any issues related to understanding the voice. This dual approach to voice selection leveraged the strengths of each voice technology to enhance the overall learning experience and was crucial to maintaining an engaging and accessible audio component during the study.
Ethical Considerations
This study integrated innovative educational activities in an elective seminar course. Students could actively participate in these activities and had the option to withdraw from the study at any time. Institutional policies classified the study as free from potential biological harm, exempting it from a separate ethical review. All participants provided informed consent for their questionnaire responses to be used in this research project. The survey allowed students to explicitly authorize the analysis and potential publication of their data, which was anonymized and stored on a secure, school-managed survey system. This data will be removed upon completion of the study. These measures were crucial for safeguarding students' privacy and autonomy and ensured their full awareness and comfort with their role in the study.
Observations and Feedback
The analysis presented in this article synthesizes various data sources, including my observations as the instructor during the pilot phases, transcripts from ChatGPT, and feedback from the students following the main pilot phase. This integrative approach provides a holistic view of the impact and outcomes of this research.
Instructor's Observations
Integrating ChatGPT voice interaction features into the classroom was pioneering; no templates or guidelines existed at the time of the study. The plan for conducting the pilot was developed through interactive "discussions" with ChatGPT in English. During these exchanges, the AI displayed a high level of comprehension in lesson design, comparable to interacting with a human colleague. The collaboration was seamless and intuitive. The only notable limitation was the inability of the system to read previous posts aloud. Overall, using ChatGPT for educational planning proved to be efficient and effective.
The pilot studies underscore the value of timely turn-taking for effective communication with ChatGPT. Figure 3 shows three vital visual cues for student training: the circle indicates ChatGPT's readiness to "listen," the cloud icon represents the processing of internet queries, and the vertical rectangles indicate that ChatGPT is responding. These cues replaced the non-verbal cues typical in human interactions. Learning to recognize these cues was essential for students to engage in smooth conversations with the AI. By the time the main pilot phase was conducted, the students were adept at navigating these signals, leading to more fluid and confident interactions with ChatGPT.
Throughout the pilots, the students did not display any negative reactions toward or make any negative comments about the ChatGPT implementation. Their prior experience with ChatGPT for text-based English presentation scriptwriting may have eased their transition to engaging in spoken dialogue with the system. A notable improvement in the students' communication skills with ChatGPT was observed from the pre-pilot phase to the main pilot phase. These observations were corroborated by students' responses to a simple questionnaire administered during the main pilot phase.
Further analysis of the text transcripts from the pilots revealed significant insights. In the pre-pilot phase, one student courageously posed a challenging question to ChatGPT—a behavior students typically might avoid to preserve a teacher's reputation in our cultural context. ChatGPT's detailed and accurate response to this challenging question was met with awe by all participants.
Student: "How many milliliters are in the 8 glasses of water that are recommended to drink in a day?"
ChatGPT: "Generally, the recommended daily water intake is calculated with one glass being approximately 240 milliliters. Therefore, 8 glasses amount to approximately 1,920 milliliters, which is about 2 liters."
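The arithmetic in this reply can be checked directly. The snippet below is a minimal sketch that takes the 240 milliliters per glass stated by ChatGPT as given; the constant names are illustrative only.

```python
ML_PER_GLASS = 240   # glass size quoted in ChatGPT's reply
GLASSES_PER_DAY = 8  # number of glasses in the student's question

total_ml = ML_PER_GLASS * GLASSES_PER_DAY
total_liters = total_ml / 1000  # 1,000 milliliters per liter

print(total_ml, total_liters)  # prints "1920 1.92"
```

The result, 1,920 milliliters or 1.92 liters, matches the "about 2 liters" given in the response.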
The evolution in the students' approach to questioning was evident from the pre-pilot phase to the main pilot phase. Initially, students' questions were simple, focusing on vocabulary meanings. However, the students began asking more complex questions as the study progressed. This shift is exemplified in the following dialogue. The student's inquiry not only tested the evaluative capacities of ChatGPT but also facilitated the student's critical thinking and engagement with the material.
Student: "Which one is the most difficult among problems 1 to 5?"
ChatGPT: "The difficulty of a problem often varies based on an individual student's experience and knowledge. However, generally speaking, problem 3, 'How does cultural background influence first impressions?' might be the most challenging. This question requires an understanding of the cultural factors involved in forming first impressions and considering how these factors might affect specific behaviors or communication styles. Because cultural differences and their understanding can often be complex, it presents a substantial opportunity for thought for students."
Students' Feedback
Students' feedback was collected via a survey following the main pilot phase. The survey comprised seven questions. Questions one and two focused on the listening materials, questions three and four concerned interactions with ChatGPT, and question five assessed the overall appropriateness of the AI-integrated listening activity. Question six included an open-ended space for suggestions, and question seven sought the participants' consent for their responses to be analyzed and published in the study (see table 2).
Response Category | Q1: Sound Quality | Q2: Content Difficulty | Q3: Linguistic Clarity of ChatGPT | Q4: Content Clarity of ChatGPT | Q5: Overall Learning Scenario |
---|---|---|---|---|---|
No problem | 1 | – | 3 | 3 | 3 |
Not much of a problem | 3 | 3 | 1 | 1 | – |
Somewhat hard | – | 1 | – | – | 1 |
Hard | – | – | – | – | – |
Total Responses | 4 | 4 | 4 | 4 (100%) | 4 (100%) |
The responses reflect the small sample size of this pilot study. Percentages are provided for clarity.
The survey responses suggest that the listening materials were well received, the sound quality was adequate, and the level of difficulty of the learning materials was appropriate. Students' reactions to the responses provided by ChatGPT, in terms of both the language used and the content, were overwhelmingly positive. These findings indicate that ChatGPT may have adapted its language to match the difficulty level, a strategy often employed by human instructors. Most of the students (75 percent) deemed the learning scenario (listening materials, self-study, and a Q&A with ChatGPT) effective. However, one student (25 percent) responded differently to questions three, four, and five, suggesting a nuanced perspective of the difficulty level.
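The percentages quoted above follow directly from the response counts in table 2. The sketch below shows the calculation for question five; it assumes that a dash in the table means zero responses, and the variable and function names are illustrative only.

```python
# Response counts for Q5 (Overall Learning Scenario), read from table 2.
# Assumption: a dash in the table is treated as zero responses.
q5_counts = {
    "No problem": 3,
    "Not much of a problem": 0,
    "Somewhat hard": 1,
    "Hard": 0,
}


def to_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw response counts into percentages of all responses."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses to summarize")
    return {category: 100 * n / total for category, n in counts.items()}


q5_percent = to_percentages(q5_counts)
for category, share in q5_percent.items():
    print(f"{category}: {share:.0f}%")
```

With only four respondents, each answer shifts a category by 25 percentage points, which is why the figures should be read as descriptive rather than statistically meaningful.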
The open-ended feedback provided several valuable insights.
- Enunciation Issues: Three students mentioned occasional misinterpretations by ChatGPT. One student stated, "I think the response from ChatGPT would be perfect if it could accurately understand and hear our questions."
- Response Clarity and Appropriateness: Two students appreciated the clarity and accessibility of the responses provided by ChatGPT. One student noted, "The response from the AI was clear and easy to understand, and the English pronunciation was very easy to hear."
- Bilingual Handling: The ability of ChatGPT to respond in both Japanese and English was beneficial for students. One student remarked, "It was good that I could ask questions in both Japanese and English."
- Future Applications: Two students said they were anticipating future applications of AI tools in English classes, particularly tools with visual capabilities. One student observed, "I look forward to seeing what kind of applications there will be when AI is given eyes, and how English classes will evolve."
- Desired Improvements: Some students suggested improvements, such as controlling the speed of the responses and shortening the length of the responses provided by ChatGPT. The student who suggested speed control provided varied feedback across several survey questions, suggesting diverse student needs.
One intriguing aspect I observed during the pilots was how seamlessly the students switched between their native language and English, including interlanguage phonetic features. This dynamic interchange may have influenced how ChatGPT processed the students' queries, occasionally leading to responses that deviated from the students' intended meaning. Notably, one student's wish to control the speaking speed of ChatGPT suggests that the default speed might have been too fast for that student's comfort. This observation points to a valuable area for future improvement: enhancing students' interaction skills with AI. One potential direction would be to encourage students to actively direct ChatGPT to provide shorter, more straightforward responses. Not only would this approach improve the accessibility of the output provided by ChatGPT, but it would also help students manage their real-time interactions with the AI tool and control the flow of communication. Such an adaptation could greatly aid in customizing student-AI interactions to better suit individual student needs in future implementations.
Reflections and Practical Considerations
The following key insights emerged during this pilot study. These insights offer practical considerations for future applications of ChatGPT across various disciplines.
- Versatility of the Learning Materials: The voice capability of ChatGPT can be integrated well into any subject using a text-and-comprehension-question pattern. Once the voice-interaction scenarios are refined, they can be replicated and tailored to meet varying difficulty levels and educational contexts.
- Scenario Planning: Besides the instructional scenarios described in this study, additional scenarios that integrate the voice capabilities of ChatGPT can be developed and applied in other fields such as science, history, and literature. Flexibility and adaptability to unexpected course changes are crucial for refining instructional scenarios.
- Language Choice Flexibility: Language selection in ChatGPT interactions should align with the learning objectives of the course. The flexibility of the tool in handling various language styles and difficulty levels aids in addressing cross-disciplinary themes, maximizing the effectiveness of the interactions when students use languages interchangeably.
- Voice and Audio Adjustment: The range of voice tones available in ChatGPT helps to facilitate connections with students in varying age groups, enhancing the overall learning experience. Connecting a single iPad to a classroom speaker can amplify the audio, ensuring the entire class can hear the audio responses.
- Small-Group Testing: Initial testing in small groups is advisable to identify and address technical issues, thus paving the way for more successful implementations in larger educational settings.
Observing student-led initiatives reveals how AI can adapt to and enhance learning methods. In many cases, the student-led interactions with ChatGPT surpassed the instructor-designed scenarios in terms of effectiveness. Observing these interactions led to significant enhancements to the study design. Future applications of ChatGPT hold immense potential to evolve current and pioneer new pedagogical strategies. As AI technology advances, the methods used to integrate the tool in educational settings will also advance.
Future Directions
The next phase of this research project will focus on diverse lesson designs that integrate multiple ChatGPT instances, catering to individual and group activities. An interesting exploration could be planning a voice discussion session using ChatGPT to debate topics such as sustainable development with AI. Addressing current technological limitations, such as fixed speech speed and the inability to read previous interactions aloud, will further improve the utility of ChatGPT in education.
Conclusion
This research demonstrates the practical and transformative potential of using the voice functionality in ChatGPT in education. Enriching interactive learning experiences with this functionality could signal a new era in education and course design. As Richard Van Noorden and Richard Webb write, "The generative AI revolution has started. And there's no turning back."[8] The challenge ahead lies in innovating and leveraging AI for dynamic and enriching educational collaborations.
Acknowledgments
I would like to express my gratitude to the students who participated in this pilot study for their enthusiastic engagement and valuable feedback. Their input enriched my exploration of this teaching methodology across various educational disciplines. Additionally, I would like to acknowledge the role of ChatGPT, an AI system created by OpenAI, in providing unique insights and opportunities for innovation in teaching and learning.
Notes
1. "ChatGPT Can Now See, Hear, and Speak," Blog, OpenAI, September 25, 2023.
2. "Introducing ChatGPT," Blog, OpenAI, November 30, 2022.
3. "ChatGPT Can Now See, Hear, and Speak," September 25, 2023; "Introducing Whisper," OpenAI (website), September 21, 2022; "ChatGPT with Voice Is Available to All Users," ChatGPT Release Notes, OpenAI, November 21, 2023.
4. "ChatGPT Can Now See, Hear, and Speak," September 25, 2023.
5. The survey was developed using Qualtrics, a survey tool licensed by the university.
6. "Self-study" refers to the phase where students individually work on printed text scripts and questions, allowing them to refine their answers based solely on prior audio listening.
7. The latest version of ChatGPT allows manual playback of voice responses directly from a desktop interface. This feature allows educators to generate audio content directly within the interface, without relying on a third-party application.
8. Richard Van Noorden and Richard Webb, "ChatGPT and Science: The AI System Was a Force in 2023—for Good and Bad," Nature, December 13, 2023.
Terumi Miyazoe is an Associate Professor at Tokyo University of Science.
© 2024 Terumi Miyazoe. The content of this work is licensed under a Creative Commons BY 4.0 International License.