I Secretly Let ChatGPT Take My Final Exam. The Results Were Stunning.

Illustration by Natalie Matthews-Ramo: a standardized exam bubble sheet filled in with a swirling pattern.

I teach in the computer science department at Vanderbilt University. In my Algorithms class this past spring, I decided to regularly expose my students to ChatGPT so they could see firsthand that it can’t replace their critical thinking skills. I did this primarily for selfish reasons—like most instructors, I wanted my students to rely on their own creative problem-solving abilities rather than ChatGPT to answer their homework questions. I hypothesized that demonstrating the fallibility of ChatGPT would be a more effective deterrent than a syllabus policy statement. I just needed to find its weaknesses when it came to our course materials.

I decided to ask ChatGPT some general questions about algorithms, along with some algorithms-related questions typically asked in software engineering interviews. The results were surprising. Sometimes, ChatGPT gave the correct answer. Other times, it gave an accurate algorithm, but not the best or fastest. There were even times when it answered a question incorrectly but was confident that it had given the right response. I brought these examples to class and posed the same questions to my students that I had asked ChatGPT. We turned it into a fun weekly event: “ChatGPT vs. CS 3250 Students.”
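I can't share the interview questions we used, but the "correct answer, wrong speed" failure mode is easy to illustrate with a classic problem. The sketch below is a hypothetical example in Python, not one of our class questions: it contrasts a brute-force maximum-subarray solution with Kadane's linear-time algorithm, the kind of efficiency gap my students learned to spot.

```python
# Hypothetical illustration of "correct but not the fastest":
# both functions solve the classic maximum-subarray problem,
# but the first runs in O(n^2) time while the second
# (Kadane's algorithm) runs in O(n).

def max_subarray_quadratic(nums):
    """Correct but slow: sum every contiguous subarray."""
    best = nums[0]
    for i in range(len(nums)):
        total = 0
        for j in range(i, len(nums)):
            total += nums[j]
            best = max(best, total)
    return best

def max_subarray_kadane(nums):
    """Correct and fast: at each element, either extend the running sum or restart it."""
    best = current = nums[0]
    for x in nums[1:]:
        current = max(x, current + x)
        best = max(best, current)
    return best

if __name__ == "__main__":
    sample = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
    # Both return 6 (the subarray [4, -1, 2, 1]).
    assert max_subarray_quadratic(sample) == max_subarray_kadane(sample) == 6
```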

Two weeks before the end of the semester, one of my students asked if he could speak with me for a few minutes after the lecture. Once his classmates had left the room, he said: “I’ve been getting depressed lately about graduating. Everyone is talking about how large language models like ChatGPT will replace computer science majors. I’m beginning to feel like everything I learned over the past four years is already outdated. I don’t know what to do about that.”

Hoping to reassure him, I told him that no one knows where these technologies will lead. I explained that everyone who has built a career in computer science has learned that they need to quickly adapt to change. I have colleagues who wrote their computer programs using punch cards. And when I started college, there was no email or internet—but by the time I finished graduate school, we had email, laptops, and the World Wide Web. The trick, I told the student, is learning to harness the power of technology to enhance your life and productivity rather than being afraid of it or assuming it will do a better job than you.

We ended the conversation there. Even though the student was satisfied by my response, I wasn’t. I kept thinking about what he had said and about my role as an educator in this current technological landscape. Up until this point, I thought my classroom experiment had been going well. My students had quickly discovered that utilizing ChatGPT on their homework would only benefit them if they already understood the material; they realized that if they wanted to do well in class, they needed to think at a higher level than ChatGPT. After hearing this student’s concerns, I felt like I needed to take my experiment one step further. Maybe if I could prove to my students that they were better at reasoning than ChatGPT, it would help convince them they weren’t going to be replaced by LLMs any time soon. I just had to figure out how to do that.

And then I had an idea.

Before the final exam, I decided to give ChatGPT the exact same assessment my students would take. None of the exam questions relied on knowledge that could only have been gained by attending my lectures; most were adapted from algorithms questions used in software engineering job interviews. Once ChatGPT had answered every question, I copied its responses, without making any changes, into my exam template. Then I created a fictitious student named "Glenn Peter Thompson" (GPT) and uploaded Glenn's answers to our grading portal, where our large team of teaching assistants scored all the exams. (The class was large enough that none of the graders knew about my experiment.) And then I waited to see what Glenn's grades would be.

The results were stunning, and confirmed what I had witnessed during the semester: Every single student in the morning section of my class scored higher on the final exam than Glenn, who managed only a C-minus with a score of 72.5. My class average was in the mid-80s. In my afternoon section, facing a different set of final exam problems, Glenn fared somewhat better but still scored below the mean, landing in the bottom third of the class with the equivalent of a C-plus.

After posting the final grades, I shared the experiment and results with my students, who were both thrilled and relieved that their skills (for which they pay good tuition dollars) outshined the shiny new kid on the block. I also shared the results with a couple of my colleagues on a newly formed committee charged with recommending policies for dealing with ChatGPT and A.I.-related tools in the classroom. They loved the experiment—one of my colleagues even suggested that faculty members from every discipline should do the same.

ChatGPT is the oversized A.I. elephant sitting front and center in every classroom. Instructors can try to ignore or prohibit it, but doing so doesn’t change the reality of the situation: Students are curious about it, talking about it, worried about it, and using it. Now I need to follow the same advice I gave my student. Like this emerging technology, I’ll continue to learn, evolve, and adapt. And if there’s a way for me to use new technology and become a more effective educator, I’m all in.

Oh, and there’s one other catch with ChatGPT. Unlike your standard calculator, which doesn’t get any smarter the more you use it, LLMs like ChatGPT keep improving as they are retrained and updated. That means I can’t sit back and believe I’ve got this all figured out. Glenn’s in my class again this semester, and he might have figured out the problems he didn’t understand last spring. But I’m staying one step ahead of him: I’ve already found my first new ChatGPT flaw for next week’s material.