Generative chatbots like ChatGPT also have a remarkable ability to pass for human-like performance in some limited social contexts, scoring well on the standardized exams and assessments typically used to measure aptitude and performance in a field (2). However, the lack of agency in chatbots means that they are unable to take responsibility for their actions. They cannot fully be members of a community if they operate outside of the ethics and morality of that community. If a generative chatbot makes up data, we call it a “hallucination”; if a professional makes up or misrepresents their knowledge on a topic, they can be stripped of their credentials. Researchers who fabricate data are often stripped of their funding, title, and degree; medical doctors can have their license taken away; and lawyers can be disbarred.
AI also currently lacks the ability to demonstrate physical expertise in a robust way, though there are plenty of companies working to build AI robots that unironically look like the machines that either turn against you or teach you how to love in sci-fi movies.
While chatbots may not have all of the dimensions of human expertise, they are often seen as potential tools to either give laypeople or novices access to expertise, or to help them develop expertise. A common refrain that I see in conversations around students using ChatGPT is that we should be teaching students how to use these tools. The logic here is that the horse is well out of the barn, so rather than ignore the widespread use of these tools, we should embrace them. It is worth exploring, then, how chatbots might be used to support learning.
One potential area where chatbots could be used is in intelligent tutoring systems (ITS), which are designed to support student learning by providing a simulated learning environment and/or a responsive chatbot that coaches students through the learning process. Imundo et al. (2024) provide a brief overview of several of these systems, and while some of them show promise, they are generally in very early stages (2). For example, the Betty’s Brain ITS is a computer agent that students can teach material to and then test to see how well it learned the information. Early tests showed promise: students who taught Betty made more complete concept maps than those who did not (6). However, classroom implementation presented some challenges, as there was wide variation in how well students were able to use the program (7). It is worth noting that these tutoring systems are designed to help students with specific strategies or within specific domains.
Within medical education, one of the use cases for chatbots is to generate clinical practice scenarios (8). There is a large demand for clinical practice scenarios within medical education because these form the basis of how students are assessed on their licensing exams (see: USMLE). A common study practice for students preparing for these exams is to work through as many practice questions as they can – potentially thousands. Access to these practice questions typically comes through third-party resources that can cost hundreds of dollars (a one-month subscription to UWorld costs $319; a base subscription to AMBOSS is $19/month, but full access to their library of practice questions costs $149; TrueLearn starts at $149 for a month of access to its question bank). In this context, it makes sense why students might turn to ChatGPT to generate practice questions. A recent news article from the AAMC reports on an AI tool that was developed to create questions for a course about the blood system. They report that 85% of the questions it created met the criteria and, after human review, 75% of the questions were given to students as study material. While AI might be a useful tool to aid in question creation, it is still prone to errors and biases and thus needs a fair degree of human oversight to ensure that medical students are not being taught inaccurate information (8). Again, I note that a generative tool built and trained for a specific purpose might be useful with human oversight.
As noted above, chatbots have the ability to assist with learning. However, these tools have the most utility when they are designed for a specific purpose and are used with direct oversight from experts. Experts can help develop an AI tool to solve a specific problem, determine the appropriate training data for the AI, and check the quality of the output. The last part is incredibly important, as even a small error rate can have disastrous consequences if the tool is used at a large scale. It is difficult to estimate the error rate of ChatGPT given the range of prompts and requests it receives, and estimates vary depending on the complexity of the task. In the case of a complex task like replicating the results of a systematic review, generative chatbots like ChatGPT and Bard produce misleading and incorrect information 28.6%–91.4% of the time (9). OpenAI estimates that its most accurate ChatGPT model is only misleading or wrong 0.8% of the time. Even if we go with that conservative estimate of 0.8%, OpenAI also reports that 200 million people use ChatGPT each week. Assuming just one response per user, that works out to 1,600,000 misleading or flat-out incorrect responses each week. How do you, as a non-expert in a field, know whether or not you’ve gotten one of those 1.6 million or so misleading or incorrect responses?
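To make that back-of-the-envelope math explicit, here is a minimal sketch of the calculation, assuming the figures above (200 million weekly users, a 0.8% error rate) and, conservatively, a single response per user per week:

```python
# Back-of-the-envelope estimate of misleading or incorrect ChatGPT responses per week.
# All inputs are assumptions taken from the figures cited above, not measured values.
weekly_users = 200_000_000   # reported weekly user count
responses_per_user = 1       # conservative; real usage is almost certainly higher
error_rate = 0.008           # reported 0.8% rate of misleading or wrong answers

bad_responses_per_week = weekly_users * responses_per_user * error_rate
print(f"{bad_responses_per_week:,.0f} misleading or incorrect responses per week")
# Prints: 1,600,000 misleading or incorrect responses per week
```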
AI errors, particularly errors from generative chatbots, are especially concerning because these tools are very good at getting us to trust them. Garry, Kenkel, & Foster (2024) outlined how we decide how real or true something is, a process called reality monitoring (10). When we decide how real or true a piece of information is, we tend to rely on heuristics: does it sound familiar? Was the source confident? We can also rely on more effortful processing to determine the truthfulness of information: analyzing the logic, checking sources, and so on. Garry et al. highlight several ways in which chatbots exploit reality monitoring to seem more trustworthy. First, the conversational way in which people engage with chatbots helps to imbue them with person-like characteristics. Second, chatbots often pause while the model is processing a request. ChatGPT will explain that it is “thinking”, “translating the problem”, “defining variables”, “figuring out equations”, and then “adjusting the calculations”. All of this gives the impression that ChatGPT is giving you a slow and deliberate answer. Third, unlike experts, who tend to focus on the nuance of their field, chatbots give precise and definitive answers, which people tend to interpret as a sign of credible and accurate information. All of this makes it feel like we are interacting with a trustworthy source, our own personal assistant. It can be tempting, then, for people to think of AI as more objective and perhaps even more credible than expert sources (9).
Perhaps the biggest concern I have about using AI as a tool for learning is that it has the potential to remove deliberate practice for learners. In the main article that I’ve covered here, Imundo and colleagues largely assumed good-faith engagement with AI (2). They highlighted ways in which purpose-built AI tools might be used, with supervision, to improve learning or practice. As I noted above, this is very different from how I hear AI being used in education. My friends who teach in K-12 and my colleagues who teach at the university level are not dealing with an influx of targeted, specific AI models that students are using. They’re dealing with students using ChatGPT to outline, refine, and sometimes just wholesale write papers and complete assignments. I’ve noted at the top of each of these posts that, to the best of my knowledge, I have not used generative AI to write this post. As I edit this in Squarespace, an AI tool in the editing pane (next to where I can choose my font and heading style) is constantly flashing. When I searched for articles to explain AI, an AI overview was the first thing to appear. I do most of my writing in Google Docs, and it occasionally gives me a little popup asking if I want to try their AI tool, Gemini. It is, frankly, almost harder not to use generative AI at this point. If students are using AI for everything, it’s at least in some part because AI is everywhere.