ChatGPT for teachers

Should we allow students to use ChatGPT?

According to our admin team, the current guidance is that students may use ChatGPT but need to understand it’s just another source of information that needs to be evaluated for credibility like any other source. And text that comes from GPT needs to be quoted and attributed just like anything the student did not write themselves.

However in order to help our students understand what that actually means, we need to help them understand what GPT and other Large Language Models (LLMs) are actually capable of and how they produce text via a totally different process than human authors.

The first thing to understand is that GPT is extremely unreliable on factual matters. It was built to produce plausible text not factual text. Which is not to say it never produces factual text. But if it does, it is only because that text happens to be more plausible than the alternative.

GPT has no ability to gauge the accuracy or truthfulness of its own output because it is not built to have any notion of accuracy or truthfulness. But that won’t stop it from confidently answering if you ask it if it is being accurate or truthful—it will generate whatever it determines to be the most plausible response to such a question but still with no regard to accuracy or truth.

As a general rule of thumb nobody should take any factual statement emitted by GPT at face value. It could possibly be useful to generate a starting list of facts if you (or your students) were going to verify them via some other means.

Remember when we worried that Wikipedia wasn’t reliable? It is the gold standard of human curated knowledge these days compared to the hallucinations of LLMs.

How can I tell if writing is AI generated?

Sadly, there is currently no way to reliably determine whether a piece of text was generated by an AI.

OpenAI, the makers of ChatGPT, have released a separate tool that attempts to classify texts as likely written by a human or likely written by an AI but even that tool will, according to them, classify 9% of human written texts as “Likely AI generated”. That means if you check all the assignments from a thirty-student class, all of whom did their own work, the tool would still estimate, on average at least two of them as having been likely written by an AI.

It is also extremely important to note that this is a separate tool from ChatGPT itself. You cannot ask the chat interface itself to evaluate whether it wrote a given piece of text—it will absolutely make something up.

To see this in action, you may want to read this conversation I had with GPT-4 about a piece of text that I know I wrote.

How does ChatGPT work?

If you read one thing (beyond this thing you’re reading now) about how to think about how ChatGPT works, you should read “ChatGPT as a query engine on a giant corpus of text”. If after reading that you want a more in-depth and technical explanation of how ChatGPT actually works you can read Stephen Wolfram’s explainer “What Is ChatGPT Doing … and Why Does It Work?”.

The really simple version is, ChatGPT is a giant function that predicts, given some starting text, what words are likely to come next based on what it has seen in the huge swaths of text from the internet that it has been trained on. It then picks a probable word, emits it, and uses the original text plus that new word as the starting point to predict another word, and so on. This is a obviously a vast oversimplification but that’s fundamentally what it’s doing.

The thing that makes it seem so eerily intelligent is the sophistication of how the “context” is embedded in the model; this is what allows it to produce answers that are responsive the questions, etc. But there is no underlying “mental model” beyond word probabilities and no ability for it to do anything other than generate probable words. Thus it can easily be tripped up by simple math and logic questions which you’d think a computer would be really good at.

There are several examples in “ChatGPT as a query engine on a giant corpus of text” of ChatGPTs inability to do basic math and logic. Or you can read “Math is hard” wherein I try to get GPT-4 to do some math.