ChatGPT gets more than half the programming questions wrong in recent study

But ChatGPT's confidence and politeness convince some people it's right

By Rob Thubron August 8, 2023, 8:08

Facepalm: Generative AIs often get things wrong – even their makers don't hide this fact – which is why using them to help create code isn't a good idea. To test ChatGPT's general abilities and knowledge in this area, the system was asked a large number of software programming questions, more than half of which it got wrong. However, it still managed to fool a significant number of people.

A study from Purdue University (via The Reg) involved putting 517 Stack Overflow questions to ChatGPT and asking a dozen volunteer participants to evaluate the results. The answers were assessed not only on whether they were correct, but also on their consistency, comprehensiveness, and conciseness. The team also analyzed the linguistic style and sentiment of the responses.

It wasn't a good showing for ChatGPT. OpenAI's tool answered just 48% of the questions correctly, while 77% were described as "verbose."

What's especially interesting is that ChatGPT's comprehensiveness and well-articulated language style meant that almost 40% of its answers were still preferred by the participants. Unfortunately for the generative AI, 77% of those preferred answers were wrong.

"During our study, we observed that only when the error in the ChatGPT answer is obvious, users can identify the error," states the paper, written by researchers Samia Kabir, David Udo-Imeh, Bonan Kou, and assistant professor Tianyi Zhang. "However, when the error is not readily verifiable or requires external IDE or documentation, users often fail to identify the incorrectness or underestimate the degree of error in the answer."

Even when ChatGPT's answer was obviously wrong, two out of the 12 participants still preferred it due to the AI's pleasant, confident, and positive tone. Its comprehensiveness and the textbook style of writing also contributed to making a factually incorrect answer appear correct in some people's eyes.

"Many answers are incorrect due to ChatGPT's incapability to understand the underlying context of the question being asked," the paper explains.

Generative AI makers include warnings on their products' pages that the answers they give may be wrong. Even Google has warned its employees about the dangers of chatbots, including its own Bard, and told them to avoid directly using code generated by these services. When asked why, the company said that Bard can make undesired code suggestions, but it still helps programmers. Google also said it aimed to be transparent about the limitations of its technology. Apple, Amazon, and Samsung, meanwhile, are just some of the firms to have banned ChatGPT completely.
