ChatGPT gets more than half the programming questions wrong in recent study

midian182

Facepalm: Generative AIs often get things wrong, a fact even their makers don't hide, which is why using them to help write code isn't a good idea. To test ChatGPT's abilities and knowledge in this area, researchers asked the system a large set of software programming questions, and it got more than half of them wrong. It still managed to fool a significant number of people, however.

A study from Purdue University (via The Reg) posed 517 Stack Overflow questions to ChatGPT and asked a dozen volunteer participants to assess the results. The answers were judged not only on whether they were correct, but also on their consistency, comprehensiveness, and conciseness. The team also analyzed the linguistic style and sentiment of the responses.

It wasn't a good showing for ChatGPT. OpenAI's tool answered just 48% of the questions correctly, while 77% of its answers were described as "verbose."

What's especially interesting is that ChatGPT's comprehensiveness and well-articulated language style meant participants still preferred almost 40% of its answers. Unfortunately for the generative AI, 77% of those preferred answers were wrong.

Also read: We Asked GPT Some Tech Questions, Can You Tell Which Answers Are Human?

"During our study, we observed that only when the error in the ChatGPT answer is obvious, users can identify the error," states the paper, written by researchers Samia Kabir, David Udo-Imeh, Bonan Kou, and assistant professor Tianyi Zhang. "However, when the error is not readily verifiable or requires external IDE or documentation, users often fail to identify the incorrectness or underestimate the degree of error in the answer."

Even when ChatGPT's answer was obviously wrong, two out of the 12 participants still preferred it due to the AI's pleasant, confident, and positive tone. Its comprehensiveness and the textbook style of writing also contributed to making a factually incorrect answer appear correct in some people's eyes.

"Many answers are incorrect due to ChatGPT's incapability to understand the underlying context of the question being asked," the paper explains.

Generative AI makers include warnings on their products' pages that the answers they give may be wrong. Even Google has warned its employees about the dangers of chatbots, including its own Bard, telling them to avoid directly using code generated by these services. When asked why, the company said that Bard can make undesired code suggestions but still helps programmers, and that it aims to be transparent about the limitations of its technology. Apple, Amazon, and Samsung, meanwhile, are just some of the firms to have banned employee use of ChatGPT entirely.


 
Even Google has warned its employees about the dangers of chatbots, including its own Bard, telling them to avoid directly using code generated by these services. When asked why, the company said that Bard can make undesired code suggestions but still helps programmers.
There is so much spin here I'm dizzy. The BS hit the fan and the walls are all brown. I guess Google just cannot stand to admit that it produced a pile of crap. Not to mention that people's BS detectors are so broken that they preferred the AI's answers because of style, even though 77% of those preferred answers were wrong. I guess if people can't tell the answers are wrong, they prefer the lipstick on the pig. :rolleyes:

My god, man. AI? IMO, it's more like BSI.
 
Was it GPT-3.5 or GPT-4? I've been using GPT-4 for programming and it gets most of the programming problems right. It did make some mistakes here and there, but the overall logic still worked.
 
There is so much spin here I'm dizzy. The BS hit the fan and the walls are all brown. I guess Google just cannot stand to admit that it produced a pile of crap. Not to mention that people's BS detectors are so broken that they preferred the AI's answers because of style, even though 77% of those preferred answers were wrong. I guess if people can't tell the answers are wrong, they prefer the lipstick on the pig. :rolleyes:

My god, man. AI? IMO, it's more like BSI.

Studies have shown time and again that the majority of people will implicitly trust anyone who speaks in a confident manner, no matter how provably and obviously false their statements are. So this isn't shocking whatsoever.
 
Studies have shown time and again that the majority of people will implicitly trust anyone who speaks in a confident manner, no matter how provably and obviously false their statements are. So this isn't shocking whatsoever.

I didn't give a toss about corporate life. Those in dapper suits used to make me laugh, walking briskly and confidently with files in one hand; it smelt of BS even back then. At uni I hated all the student politicians. Imagine young Dems or Republicans speaking like they had all the answers; I smelt the stench of egos learning their BS craft.

Yet as a young person with a C64, you knew more about what would work than some accountant did.
The Sony Walkman's inventor believed in his idea.
Nike and Adidas use young teens with acumen for the coming market (what's going down on the street) for their shoes.

I tell my son that neat, well-written exam answers get more marks, as suffering markers need a break from messy drivel.
As for chatbots: these are early days and the tech is still nascent. It's just another tool; not every problem needs a big hammer.
 
I wonder what percentage an ordinary programmer would have scored with those questions. I read somewhere that new code averages an error every 20 lines, and then there are simple typos on top of that. That 48% starts to sound fairly good to me.
 
I use it as an assistant rather than a "do my work for me" copy/paste solution. Whenever anyone hands you code, AI or person, you should always check it over. I've found ChatGPT useful for describing ways of approaching a solution, offering several different introductions to a problem rather than a final answer. In the end it takes experience: much like what we traditionally do when hitting the major code forums or Google, you review the code you come across and assess its viability before implementing it. That said, it's still a very useful tool for getting things started or exploring different angles on a problem.
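
To make that concrete, here's a hypothetical sketch of mine (not from the study, and not actual ChatGPT output): a plausible-looking Python helper of the kind a chatbot might suggest, next to the version you'd keep after actually checking it over.

```
# Hypothetical illustration: reviewing a chatbot-style suggestion before
# trusting it. Both functions split a list into fixed-size chunks.

def chunk_suggested(items, size):
    """Plausible-looking suggestion: reads fine at a glance, but the range
    stops before the last partial chunk, so trailing items are silently lost."""
    return [items[i:i + size] for i in range(0, len(items) - size + 1, size)]

def chunk_reviewed(items, size):
    """Reviewed version: iterates to the end of the list, keeping the final
    short chunk, and rejects a nonsensical chunk size."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]

print(chunk_suggested([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4]] -- the 5 vanished
print(chunk_reviewed([1, 2, 3, 4, 5], 2))   # [[1, 2], [3, 4], [5]]
```

The buggy version only misbehaves on inputs that don't divide evenly, which is exactly the kind of error the study says people miss without an IDE or a quick test.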

I especially like using it to introduce me to other languages or frameworks, generating code that sort of works but takes effort to complete, as a good "hello world" project. That gives me something I can research and explore on the other code forum sites.
 
Studies have shown time and again that the majority of people will implicitly trust anyone who speaks in a confident manner, no matter how provably and obviously false their statements are. So this isn't shocking whatsoever.
I was unaware of the studies; however, it makes perfect sense to me. All I need to do is look at modern organized religion to see prime examples of authority figures who are believed by many, although fortunately not by everyone. Not that organized religion has a monopoly on authority figures being believed without question; other facets of modern society have the same tendency.

To me, it's emblematic of the problems of modern society. People have been trained to let others do their thinking for them, and many seem to gladly submit.

EDIT: IMO (and my wife's), AI has a vast potential to exacerbate that problem of letting others do the thinking for them.
 
Studies have shown time and again that the majority of people will implicitly trust anyone who speaks in a confident manner, no matter how provably and obviously false their statements are. So this isn't shocking whatsoever.
I'm proof of that. I tend to want to believe whatever I am listening to. Fortunately, I realize what I am doing.
 