The humble PDF is becoming a problem for AI

This article contains quite a lot of mispredictions (false hopes). The truth is that within 1-2 years, AI will be able to extract information from PDFs better than an average human and will be able to reason about the text inside PDFs better than an average human.
 
This article contains quite a lot of mispredictions (false hopes). The truth is that within 1-2 years, AI will be able to extract information from PDFs better than an average human and will be able to reason about the text inside PDFs better than an average human.
We all know that PDFs won't stop AI, and that everything created and put online will inevitably be used by AI. Hell, before we know it, almost everything in the physical world will be recorded/streamed and processed and used by AI.
Just let us have our 2 seconds of joy, please.
 
This article contains quite a lot of mispredictions (false hopes). The truth is that within 1-2 years, AI will be able to extract information from PDFs better than an average human and will be able to reason about the text inside PDFs better than an average human.

Is this a prediction that AGI will be a thing within two years? If so, I doubt you are correct.

If you're talking about AI as it currently exists - it may well be able to get past its current problems with PDFs, but it will not be able to "reason about the text inside" because it is not capable of reason. It does not reason and it does not think.
 
Is this a prediction that AGI will be a thing within two years? If so, I doubt you are correct.

I don't know what AGI means. It is too soon to tell what it is.

If you're talking about AI as it currently exists - it may well be able to get past its current problems with PDFs, but it will not be able to "reason about the text inside" because it is not capable of reason. It does not reason and it does not think.

1. Today, AI is readily able to respond to the query "Write a C++ function doing FOO" with correct implementation of FOO that would take me personally 1-3 days to write. If you do not call that capability "reasoning", then what do you call it?

2. A human translator (capable of translating between just a few languages) is basically a dead profession in ~98% of cases, replaced by AI translation from any major language to any major language. Sure, those remaining ~2% human translators are still required, so the profession isn't going to die completely. If you do not call the ability to replace ~98% of human workforce as "the ability to 'reason' about the text that is being translated", then what do you call it?
 
I don't know what AGI means. It is too soon to tell what it is.



1. Today, AI is readily able to respond to the query "Write a C++ function doing FOO" with correct implementation of FOO that would take me personally 1-3 days to write. If you do not call that capability "reasoning", then what do you call it?

2. A human translator (capable of translating between just a few languages) is basically a dead profession in ~98% of cases, replaced by AI translation from any major language to any major language. Sure, those remaining ~2% human translators are still required, so the profession isn't going to die completely. If you do not call the ability to replace ~98% of human workforce as "the ability to 'reason' about the text that is being translated", then what do you call it?
1. I call that using a statistical model to determine the most likely next character in a text string. The amount of time it takes AI to do that is irrelevant to the question of reasoning.

2. I call that the ability to replace ~98% of human workforce. What does replacing a workforce with automation have to do with the capacity for reason? That might be the strangest rhetorical question I have ever read.
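
For anyone who wants to see what "determine the most likely next character" means at its simplest, here is a toy sketch in Python. It only counts which character tends to follow which in a training string. That is an assumption made for illustration: real LLMs predict tokens with a neural network, not a lookup table of character pairs, and the function names `train_bigram` and `most_likely_next` are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every character, which characters follow it in text."""
    counts = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    return counts


def most_likely_next(counts, char):
    """Return the most frequent successor of char, or None if char was never seen."""
    followers = counts.get(char)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy training text (hypothetical); every "t" in it is followed by "h".
model = train_bigram("the theory then there")
print(most_likely_next(model, "t"))
```

The prediction here is nothing more than a frequency lookup, which is the sense in which the reply above uses the phrase; whether scaling that idea up amounts to reasoning is exactly what the thread is arguing about.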
 
it is not capable of reason. It does not reason and it does not think.
it doesn’t have to be… it just has to be better than humans at doing the select jobs we ask of it. The improvements in AI over just the past couple of years have been gigantic. There is no reason to believe that these improvements won’t keep on happening.
 
it doesn’t have to be… it just has to be better than humans at doing the select jobs we ask of it. The improvements in AI over just the past couple of years have been gigantic. There is no reason to believe that these improvements won’t keep on happening.
I didn't indicate that I believed that. In fact I wrote "...it may well be able to get past its current problems with PDFs..."

The other poster made a very specific assertion: "[AI] will be able to reason about the text inside PDFs better than an average human."

I think it's well worth pointing out that it will not. Whether it will be transformative despite that fact is a different question.
 
I didn't indicate that I believed that. In fact I wrote "...it may well be able to get past its current problems with PDFs..."

The other poster made a very specific assertion: "[AI] will be able to reason about the text inside PDFs better than an average human."

I think it's well worth pointing out that it will not. Whether it will be transformative despite that fact is a different question.
So you're merely arguing semantics... fair enough... I suspect this will be moot in a decade or 2 as whether an AI is "reasoning" will cease to be a concern. It will simply be.
 
I don't know what AGI means. It is too soon to tell what it is.



1. Today, AI is readily able to respond to the query "Write a C++ function doing FOO" with correct implementation of FOO that would take me personally 1-3 days to write. If you do not call that capability "reasoning", then what do you call it?
I call it copying. That's all AI is capable of doing at the moment. It doesn't have the capacity to come up with novel solutions if someone else hasn't already done so. What it can do is the same thing computers have done already at a lower level: assemble facts and answers from disparate origins and put them together in a way the original thinkers didn't have the sources to do.
2. A human translator (capable of translating between just a few languages) is basically a dead profession in ~98% of cases, replaced by AI translation from any major language to any major language. Sure, those remaining ~2% human translators are still required, so the profession isn't going to die completely. If you do not call the ability to replace ~98% of human workforce as "the ability to 'reason' about the text that is being translated", then what do you call it?
I don't think you understand the word "reason" at all. That we are astonished at what AI has been able to do does not make what it is doing "reasoning" the way a human would go about it.
 
1. I call that using a statistical model to determine the most likely next character in a text string. The amount of time it takes AI to do that is irrelevant to the question of reasoning.

2. I call that the ability to replace ~98% of human workforce. What does replacing a workforce with automation have to do with the capacity for reason? That might be the strangest rhetorical question I have ever read.
Truly said. Much of what humans do for work is drudgery, it's the same argument that was made for things like vacuum cleaners and dishwashers. None of that makes AI suitable for doing things that require sophisticated thinking which includes things like emotional value, ethics, and long-term considerations. If I do "A" and "B" doesn't like it, what are the likely outcomes and will those be good for society or the world?
 
So you're merely arguing semantics... fair enough... I suspect this will be moot in a decade or 2 as whether an AI is "reasoning" will cease to be a concern. It will simply be.
Again, a very specific claim for AI was made. The other poster first asserted the imminent capability of AI to extract information from PDFs, and rather than stopping there, they went on to make a second, bigger claim about its capacity for reason. They obviously believed it to be an important enough matter to be worth making a claim about.

I happen to think that the method by which AI achieves its results is an interesting subject and that it's worth countering misinformation on that subject. It's not a matter of "semantics". You clearly find it irrelevant, which I guess is fine.
 
Again, a very specific claim for AI was made. The other poster first asserted the imminent capability of AI to extract information from PDFs, and rather than stopping there, they went on to make a second, bigger claim about its capacity for reason. They obviously believed it to be an important enough matter to be worth making a claim about.

I happen to think that the method by which AI achieves its results is an interesting subject and that it's worth countering misinformation on that subject. It's not a matter of "semantics". You clearly find it irrelevant, which I guess is fine.
I don't think semantics is irrelevant. I do think most people use language without understanding what they are doing in their heads. AI is never going to equal the quality of well-informed human thinking. It might be able to equal the average not well-informed person, which is why so many entry-level jobs and tasks are ripe for takeover. People are just going to have to get better education before even reaching college age. If your elementary and secondary schools are not up to the task, then woe betide you and your progeny.
 
Again, a very specific claim for AI was made. The other poster first asserted the imminent capability of AI to extract information from PDFs, and rather than stopping there, they went on to make a second, bigger claim about its capacity for reason. They obviously believed it to be an important enough matter to be worth making a claim about.
I don’t think that poster meant what he typed… but, neither of us can read minds… so semantics it is…
I happen to think that the method by which AI achieves its results is an interesting subject and that it's worth countering misinformation on that subject. It's not a matter of "semantics". You clearly find it irrelevant, which I guess is fine.
It might be an interesting subject - but isn’t relevant to reading PDF documents…
 
I don’t think that poster meant what he typed… but, neither of us can read minds… so semantics it is…

It might be an interesting subject - but isn’t relevant to reading PDF documents…
That's a complete non sequitur. You start by saying you don't think they meant what they said, then admit that you cannot know whether or not they meant it, then assert that my argument is therefore a semantic one. How on earth does that work? If someone says something wrong without meaning it, it's still wrong, and whether or not they "meant" it has no bearing on whether or not a given counterargument is based on "semantics".

The other poster introduced the link between AI's imminent ability to read PDFs and its capacity for reason. The fact that the latter is not the cause of the former - and therefore not relevant to the former - is precisely my point, so why on earth are you upvoting them and arguing with me?
 
That's a complete non sequitur. You start by saying you don't think they meant what they said, then admit that you cannot know whether or not they meant it, then assert that my argument is therefore a semantic one. How on earth does that work? If someone says something wrong without meaning it, it's still wrong, and whether or not they "meant" it has no bearing on whether or not a given counterargument is based on "semantics".

The other poster introduced the link between AI's imminent ability to read PDFs and its capacity for reason. The fact that the latter is not the cause of the former - and therefore not relevant to the former - is precisely my point, so why on earth are you upvoting them and arguing with me?
I’m merely stating that the use of the word “reason” wasn’t correct… he probably meant to say something like “decipher”… his point, as I understood it, was that AIs will be able to decipher, figure out, etc, PDFs far better than humans in the near future.

The ability to actually “reason” wasn’t, isn’t, and won’t be relevant.
 
I’m merely stating that the use of the word “reason” wasn’t correct… he probably meant to say something like “decipher”… his point, as I understood it, was that AIs will be able to decipher, figure out, etc, PDFs far better than humans in the near future.

The ability to actually “reason” wasn’t, isn’t, and won’t be relevant.
Them: Within 1-2 years AI will be able to extract information from PDF documents better than the average human and the sun will rise in the west.
Me: No, I'm afraid the sun cannot rise in the west.
You: Semantics! They probably meant to say something like "the sun will rise in the east." That was their point as I understood it. In any case, the rising of the sun is not relevant to AI reading PDFs.
 
Hey, almighty AI, looky here :
Render the PDF. OCR it - just like you do with any other image, or scanned page of a magazine.
I'll take the Nobel prize for this. No problem.
AI... Carry on.
 
Them: Within 1-2 years AI will be able to extract information from PDF documents better than the average human and the sun will rise in the west.
Me: No, I'm afraid the sun cannot rise in the west.
You: Semantics! They probably meant to say something like "the sun will rise in the east." That was their point as I understood it. In any case, the rising of the sun is not relevant to AI reading PDFs.
So how is any of this relevant?
 

The ability to actually “reason” wasn’t, isn’t, and won’t be relevant.
A good chunk of PDF utility is understanding the logic behind the decision making. "Why was this font chosen, in this size? What formatting is this? Why are there static rows instead of table separators?" A lot of complicated interconnected structures are maintained by the PDF format. Because that's the point: the first word in "PDF" is "portable" for a reason―they are designed to be platform-agnostic, requiring the bare minimum external libraries to interact with. Not everyone has a licensed copy of Microsoft Word on hand, but just about every device made in the last 30 years can open a PDF.

Trying to "rebuild" PDF files from the ground up, because current models cannot process them correctly, is stupid and also an admission of defeat. If, because they allow for obtuse and asymmetrical design methods, that deviate from linear pattern structures (such as having text run vertically instead of horizontally), AI systems cannot handle them, that's not an indication that PDFs are an inherently poorly-designed format―I mean, they very well might be, but not for that reason―that's a failure of the LLM to overcome the challenge.

LLMs are supposed to be these omniscient, all-knowing, general-purpose automation systems that can "replace all human cognition and labor", right? That's how they're being sold. Yet, attempting to process a simple word document, with a few more tab inserts than usual, causes them to crash and burn, in a daze of errors, bugs and formatting issues? That's not indicative of reading comprehension. That's word synthesis, done poorly. Like skimming 5 paragraphs of a book chapter and then trying to explain character motivations, based on that little information. You're gonna make sh*t up and get most of it wrong.
 
A good chunk of PDF utility is understanding the logic behind the decision making. "Why was this font chosen, in this size? What formatting is this? Why are there static rows instead of table separators?" A lot of complicated interconnected structures are maintained by the PDF format. Because that's the point: the first word in "PDF" is "portable" for a reason―they are designed to be platform-agnostic, requiring the bare minimum external libraries to interact with. Not everyone has a licensed copy of Microsoft Word on hand, but just about every device made in the last 30 years can open a PDF.

Trying to "rebuild" PDF files from the ground up, because current models cannot process them correctly, is stupid and also an admission of defeat. If, because they allow for obtuse and asymmetrical design methods, that deviate from linear pattern structures (such as having text run vertically instead of horizontally), AI systems cannot handle them, that's not an indication that PDFs are an inherently poorly-designed format―I mean, they very well might be, but not for that reason―that's a failure of the LLM to overcome the challenge.

LLMs are supposed to be these omniscient, all-knowing, general-purpose automation systems that can "replace all human cognition and labor", right? That's how they're being sold. Yet, attempting to process a simple word document, with a few more tab inserts than usual, causes them to crash and burn, in a daze of errors, bugs and formatting issues? That's not indicative of reading comprehension. That's word synthesis, done poorly. Like skimming 5 paragraphs of a book chapter and then trying to explain character motivations, based on that little information. You're gonna make sh*t up and get most of it wrong.

Most AI developers do not have your style of thinking. If they did, they would be forced out of their job positions.

You seem to be living in a fairy-tale world. Look, if Google/Meta/etc managers decide that their AI development teams should consider processing of PDFs a top priority - because they believe it is economically justifiable - then AI will learn to correctly interpret the majority of existing PDF documents within just a few years.

The only exception will be password-encrypted PDF documents.

Understanding of PDF documents is nothing more than an image recognition problem. It is a simpler problem than recognizing objects in a photograph.

The startup company advocating that "rebuilding PDF files from the ground up" is the future, will almost certainly fail. The question is: why are you so easily persuaded by a random startup company's bidding? If they believe in that future, it is their sole responsibility to make that future happen.

What do you think will happen when the enormous data centers that are being built right now go online over the next few years? Do you believe that AI models are going to become dumber because of that - or are going to become smarter because of that?

... there aren't mass protests in the streets because ~98% of truck drivers lost their jobs because self-driving trucks can do their job better, faster, with a lower accident rate, and more importantly with a measurably lower mortality on the roads. For sure, most truck drivers will be replaced by AI, some day. We just don't know when exactly - that is all.
 
Most AI developers do not have your style of thinking. If they did, they would be forced out of their job positions.

You seem to be living in a fairy-tale world. Look, if Google/Meta/etc managers decide that their AI development teams should consider processing of PDFs a top priority - because they believe it is economically justifiable - then AI will learn to correctly interpret the majority of existing PDF documents within just a few years.
No, I'm living in reality.

It's not a matter of LLMs processing PDF files being "economically viable", it's about the optics of these massive, money sucking pits of nothing we call "artificial intelligence", being advertised as the answer to ending famine and replacing the entire job market, yet they cannot do something as "simple" as read a PDF written in Comic Sans, 12 point font, no graphics, with all of the text written at a 45 degree angle, because they aren't trained on anything but English and only when it is written left to right. That's too difficult for them.

You say "that's not a big deal", I say "it demonstrates 'false advertising'." It doesn't matter what LLMs are supposed to do, only what they can do. Results determine their value.
What do you think will happen when the enormous data centers that are being built right now go online over the next few years? Do you believe that AI models are going to become dumber because of that - or are going to become smarter because of that?
That depends on the time scale. These data centers are being built on the back of the "promise" of AI, its supposed 100x ROI. They are built with VC funds and there is still no product. A product generates revenue and these models do not generate revenue―certainly not enough to justify their upkeep. The problem, that no developers have solved thus far is, "what do people do with our models, that generates profit?" Like, Grok and Claude have a lot of utilities, but do those utilities even begin to make enough money to cover their operating costs? Or are they basically subsidizing an unprofitable venture?

Eventually, either these companies start generating revenue from people voluntarily paying for services with their models' utilities (because you cannot strong arm people into using your product forever) or the VC capital will run dry. Like, the prevailing theory among tech bros right now is, if customers cannot hide from AI (because it is everywhere and literally impossible to avoid), eventually you will submit and be compelled to use it. Right now, society is still negotiating how much "mandating" it is willing to put up with, but in due time, we will cross the Rubicon and then either people will give up or completely reject it. There is no half measure with AI. Either it's ubiquitous or it will find a stable market, but it will not be so omnipresent. It can't be both.
 
No, I'm living in reality.
I don’t think you are… let’s address your inanity, point by point, shall we?
It's not a matter of LLMs processing PDF files being "economically viable", it's about the optics of these massive, money sucking pits of nothing we call "artificial intelligence", being advertised as the answer to ending famine and replacing the entire job market, yet they cannot do something as "simple" as read a PDF written in Comic Sans, 12 point font, no graphics, with all of the text written at a 45 degree angle, because they aren't trained on anything but English and only when it is written left to right. That's too difficult for them.
It isn’t too difficult - and they can read the vast majority of PDFs… and they’re improving day by day. What they couldn’t do last week they might be able to do by next week… whereas if the average human can’t do something, it will remain that way for millennia to come.
You say "that's not a big deal", I say "it demonstrates 'false advertising'." It doesn't matter what LLMs are supposed to do, only what they can do. Results determine their value.
It’s not false advertising… results continue to occur - at a far greater success rate each day.
That depends on the time scale. These data centers are being built on the back of the "promise" of AI, its supposed 100x ROI. They are built with VC funds and there is still no product. A product generates revenue and these models do not generate revenue―certainly not enough to justify their upkeep. The problem, that no developers have solved thus far is, "what do people do with our models, that generates profit?" Like, Grok and Claude have a lot of utilities, but do those utilities even begin to make enough money to cover their operating costs? Or are they basically subsidizing an unprofitable venture?
AI is still in its infancy. I know we are in a culture of “now, now, now!”, but reality is a harsh mistress. Rome wasn’t built in a day - neither is AI.
Eventually, either these companies start generating revenue from people voluntarily paying for services with their models' utilities (because you cannot strong arm people into using your product forever) or the VC capital will run dry. Like, the prevailing theory among tech bros right now is, if customers cannot hide from AI (because it is everywhere and literally impossible to avoid), eventually you will submit and be compelled to use it. Right now, society is still negotiating how much "mandating" it is willing to put up with, but in due time, we will cross the Rubicon and then either people will give up or completely reject it. There is no half measure with AI. Either it's ubiquitous or it will find a stable market, but it will not be so omnipresent. It can't be both.
We have already crossed that line. And we WILL reap the benefits. We already are in many fields - medicine being a huge one in my books :)
 
I worked in IT support for 27+ years. I love technology. I have always felt that just because we can do something, doesn't mean we should.
I understand AI has some valid uses, but corporations seem more interested in how many people they can lay off with AI.
I personally think we can survive without AI.
 