AMD next-gen APUs reportedly sacrifice a larger cache for AI chips

Blah. I mean, I've tried to use one of these NPUs before and they're damn near useless. My 11th-gen Intel system has a 1st-gen NPU in it, and I figured, "OK, I want to run some TensorFlow jobs, I'll use the NPU!" Oh no! It only supports 1-dimensional data, with something like a 64KB size limit. Basically it could be used for processing audio and that's it! (Windows apparently uses it for filtering out background noise on the mic.) I'm not running Windows, so... Linux does have a driver for the NPU, and TensorFlow or something did support it, but I wasn't processing audio, so I found it completely useless and it has sat in there unused to this day.

I think it's a big mistake to compromise the performance of the actual CPU and GPU (which, after all, is what people buy an APU for) in order to cram an NPU into it. Maybe they should make one model with an NPU and one with more cache? I for one would rather have better application performance. They're kind of making the same mistake Intel did (for a while) when it loaded chips down with AVX-512 units, only to find that if you actually used them (say, video encoding on all cores), the heat output was so high that it was faster on those systems not to use AVX-512 on all the cores.
 
Okay, I could be (and most likely am) talking out my @ss here, but isn't the current AI computing darling Nvidia's GPU lineup? Are iGPUs able to handle AI workloads? And if they are, why not repurpose them for that? It's better than just having the iGPU take up space, isn't it?
 
In the 1980s, OOP was most of what AI was. Machine learning (ML) was not AI for 60 years; then a few years ago it became AI. Only yesterday deep learning was AI, but it is completely different from an adversarial network. I get better answers on Stack Exchange than from ChatGPT. Structured meta-tags would beat a zero-sum guesser in the long run; the whole Internet search thing started with meta-tags (they just weren't structured). The magic in AI comes from humans, who are not embedded in the chip. So this seems to be a clear case of "ready, fire, aim".
 
Oh noes. I hope we can pick between the two when it's time to buy a new CPU. I know which I would pick any day of the week. Cache is king!
 
Unfortunately, Microsoft does kind of set the policy for AMD and Intel, so they need to care what Microsoft wants. If Microsoft has some new certification for PCs that requires 40 TOPS of AI performance and AMD doesn't worry about it, vendors won't be able to sell devices that contain AMD hardware as certified. That means AMD needs to make the products that Microsoft dictates. I hope AI crashes harder than 3D TVs.

Only because Intel is in bed with MS and AMD has to follow suit. If Intel told MS to shove it, we wouldn't be in this situation. I only mentioned AMD specifically because we don't know whether Intel had other designs for Arrow Lake or Lunar Lake. Intel was probably equally involved in pushing the NPU spec but, funnily enough, will still trail AMD in NPU performance next gen despite all the Lunar Lake hype.
 
Even if the "computation" part fit on that 15-watt laptop, how is it going to make sense for the global knowledge database to be stored there and kept updated? The edge algorithms that perform the last steps in Google Search might run fine on a laptop, but the knowledge base you're trying to access ultimately exists only on external resources. I don't see how ChatGPT can be any different. And once you accept that you need to go off-device, I don't see why it makes sense to put hardware I'll use a few times a day on the client rather than on the server, where the same silicon can serve 100 or 1,000 people instead.
You don't store "global" knowledge. You store the parts that are small enough to be used for that particular application. You can already run many of these text and image generators on your PC. If properly optimised, many tasks will be done on device, with the obviously much harder tasks running in the cloud. That hybrid approach is what's used on phones right now, like the Galaxy S24, and that's clearly not even a 15 W device :)

This is an article from last year: https://www.pcmag.com/news/apple-creates-new-way-for-llms-to-run-on-iphones
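For what it's worth, the hybrid split is conceptually simple. Here's a rough Python sketch of the routing idea; every helper function and the word-count threshold are made-up placeholders for illustration, not any vendor's actual API:

```python
# Hypothetical sketch of an on-device vs. cloud split. All helpers here are
# illustrative stubs, not a real framework's API.

def estimated_complexity(prompt: str) -> int:
    # Crude proxy: longer requests tend to need the bigger model.
    return len(prompt.split())

def run_on_device(prompt: str) -> str:
    # Stand-in for a small local model running on the NPU/iGPU.
    return f"[local model] quick answer for: {prompt[:40]}"

def run_in_cloud(prompt: str) -> str:
    # Stand-in for a large hosted model with the big knowledge base.
    return f"[cloud model] detailed answer for: {prompt[:40]}"

def answer(prompt: str, local_budget: int = 32) -> str:
    # Latency- and privacy-sensitive tasks stay on device; heavy lifting
    # (or anything needing fresh knowledge) goes out to a server.
    if estimated_complexity(prompt) <= local_budget:
        return run_on_device(prompt)
    return run_in_cloud(prompt)

print(answer("Find the photos of the kids at the beach last summer"))
```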
 
Okay, I could be (and most likely am) talking out my @ss here, but isn't the current AI computing darling Nvidia's GPU lineup? Are iGPUs able to handle AI workloads? And if they are, why not repurpose them for that? It's better than just having the iGPU take up space, isn't it?
Yup! And in fact I did that on an AMD integrated GPU (I had a Dell notebook with a Ryzen 3450U), got ROCm built (ROCm is no CUDA, I'll tell you that), and had TensorFlow running on it. Unfortunately (at least a few years ago when I did it), the GPU would NOT share time between ROCm and, you know, servicing the video driver. So if I ran a TensorFlow operation that took more than about 5 seconds, the driver would decide the GPU had locked up and reset it. (X recovered fine, with just a message in dmesg to let you know it had reset the GPU, but resetting the GPU of course killed the longer-running TensorFlow operation.) The Intel GPUs also support compute workloads.
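If anyone wants to try this, here's roughly the smoke test I'd start with. It assumes a ROCm-enabled TensorFlow build (e.g. the tensorflow-rocm package); on an Nvidia card the same code just goes through CUDA instead:

```python
# Quick check that TensorFlow actually sees and uses the (i)GPU.
# Assumes a ROCm-enabled TensorFlow build (e.g. tensorflow-rocm).
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
print("GPUs visible to TensorFlow:", gpus)

if gpus:
    # Keep the op short: long-running kernels are exactly what tripped the
    # GPU-reset watchdog described above.
    with tf.device('/GPU:0'):
        a = tf.random.normal([2048, 2048])
        b = tf.random.normal([2048, 2048])
        c = tf.matmul(a, b)
    print("matmul checksum:", float(tf.reduce_sum(c)))
```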

You know? I hadn't even thought of that. That makes it even more daft that they are shoveling these NPUs into them when they could run these workloads on the GPU. I will note, tensor units are touted like they're some AI magic, but they mostly just add support for additional data types. Graphics/vertex shaders use 32-bit and 64-bit floats, but some of these neural-net algorithms now use 16-bit and apparently even 8-bit and 4-bit values; if some "neuron's" activation function only cares whether the incoming value is above or below the halfway point (for instance), you don't need 32 or 64 bits of precision to decide that, and naturally the hardware can do its matrix math on a bunch of 16-bit values faster than on 64-bit ones. So that could EASILY be added to GPU designs that don't already have it. You could run CUDA on even a GTX 650 (not that fast, maybe 2-4x the speed of the CPU) and on my GTX 1650 (runs pretty fast! No tensor units in this thing).
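Here's a toy NumPy example of that threshold point; the sizes and data are made up, it's just to show why 16-bit is usually plenty for this kind of decision:

```python
# Toy illustration: for an "is the activation above the threshold?" decision,
# float16 agrees with float32 almost all of the time, which is why
# reduced-precision tensor math is good enough for a lot of neural-net work.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 256)).astype(np.float32)  # batch of inputs
w = rng.standard_normal(256).astype(np.float32)          # one "neuron's" weights

act32 = x @ w                                                             # full precision
act16 = (x.astype(np.float16) @ w.astype(np.float16)).astype(np.float32)  # half precision

fired32 = act32 > 0.0   # above or below the threshold?
fired16 = act16 > 0.0
print("fraction of decisions that match:", np.mean(fired32 == fired16))  # typically > 0.99
```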
 
Yup! And in fact I did that on an AMD integrated GPU (I had a Dell notebook with a Ryzen 3450U),
(clipped for length)
So I'm not talking out my @ss; that's a first. But anyway, when you consider that many if not most iGPUs simply aren't used for much more than a 2D desktop, it makes sense IMHO not to hobble performance with a reduced cache and to use the iGPU instead. Then again, what do I know. lol
 
Next: AI-enabled RAM modules, network cards, and sound cards.
Poor old PSU makers can't join the hype train. Or wait, maybe they will.
 
AI use cases are far from narrow. We often focus on the possibilities already available, but have difficulty imagining what local (edge) AI can bring. Secure and private AI can sort, search, and enhance all my texts and family photos.

Games often feel dead and mechanical, but with the help of AI, a lot can be done to increase immersion:
- NPCs and bots can react intelligently and unpredictably.
- Animals can exhibit varied, believable behavior.
- Hearing the same "I took an arrow to the knee" for the fifth time kills immersion. With Large Language Models (LLMs), this can be a thing of the past, along with NPCs moving around and acting with purpose.
- AI voice and sound effect generation for NPCs.
- Natural and varied movement, making it possible to slip on a banana without needing to program it.

I can run LLM AI models on my local machine with my GPU, but to have that option in games, we need to be able to offload the tasks (a rough sketch of what local GPU offload looks like is below).
Where do we need extra speed today? The reason some games are CPU-limited today often stems from the need for NPC/AI calculations (Baldur's Gate...). I'll take a 500% AI speedup on the CPU over a 5% generic one.
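For context, this is roughly what running a local LLM with GPU offload looks like today. A minimal sketch using the llama-cpp-python bindings; the model path and the prompt are placeholders, and n_gpu_layers=-1 asks it to push every layer it can onto the GPU:

```python
# Minimal local-LLM sketch using llama-cpp-python (pip install llama-cpp-python,
# built with GPU support). The model path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./some-7b-model.gguf",  # placeholder: any quantised GGUF model
    n_gpu_layers=-1,                    # offload as many layers as possible to the GPU
    n_ctx=2048,                         # context window
)

out = llm(
    "Guard: The prisoner keeps banging on the cell door. Respond in character:\n",
    max_tokens=48,
)
print(out["choices"][0]["text"])
```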
That's a use of AI that would be great. I remember being very disappointed with Oblivion when it came out.
I started playing the game, locked in the cell. Banged on the door. The guard says "if you keep doing that, blah blah blah." I kept doing it and nothing happened, just the same old repeated lines. It felt like a huge letdown within minutes of starting the game.
I realised that while it looked good graphically for the time, games hadn't moved on in the area I most wanted them to: intelligent interactions between NPCs and myself.
 
That's a use of AI that would be great. I remember being very disappointed with Oblivion when it came out.
I started playing the game, locked in the cell. Banged on the door. The guard says "if you keep doing that, blah blah blah." I kept doing it and nothing happened, just the same old repeated lines. It felt like a huge letdown within minutes of starting the game.
I realised that while it looked good graphically for the time, games hadn't moved on in the area I most wanted them to: intelligent interactions between NPCs and myself.
Don't hold your breath on that one. From what I understand, even the most "advanced" NPC AIs cheat like crazy and are much simpler than most users realize. The problem isn't the AI data processing per se, but keeping all the different NPCs updated in real time. It's the major reason Morrowind would eventually grind to a halt. Most users thought the graphics slowdown was due to poor programming; actually it was the single-core CPUs of the time just getting overwhelmed trying to keep track of all the NPC/reputation data.
 
That actually is a fair use for this; it could be done on the GPU, but this is an integrated GPU, it's no 4090, so it would be nice to offload game-related AI from the CPU without dumping it on the GPU, which may already be rather busy running an AAA game.

Age of Conan had that beat. They kind of ran out of cash in the middle and (sensibly) figured they'd better ship what they had rather than go bankrupt (good decision; they sold plenty of copies and indeed stayed in business). So the first couple of hours of gameplay had unique towns and NPCs with repetitive but well-voice-acted dialogue. Once you got past that, there were plenty of quests and things to do, but the towns were the same boilerplate layout with the exact same buildings, the same NPCs, and no voiceovers (just text).
 