hwertz
Re: Itanium. When I was a student at the U of I, the engineering department got an Itanium-based Superdome to replace their HP Superdome (my understanding is this was to bolster Itanium sales; they got something like a $200,000 system for something like $1,000). They found the several-year-old PA-RISC-based Superdome was faster than the Itanium-based Superdome that was meant to replace it! Compiler improvements did help here to some extent (eventually the Itanium narrowly outran the previous system), but even if they'd doubled or tripled the speed, HP would have had to put out faster Itanium models at a fairly rapid pace to keep up with the fastest chips coming out from other vendors.
I will note, the one thing Itanium did REALLY well was knock out many of the competing CPUs -- the two fastest CPU lines on the planet at that point were PA-RISC and Alpha, and HP (which had inherited Alpha by way of Compaq and DEC) dropped development on both in favor of Itanium; SGI abandoned MIPS development in favor of Itanium; Sun flirted with Itanium (a planned Solaris-on-Itanium port, later cancelled) before sticking with SPARC, which Oracle kept developing after acquiring Sun -- though given the cost and bespokeness of Oracle systems I don't know much about those. Basically, the CPU lines that survived the fallout were IBM POWER (IBM was not interested in Itanium), ARM (which was not a factor in desktops or servers back then), MIPS (in embedded systems, not desktops or servers), and of course x86/x86-64 -- with AMD shipping the first x86-64 chips, which is why some Linux distros refer to the architecture as "amd64"; Intel had assumed its 64-bit chip would be Itanium.
As for the new multi-threading technique -- neat! And the nice thing is, it sounds like something that could be added to TensorFlow, PyTorch, and the like, where the user may not have to set up much at all: split jobs between the AI accelerator (if any), GPU, and CPU, rather than pushing EVERYTHING onto the GPU and leaving your sometimes quite powerful CPUs nearly idle -- or onto the AI accelerator, leaving both GPU and CPU largely idle, as the case may be.
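To make that concrete, here's a minimal sketch of the idea in PyTorch -- my own illustration, not the technique from the article. It statically splits a batch of matrix multiplications between GPU and CPU instead of pushing everything onto one device; the 80/20 split ratio and the `split_matmul` name are assumptions for illustration.

```python
# Hedged sketch: naive static split of a batch between GPU and CPU.
# NOT the article's technique -- just the "don't leave the CPU idle" idea.
import torch

def split_matmul(a: torch.Tensor, b: torch.Tensor, gpu_fraction: float = 0.8):
    """Batched matrix multiply, sending part of the batch to the GPU
    and the rest to the CPU, then gathering the results."""
    if not torch.cuda.is_available():
        return a @ b  # no GPU: run everything on the CPU

    n = a.shape[0]
    k = int(n * gpu_fraction)  # first k items go to the GPU (assumed ratio)

    # Launch the GPU chunk first; CUDA kernel launches are asynchronous,
    # so the CPU chunk below genuinely overlaps the GPU work.
    out_gpu = a[:k].cuda(non_blocking=True) @ b[:k].cuda(non_blocking=True)

    out_cpu = a[k:] @ b[k:]  # CPU does its share meanwhile
    # The .cpu() copy is what synchronizes with the GPU stream.
    return torch.cat([out_gpu.cpu(), out_cpu])

batch_a = torch.randn(100, 512, 512)
batch_b = torch.randn(100, 512, 512)
print(split_matmul(batch_a, batch_b).shape)  # torch.Size([100, 512, 512])
```

A real implementation would pick the split ratio from measured throughput of each device rather than hard-coding it, but even this naive version shows the shape of the scheduling problem.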
Having used a Tegra K1, I can say the GPU on there is about the speed of an Nvidia GTX 650 (I had a K1 and a GTX 650, and they were about dead even). And the quad-core ARM -- you know about how fast that is (not very fast, but not terrible, more or less). You might have trouble with Amdahl's law (some single-threaded portion of the code making it slow and difficult to dispatch work fast enough to many CPU, GPU, and AI cores), but even if some dispatch thread is not fast enough to fully feed all GPU and CPU cores, it's still going to be a bit faster to feed the GPU and *some* CPU cores than to run on the GPU alone.
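For a rough sense of why that single-threaded dispatch portion matters, here's the Amdahl's-law arithmetic; the 95%-parallel figure is just an assumption for illustration.

```python
# Amdahl's law: speedup(N) = 1 / ((1 - p) + p / N), where p is the
# fraction of the work that parallelizes across N execution units.
def amdahl_speedup(parallel_fraction: float, n_units: int) -> float:
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_units)

# If even 5% of the job is a serial dispatch thread (p = 0.95), speedup
# is capped at 1/0.05 = 20x no matter how many CPU/GPU/AI cores you add:
for n in (4, 16, 64, 1024):
    print(f"{n:4d} units -> {amdahl_speedup(0.95, n):.2f}x")
# 4 -> 3.48x, 16 -> 9.14x, 64 -> 15.42x, 1024 -> 19.64x
```

Which backs up the point: partially feeding extra cores still helps, but the serial dispatch fraction sets the ceiling.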