AMD promotes third-party app for running AI chatbots on local hardware (that works with Nvidia GPUs, too)

Alfonso Maruccia

Forward-looking: While Big Tech corporations are developing server-based AI services that live exclusively in the cloud, users are increasingly interested in trying chatbot interactions on their own local PCs. AMD says there's an app for that, and it can even work with third-party GPUs or AI accelerators.

The most popular AI services available today run almost exclusively on powerful Nvidia hardware and require an internet connection. AMD is promoting an alternative approach to the chatbot experience based on LM Studio, a tool designed to download and run large language models (LLMs) in a local environment.

AMD's official blog highlights how AI assistants are becoming essential resources for productivity, or simply for brainstorming new ideas. With LM Studio, people interested in trying these new AI tools can easily discover, download, and run local LLMs with no need for complex setups, programming knowledge, or data center-level infrastructure.

AMD provides detailed instructions for downloading and running the correct LM Studio version for the user's hardware and operating system, including Linux, Windows, and macOS. The program can apparently run on a Ryzen processor alone, although the minimum hardware requirements include a CPU with native support for AVX2 instructions and at least 16GB of system RAM; for GPU acceleration, the graphics card should be equipped with a minimum of 6GB of VRAM.
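For readers who want to check those requirements before installing anything, the two CPU-side minimums can be verified with a short script. The following is a minimal sketch, not part of AMD's instructions, and assumes the third-party py-cpuinfo and psutil packages; the VRAM check is omitted because querying GPU memory portably requires vendor-specific tooling.

```python
# Minimal sketch: verify LM Studio's stated CPU-side minimums
# (AVX2 support and 16GB of RAM). Assumes the third-party
# py-cpuinfo and psutil packages: pip install py-cpuinfo psutil
import cpuinfo
import psutil

MIN_RAM_GB = 16

def meets_minimums() -> bool:
    flags = cpuinfo.get_cpu_info().get("flags", [])
    has_avx2 = "avx2" in flags
    ram_gb = psutil.virtual_memory().total / (1024 ** 3)
    print(f"AVX2 support: {has_avx2}")
    print(f"Installed RAM: {ram_gb:.1f} GB (minimum: {MIN_RAM_GB} GB)")
    return has_avx2 and ram_gb >= MIN_RAM_GB

if __name__ == "__main__":
    print("Meets LM Studio CPU-side minimums:", meets_minimums())
```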

Owners of Radeon RX 7000 GPUs are advised to get the ROCm technical preview of LM Studio. ROCm is AMD's open-source software stack for optimizing LLMs and other AI workloads on the company's GPU hardware. After installing the right version of LM Studio, users can search for an LLM to download and run on their local PC. AMD suggests Mistral 7B or Llama 2 7B, which can be found by searching for 'TheBloke/OpenHermes-2.5-Mistral-7B-GGUF' or 'TheBloke/Llama-2-7B-Chat-GGUF', respectively.
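LM Studio performs the download through its built-in search, but the same GGUF files are hosted on Hugging Face and can also be fetched directly. Below is a minimal sketch using the huggingface_hub package; the exact .gguf filename is an assumption based on TheBloke's usual naming convention, so verify it against the repository's file list.

```python
# Minimal sketch: download one of the GGUF models AMD suggests
# directly from Hugging Face. Requires: pip install huggingface_hub
from huggingface_hub import hf_hub_download

# The filename follows TheBloke's typical naming pattern and should
# be checked against the repo's actual file list before running.
path = hf_hub_download(
    repo_id="TheBloke/OpenHermes-2.5-Mistral-7B-GGUF",
    filename="openhermes-2.5-mistral-7b.Q4_K_M.gguf",
)
print("Model saved to:", path)
```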

Once LM Studio and an LLM are properly installed, users need to select the right quantization level; Q4_K_M is recommended for most Ryzen AI chips. Owners of Radeon GPUs also need to enable the "GPU Offload" option in the application; otherwise, the chosen model will likely run (very slowly) on CPU power alone.
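Beyond the built-in chat window, LM Studio can also expose the loaded model through a local OpenAI-compatible server started from within the app. The sketch below assumes that server is running on its default port, 1234, with a model already loaded; any HTTP client would work.

```python
# Minimal sketch: query a model loaded in LM Studio via its local
# OpenAI-compatible server (assumed running on the default port 1234).
# Requires: pip install requests
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "In one sentence, what does GPU offload do?"},
        ],
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```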

By promoting LM Studio as a third-party tool for running local LLMs, AMD is trying to close the gap with Nvidia and its recently announced Chat with RTX solution. Nvidia's proprietary application runs exclusively on GeForce RTX 30- and 40-series GPUs, while LM Studio takes a more agnostic approach, supporting both AMD and Nvidia GPUs, and even most reasonably modern, AVX2-equipped PC processors.


 
Not sure why this is considered controversial at all, because LM Studio is the leading product (well, free software) in this space by a mile. Chat with RTX is a poor, extremely limited attempt at mimicking LM Studio, so I'm glad AMD just decided to provide a guide for the best thing anyway.
GGUF-formatted models are extremely flexible and can use combinations of your GPU, NPU, and CPU. If you only have CPU resources to spare, they can do that too. It's very easy to use once you grasp the basic principles, but you can also spend weeks optimizing if you really enjoy that (me).
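The GPU/CPU split described above can also be reproduced outside LM Studio. Here is a minimal sketch using the llama-cpp-python bindings, assuming a GGUF file is already on disk; the filename and layer count are purely illustrative.

```python
# Minimal sketch: split a GGUF model between GPU and CPU with the
# llama-cpp-python bindings. n_gpu_layers sets how many transformer
# layers are offloaded to the GPU; the remainder run on the CPU.
# Requires: pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="openhermes-2.5-mistral-7b.Q4_K_M.gguf",  # illustrative local file
    n_gpu_layers=20,  # offload 20 layers to the GPU, keep the rest on CPU
    n_ctx=2048,       # context window size
)

out = llm("Q: Why run an LLM locally? A:", max_tokens=64)
print(out["choices"][0]["text"])
```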
 
AMD tactics:

Look at Nvidia, try to copy, release rushed attempt, repeat

... You don't get ahead by copying others ...
 
Very true, and it's worth noting that AMD's one smashing success came when it refused to copy Intel's revolutionary IA-64 and instead released its evolutionary x86-64.
 