Windows ML debuts for all developers, enabling AI apps to run directly on local hardware

Alfonso Maruccia

Posts: 2,561   +951
Staff
TL;DR: While most third-party generative AI services focus on remotely managed models and cloud infrastructure, platform holders are moving quickly to bring AI directly to local machines. Microsoft is now officially introducing its own technology designed to help developers build optimized AI applications that can run across millions of PCs.

After its original introduction at Build 2025, Microsoft has now made Windows ML generally available to all developers targeting Windows 11 24H2. The technology provides a runtime with built-in AI inferencing capabilities, optimized for on-device model execution. Microsoft hopes the solution will help accelerate the flood of AI products now appearing on virtually every (Windows) screen.

Windows ML is designed to make it significantly easier to develop AI tools that run locally. Acting as a new hardware abstraction layer, it can execute AI applications on the appropriate acceleration hardware – whether CPU, GPU, or NPU. Windows ML also serves as the foundation of Windows AI Foundry, Microsoft's broader platform for managing AI workloads on Windows.

The Windows ML HAL automatically selects the correct "Execution Provider" for the acceleration hardware in a user's PC. Acting as a bridge between the runtime and the application ecosystem, it can dynamically download the most suitable EP during execution. By automatically detecting hardware, Windows ML reduces software overhead, eliminating the need for AI applications to bundle all their runtime or EP components to support a wide range of devices.

Microsoft also highlighted the broader compatibility offered by Windows ML, noting that the HAL technology will receive ongoing silicon support. Developers can target specific power profiles, opting for a low-power approach via the NPU or a more performance-focused option using the GPU.

The company said it collaborated closely with major silicon partners AMD, Intel, Nvidia, and Qualcomm. As a result, Windows ML should be capable of optimizing offline generative AI experiences across the latest x86 CPUs and NPUs, Arm-based Snapdragon X series SoCs, and high-performance discrete GPUs from Nvidia and AMD.

Microsoft also named several software vendors already adopting Windows ML for in-app inferencing workloads. Companies such as Adobe, McAfee, Reincubate, and Wondershare plan to use the new abstraction layer to accelerate real-time search, optimize local video streaming, and improve automatic detection of deepfake videos and other scams.

Permalink to story:

 
If your goal as an enthusiast or professional is to reduce inference overhead and compute like a grown up you'll want to get rid of Windows entirely. That is, unless you enjoy your role as an unpaid Microsoft employee, for whom you've now built this endpoint out of pocket to volunteer your telemetry information.
 
Is this a half-baked API like Direct ML, or just an interface that brings together all APIs such as Vulkan, ROCm, and CUDA into a deterministic model?
 
Might be useful for software developers (or rather companies) who want local models to be providing code completion/assistance rather than cloud provided ones. Outside of that I see a hodgepodge of slow adoption since most companies would rather charge you $10-20/month for their AI "service", which running locally doesn't serve (though to be fair, state of the art models won't be expected to run locally on most consumer hardware for some time. Perhaps the 7B parameter size class can, but there's a lot that size class simply can't do.)
 
Is this a half-baked API like Direct ML, or just an interface that brings together all APIs such as Vulkan, ROCm, and CUDA into a deterministic model?
iu
 
Back