TL;DR: While most third-party generative AI services focus on remotely managed models and cloud infrastructure, platform holders are moving quickly to bring AI directly to local machines. Microsoft is now officially introducing its own technology designed to help developers build optimized AI applications that can run across millions of PCs.

Following its introduction at Build 2025, Windows ML is now generally available to all developers targeting Windows 11 24H2. The technology provides a runtime with built-in AI inferencing capabilities, optimized for on-device model execution. Microsoft hopes the solution will help accelerate the flood of AI products now appearing on virtually every (Windows) screen.

Windows ML is designed to make it significantly easier to develop AI tools that run locally. Acting as a new hardware abstraction layer (HAL), it can execute AI applications on the appropriate acceleration hardware – whether CPU, GPU, or NPU. Windows ML also serves as the foundation of Windows AI Foundry, Microsoft's broader platform for managing AI workloads on Windows.

The Windows ML HAL automatically selects the correct "Execution Provider" (EP) for the acceleration hardware in a user's PC. Acting as a bridge between the runtime and the application ecosystem, it can dynamically download the most suitable EP during execution. Because Windows ML detects hardware automatically, AI applications no longer need to bundle their own runtime or every EP component to support a wide range of devices, which reduces software overhead.
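The selection behavior described above can be illustrated with a minimal Python sketch. This is not the Windows ML API: the EP names, the `EpCache` class, and the `select_ep` function are hypothetical, and the NPU-then-GPU-then-CPU preference order is an assumption made only for illustration. The sketch shows the general idea of picking an EP from detected hardware and fetching it on demand instead of shipping every EP with the app.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from accelerator type to an Execution Provider name.
# These names are illustrative and are not real Windows ML identifiers.
EP_FOR_DEVICE = {"npu": "ExampleNpuEP", "gpu": "ExampleGpuEP", "cpu": "ExampleCpuEP"}


@dataclass
class EpCache:
    """Tracks which EPs are already present (stand-in for a real download step)."""
    installed: set = field(default_factory=lambda: {"ExampleCpuEP"})
    downloads: list = field(default_factory=list)

    def ensure(self, ep_name):
        # Fetch an EP only the first time it is needed.
        if ep_name not in self.installed:
            self.downloads.append(ep_name)  # a real runtime would download here
            self.installed.add(ep_name)
        return ep_name


def select_ep(detected_devices, cache, preference=("npu", "gpu", "cpu")):
    """Return the EP for the first preferred device that is present."""
    for device in preference:
        if device in detected_devices:
            return cache.ensure(EP_FOR_DEVICE[device])
    # Assumption: a CPU provider is always usable as a last resort.
    return cache.ensure(EP_FOR_DEVICE["cpu"])
```

On a machine reporting only a GPU and a CPU, `select_ep({"gpu", "cpu"}, EpCache())` would record a single download for the GPU provider, and later calls would reuse it.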

Microsoft also highlighted the broader compatibility offered by Windows ML, noting that the HAL will continue to receive support for new silicon. Developers can target specific power profiles, opting for a low-power approach via the NPU or a more performance-focused option using the GPU.
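As a rough illustration of profile-based targeting, the sketch below maps a power profile to an ordered device preference. The `PowerProfile` enum, the `DEVICE_ORDER` table, and `pick_device` are hypothetical names, and the fallback orders are assumptions. Only the NPU-for-low-power and GPU-for-performance pairing comes from Microsoft's description.

```python
from enum import Enum


class PowerProfile(Enum):
    LOW_POWER = "low_power"
    HIGH_PERFORMANCE = "high_performance"


# Assumed preference orders: the first entry reflects the pairing described
# in the article; the fallbacks after it are illustrative guesses.
DEVICE_ORDER = {
    PowerProfile.LOW_POWER: ("npu", "cpu", "gpu"),
    PowerProfile.HIGH_PERFORMANCE: ("gpu", "npu", "cpu"),
}


def pick_device(profile, available):
    """Return the first device in the profile's order that the PC actually has."""
    for device in DEVICE_ORDER[profile]:
        if device in available:
            return device
    raise RuntimeError("no supported device available")
```

With this table, a PC without an NPU that requests the low-power profile falls back to the CPU instead of waking a discrete GPU. A real policy might weigh that trade-off differently.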

The company said it collaborated closely with major silicon partners AMD, Intel, Nvidia, and Qualcomm. As a result, Windows ML should be capable of optimizing offline generative AI experiences across the latest x86 CPUs and NPUs, Arm-based Snapdragon X series SoCs, and high-performance discrete GPUs from Nvidia and AMD.

Microsoft also named several software vendors already adopting Windows ML for in-app inferencing workloads. Companies such as Adobe, McAfee, Reincubate, and Wondershare plan to use the new abstraction layer to accelerate real-time search, optimize local video streaming, and improve automatic detection of deepfake videos and other scams.