What just happened? For all the advancements that AI has made in the past six months or so, we have yet to see its full potential used in games. But at Computex 2023, Nvidia boss Jensen Huang gave us a glimpse at what could be the future of gaming.
Huang unveiled Nvidia Avatar Cloud Engine (ACE) for Games during his Computex keynote speech, a custom AI service that Nvidia says brings intelligence to non-playable characters (NPCs) through AI-powered natural language interactions.
Huang said that ACE for Games enables audio-to-facial-expression, text-to-speech, and natural language conversations. Referring to the last of these, the CEO said it was "basically a large language model."
ACE for Games allows an NPC to listen to what a player says, spoken through a microphone in the player's own voice, and generate a response on the fly rather than repeat canned lines. The system can also animate the character's face so that it matches the generated words the character is speaking.
Huang demonstrated the technology in action via a real-time Unreal Engine 5-powered demo, designed by Convai, called Kairos. The very Cyberpunk 2077-like clip shows a player walking into a ramen shop and speaking to an NPC named Jin. The player is heard asking questions with his voice and receiving answers that are within the context of the story and character.
The dialogue is pretty dry and stiff, but it's still impressive technology. It's easy to imagine what ACE for Games will be like once it's been refined a bit more.
You can see another example of Convai's work in the video below.
Nvidia explained that ACE for Games builds on Nvidia Omniverse and offers access to three components. First is Nvidia NeMo, which is used for building, customizing and deploying language models. It has a feature called NeMo Guardrails that can protect against users having "unsafe" conversations, something that will likely be needed when it's applied to video games.
Another component is Nvidia Riva, used for automatic speech recognition and text-to-speech so players can have live conversations via a microphone.
The final element is Nvidia Omniverse Audio2Face. This component is what allows the characters' facial animations to match the words they're speaking. The technology is already being used in upcoming games STALKER 2: Heart of Chernobyl and Fort Solis.
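To make the flow between those three components concrete, here is a minimal Python sketch of a single NPC conversation turn: speech in, guarded language model reply, speech out, facial animation. It is not Nvidia's API. Every function and type name below is hypothetical, and the stand-in stages are toy placeholders for the roles the article attributes to Riva (speech recognition and text-to-speech), NeMo with Guardrails (reply generation and safety checks), and Audio2Face (facial animation).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class NpcReply:
    text: str                 # what the NPC says
    audio: bytes              # synthesized speech for that text
    face_frames: List[float]  # placeholder facial-animation values


def run_npc_turn(
    player_audio: bytes,
    transcribe: Callable[[bytes], str],       # speech recognition stage (hypothetical)
    is_safe: Callable[[str], bool],           # guardrail check (hypothetical)
    generate: Callable[[str], str],           # language model stage (hypothetical)
    synthesize: Callable[[str], bytes],       # text-to-speech stage (hypothetical)
    animate: Callable[[bytes], List[float]],  # audio-to-face stage (hypothetical)
    fallback: str = "I can't talk about that.",
) -> NpcReply:
    """Run one player-to-NPC exchange through the four stages in order."""
    player_text = transcribe(player_audio)
    # The guardrail decides whether the model is consulted at all.
    reply_text = generate(player_text) if is_safe(player_text) else fallback
    audio = synthesize(reply_text)
    return NpcReply(text=reply_text, audio=audio, face_frames=animate(audio))


# Toy stand-ins so the sketch runs without any external service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_is_safe(text: str) -> bool:
    return "forbidden" not in text.lower()


def fake_generate(text: str) -> str:
    return f"Jin: You asked '{text}'."


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


def fake_animate(audio: bytes) -> List[float]:
    # One value per audio byte, scaled to the range 0..1.
    return [b / 255 for b in audio]


def demo_turn(player_line: str) -> NpcReply:
    """Convenience wrapper that wires the toy stages together."""
    return run_npc_turn(
        player_line.encode("utf-8"),
        fake_transcribe,
        fake_is_safe,
        fake_generate,
        fake_synthesize,
        fake_animate,
    )
```

Calling `demo_turn("How is business?")` returns a reply whose text, audio, and animation frames all derive from the generated line, while an input containing the word "forbidden" skips the model and returns the fallback line instead. In a real game each stage would be a call to a hosted or local model, and the time those calls take is where the latency Nvidia mentions would matter.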
"The neural networks enabling Nvidia ACE for Games are optimized for different capabilities, with various size, performance, and quality trade-offs. The ACE for Games foundry service will help developers fine-tune models for their games, then deploy via Nvidia DGX Cloud, GeForce RTX PCs or on premises for real-time inferencing," Nvidia says. "The models are optimized for latency – a critical requirement for immersive, responsive interactions in games."
Huang didn't say what the requirements for running ACE for Games are, but they're likely to be pretty hefty in its current form.
There's still a lot of room for improvement in the tech, but ACE for Games could be the first step toward a future where players can ask NPCs any question they like, as long as it's related to the game, and receive the sort of answer they were seeking, not a canned response. The idea of AI-controlled teammates who are human-like in their dialogue and the way they follow spoken commands is also an interesting one.