Experimental AI tool lets you morph images with a simple click and drag workflow

nanoguy

Staff member
In brief: Whether you love them or hate them, generative AI tools like ChatGPT and Stable Diffusion are here to stay and are evolving at a rapid pace. Researchers keep producing new implementations that are slowly coming into focus, such as a new tool called DragGAN that looks like Photoshop's Warp tool on steroids.

By now even the most casual followers of tech news are familiar with generative AI tools like ChatGPT, Stable Diffusion, Midjourney, and DALL-E. Big Tech is racing to develop the best large language models and bake them into every piece of software or web service we use, and a flurry of startups are working on specialized AI tools for a wide variety of niche use cases.

Many of these tools can generate useful images or text from simple prompts that describe what the user wants to find out or the kind of output they're trying to produce. When it works, this makes services like ChatGPT and DALL-E seem like magic. When it doesn't, we're reminded of how far AI still is from replacing human creativity, if it ever does. In fact, many of these tools are "trained" on works authored by people and require human supervision to bring their output to a meaningful level.

That said, new AI research shows that progress is still being made at a rapid pace, particularly in the area of image manipulation. A group of scientists from Google, MIT, the University of Pennsylvania, and the Max Planck Institute for Informatics in Germany have published a paper detailing an experimental tool that could make image editing easier and more accessible for regular people.

To get an idea of what is possible with the new tool: you can significantly change the appearance of a person or an object by simply clicking and dragging on a particular feature. You can also do things like altering the expression on someone's face, modifying the clothing of a fashion model, or rotating the subject in a photo as if it were a 3D model. The video demos are certainly impressive, though the tool isn't available to the public as of this writing.

This may just look like Photoshop on steroids, but it has generated enough interest to crash the research team's website. After all, text prompts may sound simple in theory, but they require a lot of tweaking when you need something very specific, or multiple steps to generate the desired output.

This problem has given rise to a new profession – that of the "AI prompt engineer." Depending on the company and the specifics of the project in question, this kind of job can pay up to $335,000 per year, and it doesn't require a degree.

By contrast, the user interface presented in the demo videos suggests it will soon be possible for the average person to do some of what an AI prompt engineer can do by just clicking and dragging on the first output of any image generation tool. Researchers explain that DragGAN will "hallucinate" occluded content, deform an object, or modify a landscape.

Researchers note that DragGAN can morph the content of an image in just a few seconds when using Nvidia's GeForce RTX 3090 graphics card, as their implementation doesn't need to use multiple neural networks to achieve the desired results. The next step will be to develop a similar model for point-based editing of 3D models. Those of you who want to find out more about DragGAN can read the paper here. The research will also be presented at SIGGRAPH in August.
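The paper describes an editing loop that alternates between "motion supervision" (nudging the generator's latent code so features at the handle point shift toward the target) and "point tracking" (relocating the handle by searching the updated feature map). Below is a minimal toy sketch of that alternating loop in plain Python; a Gaussian bump stands in for real GAN features, and all function names here are illustrative, not DragGAN's actual API.

```python
import math

def features(center, size=32):
    """Toy 'generator': a feature map with a Gaussian bump at `center`.
    In DragGAN this would be an intermediate feature map of StyleGAN."""
    cx, cy = center
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / 8.0)
             for x in range(size)] for y in range(size)]

def track(fmap):
    """Point tracking: relocate the handle at the feature peak."""
    best, pos = -1.0, (0, 0)
    for y, row in enumerate(fmap):
        for x, v in enumerate(row):
            if v > best:
                best, pos = v, (x, y)
    return pos

def drag(handle, target, step_size=1.0, max_steps=100):
    """Alternate motion supervision (small step toward the target)
    with point tracking until the handle reaches the target."""
    cx, cy = float(handle[0]), float(handle[1])  # stands in for the latent code
    for _ in range(max_steps):
        dx, dy = target[0] - cx, target[1] - cy
        dist = math.hypot(dx, dy)
        if dist < step_size:
            break
        cx += step_size * dx / dist  # motion supervision step
        cy += step_size * dy / dist
        handle = track(features((cx, cy)))  # re-find the handle point
    return handle

print(drag((5, 5), (20, 12)))  # lands within a pixel or two of the target
```

The real system optimizes latent codes with gradients through the GAN, which is why a single network (and a single RTX 3090) suffices; this sketch only mirrors the control flow of the loop.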


 
Don't expect that you will write prompts for 300k. There will be some next-gen software like Unreal, Photoshop, Blender, Maya, Max, etc. that will integrate specialized generative models, and that will pay well. But in general very few people will be able to do much more ... and loads of small creative jobs will vanish into thin air.
 
Imagine what this type of GPU power would do if, in years to come, it took up just a tiny portion of a die, or even a single core on a bigger GPU than we have now. Bring it on in complex games, where AI could manipulate entire scenes, events, crowds, even other AI.


"Researchers note that DragGAN can morph the content of an image in just a few seconds when using Nvidia's GeForce RTX 3090 graphics card, as their implementation doesn't need to use multiple neural networks to achieve the desired results"
 
It's stuff like this that will convince me to buy the RTX 4090. So expensive, but that 24 gigs of VRAM (which is still not enough for state of the art) enables so many of these generative AI tools to run locally. There's only so much you can do with 12 or so gigs.
 
Don't expect that you will write prompts for 300k. There will be some next-gen software like Unreal, Photoshop, Blender, Maya, Max, etc. that will integrate specialized generative models, and that will pay well. But in general very few people will be able to do much more ... and loads of small creative jobs will vanish into thin air.
The most obvious future right there.
 
I've got to admit that AI can and will be an incredible assistive tool in humanity's near future, provided that during this baby-steps stage of development there's some moral baseline of rules, like the Three Laws or something. Then proceed from there with these evolutionary experiments.
This should be a global agreement. No military of any country should integrate AI with defense drones or silos.
'Cause the last thing any of us wants is to end up as batteries in the end...

Think happy thoughts,
 