As the use of generative AI continues to evolve across different media and entertainment (M&E) applications, a team of researchers from Stanford University, UC Berkeley and Adobe Research recently unveiled an AI-aided model that can realistically integrate specific individuals into various scenes.
To insert humans realistically, the model must align them convincingly with the surrounding environment, a capability the researchers call affordance awareness. Being “affordance-aware” refers to the ability to perceive and understand the potential actions and functionalities offered by objects, environments, or systems – how to sit in a chair, for example, or lean on a balustrade.
The team addressed this challenge by developing a “conditional diffusion” model, trained in a self-supervised manner on video of humans in a variety of scenes.
The goal of a conditional diffusion model is to generate an image that satisfies the conditions supplied as input – here, the target scene and the person to be inserted. The effectiveness of the approach was demonstrated through a series of results showing humans successfully composited into a range of scenes.
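For readers curious about the mechanics, the sketch below shows in a few lines of PyTorch how a conditional diffusion model's reverse (denoising) process works in principle: generation starts from pure noise and is denoised step by step while remaining tied to the conditioning inputs, here assumed to be a masked scene plus a reference image of the person. The toy network, channel layout, and noise schedule are illustrative assumptions, not the researchers' published architecture.

```python
# Minimal sketch of conditional diffusion sampling.
# The denoiser, shapes, and schedule are illustrative stand-ins,
# not the actual model described in the paper.
import torch
import torch.nn as nn


class ToyConditionalDenoiser(nn.Module):
    """Predicts noise from a noisy image concatenated with its conditioning."""

    def __init__(self, img_channels=3, cond_channels=6):
        super().__init__()
        # Input: noisy image + conditioning (masked scene + reference person)
        self.net = nn.Sequential(
            nn.Conv2d(img_channels + cond_channels, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, img_channels, 3, padding=1),
        )

    def forward(self, noisy_img, cond, t):
        # A real model would also embed the timestep t; omitted for brevity.
        return self.net(torch.cat([noisy_img, cond], dim=1))


@torch.no_grad()
def sample(model, cond, steps=50, shape=(1, 3, 64, 64)):
    """DDPM-style reverse process: start from noise and denoise step by step,
    always conditioned on the scene/person inputs."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape)  # pure noise
    for t in reversed(range(steps)):
        eps = model(x, cond, t)                      # predict the noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise      # sample x_{t-1}
    return x


# Conditioning: a scene with the person masked out (3 channels) plus a
# reference image of the person to insert (3 channels), stacked channel-wise.
cond = torch.randn(1, 6, 64, 64)
generated = sample(ToyConditionalDenoiser(), cond)
print(generated.shape)  # torch.Size([1, 3, 64, 64])
```

The same conditioning idea scales up in practice: a much larger denoising network and training on real video frames replace the toy pieces above, but the principle of guiding every denoising step with the scene and person inputs stays the same.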
“Given the learned human-scene composition, our model can also hallucinate realistic people and scenes when prompted without conditioning and also enables interactive editing,” the report noted.
The team also conducted detailed ablation studies – analyses in which individual components of a system are systematically removed or disabled to measure the impact of each design choice made during development.
Possible practical applications for this model include enhancing image-editing capabilities in creative software for artists and media creators, as well as letting users seamlessly insert people into their photographs with smartphone photo-editing apps.
The researchers themselves plan to explore finer control over generated poses and to attempt realistic human motion within scenes. Other goals for further research include improving model efficiency and extending the model’s capabilities to objects other than humans.
“While the most recent developments in AI take us closer to replicating human activities around actual creation, like writing articles, producing songs, and creating video scenes, these creations still require human intervention and creativity,” says Scott Purdy, KPMG U.S. media industry leader.
He cautions that there are several risks as AI technology progresses in voice, image and video.
“The evolution of deep fakes and other visual manipulations could create significant opportunities, but also serious risks,” he explains. “With AI, the saying ‘seeing is believing’ may no longer be true.”
He adds that although we are in the early innings of AI, newspapers have already been using the technology to write headlines and articles, especially around sports results and earnings calls, as those are more routine.
“In the broader scheme of things, we are just beginning to tap into the use cases of AI in the creative process,” Purdy says.
Building a strategy to take advantage of the opportunities created by rapidly evolving AI capabilities, and by tools that make those capabilities easily accessible, is vital for every participant in the M&E industry.
Plamen Minev, technical director of AI and cloud at Quantum, says although multiple AI-enabled M&E-focused services have been available for a while, only a few teams, companies or higher learning institutions have been using them.
“The main reasons were the quality and performance of the technology combined with the overall difficulty to access and the associated cost,” he explains. “There was also a significant lack of talent who understood and were comfortable with AI.”
Now, however, the industry is experiencing rapid AI democratization, which makes the technology far easier to access.
“Multiple companies have already integrated or are working on integrating AI services inside their tools, and accessing the functionality no longer requires advanced technical skills,” he says.
Minev notes that the recent release of products like ChatGPT and Stable Diffusion created enormous excitement in the industry, and now AI is part of every conversation.
“The significant progress made with the latest AI models with almost perfect accuracy and various deployment options answered many of the previously critical concerns,” he says. “It made the users not just accept, but strongly demand easy access to AI.”
He adds that M&E is one of the top industries, if not the top industry, disrupted by the latest AI technology, which understandably brings with it intense technological and social conversations.