Outsmarting AIGC: Using AI to Beat AI?
Generative AI has brought innovation, fun, and convenience to our day-to-day lives, but it has also fueled fraud and scams that hurt our society, and AI-generated videos are becoming harder and harder to detect. This article looks at methods for using AI models to identify AIGC videos, such as digital image fingerprints, frame-to-frame consistency, and newer approaches based on diffusion models.


Generative AI tools have brought ease, fun, and productivity to our everyday lives, yet they have also contributed to a surge in fraud and scams that significantly harm both consumers and businesses. Deloitte’s Center for Financial Services predicts that generative AI could push fraud losses in the United States from $12.3 billion in 2023 to $40 billion by 2027.
As AI experts in the fraud detection space, we understand this battle is tough, but it’s a challenge we must face. So, let’s look at the current state of detecting AI-generated videos.
Back in 2017, when deepfakes first emerged, the industry began to raise concerns about the dangers of AI-generated video. By 2018, relatively mature detection algorithms for AI face-swapping were already in place. Interestingly, one effective method back then was simply to check whether the person in the video blinked.
However, by 2024 this approach is clearly outdated, and there are now more factors to weigh when spotting AI-generated video content. Many articles offer tips for identifying suspicious AI-generated videos, for example:
Spot weird movements or expressions: AI-made videos may have overly smooth or stiff motion, missing the subtle complexity of human behavior.
Look for places where audio and video do not match, abnormal textures and lighting, objects appearing, disappearing, or morphing, and more.
These are great tips to help people identify AI-generated videos, but with the vast volume of video on the internet, manual identification alone cannot keep up. So tech companies and researchers are turning to AI itself, training models to flag suspicious AIGC videos by spotting anomalies that live in pixel-level details invisible to the human eye. Most AI-generated images and videos today are created with diffusion models, and a video is simply a sequence of images over time, so detecting AIGC videos can begin with analyzing each frame. In face-swapping videos, the AI-synthesized segments are stitched frame by frame into the original footage. Each source has its own distinct lighting and texture characteristics, its "digital image fingerprint," and at the stitching edges these fingerprints differ noticeably, discrepancies that AI models can detect with ease (a rough sketch of this idea follows).
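To make the fingerprint idea concrete, here is a minimal sketch, assuming frames are NumPy arrays and a candidate face region is already available from any face detector. It compares the high-frequency noise residual inside the suspected spliced region with the rest of the frame; the function names and the threshold in the usage comment are illustrative, not taken from any specific paper or tool.

```python
import numpy as np
from scipy.ndimage import median_filter

def noise_residual(frame: np.ndarray) -> np.ndarray:
    """High-pass residual: the frame minus a denoised (median-filtered) copy."""
    gray = frame.mean(axis=2) if frame.ndim == 3 else frame
    return gray - median_filter(gray, size=3)

def fingerprint_mismatch(frame: np.ndarray, face_box: tuple) -> float:
    """Compare residual variance inside the candidate face box vs. the rest of the frame."""
    x0, y0, x1, y1 = face_box
    res = noise_residual(frame.astype(np.float32))
    inside = res[y0:y1, x0:x1]
    mask = np.ones(res.shape, dtype=bool)
    mask[y0:y1, x0:x1] = False
    outside = res[mask]
    # A ratio far from 1.0 suggests the region's noise statistics don't match the frame.
    return float(inside.var() / (outside.var() + 1e-8))

# Example (illustrative box and thresholds):
# score = fingerprint_mismatch(frame, (120, 80, 300, 260))
# suspicious = score > 2.0 or score < 0.5
```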
However, this method is not effective against entirely AI-generated videos. In fact, applying methods designed to detect synthetic images to synthetic videos causes accuracy to drop significantly, from over 90% to around 60%, because "the forensic traces in synthetic videos are substantially different from those in synthetic images". Tuning AI models for specific AI video generators can improve accuracy, but such methods transfer poorly: videos from a new, unseen generator yield poor detection results. (source)
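For reference, the naive transfer just described usually amounts to something like the sketch below: run a per-frame image detector over sampled frames and average the scores. Here `image_detector` is a placeholder for any frame-level synthetic-image classifier, not a real API.

```python
import cv2
import numpy as np

def video_score_from_image_detector(path: str, image_detector, every_n: int = 10) -> float:
    """Average a frame-level detector's 'synthetic' probability over sampled frames."""
    cap = cv2.VideoCapture(path)
    scores, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            scores.append(image_detector(frame))  # probability this frame is synthetic
        idx += 1
    cap.release()
    return float(np.mean(scores)) if scores else 0.0
```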
Since independent frame-by-frame detection is unsatisfactory, researchers began exploring clues between frames. AI-generated videos are created frame by frame without true temporal continuity, which leads to issues like jitter or flicker during rapid facial changes and to unnatural expressions, because digital generation is uniform while human motion is not. These inter-frame inconsistencies can serve as one indicator for AI video detection. Additionally, AI-generated videos often exhibit erroneous traits in appearance patterns, projection proportions, spatial relations, volumes, and quantities; it's like watching with one eye and losing depth perception. These cues can also feed AI video detectors: you can train separate classifiers on each type of signal and ensemble them, as in the sketch below.
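Here is a rough sketch of both ideas, with made-up feature and helper names: a simple temporal "flicker" feature computed from frame-to-frame differences, and an ensemble that averages the probabilities of independently trained classifiers (for example, one on temporal features and one on appearance features, trained with scikit-learn on labeled real vs. AI-generated videos).

```python
import numpy as np

def flicker_features(frames: np.ndarray) -> np.ndarray:
    """frames: (T, H, W) grayscale. Summarize how erratic the frame-to-frame changes are."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=(1, 2))
    return np.array([diffs.mean(), diffs.std(), diffs.max()])

def ensemble_predict(classifiers, feature_sets) -> float:
    """Average the 'synthetic' probability from each (classifier, feature vector) pair."""
    probs = [clf.predict_proba(feats.reshape(1, -1))[0, 1]
             for clf, feats in zip(classifiers, feature_sets)]
    return float(np.mean(probs))

# Usage (clf_temporal, clf_appearance, and appearance_features are hypothetical):
# score = ensemble_predict([clf_temporal, clf_appearance],
#                          [flicker_features(gray_frames), appearance_features(frames)])
```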
The methods mentioned so far focus on identifying differences in the images and videos themselves. There is also a particularly interesting and creative approach: if you feed both AI-generated and human-made videos through diffusion models, you'll find that the reconstruction of the AI-generated content is quite similar to the input, whereas for human-made content it is not. "The framework is based on the idea that AI generation tools create content based on the statistical distribution of large data sets, resulting in more 'statistical means' content such as pixel intensity distributions, texture patterns, and noise characteristics in video frames, subtle inconsistencies or artifacts that change unnaturally between frames, or unusual patterns that are more likely in diffusion-generated videos than in real ones. … In contrast, human video creations exhibit individuality and deviate from the statistical norm." This idea was recently explored by Columbia researchers, building on DIRE (Diffusion Reconstruction Error), a method for detecting diffusion-generated images.
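Below is a conceptual sketch of the reconstruction-error idea, with the diffusion round trip abstracted away: `diffusion_reconstruct` stands in for inverting a frame to noise and regenerating it with a pretrained diffusion model (for example, DDIM inversion followed by sampling). It is a placeholder, not a real API, and the threshold would have to be calibrated on labeled data.

```python
import numpy as np

def reconstruction_error(image: np.ndarray, diffusion_reconstruct) -> float:
    """Mean absolute per-pixel error between a frame and its diffusion reconstruction."""
    recon = diffusion_reconstruct(image)
    return float(np.abs(image.astype(np.float32) - recon.astype(np.float32)).mean())

def looks_diffusion_generated(frames, diffusion_reconstruct, threshold: float) -> bool:
    """Diffusion-generated frames tend to reconstruct with low error; real footage does not."""
    errors = [reconstruction_error(f, diffusion_reconstruct) for f in frames]
    return float(np.mean(errors)) < threshold
```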
In conclusion, the creation of AI-generated videos is advancing much faster than the development of AI-content detectors, and far faster than regulation. Beyond continuous technological innovation, experimentation, and development in detecting AI-generated videos, it is crucial for technology companies, businesses, and social media platforms to collaborate in preventing the harm caused by AI-generated content, for example through AI-content labeling and sharing the provenance of different types of media. Hopefully, we can develop more affordable and effective tools for detecting AI-generated content, making it easier for everyone to avoid being deceived by fake videos and information.