This is a PyTorch implementation of 'SpeechForensics: Audio-Visual Speech Representation Learning for Face Forgery Detection'.
- Install PyTorch (choose the version suitable for your machine):
  ```
  pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 -f https://download.pytorch.org/whl/torch_stable.html
  ```
- Clone this repository.
- Install dependency packages:
  ```
  pip install -r requirements.txt
  ```
- Install AV-HuBERT and face-alignment:
  ```
  git submodule init
  git submodule update
  ```
- Install Fairseq:
  ```
  cd av_hubert
  git submodule init
  git submodule update
  cd fairseq
  pip install --editable ./
  ```
- Install FFmpeg. We use version 4.2.2.
- Copy `modification/retinaface` into `preprocessing/face-alignment/face_alignment/detection`:
  ```
  cp -r modification/retinaface preprocessing/face-alignment/face_alignment/detection
  ```
- Copy `modification/landmark_extract.py` to `preprocessing/face-alignment/landmark_extract.py`:
  ```
  cp modification/landmark_extract.py preprocessing/face-alignment
  ```
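After the steps above, a quick way to confirm the environment is complete is to probe for each required module. This is an optional sketch; the module list below is illustrative, not taken from `requirements.txt`:

```python
import importlib.util

def missing_modules(names):
    """Return the module names that cannot be imported in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Illustrative module list; adjust it to match requirements.txt.
needed = ["torch", "torchvision", "fairseq", "cv2"]
print("missing:", missing_modules(needed))
```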
- Follow the links below to download the datasets (you will be asked to fill out some forms before downloading):
  - FaceForensics++ (download the audio using the YouTube IDs, and extract the audio clips using the frame numbers, which can be obtained by downloading the `original_youtube_videos_info`)
  - FakeAVCeleb
  - KoDF
- Place the videos in the corresponding directories:
  ```
  your_dataset_root
  |--FaceForensics
  |  |--c23
  |  |  |--Deepfakes
  |  |  |  |--videos
  |  |  |  |  |--000.mp4
  |--FakeAVCeleb
  |  |--videos
  |  |  |--RealVideo-RealAudio
  |  |  |  |--Africa
  |  |  |  |  |--man
  ```
- The directory structure of FaceForensics++ is `your_dataset_root/FaceForensics/{compression}/{categories}/videos/{video}`, where `categories` is `real`, `fake/Deepfakes`, `fake/FaceSwap`, `fake/Face2Face` or `fake/NeuralTextures`, and `compression` is `c0`, `c23` or `c40`. The test videos we used in our experiments are given in `data/datasets/FaceForensics/test_list.txt`.
- The directory structure of FakeAVCeleb is `your_dataset_root/FakeAVCeleb/videos/{categories}/{ethnic}/{gender}/{id}/{video}`, where `categories` includes `RealVideo-RealAudio`, `RealVideo-FakeAudio`, `FakeVideo-RealAudio` and `FakeVideo-FakeAudio`. For example, `your_dataset_root/FakeAVCeleb/videos/RealVideo-RealAudio/African/men/id00076/00109.mp4`.
- The directory structure of KoDF is `your_dataset_root/KoDF/videos/{categories}/{id}/{video}`, where `categories` includes `original_videos`, `audio-driven`, `dffs`, `dfl` and `fo` (the `fsgan` videos we downloaded do not contain audio, so we could not test them). The test videos we used in our experiments are given in `data/datasets/KoDF/test_list.txt`.
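The provided `test_list.txt` files contain one video name per line. A minimal sketch of turning such a list into full video paths (the helper names are ours, not from the repository):

```python
import os

def read_file_list(path):
    """One relative video path per line; blank lines are skipped."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def resolve_videos(video_root, file_list):
    """Join every listed video name onto the dataset's video root."""
    return [os.path.join(video_root, name) for name in read_file_list(file_list)]
```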
- Detect the faces and extract 68 face landmarks. Download the RetinaFace pretrained model and put it at `checkpoints/Resnet50_Final.pth`. Then run
  ```
  python preprocessing/face-alignment/landmark_extract.py --video_root $video_root --file_list $file_list --out_dir $out_dir
  ```
  - `$video_root`: root directory of the videos.
  - `$file_list`: a txt file containing the names of the videos. We provide the file lists in the `data/datasets/` directory.
  - `$out_dir`: directory for saving the landmarks.
- To crop the mouth region from each video, run
  ```
  python preprocessing/align_mouth.py --video_root $video_root --file_list $file_list --landmarks_dir $landmarks_dir --out_dir $out_dir
  ```
  - `$out_dir`: directory for saving the cropped mouth videos.
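The cropping step relies on the fact that, in the 68-point iBUG annotation used by face-alignment, the mouth occupies landmark indices 48–67. A minimal sketch of deriving a square crop box from those points (the margin value and function name are our assumptions, not the script's actual parameters):

```python
import numpy as np

MOUTH_IDX = slice(48, 68)  # mouth landmarks in the 68-point iBUG scheme

def mouth_bbox(landmarks, margin=12):
    """Square box (x0, y0, x1, y1) around the mouth landmarks of one frame.

    landmarks: (68, 2) array of (x, y) pixel coordinates.
    """
    pts = landmarks[MOUTH_IDX]
    cx, cy = pts.mean(axis=0)                                 # mouth centre
    half = (pts.max(axis=0) - pts.min(axis=0)).max() / 2 + margin
    return (int(cx - half), int(cy - half), int(cx + half), int(cy + half))
```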
- Download the pretrained audio-visual speech representation model and put it at `checkpoints/large_vox_iter5.pt`.
- To evaluate on the different datasets, run
  ```
  python evaluation/evaluate.py --video_root $video_root --file_list $file_list --mouth_dir $cropped_mouth_dir
  ```
The AUC scores on the different forgery datasets are shown below:

| FaceForensics++ | FakeAVCeleb | KoDF |
| :---: | :---: | :---: |
| 97.6% | 99.0% | 91.7% |
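For reference, the AUC reported above is the standard ROC AUC over per-video scores. A self-contained sketch using the rank-sum (Mann–Whitney) formulation; in practice `sklearn.metrics.roc_auc_score` gives the same result:

```python
def auc_score(labels, scores):
    """ROC AUC via pairwise comparisons: the probability that a randomly
    chosen fake (label 1) scores higher than a randomly chosen real (label 0),
    counting ties as half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```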