Skip to content
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 9 additions & 1 deletion src/torchcodec/_core/FFMPEGCommon.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,15 @@ int getNumChannels(const UniqueAVFrame& avFrame) {
(LIBAVFILTER_VERSION_MAJOR == 8 && LIBAVFILTER_VERSION_MINOR >= 44)
return avFrame->ch_layout.nb_channels;
#else
return av_get_channel_layout_nb_channels(avFrame->channel_layout);
int numChannels = av_get_channel_layout_nb_channels(avFrame->channel_layout);
// Handle FFmpeg 4 bug where channel_layout is 0 or unset
// Set it to the default layout for the number of channels
// to allow successful initialization of SwrContext
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, I think the code has slightly changed since the comment was written, so this edit may be closer to what's happening (feel free to edit!)

Suggested change
// Handle FFmpeg 4 bug where channel_layout is 0 or unset
// Set it to the default layout for the number of channels
// to allow successful initialization of SwrContext
// Handle FFmpeg 4 bug where channel_layout and numChannels are 0 or unset
// Set them based on avFrame->channels which appears to be correct
// to allow successful initialization of SwrContext

if (numChannels == 0 && avFrame->channels > 0) {
avFrame->channel_layout = av_get_default_channel_layout(avFrame->channels);
return avFrame->channels;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just do this instead, so we can have a single return statement

Suggested change
return avFrame->channels;
numChannels = avFrame->channels;

}
return numChannels;
#endif
}

Expand Down
24 changes: 14 additions & 10 deletions src/torchcodec/_core/SingleStreamDecoder.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1276,16 +1276,20 @@ void SingleStreamDecoder::convertAudioAVFrameToFrameOutputOnCPU(
int outSampleRate =
streamInfo.audioStreamOptions.sampleRate.value_or(srcSampleRate);

int srcNumChannels = getNumChannels(streamInfo.codecContext);
TORCH_CHECK(
srcNumChannels == getNumChannels(srcAVFrame),
"The frame has ",
getNumChannels(srcAVFrame),
" channels, expected ",
srcNumChannels,
". If you are hitting this, it may be because you are using "
"a buggy FFmpeg version. FFmpeg4 is known to fail here in some "
"valid scenarios. Try to upgrade FFmpeg?");
Copy link
Contributor

@NicolasHug NicolasHug Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should still have this check, but perhaps relax it if getNumChannels(srcAVFrame) returns 0 while getNumChannels(streamInfo.codecContext) returns something < 1 (which is what is happening in #843

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added back the check, and excluded the case where getNumChannels(srcAVFrame) returns 0 while getNumChannels(streamInfo.codecContext) returns a valid fallback.

int srcNumChannels = getNumChannels(srcAVFrame);
int srcNumChannelsFromCodec = getNumChannels(streamInfo.codecContext);
// Use number of channels from codec if 0 returned from frame
if (srcNumChannels == 0 && srcNumChannelsFromCodec > 0) {
srcNumChannels = srcNumChannelsFromCodec;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we still need this if branch, now that we are returning the hopefully-correct avFrame->channels instead of the incorrect numChannels in getNumChannels().

Can you double check if that's still needed? Perhaps we can leave the entire TORCH_CHECK() the way it was!

} else {
// Check if the number of channels from the frame matches the codec
TORCH_CHECK(
srcNumChannels == srcNumChannelsFromCodec,
"The frame has ",
srcNumChannels,
" channels, expected ",
srcNumChannelsFromCodec);
};
int outNumChannels =
streamInfo.audioStreamOptions.numChannels.value_or(srcNumChannels);

Expand Down
Binary file not shown.
25 changes: 12 additions & 13 deletions test/test_decoders.py
Original file line number Diff line number Diff line change
Expand Up @@ -1682,26 +1682,25 @@ def test_downsample_empty_frame(self):
frames_44100_to_8000.data, frames_8000.data, atol=0.03, rtol=0
)

def test_s16_ffmpeg4_bug(self):
# s16 fails on FFmpeg4 but can be decoded on other versions.
# Debugging logs show that we're hitting:
# [SWR @ 0x560a7abdaf80] Input channel count and layout are unset
# which seems to point to:
# https://github.com/FFmpeg/FFmpeg/blob/40a6963fbd0c47be358a3760480180b7b532e1e9/libswresample/swresample.c#L293-L305
# ¯\_(ツ)_/¯
def test_decode_s16_ffmpeg4(self):
# Non-regression test for https://github.com/pytorch/torchcodec/issues/843
# Ensures that decoding s16 on FFmpeg4 handles
# unset input channel count and layout

asset = SINE_MONO_S16
decoder = AudioDecoder(asset.path)
assert decoder.metadata.sample_rate == asset.sample_rate
assert decoder.metadata.sample_format == asset.sample_format

cm = (
pytest.raises(RuntimeError, match="The frame has 0 channels, expected 1.")
if get_ffmpeg_major_version() == 4
else contextlib.nullcontext()
test_frames = decoder.get_samples_played_in_range()
assert test_frames.data.shape[0] == decoder.metadata.num_channels
assert test_frames.sample_rate == decoder.metadata.sample_rate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the term "samples" instead of frame for the stuff that comes out of the decoder. For the asset, the use of

reference_frames = asset.get_frame_data_by_range

is still correct since this is really a frame-based API.

reference_frames = asset.get_frame_data_by_range(
start=0, stop=1, stream_index=0
)
torch.testing.assert_close(
test_frames.data[0], reference_frames, atol=0, rtol=0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised by the use of [0] here - can you help me understand why that's needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The decoded audio has the shape [1, 64000], where the first dimension is the number of channels (1, in the case of SINE_MONO_S16), and the second dimension is the samples.

The checked in tensor is returned by get_frame_data_by_index in utils.py, which returns only the samples at index=1 for the one stream_index.

So to make the comparison, the test must use only the samples from the decoded audio at [0].

Copy link
Contributor

@NicolasHug NicolasHug Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I think eventually we may want to update get_frame_data_by_index to always return 2D audio samples, but that's not urgent. Using [0] like yo did is fine.

)
with cm:
decoder.get_samples_played_in_range()

@pytest.mark.parametrize("asset", (NASA_AUDIO, NASA_AUDIO_MP3))
@pytest.mark.parametrize("sample_rate", (None, 8000, 16_000, 44_1000))
Expand Down
Loading