Add performance tips tutorial #1065
Merged: mollyxu merged 23 commits into meta-pytorch:main from mollyxu:performance-tips-tutorial on Dec 10, 2025

# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""
.. meta::
   :description: Learn how to optimize TorchCodec video decoding performance with batch APIs, approximate seeking, multi-threading, and CUDA acceleration.

==============================================
TorchCodec Performance Tips and Best Practices
==============================================

This tutorial consolidates performance optimization techniques for video
decoding with TorchCodec. Learn when and how to apply each strategy to
improve performance.
"""

# %%
# Overview
# --------
#
# When decoding videos with TorchCodec, several techniques can significantly
# improve performance depending on your use case. This guide covers:
#
# 1. **Batch APIs** - Decode multiple frames at once
# 2. **Approximate Mode & Keyframe Mappings** - Trade accuracy for speed
# 3. **Multi-threading** - Parallelize decoding across videos or chunks
# 4. **CUDA Acceleration** - Use GPU decoding for supported formats
#
# We'll explore each technique and when to use it.

# %%
# 1. Use Batch APIs When Possible
# --------------------------------
#
# If you need to decode multiple frames at once, the batch methods are faster
# than calling single-frame decoding methods multiple times. For example,
# :meth:`~torchcodec.decoders.VideoDecoder.get_frames_at` is faster than calling
# :meth:`~torchcodec.decoders.VideoDecoder.get_frame_at` multiple times.
# TorchCodec's batch APIs reduce overhead and can leverage internal
# optimizations.
#
# **Key Methods:**
#
# For index-based frame retrieval:
#
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_at` for specific indices
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_in_range` for ranges
#
# For timestamp-based frame retrieval:
#
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_at` for timestamps
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_in_range` for time ranges

# %%
# **When to use:**
#
# - Whenever you retrieve more than one frame per call, by index or by timestamp
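As a minimal sketch of the batched style (the `evenly_spaced_indices` helper and the sampling pattern are illustrative, not TorchCodec APIs), the following decodes evenly spaced frames with one batched call instead of a loop:

```python
def evenly_spaced_indices(total_frames, num_samples):
    """Pick num_samples evenly spaced frame indices in [0, total_frames)."""
    return [int(i * total_frames / num_samples) for i in range(num_samples)]


def decode_samples(path, num_samples=10):
    """Decode evenly spaced frames with one batched call instead of a loop."""
    from torchcodec.decoders import VideoDecoder

    decoder = VideoDecoder(path)
    indices = evenly_spaced_indices(decoder.metadata.num_frames, num_samples)
    # One batched call (preferred):
    return decoder.get_frames_at(indices=indices)
    # Equivalent but slower, one decode call per frame:
    # [decoder.get_frame_at(i) for i in indices]
```

The batched call returns all requested frames together, so the per-call overhead is paid once rather than once per frame.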

# %%
# .. note::
#
#    For complete examples with runnable code demonstrating batch decoding,
#    iteration, and frame retrieval, see
#    :ref:`sphx_glr_generated_examples_decoding_basic_example.py`.

# %%
# 2. Approximate Mode & Keyframe Mappings
# ----------------------------------------
#
# By default, TorchCodec uses ``seek_mode="exact"``, which performs a :term:`scan` when
# you create the decoder to build an accurate internal index of frames. This
# ensures frame-accurate seeking but takes longer for decoder initialization,
# especially on long videos.

# %%
# **Approximate Mode**
# ~~~~~~~~~~~~~~~~~~~~
#
# Setting ``seek_mode="approximate"`` skips the initial :term:`scan` and relies on the
# video file's metadata headers. This dramatically speeds up
# :class:`~torchcodec.decoders.VideoDecoder` creation, particularly for long
# videos, but may result in slightly less accurate seeking in some cases.
#
# **Which mode should you use?**
#
# - If you care about frame-seeking exactness, use ``"exact"``.
# - If the video is long and you only need to decode a small number of frames,
#   approximate mode should be faster.
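The guidance above can be sketched as a small helper. Note that `pick_seek_mode` and its thresholds (60 seconds, 50 frames) are illustrative placeholders, not a TorchCodec API; only the ``seek_mode`` argument to ``VideoDecoder`` comes from the library:

```python
def pick_seek_mode(duration_seconds, frames_needed):
    """Illustrative heuristic only (not a TorchCodec API): prefer approximate
    mode for long videos when only a handful of frames is needed. The
    thresholds below are arbitrary placeholders."""
    if duration_seconds > 60 and frames_needed < 50:
        return "approximate"
    return "exact"


def make_decoder(path, duration_seconds, frames_needed):
    """Create a decoder with the seek mode suggested by the heuristic above."""
    from torchcodec.decoders import VideoDecoder

    mode = pick_seek_mode(duration_seconds, frames_needed)
    return VideoDecoder(path, seek_mode=mode)
```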

# %%
# **Custom Frame Mappings**
# ~~~~~~~~~~~~~~~~~~~~~~~~~
#
# For advanced use cases, you can pre-compute a custom mapping between desired
# frame indices and actual keyframe locations. This allows you to speed up
# :class:`~torchcodec.decoders.VideoDecoder` instantiation while maintaining
# the frame-seeking accuracy of ``seek_mode="exact"``.
#
# **When to use:**
#
# - Frame accuracy is critical, so you cannot use approximate mode
# - You can preprocess videos once and then decode them many times
#
# **Performance impact:** speeds up decoder instantiation, similarly to
# ``seek_mode="approximate"``.

# %%
# .. note::
#
#    For complete benchmarks showing actual speedup numbers, accuracy comparisons,
#    and implementation examples, see :ref:`sphx_glr_generated_examples_decoding_approximate_mode.py`
#    and :ref:`sphx_glr_generated_examples_decoding_custom_frame_mappings.py`.

# %%
# 3. Multi-threading for Parallel Decoding
# -----------------------------------------
#
# When decoding multiple videos, or a large number of frames from a single
# video, a few parallelization strategies can speed up the decoding process:
#
# - **FFmpeg-based parallelism** - Use FFmpeg's internal threading for
#   intra-frame parallelism, where parallelization happens within individual
#   frames rather than across frames. For that, use the ``num_ffmpeg_threads``
#   parameter of the :class:`~torchcodec.decoders.VideoDecoder`.
# - **Multiprocessing** - Distribute work across multiple processes.
# - **Multithreading** - Use multiple threads within a single process.
#
# You can use both multiprocessing and multithreading to decode multiple videos
# in parallel, or to decode a single long video in parallel by splitting it
# into chunks.
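The chunking strategy can be sketched as follows. The `chunk_ranges` splitting helper and the one-decoder-per-thread structure are illustrative choices, not TorchCodec APIs; only ``VideoDecoder`` and ``get_frames_in_range`` come from the library:

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(num_frames, num_chunks):
    """Split [0, num_frames) into num_chunks contiguous (start, stop) ranges."""
    base, extra = divmod(num_frames, num_chunks)
    ranges, start = [], 0
    for i in range(num_chunks):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def decode_chunk(path, start, stop):
    # One decoder per thread: decoder instances should not be shared
    # across threads.
    from torchcodec.decoders import VideoDecoder

    decoder = VideoDecoder(path, seek_mode="approximate")
    return decoder.get_frames_in_range(start=start, stop=stop)


def decode_video_in_parallel(path, num_frames, num_threads=4):
    """Decode one long video by splitting it into per-thread chunks."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [
            pool.submit(decode_chunk, path, start, stop)
            for start, stop in chunk_ranges(num_frames, num_threads)
        ]
        return [f.result() for f in futures]
```

Approximate mode pairs well with this pattern: each worker creates its own decoder, so skipping the per-decoder scan matters more when many decoders are created.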

# %%
# .. note::
#
#    For complete examples comparing sequential, FFmpeg-based parallelism,
#    multi-process, and multi-threaded approaches, see
#    :ref:`sphx_glr_generated_examples_decoding_parallel_decoding.py`.

# %%
# 4. CUDA Acceleration
# --------------------
#
# TorchCodec supports GPU-accelerated decoding using NVIDIA's hardware decoder
# (NVDEC) on supported hardware. This keeps decoded tensors in GPU memory,
# avoiding expensive CPU-GPU transfers for downstream GPU operations.

# %%
# **Recommended: use the Beta Interface**
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# We recommend the new "beta" CUDA interface, which is significantly faster
# than the previous one and supports the same features:
#
# .. code-block:: python
#
#     with set_cuda_backend("beta"):
#         decoder = VideoDecoder("file.mp4", device="cuda")

# %%
# **When to use:**
#
# - Decoding high-resolution videos
# - Decoding large batches of videos that saturate the CPU
#
# **When NOT to use:**
#
# - You need results that are bit-exact with CPU decoding
# - Videos are low resolution and the PCIe transfer latency dominates
# - The GPU is already busy and the CPU is idle
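The trade-offs above can be condensed into a small decision helper. Note that `choose_device` and its thresholds are illustrative placeholders, not a TorchCodec API:

```python
def choose_device(frame_height, num_frames, gpu_available):
    """Illustrative heuristic only (not a TorchCodec API): prefer CUDA for
    high-resolution or frame-heavy workloads when a GPU is free. The 1080-line
    and 500-frame thresholds are arbitrary placeholders; profile your own
    workload."""
    if gpu_available and (frame_height >= 1080 or num_frames >= 500):
        return "cuda"
    return "cpu"
```

The returned string could then be passed as the ``device`` argument when constructing a ``VideoDecoder``.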

# %%
# **Performance impact:** CUDA decoding can significantly outperform CPU
# decoding, especially for high-resolution videos and when decoding many
# frames. Actual speedup varies by hardware, resolution, and codec.

# %%
# **Checking for CPU Fallback**
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# In some cases, CUDA decoding may silently fall back to CPU decoding when the
# video codec or format is not supported by NVDEC. You can detect this using
# the :attr:`~torchcodec.decoders.VideoDecoder.cpu_fallback` attribute:
#
# .. code-block:: python
#
#     with set_cuda_backend("beta"):
#         decoder = VideoDecoder("file.mp4", device="cuda")
#
#     # Print detailed fallback status
#     print(decoder.cpu_fallback)
#
# .. note::
#
#    The timing of when you can detect CPU fallback differs between backends:
#    with the **FFmpeg backend**, you can only check fallback status after
#    decoding at least one frame, because FFmpeg determines codec support
#    lazily during decoding; with the **beta backend**, you can check fallback
#    status immediately after decoder creation, as the backend checks codec
#    support upfront.
#
# For installation instructions, detailed examples, and visual comparisons
# between CPU and CUDA decoding, see
# :ref:`sphx_glr_generated_examples_decoding_basic_cuda_example.py`.

# %%
# Conclusion
# ----------
#
# TorchCodec offers multiple performance optimization strategies, each suited to
# different scenarios. Use batch APIs for multi-frame decoding, approximate mode
# for faster initialization, parallel processing for high throughput, and CUDA
# acceleration to offload the CPU.
#
# The best results often come from combining techniques. Profile your specific
# use case and apply optimizations incrementally, using the benchmarks in the
# linked examples as a guide.
#
# For more information, see:
#
# - :ref:`sphx_glr_generated_examples_decoding_basic_example.py` - Basic decoding examples
# - :ref:`sphx_glr_generated_examples_decoding_approximate_mode.py` - Approximate mode benchmarks
# - :ref:`sphx_glr_generated_examples_decoding_custom_frame_mappings.py` - Custom frame mappings
# - :ref:`sphx_glr_generated_examples_decoding_parallel_decoding.py` - Parallel decoding strategies
# - :ref:`sphx_glr_generated_examples_decoding_basic_cuda_example.py` - CUDA acceleration guide
# - :class:`torchcodec.decoders.VideoDecoder` - Full API reference