
[IMPROVEMENT] feat(mp4): add FFmpeg/libavformat backend for MP4 demuxing #2191

Open
gaurav02081 wants to merge 3 commits into CCExtractor:master from gaurav02081:gaurav-ffmpeg


gaurav02081 (Contributor) commented Mar 8, 2026


In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.
  • I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

Add optional FFmpeg-based MP4 parser as an alternative to GPAC

This PR introduces an alternative MP4 parsing backend using FFmpeg's libavformat, while keeping the existing GPAC-based implementation unchanged and as the default.

Motivation

In a previous discussion (the GSoC meeting on 2 March), we talked about updating the GPAC dependency used for MP4 processing in CCExtractor. One suggestion was to explore whether a Debian-friendly alternative exists, rather than focusing only on upgrading GPAC.

FFmpeg is already used in other parts of the codebase (for example, in the demuxing/decoding integration and in the HardsubX module), so extending its use to MP4 parsing seemed like a reasonable option to explore.

Implementation

A new implementation, mp4_ffmpeg.c, uses FFmpeg's libavformat to open and parse MP4 containers.

The general workflow is:

  • Open the MP4 container with avformat_open_input()
  • Discover streams using avformat_find_stream_info()
  • Read packets sequentially using av_read_frame()
  • Dispatch packets based on stream type
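The last step, dispatching by stream type, can be pictured as a switch over each track's MP4 sample-entry FourCC. This is a minimal sketch under stated assumptions: `enum track_kind`, `classify_track()` and `FOURCC()` are hypothetical names, the tag is assumed to be available as a 32-bit value (for example from `AVCodecParameters.codec_tag`), and the PR itself may well dispatch on FFmpeg codec IDs instead.

```c
#include <stdint.h>

/* Hypothetical track categories; not names used by CCExtractor or FFmpeg. */
enum track_kind {
    TRACK_OTHER = 0,
    TRACK_H264,
    TRACK_HEVC,
    TRACK_CEA608,
    TRACK_CEA708,
    TRACK_TX3G
};

/* Packs four characters little-endian, the same layout as FFmpeg's MKTAG(). */
#define FOURCC(a, b, c, d)                                   \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) |                  \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Maps a sample-entry FourCC to the handler family that should see its packets. */
enum track_kind classify_track(uint32_t tag)
{
    switch (tag) {
    case FOURCC('a', 'v', 'c', '1'):
    case FOURCC('a', 'v', 'c', '3'):
        return TRACK_H264;   /* NAL units go to the do_NAL() path */
    case FOURCC('h', 'v', 'c', '1'):
    case FOURCC('h', 'e', 'v', '1'):
        return TRACK_HEVC;
    case FOURCC('c', '6', '0', '8'):
        return TRACK_CEA608; /* dedicated caption track */
    case FOURCC('c', '7', '0', '8'):
        return TRACK_CEA708;
    case FOURCC('t', 'x', '3', 'g'):
        return TRACK_TX3G;   /* 3GPP timed text subtitles */
    default:
        return TRACK_OTHER;  /* audio and anything else is skipped */
    }
}
```

In a loop over av_read_frame(), the classification could be computed once per stream after avformat_find_stream_info() and then looked up by each packet's stream_index, which suits FFmpeg's interleaved reading order.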

Video packets (H.264 / HEVC) are passed to the existing do_NAL() processing logic, while caption tracks (CEA-608 / CEA-708) and subtitle tracks (tx3g) continue to use the existing CCExtractor parsing functions.
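MP4 stores H.264/HEVC NAL units with a big-endian size prefix rather than Annex B start codes, so each video packet has to be split into individual NAL units before they can be handed on. The helper below is a hypothetical sketch (`for_each_nal()` and `nal_handler` are not CCExtractor names); it assumes the prefix size has already been read from the codec configuration, and the callback merely stands in for whatever wrapper reaches do_NAL().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type standing in for the hand-off to do_NAL(). */
typedef void (*nal_handler)(const uint8_t *nal, size_t nal_len, void *ctx);

/* Walks a length-prefixed MP4 video sample and calls cb once per NAL unit.
 * Returns the number of NAL units found, or -1 if the sample is malformed. */
int for_each_nal(const uint8_t *buf, size_t size, int length_size,
                 nal_handler cb, void *ctx)
{
    size_t pos = 0;
    int count = 0;

    if (length_size < 1 || length_size > 4)
        return -1;
    while (pos < size) {
        size_t nal_len = 0;

        if (size - pos < (size_t)length_size) /* truncated length prefix */
            return -1;
        for (int i = 0; i < length_size; i++) /* big-endian size field */
            nal_len = (nal_len << 8) | buf[pos++];
        if (nal_len > size - pos) /* NAL would run past the sample */
            return -1;
        if (cb != NULL)
            cb(buf + pos, nal_len, ctx);
        pos += nal_len;
        count++;
    }
    return count;
}
```

Rejecting a truncated prefix or an oversized length keeps a corrupt sample from causing reads past the end of the packet buffer.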

One difference from the GPAC implementation is that FFmpeg reads packets sequentially across all streams, whereas the GPAC implementation reads samples per track. The downstream caption extraction pipeline remains unchanged.

For H.264 / HEVC streams, codec configuration data is obtained from the stream extradata (avcC / hvcC) in order to determine the NAL unit length prefix size and extract SPS/PPS before processing packets.
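For H.264, the avcC record (ISO/IEC 14496-15) has a fixed layout: the low two bits of byte 4 hold lengthSizeMinusOne, byte 5 holds the SPS count in its low five bits, and each SPS and PPS is preceded by a 16-bit length. A minimal parser, assuming the extradata is in avcC form (first byte configurationVersion = 1) rather than Annex B; `parse_avcc()` and `struct avcc_info` are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: pointers reference the caller's extradata buffer. */
struct avcc_info {
    int nal_length_size;  /* lengthSizeMinusOne + 1 */
    int num_sps;          /* at most 31 (5-bit field) */
    const uint8_t *sps[32];
    size_t sps_len[32];
    int num_pps;          /* at most 255 (8-bit field) */
    const uint8_t *pps[256];
    size_t pps_len[256];
};

/* Reads one 16-bit length-prefixed parameter set; returns 0, or -1 if truncated. */
static int read_param_set(const uint8_t *p, size_t size, size_t *pos,
                          const uint8_t **data, size_t *len)
{
    if (size - *pos < 2)
        return -1;
    *len = ((size_t)p[*pos] << 8) | p[*pos + 1];
    *pos += 2;
    if (size - *pos < *len)
        return -1;
    *data = p + *pos;
    *pos += *len;
    return 0;
}

/* Parses an avcC record; returns 0 on success, -1 if it is malformed. */
int parse_avcc(const uint8_t *p, size_t size, struct avcc_info *out)
{
    size_t pos = 6; /* first SPS length follows the 6-byte fixed header */

    if (size < 7 || p[0] != 1) /* configurationVersion must be 1 */
        return -1;
    out->nal_length_size = (p[4] & 0x03) + 1;
    out->num_sps = p[5] & 0x1f;
    for (int i = 0; i < out->num_sps; i++)
        if (read_param_set(p, size, &pos, &out->sps[i], &out->sps_len[i]) != 0)
            return -1;
    if (pos >= size) /* numOfPictureParameterSets byte is missing */
        return -1;
    out->num_pps = p[pos++];
    for (int i = 0; i < out->num_pps; i++)
        if (read_param_set(p, size, &pos, &out->pps[i], &out->pps_len[i]) != 0)
            return -1;
    return 0;
}
```

hvcC uses a longer header: there, lengthSizeMinusOne sits in the low two bits of byte 21 and the parameter sets are grouped into per-NAL-type arrays, so it needs its own parser.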

Build configuration

This backend is optional and is controlled through a compile-time CMake flag:

-DUSE_FFMPEG_MP4=ON

  • Default build → uses GPAC (mp4.c)
  • FFmpeg build → uses the new implementation (mp4_ffmpeg.c)

The runtime behavior of CCExtractor remains unchanged; the choice of backend only affects how the MP4 container is parsed internally.

Summary

This PR:

  • Adds an FFmpeg-based MP4 parser
  • Keeps GPAC as the default implementation
  • Introduces a compile-time option to switch between the two backends
  • Leaves the caption extraction pipeline unchanged

This provides a potential alternative MP4 backend using a widely available multimedia framework while preserving the existing behavior.

gaurav02081 force-pushed the gaurav-ffmpeg branch 2 times, most recently from 6c31e1c to c83ebd9 on March 9, 2026 17:25
Add a new CI job (cmake_ffmpeg_mp4) that builds CCExtractor with the
optional FFmpeg-based MP4 parser enabled via -DUSE_FFMPEG_MP4=ON.

The workflow now verifies two builds:
- Default build using GPAC
- FFmpeg MP4 build using a separate build directory

This ensures the FFmpeg backend compiles successfully alongside the
default GPAC implementation.

Replace the custom GPAC-based MP4 parser with an FFmpeg/libavformat
implementation for subtitle extraction. This provides broader codec
support, better container compatibility, and leverages FFmpeg's mature
demuxing infrastructure for handling MP4/MOV files.

- Add mp4_ffmpeg.c with full libavformat-based demuxing pipeline
- Update CMakeLists to detect and link FFmpeg libraries (libavformat,
  libavcodec, libavutil) with fallback to GPAC when unavailable
- Integrate FFmpeg pathway into ccextractor main entry point
- Preserve existing GPAC codepath as fallback

1. src/lib_ccx/CMakeLists.txt: Added pkg_check_modules(AVCODEC REQUIRED libavcodec) and appended its include dirs and libraries to EXTRA_INCLUDES/EXTRA_LIBS in the USE_FFMPEG_MP4 block.
2. src/lib_ccx/mp4_ffmpeg.c: Updated the header comment to accurately reflect the libavcodec dependency.
gaurav02081 (Contributor, Author)

Update: Added a new CI job to build CCExtractor with the optional FFmpeg MP4 backend.

The workflow now performs two builds:

  • Default build using the GPAC implementation
  • FFmpeg build using -DUSE_FFMPEG_MP4=ON

Both builds run --version to verify the binaries execute correctly, and separate build directories are used to avoid CMake cache conflicts.

ccextractor-bot (Collaborator)

CCExtractor CI platform finished running the test files on Linux. Below is a summary of the test results, compared to the test for commit 90128d8...:

Report Name   Tests Passed
Broken        10/13
CEA-708       2/14
DVB           4/7
DVD           3/3
DVR-MS        2/2
General       27/27
Hardsubx      1/1
Hauppage      3/3
MP4           3/3
NoCC          10/10
Options       79/86
Teletext      20/21
WTV           13/13
XDS           34/34

Your PR breaks these cases:

NOTE: The following tests have been failing on the master branch as well as the PR:

  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Test 8730

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --out=spupng c83f765c66..., Last passed: Never
  • ccextractor --parsePAT --out=srt c83f765c66..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --endcreditsforatleast 3 --endcreditstext "CCextractor Ends crdit Testing" addf5e2fc9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

ccextractor-bot (Collaborator)

CCExtractor CI platform finished running the test files on Windows. Below is a summary of the test results, compared to the test for commit e4bcade...:

Report Name   Tests Passed
Broken        10/13
CEA-708       2/14
DVB           4/7
DVD           3/3
DVR-MS        2/2
General       27/27
Hardsubx      1/1
Hauppage      3/3
MP4           3/3
NoCC          10/10
Options       81/86
Teletext      20/21
WTV           13/13
XDS           34/34

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --out=spupng c83f765c66..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

This PR does not introduce any new test failures. However, some tests are failing on both master and this PR (see above).

Check the result page for more info.


Labels

None yet

Projects

None yet


3 participants