A focused desktop tool to analyze A/V timing and perform a lossless MKV remux with predictable, auditable behavior.
The app discovers delays, applies a positive‑only global shift (so no content gets trimmed), handles chapters and subtitles, and writes a mkvmerge options file you can inspect and replay.
Scope of this README
This document describes the baseline application — the exact code you shared (before any experimental “expert/advanced repair” features). It is intentionally exhaustive so a new contributor can understand end‑to‑end behavior and implementation details.
- Key Ideas
- Architecture
- Job Lifecycle
- Delay Discovery Engines
- Positive‑Only Timing Model
- Merge Planning
- Subtitles
- Chapters
- Attachments
- mkvmerge Options File
- Temporary Files, Outputs, Logs
- GUI Overview
- Configuration Reference
- Dependencies
- Run It
- Troubleshooting
- Developer Notes
- Design Invariants & Edge Cases
- Performance Notes
- Known Limitations
- Appendix A: Log Line Guide
- Appendix B: Demux Extension Map
- Appendix C: Configuration Keys (Table)
- Lossless by design: the pipeline never applies negative per‑track delays that could discard leading content. We transform all timings into a non‑negative scheme via a global shift.
- Deterministic & auditable: we construct the mkvmerge command as a token array and persist it (
opts.json). You can read, diff, and replay it. - Separation of concerns: analysis, planning, extraction, subtitles, chapters, and merging are isolated but connected through explicit data structures.
- Language‑aware analysis (optional): reference/target audio stream selection can prefer a specific language tag (e.g.,
jpn,eng) to improve correlation robustness.
repo-root/
├─ main.py # App entry point
├─ vsg_core/ # Headless engine
│ ├─ analysis.py # Delay discovery (Audio XCorr, VideoDiff)
│ ├─ config.py # Settings load/save, defaults, dir creation
│ ├─ job_discovery.py # Single-file & folder batch discovery
│ ├─ mkv_utils.py # mkvmerge/mkvextract helpers, demux, chapters
│ ├─ pipeline.py # Orchestration of a full job
│ ├─ process.py # Command runner with compact logging
│ └─ subtitle_utils.py # SRT→ASS, rescale, font-size multiply
└─ vsg_qt/ # PySide6 GUI
├─ main_window.py # Main window, file pickers, log, actions
├─ worker.py # Threaded job runner (no UI freeze)
├─ manual_selection_dialog.py# Drag/drop track picker + flags
├─ options_dialog.py # Settings tabs & Merge Profile editor
└─ track_widget.py # Per-track widget with toggles
Key modules
vsg_core.process.CommandRunner— canonical way to run external processes with uniform logging, progress throttling, error tails.vsg_core.analysis— two delay engines:- Audio cross‑correlation (librosa + scipy) with chunked extraction via ffmpeg.
- VideoDiff integration (external tool), with error bounds.
vsg_core.pipeline.JobPipeline— the orchestrator. Calls analysis, converts delays into positive‑only residuals, builds extraction + mkvmerge plans, executes merge, writes artifacts.vsg_core.mkv_utils— inspects MKVs (mkvmerge -J), extracts tracks (mkvextract+ ffmpeg for special cases), processes chapters (XML), and attaches files.vsg_core.subtitle_utils— SRT→ASS conversion, ASS PlayRes rescale, font size multiplication (style‑aware, line‑by‑line safe).
Processing one Reference file with optional Secondary and Tertiary files:
-
Analysis
- For each Secondary/Tertiary, compute delay vs Reference using selected engine.
- Log per‑chunk results and final “determined” delay.
-
Merge Planning
- Convert raw delays to a positive‑only scheme (see below).
- Build a track plan:
- Merge Plan mode uses the JSON‑like rule set in settings.
- Manual Selection mode uses the user’s drag/drop layout.
-
Extraction
- Demux only those tracks that appear in the final plan. Special handling for
A_MS/ACM(attempt copy, else decode to PCM with correct bit depth).
- Demux only those tracks that appear in the final plan. Special handling for
-
Subtitles (optional per track)
- Convert SRT→ASS, rescale PlayRes to match video, multiply text size.
-
Chapters
- Optional rename to “Chapter NN”.
- Shift all
ChapterTimeStart/Endby the global shift. - Optional snap starts (and optionally ends) to keyframes within a threshold.
-
Merge Execution
- Create
opts.json(mkvmerge token array) and (optionally) a pretty dump. - Run mkvmerge with
@opts.json.
- Create
-
Cleanup
- On success (or on analyze‑only), remove the temp job directory.
- Logs are written next to the output MKV (or in the chosen output folder).
Objective: Estimate relative delay (ms) between the Reference audio stream and a target (Secondary/Tertiary) stream.
Even when languages differ, many mixes share music/SFX transients. By correlating short segments that avoid long silences, we locate a robust lag.
- Inspect
mkvmerge -JJSON and enumeratetracksof typeaudio. - If the user provided a language code (e.g.,
analysis_lang_sec="jpn"), pick the first audio whoseproperties.languagematches. - Otherwise, use the first audio track.
- Indexes map to
ffmpeg -map 0:a:<index>for extraction.
For best robustness, prefer like‑for‑like (e.g., JPN vs JPN) when available.
For each scan point tᵢ, extract mono, 48 kHz WAV windows from both files:
ffmpeg -y -v error -ss <tᵢ> -i <file> -map 0:a:<idx> -t <dur> -vn \
-acodec pcm_s16le -ar 48000 -ac 1 <out.wav>- Default duration
dur= 15s (configurable). - Conservative logging: the exact command line is mirrored to the GUI log.
- Determine program duration
Dfromffprobe(format duration). - Defaults:
scan_chunk_count = 10scan_chunk_duration = 15- We analyze a band
[0.10*D, 0.90*D)and distribute windows evenly (skip early logos and late credits to avoid silence).
This balances robustness with I/O/runtime.
We load WAVs (mono) with librosa, preserving native rate (48 kHz). Then normalize to z‑scores to remove gain bias:
x = (x - mean(x)) / (std(x) + 1e-9)
y = (y - mean(y)) / (std(y) + 1e-9)Compute full discrete cross‑correlation (scipy), then find the peak lag:
c = correlate(x, y, mode='full', method='auto')
k* = argmax(c) - (len(y) - 1) # lag in samples (y vs x)
τ = k* / fs # seconds
delay_ms = round(1000 * τ)We also compute a match/confidence heuristic:
norm = sqrt( sum(x^2) * sum(y^2) )
match_pct = 100 * (max(|c|) / (norm + 1e-9))This is a normalized peak height — higher is “sharper” alignment. It is not a probability.
The z‑score step behaves like a simple pre‑whitening/normalization, reducing the impact of level shifts and DC offsets so that edge energy (transients) drives the peak.
- 15s windows capture multiple transients; longer windows (20–30s) increase robustness at the cost of time.
- If matches are weak, increase duration or adjust language selection.
- Drop windows with
match_pct <= min_match_pct(default 5.0). - Compute the mode (most frequent) delay among remaining windows.
- From the modal group, pick the window with max match %. That tuple
(delay_ms, match%)is the determined delay.
This favors consistency over any single window’s outlier result.
| Symptom | What to try |
|---|---|
| Low match% overall | Increase scan_chunk_duration to 20–30s; ensure language selection compares similar mixes (e.g., JPN vs JPN). |
| Two clusters of delays | Baseline uses one global delay; confirm with Analyze Only and consider whether underlying media truly has a splice (outside baseline scope). |
| Slow analysis | Reduce scan_chunk_count; shrink scan band if needed. |
If analysis_mode="VideoDiff":
- Execute external
videodiffwith(ref, target)and parse the last[Result]line. - Extract either
ss:oritsoffset:seconds anderror:value. - If kind is
ss, invert sign for our delay semantics. - Enforce that
error ∈ [videodiff_error_min, videodiff_error_max]; otherwise reject result.
Use this when audio mixes are too divergent for correlation (e.g., commentary tracks).
Problem: mkvmerge --sync with negative values can drop leading content.
Solution: Convert raw delays to non‑negative residuals by applying a global shift equal to the absolute most negative delay.
Let raw delays (ms) be:
ref = 0
sec = -1001
ter = -1000
- Global shift:
global_shift = -min(ref, sec, ter) = 1001 - Residuals:
ref_resid = ref + global_shift = 1001sec_resid = sec + global_shift = 0ter_resid = ter + global_shift = 1
- Merge sync flags (per input group):
--sync 0:<residual_ms> - Chapters: shift all timestamps by
+global_shiftso chapters align with delayed streams.
This guarantees that no input is asked to start before t=0 in mkvmerge, eliminating trimming.
After delays are converted, the planner builds a final list of (track, flags) entries and a --track-order to match GUI order.
A prioritized list of rules (Settings → Merge Plan) defines what to include. Each rule:
source:REF|SEC|TERtype:Video|Audio|Subtitleslang: CSV orany(match againstproperties.language)exclude_langs: CSV (omit these even iflang=any)enabled: boolis_default: bool (first match of this type becomes the default track)is_forced_display: bool (subs)swap_first_two: bool (subs; swap first two matches)apply_track_name: bool (pass the input’strack_nameto output)rescale: bool (ASS/SSA only; rewrite PlayRes to video)
Global codec exclusions (exclude_codecs) filter all matches whose codec id contains any excluded token (e.g., ac3, dts, pcm).
Default logic
- The first video is implicitly default.
- Exactly one audio and one subtitles track can be marked default via flags.
Per‑track mkvmerge arguments (generated in order):
--language 0:<lang>
--track-name 0:<name> # if apply_track_name and input had a name
--sync 0:<global_shift + role_residual>
--default-track-flag 0:<yes/no>
--forced-display-flag 0:yes # if is_forced_display
--compression 0:none
--remove-dialog-normalization-gain 0 # if enabled and codec is AC3/E-AC3
( <extracted-file-path> )
Finally we add attachments (if any) and --track-order <inputIdx0>:0,<inputIdx1>:0,....
Instead of rules, the user drags tracks into Final Output:
- Reorder entries to match desired output.
- Per‑entry toggles: Default (A/V/S), Forced (subs), Keep Name, Convert to ASS (SRT), Rescale (ASS/SSA), Size Multiplier (subs).
Batch auto‑apply: If enabled, and the shape signature (counts of [source × type]) of the next file matches the previous, automatically carry over the layout.
All subsequent stages (extraction, chapters, merge) are identical to the rule‑based plan.
- SRT → ASS (optional): via ffmpeg; if output exists, we replace the path.
- Rescale PlayRes (ASS/SSA only): probe reference video width/height via ffprobe and rewrite
PlayResX/PlayResYif they differ. - Font size multiplier (ASS/SSA only): parse
Style:lines and multiply the font size value, keeping other fields intact. Safe parsing avoids corrupting the file.
- Rename (optional): clear
ChapterDisplayand write “Chapter NN”. - Shift timestamps: add
global_shift(ms → ns) to bothChapterTimeStartandChapterTimeEnd. - Snap to keyframes (optional): probe keyframes via ffprobe and, for starts (and optionally ends), move within
snap_threshold_msaccording to mode (previousornearest). - Normalize ends: ensure each chapter has a valid end; cap ends to next chapter’s start; guarantee strictly increasing intervals.
All chapter edits are written to a temporary XML file and passed to mkvmerge via --chapters.
If the Tertiary file contains attachments (fonts, images), we extract and --attach-file them to the output. These are input‑agnostic artifacts and do not interact with sync timing.
We emit tokens as JSON (opts.json) and run:
mkvmerge @<opts.json>
Optionally we also write a pretty text dump (opts.pretty.txt) for human inspection:
--output "<out.mkv>" \
--chapters "<chapters.xml>" \
--language 0:jpn --sync 0:1001 --default-track-flag 0:yes --compression 0:none ( "<ref_video.h264>" ) \
...
This makes the merge reproducible and easy to debug.
Each job creates a unique temp dir: temp_root/job_<ref-stem>_<epoch>/ with:
ref_track_*,sec_track_*,ter_track_*: demuxed streams_chapters_modified.xml: edited chaptersopts.json(+ optionalopts.pretty.txt): mkvmerge argswav_*: short analysis windows for audio correlationatt_*: attachments from TER if present
Output: <output_folder>/<ReferenceFileName>.mkv (same filename as Reference)
Run log: <output_folder>/<ReferenceFileName>.log
Compact logging shows throttled progress and prints the tail of stderr on error for signal‑to‑noise.
- Inputs: Reference, Secondary, Tertiary (files or directories).
- Modes:
- Merge Plan (profile rules)
- Manual Selection (drag/drop final list; optional auto‑apply across batch)
- Actions:
- Analyze Only → Compute and display delays (no merge).
- Analyze & Merge → Full pipeline.
- Settings Tabs:
- Storage: output folder, temp root, optional VideoDiff path
- Analysis: engine choice, chunk count/duration, min match %, VideoDiff error bounds, language prefs (REF/SEC/TER)
- Chapters: rename, snap mode/threshold, starts‑only toggle
- Merge Behavior: remove dialog normalization gain, codec blacklist, disable track statistics tags
- Logging: compact mode, autoscroll, progress step %, error tail lines, pretty/json options dump
- Merge Plan: rule editor with priority ordering
Settings are persisted to settings.json. Missing keys are auto‑added with defaults.
-
Storage & Tools
output_folder(str) — defaultsync_output/under repo roottemp_root(str) — defaulttemp_work/under repo rootvideodiff_path(str) — blank uses PATH
-
Analysis
analysis_mode(str) —"Audio Correlation"|"VideoDiff"scan_chunk_count(int) — default 10scan_chunk_duration(int, seconds) — default 15min_match_pct(float) — default 5.0analysis_lang_ref/analysis_lang_sec/analysis_lang_ter(str, optional ISO likejpn,eng)videodiff_error_min/videodiff_error_max(float) — bounds for VideoDiff acceptance
-
Workflow
merge_mode(str) —"plan"|"manual"
-
Chapters
rename_chapters(bool)snap_chapters(bool)snap_mode(str) —"previous"|"nearest"snap_threshold_ms(int) — default 250snap_starts_only(bool) — only snap chapter starts
-
Merge Behavior
apply_dialog_norm_gain(bool) — remove dialnorm for AC3/E‑AC3exclude_codecs(str) — comma list (e.g.,"ac3, dts, pcm")disable_track_statistics_tags(bool)merge_profile(list[rule]) — see Merge Plan
-
Logging
log_compact(bool) — compact stdoutlog_autoscroll(bool) — GUI behaviorlog_progress_step(int %) — progress throttling (e.g., 20 → 0/20/40/60/80/100)log_error_tail(int lines) — tail lines printed on errorlog_tail_lines(int lines) — tail lines printed on successlog_show_options_pretty/log_show_options_json(bool)
-
Archival
archive_logs(bool) — after batch, zip per‑file logs and delete the originals
- Python 3.9+
- MKVToolNix:
mkvmerge,mkvextract - FFmpeg:
ffmpeg,ffprobe - VideoDiff (optional): if using that mode
- Python packages:
PySide6,librosa,numpy,scipy
Ensure binaries are on PATH (or set explicit paths in Settings).
python main.py- Select Reference (and optional Secondary/Tertiary). Files or matching folders.
- Choose Analyze Only to validate delays, or Analyze & Merge to produce the final MKV.
- Watch the log for:
- Per‑chunk XCorr lines and final delays
- Positive‑only global shift
- Chapter processing summary
- mkvmerge options file path
- Success path for the output file
- Tool not found — make sure
mkvmerge,mkvextract,ffmpeg,ffprobeare installed and onPATH. - XCorr unstable/low confidence — increase
scan_chunk_durationand/orscan_chunk_count; ensure language selection targets comparable mixes (JPN vs JPN, ENG vs ENG). - Defaults/forced flags not what you expected — In Merge Plan, check rule ordering and flags; in Manual mode, adjust the final list toggles.
- Chapters misaligned — Verify
global_shiftin logs equals the shift applied to chapter XML; if snapping is on, try increasingsnap_threshold_msor switchsnap_mode. - mkvmerge failure — Open
opts.json, replay via terminal, and examine stderr; the app also prints the error tail.
- Demux strategy:
mkvextractfor general tracks; forA_MS/ACMwe first try stream copy with ffmpeg, else decode to PCM using a bit‑depth‑aware codec (pcm_s16le,pcm_s24le,pcm_s32le,pcm_f64le). - Language selection (analysis only) is independent from merge inclusion rules.
- Track ordering is fully deterministic: we append inputs in the exact GUI/plan order and then emit a matching
--track-order. - Logging style balances signal/noise; compact mode prints progress and only a tail of verbose output on success/failure.
- No negative
--syncis ever passed to mkvmerge; all per‑input--syncvalues are ≥ 0 after applying the global shift. - Reference video dictates chapter rescale and subtitle PlayRes.
- Manual layout auto‑apply is only used when the shape signature (counts of
[source × type]) matches the prior job to prevent accidental mismatches. - Codec exclusions are substring checks against
codec_idlowercased (e.g.,a_ac3,a_dts,a_pcm). - Chapters normalization ensures strictly increasing intervals and prevents open‑ended atoms from overlapping into the next.
- XCorr windowing is the main cost. Defaults (
10 × 15s) trade speed vs. robustness. - SSD churn is minimized by deleting temp job directories on success.
- For faster previews, reduce
scan_chunk_countand/orscan_chunk_duration— then confirm with a second run if needed.
- Baseline engine uses a single global delay per Secondary/Tertiary. It does not splice or model time‑varying drift (that would be an “advanced repair” feature outside this README’s scope).
- XCorr can be confused by long uniform ambiences; tuning the window schedule usually fixes it.
- VideoDiff requires a separate binary and is subject to its error metric semantics.
Examples you’ll see in the GUI log:
$ ffprobe -v error -select_streams v:0 -show_entries format=duration -of csv=p=0 "ref.mkv"
Chunk @1278s -> Delay -1001 ms (Match 95.28%)
Secondary delay determined: -1001 ms
[Delay] Raw delays (ms): ref=0, sec=-1001, ter=-1000
[Delay] Applying lossless global shift: +1001 ms
[Chapters] Renamed chapters to "Chapter NN".
[Chapters] Shifted all timestamps by +1001 ms.
[Chapters] Snap result: moved=3, on_kf=5, too_far=1 (kfs=1234, mode=previous, thr=250ms, starts_only=True)
mkvmerge options file written to: temp_work/job_ref_.../opts.json
[SUCCESS] Output file created: sync_output/RefTitle.mkv
| Track type | codec_id contains | Demux extension |
|---|---|---|
| video | V_MPEGH/ISO/HEVC |
.h265 |
V_MPEG4/ISO/AVC |
.h264 |
|
V_MPEG1/2 |
.mpg |
|
V_VP9 |
.vp9 |
|
V_AV1 |
.av1 |
|
| (else) | .bin |
|
| audio | A_TRUEHD |
.thd |
A_EAC3 |
.eac3 |
|
A_AC3 |
.ac3 |
|
A_DTS |
.dts |
|
A_AAC |
.aac |
|
A_FLAC |
.flac |
|
A_OPUS |
.opus |
|
A_VORBIS |
.ogg |
|
A_PCM |
.wav |
|
| (else) | .bin |
|
| subs | S_TEXT/ASS |
.ass |
S_TEXT/SSA |
.ssa |
|
S_TEXT/UTF8 |
.srt |
|
S_HDMV/PGS |
.sup |
|
S_VOBSUB |
.sub |
|
| (else) | .sub |
Special case: A_MS/ACM → attempt stream copy; if refused, decode to PCM with bit‑depth‑aware codec.
| Key | Type | Default | Notes |
|---|---|---|---|
output_folder |
str | sync_output |
Output target for merged MKV & job logs |
temp_root |
str | temp_work |
Per‑job scratch directory root |
videodiff_path |
str | "" |
If blank, use PATH |
analysis_mode |
str | Audio Correlation |
or VideoDiff |
scan_chunk_count |
int | 10 |
Number of windows for XCorr |
scan_chunk_duration |
int | 15 |
Seconds per window |
min_match_pct |
float | 5.0 |
Discard XCorr results below this |
analysis_lang_ref |
str | "" |
ISO (jpn, eng) or blank for first stream |
analysis_lang_sec |
str | "" |
Same as above |
analysis_lang_ter |
str | "" |
Same as above |
videodiff_error_min |
float | 0.0 |
Reject if error < min |
videodiff_error_max |
float | 100.0 |
Reject if error > max |
merge_mode |
str | plan |
or manual |
rename_chapters |
bool | false |
Rename to Chapter NN |
snap_chapters |
bool | false |
Enable keyframe snapping |
snap_mode |
str | previous |
or nearest |
snap_threshold_ms |
int | 250 |
Max move distance |
snap_starts_only |
bool | true |
Only snap starts |
apply_dialog_norm_gain |
bool | false |
Remove dialnorm for AC3/E‑AC3 |
disable_track_statistics_tags |
bool | false |
mkvmerge flag |
exclude_codecs |
str | "" |
CSV blacklist (ac3,dts,pcm) |
merge_profile |
list | (see defaults) | Rule list, priority‑ordered |
log_compact |
bool | true |
Compact command runner logs |
log_autoscroll |
bool | true |
GUI behavior |
log_progress_step |
int | 20 |
% step for progress lines |
log_error_tail |
int | 20 |
stderr tail lines on error |
log_tail_lines |
int | 0 |
stdout tail on success |
log_show_options_pretty |
bool | false |
Dump pretty opts |
log_show_options_json |
bool | false |
Dump raw JSON opts |
archive_logs |
bool | true |
Zip logs after batch |
License
This project wraps external tools; respect their licenses. The GUI/engine code is released under the project’s chosen license (add a LICENSE file if needed).