Audio Enhancement Bible
A practical rescue manual for ugly production dialogue from Sony FX6 shoots. Built from real Last Water Day tests — not generic post-audio advice. The core rule is ruthless: protect the performance first, then remove noise, and only trust AI when it wins in a real A/B.
The one sentence version
The best workflow so far is: make the strongest local full-clip master first, then test AI only on the worst 5–15 second phrases, and patch back only the moments that clearly improve intelligibility without making the actor sound fake.
Field reality
AI can help rescue noisy dialogue. It cannot magically recreate destroyed production sound with zero tradeoffs. Good mic placement, disciplined levels, and room tone still matter more than any model.
Now with actual A/B media
The interactive comparison page includes embedded audio players and waveform views for 3002, 3079, and 3084, always showing the original raw snippet beside the enhanced versions.
Everything below comes from actual comparison work on problem clips, not theory. These principles are what survived repeated tests.
1. Dialogue is sacred
If a cleanup removes consonants, breath rhythm, or performance nuance, it is not a win — even if the waveform looks cleaner.
2. Use the real track, not the assumption
On these FX6 clips, track 2 was often the true production dialogue source. Track 1 sometimes helped as room-tone context, but it was often too uncorrelated or too quiet to serve as a subtraction reference.
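A quick way to check which track is the real dialogue source is to compare levels and cross-correlation before assuming anything. A minimal numpy sketch along those lines — the synthetic signals stand in for the two WAV tracks, and the 0.5 correlation cutoff is illustrative, not a measured threshold:

```python
import numpy as np

def rms_db(x):
    """RMS level in dB for a float signal scaled to [-1, 1]."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

def peak_correlation(a, b):
    """Peak normalized cross-correlation between two tracks.
    Near 1.0: coherent enough to consider subtraction.
    Near 0.0: do not use one track as a noise reference for the other."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    corr = np.correlate(a, b, mode="full") / len(a)
    return float(np.max(np.abs(corr)))

# Synthetic stand-ins; real use would load the two FX6 tracks from the WAV.
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 220 * np.arange(4800) / 48000)  # 0.1 s of "dialogue"
track2 = tone + 0.05 * rng.standard_normal(4800)          # dialogue mic
track1 = 0.1 * rng.standard_normal(4800)                  # quiet, uncorrelated mic

print(rms_db(track2), rms_db(track1))
print(peak_correlation(track1, track2))
```

On clips like these, a check of this kind would flag track 2 as the louder, coherent source and rule out track 1 as a subtraction reference.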
3. Full-clip AI is not automatically smart
Cloud tools can help, but full-length isolation often sounded worse than disciplined local cleanup. Phrase-level testing gave much better outcomes.
This is the default Film Konnections rescue path unless a better clean source exists.
What “good” sounds like
- Speech is easier to understand without sounding over-smoothed.
- Sibilants and transients still feel human.
- Background is reduced without obvious pumping.
- Emotion and timing still feel like the on-set performance.
Red flags
- “Underwater” or phasey words
- Vanishing consonants
- Voice suddenly sounding podcast-like or synthetic
- Noise floor breathing up and down unnaturally
These are the most important real-world results so far. The details matter because they stop us from repeating bad assumptions.
| Clip | Winning Approach | What Lost | Main Read |
|---|---|---|---|
| INTERFILM_3002 | Custom local spectral-gate + polish chain | DeepFilterNet over-suppressed speech; ElevenLabs full-clip isolation collapsed badly | Local cleanup clearly beat AI for the full clip. |
| INTERFILM_3079 | Local full clip + ElevenLabs only on first 18 seconds | Full-clip ElevenLabs did not beat the local master | Phrase-level AI rescue can absolutely work when the phrase is isolated and ugly enough. |
| INTERFILM_3084 | Track-2-led anlmdn full rescue chain | Track-1 subtraction idea; ElevenLabs opening phrase test | Sometimes the boring local denoise chain is simply the strongest full-length option. |
3002 lesson
ElevenLabs was not the answer on the full take or on the tested short phrases. The best result came from a custom local chain with proper in-clip noise analysis and restrained polish.
3079 lesson
This clip proved the key hybrid idea: the local full-clip master won overall, but ElevenLabs won decisively on the first 18 seconds. A hybrid master made more sense than dogmatically choosing one tool.
3084 lesson
A strong anlmdn pass on the real dialogue track gave the best full rescue. The “more advanced” model path was not automatically better.
Order matters more than hype. This is the current priority ladder.
Tier 1 — Start here
Local signal processing. Real noise-print denoise, spectral gating where needed, intelligibility EQ, de-ess, compression, limiting. This remains the most controllable and often the most natural.
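The noise-print denoise / spectral-gate idea in Tier 1 can be sketched in plain numpy. This is a minimal illustration, not the tuned chain used on these clips: the frame size, over-subtraction factor, and gain floor are starting points to A/B by ear.

```python
import numpy as np

def spectral_gate(x, noise_print, frame=1024, hop=256, over_sub=1.5, floor=0.05):
    """Noise-print spectral gate: learn a per-bin noise magnitude from a
    dialogue-free region, then duck bins that sit near that floor.
    over_sub and floor are illustrative, not calibrated for any clip."""
    win = np.hanning(frame)
    frames = lambda sig: [sig[i * hop:i * hop + frame] * win
                          for i in range(1 + (len(sig) - frame) // hop)]
    # Average magnitude spectrum of the noise print = the "noise floor"
    noise_mag = np.mean([np.abs(np.fft.rfft(f)) for f in frames(noise_print)], axis=0)
    out, norm = np.zeros(len(x)), np.zeros(len(x))
    for i, seg in enumerate(frames(x)):
        spec = np.fft.rfft(seg)
        mag = np.abs(spec)
        # Keep bins well above the noise floor; attenuate the rest toward `floor`
        gain = np.maximum(1.0 - over_sub * noise_mag / (mag + 1e-12), floor)
        out[i * hop:i * hop + frame] += np.fft.irfft(spec * gain, n=frame) * win
        norm[i * hop:i * hop + frame] += win ** 2
    return out / np.maximum(norm, 1e-2)  # overlap-add normalization

# Synthetic demo: a tone buried in noise, gated with a separate noise print.
rng = np.random.default_rng(1)
speech = 0.4 * np.sin(2 * np.pi * 180 * np.arange(24000) / 48000)
noise = 0.05 * rng.standard_normal(24000)
print_tone_free = 0.05 * rng.standard_normal(24000)
cleaned = spectral_gate(speech + noise, print_tone_free)
```

The real chain adds intelligibility EQ, de-essing, compression, and limiting after this stage; those are tool-specific and omitted here.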
Tier 2 — Best premium pro route
iZotope RX 11 Advanced. Dialogue Isolate, Spectral De-noise, Dialogue De-reverb, Spectral Repair, and manual phrase surgery together form the most promising untried high-end workflow.
Tier 3 — AI assist, not blind faith
ElevenLabs, Adobe Podcast, and Resemble Enhance are best tested on short phrases first. If they help, patch selectively. If not, keep them out of the master.
What we learned about the current tools
- DeepFilterNet3: workable, but often too aggressive on these clips.
- ElevenLabs full clip: useful technically, but not the winning full-length solution on the tested material.
- ElevenLabs phrase rescue: worth testing on the ugliest 5–15 second dialogue moments.
- anlmdn / local denoise: surprisingly strong when used carefully on the right track.
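anlmdn is ffmpeg's non-local-means audio denoiser, and a track-2-led pass is just one filtergraph. A hedged sketch of building (not running) that command — the filenames are placeholders, the channel mapping assumes a 2-channel WAV with dialogue on the second channel, and the `s` strength is a starting value to A/B, not a preset:

```python
import shlex

def anlmdn_command(src, dst, channel=1, strength=0.0002):
    """Build an ffmpeg command: pull one channel, denoise it with anlmdn.
    channel=1 selects the second channel (track 2 on these clips);
    strength is an illustrative starting point for A/B listening."""
    filtergraph = f"pan=mono|c0=c{channel},anlmdn=s={strength}"
    return ["ffmpeg", "-y", "-i", src, "-af", filtergraph, dst]

cmd = anlmdn_command("INTERFILM_3084.wav", "3084_track2_anlmdn.wav")
print(shlex.join(cmd))
```

Keeping it as an argument list (rather than one shell string) avoids quoting bugs when filenames contain spaces.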
This is the most realistic “premium” path for difficult dialogue when you do not have a better clean source.
Local master + AI phrase patches
1. Build the best full-length local master.
2. Mark only the ugliest 3–5 phrases.
3. Run cloud enhancement / isolation on those phrases only.
4. Compare phrase-by-phrase against the local master.
5. Reinsert only the clear winners.
6. Deliver both the strongest cleanup and a more natural alternate so editorial can choose.
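Step 5 is where splices go wrong: a hard cut between the local master and an AI patch is audible. Equal-power crossfades at both boundaries keep the edit smooth. A minimal numpy sketch — the 20 ms fade length and 48 kHz rate are illustrative defaults:

```python
import numpy as np

def patch_phrase(master, patch, start, sr=48000, fade_s=0.02):
    """Drop an enhanced phrase back into the full-clip master with
    equal-power crossfades at both boundaries so the splice is inaudible.
    start is in samples; fade_s (20 ms here) is an illustrative length."""
    n, f = len(patch), int(fade_s * sr)
    t = np.linspace(0.0, np.pi / 2, f)
    up, down = np.sin(t), np.cos(t)  # equal-power fade ramps
    out = master.copy()
    seg = patch.astype(float)
    # Head: patch fades in while the master fades out; tail: the reverse.
    seg[:f] = patch[:f] * up + master[start:start + f] * down
    seg[n - f:] = patch[n - f:] * down + master[start + n - f:start + n] * up
    out[start:start + n] = seg
    return out

# Demo: a 0.1 s "patch" dropped into a silent master at sample 10000.
master = np.zeros(48000)
patch = np.full(4800, 0.3)
out = patch_phrase(master, patch, 10000)
```

Equal-power (sin/cos) ramps are used instead of linear ones so perceived loudness stays constant through the crossfade when the two signals are uncorrelated.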
Why this hybrid method works
Most AI enhancement tools are inconsistent across a whole take. They may fix one line beautifully and destroy the next. A hybrid approach treats them like a phrase-level specialist, not a one-button final master.
What not to do
Do not run full-clip AI blindly and assume “cleaner” equals better. Do not replace production dialogue with synthetic speech unless the creative team explicitly approves an ADR-style intervention. Do not judge results without watching the actor’s face and timing in sync.
fal may be useful — but only if it exposes the right underlying model.
The honest answer
fal itself is not the magic. The question is whether fal hosts a truly good speech-enhancement, dereverb, or source-separation model. If yes, it is worth A/B testing. If not, it is just more tooling around mediocre outputs.
What would be worth testing on fal
Hosted access to things like Resemble Enhance, serious speech enhancement, dereverb, or source-separation models could be useful. Generic audio generation, voice changing, or music tools are not the priority for production dialogue rescue.
Best fal-style use case
Use fal only as an additional candidate in a phrase-level bake-off or as a helper in a hybrid workflow — for example, generating a cleaner speech-focused stem that is blended lightly into the local master. It should earn its place by sounding better, not by sounding more “AI.”
If we want one more jump in quality beyond the current workflow, these are the most valuable next experiments.
1. RX 11 chain
The strongest untried pro route: Dialogue Isolate → Spectral De-noise → Dialogue De-reverb → Spectral Repair on the ugliest words.
2. Adobe Podcast / Resemble on phrases
Only on 5–15 second dialogue moments. These are not trusted for blind full-clip mastering.
3. Dereverb-first workflow
If the ugliness is partly room splash rather than steady noise, dereverb may help more than another denoiser.
4. Source-separation-assisted blend
Try a speech-focused separation stem as a helper layer, not as the final master by itself.
Bottom line
Default recommendation: keep using the local full-clip rescue workflow as the backbone. Then test cloud/model tools phrase-by-phrase only where the local master still fails. That is the highest-probability path to genuinely better dialogue on this project.