Audio Enhancement Bible
A practical rescue manual for ugly production dialogue from Sony FX6 shoots. Built from real Last Water Day tests — not generic post-audio advice. The core rule is ruthless: protect the performance first, then remove noise, and only trust AI when it wins in a real A/B.
The one sentence version
The best workflow so far is: make the strongest local full-clip master first, then test AI only on the worst 5–15 second phrases, and patch back only the moments that clearly improve intelligibility without making the actor sound fake.
Field reality
AI can help rescue noisy dialogue. It cannot magically recreate destroyed production sound with zero tradeoffs. Good mic placement, disciplined levels, and room tone still matter more than any model.
Now with actual A/B media
The interactive comparison page includes embedded audio players and waveform views for 3002, 3079, and 3084, always showing the original raw snippet beside the enhanced versions.
Everything below comes from actual comparison work on problem clips, not theory. These principles are what survived repeated tests.
1. Dialogue is sacred
If a cleanup removes consonants, breath rhythm, or performance nuance, it is not a win — even if the waveform looks cleaner.
2. Use the real track, not the assumption
On these FX6 clips, track 2 was often the true production dialogue source. Track 1 sometimes helped as room-tone context, but it was often too uncorrelated or too quiet to serve as a subtraction reference.
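A quick way to check which track is the real dialogue source is to compare levels and cross-correlation before assuming anything. A minimal numpy sketch along those lines — the synthetic signals stand in for the two WAV tracks, and the 0.5 correlation cutoff is illustrative, not a measured threshold:

```python
import numpy as np

def rms_db(x):
    """RMS level in dB for a float signal scaled to [-1, 1]."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

def peak_correlation(a, b):
    """Peak normalized cross-correlation between two tracks.
    Near 1.0: coherent enough to consider subtraction.
    Near 0.0: do not use one track as a noise reference for the other."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    corr = np.correlate(a, b, mode="full") / len(a)
    return float(np.max(np.abs(corr)))

# Synthetic stand-ins; real use would load the two FX6 tracks from the WAV.
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 220 * np.arange(4800) / 48000)  # 0.1 s of "dialogue"
track2 = tone + 0.05 * rng.standard_normal(4800)          # dialogue mic
track1 = 0.1 * rng.standard_normal(4800)                  # quiet, uncorrelated mic

print(rms_db(track2), rms_db(track1))
print(peak_correlation(track1, track2))
```

On clips like these, a check of this kind would flag track 2 as the louder, coherent source and rule out track 1 as a subtraction reference.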
3. Full-clip AI is not automatically smart
Cloud tools can help, but full-length isolation often sounded worse than disciplined local cleanup. Phrase-level testing gave much better outcomes.
This is the default Film Konnections rescue path unless a better clean source exists.
What “good” sounds like
- Speech is easier to understand without sounding over-smoothed.
- Sibilants and transients still feel human.
- Background is reduced without obvious pumping.
- Emotion and timing still feel like the on-set performance.
Red flags
- “Underwater” or phasey words
- Vanishing consonants
- Voice suddenly sounding podcast-like or synthetic
- Noise floor breathing up and down unnaturally
These are the most important real-world results so far. The details matter because they stop us from repeating bad assumptions.
| Clip | Winning Approach | What Lost | Main Read |
|---|---|---|---|
| INTERFILM_3002 | Custom local spectral-gate + polish chain | DeepFilterNet over-suppressed speech; ElevenLabs full-clip isolation collapsed badly | Local cleanup clearly beat AI for the full clip. |
| INTERFILM_3079 | Local full clip + ElevenLabs only on first 18 seconds | Full-clip ElevenLabs did not beat the local master | Phrase-level AI rescue can absolutely work when the phrase is isolated and ugly enough. |
| INTERFILM_3084 | Track-2-led anlmdn full rescue chain | Track-1 subtraction idea; ElevenLabs opening phrase test | Sometimes the boring local denoise chain is simply the strongest full-length option. |
3002 lesson
ElevenLabs was not the answer on the full take or on the tested short phrases. The best result came from a custom local chain with proper in-clip noise analysis and restrained polish.
3079 lesson
This clip proved the key hybrid idea: the local full-clip master won overall, but ElevenLabs won decisively on the first 18 seconds. A hybrid master made more sense than dogmatically choosing one tool.
3084 lesson
A strong anlmdn pass on the real dialogue track gave the best full rescue. The “more advanced” model path was not automatically better.
Order matters more than hype. This is the current priority ladder.
Tier 1 — Start here
Local signal processing. Real noise-print denoise, spectral gating where needed, intelligibility EQ, de-ess, compression, limiting. This remains the most controllable and often the most natural.
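The noise-print denoise / spectral-gate idea in Tier 1 can be sketched in plain numpy. This is a minimal illustration, not the tuned chain used on these clips: the frame size, over-subtraction factor, and gain floor are starting points to A/B by ear.

```python
import numpy as np

def spectral_gate(x, noise_print, frame=1024, hop=256, over_sub=1.5, floor=0.05):
    """Noise-print spectral gate: learn a per-bin noise magnitude from a
    dialogue-free region, then duck bins that sit near that floor.
    over_sub and floor are illustrative, not calibrated for any clip."""
    win = np.hanning(frame)
    frames = lambda sig: [sig[i * hop:i * hop + frame] * win
                          for i in range(1 + (len(sig) - frame) // hop)]
    # Average magnitude spectrum of the noise print = the "noise floor"
    noise_mag = np.mean([np.abs(np.fft.rfft(f)) for f in frames(noise_print)], axis=0)
    out, norm = np.zeros(len(x)), np.zeros(len(x))
    for i, seg in enumerate(frames(x)):
        spec = np.fft.rfft(seg)
        mag = np.abs(spec)
        # Keep bins well above the noise floor; attenuate the rest toward `floor`
        gain = np.maximum(1.0 - over_sub * noise_mag / (mag + 1e-12), floor)
        out[i * hop:i * hop + frame] += np.fft.irfft(spec * gain, n=frame) * win
        norm[i * hop:i * hop + frame] += win ** 2
    return out / np.maximum(norm, 1e-2)  # overlap-add normalization

# Synthetic demo: a tone buried in noise, gated with a separate noise print.
rng = np.random.default_rng(1)
speech = 0.4 * np.sin(2 * np.pi * 180 * np.arange(24000) / 48000)
noise = 0.05 * rng.standard_normal(24000)
print_tone_free = 0.05 * rng.standard_normal(24000)
cleaned = spectral_gate(speech + noise, print_tone_free)
```

The real chain adds intelligibility EQ, de-essing, compression, and limiting after this stage; those are tool-specific and omitted here.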
Tier 2 — Best premium pro route
iZotope RX 11 Advanced. Dialogue Isolate, Spectral De-noise, Dialogue De-reverb, Spectral Repair, and manual phrase surgery together form the most promising untried high-end workflow.
Tier 3 — AI assist, not blind faith
ElevenLabs, Adobe Podcast, and Resemble Enhance are best tested on short phrases first. If they help, patch selectively. If not, keep them out of the master.
What we learned about the current tools
- DeepFilterNet3: workable, but often too aggressive on these clips.
- ElevenLabs full clip: useful technically, but not the winning full-length solution on the tested material.
- ElevenLabs phrase rescue: worth testing on the ugliest 5–15 second dialogue moments.
- anlmdn / local denoise: surprisingly strong when used carefully on the right track.
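anlmdn is ffmpeg's non-local-means audio denoiser, and a track-2-led pass is just one filtergraph. A hedged sketch of building (not running) that command — the filenames are placeholders, the channel mapping assumes a 2-channel WAV with dialogue on the second channel, and the `s` strength is a starting value to A/B, not a preset:

```python
import shlex

def anlmdn_command(src, dst, channel=1, strength=0.0002):
    """Build an ffmpeg command: pull one channel, denoise it with anlmdn.
    channel=1 selects the second channel (track 2 on these clips);
    strength is an illustrative starting point for A/B listening."""
    filtergraph = f"pan=mono|c0=c{channel},anlmdn=s={strength}"
    return ["ffmpeg", "-y", "-i", src, "-af", filtergraph, dst]

cmd = anlmdn_command("INTERFILM_3084.wav", "3084_track2_anlmdn.wav")
print(shlex.join(cmd))
```

Keeping it as an argument list (rather than one shell string) avoids quoting bugs when filenames contain spaces.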
This is the most realistic “premium” path for difficult dialogue when you do not have a better clean source.
Local master + AI phrase patches
1. Build the best full-length local master.
2. Mark only the ugliest 3–5 phrases.
3. Run cloud enhancement / isolation on those phrases only.
4. Compare phrase-by-phrase against the local master.
5. Reinsert only the clear winners.
6. Deliver both the strongest cleanup and a more natural alternate so editorial can choose.
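Step 5 is where splices go wrong: a hard cut between the local master and an AI patch is audible. Equal-power crossfades at both boundaries keep the edit smooth. A minimal numpy sketch — the 20 ms fade length and 48 kHz rate are illustrative defaults:

```python
import numpy as np

def patch_phrase(master, patch, start, sr=48000, fade_s=0.02):
    """Drop an enhanced phrase back into the full-clip master with
    equal-power crossfades at both boundaries so the splice is inaudible.
    start is in samples; fade_s (20 ms here) is an illustrative length."""
    n, f = len(patch), int(fade_s * sr)
    t = np.linspace(0.0, np.pi / 2, f)
    up, down = np.sin(t), np.cos(t)  # equal-power fade ramps
    out = master.copy()
    seg = patch.astype(float)
    # Head: patch fades in while the master fades out; tail: the reverse.
    seg[:f] = patch[:f] * up + master[start:start + f] * down
    seg[n - f:] = patch[n - f:] * down + master[start + n - f:start + n] * up
    out[start:start + n] = seg
    return out

# Demo: a 0.1 s "patch" dropped into a silent master at sample 10000.
master = np.zeros(48000)
patch = np.full(4800, 0.3)
out = patch_phrase(master, patch, 10000)
```

Equal-power (sin/cos) ramps are used instead of linear ones so perceived loudness stays constant through the crossfade when the two signals are uncorrelated.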
Why this hybrid method works
Most AI enhancement tools are inconsistent across a whole take. They may fix one line beautifully and destroy the next. A hybrid approach treats them like a phrase-level specialist, not a one-button final master.
What not to do
Do not run full-clip AI blindly and assume “cleaner” equals better. Do not replace production dialogue with synthetic speech unless the creative team explicitly approves an ADR-style intervention. Do not judge results without watching the actor’s face and timing in sync.
fal may be useful — but only if it exposes the right underlying model.
The honest answer
fal itself is not the magic. The question is whether fal hosts a truly good speech-enhancement, dereverb, or source-separation model. If yes, it is worth A/B testing. If not, it is just more tooling around mediocre outputs.
What would be worth testing on fal
Hosted access to things like Resemble Enhance, serious speech enhancement, dereverb, or source-separation models could be useful. Generic audio generation, voice changing, or music tools are not the priority for production dialogue rescue.
Best fal-style use case
Use fal only as an additional candidate in a phrase-level bake-off or as a helper in a hybrid workflow — for example, generating a cleaner speech-focused stem that is blended lightly into the local master. It should earn its place by sounding better, not by sounding more “AI.”
If we want one more jump in quality beyond the current workflow, these are the most valuable next experiments.
1. RX 11 chain
The strongest untried pro route: Dialogue Isolate → Spectral De-noise → Dialogue De-reverb → Spectral Repair on the ugliest words.
2. Adobe Podcast / Resemble on phrases
Only on 5–15 second dialogue moments. These are not trusted for blind full-clip mastering.
3. Dereverb-first workflow
If the ugliness is partly room splash rather than steady noise, dereverb may help more than another denoiser.
4. Source-separation-assisted blend
Try a speech-focused separation stem as a helper layer, not as the final master by itself.
Bottom line
Default recommendation: keep using the local full-clip rescue workflow as the backbone. Then test cloud/model tools phrase-by-phrase only where the local master still fails. That is the highest-probability path to genuinely better dialogue on this project.