Basic podcast editing with Audacity
This post describes a simple workflow for editing a podcast in Audacity, with some extra bonus Uninformed Opinions™.
Prerequisites
Audacity
Audacity is free audio-editing software, available for Linux, Windows and macOS.
Headphones
You'll want some decent headphones. (If you have good monitors then I'm sure that'll work fine too, but most of us don't.) You don't need anything super fancy, but definitely don't get the cheapest set available. You really notice a lot more detail with decent headphones, compared to, say, some shitty Bluetooth speakers.
Shortcuts
It is extremely important to set up keyboard shortcuts. The ideal scenario is to use a physical, programmable button/control panel, like a Stream Deck, because these let you execute actions with a single button press (rather than a combination of button presses); but a regular keyboard is also fine.
If you do use a Stream Deck, you'll use the Hotkey action under the built-in System plugin. Set the hotkey to whatever Audacity's default hotkey for the particular action is (or, change Audacity's hotkey if you want).
These are the shortcuts/hotkeys that I use, with mostly the default hotkeys (in Windows, anyway; I may have updated some of these, but most if not all should be the defaults):
- Play (space): start and stop playback of the recording.
- Delete (Del): delete a section of audio.
- Zoom in (Ctrl + 1): I find myself zooming in and out quite frequently.
- Zoom out (Ctrl + 3)
- Silence (Ctrl + L): silence audio. This can be helpful in many situations (described below).
- Split (Ctrl + I): split an audio section into two parts. I typically do this when I need to insert intro/outro theme music.
- Join (Ctrl + J): join two audio sections into a single one. Pretty situational, used rarely.
- Undo (Ctrl + Z)
- Redo (Ctrl + Y)
- Preview cut (C): preview what a section of audio would sound like if you cut part of it out. More on that below.
In addition to those, I have these custom macros mapped to hotkeys:
- Amplify +1 dB
- Amplify -1 dB
To set these up, select Tools, Macro Manager, New; give it a name (e.g., "Amplify 1 dB"), and click OK. Then, select your newly made macro, and on the right, hit Insert, and find the command you're after, e.g., Amplify. You can construct macros with multiple steps, but these amplify ones are single-command macros. You can double-click on the command to edit the parameters; for the amplify macros, I select -1 and +1 dB because those are reasonably small steps in volume.
You'll then want to map them to a hotkey. Go to Edit, Preferences, Shortcuts; search for your macro name; and map it to a keyboard shortcut.
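If you're wondering what a 1 dB step actually does to the waveform, here's a minimal sketch of the arithmetic. It assumes the usual amplitude convention (gain = 10^(dB/20)) and samples stored as plain floats; the function names are mine, not anything from Audacity.

```python
def db_to_gain(db: float) -> float:
    """Convert a change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def amplify(samples, db):
    """Scale every sample by the gain that corresponds to `db` decibels."""
    gain = db_to_gain(db)
    return [s * gain for s in samples]


# Print the multiplier for a few typical steps.
for step in (-3, -1, 1, 3):
    print(f"{step:+d} dB -> x{db_to_gain(step):.3f}")
```

A +1 dB step multiplies the amplitude by roughly 1.12 and a -1 dB step by roughly 0.89, which is why a couple of presses either way is usually enough.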
And, finally, I also have a few multi-action hotkeys in my Stream Deck (which I'm sure could also be set up as macros and mapped to hotkeys in Audacity). These ones are less important, since I only really hit them once per recording, but:
- Compress (two actions: Select all (Ctrl + A), then open the compressor dialog (Ctrl + Shift + A)).
- Mix and render (two actions: Select all (Ctrl + A), then Mix and render (Ctrl + Shift + M)).
- Save and export (two actions: Save (Ctrl + S), then Export (Ctrl + Shift + E)).
Spending a bit of time setting these up and becoming familiar with them will save a lot of time in the long run.
Editing
Overview
I'm going to talk in generalities here, and these are just my opinions, but you'll probably want your podcast to be relatively dense with information and/or, as it were, "content". This means that your listeners will probably appreciate it if you cut out unnecessarily long pauses or other distractions. I also prefer to remove placeholder words like "ummm" or "like", as long as that doesn't mess up the flow of the conversation. You don't want it to be obvious that you've cut out every filler word, and it's fine to leave some in if the alternative is choppy audio.
Removal of filler words
This can be anywhere from very easy to literally impossible. Somebody that talks relatively slowly, with distinct pauses, is easy to edit because you can delete their "like" and "ummm" without affecting the rest of their sentence. On the other hand, somebody that speaks in a more stream-of-consciousness approach, or just speaks faster in general, will be much harder to edit because it will become very obvious if you've removed some words in the middle of a sentence.
So, be practical. I only remove filler words if they are clearly separated from the rest of the sentence, and can therefore be removed without anyone actually noticing.
Removal of unnecessary pauses
People pause during conversation. This is fine in moderation, but sometimes people really do need to think for a good few seconds before speaking, and in an audio-only podcast, that can be a bit weird for the listeners. So, I'll usually shorten pauses that are more than a second or two, if they're in the middle of a monologue.
You don't want to go too overboard with this, and you don't want to remove the pause entirely. As listeners, we sometimes need pauses to give us time to process whatever the person is saying. So, I'll usually just remove part of a long pause. Say, reduce a three-second pause to a one-second pause.
The thing you need to look out for here is people breathing. When someone's talking, and they pause between sentences, they'll typically breathe in before they start the next sentence. This breath can be pretty audible at times. You can sometimes get around this by simply silencing that breath (or using a noise gate or something, though I haven't had much luck with that). But, if you want to cut or reduce the pause, you'll generally want to cut the first part of the pause, rather than the second part—because the person is audibly breathing in the second part.
This isn't always the case; sometimes people breathe audibly in the first part of the pause. But it's a good general rule, so I default to cutting out the first part of the pause, and only change that approach if I then realise that it sounds weird. And it will sound very weird if you get this wrong: like the person is suddenly gasping for breath.
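The "cut the first part of the pause" rule can be sketched in a few lines. This is only an illustration of the idea, assuming the audio is a list of samples and that you've already found where the pause starts and ends by ear; `shorten_pause` and its parameters are hypothetical names, not an Audacity feature.

```python
def shorten_pause(samples, start, end, keep, rate=44100):
    """Shorten the pause samples[start:end] so only `keep` seconds remain.

    The first part of the pause is removed and the tail is kept, so an
    intake of breath just before the next sentence stays attached to it.
    """
    keep_n = int(keep * rate)       # number of samples of pause to keep
    pause_len = end - start
    if keep_n >= pause_len:
        return list(samples)        # already short enough, nothing to cut
    cut_end = end - keep_n          # everything from `start` to here is dropped
    return list(samples[:start]) + list(samples[cut_end:])
```

With a three-second pause and `keep=1.0`, the first two seconds go and the last second (breath included) stays where it is relative to the next sentence.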
Removal of dumb shit
Sometimes people say stupid things. And it's also very easy for something you say to be taken out of context. So, anything that sounds stupid in hindsight—or even just a boring digression—should be removed without mercy.
Removal previews
The preview cut shortcut is really useful, because it allows you to select a section of audio to be deleted, and hear the result of that action before you actually hit the delete button.
I use it mostly when I'm uncertain whether deleting a particular section of audio will sound okay or not. (As described above, you don't want to be too gung-ho with deleting audio, because it can be very obvious and therefore jarring to the listeners if not done with care.) On the other hand, if you are pretty sure that deleting that particular section of audio is going to sound just fine (say, if you're deleting one second of a three-second pause), then it's quicker to just delete it, and play the audio manually. (Always listen to the result of your edits!) You can always undo it if it sounds bad.
Silencing other audio sources
I prefer recording in multi-track because it allows you to silence audio sources (i.e., microphones) that aren't currently being used. Unless you have very high-end shotgun microphones, the audio from the person talking will bleed over into the other microphones. Depending on how many mics you have set up, and the acoustics of the room, this can lead to significant reverb that doesn't always sound great.
So, if I can be bothered, I'll usually silence all the audio sources that aren't currently being used. If there are two people in the recording, then when person A is talking, I'll silence person B's audio source, and vice versa.
This is also useful when there's some external audio that you want to dampen as much as possible. If your cat goes through the cat flap, that noise will probably be captured by all the mics; if only one person is talking, then muting all the other mics will help to reduce the relative volume of the cat flap.
This can take a lot of extra work for not a huge amount of benefit; it really adds up (but becomes more important) the more mics you have going simultaneously. Again, you could probably set up a noise gate to do this automatically, but I've struggled to do that effectively in my recordings. If you end up doing this, you really start to appreciate the people that avoid interjecting with "yep", "mmhmm", "true" every second or two when the other person is talking...
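For the curious, this is roughly what an automated version of that manual silencing might look like. It's a crude sketch, not a real noise gate: it assumes each track is a list of float samples, picks the loudest track in each fixed-size window by RMS, and zeroes the rest. The window size is a made-up parameter, and a usable gate would also need some hold time so it doesn't chop off word endings.

```python
import math


def rms(block):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in block) / len(block)) if block else 0.0


def silence_inactive(tracks, window):
    """For each window, keep the loudest track and zero out the others."""
    length = min(len(t) for t in tracks)
    out = [list(t[:length]) for t in tracks]
    for start in range(0, length, window):
        end = min(start + window, length)
        levels = [rms(t[start:end]) for t in tracks]
        active = levels.index(max(levels))   # the mic that's "in use"
        for i, track in enumerate(out):
            if i != active:
                # Silence rather than delete, so the tracks stay in sync.
                track[start:end] = [0.0] * (end - start)
    return out
```

Note that it silences rather than deletes, so every track keeps the same length.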
(De)amplification
The compressor (built into the mixer and/or applied later) will help to even out the amplitude of the recording, so the loud sounds are quietened and the quiet sounds are amplified. But I usually also try to reduce the volume of loud noises, and increase the volume of quiet sections, manually.
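As a rough illustration of what a compressor does to levels, here's the static curve of a generic downward compressor, working purely in dB. The threshold, ratio and make-up gain below are hypothetical values picked for the example, not Audacity's defaults, and the sketch ignores attack and release times entirely.

```python
def compress_db(level_db, threshold_db=-12.0, ratio=2.0, makeup_db=0.0):
    """Map an input level (dB) to an output level (dB).

    Levels above the threshold are pulled towards it by `ratio`;
    make-up gain then lifts everything, which is what makes the
    quiet parts come out louder than they went in.
    """
    if level_db > threshold_db:
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return level_db + makeup_db


for level in (-30, -12, -6, 0):
    print(level, "dB in ->", compress_db(level, makeup_db=3.0), "dB out")
```

The loud end gets squeezed and the make-up gain raises the floor, so the overall range of volumes shrinks.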
Loud sections typically occur when someone (or everyone) is laughing. And the more people are talking or laughing at the same time, the louder the entire recording tends to be. People could also be talking uncharacteristically loudly if they're particularly excited about something. Most people have quite a large range of volumes at which they speak, from quiet to loud. A bit of variation is fine, but I try to even out the extremes.
For particularly loud noises, select the section, and de-amplify it in 1-dB steps until it sounds more balanced. In most instances, this will be 1–3 dB. Laughter tends to be a sudden, very loud occurrence, and isn't a great experience if it's beamed directly into your brain via earbuds. So, be pretty ruthless with reducing the volume of laughter.
People also tend to finish their sentences at a lower volume, so I'll often increase those sections by 1–3 dB, give or take. Just keep in mind that if you increase the volume too much, you'll start pulling in a lot of background noise; there's only so much you can do when someone is speaking very quietly (apart from passive-aggressively shoving the microphone in their face).
Very generally speaking, I aim to have the levels peaking between −12 and −6 dB.
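If you want to check a peak level outside of the meters, the calculation is simple. This sketch assumes samples are floats between -1.0 and 1.0, with 0 dB meaning full scale, so the target range reads as -12 to -6 dB relative to full scale; the function names are my own.

```python
import math


def peak_dbfs(samples):
    """Peak level in dB relative to full scale (1.0 -> 0 dB)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def gain_to_reach(samples, target_dbfs):
    """How many dB to amplify by so the peak lands on `target_dbfs`."""
    return target_dbfs - peak_dbfs(samples)


section = [0.05, -0.1, 0.08]
print(round(peak_dbfs(section), 1))          # peak of this quiet section
print(round(gain_to_reach(section, -6), 1))  # dB of amplification needed
```

A section peaking at 0.1 is sitting at -20 dB, so it would need about 14 dB of gain to reach -6 dB, which is well beyond the 1–3 dB nudges above and a good sign that you'd mostly be amplifying background noise.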
Directorial editing
Sometimes you want to cut entire five- or ten-minute sections, and possibly replace them with some other part of the recording. I cut large sections quite frequently, because our podcast recordings are unscripted (and almost entirely unprepared), so we often go on tangents that would be dead boring for any listeners.
If there's a section that you know should never see the light of day, just delete it. But often, there'll be a five-minute digression that could actually be interesting in a different context, but doesn't make sense where it currently is. In that case, use the split hotkey to surgically remove that part of the recording, and copy and paste it into a new, blank Audacity project to be saved for later use. You can then use the join hotkey to stitch the remaining audio sections back together.
It can be tempting to reorder parts of the recording, particularly if you know that you talked about topic A, then switched to topic B, then switched back to topic A. But it is very difficult to do this in a way that won't be jarring for the listeners. So much of an unscripted podcast conversation is built upon the context of all the previous sentences put together; I usually don't bother with this sort of directorial editing, and just accept that sometimes we'll get off track and then meander our way back to the original topic.
Bleeping out words
If you want to bleep out a Bad Word, do not use a sine wave; use a triangle wave. Hit Generate, Tone, select the triangle waveform, set the frequency to 440 Hz, and play around with the amplitude to get your desired effect. This is a much nicer sound than a regular sine wave, which is pretty jarring on the ears. This advice comes from here.
Also, make sure to apply this tone over the top of the audio you want to replace. If you have three microphone audio sources, and one person says the Bad Word, silence the other two mics and apply the tone over the audio source of the naughty person.
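Here's a small sketch of what that bleep amounts to, for anyone who'd rather see it as numbers: generate a 440 Hz triangle wave and drop it over the offending samples. It assumes float samples at 44100 Hz; the function names and the default amplitude are illustrative guesses, not what Audacity's tone generator uses internally.

```python
def triangle_tone(n_samples, freq=440.0, amplitude=0.3, rate=44100):
    """Generate `n_samples` of a triangle wave that starts at zero."""
    out = []
    for i in range(n_samples):
        phase = (i * freq / rate) % 1.0                   # position in the cycle, 0..1
        tri = 1 - 4 * abs(((phase + 0.25) % 1.0) - 0.5)   # 0 -> +1 -> 0 -> -1
        out.append(amplitude * tri)
    return out


def bleep(track, start, end, amplitude=0.3, rate=44100):
    """Replace track[start:end] with a triangle tone of the same length."""
    tone = triangle_tone(end - start, amplitude=amplitude, rate=rate)
    return list(track[:start]) + tone + list(track[end:])
```

The replacement is exactly as long as the audio it covers, so the track stays in sync with the others.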
Deleting audio from a multitrack recording
When you have a multitrack recording, it's very easy to accidentally delete a section of audio from most but not all of the tracks. Fortunately, you'll quickly notice that something's wrong, because the remaining audio track will be out of sync with the pruned tracks, and you'll start getting very weird echo effects. Therefore, if you hear a weird echo effect, that's probably the reason—so try to undo recent actions until you've figured out where you went wrong.
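The underlying rule is that a cut has to remove the same range from every track. A minimal sketch, assuming each track is just a list of samples (the helper names are hypothetical):

```python
def delete_range(tracks, start, end):
    """Remove samples[start:end] from every track, not just one."""
    return [list(t[:start]) + list(t[end:]) for t in tracks]


def in_sync(tracks):
    """True if all tracks are still the same length."""
    return len({len(t) for t in tracks}) <= 1


tracks = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
edited = delete_range(tracks, 1, 3)
print(edited, in_sync(edited))
```

If the tracks end up with different lengths after an edit, that's the code equivalent of the weird echo.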
Mixing and mastering
Alright, you've edited the shit out of this podcast, and now it's time to turn it into an actual audio file. My process here is extremely rudimentary; these are the most basic of steps.
Apply a compressor to everything, using the default Audacity settings. Select all, and hit the compressor shortcut.
Mix and render to a single track. Select all, hit the mix and render shortcut.
Now we have a single mixed track. Hit the export shortcut. I am not an audio engineer, but I export as .wav, in mono (definitely avoid stereo for typical conversational podcasts), at a sample rate of 44100 Hz, with 32-bit PCM encoding. This post has some more helpful information.
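To make those export settings concrete, here's a sketch that mixes tracks down to one channel and produces a mono, 44100 Hz, 32-bit PCM WAV using Python's standard `wave` module. It assumes float samples between -1.0 and 1.0 and mixes by simple summing with hard clipping, which is my simplification and not necessarily what Audacity does internally; `mix_to_mono` and `wav_bytes` are hypothetical helper names.

```python
import io
import struct
import wave


def mix_to_mono(tracks):
    """Sum the tracks sample by sample and clip the result to -1.0..1.0."""
    length = max(len(t) for t in tracks)
    mixed = []
    for i in range(length):
        total = sum(t[i] for t in tracks if i < len(t))
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed


def wav_bytes(samples, rate=44100):
    """Encode float samples as a mono, 32-bit signed PCM WAV file."""
    full_scale = 2 ** 31 - 1
    frames = b"".join(struct.pack("<i", int(s * full_scale)) for s in samples)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(4)      # 4 bytes per sample = 32-bit
        wav.setframerate(rate)
        wav.writeframes(frames)
    return buffer.getvalue()
```

Writing the returned bytes to disk, e.g. `open("episode.wav", "wb").write(wav_bytes(mix_to_mono(tracks)))`, gives a file with the same channel count, sample rate and bit depth as the export described above.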