Using Sox on MacOS for Audiobook Narrators
Updated: Aug 28, 2022
If you are an audiobook narrator worth your salt you'll know exactly what I mean when I namedrop 2ndOpinion. It's an awesome program written by Steven Jay Cohen and available from his website. I used it for around 18 months to check the quality of my audio files before uploading them to a distribution platform (like ACX), so that I could be confident the audio I was submitting would at least meet specifications.
It worked really well, and it absolutely improved my productivity, speed and consistency. There were some small niggles, but recent releases by Steven have fixed all of them and I cannot fault it. However, as a software engineer I like to write scripts to automate processes, and well... I just couldn't help myself.
I'd been in the software industry for 25 years before I started producing audiobooks. I'm used to creating tools in software to save time, improve consistency, and automate jobs for me. An internet search found a command line utility called SoX (Sound eXchange) and it looked like a tool I could use. It's not super straightforward to build the command line incantation, but once you have it down it's super useful.
Let's just recap the minimum audio requirements for audio files set by ACX:
192 kbps or higher MP3, constant bit rate (CBR) at 44.1 kHz.
Average loudness must be between -23dB and -18dB RMS.
No peak values can exceed -3dB.
The noise floor must be no higher than -60dB RMS.
All files must be the same channel format (all mono or all stereo files).
Each file must have 0.5 to 1 second of room tone at its beginning and 1 to 5 seconds of room tone at its end.
The example below takes an mp3 file called DeborahBalm_HerFathersLegacy.mp3 and gets the audio stats.
The key measurements in the output for audiobook producers are:
Pk lev dB is the peak level in dBFS in your file.
RMS lev dB is the RMS level in dBFS.
RMS Pk dB is the peak value for RMS - there is no explicit ACX spec for this measurement
RMS Tr dB is the trough value for RMS and is an approximation of the file's noise floor
Length s is the file running time in seconds
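A minimal sketch of the kind of stats call that produces these measurements, assuming SoX's stats effect. The function name audio_stats is hypothetical and only there so the command can be reused; the filename is the one from the example above.

```shell
# Print SoX's statistics report for one audio file.
# -n means "no output file"; the stats effect writes its report to stderr.
# audio_stats is a hypothetical helper name, not part of SoX.
audio_stats() {
  sox "$1" -n stats
}

# Usage:
# audio_stats DeborahBalm_HerFathersLegacy.mp3
```

For a stereo file the report has one column per channel plus an overall column; for a mono file there is a single column of values.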
SoX comes bundled with a utility called soxi, which gives slightly different statistics, and its output is slightly easier to decipher than that of sox itself.
It shows you the sample rate and the number of channels (mono or stereo), among other details.
SoX also has a play utility which is more useful than you might think at first glance. Want to play only the last 10 seconds of a file?
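As a rough sketch, soxi can also print single values, which is handy in scripts. This assumes soxi's -r (sample rate), -c (channel count) and -D (duration in seconds) options; file_info is a hypothetical helper name.

```shell
# Summarise one file on a single line using soxi's single-value options.
# -r = sample rate in Hz, -c = number of channels, -D = duration in seconds.
# file_info is a hypothetical helper name, not part of SoX.
file_info() {
  printf '%s Hz, %s channel(s), %s s\n' \
    "$(soxi -r "$1")" "$(soxi -c "$1")" "$(soxi -D "$1")"
}

# Usage:
# file_info DeborahBalm_HerFathersLegacy.mp3
```

Running plain soxi with just a filename prints the full summary instead.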
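One way to do that, sketched with the trim effect and a position counted back from the end of the file. The helper name play_tail and its optional second argument are assumptions for illustration.

```shell
# Play only the last N seconds of a file (N defaults to 10).
# A negative trim position is measured back from the end of the audio.
# play_tail is a hypothetical helper name, not part of SoX.
play_tail() {
  play "$1" trim "-${2:-10}"
}

# Usage:
# play_tail DeborahBalm_HerFathersLegacy.mp3      # last 10 seconds
# play_tail DeborahBalm_HerFathersLegacy.mp3 5    # last 5 seconds
```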
And if you start to combine this with shell script things get more interesting. What if you wanted a really basic fast ear check of all the files in a set? How about we play the first 2 seconds and last 10 seconds of each chapter of a book?
for file in DeborahBalm_HerFathersLegacy*.mp3; do echo "$file"; play "$file" trim 0 0:2 -0:10; done
Or maybe you want to add your head and tail roomtone to your file?
This command sandwiches my audio file between my RoomTone head and tail files to create a new file called DeborahBalm_HerFathersLegacy_RT.mp3.
sox RoomTone_0_7_sec.mp3 DeborahBalm_HerFathersLegacy.mp3 RoomTone_3_6sec.mp3 DeborahBalm_HerFathersLegacy_RT.mp3
Want SoX right away? Here are some instructions on how to install Homebrew (a Mac package manager) and then SoX. Here are also some instructions on how to install on Linux or Windows, although being a Mac user, I haven't verified them.
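On a Mac the install step can be sketched like this, assuming Homebrew itself is already installed. The wrapper function install_sox is hypothetical; the only real work it does is run brew install sox when sox is not already available.

```shell
# Install SoX with Homebrew unless a sox command is already available.
# install_sox is a hypothetical helper name; assumes Homebrew is installed.
install_sox() {
  if command -v sox >/dev/null 2>&1; then
    echo "sox is already installed"
  else
    brew install sox
  fi
}

# Usage:
# install_sox
```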
What else can I do with SoX?
It's possible to call SoX from other scripting languages such as Python. This is where it becomes even more powerful and you can build tools to fit your exact processing requirements.
There are additional Python libraries written as wrappers around SoX available on GitHub. https://pypi.org/project/sox/ is one I have used, but it does not have all the utilities I needed, so I just make OS command calls out to sox instead. Beware if you use the OS call that sox writes its statistics to STDERR, not STDOUT. Here is a great article with some more suggestions on how you can use sox to work with audio files.
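The STDERR point applies to shell scripts too. Below is a rough sketch, not a definitive checker, that captures the stats report with 2>&1 and compares three values against the ACX limits listed earlier. The function names stat_value and check_acx are hypothetical, and the parsing assumes each label starts its line and is followed by the overall value, as in SoX's usual stats layout.

```shell
# Pull one value out of a captured `sox ... -n stats` report.
# $1 = report text, $2 = label as SoX prints it, e.g. "RMS lev dB".
# Assumes the overall value is the first field after the label.
stat_value() {
  printf '%s\n' "$1" | awk -v label="$2" '
    index($0, label) == 1 { n = split(label, parts, " "); print $(n + 1); exit }'
}

# Check one file against the ACX peak, RMS and noise floor limits.
# Prints the values with PASS or FAIL; returns 0 on pass, 1 on fail,
# 2 if sox itself failed. Variables are not local, to stay POSIX-friendly.
check_acx() {
  # The stats effect reports on stderr, so redirect it into the capture.
  report=$(sox "$1" -n stats 2>&1) || return 2
  peak=$(stat_value "$report" "Pk lev dB")
  rms=$(stat_value "$report" "RMS lev dB")
  floor=$(stat_value "$report" "RMS Tr dB")   # approximate noise floor
  awk -v p="$peak" -v r="$rms" -v f="$floor" 'BEGIN {
    ok = (p <= -3) && (r >= -23 && r <= -18) && (f <= -60)
    printf "peak=%s rms=%s floor=%s %s\n", p, r, f, (ok ? "PASS" : "FAIL")
    exit !ok
  }'
}

# Usage:
# for file in DeborahBalm_HerFathersLegacy*.mp3; do check_acx "$file"; done
```

Because RMS Tr dB is only an approximation of the noise floor, a PASS here is a quick sanity check and not a guarantee.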
ACX Audio Analysis
If you submit work to ACX, they now offer basic audio analysis of uploaded files, which is a huge step forward for ensuring that you don't have your book sat waiting for QA for 2 months only to get it kicked back. For me though, I want to know that my files are all good before I waste time and bandwidth uploading.