Introduction
Audio files are often large: if you record a meeting or interview that lasts more than an hour, you can easily hit the size limit when uploading the file to a transcription service. File size can also be a bottleneck when storing and sharing recordings in the cloud or transferring them to a smartphone. Compressing audio files is therefore often necessary.
This article shows how to use Python’s pydub library to reduce the size of audio files while keeping their quality at an acceptable level.
It also touches on the hassles and limitations of free conversion tools and explains the settings you can adjust yourself, such as sample rate and bit rate.
Situations where audio file compression is needed
Upload restrictions for transcription
- Long recordings (more than 1 hour) produce large files that can exceed a transcription service's upload limit (usually 500MB to 1GB).
Cloud storage and sharing
- You want to share files quickly without running into the capacity limits of Google Drive or email attachments.
Transfer to smartphones and tablets
- Files need to be small enough to store and play back on devices with limited storage.
Management of long recordings
- You need to store and organize large amounts of recorded material from meetings, interviews, lectures, and so on.
Inconveniences and challenges of free conversion software
Many free conversion sites, however, come with the following problems:
“Free” tools that steer you toward a paid plan
In many cases, a conversion that looks free at first turns out to have a file size limit or a cap on the number of conversions, and a payment prompt appears partway through the process.
No batch conversion: files must be uploaded manually, one at a time
Many free online tools accept only one file per upload, so compressing several audio files means repeating the upload, compress, and download cycle for each one.
To avoid these problems, this article shows how to compress audio files quickly on your own computer with a simple Python script.
Procedure for audio compression in Python
Required Preparation
First, install pydub, the required library, with the following command:
pip install pydub
pydub also requires ffmpeg, a command-line tool for audio and video processing. Install it as follows.
Mac (using Homebrew):
brew install ffmpeg
Windows:
Download a Windows build from the official ffmpeg website and add the folder containing ffmpeg.exe to your PATH.
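Before running pydub, it can help to confirm that ffmpeg is actually reachable from Python. The following is a minimal sketch that uses only the standard library; the helper name `find_ffmpeg` is hypothetical and is not part of pydub.

```python
import shutil


def find_ffmpeg(executable="ffmpeg"):
    """Return the full path to the executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg was not found on PATH; install it or fix PATH first.")
    else:
        print(f"ffmpeg found at: {path}")
```

If this prints that ffmpeg was not found, recheck the installation step above before running the conversion script.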
Example conversion script to compress audio files
The following script compresses all MP3 files in a specified directory in one batch, with an adjustable sample rate and bit rate. The paths are generic so that you can easily change them to suit your own environment.
from pydub import AudioSegment
import os

# Input and output directories
input_directory = "./input_audio"
output_directory = "./compressed_audio"
os.makedirs(output_directory, exist_ok=True)

# Collect the MP3 files in the input directory (extension match is case-insensitive)
mp3_files = [f for f in os.listdir(input_directory) if f.lower().endswith(".mp3")]

# Compress each file
for mp3_file in mp3_files:
    audio = AudioSegment.from_mp3(os.path.join(input_directory, mp3_file))
    # User-adjustable settings: sample rate and bit rate
    audio = audio.set_frame_rate(20000)  # resample to 20 kHz
    output_file = os.path.join(output_directory, mp3_file)
    audio.export(output_file, format="mp3", bitrate="36k")  # save at 36 kbps
    print(f"Compressed file saved as: {output_file}")
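To see how much space the script actually saved, you could compare file sizes between the two folders afterwards. This is a rough sketch using only the standard library; the function name `compression_report` is hypothetical, and it assumes the compressed files keep the same names as the originals, as in the script above.

```python
import os


def compression_report(input_dir, output_dir):
    """Compare sizes of files that exist under the same name in both folders.

    Returns a list of (name, original_bytes, compressed_bytes, ratio) tuples,
    where ratio is the compressed size divided by the original size.
    """
    report = []
    for name in sorted(os.listdir(output_dir)):
        original = os.path.join(input_dir, name)
        compressed = os.path.join(output_dir, name)
        if not (os.path.isfile(original) and os.path.isfile(compressed)):
            continue  # skip anything without a matching original file
        original_size = os.path.getsize(original)
        compressed_size = os.path.getsize(compressed)
        ratio = compressed_size / original_size if original_size else 0.0
        report.append((name, original_size, compressed_size, ratio))
    return report


if __name__ == "__main__":
    in_dir, out_dir = "./input_audio", "./compressed_audio"
    if os.path.isdir(in_dir) and os.path.isdir(out_dir):
        for name, before, after, ratio in compression_report(in_dir, out_dir):
            print(f"{name}: {before} -> {after} bytes ({ratio:.0%} of original)")
```

A ratio well below 100% suggests the chosen sample rate and bit rate are doing their job; a ratio near or above 100% usually means the original was already encoded at a low bit rate.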
User-adjustable values: sample rate and bit rate
Sample rate
The sample rate indicates how many times per second the audio signal is sampled (recorded). Example: 44.1 kHz means 44,100 samples are recorded per second. A higher sample rate captures higher frequencies and generally gives smoother, higher-quality audio.
- Recommended: 20 kHz, which maintains speech intelligibility.
- Adjustment example: conversation remains understandable even when lowered to 16 kHz or 12 kHz.
- Default value: 44.1 kHz (for music and high-quality audio).
Because music and high-quality audio are often provided in 44.1 kHz mono WAV format, it is effective to convert such WAV files to MP3 and then compress them further with a lower sample rate and bit rate, which reduces size while keeping acceptable quality.
For the WAV-to-MP3 step, see the article “How to convert WAV to MP3 for free: a script that can be easily realized in Python”.
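To get a feel for why WAV files are so large, the size of uncompressed audio can be estimated from the sample rate alone. The sketch below assumes 16-bit PCM samples (a common WAV setting, but an assumption here, since the article does not state the bit depth) and ignores the small file header; `pcm_size_bytes` is a hypothetical helper name.

```python
def pcm_size_bytes(sample_rate_hz, seconds, channels=1, bit_depth=16):
    """Approximate size of uncompressed PCM audio, ignoring the file header."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds


one_hour = 60 * 60
for rate in (44_100, 16_000):
    size_mb = pcm_size_bytes(rate, one_hour) / 1_000_000
    print(f"{rate} Hz mono, 16-bit, 1 hour: about {size_mb:.1f} MB")
```

Under these assumptions, one hour of mono audio takes about 317.5 MB at 44.1 kHz and about 115.2 MB at 16 kHz, before any MP3 compression is applied.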
Bit rate
The bit rate indicates the amount of data used per second of audio, which affects both sound quality and file size.
Example: 128 kbps uses 128 kilobits of data per second. A higher bit rate means higher sound quality but a larger file.
- Adjustment example: 64 kbps or 128 kbps maintains high sound quality.
- Recommended: 36-64 kbps is sufficient for recordings that are mainly conversation.
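Because the bit rate is simply data per second, the size of the compressed file can be estimated before converting anything. The following sketch assumes a constant bit rate and ignores metadata such as tags; `estimated_size_bytes` is a hypothetical helper name.

```python
def estimated_size_bytes(bitrate_kbps, seconds):
    """Estimate encoded size: kilobits per second converted to bytes over the duration."""
    return bitrate_kbps * 1000 * seconds // 8


one_hour = 60 * 60
for kbps in (36, 64, 128):
    size_mb = estimated_size_bytes(kbps, one_hour) / 1_000_000
    print(f"{kbps} kbps for 1 hour: about {size_mb:.1f} MB")
```

By this estimate, a one-hour recording comes to roughly 16.2 MB at 36 kbps, 28.8 MB at 64 kbps, and 57.6 MB at 128 kbps, all far below a 500MB upload limit.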
Interview AI can transcribe an hour-long audio file in as little as 15 seconds, after which the AI also automatically performs “filtering” and “sentence alignment”.
If you run into an upload size limit, try compressing your audio files as described in this article and then uploading them to Interview AI for automatic transcription.
Summary
Python’s pydub can be used to efficiently compress audio files, reducing their size while keeping quality at an acceptable level. You are free to set the sample rate and bit rate according to your needs, which gives you more control and flexibility than online tools.
Manage your audio data more efficiently by freeing yourself from cumbersome compression tasks!