Generate thumbnail for arbitrary audio file

Posted 2023-02-25

[Title] Generate thumbnail for arbitrary audio file [Posted] 2012-02-08 19:41:02 [Question]:

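I want to represent an audio file in an image with a maximum size of 180×180 pixels.

I want to generate this image so that it somehow gives a representation of the audio file; think of it as SoundCloud's waveform (an amplitude plot).

I'd like to know whether any of you have some insight into this. I have been searching, mostly for "audio visualization" and "audio thumbnailing", but I haven't found anything useful.

I first posted this to ux.stackexchange.com; this is my attempt to reach any programmers who have worked on this.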

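[Comments]:

Do you want to build a tool that does this, or do you want a pre-existing solution?

That's not a spectrogram, it's an amplitude plot. A spectrogram of audio is three-dimensional: usually time on the x axis, frequency on the y axis, and amplitude represented by color.

Thanks Josh Caswell; as you can see, I wasn't sure what this kind of waveform representation is called. @Koof - it doesn't matter, any ideas would help.

You're welcome. I thought the clarification might help your search.

[Answer 1]: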

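You could also break the audio into chunks and measure the RMS (a measure of loudness) of each one. Let's say you want an image that is 180 px wide.

I'll use pydub, a lightweight wrapper I wrote around the std lib wave module: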

from pydub import AudioSegment

# first I'll open the audio file
sound = AudioSegment.from_mp3("some_song.mp3")

# break the sound into 180 even chunks (or however
# many pixels wide the image should be)
chunk_length = len(sound) / 180

loudness_of_chunks = []
for i in range(180):
    start = i * chunk_length
    end = start + chunk_length

    chunk = sound[start:end]
    loudness_of_chunks.append(chunk.rms)

The for loop can be written as the following list comprehension; I just wanted it to be clear:

loudness_of_chunks = [
    sound[ i*chunk_length : (i+1)*chunk_length ].rms
    for i in range(180)]
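
For readers who want to see what the per-chunk measurement amounts to without pydub, here is a minimal sketch using only the standard library. It assumes 16-bit PCM WAV input and treats interleaved channels as one sample stream; the function names `rms` and `chunk_rms_from_wav` are made up for this illustration and are not part of pydub.

```python
import array
import math
import sys
import wave


def rms(samples):
    """Root mean square of a sequence of numeric samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def chunk_rms_from_wav(wav_file, n_chunks):
    """Split a 16-bit PCM WAV into n_chunks even pieces; return each piece's RMS.

    wav_file may be a path or a binary file-like object (assumption of this sketch).
    """
    with wave.open(wav_file, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = w.readframes(w.getnframes())

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    # Keep the chunk length as a float so rounding error does not accumulate.
    chunk_len = len(samples) / n_chunks
    return [
        rms(samples[int(i * chunk_len):int((i + 1) * chunk_len)])
        for i in range(n_chunks)
    ]
```

Calling `chunk_rms_from_wav("some_song.wav", 180)` would then give one loudness value per pixel column, playing the same role as `loudness_of_chunks` above.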

Now the only thing left to do is scale the RMS values down to a 0 - 180 range (since you want the image to be 180 px tall):

max_rms = max(loudness_of_chunks)

scaled_loudness = [ (loudness / max_rms) * 180 for loudness in loudness_of_chunks]
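
One edge case the scaling step does not cover is a completely silent file, where `max_rms` is 0 and the division fails. Below is a hedged sketch of the same scaling with that guard added. The name `scale_loudness` is invented here, and rounding to whole pixels is an assumption about how the values will be drawn.

```python
def scale_loudness(loudness_of_chunks, height=180):
    """Scale RMS values to integer bar heights in the range 0..height."""
    max_rms = float(max(loudness_of_chunks, default=0))
    if max_rms == 0:
        # Silent (or empty) input: draw nothing instead of dividing by zero.
        return [0 for _ in loudness_of_chunks]
    return [
        int(round(loudness / max_rms * height))
        for loudness in loudness_of_chunks
    ]
```

The loudest chunk always maps to the full image height, so quiet recordings still fill the thumbnail.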

I'll leave drawing the actual pixels to you; I'm not very experienced with PIL or ImageMagick :/
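
Since the drawing step is left open here, the following is a minimal dependency-free sketch of one way to do it. It fills a grid of grayscale pixels with one bottom-aligned vertical bar per column and serializes it as a binary PGM (portable graymap), a simple format that many image tools can read. This is a swapped-in technique, not PIL or ImageMagick, and the names `render_bars_pgm`, `bar_color` and `background` are hypothetical.

```python
def render_bars_pgm(bar_heights, height=180, bar_color=0, background=255):
    """Return binary PGM (P5) bytes with one bottom-aligned bar per column."""
    width = len(bar_heights)
    # Clamp every bar to the image so out-of-range values cannot break the grid.
    clamped = [min(max(int(h), 0), height) for h in bar_heights]

    rows = []
    for y in range(height):
        # Row 0 is the top of the image; a bar of height h covers the
        # bottom h rows, i.e. the rows where y >= height - h.
        rows.append(bytes(
            bar_color if y >= height - h else background
            for h in clamped
        ))

    header = b"P5\n%d %d\n255\n" % (width, height)
    return header + b"".join(rows)
```

Writing the result with `open("thumb.pgm", "wb").write(render_bars_pgm(scaled_loudness))` would produce a file that can be converted to PNG with an external tool if needed.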

[Comments]:

Convert max_rms to a float first. It helps.

[Answer 2]:

Based on Jiaaro's answer (thanks for writing pydub!), and built for web2py, here are my two cents:

import io
import os

import pydub
from PIL import Image, ImageDraw, ImageFilter


def generate_waveform():
    # web2py controller action: `request` and `response` are globals
    # that web2py makes available inside controllers
    img_width = 1170
    img_height = 140
    line_color = 180
    filename = os.path.join(request.folder, 'static', 'sounds', 'adg3.mp3')

    # first I'll open the audio file
    sound = pydub.AudioSegment.from_mp3(filename)

    # break the sound into img_width even chunks
    # (one chunk per pixel column of the image)
    chunk_length = len(sound) / img_width

    loudness_of_chunks = [
        sound[i * chunk_length:(i + 1) * chunk_length].rms
        for i in range(img_width)
    ]
    max_rms = float(max(loudness_of_chunks))
    scaled_loudness = [round(loudness * img_height / max_rms) for loudness in loudness_of_chunks]

    # now convert the scaled_loudness to an image:
    # one vertical line per chunk, rising from the bottom edge
    im = Image.new('L', (img_width, img_height), color=255)
    draw = ImageDraw.Draw(im)
    for x, rms in enumerate(scaled_loudness):
        y0 = img_height - rms
        y1 = img_height
        draw.line((x, y0, x, y1), fill=line_color, width=1)
    del draw
    im = im.filter(ImageFilter.SMOOTH).filter(ImageFilter.DETAIL)

    # PNG data is binary, so write it to an in-memory bytes buffer
    buffer = io.BytesIO()
    im.save(buffer, 'PNG')
    buffer.seek(0)
    return response.stream(buffer, filename=filename + '.png')
