Detect & Record Audio in Python

Posted 2023-02-16


【Title】: Detect & Record Audio in Python 【Posted】: 2010-10-27 21:23:43 【Question】:

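I need to capture audio clips as WAV files that I can then pass to another bit of Python for processing. The problem is that I need to determine when audio is present and then record it, stop when it goes silent, and then pass that file to the processing module.

I'm thinking it should be possible with the wave module to detect pure silence and discard it, start recording as soon as something other than silence is detected, and then stop recording when the line goes silent again.

I just can't quite get my head around it. Can anyone get me started with a basic example?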

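【Question comments】:

【Answer 1】:

I believe the wave module does not support recording; it only processes existing files. You might want to look at PyAudio for the actual recording. WAV is about the simplest file format in the world. With paInt16 you just get a signed 16-bit integer representing the level, and the closer to 0, the quieter. I can't remember whether WAV files are high byte first or low byte first (PCM WAV data is in fact little-endian, low byte first), but something like this ought to work (sorry, I'm not really a Python programmer):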

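from array import array

# you'll probably want to experiment with the threshold;
# it depends on how noisy the signal is
threshold = 10

# `data` is one chunk of raw bytes returned by stream.read()
as_ints = array('h', data)
# use absolute values so that loud negative samples count too
max_value = max(abs(sample) for sample in as_ints)
if max_value > threshold:
    print("not silence")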
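
The check above can be tried on synthetic data before a microphone is involved. What follows is a minimal sketch, not part of the original answer: the helper names `chunk_peak` and `is_silent` and the default threshold of 500 are assumptions for illustration, and the input is assumed to be 16-bit signed little-endian PCM.

```python
import sys
from array import array
from struct import pack

def chunk_peak(data):
    """Return the largest absolute sample value in a chunk of raw bytes.

    Assumes `data` holds 16-bit signed little-endian PCM samples.
    """
    samples = array('h', data)
    if sys.byteorder == 'big':
        # array('h') reads native byte order; swap to interpret little-endian data
        samples.byteswap()
    return max((abs(s) for s in samples), default=0)

def is_silent(data, threshold=500):
    """True when no sample in the chunk exceeds the (assumed) threshold."""
    return chunk_peak(data) <= threshold

# Synthetic chunks instead of microphone input
quiet = pack('<4h', 3, -7, 2, 0)
loud = pack('<4h', 3, -12000, 2, 0)
print(chunk_peak(quiet), is_silent(quiet))
print(chunk_peak(loud), is_silent(loud))
```

Taking the absolute value matters because a chunk whose loudest sample is negative would otherwise look quiet. A suitable threshold depends on the microphone and the room, so it is worth printing the peak of a few real chunks before settling on a number.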

PyAudio recording code, kept for reference:

import pyaudio
import sys

chunk = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
RECORD_SECONDS = 5

p = pyaudio.PyAudio()

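# input only: output=True is not needed for recording and is
# reported to cause input-overflow errors on some devices
stream = p.open(format=FORMAT,
                channels=CHANNELS, 
                rate=RATE, 
                input=True,
                frames_per_buffer=chunk)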

print("* recording")
for i in range(0, int(RATE / chunk * RECORD_SECONDS)):
    data = stream.read(chunk)
    # check for silence here by comparing the level with 0 (or some threshold) for 
    # the contents of data.
    # then write data or not to a file

print("* done")

stream.stop_stream()
stream.close()
p.terminate()
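
The "write data or not to a file" step in the loop above could look roughly like the following. This is a sketch under stated assumptions rather than the answer's own code: `write_non_silent` is a hypothetical helper, the chunks are assumed to be mono 16-bit PCM on a little-endian machine, and synthetic chunks stand in for the values returned by `stream.read(chunk)`.

```python
import os
import tempfile
import wave
from array import array

def write_non_silent(chunks, path, rate=44100, threshold=10):
    """Write only the chunks whose peak exceeds `threshold` to a mono 16-bit WAV.

    `chunks` is any iterable of raw byte strings. Returns the number of
    chunks that were written.
    """
    kept = 0
    with wave.open(path, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)        # paInt16 means 2 bytes per sample
        wf.setframerate(rate)
        for data in chunks:
            samples = array('h', data)
            if samples and max(abs(s) for s in samples) > threshold:
                wf.writeframes(data)
                kept += 1
    return kept

# Synthetic 1024-sample chunks: silence, two bursts of signal, silence
silence = array('h', [0] * 1024).tobytes()
tone = array('h', [1000, -1000] * 512).tobytes()
path = os.path.join(tempfile.gettempdir(), 'gated_demo.wav')
kept = write_non_silent([silence, tone, tone, silence], path)
print(kept, "chunks written to", path)
```

Dropping every quiet chunk also removes the short pauses inside speech, so in practice it may be preferable to keep recording until a run of consecutive silent chunks has been seen.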

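【Comments】:

Thanks Nick. Yes, I should have said that I'm also using portaudio for the capture. The bit I'm stuck on is checking for silence: how do I get the level in a chunk of data?

I've added some really simple untested code above, but it should do the job you want.

My previous version had a bug: it did not handle the sign properly. I now use the library function array() to parse the data properly.

The WAV file format is a container; it may contain audio encoded with various codecs (such as GSM or MP3), some of which are far from the "simplest in the world".

I believe the option "output=True" when opening the stream is not necessary for recording, and it seems to cause "IOError: [Errno Input overflowed] -9981" on my device. Otherwise, thanks for the code sample; it was very helpful.

【Answer 2】: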

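You might also want to look at Csound. It has several APIs, including one for Python. It might be able to interact with an A-D interface and gather sound samples.

【Comments】:

【Answer 3】:

As a follow-up to Nick Fortescue's answer, here is a more complete example of how to record from the microphone and process the resulting data: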

from sys import byteorder
from array import array
from struct import pack

import pyaudio
import wave

THRESHOLD = 500
CHUNK_SIZE = 1024
FORMAT = pyaudio.paInt16
RATE = 44100

def is_silent(snd_data):
    "Returns 'True' if below the 'silent' threshold"
    # compare absolute values so that negative peaks are not ignored
    return max(abs(i) for i in snd_data) < THRESHOLD

def normalize(snd_data):
    "Average the volume out"
    MAXIMUM = 16384
    times = float(MAXIMUM)/max(abs(i) for i in snd_data)

    r = array('h')
    for i in snd_data:
        r.append(int(i*times))
    return r

def trim(snd_data):
    "Trim the blank spots at the start and end"
    def _trim(snd_data):
        snd_started = False
        r = array('h')

        for i in snd_data:
            if not snd_started and abs(i)>THRESHOLD:
                snd_started = True
                r.append(i)

            elif snd_started:
                r.append(i)
        return r

    # Trim to the left
    snd_data = _trim(snd_data)

    # Trim to the right
    snd_data.reverse()
    snd_data = _trim(snd_data)
    snd_data.reverse()
    return snd_data

def add_silence(snd_data, seconds):
    "Add silence to the start and end of 'snd_data' of length 'seconds' (float)"
    silence = [0] * int(seconds * RATE)
    r = array('h', silence)
    r.extend(snd_data)
    r.extend(silence)
    return r

def record():
    """
    Record a word or words from the microphone and 
    return the data as an array of signed shorts.

    Normalizes the audio, trims silence from the 
    start and end, and pads with 0.5 seconds of 
    blank sound to make sure VLC et al can play 
    it without getting chopped off.
    """
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT, channels=1, rate=RATE,
        input=True,  # input only; output=True is not needed for recording
        frames_per_buffer=CHUNK_SIZE)

    num_silent = 0
    snd_started = False

    r = array('h')

    while 1:
        # little endian, signed short
        snd_data = array('h', stream.read(CHUNK_SIZE))
        if byteorder == 'big':
            snd_data.byteswap()
        r.extend(snd_data)

        silent = is_silent(snd_data)

        if silent and snd_started:
            num_silent += 1
        elif not silent and not snd_started:
            snd_started = True

        if snd_started and num_silent > 30:
            break

    sample_width = p.get_sample_size(FORMAT)
    stream.stop_stream()
    stream.close()
    p.terminate()

    r = normalize(r)
    r = trim(r)
    r = add_silence(r, 0.5)
    return sample_width, r

def record_to_file(path):
    "Records from the microphone and outputs the resulting data to 'path'"
    sample_width, data = record()
    data = pack('<' + ('h'*len(data)), *data)

    wf = wave.open(path, 'wb')
    wf.setnchannels(1)
    wf.setsampwidth(sample_width)
    wf.setframerate(RATE)
    wf.writeframes(data)
    wf.close()

if __name__ == '__main__':
    print("please speak a word into the microphone")
    record_to_file('demo.wav')
    print("done - result written to demo.wav")
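
The stop-on-silence logic inside record() is hard to exercise without a microphone. One way to experiment with it is to pull the state machine out into a function that works on any sequence of chunks. The sketch below is an assumption-laden variant, not the answer's code: `collect_until_silence` is a hypothetical name, and unlike record() above it discards leading silence and resets the silent counter whenever sound resumes.

```python
def collect_until_silence(chunks, is_silent, max_silent=30):
    """Collect chunks from the first non-silent one until more than
    `max_silent` consecutive silent chunks have followed it.

    `chunks` is any iterable; `is_silent` is a predicate on one chunk.
    """
    collected = []
    started = False
    silent_run = 0
    for chunk in chunks:
        silent = is_silent(chunk)
        if not started:
            if silent:
                continue              # still waiting for sound: discard
            started = True
        collected.append(chunk)
        # count consecutive silent chunks; any sound resets the count
        silent_run = silent_run + 1 if silent else 0
        if silent_run > max_silent:
            break
    return collected

# Tiny chunks of one sample each, standing in for 1024-sample buffers
chunks = [[0], [0], [900], [0], [800], [0], [0], [0], [999]]
quiet = lambda c: max(abs(s) for s in c) < 500
result = collect_until_silence(chunks, quiet, max_silent=2)
print(result)
```

With a real stream, `chunks` could be a generator that yields `array('h', stream.read(CHUNK_SIZE))` on each iteration, which keeps the audio I/O separate from the decision logic.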

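【Comments】:

To make this work in Python 3, simply replace xrange with range.

Great example! It was really useful when I was trying to wrap my head around how to record voice using Python. One quick question: is there a way to define the duration of the recording? Right now it records a word. Can I play with that and have a recording period of, for example, 10 seconds? Thanks!

The detection and the normalization are not correct because they are computed on bytes, not shorts. That buffer would have to be converted to a numpy array before processing.

Neither xrange nor range was really necessary in add_silence (so it's now gone). I think Arek may be on to something here: the transition from silence to "word" sounds too jerky. I think there are other answers that address that as well.

For what it's worth, it is pointed out here that this snippet might now have a problem with the silence-trimming part: ***.com/questions/64491394/… (posted by a newcomer without enough points to comment here). I have not tested it myself, so I am just relaying the information.

【Answer 4】:

There are a number of very short and clear examples on the pyaudio website: http://people.csail.mit.edu/hubert/pyaudio/

Update, 14 December 2019: the main example from the above-linked website, as of 2017: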


"""PyAudio Example: Play a WAVE file."""

import pyaudio
import wave
import sys

CHUNK = 1024

if len(sys.argv) < 2:
    print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
    sys.exit(-1)

wf = wave.open(sys.argv[1], 'rb')

p = pyaudio.PyAudio()

stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

data = wf.readframes(CHUNK)

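# readframes() returns bytes in Python 3, so test for an empty result
# rather than comparing with the str ''
while data: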
    stream.write(data)
    data = wf.readframes(CHUNK)

stream.stop_stream()
stream.close()

p.terminate()

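【Comments】:

The question is not about playing audio, but about recording plus detecting and removing silence.

【Answer 5】:

Thanks to cryo for the improved version, on which I based my tested code below: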

#Instead of adding silence at start and end of recording (values=0) I add the original audio . This makes audio sound more natural as volume is >0. See trim()
#I also fixed issue with the previous code - accumulated silence counter needs to be cleared once recording is resumed.

from array import array
from struct import pack
from sys import byteorder
import copy
import pyaudio
import wave

THRESHOLD = 500  # audio levels not normalised.
CHUNK_SIZE = 1024
SILENT_CHUNKS = 3 * 44100 / 1024  # about 3sec
FORMAT = pyaudio.paInt16
FRAME_MAX_VALUE = 2 ** 15 - 1
NORMALIZE_MINUS_ONE_dB = 10 ** (-1.0 / 20)
RATE = 44100
CHANNELS = 1
TRIM_APPEND = RATE // 4  # integer division: used as a slice offset in trim()

def is_silent(data_chunk):
    """Returns 'True' if below the 'silent' threshold"""
    # compare absolute values so that negative peaks are not ignored
    return max(abs(i) for i in data_chunk) < THRESHOLD

def normalize(data_all):
    """Amplify the volume out to max -1dB"""
    # MAXIMUM = 16384
    normalize_factor = (float(NORMALIZE_MINUS_ONE_dB * FRAME_MAX_VALUE)
                        / max(abs(i) for i in data_all))

    r = array('h')
    for i in data_all:
        r.append(int(i * normalize_factor))
    return r

def trim(data_all):
    _from = 0
    _to = len(data_all) - 1
    for i, b in enumerate(data_all):
        if abs(b) > THRESHOLD:
            _from = max(0, i - TRIM_APPEND)
            break

    for i, b in enumerate(reversed(data_all)):
        if abs(b) > THRESHOLD:
            _to = min(len(data_all) - 1, len(data_all) - 1 - i + TRIM_APPEND)
            break

    return copy.deepcopy(data_all[_from:(_to + 1)])

def record():
    """Record a word or words from the microphone and 
    return the data as an array of signed shorts."""

    p = pyaudio.PyAudio()
    # input only; output=True is not needed for recording
    stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True, frames_per_buffer=CHUNK_SIZE)

    silent_chunks = 0
    audio_started = False
    data_all = array('h')

    while True:
        # little endian, signed short
        data_chunk = array('h', stream.read(CHUNK_SIZE))
        if byteorder == 'big':
            data_chunk.byteswap()
        data_all.extend(data_chunk)

        silent = is_silent(data_chunk)

        if audio_started:
            if silent:
                silent_chunks += 1
                if silent_chunks > SILENT_CHUNKS:
                    break
            else: 
                silent_chunks = 0
        elif not silent:
            audio_started = True              

    sample_width = p.get_sample_size(FORMAT)
    stream.stop_stream()
    stream.close()
    p.terminate()

    data_all = trim(data_all)  # we trim before normalize as threshhold applies to un-normalized wave (as well as is_silent() function)
    data_all = normalize(data_all)
    return sample_width, data_all

def record_to_file(path):
    "Records from the microphone and outputs the resulting data to 'path'"
    sample_width, data = record()
    data = pack('<' + ('h' * len(data)), *data)

    wave_file = wave.open(path, 'wb')
    wave_file.setnchannels(CHANNELS)
    wave_file.setsampwidth(sample_width)
    wave_file.setframerate(RATE)
    wave_file.writeframes(data)
    wave_file.close()

if __name__ == '__main__':
    print("Wait in silence to begin recording; wait in silence to terminate")
    record_to_file('demo.wav')
    print("done - result written to demo.wav")
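
The normalization step above scales the recording so that its loudest sample sits 1 dB below full scale: a gain of -1 dB is 10 ** (-1 / 20), roughly 0.891, so the target peak is about 0.891 * 32767. The sketch below works through that arithmetic on a tiny synthetic array. It is illustrative only: `normalize_peak` is a hypothetical name, and the guard for all-zero input is an addition that the answer's normalize() does not have.

```python
from array import array

FRAME_MAX_VALUE = 2 ** 15 - 1          # 32767, the largest 16-bit sample
TARGET_GAIN = 10 ** (-1.0 / 20)        # -1 dB as a linear factor, about 0.891

def normalize_peak(samples, target=TARGET_GAIN):
    """Scale `samples` so the loudest one reaches target * FRAME_MAX_VALUE."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        # all-zero (or empty) input: nothing to scale, avoid dividing by zero
        return array('h', samples)
    factor = target * FRAME_MAX_VALUE / peak
    return array('h', (int(s * factor) for s in samples))

data = array('h', [100, -2000, 500])
out = normalize_peak(data)
print(round(TARGET_GAIN, 3), max(abs(s) for s in out))
```

Because the scale factor is derived from the peak of the whole recording, a single loud click limits how much the rest of the audio can be amplified.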

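【Comments】:

Thanks, works perfectly. In my case, I had to change return copy.deepcopy(data_all[_from:(_to + 1)]) to copy.deepcopy(data_all[int(_from):(int(_to) + 1)]).

The fix suggested by lukassliacky was needed to get this really nice solution to work; the edit should be accepted.

【Answer 6】: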

import pyaudio
import wave
from array import array

FORMAT=pyaudio.paInt16
CHANNELS=2
RATE=44100
CHUNK=1024
RECORD_SECONDS=15
FILE_NAME="RECORDING.wav"

audio=pyaudio.PyAudio() #instantiate the pyaudio

#recording prerequisites
stream=audio.open(format=FORMAT,channels=CHANNELS, 
                  rate=RATE,
                  input=True,
                  frames_per_buffer=CHUNK)

#starting recording
frames=[]

for i in range(0,int(RATE/CHUNK*RECORD_SECONDS)):
    data=stream.read(CHUNK)
    data_chunk=array('h',data)
    vol=max(data_chunk)
    if(vol>=500):
        print("something said")
        frames.append(data)
    else:
        print("nothing")
    print("\n")


#end of recording
stream.stop_stream()
stream.close()
audio.terminate()
#writing to file
wavfile=wave.open(FILE_NAME,'wb')
wavfile.setnchannels(CHANNELS)
wavfile.setsampwidth(audio.get_sample_size(FORMAT))
wavfile.setframerate(RATE)
wavfile.writeframes(b''.join(frames))#append frames recorded to file
wavfile.close()

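I think this will help. It is a simple script that checks whether there is silence. If silence is detected, the chunk is not recorded; otherwise it is.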
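
Since the script above records with CHANNELS=2, each chunk holds interleaved samples (left, right, left, right, ...), and max(data_chunk) mixes both channels and ignores negative peaks. A possible refinement, sketched here under the assumption of 16-bit interleaved PCM and with the hypothetical helper name `channel_peaks`, is to measure each channel separately:

```python
from array import array

def channel_peaks(data, channels=2):
    """Return the peak absolute sample value for each channel.

    Assumes 16-bit samples interleaved frame by frame (L, R, L, R, ...).
    """
    samples = array('h', data)
    return [max((abs(s) for s in samples[ch::channels]), default=0)
            for ch in range(channels)]

# Synthetic stereo chunk: left channel loud, right channel nearly silent
stereo = array('h', [8000, 5, -9000, -3, 7000, 4]).tobytes()
peaks = channel_peaks(stereo)
print(peaks)
```

A chunk could then be treated as silent only when every channel's peak is below the threshold, for example `all(p < 500 for p in peaks)`.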

【Comments】:
