Convert multi-channel PyAudio into NumPy array

Posted 2023-02-18

[Original title]: Convert multi-channel PyAudio into NumPy array [Posted]: 2014-05-03 09:49:13 [Question]:

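All the examples I can find are mono, with CHANNELS = 1. How do you read stereo or multi-channel input using the callback method in PyAudio and convert it into a 2D NumPy array or multiple 1D arrays?

For mono input, something like this works: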

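import numpy as np
import pyaudio

def callback(in_data, frame_count, time_info, status):
    global result
    global result_waiting

    if in_data:
        # np.fromstring is deprecated for binary data; np.frombuffer reads the bytes directly
        result = np.frombuffer(in_data, dtype=np.float32)
        result_waiting = True
    else:
        print('no input')

    return None, pyaudio.paContinue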

stream = p.open(format=pyaudio.paFloat32,
                channels=1,
                rate=fs,
                output=False,
                input=True,
                frames_per_buffer=fs,
                stream_callback=callback)

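But it does not work for stereo input: the result array is twice as long, so I assume the channels are interleaved, but I can't find any documentation for this.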

[Comments]:

I am trying to write an array and play it using PyAudio. Any ideas on that?

@SolessChong I've added functions for that to my answer below.

[Answer 1]:

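It appears to be interleaved sample by sample, with the left channel first. With a signal on the left channel input and silence on the right channel, I get: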

result = [0.2776, -0.0002,  0.2732, -0.0002,  0.2688, -0.0001,  0.2643, -0.0003,  0.2599, ...

So to separate it into a stereo stream, reshape it into a 2D array:

result = np.frombuffer(in_data, dtype=np.float32)
result = np.reshape(result, (frames_per_buffer, 2))

Now to access the left channel, use result[:, 0]; for the right channel, use result[:, 1].
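To try the reshape without an audio device, here is a minimal sketch that builds a synthetic interleaved buffer in place of the in_data bytes PyAudio would pass to the callback. The sample values and the variable names (left, right, stereo) are made up for illustration, and passing -1 to reshape lets NumPy infer the frame count rather than relying on frames_per_buffer.

```python
import numpy as np

channels = 2
frames = 4

# Hypothetical test signal: a ramp on the left channel, silence on the right.
left = np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32)
right = np.zeros(frames, dtype=np.float32)

# Build the interleaved layout [L0, R0, L1, R1, ...] by hand.
interleaved = np.empty(frames * channels, dtype=np.float32)
interleaved[0::2] = left
interleaved[1::2] = right
in_data = interleaved.tobytes()  # stand-in for the bytes from PyAudio

# Decode: flat float32 samples, then one row per frame, one column per channel.
samples = np.frombuffer(in_data, dtype=np.float32)
stereo = samples.reshape(-1, channels)

left_out = stereo[:, 0]
right_out = stereo[:, 1]
print(stereo.shape)
```

Note that np.frombuffer returns a read-only view of the bytes, so call .copy() on the result if you need to modify the samples in place.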

def decode(in_data, channels):
    """
    Convert a byte stream into a 2D numpy array with 
    shape (chunk_size, channels)

    Samples are interleaved, so for a stereo stream with left channel 
    of [L0, L1, L2, ...] and right channel of [R0, R1, R2, ...], the output 
    is ordered as [L0, R0, L1, R1, ...]
    """
    # TODO: handle data type as parameter, convert between pyaudio/numpy types
    result = np.frombuffer(in_data, dtype=np.float32)

    # Use integer division: reshape requires an integer shape in Python 3
    chunk_length, remainder = divmod(len(result), channels)
    assert remainder == 0

    result = np.reshape(result, (chunk_length, channels))
    return result


def encode(signal):
    """
    Convert a 2D numpy array into a byte stream for PyAudio

    Signal should be a numpy array with shape (chunk_size, channels)
    """
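    # Row-major flatten of a (chunk_size, channels) array gives [L0, R0, L1, R1, ...]
    interleaved = signal.flatten()

    # TODO: handle data type as parameter, convert between pyaudio/numpy types
    # tobytes() replaces the deprecated tostring()
    out_data = interleaved.astype(np.float32).tobytes()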
    return out_data
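A round trip is a quick way to check that the two directions agree. The sketch below uses its own helper names (decode_interleaved, encode_interleaved), which are hypothetical variants of the functions above with a dtype parameter added as an assumption about how the TODO might be handled. It does not need PyAudio or an audio device.

```python
import numpy as np


def decode_interleaved(in_data, channels, dtype=np.float32):
    """Bytes -> array of shape (frames, channels)."""
    samples = np.frombuffer(in_data, dtype=dtype)
    if samples.size % channels:
        raise ValueError("buffer length is not a multiple of the channel count")
    return samples.reshape(-1, channels)


def encode_interleaved(signal, dtype=np.float32):
    """Array of shape (frames, channels) -> interleaved bytes."""
    # tobytes() writes in C (row-major) order, so each frame's
    # channel samples end up next to each other in the output.
    return np.ascontiguousarray(signal, dtype=dtype).tobytes()


# Three stereo frames with easily recognizable values.
signal = np.array([[0.5, -0.5],
                   [0.25, -0.25],
                   [0.0, 1.0]], dtype=np.float32)

raw = encode_interleaved(signal)
restored = decode_interleaved(raw, channels=2)
print(np.array_equal(restored, signal))
```

For a 16-bit stream, passing dtype=np.int16 to both helpers should work the same way, assuming the stream was opened with the matching PyAudio format.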

[Comments]:

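Very helpful. Partly related to this question: using other data formats for audio encoding (e.g. np.int16).

What does interleaved mean? I played around with this, and the flatten function is indeed a solution, but flatten without arguments flattens the 2D array to one dimension with all values of the first row before all values of the second row. In the numpy documentation I found that you can pass the 'F' character as the first argument, and it then flattens the way we expect. Is it equivalent to your interleaved.astype(np.float32).tostring() call? If so, it looks like the simplest solution.

@pt12lol As it says, "Samples are interleaved, so for a stereo stream with left channel of [L0, L1, L2, ...] and right channel of [R0, R1, R2, ...], the output is ordered as [L0, R0, L1, R1, ...]"

@endolith I just tested NumPy's flatten method, and @pt12lol is correct: 'F' is required to actually interleave the 2D array. Your encode method puts all of the left channel before the right channel, e.g. [L0, L1, ..., R0, R1, ...]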
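Whether 'F' is needed seems to depend on which axis holds the channels. The following check is a small sketch with made-up integer samples and hypothetical names. With one row per frame, shape (chunk_size, channels) as the encode docstring specifies, a plain flatten() already interleaves. With one row per channel, shape (channels, chunk_size), plain flatten() puts all of the left channel first and 'F' is what interleaves.

```python
import numpy as np

L = np.array([10, 11, 12])  # stand-ins for L0, L1, L2
R = np.array([20, 21, 22])  # stand-ins for R0, R1, R2

frames_by_channels = np.column_stack([L, R])  # shape (3, 2): one row per frame
channels_by_frames = np.vstack([L, R])        # shape (2, 3): one row per channel

a = frames_by_channels.flatten()     # row-major over frames
b = channels_by_frames.flatten()     # row-major over channels
c = channels_by_frames.flatten('F')  # column-major over channels

print(a)  # [10 20 11 21 12 22] -> interleaved
print(b)  # [10 11 12 20 21 22] -> all left samples, then all right
print(c)  # [10 20 11 21 12 22] -> interleaved
```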
