How to play streaming audio using pyglet?

Posted 2023-03-14

Asked: 2018-11-22 08:56:58

Question:

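The purpose of this question is to figure out how to play streaming audio using pyglet. The first step is simply to make sure you can play an mp3 file with pyglet, which is what the first snippet is for: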

import sys
import inspect
import requests

import pyglet
from pyglet.media import *

pyglet.lib.load_library('avbin')
pyglet.have_avbin = True


def url_to_filename(url):
    return url.split('/')[-1]


def download_file(url, filename=None):
    filename = filename or url_to_filename(url)

    with open(filename, "wb") as f:
        print("Downloading %s" % filename)
        response = requests.get(url, stream=True)
        total_length = response.headers.get('content-length')

        if total_length is None:
            f.write(response.content)
        else:
            dl = 0
            total_length = int(total_length)
            for data in response.iter_content(chunk_size=4096):
                dl += len(data)
                f.write(data)
                done = int(50 * dl / total_length)
                sys.stdout.write("\r[%s%s]" % ('=' * done, ' ' * (50 - done)))
                sys.stdout.flush()


url = "https://freemusicarchive.org/file/music/ccCommunity/DASK/Abiogenesis/DASK_-_08_-_Protocell.mp3"
filename = "mcve.mp3"
download_file(url, filename)

music = pyglet.media.load(filename)
music.play()
pyglet.app.run()

If you have installed the libraries with pip install pyglet requests and also have AVBin installed, you should be able to listen to the mp3 once it has been downloaded.

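Having reached this point, I would like to figure out how to play and buffer the file with pyglet + requests, in a way similar to most existing web video/audio players. That means playing the file without waiting for it to be completely downloaded.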
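To make the goal concrete, here is a minimal sketch of the buffering idea on its own, without pyglet or the network: a producer thread appends chunks to a shared buffer, and a consumer can start reading as soon as a minimum amount is available. The names StreamBuffer, feed, finish, wait_for and download_into are hypothetical, and the in-memory chunk list is only a stand-in for response.iter_content(chunk_size=4096).

```python
import threading


class StreamBuffer:
    """Hypothetical growing byte buffer shared by a downloader and a player."""

    def __init__(self):
        self._data = bytearray()
        self._finished = False
        self._cond = threading.Condition()

    def feed(self, chunk):
        # Called by the downloader for every chunk it receives.
        with self._cond:
            self._data.extend(chunk)
            self._cond.notify_all()

    def finish(self):
        # Called once when the download is complete.
        with self._cond:
            self._finished = True
            self._cond.notify_all()

    def wait_for(self, n_bytes, timeout=5.0):
        # Block until at least n_bytes are buffered or the download ended.
        with self._cond:
            self._cond.wait_for(
                lambda: len(self._data) >= n_bytes or self._finished, timeout)
            return len(self._data) >= n_bytes

    def read(self, offset, size):
        # Return whatever is currently buffered in [offset, offset + size).
        with self._cond:
            return bytes(self._data[offset:offset + size])


def download_into(buffer, chunks):
    # Stand-in for: for data in response.iter_content(chunk_size=4096): ...
    for chunk in chunks:
        buffer.feed(chunk)
    buffer.finish()


buf = StreamBuffer()
fake_chunks = [b"abcd", b"efgh", b"ijkl"]  # placeholder for network data
worker = threading.Thread(target=download_into, args=(buf, fake_chunks))
worker.start()

# Playback could start as soon as the first 8 bytes are available.
ready = buf.wait_for(8)
first_bytes = buf.read(0, 8)
worker.join()
print(ready, first_bytes)
```

This only covers the download side. Handing the buffered bytes to pyglet's decoder is the part the question is really about.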

After reading the pyglet media docs, you can see that these classes are available:

media
    sources
        base
            AudioData
            AudioFormat
            Source
            SourceGroup
            SourceInfo
            StaticSource
            StreamingSource
            VideoFormat
    player
        Player
        PlayerGroup

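I have seen other similar SO questions, but they were not properly resolved and their content does not provide many relevant details:

Play streaming audio using pyglet
How can I play audio stream without saving it into the file with pyglet?

That is why I have created a new question. Could you provide a small example using the above mcve as a base?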

Comments:

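Did you try the last link you posted in your question, and if so, does it produce the same result? What if you modify line ~698 of /usr/local/lib/python3.5/dist-packages/pyglet/media/__init__.py to see why the right operand is NoneType? :) I'm guessing we need to start there too and figure out what is going on. My best bet is that they try to queue the same source multiple times. "A source that is decoded as it is being played, and can only be queued once." (official docs). So doing player.queue(source) and player.play() only once might be the only issue here. For reference: pyglet.readthedocs.io/en/pyglet-1.2-maintenance/api/pyglet/…

@Torxed Yes, I've been playing with that snippet for a while, but I haven't figured out how all the pyglet classes fit together. That's why, after a while, I decided to open this new question, hoping somebody could shed some light. Let me try your modification.

Sorry, I can't play around with this any more right now, so I'm signing off for the moment, but I figured I'd throw in my two cents. I'm pretty sure I've done something similar, but I decided to switch to pyaudio or something else for the more advanced tasks and only control the audio I/O from pyglet. In the long run I could do more with sound modulation and other audio libraries, but pyglet does a good (if generic) job of playing audio sources.

@Torxed OK, if you eventually find some time to give this a shot, please be my guest :) . I can see in your profile that you're an expert in pyglet, so... ;) Just for the record, my goal is to consume streaming web content such as video/audio on the gpu (glsl)... so if you know a better solution in python (realtime sound analysis), just let me know... I remember that a few months ago I used pyaudio with some custom python synthesizer and it was very slow :/ ... so maybe my final solution will be some c/c++ python wrapper. Anyway, for now pyglet is fine :D

Answer 1: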

Assuming you don't want to import a new package to do this for you, it can be done with some effort.

First, let's look at the Pyglet source code and see media.load in media/__init__.py.

def load(filename, file=None, streaming=True, decoder=None):
    """Load a Source from a file.

    All decoders that are registered for the filename extension are tried.
    If none succeed, the exception from the first decoder is raised.
    You can also specifically pass a decoder to use.

    :Parameters:
        `filename` : str
            Used to guess the media format, and to load the file if `file` is
            unspecified.
        `file` : file-like object or None
            Source of media data in any supported format.
        `streaming` : bool
            If `False`, a :class:`StaticSource` will be returned; otherwise
            (default) a :class:`~pyglet.media.StreamingSource` is created.
        `decoder` : MediaDecoder or None
            A specific decoder you wish to use, rather than relying on
            automatic detection. If specified, no other decoders are tried.

    :rtype: StreamingSource or Source
    """
    if decoder:
        return decoder.decode(file, filename, streaming)
    else:
        first_exception = None
        for decoder in get_decoders(filename):
            try:
                loaded_source = decoder.decode(file, filename, streaming)
                return loaded_source
            except MediaDecodeException as e:
                if not first_exception or first_exception.exception_priority < e.exception_priority:
                    first_exception = e

        # TODO: Review this:
        # The FFmpeg codec attempts to decode anything, so this codepath won't be reached.
        if not first_exception:
            raise MediaDecodeException('No decoders are available for this media format.')
        raise first_exception


add_default_media_codecs()

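The key line here is loaded_source = decoder.decode(...). Essentially, to load audio, Pyglet takes a file and hands it to a media decoder (e.g. FFmpeg), which then returns a list of "frames" or packets that Pyglet can play with its built-in Player class. If the audio format is compressed (e.g. mp3 or aac), Pyglet uses an external library (currently only AVBin is supported) to convert it to raw, decompressed audio. You probably already know some of this.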
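The control flow of load() can be illustrated in isolation. The sketch below is not pyglet code: DecodeError, BrokenDecoder, RawDecoder and the DECODERS table are hypothetical stand-ins, and the simplified loop keeps only the first exception instead of comparing exception_priority.

```python
import os


class DecodeError(Exception):
    """Hypothetical stand-in for pyglet's MediaDecodeException."""


class BrokenDecoder:
    def decode(self, file, filename, streaming):
        raise DecodeError("BrokenDecoder cannot read %s" % filename)


class RawDecoder:
    def decode(self, file, filename, streaming):
        kind = "streaming" if streaming else "static"
        return "%s source for %s" % (kind, filename)


# Hypothetical registry: file extension -> decoders to try, in order.
DECODERS = {
    ".mp3": [BrokenDecoder(), RawDecoder()],
    ".xyz": [BrokenDecoder()],
}


def load(filename, file=None, streaming=True, decoder=None):
    if decoder:
        return decoder.decode(file, filename, streaming)
    first_exception = None
    extension = os.path.splitext(filename)[1].lower()
    for candidate in DECODERS.get(extension, []):
        try:
            return candidate.decode(file, filename, streaming)
        except DecodeError as e:
            # Remember the first failure, keep trying the other decoders.
            if first_exception is None:
                first_exception = e
    if first_exception is None:
        raise DecodeError("No decoders are available for this media format.")
    raise first_exception


print(load("mcve.mp3"))
```

The point to notice is that load() itself never touches the bytes. Everything interesting happens inside whichever decode() ends up being called.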

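So, if we want to see how to feed a stream of bytes into Pyglet's audio engine rather than a file, we need to look at one of the decoders. For this example, let's use FFmpeg, as it is the easiest to access.

In media/codecs/ffmpeg.py: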

class FFmpegDecoder(object):

    def get_file_extensions(self):
        return ['.mp3', '.ogg']

    def decode(self, file, filename, streaming):
        if streaming:
            return FFmpegSource(filename, file)
        else:
            return StaticSource(FFmpegSource(filename, file))

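The "object" it inherits from is MediaDecoder, found in media/codecs/__init__.py. Back in the load function in media/__init__.py, you'll see that pyglet chooses a MediaDecoder based on the file extension and then returns the result of its decode function, called with the file as a parameter, to get the audio in the form of a packet stream. That packet stream is a Source object; each decoder has its own flavor, in the form of a StaticSource or a StreamingSource. The former is used to store audio in memory, the latter to play it immediately. FFmpeg's decoder only supports StreamingSource.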
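The practical difference between the two flavors can be shown with plain Python objects. In this sketch, which does not use pyglet, OneShotStream and InMemoryAudio are hypothetical stand-ins for StreamingSource and StaticSource: the first hands out packets as they are consumed and is exhausted afterwards, the second drains a stream up front and can be replayed. It mirrors the shape of StaticSource(FFmpegSource(filename, file)) above.

```python
class OneShotStream:
    """Hypothetical stand-in for a streaming source: packets come out lazily."""

    def __init__(self, packets):
        self._packets = iter(packets)

    def next_packet(self):
        # Returns the next packet, or None once the stream is exhausted.
        return next(self._packets, None)


class InMemoryAudio:
    """Hypothetical stand-in for a static source: drains a stream up front."""

    def __init__(self, stream):
        chunks = []
        while True:
            packet = stream.next_packet()
            if packet is None:
                break
            chunks.append(packet)
        self.data = b"".join(chunks)

    def play(self):
        # Every call sees the complete data again.
        return self.data


stream = OneShotStream([b"aa", b"bb", b"cc"])
static = InMemoryAudio(stream)

print(static.play(), static.play())  # the in-memory copy is replayable
print(stream.next_packet())          # the stream itself is now exhausted
```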

We can see that the one for FFmpeg is FFmpegSource, also located in media/codecs/ffmpeg.py. There we find this Goliath of a class:

class FFmpegSource(StreamingSource):
# Max increase/decrease of original sample size
SAMPLE_CORRECTION_PERCENT_MAX = 10

def __init__(self, filename, file=None):
    if file is not None:
        raise NotImplementedError('Loading from file stream is not supported')

    self._file = ffmpeg_open_filename(asbytes_filename(filename))
    if not self._file:
        raise FFmpegException('Could not open "{0}"'.format(filename))

    self._video_stream = None
    self._video_stream_index = None
    self._audio_stream = None
    self._audio_stream_index = None
    self._audio_format = None

    self.img_convert_ctx = POINTER(SwsContext)()
    self.audio_convert_ctx = POINTER(SwrContext)()

    file_info = ffmpeg_file_info(self._file)

    self.info = SourceInfo()
    self.info.title = file_info.title
    self.info.author = file_info.author
    self.info.copyright = file_info.copyright
    self.info.comment = file_info.comment
    self.info.album = file_info.album
    self.info.year = file_info.year
    self.info.track = file_info.track
    self.info.genre = file_info.genre

    # Pick the first video and audio streams found, ignore others.
    for i in range(file_info.n_streams):
        info = ffmpeg_stream_info(self._file, i)

        if isinstance(info, StreamVideoInfo) and self._video_stream is None:

            stream = ffmpeg_open_stream(self._file, i)

            self.video_format = VideoFormat(
                width=info.width,
                height=info.height)
            if info.sample_aspect_num != 0:
                self.video_format.sample_aspect = (
                    float(info.sample_aspect_num) /
                    info.sample_aspect_den)
            self.video_format.frame_rate = (
                float(info.frame_rate_num) /
                info.frame_rate_den)
            self._video_stream = stream
            self._video_stream_index = i

        elif (isinstance(info, StreamAudioInfo) and
                      info.sample_bits in (8, 16) and
                      self._audio_stream is None):

            stream = ffmpeg_open_stream(self._file, i)

            self.audio_format = AudioFormat(
                channels=min(2, info.channels),
                sample_size=info.sample_bits,
                sample_rate=info.sample_rate)
            self._audio_stream = stream
            self._audio_stream_index = i

            channel_input = avutil.av_get_default_channel_layout(info.channels)
            channels_out = min(2, info.channels)
            channel_output = avutil.av_get_default_channel_layout(channels_out)

            sample_rate = stream.codec_context.contents.sample_rate
            sample_format = stream.codec_context.contents.sample_fmt
            if sample_format in (AV_SAMPLE_FMT_U8, AV_SAMPLE_FMT_U8P):
                self.tgt_format = AV_SAMPLE_FMT_U8
            elif sample_format in (AV_SAMPLE_FMT_S16, AV_SAMPLE_FMT_S16P):
                self.tgt_format = AV_SAMPLE_FMT_S16
            elif sample_format in (AV_SAMPLE_FMT_S32, AV_SAMPLE_FMT_S32P):
                self.tgt_format = AV_SAMPLE_FMT_S32
            elif sample_format in (AV_SAMPLE_FMT_FLT, AV_SAMPLE_FMT_FLTP):
                self.tgt_format = AV_SAMPLE_FMT_S16
            else:
                raise FFmpegException('Audio format not supported.')

            self.audio_convert_ctx = swresample.swr_alloc_set_opts(None,
                                                                   channel_output,
                                                                   self.tgt_format, sample_rate,
                                                                   channel_input, sample_format,
                                                                   sample_rate,
                                                                   0, None)
            if (not self.audio_convert_ctx or
                        swresample.swr_init(self.audio_convert_ctx) < 0):
                swresample.swr_free(self.audio_convert_ctx)
                raise FFmpegException('Cannot create sample rate converter.')

    self._packet = ffmpeg_init_packet()
    self._events = []  # They don't seem to be used!

    self.audioq = deque()
    # Make queue big enough to accomodate 1.2 sec?
    self._max_len_audioq = 50  # Need to figure out a correct amount
    if self.audio_format:
        # Buffer 1 sec worth of audio
        self._audio_buffer = \
            (c_uint8 * ffmpeg_get_audio_buffer_size(self.audio_format))()

    self.videoq = deque()
    self._max_len_videoq = 50  # Need to figure out a correct amount

    self.start_time = self._get_start_time()
    self._duration = timestamp_from_ffmpeg(file_info.duration)
    self._duration -= self.start_time

    # Flag to determine if the _fillq method was already scheduled
    self._fillq_scheduled = False
    self._fillq()
    # Don't understand why, but some files show that seeking without
    # reading the first few packets results in a seeking where we lose
    # many packets at the beginning. 
    # We only seek back to 0 for media which have a start_time > 0
    if self.start_time > 0:
        self.seek(0.0)
---
[A few hundred lines more...]
---

def get_next_video_timestamp(self):
    if not self.video_format:
        return

    if self.videoq:
        while True:
            # We skip video packets which are not video frames
            # This happens in mkv files for the first few frames.
            video_packet = self.videoq[0]
            if video_packet.image == 0:
                self._decode_video_packet(video_packet)
            if video_packet.image is not None:
                break
            self._get_video_packet()

        ts = video_packet.timestamp
    else:
        ts = None

    if _debug:
        print('Next video timestamp is', ts)
    return ts

def get_next_video_frame(self, skip_empty_frame=True):
    if not self.video_format:
        return

    while True:
        # We skip video packets which are not video frames
        # This happens in mkv files for the first few frames.
        video_packet = self._get_video_packet()
        if video_packet.image == 0:
            self._decode_video_packet(video_packet)
        if video_packet.image is not None or not skip_empty_frame:
            break

    if _debug:
        print('Returning', video_packet)

    return video_packet.image

def _get_start_time(self):
    def streams():
        format_context = self._file.context
        for idx in (self._video_stream_index, self._audio_stream_index):
            if idx is None:
                continue
            stream = format_context.contents.streams[idx].contents
            yield stream

    def start_times(streams):
        yield 0
        for stream in streams:
            start = stream.start_time
            if start == AV_NOPTS_VALUE:
                yield 0
            start_time = avutil.av_rescale_q(start,
                                             stream.time_base,
                                             AV_TIME_BASE_Q)
            start_time = timestamp_from_ffmpeg(start_time)
            yield start_time

    return max(start_times(streams()))

@property
def audio_format(self):
    return self._audio_format

@audio_format.setter
def audio_format(self, value):
    self._audio_format = value
    if value is None:
        self.audioq.clear()

The line you're interested in here is self._file = ffmpeg_open_filename(asbytes_filename(filename)). This brings us here, once again in media/codecs/ffmpeg.py:

def ffmpeg_open_filename(filename):
    """Open the media file.

    :rtype: FFmpegFile
    :return: The structure containing all the information for the media.
    """
    file = FFmpegFile()  # TODO: delete this structure and use directly AVFormatContext
    result = avformat.avformat_open_input(byref(file.context),
                                          filename,
                                          None,
                                          None)
    if result != 0:
        raise FFmpegException('Error opening file ' + filename.decode("utf8"))

    result = avformat.avformat_find_stream_info(file.context, None)
    if result < 0:
        raise FFmpegException('Could not find stream info')

    return file

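This is where things get messy: it calls a ctypes function (avformat_open_input) that, when given a file, grabs its details and fills in all the information our FFmpegSource class needs. With some work, you should be able to get avformat_open_input to take a bytes object, rather than a path to a file that it opens itself, and obtain the same information. I'd love to do this and include a working example, but I don't have the time right now. You would then need to make a new ffmpeg_open_filename function using the new avformat_open_input function, and then a new FFmpegSource class using the new ffmpeg_open_filename function. After that, all you need is a new FFmpegDecoder class that uses the new FFmpegSource class.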
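One possible route, offered as a sketch that has not been wired into FFmpeg or pyglet rather than as a working patch, is FFmpeg's custom I/O. avio_alloc_context accepts a read callback of the form int read_packet(void *opaque, uint8_t *buf, int buf_size), and a format context whose pb field points at that AVIO context can be passed to avformat_open_input without a filename. The block below shows only the Python side of such a callback with ctypes, fed from a file-like object. make_read_callback is a hypothetical helper, and whether pyglet's bindings expose everything needed to hook it up is an assumption I have not checked.

```python
import ctypes
import io

# AVERROR_EOF as defined by FFmpeg: -MKTAG('E', 'O', 'F', ' ')
AVERROR_EOF = -(ord("E") | ord("O") << 8 | ord("F") << 16 | ord(" ") << 24)

# C signature: int read_packet(void *opaque, uint8_t *buf, int buf_size)
READ_PACKET = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.c_void_p, ctypes.POINTER(ctypes.c_uint8), ctypes.c_int)


def make_read_callback(fileobj):
    """Hypothetical helper: wrap a file-like object in a C-callable reader."""

    def read_packet(opaque, buf, buf_size):
        data = fileobj.read(buf_size)
        if not data:
            return AVERROR_EOF
        # Copy the Python bytes into the buffer owned by the caller.
        ctypes.memmove(buf, data, len(data))
        return len(data)

    # Keep a reference to the returned object for as long as C may call it.
    return READ_PACKET(read_packet)


source = io.BytesIO(b"abcdef")  # placeholder for bytes already downloaded
callback = make_read_callback(source)

c_buffer = (ctypes.c_uint8 * 4)()
first = callback(None, c_buffer, 4)
print(first, bytes(c_buffer))
```

For real streaming, the file-like object would have to block in read() until more data has arrived, instead of returning an empty result while the download is still in progress.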

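You could then implement this by adding it directly to your pyglet package. After that, you would want to add support for a bytes object argument to the load() function (located in media/__init__.py) and override the decoder with the new one. With that in place, you would be able to stream audio without saving it.

Alternatively, you could simply use a package that already supports this. Python-vlc does. You can use the example here to play whatever audio you want from a link. If you're not doing this just for the challenge, I would strongly recommend using another package. Otherwise: good luck.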

Comments:

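I'll give you the bounty, but the only reason is that it feels like you put a lot of effort into your answer. I'm afraid this answer doesn't solve the question at all; the only thing it does is provide pyglet source code references (making for a really verbose answer) and explain roughly what could be done... Actually, I had thought about downvoting it, but because you're new to SO I'd like to motivate you to improve the quality of your answers from now on. That said, of course, thanks for your help.

I figured I might get a comment like this..... Unfortunately, you really won't get a better answer than this. No one is willing to do all of the work for you; it takes a fair amount of work to get it done. I've pointed you in the right direction, which seems to be what you need to proceed. Remember that SO isn't a place to get others to do your work, but a place to help you over hurdles you can't clear yourself. (But I really do appreciate the bounty!)

To me this is not about "doing all the work for you" at all, but about whether the question is interesting enough for you to put in the effort. I've done so myself countless times, spending a lot of time answering many interesting questions, not because "I do all the work for anyone" but because my interest in learning more about the subject was big enough. I'm pretty sure I'm not the only one who thinks this way, as I've seen people spend a lot of time and give high quality answers many times. So yeah, you can't state so categorically what SO is really about, because it depends... ;)

Don't get me wrong, if I had the time I would definitely complete the required modifications and post a working example. I really enjoy this sort of digging and hacking. I have a lot of other projects on my plate, and after about 3 hours of working through this part of the problem, I decided it wasn't feasible to complete the whole thing.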
