Why does adding an audio stream to ffmpeg's libavcodec output container cause a crash?

Posted: 2021-06-16 13:24:00

As it stands, my project correctly uses libavcodec to decode a video, where each frame is manipulated (it doesn't matter how) and output to a new video. It's cobbled together from examples found online, and it works. The result is a perfect .mp4 of the manipulated frames, minus the audio.

My problem is that when I try to add an audio stream to the output container, I get a crash in mux.c that I can't explain. It's in static int compute_muxer_pkt_fields(AVFormatContext *s, AVStream *st, AVPacket *pkt). Where it attempts st->internal->priv_pts->val = pkt->dts;, priv_pts is nullptr.

I don't recall the version number, but this is ffmpeg built from git on Nov 4, 2020.

My MediaContainerMgr is much larger than what I have here. I'm stripping out everything to do with frame manipulation, so if I've left anything out, please let me know and I'll edit.

The code that, when added, triggers the nullptr exception is called out inline.

.h:

#ifndef _API_EXAMPLE_H
#define _API_EXAMPLE_H

#include <glad/glad.h>
#include <GLFW/glfw3.h>
#include "glm/glm.hpp"

extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/avutil.h>
#include <libavutil/opt.h>
#include <libswscale/swscale.h>
}

#include "shader_s.h"

class MediaContainerMgr {
public:
    MediaContainerMgr(const std::string& infile, const std::string& vert, const std::string& frag, 
                      const glm::vec3* extents);
    ~MediaContainerMgr();
    void render();
    bool recording() { return m_recording; }

    // Major thanks to "shi-yan" who helped make this possible:
    // https://github.com/shi-yan/videosamples/blob/master/libavmp4encoding/main.cpp
    bool init_video_output(const std::string& video_file_name, unsigned int width, unsigned int height);
    bool output_video_frame(uint8_t* buf);
    bool finalize_output();

private:
    AVFormatContext*   m_format_context;
    AVCodec*           m_video_codec;
    AVCodec*           m_audio_codec;
    AVCodecParameters* m_video_codec_parameters;
    AVCodecParameters* m_audio_codec_parameters;
    AVCodecContext*    m_codec_context;
    AVFrame*           m_frame;
    AVPacket*          m_packet;
    uint32_t           m_video_stream_index;
    uint32_t           m_audio_stream_index;
    
    void init_rendering(const glm::vec3* extents);
    int decode_packet();

    // For writing the output video:
    void free_output_assets();
    bool                   m_recording;
    AVOutputFormat*        m_output_format;
    AVFormatContext*       m_output_format_context;
    AVCodec*               m_output_video_codec;
    AVCodecContext*        m_output_video_codec_context;
    AVFrame*               m_output_video_frame;
    SwsContext*            m_output_scale_context;
    AVStream*              m_output_video_stream;
    
    AVCodec*               m_output_audio_codec;
    AVStream*              m_output_audio_stream;
    AVCodecContext*        m_output_audio_codec_context;
};

#endif

And, the hellish .cpp:

#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>

#include "media_container_manager.h"

MediaContainerMgr::MediaContainerMgr(const std::string& infile, const std::string& vert, const std::string& frag,
    const glm::vec3* extents) :
    m_video_stream_index(-1),
    m_audio_stream_index(-1),
    m_recording(false),
    m_output_format(nullptr),
    m_output_format_context(nullptr),
    m_output_video_codec(nullptr),
    m_output_video_codec_context(nullptr),
    m_output_video_frame(nullptr),
    m_output_scale_context(nullptr),
    m_output_video_stream(nullptr)
{
    // AVFormatContext holds header info from the format specified in the container:
    m_format_context = avformat_alloc_context();
    if (!m_format_context) {
        throw "ERROR could not allocate memory for Format Context";
    }

    // open the file and read its header. Codecs are not opened here.
    if (avformat_open_input(&m_format_context, infile.c_str(), NULL, NULL) != 0) {
        throw "ERROR could not open input file for reading";
    }

    printf("format %s, duration %lldus, bit_rate %lld\n", m_format_context->iformat->name, m_format_context->duration, m_format_context->bit_rate);
    //read avPackets (?) from the avFormat (?) to get stream info. This populates format_context->streams.
    if (avformat_find_stream_info(m_format_context, NULL) < 0) {
        throw "ERROR could not get stream info";
    }

    for (unsigned int i = 0; i < m_format_context->nb_streams; i++) {
        AVCodecParameters* local_codec_parameters = NULL;
        local_codec_parameters = m_format_context->streams[i]->codecpar;
        printf("AVStream->time base before open coded %d/%d\n", m_format_context->streams[i]->time_base.num, m_format_context->streams[i]->time_base.den);
        printf("AVStream->r_frame_rate before open coded %d/%d\n", m_format_context->streams[i]->r_frame_rate.num, m_format_context->streams[i]->r_frame_rate.den);
        printf("AVStream->start_time %" PRId64 "\n", m_format_context->streams[i]->start_time);
        printf("AVStream->duration %" PRId64 "\n", m_format_context->streams[i]->duration);
        printf("duration(s): %lf\n", (float)m_format_context->streams[i]->duration / m_format_context->streams[i]->time_base.den * m_format_context->streams[i]->time_base.num);
        AVCodec* local_codec = NULL;
        local_codec = avcodec_find_decoder(local_codec_parameters->codec_id);
        if (local_codec == NULL) {
            throw "ERROR unsupported codec!";
        }

        if (local_codec_parameters->codec_type == AVMEDIA_TYPE_VIDEO) {
            if (m_video_stream_index == -1) {
                m_video_stream_index = i;
                m_video_codec = local_codec;
                m_video_codec_parameters = local_codec_parameters;
            }
            m_height = local_codec_parameters->height;
            m_width = local_codec_parameters->width;
            printf("Video Codec: resolution %dx%d\n", m_width, m_height);
        }
        else if (local_codec_parameters->codec_type == AVMEDIA_TYPE_AUDIO) {
            if (m_audio_stream_index == -1) {
                m_audio_stream_index = i;
                m_audio_codec = local_codec;
                m_audio_codec_parameters = local_codec_parameters;
            }
            printf("Audio Codec: %d channels, sample rate %d\n", local_codec_parameters->channels, local_codec_parameters->sample_rate);
        }

        printf("\tCodec %s ID %d bit_rate %lld\n", local_codec->name, local_codec->id, local_codec_parameters->bit_rate);
    }

    m_codec_context = avcodec_alloc_context3(m_video_codec);
    if (!m_codec_context) {
        throw "ERROR failed to allocate memory for AVCodecContext";
    }

    if (avcodec_parameters_to_context(m_codec_context, m_video_codec_parameters) < 0) {
        throw "ERROR failed to copy codec params to codec context";
    }

    if (avcodec_open2(m_codec_context, m_video_codec, NULL) < 0) {
        throw "ERROR avcodec_open2 failed to open codec";
    }

    m_frame = av_frame_alloc();
    if (!m_frame) {
        throw "ERROR failed to allocate AVFrame memory";
    }

    m_packet = av_packet_alloc();
    if (!m_packet) {
        throw "ERROR failed to allocate AVPacket memory";
    }
}

MediaContainerMgr::~MediaContainerMgr() {
    avformat_close_input(&m_format_context);
    av_packet_free(&m_packet);
    av_frame_free(&m_frame);
    avcodec_free_context(&m_codec_context);

    glDeleteVertexArrays(1, &m_VAO);
    glDeleteBuffers(1, &m_VBO);
}


bool MediaContainerMgr::advance_frame() {
    while (true) {
        if (av_read_frame(m_format_context, m_packet) < 0) {
            // Do we actually need to unref the packet if it failed?
            av_packet_unref(m_packet);
            continue;
            //return false;
        }
        else {
            if (m_packet->stream_index == m_video_stream_index) {
                //printf("AVPacket->pts %" PRId64 "\n", m_packet->pts);
                int response = decode_packet();
                av_packet_unref(m_packet);
                if (response != 0) {
                    continue;
                    //return false;
                }
                return true;
            }
            else {
                printf("m_packet->stream_index: %d\n", m_packet->stream_index);
                printf("  m_packet->pts: %lld\n", m_packet->pts);
                printf("  m_packet->size: %d\n", m_packet->size);
                if (m_recording) {
                    int err = 0;
                    //err = avcodec_send_packet(m_output_video_codec_context, m_packet);
                    printf("  encoding error: %d\n", err);
                }
            }
        }

        // We're done with the packet (it's been unpacked to a frame), so deallocate & reset to defaults:
/*
        if (m_frame == NULL)
            return false;

        if (m_frame->data[0] == NULL || m_frame->data[1] == NULL || m_frame->data[2] == NULL) {
            printf("WARNING: null frame data");
            continue;
        }
*/
    }
}


int MediaContainerMgr::decode_packet() {
    // Supply raw packet data as input to a decoder
    // https://ffmpeg.org/doxygen/trunk/group__lavc__decoding.html#ga58bc4bf1e0ac59e27362597e467efff3
    int response = avcodec_send_packet(m_codec_context, m_packet);

    if (response < 0) {
        char buf[256];
        av_strerror(response, buf, 256);
        printf("Error while sending a packet to the decoder: %s\n", buf);
        return response;
    }

    // Return decoded output data (into a frame) from a decoder
    // https://ffmpeg.org/doxygen/trunk/group__lavc__decoding.html#ga11e6542c4e66d3028668788a1a74217c
    response = avcodec_receive_frame(m_codec_context, m_frame);
    if (response == AVERROR(EAGAIN) || response == AVERROR_EOF) {
        return response;
    } else if (response < 0) {
        char buf[256];
        av_strerror(response, buf, 256);
        printf("Error while receiving a frame from the decoder: %s\n", buf);
        return response;
    } else {
        printf(
            "Frame %d (type=%c, size=%d bytes) pts %lld key_frame %d [DTS %d]\n",
            m_codec_context->frame_number,
            av_get_picture_type_char(m_frame->pict_type),
            m_frame->pkt_size,
            m_frame->pts,
            m_frame->key_frame,
            m_frame->coded_picture_number
        );
    }
    return 0;
}


bool MediaContainerMgr::init_video_output(const std::string& video_file_name, unsigned int width, unsigned int height) {
    if (m_recording)
        return true;
    m_recording = true;

    advance_to(0L); // I've deleted the implementation. Just seeks to beginning of vid. Works fine.

    if (!(m_output_format = av_guess_format(nullptr, video_file_name.c_str(), nullptr))) {
        printf("Cannot guess output format.\n");
        return false;
    }

    int err = avformat_alloc_output_context2(&m_output_format_context, m_output_format, nullptr, video_file_name.c_str());
    if (err < 0) {
        printf("Failed to allocate output context.\n");
        return false;
    }

    //TODO(P0): Break out the video and audio inits into their own methods.
    m_output_video_codec = avcodec_find_encoder(m_output_format->video_codec);
    if (!m_output_video_codec) {
        printf("Failed to create video codec.\n");
        return false;
    }
    m_output_video_stream = avformat_new_stream(m_output_format_context, m_output_video_codec);
    if (!m_output_video_stream) {
        printf("Failed to find video format.\n");
        return false;
    }
    m_output_video_codec_context = avcodec_alloc_context3(m_output_video_codec);
    if (!m_output_video_codec_context) {
        printf("Failed to create video codec context.\n");
        return(false);
    }
    m_output_video_stream->codecpar->codec_id = m_output_format->video_codec;
    m_output_video_stream->codecpar->codec_type = AVMEDIA_TYPE_VIDEO;
    m_output_video_stream->codecpar->width = width;
    m_output_video_stream->codecpar->height = height;
    m_output_video_stream->codecpar->format = AV_PIX_FMT_YUV420P;
    // Use the same bit rate as the input stream.
    m_output_video_stream->codecpar->bit_rate = m_format_context->streams[m_video_stream_index]->codecpar->bit_rate;
    m_output_video_stream->avg_frame_rate = m_format_context->streams[m_video_stream_index]->avg_frame_rate;
    avcodec_parameters_to_context(m_output_video_codec_context, m_output_video_stream->codecpar);
    m_output_video_codec_context->time_base = m_format_context->streams[m_video_stream_index]->time_base;

    //TODO(P1): Set these to match the input stream?
    m_output_video_codec_context->max_b_frames = 2;
    m_output_video_codec_context->gop_size = 12;
    m_output_video_codec_context->framerate = m_format_context->streams[m_video_stream_index]->r_frame_rate;
    //m_output_codec_context->refcounted_frames = 0;
    if (m_output_video_stream->codecpar->codec_id == AV_CODEC_ID_H264) {
        av_opt_set(m_output_video_codec_context, "preset", "ultrafast", 0);
    } else if (m_output_video_stream->codecpar->codec_id == AV_CODEC_ID_H265) {
        av_opt_set(m_output_video_codec_context, "preset", "ultrafast", 0);
    } else {
        av_opt_set_int(m_output_video_codec_context, "lossless", 1, 0);
    }
    avcodec_parameters_from_context(m_output_video_stream->codecpar, m_output_video_codec_context);

    m_output_audio_codec = avcodec_find_encoder(m_output_format->audio_codec);
    if (!m_output_audio_codec) {
        printf("Failed to create audio codec.\n");
        return false;
    }

I've commented out everything of the audio stream init except the next line, because that's where the trouble starts. Creating this output stream causes the null deref I mentioned. If I uncomment everything below here, I still get the null deref. If I comment out this line, the deref exception goes away. (IOW, I commented out more and more code until I found that this was the trigger that caused the problem.)

I believe my mistake is somewhere in the rest of the commented-out code, and that, once fixed, will fix the nullptr and get me a working audio stream.

    m_output_audio_stream = avformat_new_stream(m_output_format_context, m_output_audio_codec);
    if (!m_output_audio_stream) {
        printf("Failed to find audio format.\n");
        return false;
    }
    /*
    m_output_audio_codec_context = avcodec_alloc_context3(m_output_audio_codec);
    if (!m_output_audio_codec_context) {
        printf("Failed to create audio codec context.\n");
        return(false);
    }
    m_output_audio_stream->codecpar->codec_id = m_output_format->audio_codec;
    m_output_audio_stream->codecpar->codec_type = AVMEDIA_TYPE_AUDIO;
    m_output_audio_stream->codecpar->format = m_format_context->streams[m_audio_stream_index]->codecpar->format;
    m_output_audio_stream->codecpar->bit_rate = m_format_context->streams[m_audio_stream_index]->codecpar->bit_rate;
    m_output_audio_stream->avg_frame_rate = m_format_context->streams[m_audio_stream_index]->avg_frame_rate;
    avcodec_parameters_to_context(m_output_audio_codec_context, m_output_audio_stream->codecpar);
    m_output_audio_codec_context->time_base = m_format_context->streams[m_audio_stream_index]->time_base;
    */

    //TODO(P2): Free assets that have been allocated.
    err = avcodec_open2(m_output_video_codec_context, m_output_video_codec, nullptr);
    if (err < 0) {
        printf("Failed to open codec.\n");
        return false;
    }

    if (!(m_output_format->flags & AVFMT_NOFILE)) {
        err = avio_open(&m_output_format_context->pb, video_file_name.c_str(), AVIO_FLAG_WRITE);
        if (err < 0) {
            printf("Failed to open output file.");
            return false;
        }
    }

    err = avformat_write_header(m_output_format_context, NULL);
    if (err < 0) {
        printf("Failed to write header.\n");
        return false;
    }

    av_dump_format(m_output_format_context, 0, video_file_name.c_str(), 1);

    return true;
}


//TODO(P2): make this a member. (Thanks to https://emvlo.wordpress.com/2016/03/10/sws_scale/)
void PrepareFlipFrameJ420(AVFrame* pFrame) {
    for (int i = 0; i < 4; i++) {
        if (i)
            pFrame->data[i] += pFrame->linesize[i] * ((pFrame->height >> 1) - 1);
        else
            pFrame->data[i] += pFrame->linesize[i] * (pFrame->height - 1);
        pFrame->linesize[i] = -pFrame->linesize[i];
    }
}

This is where we take the altered frame and write it to the output container. This works fine as long as we don't have an audio stream set up in the output container.

bool MediaContainerMgr::output_video_frame(uint8_t* buf) {
    int err;

    if (!m_output_video_frame) {
        m_output_video_frame = av_frame_alloc();
        m_output_video_frame->format = AV_PIX_FMT_YUV420P;
        m_output_video_frame->width = m_output_video_codec_context->width;
        m_output_video_frame->height = m_output_video_codec_context->height;
        err = av_frame_get_buffer(m_output_video_frame, 32);
        if (err < 0) {
            printf("Failed to allocate output frame.\n");
            return false;
        }
    }

    if (!m_output_scale_context) {
        m_output_scale_context = sws_getContext(m_output_video_codec_context->width, m_output_video_codec_context->height,
                                                AV_PIX_FMT_RGB24,
                                                m_output_video_codec_context->width, m_output_video_codec_context->height,
                                                AV_PIX_FMT_YUV420P, SWS_BICUBIC, nullptr, nullptr, nullptr);
    }

    int inLinesize[1] = { 3 * m_output_video_codec_context->width };
    sws_scale(m_output_scale_context, (const uint8_t* const*)&buf, inLinesize, 0, m_output_video_codec_context->height,
              m_output_video_frame->data, m_output_video_frame->linesize);
    PrepareFlipFrameJ420(m_output_video_frame);
    //TODO(P0): Switch m_frame to be m_input_video_frame so I don't end up using the presentation timestamp from
    //          an audio frame if I threadify the frame reading.
    m_output_video_frame->pts = m_frame->pts;
    printf("Output PTS: %d, time_base: %d/%d\n", m_output_video_frame->pts,
        m_output_video_codec_context->time_base.num, m_output_video_codec_context->time_base.den);
    err = avcodec_send_frame(m_output_video_codec_context, m_output_video_frame);
    if (err < 0) {
        printf("  ERROR sending new video frame output: ");
        switch (err) {
        case AVERROR(EAGAIN):
            printf("AVERROR(EAGAIN): %d\n", err);
            break;
        case AVERROR_EOF:
            printf("AVERROR_EOF: %d\n", err);
            break;
        case AVERROR(EINVAL):
            printf("AVERROR(EINVAL): %d\n", err);
            break;
        case AVERROR(ENOMEM):
            printf("AVERROR(ENOMEM): %d\n", err);
            break;
        }

        return false;
    }

    AVPacket pkt;
    av_init_packet(&pkt);
    pkt.data = nullptr;
    pkt.size = 0;
    pkt.flags |= AV_PKT_FLAG_KEY;
    int ret = 0;
    if ((ret = avcodec_receive_packet(m_output_video_codec_context, &pkt)) == 0) {
        static int counter = 0;
        printf("pkt.key: 0x%08x, pkt.size: %d, counter: %d\n", pkt.flags & AV_PKT_FLAG_KEY, pkt.size, counter++);
        uint8_t* size = ((uint8_t*)pkt.data);
        printf("sizes: %d %d %d %d %d %d %d %d\n", size[0], size[1], size[2], size[3], size[4], size[5], size[6], size[7]);
        av_interleaved_write_frame(m_output_format_context, &pkt);
    }
    printf("push: %d\n", ret);
    av_packet_unref(&pkt);

    return true;
}

bool MediaContainerMgr::finalize_output() {
    if (!m_recording)
        return true;

    AVPacket pkt;
    av_init_packet(&pkt);
    pkt.data = nullptr;
    pkt.size = 0;

    for (;;) {
        avcodec_send_frame(m_output_video_codec_context, nullptr);
        if (avcodec_receive_packet(m_output_video_codec_context, &pkt) == 0) {
            av_interleaved_write_frame(m_output_format_context, &pkt);
            printf("final push:\n");
        } else {
            break;
        }
    }

    av_packet_unref(&pkt);

    av_write_trailer(m_output_format_context);
    if (!(m_output_format->flags & AVFMT_NOFILE)) {
        int err = avio_close(m_output_format_context->pb);
        if (err < 0) {
            printf("Failed to close file. err: %d\n", err);
            return false;
        }
    }

    return true;
}

EDIT The call stack on the crash (which I should have included in the original question):

avformat-58.dll!compute_muxer_pkt_fields(AVFormatContext * s, AVStream * st, AVPacket * pkt) Line 630   C
avformat-58.dll!write_packet_common(AVFormatContext * s, AVStream * st, AVPacket * pkt, int interleaved) Line 1122  C
avformat-58.dll!write_packets_common(AVFormatContext * s, AVPacket * pkt, int interleaved) Line 1186    C
avformat-58.dll!av_interleaved_write_frame(AVFormatContext * s, AVPacket * pkt) Line 1241   C
CamBot.exe!MediaContainerMgr::output_video_frame(unsigned char * buf) Line 553  C++
CamBot.exe!main() Line 240  C++

If I move the call to avformat_write_header so it comes immediately before the audio stream initialization, I still crash, but in a different place. The crash happens on line 6459 of movenc.c, where we have:

/* Non-seekable output is ok if using fragmentation. If ism_lookahead
 * is enabled, we don't support non-seekable output at all. */
if (!(s->pb->seekable & AVIO_SEEKABLE_NORMAL) &&  // CRASH IS HERE
    (!(mov->flags & FF_MOV_FLAG_FRAGMENT) || mov->ism_lookahead)) {
    av_log(s, AV_LOG_ERROR, "muxer does not support non seekable output\n");
    return AVERROR(EINVAL);
}
The exception is a nullptr exception, where s->pb is NULL. The call stack is:

avformat-58.dll!mov_init(AVFormatContext * s) Line 6459 C
avformat-58.dll!init_muxer(AVFormatContext * s, AVDictionary * * options) Line 407  C
[Inline Frame] avformat-58.dll!avformat_init_output(AVFormatContext *) Line 489 C
avformat-58.dll!avformat_write_header(AVFormatContext * s, AVDictionary * * options) Line 512   C
CamBot.exe!MediaContainerMgr::init_video_output(const std::string & video_file_name, unsigned int width, unsigned int height) Line 424  C++
CamBot.exe!main() Line 183  C++

Comments:

Does moving avformat_write_header before the audio stream initialization help? Also, do you have a call stack from the crash, if possible?

@zoso If I write the header before including all the streams, won't the header be missing key information about the streams in the container? Or is the header more generic than that?

Probably obvious, but: valgrind?

Answer 1:

Note that you should always try to provide a self-contained minimal working example, so it's easier for others to help. With the actual code, a matching FFmpeg version, and an input video that triggers the segmentation fault (to be sure), the problem would be a matter of analyzing the control flow to find out why st->internal->priv_pts was not allocated. Without the complete scenario, I have to rely on assumptions that may or may not correspond to your actual code.

Based on your description, I tried to reproduce the issue by cloning https://github.com/FFmpeg/FFmpeg.git and creating a new branch from commit b52e0d95 (November 4, 2020) to approximate your FFmpeg version.

I recreated your scenario using the provided code snippets, namely:

- including the avformat_new_stream() call for the audio stream
- keeping the remaining audio initialization commented out
- including the original avformat_write_header() call site (unchanged order)

In this scenario, writing the video with an MP4 video/audio input fails in avformat_write_header():

[mp4 @ 0x2b39f40] sample rate not set 0

The call stack at the error location:

#0  0x00007ffff75253d7 in raise () from /lib64/libc.so.6
#1  0x00007ffff7526ac8 in abort () from /lib64/libc.so.6
#2  0x000000000094feca in init_muxer (s=0x2b39f40, options=0x0) at libavformat/mux.c:309
#3  0x00000000009508f4 in avformat_init_output (s=0x2b39f40, options=0x0) at libavformat/mux.c:490
#4  0x0000000000950a10 in avformat_write_header (s=0x2b39f40, options=0x0) at libavformat/mux.c:514
[...]

init_muxer() checks the sample rate in the stream parameters unconditionally:

        case AVMEDIA_TYPE_AUDIO:
            if (par->sample_rate <= 0) {
                av_log(s, AV_LOG_ERROR, "sample rate not set %d\n", par->sample_rate); abort();
                ret = AVERROR(EINVAL);
                goto fail;
            }

This condition has been in effect since at least June 18, 2014 (I didn't track it back further) and is still present. For your November 2020 version, the check must be active, and the parameters must be set accordingly.

If I uncomment the remaining audio initialization, the situation stays the same (as expected). So, to satisfy the condition, I added the missing parameter as follows:

m_output_audio_stream->codecpar->sample_rate =
  m_format_context->streams[m_audio_stream_index]->codecpar->sample_rate;

With that, the check succeeds, avformat_write_header() succeeds, and the actual video writing succeeds.

As you stated in your question, the segmentation fault is caused by st->internal->priv_pts being NULL at this location:

#0  0x00000000009516db in compute_muxer_pkt_fields (s=0x2b39f40, st=0x2b3a580, pkt=0x7fffffffe2d0) at libavformat/mux.c:632
#1  0x0000000000953128 in write_packet_common (s=0x2b39f40, st=0x2b3a580, pkt=0x7fffffffe2d0, interleaved=1) at libavformat/mux.c:1125
#2  0x0000000000953473 in write_packets_common (s=0x2b39f40, pkt=0x7fffffffe2d0, interleaved=1) at libavformat/mux.c:1188
#3  0x0000000000953634 in av_interleaved_write_frame (s=0x2b39f40, pkt=0x7fffffffe2d0) at libavformat/mux.c:1243
[...]

In the FFmpeg code base, allocation of priv_pts is handled by init_pts() for all streams referenced by the context. init_pts() has two call sites:

libavformat/mux.c:496:

    if (s->oformat->init && ret) {
        if ((ret = init_pts(s)) < 0)
            return ret;

        return AVSTREAM_INIT_IN_INIT_OUTPUT;
    }

libavformat/mux.c:530:

    if (!s->internal->streams_initialized) {
       if ((ret = init_pts(s)) < 0)
          goto fail;
    }

In both cases, the call is triggered by avformat_write_header() (the first indirectly via avformat_init_output(), the second directly). According to control flow analysis, there is no success case that leaves priv_pts unallocated.

Considering the high likelihood that our FFmpeg versions behave compatibly, I have to assume that 1) the sample rate must be provided for the audio stream and 2) priv_pts is always allocated by avformat_write_header() in the absence of errors. Therefore, two possible root causes come to mind:

1. Your stream is not an audio stream (unlikely; the type is based on the codec, which in turn is based on the output file extension - assuming mp4)
2. You do not call avformat_write_header() (unlikely), or you do not handle errors in the C++ member function's caller (the return value of avformat_write_header() is checked, but I don't have the code corresponding to the caller of the C++ member function; your actual code may differ significantly from what you provided, so this is possible, and it is the only reasonable conclusion that can be drawn from the available data)

Solution: make sure that processing does not continue if avformat_write_header() fails. By adding the audio stream, avformat_write_header() starts to fail unless you set the stream sample rate. If the error is ignored, av_interleaved_write_frame() triggers the segmentation fault by accessing the unallocated st->internal->priv_pts.

As initially mentioned, the scenario is incomplete. If you do call avformat_write_header() and stop processing in case of an error (meaning you do not call av_interleaved_write_frame()), more information is needed. As it stands, that is unlikely. For further analysis, the executable's output (stdout, stderr) would be needed to see your traces and the FFmpeg log messages. If that doesn't reveal new information, a self-contained minimal working example and the video input are needed to get the full picture.
