用ffmpeg c连接视频和音频时如何计算pts和dts

Posted

技术标签:

【中文标题】用ffmpeg c连接视频和音频时如何计算pts和dts【英文标题】:How to calculate pts and dts when concatenating video and audio with ffmpeg c 【发布时间】:2022-01-24 05:26:02 【问题描述】:
typedef struct file

   AVFormatContext *container;
   AVCodecContext **codec;
   int *frames;
 file;

int stream_clip(file *input, file *output)

    AVPacket *packet = av_packet_alloc();
    AVFrame *frame = av_frame_alloc();
    int res;

    while (1)
    
        res = decode_frame(input, frame, packet);

        if (res == 1)
        
            printf("Error decoding a frame\n");
            av_frame_free(&frame);
            av_packet_free(&packet);

            return 1;
        
        else if (res == 0)
        

            AVCodecContext *codec = output->codec[packet->stream_index];
            AVRational fps = output->codec[packet->stream_index]->framerate;
            AVRational time_base = output->container->streams[packet->stream_index]->time_base;

            /*
            if (input->container->streams[packet->stream_index]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO)
            
                fps.num = 1,
                fps.den = input->container->streams[packet->stream_index]->codecpar->sample_rate;
            
            */
           
            frame->pts = (int64_t)(av_q2d(av_div_q((AVRational)time_base.den, 1, fps)) * output->frames[packet->stream_index]);

            frame->pkt_dts = frame->pts;
            frame->pkt_duration = frame->pts;

            printf("%i FRAME %i PTS %i\n", packet->stream_index, output->frames[packet->stream_index], frame->pts);

            output->frames[packet->stream_index]++;

            res = encode_frame(output, frame, packet->stream_index);
            if (res == 1)
            
                av_frame_free(&frame);
                printf("Failde encoding frame\n");
                return 1;
            
            av_frame_unref(frame);
        

        else if (res == -1)
        
            printf("\nfile \"%s\" ended\n", input->container->url);
            break;
        
    

    av_frame_free(&frame);
    //flush decoder
    decode_frame(input, NULL, packet);

    av_packet_free(&packet);

    return 0;

https://github.com/leandromoreira/ffmpeg-libav-tutorial#chapter-1---syncing-audio-and-video

我尝试通过对视频执行 timescale / fps * frame_number 来计算 pts, 对于音频,我只是让 ffmpeg 为我完成,视频和音频播放正常,但音频和视频不同步,音频结束的速度比视频快

我也收到此错误 [mp4 @ 0xe9c7780] 流 1 的数据包中未设置时间戳。这已被弃用,将来将停止工作。修复您的代码以正确设置时间戳

[mp4 @ 0xe9c7780] 编码器没有产生正确的点,弥补了一些。

如果我计算音频pts,vlc和mpv都不能正确播放视频,mpv播放音频正确但视频不正确,vlc播放正确视频但没有音频

mpv 输出此错误: “检测到音频/视频不同步!可能的原因包括太慢 硬件、临时 CPU 峰值、损坏的驱动程序和损坏的文件。声音的 位置将与视频不匹配(请参阅 A-V 状态字段)。”

下面是每帧计算的pts,0是音频,1是视频

frame_type FRAME frame_number PTS
0 FRAME 0 PTS 0
0 FRAME 1 PTS 512
0 FRAME 2 PTS 1024
0 FRAME 3 PTS 1536
1 FRAME 0 PTS 0
0 FRAME 4 PTS 2048
1 FRAME 1 PTS 0
0 FRAME 5 PTS 2560
1 FRAME 2 PTS 0
1 FRAME 3 PTS 0
0 FRAME 6 PTS 3072
1 FRAME 4 PTS 0
1 FRAME 5 PTS 0
0 FRAME 7 PTS 3584
1 FRAME 6 PTS 0
1 FRAME 7 PTS 0
0 FRAME 8 PTS 4096
1 FRAME 8 PTS 0
1 FRAME 9 PTS 0
0 FRAME 9 PTS 4608
1 FRAME 10 PTS 0
0 FRAME 10 PTS 5120
1 FRAME 11 PTS 0
1 FRAME 12 PTS 0
0 FRAME 11 PTS 5632
1 FRAME 13 PTS 0
1 FRAME 14 PTS 0
0 FRAME 12 PTS 6144
1 FRAME 15 PTS 0
1 FRAME 16 PTS 0
0 FRAME 13 PTS 6656
1 FRAME 17 PTS 0
1 FRAME 18 PTS 0
0 FRAME 14 PTS 7168
1 FRAME 19 PTS 0
0 FRAME 15 PTS 7680
1 FRAME 20 PTS 0
1 FRAME 21 PTS 0
0 FRAME 16 PTS 8192
1 FRAME 22 PTS 0
1 FRAME 23 PTS 0
0 FRAME 17 PTS 8704
1 FRAME 24 PTS 0
1 FRAME 25 PTS 0
0 FRAME 18 PTS 9216
1 FRAME 26 PTS 0
1 FRAME 27 PTS 0
0 FRAME 19 PTS 9728
1 FRAME 28 PTS 0
0 FRAME 20 PTS 10240
1 FRAME 29 PTS 0
1 FRAME 30 PTS 0
0 FRAME 21 PTS 10752
1 FRAME 31 PTS 0
1 FRAME 32 PTS 0
0 FRAME 22 PTS 11264
1 FRAME 33 PTS 0
1 FRAME 34 PTS 0
0 FRAME 23 PTS 11776
1 FRAME 35 PTS 0
1 FRAME 36 PTS 0
0 FRAME 24 PTS 12288
1 FRAME 37 PTS 0
0 FRAME 25 PTS 12800
1 FRAME 38 PTS 0
1 FRAME 39 PTS 0
0 FRAME 26 PTS 13312
1 FRAME 40 PTS 0
1 FRAME 41 PTS 0
0 FRAME 27 PTS 13824
1 FRAME 42 PTS 0
1 FRAME 43 PTS 0

file "in_short.mp4" ended
0 FRAME 28 PTS 14336
0 FRAME 29 PTS 14848
0 FRAME 30 PTS 15360
0 FRAME 31 PTS 15872
1 FRAME 44 PTS 0
0 FRAME 32 PTS 16384
1 FRAME 45 PTS 0
0 FRAME 33 PTS 16896
1 FRAME 46 PTS 0
1 FRAME 47 PTS 0
0 FRAME 34 PTS 17408
1 FRAME 48 PTS 0
1 FRAME 49 PTS 0
0 FRAME 35 PTS 17920
1 FRAME 50 PTS 0
1 FRAME 51 PTS 0
0 FRAME 36 PTS 18432
1 FRAME 52 PTS 0
1 FRAME 53 PTS 0
0 FRAME 37 PTS 18944
1 FRAME 54 PTS 0
0 FRAME 38 PTS 19456
1 FRAME 55 PTS 0
1 FRAME 56 PTS 0
0 FRAME 39 PTS 19968
1 FRAME 57 PTS 0
1 FRAME 58 PTS 0
0 FRAME 40 PTS 20480
1 FRAME 59 PTS 0
1 FRAME 60 PTS 0
0 FRAME 41 PTS 20992
1 FRAME 61 PTS 0
1 FRAME 62 PTS 0
0 FRAME 42 PTS 21504
1 FRAME 63 PTS 0
0 FRAME 43 PTS 22016
1 FRAME 64 PTS 0
1 FRAME 65 PTS 0
0 FRAME 44 PTS 22528
1 FRAME 66 PTS 0
1 FRAME 67 PTS 0
0 FRAME 45 PTS 23040
1 FRAME 68 PTS 0
1 FRAME 69 PTS 0
0 FRAME 46 PTS 23552
1 FRAME 70 PTS 0
1 FRAME 71 PTS 0
0 FRAME 47 PTS 24064
1 FRAME 72 PTS 0
0 FRAME 48 PTS 24576
1 FRAME 73 PTS 0
1 FRAME 74 PTS 0
0 FRAME 49 PTS 25088
1 FRAME 75 PTS 0
1 FRAME 76 PTS 0
0 FRAME 50 PTS 25600
1 FRAME 77 PTS 0
1 FRAME 78 PTS 0
0 FRAME 51 PTS 26112
1 FRAME 79 PTS 0
1 FRAME 80 PTS 0
0 FRAME 52 PTS 26624
1 FRAME 81 PTS 0
0 FRAME 53 PTS 27136
1 FRAME 82 PTS 0
1 FRAME 83 PTS 0
0 FRAME 54 PTS 27648
1 FRAME 84 PTS 0
1 FRAME 85 PTS 0
0 FRAME 55 PTS 28160
1 FRAME 86 PTS 0
1 FRAME 87 PTS 0

file "in_short.mp4" ended
0 FRAME 56 PTS 28672
0 FRAME 57 PTS 29184
0 FRAME 58 PTS 29696
0 FRAME 59 PTS 30208
1 FRAME 88 PTS 0
0 FRAME 60 PTS 30720
1 FRAME 89 PTS 0
0 FRAME 61 PTS 31232
1 FRAME 90 PTS 0
1 FRAME 91 PTS 0
0 FRAME 62 PTS 31744
1 FRAME 92 PTS 0
1 FRAME 93 PTS 0
0 FRAME 63 PTS 32256
1 FRAME 94 PTS 0
1 FRAME 95 PTS 0
0 FRAME 64 PTS 32768
1 FRAME 96 PTS 0
1 FRAME 97 PTS 0
0 FRAME 65 PTS 33280
1 FRAME 98 PTS 0
0 FRAME 66 PTS 33792
1 FRAME 99 PTS 0
1 FRAME 100 PTS 0
0 FRAME 67 PTS 34304
1 FRAME 101 PTS 0
1 FRAME 102 PTS 0
0 FRAME 68 PTS 34816
1 FRAME 103 PTS 0
1 FRAME 104 PTS 0
0 FRAME 69 PTS 35328
1 FRAME 105 PTS 0
1 FRAME 106 PTS 0
0 FRAME 70 PTS 35840
1 FRAME 107 PTS 0
0 FRAME 71 PTS 36352
1 FRAME 108 PTS 0
1 FRAME 109 PTS 0
0 FRAME 72 PTS 36864
1 FRAME 110 PTS 0
1 FRAME 111 PTS 0
0 FRAME 73 PTS 37376
1 FRAME 112 PTS 0
1 FRAME 113 PTS 0
0 FRAME 74 PTS 37888
1 FRAME 114 PTS 0
1 FRAME 115 PTS 0
0 FRAME 75 PTS 38400
1 FRAME 116 PTS 0
0 FRAME 76 PTS 38912
1 FRAME 117 PTS 0
1 FRAME 118 PTS 0
0 FRAME 77 PTS 39424
1 FRAME 119 PTS 0
1 FRAME 120 PTS 0
0 FRAME 78 PTS 39936
1 FRAME 121 PTS 0
1 FRAME 122 PTS 0
0 FRAME 79 PTS 40448
1 FRAME 123 PTS 0
1 FRAME 124 PTS 0
0 FRAME 80 PTS 40960
1 FRAME 125 PTS 0
0 FRAME 81 PTS 41472
1 FRAME 126 PTS 0
1 FRAME 127 PTS 0
0 FRAME 82 PTS 41984
1 FRAME 128 PTS 0
1 FRAME 129 PTS 0
0 FRAME 83 PTS 42496
1 FRAME 130 PTS 0
1 FRAME 131 PTS 0

file "in_short.mp4" ended

下面是计算音频点数时

0 FRAME 0 PTS 0
0 FRAME 1 PTS 512
0 FRAME 2 PTS 1024
0 FRAME 3 PTS 1536
1 FRAME 0 PTS 0
0 FRAME 4 PTS 2048
1 FRAME 1 PTS 1
0 FRAME 5 PTS 2560
1 FRAME 2 PTS 2
1 FRAME 3 PTS 3
0 FRAME 6 PTS 3072
1 FRAME 4 PTS 4
1 FRAME 5 PTS 5
0 FRAME 7 PTS 3584
1 FRAME 6 PTS 6
1 FRAME 7 PTS 7
0 FRAME 8 PTS 4096
1 FRAME 8 PTS 8
1 FRAME 9 PTS 9
0 FRAME 9 PTS 4608
1 FRAME 10 PTS 10
0 FRAME 10 PTS 5120
1 FRAME 11 PTS 11
1 FRAME 12 PTS 12
0 FRAME 11 PTS 5632
1 FRAME 13 PTS 13
1 FRAME 14 PTS 14
0 FRAME 12 PTS 6144
1 FRAME 15 PTS 15
1 FRAME 16 PTS 16
0 FRAME 13 PTS 6656
1 FRAME 17 PTS 17
1 FRAME 18 PTS 18
0 FRAME 14 PTS 7168
1 FRAME 19 PTS 19
0 FRAME 15 PTS 7680
1 FRAME 20 PTS 20
1 FRAME 21 PTS 21
0 FRAME 16 PTS 8192
1 FRAME 22 PTS 22
1 FRAME 23 PTS 23
0 FRAME 17 PTS 8704
1 FRAME 24 PTS 24
1 FRAME 25 PTS 25
0 FRAME 18 PTS 9216
1 FRAME 26 PTS 26
1 FRAME 27 PTS 27
0 FRAME 19 PTS 9728
1 FRAME 28 PTS 28
0 FRAME 20 PTS 10240
1 FRAME 29 PTS 29
1 FRAME 30 PTS 30
0 FRAME 21 PTS 10752
1 FRAME 31 PTS 31
1 FRAME 32 PTS 32
0 FRAME 22 PTS 11264
1 FRAME 33 PTS 33
1 FRAME 34 PTS 34
0 FRAME 23 PTS 11776
1 FRAME 35 PTS 35
1 FRAME 36 PTS 36
0 FRAME 24 PTS 12288
1 FRAME 37 PTS 37
0 FRAME 25 PTS 12800
1 FRAME 38 PTS 38
1 FRAME 39 PTS 39
0 FRAME 26 PTS 13312
1 FRAME 40 PTS 40
1 FRAME 41 PTS 41
0 FRAME 27 PTS 13824
1 FRAME 42 PTS 42
1 FRAME 43 PTS 43

file "in_short.mp4" ended
0 FRAME 28 PTS 14336
0 FRAME 29 PTS 14848
0 FRAME 30 PTS 15360
0 FRAME 31 PTS 15872
1 FRAME 44 PTS 44
0 FRAME 32 PTS 16384
1 FRAME 45 PTS 45
0 FRAME 33 PTS 16896
1 FRAME 46 PTS 46
1 FRAME 47 PTS 47
0 FRAME 34 PTS 17408
1 FRAME 48 PTS 48
1 FRAME 49 PTS 49
0 FRAME 35 PTS 17920
1 FRAME 50 PTS 50
1 FRAME 51 PTS 51
0 FRAME 36 PTS 18432
1 FRAME 52 PTS 52
1 FRAME 53 PTS 53
0 FRAME 37 PTS 18944
1 FRAME 54 PTS 54
0 FRAME 38 PTS 19456
1 FRAME 55 PTS 55
1 FRAME 56 PTS 56
0 FRAME 39 PTS 19968
1 FRAME 57 PTS 57
1 FRAME 58 PTS 58
0 FRAME 40 PTS 20480
1 FRAME 59 PTS 59
1 FRAME 60 PTS 60
0 FRAME 41 PTS 20992
1 FRAME 61 PTS 61
1 FRAME 62 PTS 62
0 FRAME 42 PTS 21504
1 FRAME 63 PTS 63
0 FRAME 43 PTS 22016
1 FRAME 64 PTS 64
1 FRAME 65 PTS 65
0 FRAME 44 PTS 22528
1 FRAME 66 PTS 66
1 FRAME 67 PTS 67
0 FRAME 45 PTS 23040
1 FRAME 68 PTS 68
1 FRAME 69 PTS 69
0 FRAME 46 PTS 23552
1 FRAME 70 PTS 70
1 FRAME 71 PTS 71
0 FRAME 47 PTS 24064
1 FRAME 72 PTS 72
0 FRAME 48 PTS 24576
1 FRAME 73 PTS 73
1 FRAME 74 PTS 74
0 FRAME 49 PTS 25088
1 FRAME 75 PTS 75
1 FRAME 76 PTS 76
0 FRAME 50 PTS 25600
1 FRAME 77 PTS 77
1 FRAME 78 PTS 78
0 FRAME 51 PTS 26112
1 FRAME 79 PTS 79
1 FRAME 80 PTS 80
0 FRAME 52 PTS 26624
1 FRAME 81 PTS 81
0 FRAME 53 PTS 27136
1 FRAME 82 PTS 82
1 FRAME 83 PTS 83
0 FRAME 54 PTS 27648
1 FRAME 84 PTS 84
1 FRAME 85 PTS 85
0 FRAME 55 PTS 28160
1 FRAME 86 PTS 86
1 FRAME 87 PTS 87

file "in_short.mp4" ended
0 FRAME 56 PTS 28672
0 FRAME 57 PTS 29184
0 FRAME 58 PTS 29696
0 FRAME 59 PTS 30208
1 FRAME 88 PTS 88
0 FRAME 60 PTS 30720
1 FRAME 89 PTS 89
0 FRAME 61 PTS 31232
1 FRAME 90 PTS 90
1 FRAME 91 PTS 91
0 FRAME 62 PTS 31744
1 FRAME 92 PTS 92
1 FRAME 93 PTS 93
0 FRAME 63 PTS 32256
1 FRAME 94 PTS 94
1 FRAME 95 PTS 95
0 FRAME 64 PTS 32768
1 FRAME 96 PTS 96
1 FRAME 97 PTS 97
0 FRAME 65 PTS 33280
1 FRAME 98 PTS 98
0 FRAME 66 PTS 33792
1 FRAME 99 PTS 99
1 FRAME 100 PTS 100
0 FRAME 67 PTS 34304
1 FRAME 101 PTS 101
1 FRAME 102 PTS 102
0 FRAME 68 PTS 34816
1 FRAME 103 PTS 103
1 FRAME 104 PTS 104
0 FRAME 69 PTS 35328
1 FRAME 105 PTS 105
1 FRAME 106 PTS 106
0 FRAME 70 PTS 35840
1 FRAME 107 PTS 107
0 FRAME 71 PTS 36352
1 FRAME 108 PTS 108
1 FRAME 109 PTS 109
0 FRAME 72 PTS 36864
1 FRAME 110 PTS 110
1 FRAME 111 PTS 111
0 FRAME 73 PTS 37376
1 FRAME 112 PTS 112
1 FRAME 113 PTS 113
0 FRAME 74 PTS 37888
1 FRAME 114 PTS 114
1 FRAME 115 PTS 115
0 FRAME 75 PTS 38400
1 FRAME 116 PTS 116
0 FRAME 76 PTS 38912
1 FRAME 117 PTS 117
1 FRAME 118 PTS 118
0 FRAME 77 PTS 39424
1 FRAME 119 PTS 119
1 FRAME 120 PTS 120
0 FRAME 78 PTS 39936
1 FRAME 121 PTS 121
1 FRAME 122 PTS 122
0 FRAME 79 PTS 40448
1 FRAME 123 PTS 123
1 FRAME 124 PTS 124
0 FRAME 80 PTS 40960
1 FRAME 125 PTS 125
0 FRAME 81 PTS 41472
1 FRAME 126 PTS 126
1 FRAME 127 PTS 127
0 FRAME 82 PTS 41984
1 FRAME 128 PTS 128
1 FRAME 129 PTS 129
0 FRAME 83 PTS 42496
1 FRAME 130 PTS 130
1 FRAME 131 PTS 131

file "in_short.mp4" ended

我尝试过像这个例子https://github.com/FFmpeg/FFmpeg/blob/master/doc/examples/transcoding.c那样重新缩放pts数据包 第 448 行,带有 av_packet_rescale_ts() 但由于我在新视频开始时使用多个视频,所以新视频 pts 以 0 开头。

视频输出 这个就是mpv输出这个错误的那个 检测到音频/视频不同步!可能的原因包括太慢 硬件、临时 CPU 峰值、损坏的驱动程序和损坏的文件。声音的 位置将与视频不匹配(请参阅 A-V 状态字段)。 https://drive.google.com/file/d/1DlIOxJGiqUHumvuOQBISPNtHDnfuaKDE/view?usp=sharing

https://drive.google.com/file/d/15fnrZT6XZw_CkOy51PsTbKG_F2ykM09M/view?usp=sharing 这个播放正常,但音频和视频不同步,不太明显,因为我只附加了 3 个视频,我附加了同一个视频 100 次,音频在视频前几分钟结束

我正在制作的“视频编辑器”的完整代码: https://github.com/LentilStew/video_transcoder

我很不擅长编程,所以我的代码不是很好。这篇文章中给出的sn-ps来自我做的这个小转码器 https://pastebin.com/VHLREVGf (它在一个 pastebin 中,因为 *** 中有 30000 个字符的限制) 我用这个编译 gcc main.c -o compiled.out pkg-config --libs libavformat libavfilter libavutil libavcodec libswscale libavdevice libavutil

编辑 1:

现在(感谢@Rotem)我可以正确计算音频点,但由于音频流可能比视频流短,当将所有视频附加在一起时,视频和音频不同步,为了解决这个问题,我尝试用空帧,直到所有流具有相同的 pts,但出现此错误 [aac @ 0xfc00880] 输入包含(接近)NaN/+-Inf 并且输出和不调用这个函数完全一样

说明我认为的问题是什么

填充流的函数

int fill_with_empty_frames_until_all_streams_match_pts(file *input, file *output)

    int res;

    int biggest_pts = 0;
    int biggest_pts_index = -1;

    for (int i = 0; i < output->container->nb_streams; i++)
    
        int curr_pts_in_new_time_base = av_rescale_q(output->pre_pts[i], output->container->streams[i]->time_base, (AVRational).den = 60000, .num = 1);
    
        if (curr_pts_in_new_time_base > biggest_pts)
        
            biggest_pts = curr_pts_in_new_time_base;
            biggest_pts_index = i;
        
    

    for (int i = 0; i < output->container->nb_streams; i++)
    

        AVCodecContext *codec = output->codec[i];
        AVRational fps = output->codec[i]->framerate;
        AVRational time_base = output->container->streams[i]->time_base;

        int frames_per_packet = 1;

        if (output->container->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO)
        
            fps.den = 1;
            fps.num = input->container->streams[i]->codecpar->sample_rate;

            ////////////////////////////////////////////////////////////////////
            frames_per_packet = input->container->streams[i]->codecpar->frame_size; //For the audio there are 1024 (or 960) frames per packet https://***.com/questions/23216103/about-definition-for-terms-of-audio-codec
            ////////////////////////////////////////////////////////////////////
        

        AVFrame *dummy_frame = av_frame_alloc();

        switch (output->container->streams[i]->codecpar->codec_type)
        
        case AVMEDIA_TYPE_AUDIO:
            dummy_frame->nb_samples = frames_per_packet;
            dummy_frame->format = output->container->streams[i]->codecpar->format;
            dummy_frame->channel_layout = output->container->streams[i]->codecpar->channel_layout;
            break;
        case AVMEDIA_TYPE_VIDEO:
            dummy_frame->width = output->container->streams[i]->codecpar->width;
            dummy_frame->height = output->container->streams[i]->codecpar->height;
            dummy_frame->format = output->container->streams[i]->codecpar->format;
            break;
        default:
            continue;
        

        av_frame_get_buffer(dummy_frame, 0);

        while (1)
        
            int64_t pkt_duration = (int64_t)(av_q2d(av_div_q((AVRational)time_base.den, 1, fps))) * (int64_t)frames_per_packet;
            int curr_pts_in_new_time_base = av_rescale_q(pkt_duration * output->frames[i], output->container->streams[i]->time_base, (AVRational).den = 60000, .num = 1);
            printf("biggest_pts %i\n", biggest_pts);
            printf("curr_pts_in_new_time_base %i\n", curr_pts_in_new_time_base);

            printf("Adding frame\n");
            if (biggest_pts <= curr_pts_in_new_time_base)
                break;
            dummy_frame->pkt_duration = pkt_duration;
            dummy_frame->pts = (int64_t)(dummy_frame->pkt_duration * output->frames[i]);

            dummy_frame->pkt_dts = dummy_frame->pts;

            printf("%i FRAME %i PTS %i\n", (int)i, (int)output->frames[i], (int)dummy_frame->pts);

            output->frames[i]++;

            res = encode_frame(output, dummy_frame, i);
        
        av_frame_free(&dummy_frame);
    

【问题讨论】:

【参考方案1】:

主要问题是将数据包持续时间设置为 pts:frame-&gt;pkt_duration = frame-&gt;pts

所有帧的持续时间通常相同,并且 pts 是递增的。

其他问题:

每个音频包都有多个音频帧。 根据以下post,“每个数据包可以有1024(或960)帧”。 我们必须通过“每个数据包的帧数”来缩放数据包持续时间和 pts。 av_packet_unref(output_packet) 好像没到位(不知道有没有米)。 视频时基的分辨率好像太低了,我修改为1/60000(不知道算不算)。 没有为数据包设置 pts、dts 和持续时间(我不知道我们是否必须按帧和每个数据包都设置它们)。 为了以防万一,我为每个数据包添加了 pts、dts 和持续时间。

我还在学习Libav的C接口。 我建议的解决方案可能并不完美......


这里是设置pts、dts和duration的更新代码:

AVRational fps = output->codec[packet->stream_index]->framerate;
int frames_per_packet = 1;
AVRational time_base = output->container->streams[packet->stream_index]->time_base;

if (input->container->streams[packet->stream_index]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO)

    fps.den = 1;
    fps.num = input->container->streams[packet->stream_index]->codecpar->sample_rate;
    frames_per_packet = input->container->streams[packet->stream_index]->codecpar->frame_size;  //For the audio there are 1024 (or 960) frames per packet https://***.com/questions/23216103/about-definition-for-terms-of-audio-codec


frame->pkt_duration = (int64_t)(av_q2d(av_div_q((AVRational)  time_base.den, 1 , fps))) * (int64_t)frames_per_packet;
frame->pts = (int64_t)(frame->pkt_duration * output->frames[packet->stream_index]);
frame->pkt_dts = frame->pts;

完整的更新代码:

#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/opt.h>
#include <libavutil/pixdesc.h>
#include <libavutil/avutil.h>

typedef struct file

    AVFormatContext* container;
    AVCodecContext** codec;
    int* frames;
      
    //Counting packets (same principle as counting frames). 
    ////////////////////////////////////////////////////////////////////
    int* packets;
    ////////////////////////////////////////////////////////////////////
 file;

typedef struct EncoderContext

    file* encoder;

 EncoderContext;

file* create_output(int streams, const char* filename);
file* start_output_from_file(const char* path, file* input, const char* video_encoder, const char* audio_encoder);
int create_video_encoder(AVCodecContext** cod_ctx, AVFormatContext* container, const char* encoder, int width, int height,
    int pix_fmt, AVRational sample_aspect_ratio, AVRational frame_rate, int bit_rate, int buffer_size);
int create_audio_encoder(AVCodecContext** cod_ctx, AVFormatContext* container, const char* encoder,
    int channels, int sample_rate, int bit_rate);
int decode_frame(file* decoder, AVFrame* frame, AVPacket* packet);
int open_media(file* video, const char input_path[], const char* video_codec, const char* audio_codec);
void save_gray_frame(unsigned char* buf, int width, int height);
int free_file(file* f);
int encode_frame(file* encoder, AVFrame* input_frame, int index);
int stream_clip(file* input, file* output);

int main()

    int res;
    int inputs_len = 2;

    //file* input1 = malloc(sizeof(file));
    file* input1 = calloc(sizeof(file), 1);

    res = open_media(input1, "in_short.mp4", "h264_cuvid", NULL);

    file* output = start_output_from_file("output.mp4", input1, "h264_nvenc", NULL);

    if (res != 0 || !input1)
    
        printf("Failed opening input 1");
        return 1;
    

    stream_clip(input1, output);
    free_file(input1);

    for (int i = 0; i < inputs_len; i++)
    
        //file* input = malloc(sizeof(file));
        file* input = calloc(sizeof(file), 1);
        res = open_media(input, "in_short.mp4", "h264_cuvid", NULL);
        stream_clip(input, output);
        free_file(input);
    

    encode_frame(output, NULL, 0);
    encode_frame(output, NULL, 1);    

    av_write_trailer(output->container);

    free_file(output);


int stream_clip(file* input, file* output)

    AVPacket* packet = av_packet_alloc();
    AVFrame* frame = av_frame_alloc();
    int res;

    while (1)
    
        res = decode_frame(input, frame, packet);

        if (res == 1)
        
            printf("Error decoding a frame\n");
            av_frame_free(&frame);
            av_packet_free(&packet);

            return 1;
        
        else if (res == 0)
        

            AVCodecContext* codec = output->codec[packet->stream_index];
            AVRational fps = output->codec[packet->stream_index]->framerate;
            int frames_per_packet = 1;
            AVRational time_base = output->container->streams[packet->stream_index]->time_base;


            if (input->container->streams[packet->stream_index]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO)
            
                fps.den = 1;
                fps.num = input->container->streams[packet->stream_index]->codecpar->sample_rate;

                ////////////////////////////////////////////////////////////////////
                frames_per_packet = input->container->streams[packet->stream_index]->codecpar->frame_size;  //For the audio there are 1024 (or 960) frames per packet https://***.com/questions/23216103/about-definition-for-terms-of-audio-codec
                ////////////////////////////////////////////////////////////////////
            

            //Why pkt_duration = pts???
            ////////////////////////////////////////////////////////////////////
            //frame->pkt_duration = frame->pts;
            frame->pkt_duration = (int64_t)(av_q2d(av_div_q((AVRational)  time_base.den, 1 , fps))) * (int64_t)frames_per_packet;
            ////////////////////////////////////////////////////////////////////

            frame->pts = (int64_t)(frame->pkt_duration * output->frames[packet->stream_index]);

            frame->pkt_dts = frame->pts;

            printf("%i FRAME %i PTS %i\n", (int)packet->stream_index, (int)output->frames[packet->stream_index], (int)frame->pts);

            output->frames[packet->stream_index]++;

            res = encode_frame(output, frame, packet->stream_index);
            if (res == 1)
            
                av_frame_free(&frame);
                printf("Failde encoding frame\n");
                return 1;
            
            av_frame_unref(frame);
        

        else if (res == -1)
        
            printf("\nfile \"%s\" ended\n", input->container->url);
            break;
        
    

    av_frame_free(&frame);

    decode_frame(input, NULL, packet);

    av_packet_free(&packet);

    return 0;


int encode_frame(file* encoder, AVFrame* input_frame, int index)


    AVPacket* output_packet = av_packet_alloc();
    if (!output_packet)
    
        printf("ENCODER: Failed mallocing output_package");
        return 1;
    

    AVCodecContext* codec = encoder->codec[index];

    if (!codec)
        return 0;

    int response = avcodec_send_frame(codec, input_frame);

    while (response >= 0)
    
        //The packet unref is supposed to be here
        ////////////////////////////////////////////////////////////////////////
        av_packet_unref(output_packet);
        ////////////////////////////////////////////////////////////////////////

        response = avcodec_receive_packet(codec, output_packet);

        if (response == AVERROR(EAGAIN) || response == AVERROR_EOF)
        
            break;
        
        else if (response < 0)
        
            printf("ENCODER: Error receiving packet");

            return 1;
        

        output_packet->stream_index = index;

        //I think we have to set PTS, DTS and duration for each packet.
        ////////////////////////////////////////////////////////////////////////
        //output_packet->pts = input_frame->pts;
        //output_packet->dts = input_frame->pkt_dts;
        //output_packet->duration = input_frame->pkt_duration;
        AVRational fps = codec->framerate;
        int frames_per_packet = 1;

        if (encoder->container->streams[index]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO)
        
            fps.den = 1;
            fps.num = encoder->container->streams[index]->codecpar->sample_rate;

            ////////////////////////////////////////////////////////////////////
            frames_per_packet = encoder->container->streams[index]->codecpar->frame_size;  //For the audio there are 1024 (or 960) frames per packet https://***.com/questions/23216103/about-definition-for-terms-of-audio-codec
            ////////////////////////////////////////////////////////////////////
        

        AVRational time_base = encoder->container->streams[index]->time_base;

        output_packet->duration = (int64_t)(av_q2d(av_div_q((AVRational)  time_base.den, 1 , fps))) * (int64_t)frames_per_packet;
        output_packet->pts = (int64_t)(output_packet->duration * encoder->packets[index]);
        output_packet->dts = output_packet->pts;

        encoder->packets[index]++;  //Count packets
        ////////////////////////////////////////////////////////////////////////

        response = av_interleaved_write_frame(encoder->container, output_packet);

        if (response != 0)
        
            printf("ENCODER:failed writing frame");

            return 1;
        
    
    //av_packet_unref(output_packet);
    av_packet_free(&output_packet);

    return 0;


int free_file(file* f)

    int i;
    for (i = 0; i < (int)f->container->nb_streams; i++)
    
        if (f->codec[i] == NULL)
        
            continue;
        
        avcodec_free_context(&f->codec[i]);
    

    //av_free - Free a memory block which has been allocated with a function of av_malloc(), but f->codec is not allocated with av_malloc()???
    //av_free(f->codec);

    avformat_close_input(&f->container);

    ////////////////////////////////////////////////////////////////////////
    if (f->frames != NULL)
    
        free(f->frames);
    

    if (f->packets != NULL)
    
        free(f->packets);
    
    ////////////////////////////////////////////////////////////////////////

    free(f);

    return 0;


int open_media(file* video, const char input_path[], const char* video_codec, const char* audio_codec)

    video->container = avformat_alloc_context();

    if (!video->container)
    
        printf("Failed to alloc memory to the container of the input file");
        return 1;
    
    if (avformat_open_input(&video->container, input_path, NULL, NULL) != 0)
    
        printf("Failed to open input file");
        return 1;
    
    if (avformat_find_stream_info(video->container, NULL) < 0)
    
        printf("Failed to open read stream info");
        return 1;
    

    video->codec = calloc(video->container->nb_streams, sizeof(AVCodecContext*));

    for (unsigned int i = 0; i < video->container->nb_streams; i++)
    
        const char* curr_codec = NULL;

        AVStream* stream = video->container->streams[i];
        const AVCodec* dec;
        AVCodecContext* codec_ctx;

        if (AVMEDIA_TYPE_VIDEO == stream->codecpar->codec_type)
        
            curr_codec = video_codec;
        
        else if (AVMEDIA_TYPE_AUDIO == stream->codecpar->codec_type)
        
            curr_codec = audio_codec;
        

        if (curr_codec == NULL)
            dec = avcodec_find_decoder(stream->codecpar->codec_id);
        else
            dec = avcodec_find_decoder_by_name(video_codec);

        if (!dec)
        
            printf("failed to find the codec");
            return 1;
        

        codec_ctx = avcodec_alloc_context3(dec);
        if (!codec_ctx)
        
            printf("failed to alloc memory for codec context");
            return 1;
        

        if (avcodec_parameters_to_context(codec_ctx, stream->codecpar) < 0)
        
            printf("failed to fill codec context");
            return 1;
        

        if (avcodec_open2(codec_ctx, dec, NULL) < 0)
        
            printf("failed to open codec");
            return 1;
        

        video->codec[i] = codec_ctx;
    
    return 0;


/*
    returns:
    1 if error
    0 if success
    -1 if file ended
*/
int decode_frame(file* decoder, AVFrame* frame, AVPacket* packet)

    AVCodecContext* dec;

    while (1)
    
        av_packet_unref(packet);
        if (av_read_frame(decoder->container, packet) < 0)
            break;

        int index = packet->stream_index;

        dec = decoder->codec[index];

        int response = avcodec_send_packet(dec, packet);

        if (response < 0)
        
            printf("Error while sending packet to decoder");
            return 1;
        

        while (response >= 0)
        
            response = avcodec_receive_frame(dec, frame);
            if (response == AVERROR(EAGAIN) || response == AVERROR_EOF)
            
                break;
            
            else if (response < 0)
            
                printf("Error while receiving frame from decoder");
                return 1;
            
            if (response >= 0)
            
                return 0;
            
            av_frame_unref(frame);
        
    
    return -1;

int create_audio_encoder(AVCodecContext** cod_ctx, AVFormatContext* container, const char* encoder,
    int channels, int sample_rate, int bit_rate)

    AVStream* stream = avformat_new_stream(container, NULL);
    if (!stream)
    
        printf("CREATE AUDIO ENCODER: Failed allocating memory for stream");
        return 1;
    
    const AVCodec* enc = avcodec_find_encoder_by_name(encoder);
    if (!enc)
    
        printf("CREATE AUDIO ENCODER: Failed searching encoder");

        return 1;
    

    cod_ctx[0] = avcodec_alloc_context3(enc);

    if (!cod_ctx[0])
    
        printf("CREATE AUDIO ENCODER: Failed allocation codec context");
        return 1;
    

    cod_ctx[0]->channels = channels;
    cod_ctx[0]->channel_layout = av_get_default_channel_layout(channels);
    cod_ctx[0]->sample_rate = sample_rate;
    cod_ctx[0]->sample_fmt = *enc->sample_fmts;
    cod_ctx[0]->bit_rate = bit_rate;
    cod_ctx[0]->time_base = (AVRational) 1, sample_rate ; // 1/48000

    int res = 0;

    res = avcodec_open2(cod_ctx[0], enc, NULL);
    if (res < 0)
    
        printf("CREATE AUDIO ENCODER: couldn't open codec");
        return 1;
    

    res = avcodec_parameters_from_context(stream->codecpar, cod_ctx[0]);

    if (res < 0)
    
        printf("CREATE AUDIO ENCODER: failed setting codec parameters from context");
        return 1;
    

    return 0;


int create_video_encoder(AVCodecContext** cod_ctx, AVFormatContext* container, const char* encoder, int width, int height,
    int pix_fmt, AVRational sample_aspect_ratio, AVRational frame_rate, int bit_rate, int buffer_size)

    AVStream* stream = avformat_new_stream(container, NULL);
    if (!stream)
    
        printf("CREATE VIDEO ENCODER: Failed allocating memory for stream");
        return 1;
    
    const AVCodec* enc = avcodec_find_encoder_by_name(encoder);
    if (!enc)
    
        printf("CREATE VIDEO ENCODER: Failed searching encoder");

        return 1;
    

    cod_ctx[0] = avcodec_alloc_context3(enc);

    if (!cod_ctx[0])
    
        printf("CREATE VIDEO ENCODER: Failed allocation codec context");
        return 1;
    

    cod_ctx[0]->height = height;
    cod_ctx[0]->width = width;
    cod_ctx[0]->pix_fmt = pix_fmt;

    cod_ctx[0]->sample_aspect_ratio = sample_aspect_ratio;

    //It's not a good idea to set the video time base to 1/60 - we need higher resolution for allowing audio synchronization
    ////////////////////////////////////////////////////////////////////////////
    cod_ctx[0]->time_base = av_make_q(1, 60000);//av_inv_q(frame_rate); //av_inv_q(frame_rate);
    ////////////////////////////////////////////////////////////////////////////

    cod_ctx[0]->framerate = frame_rate;
    cod_ctx[0]->bit_rate = bit_rate;
    cod_ctx[0]->rc_buffer_size = buffer_size;
    cod_ctx[0]->rc_max_rate = buffer_size;
    cod_ctx[0]->rc_min_rate = buffer_size;

    stream->time_base = cod_ctx[0]->time_base; //cod_ctx->time_base;

    int res = 0;

    res = av_opt_set(cod_ctx[0]->priv_data, "preset", "fast", 0);

    if (res != 0)
    
        printf("CREATE VIDEO ENCODER: Failed opt set");
        return 1;
    

    res = avcodec_open2(cod_ctx[0], enc, NULL);
    if (res < 0)
    
        printf("CREATE VIDEO ENCODER: couldn't open codec");
        return 1;
    

    res = avcodec_parameters_from_context(stream->codecpar, cod_ctx[0]);

    if (res < 0)
    
        printf("CREATE VIDEO ENCODER: failed setting codec parameters from context");
        return 1;
    

    return 0;


file* start_output_from_file(const char* path, file* input, const char* video_encoder, const char* audio_encoder)

    int res;

    file* output = create_output(input->container->nb_streams, path);
    if (!output)
    
        return NULL;
    
    AVCodecContext* codec_ctx;
    output->frames = calloc(input->container->nb_streams, sizeof(int));
    output->packets = calloc(input->container->nb_streams, sizeof(int));
    for (int stream = 0; stream < (int)input->container->nb_streams; stream++)
    
        codec_ctx = input->codec[stream];

        switch (codec_ctx->codec_type)
        
        case AVMEDIA_TYPE_AUDIO:
            if (audio_encoder == NULL)
            
                audio_encoder = codec_ctx->codec_descriptor->name;
            
            res = create_audio_encoder(&output->codec[stream], output->container, audio_encoder, codec_ctx->channels, codec_ctx->sample_rate, (int)codec_ctx->bit_rate);

            break;

        case AVMEDIA_TYPE_VIDEO:
            if (video_encoder == NULL)
            
                video_encoder = codec_ctx->codec_descriptor->name;
            
            AVRational framerate = av_guess_frame_rate(input->container, input->container->streams[stream], NULL);
            res = create_video_encoder(&output->codec[stream], output->container, video_encoder, codec_ctx->width, codec_ctx->height,
                codec_ctx->sw_pix_fmt, (AVRational)  1, 1 , framerate, (int)codec_ctx->bit_rate, codec_ctx->rc_buffer_size);
            break;
        
        if (res != 0)
        
            printf("Failed opening encoder stream number %i \n", stream);
            return NULL;
        
    

    if (output->container->oformat->flags & AVFMT_GLOBALHEADER)
        output->container->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;

    if (!(output->container->oformat->flags & AVFMT_NOFILE))
    
        if (avio_open(&output->container->pb, path, AVIO_FLAG_WRITE) < 0)
        
            printf("could not open the output file");
            return NULL;
        
    

    AVDictionary* muxer_opts = NULL;

    if (avformat_write_header(output->container, &muxer_opts) < 0)
    
        printf("an error occurred when opening output file");
        return NULL;
    

    return output;


file* create_output(int streams, const char* filename)

    int res;

    //file* output = malloc(sizeof(file));
    file* output = calloc(sizeof(file), 1);
    if (!output)
    
        return NULL;
    
    res = avformat_alloc_output_context2(&output->container, NULL, NULL, filename);
    if (res < 0)
    
        printf("Failed opening output\n");
        return NULL;
    

    output->codec = av_calloc(streams, sizeof(AVCodecContext*));

    if (!output->codec)
    
        printf("Failed allocating ram for codec\n");
        return NULL;
    

    for (int stream = 0; stream < streams; stream++)
    
        output->codec[stream] = NULL;
    

    return output;

【讨论】:

这解决了一个问题,因为当将所有视频附加在一起时,音频流可能比视频流短在流式传输每个剪辑后调用它,它确实有效但不能解决问题,我将在帖子中添加此信息,以及您的贡献,感谢您的帮助! 我试图通过增加“音频间隙”之后的第一个音频数据包的 PTS 来创建音频间隙。该解决方案在使用 VLC 时有效,但其他玩家不尊重 PTS 差距。 嘿@Rotem你能把代码发给我吗,我什么都试过了,我不知道怎么办,我解决了这个错误[aac @ 0xfc00880] Input contains (near) NaN/+-Inf , (如果你不分配缓冲区,它不会出现),但它并没有修复它。 正如我发布的那样,我仍在学习,如果这个问题对我来说太高级了。不知道怎么解决。

以上是关于用ffmpeg c连接视频和音频时如何计算pts和dts的主要内容,如果未能解决你的问题,请参考以下文章

FFmpeg concat 视频和音频不同步

如何使用 FFMPEG 和 C 将音频和视频写入同一个文件?

ffmpeg.c pts 和 dts 是啥?这个代码块在 ffmpeg.c 中做了啥?

音视频时间戳计算

回复@吾爱林有关用ffmpeg加速视频和改变帧率测试

FFmpeg 播放时间计算