Audio & Video Collection, Part 4: A Closer Look at Audio and Video

Posted by 初一十五啊


Preface

This audio/video series has run a bit long. The introductory and first-exploration installments are finished; next comes the deep-dive, and after one more part on interview topics the series will be complete.


Now, let's get into the main topic.

I. FFmpeg Struct: AVFormatContext

Before moving on to other modules, let's analyze the structs we have already encountered in earlier articles; afterwards we will continue to study FFmpeg through the source code of each feature.

AVFormatContext is a struct that carries a large number of stream parameters. This article analyzes the meaning and purpose of its fields in detail.

1. Source Code

First, the definition of AVFormatContext (located in libavformat/avformat.h; the relevant comments have been translated into Chinese by the author for easier reading):

/**
 * I/O格式上下文
 *
 * sizeof(AVFormatContext)不能在libav*外部使用,请使用avformat_alloc_context()来创建AVFormatContext.
 */
typedef struct AVFormatContext {
    /**
     * 一个用来记录和指向AVOptions的类。由avformat_alloc_context()设置。
     * 如果(de)muxer存在私有option也会输出。
     */
    const AVClass *av_class;

    /**
     * 输入容器的格式结构体
     *
     * 只在解码中生成,由avformat_open_input()生成
     */
    struct AVInputFormat *iformat;

    /**
     * 输出容器的格式的结构体
     *
     * 只在编码时使用,必须在调用avformat_write_header()之前设置好。
     */
    struct AVOutputFormat *oformat;

    /**
     * 私有数据的格式。这是一个AVOptions-enabled的结构体。
     * 当且仅当iformat/oformat.priv_class不为空的时候才会用到。
     *
     * - 编码时: 由avformat_write_header()设置
     * - 解码时: 由avformat_open_input()设置
     */
    void *priv_data;

    /**
     * 输入/输出上下文.
     *
     * - 解码时: 可以由用户自己设置(在avformat_open_input()之前,而且必须手动关闭),也可以由avformat_open_input()设置.
     * - 编码时: 由用户设置(在avformat_write_header()之前).调用者必须注意关闭和释放的问题。
     *
     * 如果在iformat/oformat.flags里面设置了AVFMT_NOFILE标志,就不要设置这个字段。因为在这种情况下,(解)复用器将以其他方式进行I/O操作,这个字段将为NULL.
     */
    AVIOContext *pb;

    /***************************** 流信息相关字段 ***********************************/
    /**
     * 流属性标志.是AVFMTCTX_*的集合
     * 由libavformat设置.
     */
    int ctx_flags;

    /**
     * AVFormatContext.streams -- 流的数量
     *
     * 由avformat_new_stream()设置,而且不能被其他代码更改.
     */
    unsigned int nb_streams;
    /**
     * 文件中所有流的列表.新的流主要由avformat_new_stream()创建.
     *
     * - 解码时: 流是在avformat_open_input()方法里,由libavformat创建的。如果在ctx_flags里面设置了AVFMTCTX_NOHEADER,那么新的流也可能由av_read_frame()创建.
     * - 编码时: 流是由用户创建的(在调用avformat_write_header()之前).
     *
     * 在avformat_free_context()释放.
     */
    AVStream **streams;

#if FF_API_FORMAT_FILENAME
    /**
     * 输入或输出的文件名
     *
     * - 解码时: 由avformat_open_input()设置
     * - 编码时: 应该在调用avformat_write_header之前由调用者设置
     *
     * @deprecated 本字段已经弃用,请改用url字段
     */
    attribute_deprecated
    char filename[1024];
#endif

    /**
     * 输入或输出的URL. 和旧文件名字段不同的是,这个字段没有长度限制.
     *
     * - 解码时: 由avformat_open_input()设置, 如果传给avformat_open_input()的参数为NULL,则初始化为空字符串
     * - 编码时: 应该在调用avformat_write_header()(或先调用avformat_init_output())之前由调用者设置;如果参数为NULL,则初始化为空字符串。
     *
     * 调用avformat_free_context()后由libavformat释放.
     */
    char *url;

    /**
     * 第一帧的时间(AV_TIME_BASE:单位为微秒),不要直接设置这个值,这个值是由AVStream推算出来的。
     *
     * 仅用于解码,由libavformat设置.
     */
    int64_t start_time;

    /**
     * 流的时长(单位AV_TIME_BASE:微秒)
     *
     * 仅用于解码时,由libavformat设置.
     */
    int64_t duration;

    /**
     * 所有流的比特率,如果不可用的时候为0。不要设置这个字段,这个字段的值是由FFmpeg自动计算出来的。
     */
    int64_t bit_rate;

    unsigned int packet_size;
    int max_delay;

    /**
     * 用于修改编(解)码器行为的标志,由AVFMT_FLAG_*集合构成,需要用户在调用avformat_open_input()或avformat_write_header()之前进行设置
     */
    int flags;
/* AVFMT_FLAG_* 各标志位定义略(如 AVFMT_FLAG_GENPTS、AVFMT_FLAG_NOBUFFER 等) */

    /**
     * 在确定输入格式的之前的最大输入数据量.
     * 仅用于解码, 在调用avformat_open_input()之前设置。
     */
    int64_t probesize;

    /**
     * 从avformat_find_stream_info()的输入数据里面读取的最大时长(单位AV_TIME_BASE:微秒)
     * 仅用于解码, 在avformat_find_stream_info()设置
     * 可以设置0让avformat使用启发式机制.
     */
    int64_t max_analyze_duration;

    const uint8_t *key;
    int keylen;

    unsigned int nb_programs;
    AVProgram **programs;

    /**
     * 强制使用指定codec_id视频解码器
     * 仅用于解码时: 由用户自己设置
     */
    enum AVCodecID video_codec_id;

    /**
     * 强制使用指定codec_id音频解码器
     * 仅用于解码时: 由用户自己设置.
     */
    enum AVCodecID audio_codec_id;

    /**
     * 强制使用指定codec_id字幕解码器
     * 仅用于解码时: 由用户自己设置.
     */
    enum AVCodecID subtitle_codec_id;

    /**
     * 每个流的最大内存索引使用量。
     * 如果超过了大小,就会丢弃一些,这可能会使得seek操作更慢且不精准。
     * 如果提供了全部内存使用索引,这个字段会被忽略掉.
     * - 编码时: 未使用
     * - 解码时: 由用户设置
     */
    unsigned int max_index_size;

    /**
     * 最大缓冲帧的内存使用量(从实时捕获设备中获得的帧数据)
     */
    unsigned int max_picture_buffer;

    /**
     * AVChapter数组的数量
     */
    unsigned int nb_chapters;
    AVChapter **chapters;

    /**
     * 整个文件的元数据
     *
     * - 解码时: 在avformat_open_input()方法里由libavformat设置
     * - 编码时: 可以由用户设置(在avformat_write_header()之前)
     *
     * 在avformat_free_context()方法里面由libavformat释放
     */
    AVDictionary *metadata;

    /**
     * 流开始的绝对时间(真实世界时间)
     */
    int64_t start_time_realtime;

    /**
     * 用于确定帧速率的帧数
     * 仅在解码时使用
     */
    int fps_probe_size;

    /**
     * 错误识别级别.
     */
    int error_recognition;

    /**
     * I/O层的自定义中断回调.
     */
    AVIOInterruptCB interrupt_callback;

    /**
     * 启动调试的标志
     */
    int debug;
#define FF_FDEBUG_TS        0x0001

    /**
     * 最大缓冲持续时间
     */
    int64_t max_interleave_delta;

    /**
     * 允许非标准扩展和实验
     */
    int strict_std_compliance;

    /**
     * 检测文件上发生事件的标志
     */
    int event_flags;
#define AVFMT_EVENT_FLAG_METADATA_UPDATED 0x0001

    /**
     * 等待第一个时间戳时要读取的最大包数
     * 仅解码
     */
    int max_ts_probe;

    /**
     * 在编码期间避免负时间戳.
     * 值的大小应该是AVFMT_AVOID_NEG_TS_*其中之一.
     * 注意,这个设置只会在av_interleaved_write_frame生效
     * - 编码时: 由用户设置
     * - 解码时: 未使用
     */
    int avoid_negative_ts;
/* AVFMT_AVOID_NEG_TS_* 取值定义略 */

    /**
     * 传输流id.
     * 这个将被转移到解码器的私有属性. 所以没有API/ABI兼容性
     */
    int ts_id;

    /**
     * 音频预加载时间(单位:毫秒)
     * 注意:并非所有的格式都支持这个功能,如果在不支持的时候使用,可能会发生不可预测的事情.
     * - 编码时: 由用户设置
     * - 解码时: 未使用
     */
    int audio_preload;

    /**
     * 最大块时间(单位:微秒).
     * 注意:并非所有格式都支持这个功能,如果在不支持的时候使用,可能会发生不可预测的事情.
     * - 编码时: 由用户设置
     * - 解码时: 未使用
     */
    int max_chunk_duration;

    /**
     * 最大块大小(单位:bytes)
     * 注意:并非所有格式都支持这个功能,如果在不支持的时候使用,可能会发生不可预测的事情.
     * - 编码时: 由用户设置
     * - 解码时: 未使用
     */
    int max_chunk_size;

    /**
     * 强制使用wallclock时间戳作为数据包的pts/dts
     */
    int use_wallclock_as_timestamps;

    /**
     * avio标志
     */
    int avio_flags;

    /**
     * 用于估计流时长的方法
     */
    enum AVDurationEstimationMethod duration_estimation_method;

    /**
     * 打开流时跳过初始字节
     */
    int64_t skip_initial_bytes;

    /**
     * 纠正单个时间戳溢出
     */
    unsigned int correct_ts_overflow;

    /**
     * 强制寻找任何帧
     */
    int seek2any;

    /**
     * 在每个数据包之后刷新I/O context
     */
    int flush_packets;

    /**
     * 格式探索得分
     */
    int probe_score;

    /**
     * 最大读取字节数(用于识别格式)
     */
    int format_probesize;

    /**
     * 允许的解码器列表(通过','分割)
     */
    char *codec_whitelist;

    /**
     * 允许的解复用器(容器格式)列表(通过','分割)
     */
    char *format_whitelist;

    /* ......(中间部分字段略)...... */

    /**
     * 强制视频解码器
     */
    AVCodec *video_codec;

    /**
     * 强制音频解码器
     */
    AVCodec *audio_codec;

    /**
     * 强制字幕解码器
     */
    AVCodec *subtitle_codec;

    /**
     * 强制数据解码器
     */
    AVCodec *data_codec;

    /**
     * 在元数据头中写入填充的字节数
     */
    int metadata_header_padding;

    /**
     * 用户数据(放置私人数据的地方)
     */
    void *opaque;

    /**
     * 用于设备和应用程序之间的回调
     */
    av_format_control_message control_message_cb;

    /**
     * 输出时间戳偏移量(单位:微秒)
     */
    int64_t output_ts_offset;

    /**
     * 转储格式分隔符
     */
    uint8_t *dump_separator;

    /**
     * 强制使用的数据解码器id
     */
    enum AVCodecID data_codec_id;

#if FF_API_OLD_OPEN_CALLBACKS
    /**
     * 需要为解码开启更多的IO contexts时调用
     * @deprecated 已弃用,建议使用io_open and io_close.
     */
    attribute_deprecated
    int (*open_cb)(struct AVFormatContext *s, AVIOContext **p, const char *url, int flags, const AVIOInterruptCB *int_cb, AVDictionary **options);
#endif

    /**
     * ',' separated list of allowed protocols.
     * - encoding: unused
     * - decoding: set by user
     */
    char *protocol_whitelist;

    /**
     * 打开新IO流的回调
     */
    int (*io_open)(struct AVFormatContext *s, AVIOContext **pb, const char *url,
                   int flags, AVDictionary **options);

    /**
     * 关闭流的回调(流是由AVFormatContext.io_open()打开的)
     */
    void (*io_close)(struct AVFormatContext *s, AVIOContext *pb);

    /**
     * ','分割的禁止使用的协议列表
     * - 编码: 没使用到
     * - 解码: 由用户设置
     */
    char *protocol_blacklist;

    /**
     * 最大流数
     * - 编码: 没使用到
     * - 解码: 由用户设置
     */
    int max_streams;
} AVFormatContext;

2. Key Fields of AVFormatContext

When developing with FFmpeg, AVFormatContext is a data structure that runs through the entire pipeline; many functions take it as a parameter. It is the struct behind FFmpeg's demuxing of container formats (FLV, MP4, RMVB, AVI, ...). The main fields are listed below (considering the decoding case):

struct AVInputFormat *iformat: format of the input container
AVIOContext *pb: I/O context for the input data
unsigned int nb_streams: number of audio/video streams
AVStream **streams: the audio/video streams
char filename[1024]: file name (deprecated; use url instead)
int64_t duration: duration (in microseconds; divide by 1000000 to get seconds)
int64_t bit_rate: bit rate (in bps; divide by 1000 to get kbps)
AVDictionary *metadata: metadata

The duration can be converted to HH:MM:SS form, for example:

AVFormatContext *pFormatCtx;
char timelong[16];
...
// duration is in microseconds; convert it to hh:mm:ss form
int tns, thh, tmm, tss;
tns = (int)(pFormatCtx->duration / 1000000);
thh = tns / 3600;
tmm = (tns % 3600) / 60;
tss = tns % 60;
snprintf(timelong, sizeof(timelong), "%02d:%02d:%02d", thh, tmm, tss);

The video's metadata can be obtained through the AVDictionary. Each entry is stored in an AVDictionaryEntry struct:

typedef struct AVDictionaryEntry {
    char *key;
    char *value;
} AVDictionaryEntry;

Each metadata entry consists of a key and a value.

In FFmpeg, av_dict_get() retrieves the metadata entries.

The following code collects the metadata into the meta string; each key and value are separated by "\t:", and each entry ends with "\r\n":

// MetaData: obtained from the AVDictionary via AVDictionaryEntry objects
char meta[4096] = "";
AVDictionaryEntry *m = NULL;
// Instead of fetching each known key one by one, e.g.
//   m = av_dict_get(pFormatCtx->metadata, "author", m, 0);
// iterate over every entry:
// av_dict_get(dictionary, key, previous entry (for iteration), flags)
while ((m = av_dict_get(pFormatCtx->metadata, "", m, AV_DICT_IGNORE_SUFFIX))) {
    strncat(meta, m->key,   sizeof(meta) - strlen(meta) - 1);
    strncat(meta, "\t:",    sizeof(meta) - strlen(meta) - 1);
    strncat(meta, m->value, sizeof(meta) - strlen(meta) - 1);
    strncat(meta, "\r\n",   sizeof(meta) - strlen(meta) - 1);
}

II. FFmpeg Struct: AVStream

In the previous section we studied AVFormatContext. In this one we look at AVStream.

AVStream is the struct that stores the information of one video/audio stream. Let's analyze the meaning and purpose of its important fields.

1. Source Code

First, the definition of AVStream (located in libavformat/avformat.h):

/**
 * Stream structure.
 * New fields can be added to the end with minor version bumps.
 * Removal, reordering and changes to existing fields require a major
 * version bump.
 * sizeof(AVStream) must not be used outside libav*.
 */
typedef struct AVStream {
    int index;    /**< stream index in AVFormatContext */
    /**
     * Format-specific stream ID.
     * decoding: set by libavformat
     * encoding: set by the user
     */
    int id;
    AVCodecContext *codec; /**< codec context */
    /**
     * Real base framerate of the stream.
     * This is the lowest framerate with which all timestamps can be
     * represented accurately (it is the least common multiple of all
     * framerates in the stream). Note, this value is just a guess!
     * For example, if the time base is 1/90000 and all frames have either
     * approximately 3600 or 1800 timer ticks, then r_frame_rate will be 50/1.
     */
    AVRational r_frame_rate;
    void *priv_data;
 
    /**
     * encoding: pts generation when outputting stream
     */
    struct AVFrac pts;
 
    /**
     * This is the fundamental unit of time (in seconds) in terms
     * of which frame timestamps are represented. For fixed-fps content,
     * time base should be 1/framerate and timestamp increments should be 1.
     * decoding: set by libavformat
     * encoding: set by libavformat in av_write_header
     */
    AVRational time_base;
 
    /**
     * Decoding: pts of the first frame of the stream in presentation order, in stream time base.
     * Only set this if you are absolutely 100% sure that the value you set
     * it to really is the pts of the first frame.
     * This may be undefined (AV_NOPTS_VALUE).
     * @note The ASF header does NOT contain a correct start_time the ASF
     * demuxer must NOT set this.
     */
    int64_t start_time;
 
    /**
     * Decoding: duration of the stream, in stream time base.
     * If a source file does not specify a duration, but does specify
     * a bitrate, this value will be estimated from bitrate and file size.
     */
    int64_t duration;
 
    int64_t nb_frames;                 ///< number of frames in this stream if known or 0
 
    int disposition; /**< AV_DISPOSITION_* bit field */
 
    enum AVDiscard discard; ///< Selects which packets can be discarded at will and do not need to be demuxed.
 
    /**
     * sample aspect ratio (0 if unknown)
     * - encoding: Set by user.
     * - decoding: Set by libavformat.
     */
    AVRational sample_aspect_ratio;
 
    AVDictionary *metadata;
 
    /**
     * Average framerate
     */
    AVRational avg_frame_rate;
 
    /**
     * For streams with AV_DISPOSITION_ATTACHED_PIC disposition, this packet
     * will contain the attached picture.
     *
     * decoding: set by libavformat, must not be modified by the caller.
     * encoding: unused
     */
    AVPacket attached_pic;
 
    /*
     * All fields below this line are not part of the public API. They
     * may not be used outside of libavformat and can be changed and
     * removed at will.
     * New public fields should be added right above.
     
     */
 
    /**
     * Stream information used internally by av_find_stream_info()
     */
#define MAX_STD_TIMEBASES (60*12+5)
    struct {
        int64_t last_dts;
        int64_t duration_gcd;
        int duration_count;
        double duration_error[2][2][MAX_STD_TIMEBASES];
        int64_t codec_info_duration;
        int nb_decoded_frames;
        int found_decoder;
    } *info;
 
    int pts_wrap_bits; /**< number of bits in pts (used for wrapping control) */
 
    // Timestamp generation support:
    /**
     * Timestamp corresponding to the last dts sync point.
     *
     * Initialized when AVCodecParserContext.dts_sync_point >= 0 and
     * a DTS is received from the underlying container. Otherwise set to
     * AV_NOPTS_VALUE by default.
     */
    int64_t reference_dts;
    int64_t first_dts;
    int64_t cur_dts;
    int64_t last_IP_pts;
    int last_IP_duration;
 
    /**
     * Number of packets to buffer for codec probing
     */
#define MAX_PROBE_PACKETS 2500
    int probe_packets;
 
    /**
     * Number of frames that have been demuxed during av_find_stream_info()
     */
    int codec_info_nb_frames;
 
    /**
     * Stream Identifier
     * This is the MPEG-TS stream identifier +1
     * 0 means unknown
     */
    int stream_identifier;
 
    int64_t interleaver_chunk_size;
    int64_t interleaver_chunk_duration;
 
    /* av_read_frame() support */
    enum AVStreamParseType need_parsing;
    struct AVCodecParserContext *parser;
 
    /**
     * last packet in packet_buffer for this stream when muxing.
     */
    struct AVPacketList *last_in_packet_buffer;
    AVProbeData probe_data;
#define MAX_REORDER_DELAY 16
    int64_t pts_buffer[MAX_REORDER_DELAY+1];
 
    AVIndexEntry *index_entries; /**< Only used if the format does not
                                    support seeking natively. */
    int nb_index_entries;
    unsigned int index_entries_allocated_size;
 
    /**
     * flag to indicate that probing is requested
     * NOT PART OF PUBLIC API
     */
    int request_probe;
} AVStream;

2. Key Fields of AVStream

int index: index identifying this stream in the AVFormatContext

AVCodecContext *codec: the AVCodecContext of this stream (one-to-one correspondence)

AVRational time_base: the time base. PTS and DTS can be converted to real time through it. Other FFmpeg structs also have a time_base field, but in my experience only the one in AVStream is reliable. real time = PTS × time_base

int64_t duration: duration of this stream (in time_base units)

AVDictionary *metadata: metadata

AVRational avg_frame_rate: average frame rate (note: quite important for video)

AVPacket attached_pic: attached picture, e.g. the album art embedded in some MP3 or AAC files.

III. FFmpeg Struct: AVPacket

In the previous section we studied AVStream. In this one we look at AVPacket.

AVPacket is the struct that stores compressed (encoded) data and related information. Let's analyze the meaning and purpose of its important fields.

1. Source Code

First, the definition of AVPacket (located in libavcodec/avcodec.h):

typedef struct AVPacket {
    /**
     * Presentation timestamp in AVStream->time_base units; the time at which
     * the decompressed packet will be presented to the user.
     * Can be AV_NOPTS_VALUE if it is not stored in the file.
     * pts MUST be larger or equal to dts as presentation cannot happen before
     * decompression, unless one wants to view hex dumps. Some formats misuse
     * the terms dts and pts/cts to mean something different. Such timestamps
     * must be converted to true pts/dts before they are stored in AVPacket.
     */
    int64_t pts;
    /**
     * Decompression timestamp in AVStream->time_base units; the time at which
     * the packet is decompressed.
     * Can be AV_NOPTS_VALUE if it is not stored in the file.
     */
    int64_t dts;
    uint8_t *data;
    int   size;
    int   stream_index;
    /**
     * A combination of AV_PKT_FLAG values
     */
    int   flags;
    /**
     * Additional packet data that can be provided by the container.
     * Packet can contain several types of side information.
     */
    struct {
        uint8_t *data;
        int      size;
        enum AVPacketSideDataType type;
    } *side_data;
    int side_data_elems;
 
    /**
     * Duration of this packet in AVStream->time_base units, 0 if unknown.
     * Equals next_pts - this_pts in presentation order.
     */
    int   duration;
    void  (*destruct)(struct AVPacket *);
    void  *priv;
    int64_t pos;                            ///< byte position in stream, -1 if unknown
 
    /**
     * Time difference in AVStream->time_base units from the pts of this
     * packet to the point at which the output from the decoder has converged
     * independent from the availability of previous frames. That is, the
     * frames are virtually identical no matter if decoding started from
     * the very first frame or from this keyframe.
     * Is AV_NOPTS_VALUE if unknown.
     * This field is not the display duration of the current packet.
     * This field has no meaning if the packet does not have AV_PKT_FLAG_KEY
     * set.
     *
     * The purpose of this field is to allow seeking in streams that have no
     * keyframes in the conventional sense. It corresponds to the
     * recovery point SEI in H.264 and match_time_delta in NUT. It is also
     * essential for some types of subtitle streams to ensure that all
     * subtitles are correctly displayed after seeking.
     */
    int64_t convergence_duration;
} AVPacket;

2. Key Fields of AVPacket

uint8_t *data: the compressed, encoded data.

int size: size of data in bytes

int64_t pts: presentation timestamp

int64_t dts: decoding timestamp

int stream_index: index of the video/audio stream this AVPacket belongs to.

A note about data: for formats such as H.264, the data of the AVPackets obtained with FFmpeg can often be written directly to a file, yielding the raw video/audio bitstream file.

IV. FFmpeg Struct: AVFrame

In the previous section we studied AVPacket. In this one we look at AVFrame.

AVFrame is a struct that carries a large number of frame parameters. Let's analyze the meaning and purpose of its important fields.

1. Source Code

First, the definition of AVFrame (located in libavcodec/avcodec.h):

/**
 * Audio Video Frame.
 * New fields can be added to the end of AVFRAME with minor version
 * bumps. Similarly fields that are marked as to be only accessed by
 * av_opt_ptr() can be reordered. This allows 2 forks to add fields
 * without breaking compatibility with each other.
 * Removal, reordering and changes in the remaining cases require
 * a major version bump.
 * sizeof(AVFrame) must not be used outside libavcodec.
 */
typedef struct AVFrame {
#define AV_NUM_DATA_POINTERS 8
    /**图像数据
     * pointer to the picture/channel planes.
     * This might be different from the first allocated byte
     * - encoding: Set by user
     * - decoding: set by AVCodecContext.get_buffer()
     */
    uint8_t *data[AV_NUM_DATA_POINTERS];
 
    /**
     * Size, in bytes, of the data for each picture/channel plane.
     *
     * For audio, only linesize[0] may be set. For planar audio, each channel
     * plane must be the same size.
     *
     * - encoding: Set by user
     * - decoding: set by AVCodecContext.get_buffer()
     */
    int linesize[AV_NUM_DATA_POINTERS];
 
    /**
     * pointers to the data planes/channels.
     *
     * For video, this should simply point to data[].
     *
     * For planar audio, each channel has a separate data pointer, and
     * linesize[0] contains the size of each channel buffer.
     * For packed audio, there is just one data pointer, and linesize[0]
     * contains the total size of the buffer for all channels.
     *
     * Note: Both data and extended_data will always be set by get_buffer(),
     * but for planar audio with more channels that can fit in data,
     * extended_data must be used by the decoder in order to access all
     * channels.
     *
     * encoding: unused
     * decoding: set by AVCodecContext.get_buffer()
     */
    uint8_t **extended_data;
 
    /**宽高
     * width and height of the video frame
     * - encoding: unused
     * - decoding: Read by user.
     */
    int width, height;
 
    /**
     * number of audio samples (per channel) described by this frame
     * - encoding: Set by user
     * - decoding: Set by libavcodec
     */
    int nb_samples;
 
    /**
     * format of the frame, -1 if unknown or unset
     * Values correspond to enum AVPixelFormat for video frames,
     * enum AVSampleFormat for audio.
     * - encoding: unused
     * - decoding: Read by user.
     */
    int format;
 
    /**是否是关键帧
     * 1 -> keyframe, 0-> not
     * - encoding: Set by libavcodec.
     * - decoding: Set by libavcodec.
     */
    int key_frame;
 
    /**帧类型(I,B,P)
     * Picture type of the frame, see ?_TYPE below.
     * - encoding: Set by libavcodec. for coded_picture (and set by user for input).
     * - decoding: Set by libavcodec.
     */
    enum AVPictureType pict_type;
 
    /**
     * pointer to the first allocated byte of the picture. Can be used in get_buffer/release_buffer.
     * This isn't used by libavcodec unless the default get/release_buffer() is used.
     * - encoding:
     * - decoding:
     */
    uint8_t *base[AV_NUM_DATA_POINTERS];
 
    /**
     * sample aspect ratio for the video frame, 0/1 if unknown/unspecified
     * - encoding: unused
     * - decoding: Read by user.
     */
    AVRational sample_aspect_ratio;
 
    /**
     * presentation timestamp in time_base units (time when frame should be shown to user)
     * If AV_NOPTS_VALUE then frame_rate = 1/time_base will be assumed.
     * - encoding: MUST be set by user.
     * - decoding: Set by libavcodec.
     */
    int64_t pts;
 
    /**
     * reordered pts from the last AVPacket that has been input into the decoder
     * - encoding: unused
     * - decoding: Read by user.
     */
    int64_t pkt_pts;
 
    /**
     * dts from the last AVPacket that has been input into the decoder
     * - encoding: unused
     * - decoding: Read by user.
     */
    int64_t pkt_dts;
 
    /**
     * picture number in bitstream order
     * - encoding: set by
     * - decoding: Set by libavcodec.
     */
    int coded_picture_number;
    /**
     * picture number in display order
     * - encoding: set by
     * - decoding: Set by libavcodec.
     */
    int display_picture_number;
 
    /**
     * quality (between 1 (good) and FF_LAMBDA_MAX (bad))
     * - encoding: Set by libavcodec. for coded_picture (and set by user for input).
     * - decoding: Set by libavcodec.
     */
    int quality;
 
    /**
     * is this picture used as reference
     * The values for this are the same as the MpegEncContext.picture_structure
     * variable, that is 1->top field, 2->bottom field, 3->frame/both fields.
     * Set to 4 for delayed, non-reference frames.
     * - encoding: unused
     * - decoding: Set by libavcodec. (before get_buffer() call)).
     */
    int reference;
 
    /**QP表
     * QP table
     * - encoding: unused
     * - decoding: Set by libavcodec.
     */
    int8_t *qscale_table;
    /**
     * QP store stride
     * - encoding: unused
     * - decoding: Set by libavcodec.
     */
    int qstride;
 
    /**
     *
     */
    int qscale_type;
 
    /**跳过宏块表
     * mbskip_table[mb]>=1 if MB didn't change
     * stride= mb_width = (width+15)>>4
     * - encoding: unused
     * - decoding: Set by libavcodec.
     */
    uint8_t *mbskip_table;
 
    /**运动矢量表
     * motion vector table
     * @code
     * example:
     * int mv_sample_log2= 4 - motion_subsample_log2;
     * int mb_width= (width+15)>>4;
     * int mv_stride= (mb_width << mv_sample_log2) + 1;
     * motion_val[direction][x + y*mv_stride][0->mv_x, 1->mv_y];
     * @endcode
     * - encoding: Set by user.
     * - decoding: Set by libavcodec.
     */
    int16_t (*motion_val[2])[2];
 
    /**宏块类型表
     * macroblock type table
     * mb_type_base + mb_width + 2
     * - encoding: Set by user.
     * - decoding: Set by libavcodec.
     */
    uint32_t *mb_type;
 
    /**DCT系数
     * DCT coefficients
     * - encoding: unused
     * - decoding: Set by libavcodec.
     */
    short *dct_coeff;
 
    /**参考帧列表
     * motion reference frame index
     * the order in which these are stored can depend on the codec.
     * - encoding: Set by user.
     * - decoding: Set by libavcodec.
     */
    int8_t *ref_index[2];
 
    /**
     * for some private data of the user
     * - encoding: unused
     * - decoding: Set by user.
     */
    void *opaque;
 
    /**
     * error
     * - encoding: Set by libavcodec. if flags&CODEC_FLAG_PSNR.
     * - decoding: unused
     */
    uint64_t error[AV_NUM_DATA_POINTERS];
 
    /**
     * type of the buffer (to keep track of who has to deallocate data[*])
     * - encoding: Set by the one who allocates it.
     * - decoding: Set by the one who allocates it.
     * Note: User allocated (direct rendering) & internal buffers cannot coexist currently.
     */
    int type;
 
    /**
     * When decoding, this signals how much the picture must be delayed.
     * extra_delay = repeat_pict / (2*fps)
     * - encoding: unused
     * - decoding: Set by libavcodec.
     */
    int repeat_pict;
 
    /**
     * The content of the picture is interlaced.
     * - encoding: Set by user.
     * - decoding: Set by libavcodec. (default 0)
     */
    int interlaced_frame;
 
    /**
     * If the content is interlaced, is top field displayed first.
     * - encoding: Set by user.
     * - decoding: Set by libavcodec.
     */
    int top_field_first;
 
    /**
     * Tell user application that palette has changed from previous frame.
     * - encoding: ??? (no palette-enabled encoder yet)
     * - decoding: Set by libavcodec. (default 0).
     */
    int palette_has_changed;
 
    /**
     * codec suggestion on buffer type if != 0
     * - encoding: unused
     * - decoding: Set by libavcodec. (before get_buffer() call)).
     */
    int buffer_hints;
 
    /**
     * Pan scan.
     * - encoding: Set by user.
     * - decoding: Set by libavcodec.
     */
    AVPanScan *pan_scan;
 
    /**
     * reordered opaque 64bit (generally an integer or a double precision float
     * PTS but can be anything).
     * The user sets AVCodecContext.reordered_opaque to represent the input at
     * that time,
     * the decoder reorders values as needed and sets AVFrame.reordered_opaque
     * to exactly one of the values provided by the user through AVCodecContext.reordered_opaque
     * @deprecated in favor of pkt_pts
     * - encoding: unused
     * - decoding: Read by user.
     */
    int64_t reordered_opaque;
 
    /**
     * hardware accelerator private data (FFmpeg-allocated)
     * - encoding: unused
     * - decoding: Set by libavcodec
     */
    void *hwaccel_picture_private;
 
    /**
     * the AVCodecContext which ff_thread_get_buffer() was last called on
     * - encoding: Set by libavcodec.
     * - decoding: Set by libavcodec.
     */
    struct AVCodecContext *owner;
 
    /**
     * used by multithreading to store frame-specific info
     * - encoding: Set by libavcodec.
     * - decoding: Set by libavcodec.
     */
    void *thread_opaque;
 
    /**
     * log2 of the size of the block which a single vector in motion_val represents:
     * (4->16x16, 3->8x8, 2-> 4x4, 1-> 2x2)
     * - encoding: unused
     * - decoding: Set by libavcodec.
     */
    uint8_t motion_subsample_log2;
 
    /**(音频)采样率
     * Sample rate of the audio data.
     *
     * - encoding: unused
     * - decoding: read by user
     */
    int sample_rate;
 
    /**
     * Channel layout of the audio data.
     *
     * - encoding: unused
     * - decoding: read by user.
     */
    uint64_t channel_layout;
 
    /**
     * frame timestamp estimated using various heuristics, in stream time base
     * Code outside libavcodec should access this field using:
     * av_frame_get_best_effort_timestamp(frame)
     * - encoding: unused
     * - decoding: set by libavcodec, read by user.
     */
    int64_t best_effort_timestamp;
 
    /**
     * reordered pos from the last AVPacket that has been input into the decoder
     * Code outside libavcodec should access this field using:
     * av_frame_get_pkt_pos(frame)
     * - encoding: unused
     * - decoding: Read by user.
     */
    int64_t pkt_pos;
 
    /**
     * duration of the corresponding packet, expressed in
     * AVStream->time_base units, 0 if unknown.
     * Code outside libavcodec should access this field using:
     * av_frame_get_pkt_duration(frame)
     * - encoding: unused
     * - decoding: Read by user.
     */
    int64_t pkt_duration;
 
    /**
     * metadata.
     * Code outside libavcodec should access this field using:
     * av_frame_get_metadata(frame)
     * - encoding: Set by user.
     * - decoding: Set by libavcodec.
     */
    AVDictionary *metadata;
 
    /**
     * decode error flags of the frame, set to a combination of
     * FF_DECODE_ERROR_xxx flags if the decoder produced a frame, but there
     * were errors during the decoding.
     * Code outside libavcodec should access this field using:
     * av_frame_get_decode_error_flags(frame)
     * - encoding: unused
     * - decoding: set by libavcodec, read by user.
     */
    int decode_error_flags;
#define FF_DECODE_ERROR_INVALID_BITSTREAM   1
#define FF_DECODE_ERROR_MISSING_REFERENCE   2
 
    /**
     * number of audio channels, only used for audio.
     * Code outside libavcodec should access this field using:
     * av_frame_get_channels(frame)
     * - encoding: unused
     * - decoding: Read by user.
     */
    int64_t channels;
} AVFrame;

2. Key Fields of AVFrame

AVFrame is generally used to store raw (uncompressed) data — YUV or RGB for video, PCM for audio — plus related information. During decoding it also carries the macroblock type table, QP table, motion vector table, and so on; during encoding it stores related data as well. AVFrame is therefore a very important struct for bitstream analysis with FFmpeg.

The main fields are listed below (considering the decoding case):

uint8_t *data[AV_NUM_DATA_POINTERS]: the decoded raw data (YUV/RGB for video, PCM for audio)

int linesize[AV_NUM_DATA_POINTERS]: size in bytes of one "row" of data. Note: not necessarily equal to the image width; it is usually larger due to padding.

int width, height: width and height of the video frame (1920x1080, 1280x720, ...)

int nb_samples: one audio AVFrame may contain multiple audio samples (per channel); this field records how many

int format: format of the decoded raw data (YUV420P, YUV422P, RGB24, ...)

int key_frame: whether this is a key frame

enum AVPictureType pict_type: picture type (I, B, P, ...)

AVRational sample_aspect_ratio: aspect ratio (16:9, 4:3, ...)

int64_t pts: presentation timestamp

int coded_picture_number: picture number in bitstream order

int display_picture_number: picture number in display order

int8_t *qscale_table: QP table

uint8_t *mbskip_table: macroblock skip table

int16_t (*motion_val[2])[2]: motion vector table

uint32_t *mb_type: macroblock type table

short *dct_coeff: DCT coefficients (I have never extracted these myself)

int8_t *ref_index[2]: motion-estimation reference frame lists (multiple reference frames seem to appear only in newer standards such as H.264)

int interlaced_frame: whether the frame is interlaced

uint8_t motion_subsample_log2: log2 of the motion-vector sampling density within a macroblock

The remaining fields are not listed one by one; the source comments describe them in detail. A few fields that need some explanation are analyzed below:

data[]

For packed formats (e.g. RGB24), the data is stored in data[0].

For planar formats (e.g. YUV420P), the planes are split across data[0], data[1], data[2], ... (for YUV420P, data[0] holds Y, data[1] holds U, data[2] holds V)

pict_type

包含以下类型:

enum AVPictureType {
    AV_PICTURE_TYPE_NONE = 0, ///< Undefined
    AV_PICTURE_TYPE_I,     ///< Intra
    AV_PICTURE_TYPE_P,     ///< Predicted
    AV_PICTURE_TYPE_B,     ///< Bi-dir predicted
    AV_PICTURE_TYPE_S,     ///< S(GMC)-VOP MPEG4
    AV_PICTURE_TYPE_SI,    ///< Switching Intra
    AV_PICTURE_TYPE_SP,    ///< Switching Predicted
    AV_PICTURE_TYPE_BI,    ///< BI type
};

sample_aspect_ratio

宽高比是一个分数,FFMPEG中用AVRational表达分数:

/**
 * rational number numerator/denominator
 */
typedef struct AVRational {
    int num; ///< numerator
    int den; ///< denominator
} AVRational;

qscale_table

QP表指向一块内存,里面存储的是每个宏块的QP值。宏块的标号是从左往右,一行一行的来的。每个宏块对应1个QP。

qscale_table[0]就是第1行第1列宏块的QP值;qscale_table[1]就是第1行第2列宏块的QP值;qscale_table[2]就是第1行第3列宏块的QP值。以此类推…

宏块的个数用下式计算:

注:宏块大小是16x16的。

每行宏块数:

int mb_stride = pCodecCtx->width/16 + 1;

宏块的总数:

int mb_sum = ((pCodecCtx->height+15)>>4) * (pCodecCtx->width/16 + 1);

五丶FFmpeg 结构体: AVCodec 分析

在上文我们学习了AVFrame结构体的相关内容。本文,我们将讲述一下AVCodec。

AVCodec是存储编解码器信息的结构体。下面我们来分析一下该结构体里重要变量的含义和作用。

1.源码整理

首先我们先看一下结构体AVCodec的定义的结构体源码(位于libavcodec/avcodec.h):

(AVCodec 的完整定义较长,此处从略,下面直接分析其重点字段)

2.AVCodec 重点字段

下面说一下最主要的几个变量:

const char *name:编解码器的名字,比较短

const char *long_name:编解码器的名字,全称,比较长

enum AVMediaType type:指明了类型,是视频,音频,还是字幕

enum AVCodecID id:ID,不重复

const AVRational *supported_framerates:支持的帧率(仅视频)

const enum AVPixelFormat *pix_fmts:支持的像素格式(仅视频)

const int *supported_samplerates:支持的采样率(仅音频)

const enum AVSampleFormat *sample_fmts:支持的采样格式(仅音频)

const uint64_t *channel_layouts:支持的声道布局(仅音频)

int priv_data_size:私有数据的大小

详细介绍几个变量:

enum AVMediaType type

AVMediaType定义如下:

enum AVMediaType {
    AVMEDIA_TYPE_UNKNOWN = -1,  ///< Usually treated as AVMEDIA_TYPE_DATA
    AVMEDIA_TYPE_VIDEO,
    AVMEDIA_TYPE_AUDIO,
    AVMEDIA_TYPE_DATA,          ///< Opaque data information usually continuous
    AVMEDIA_TYPE_SUBTITLE,
    AVMEDIA_TYPE_ATTACHMENT,    ///< Opaque data information usually sparse
    AVMEDIA_TYPE_NB
};

enum AVCodecID id

AVCodecID定义如下:

enum AVCodecID {
    AV_CODEC_ID_NONE,
 
    /* video codecs */
    AV_CODEC_ID_MPEG1VIDEO,
    AV_CODEC_ID_MPEG2VIDEO, ///< preferred ID for MPEG-1/2 video decoding
    AV_CODEC_ID_MPEG2VIDEO_XVMC,
    AV_CODEC_ID_H261,
    AV_CODEC_ID_H263,
    AV_CODEC_ID_RV10,
    AV_CODEC_ID_RV20,
    AV_CODEC_ID_MJPEG,
    AV_CODEC_ID_MJPEGB,
    AV_CODEC_ID_LJPEG,
    AV_CODEC_ID_SP5X,
    AV_CODEC_ID_JPEGLS,
    AV_CODEC_ID_MPEG4,
    AV_CODEC_ID_RAWVIDEO,
    AV_CODEC_ID_MSMPEG4V1,
    AV_CODEC_ID_MSMPEG4V2,
    AV_CODEC_ID_MSMPEG4V3,
    AV_CODEC_ID_WMV1,
    AV_CODEC_ID_WMV2,
    AV_CODEC_ID_H263P,
    AV_CODEC_ID_H263I,
    AV_CODEC_ID_FLV1,
    AV_CODEC_ID_SVQ1,
    AV_CODEC_ID_SVQ3,
    AV_CODEC_ID_DVVIDEO,
    AV_CODEC_ID_HUFFYUV,
    AV_CODEC_ID_CYUV,
    AV_CODEC_ID_H264,
    ...

const enum AVPixelFormat *pix_fmts

AVPixelFormat定义如下:

enum AVPixelFormat {
    AV_PIX_FMT_NONE = -1,
    AV_PIX_FMT_YUV420P,   ///< planar YUV 4:2:0, 12bpp, (1 Cr & Cb sample per 2x2 Y samples)
    AV_PIX_FMT_YUYV422,   ///< packed YUV 4:2:2, 16bpp, Y0 Cb Y1 Cr
    AV_PIX_FMT_RGB24,     ///< packed RGB 8:8:8, 24bpp, RGBRGB...
    AV_PIX_FMT_BGR24,     ///< packed RGB 8:8:8, 24bpp, BGRBGR...
    AV_PIX_FMT_YUV422P,   ///< planar YUV 4:2:2, 16bpp, (1 Cr & Cb sample per 2x1 Y samples)
    AV_PIX_FMT_YUV444P,   ///< planar YUV 4:4:4, 24bpp, (1 Cr & Cb sample per 1x1 Y samples)
    AV_PIX_FMT_YUV410P,   ///< planar YUV 4:1:0,  9bpp, (1 Cr & Cb sample per 4x4 Y samples)
    AV_PIX_FMT_YUV411P,   ///< planar YUV 4:1:1, 12bpp, (1 Cr & Cb sample per 4x1 Y samples)
    AV_PIX_FMT_GRAY8,     ///<        Y        ,  8bpp
    AV_PIX_FMT_MONOWHITE, ///<        Y        ,  1bpp, 0 is white, 1 is black, in each byte pixels are ordered from the msb to the lsb
    AV_PIX_FMT_MONOBLACK, ///<        Y        ,  1bpp, 0 is black, 1 is white, in each byte pixels are ordered from the msb to the lsb
    AV_PIX_FMT_PAL8,      ///< 8 bit with PIX_FMT_RGB32 palette
    AV_PIX_FMT_YUVJ420P,  ///< planar YUV 4:2:0, 12bpp, full scale (JPEG), deprecated in favor of PIX_FMT_YUV420P and setting color_range
    AV_PIX_FMT_YUVJ422P,  ///< planar YUV 4:2:2, 16bpp, full scale (JPEG), deprecated in favor of PIX_FMT_YUV422P and setting color_range
    AV_PIX_FMT_YUVJ444P,  ///< planar YUV 4:4:4, 24bpp, full scale (JPEG), deprecated in favor of PIX_FMT_YUV444P and setting color_range
    AV_PIX_FMT_XVMC_MPEG2_MC,///< XVideo Motion Acceleration via common packet passing
    AV_PIX_FMT_XVMC_MPEG2_IDCT,
    ...(代码太长,略)

const enum AVSampleFormat *sample_fmts

enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,          ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,         ///< signed 16 bits
    AV_SAMPLE_FMT_S32,         ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,         ///< float
    AV_SAMPLE_FMT_DBL,         ///< double
 
    AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP,        ///< float, planar
    AV_SAMPLE_FMT_DBLP,        ///< double, planar
 
    AV_SAMPLE_FMT_NB           ///< Number of sample formats. DO NOT USE if linking dynamically
};

每一个编解码器对应一个该结构体,查看一下ffmpeg的源代码,我们可以看一下H.264解码器的结构体如下所示(h264.c):

AVCodec ff_h264_decoder = {
    .name           = "h264",
    .type           = AVMEDIA_TYPE_VIDEO,
    .id             = CODEC_ID_H264,
    .priv_data_size = sizeof(H264Context),
    .init           = ff_h264_decode_init,
    .close          = ff_h264_decode_end,
    .decode         = decode_frame,
    .capabilities   = /*CODEC_CAP_DRAW_HORIZ_BAND |*/ CODEC_CAP_DR1 | CODEC_CAP_DELAY |
                      CODEC_CAP_SLICE_THREADS | CODEC_CAP_FRAME_THREADS,
    .flush          = flush_dpb,
    .long_name      = NULL_IF_CONFIG_SMALL("H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10"),
    .init_thread_copy      = ONLY_IF_THREADS_ENABLED(decode_init_thread_copy),
    .update_thread_context = ONLY_IF_THREADS_ENABLED(decode_update_thread_context),
    .profiles       = NULL_IF_CONFIG_SMALL(profiles),
    .priv_class     = &h264_class,
};

JPEG2000解码器结构体(j2kdec.c):

AVCodec ff_jpeg2000_decoder = {
    .name           = "j2k",
    .type           = AVMEDIA_TYPE_VIDEO,
    .id             = CODEC_ID_JPEG2000,
    .priv_data_size = sizeof(J2kDecoderContext),
    .init           = j2kdec_init,
    .close          = decode_end,
    .decode         = decode_frame,
    .capabilities   = CODEC_CAP_EXPERIMENTAL,
    .long_name      = NULL_IF_CONFIG_SMALL("JPEG 2000"),
    .pix_fmts       = (const enum PixelFormat[]) { PIX_FMT_GRAY8, PIX_FMT_RGB24, PIX_FMT_NONE },
};

下面简单介绍一下遍历ffmpeg中的解码器信息的方法(这些解码器以一个链表的形式存储):

1.注册所有编解码器:av_register_all();

2.声明一个AVCodec类型的指针,比如说AVCodec* first_c;

3.调用av_codec_next()函数,即可获得指向链表下一个解码器的指针,循环往复可以获得所有解码器的信息。注意,如果想要获得指向第一个解码器的指针,则需要将该函数的参数设置为NULL。

六丶FFmpeg 结构体: AVCodecContext 分析

在上文我们学习了AVCodec结构体的相关内容。本文,我们将讲述一下AVCodecContext。

AVCodecContext是包含变量较多的结构体(感觉差不多是变量最多的结构体)。下面我们来分析一下该结构体里重要变量的含义和作用。

1.源码整理

首先我们先看一下结构体AVCodecContext的定义的结构体源码(位于libavcodec/avcodec.h):

(AVCodecContext 的完整定义非常长,此处从略,下面直接分析其重点字段)

2.AVCodecContext 重点字段

下面挑一些关键的变量来看看(这里只考虑解码):

enum AVMediaType codec_type:编解码器的类型(视频,音频...)

struct AVCodec  *codec:采用的解码器AVCodec(H.264,MPEG2...)

int bit_rate:平均比特率

uint8_t *extradata; int extradata_size:针对特定编码器包含的附加信息(例如对于H.264解码器来说,存储SPS,PPS等)

AVRational time_base:根据该参数,可以把PTS转化为实际的时间(单位为秒s)

int width, height:如果是视频的话,代表宽和高

int refs:运动估计参考帧的个数(H.264的话会有多帧,MPEG2这类的一般就没有了)

int sample_rate:采样率(音频)

int channels:声道数(音频)

enum AVSampleFormat sample_fmt:采样格式

int profile:型(H.264里面就有,其他编码标准应该也有)

int level:级(和profile差不太多)

在这里需要注意:AVCodecContext中很多的参数是编码的时候使用的,而不是解码的时候使用的。

其实这些参数都比较容易理解。就不多费篇幅了。在这里看一下以下几个参数:

codec_type

编解码器类型有以下几种:

enum AVMediaType {
    AVMEDIA_TYPE_UNKNOWN = -1,  ///< Usually treated as AVMEDIA_TYPE_DATA
    AVMEDIA_TYPE_VIDEO,
    AVMEDIA_TYPE_AUDIO,
    AVMEDIA_TYPE_DATA,          ///< Opaque data information usually continuous
    AVMEDIA_TYPE_SUBTITLE,
    AVMEDIA_TYPE_ATTACHMENT,    ///< Opaque data information usually sparse
    AVMEDIA_TYPE_NB
};

sample_fmt

在FFMPEG中音频采样格式有以下几种:

enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,          ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,         ///< signed 16 bits
    AV_SAMPLE_FMT_S32,         ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,         ///< float
    AV_SAMPLE_FMT_DBL,         ///< double
 
    AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP,        ///< float, planar
    AV_SAMPLE_FMT_DBLP,        ///< double, planar
 
    AV_SAMPLE_FMT_NB           ///< Number of sample formats. DO NOT USE if linking dynamically
};

profile

在FFMPEG中型有以下几种,可以看出AAC,DTS,MPEG2,H.264,VC-1,MPEG4都有型的概念。

#define FF_PROFILE_UNKNOWN -99
#define FF_PROFILE_RESERVED -100
 
#define FF_PROFILE_AAC_MAIN 0
#define FF_PROFILE_AAC_LOW  1
#define FF_PROFILE_AAC_SSR  2
#define FF_PROFILE_AAC_LTP  3
#define FF_PROFILE_AAC_HE   4
#define FF_PROFILE_AAC_HE_V2 28
#define FF_PROFILE_AAC_LD   22
#define FF_PROFILE_AAC_ELD  38
 
#define FF_PROFILE_DTS         20
#define FF_PROFILE_DTS_ES      30
#define FF_PROFILE_DTS_96_24   40
#define FF_PROFILE_DTS_HD_HRA  50
#define FF_PROFILE_DTS_HD_MA   60
 
#define FF_PROFILE_MPEG2_422    0
#define FF_PROFILE_MPEG2_HIGH   1
#define FF_PROFILE_MPEG2_SS     2
#define FF_PROFILE_MPEG2_SNR_SCALABLE  3
#define FF_PROFILE_MPEG2_MAIN   4
#define FF_PROFILE_MPEG2_SIMPLE 5
 
#define FF_PROFILE_H264_CONSTRAINED  (1<<9)  // 8+1; constraint_set1_flag
#define FF_PROFILE_H264_INTRA        (1<<11) // 8+3; constraint_set3_flag
 
#define FF_PROFILE_H264_BASELINE             66
#define FF_PROFILE_H264_CONSTRAINED_BASELINE (66|FF_PROFILE_H264_CONSTRAINED)
#define FF_PROFILE_H264_MAIN                 77
#define FF_PROFILE_H264_EXTENDED             88
#define FF_PROFILE_H264_HIGH                 100
#define FF_PROFILE_H264_HIGH_10              110
#define FF_PROFILE_H264_HIGH_10_INTRA        (110|FF_PROFILE_H264_INTRA)
#define FF_PROFILE_H264_HIGH_422             122
#define FF_PROFILE_H264_HIGH_422_INTRA       (122|FF_PROFILE_H264_INTRA)
#define FF_PROFILE_H264_HIGH_444             144
#define FF_PROFILE_H264_HIGH_444_PREDICTIVE  244
#define FF_PROFILE_H264_HIGH_444_INTRA       (244|FF_PROFILE_H264_INTRA)
#define FF_PROFILE_H264_CAVLC_444            44
 
#define FF_PROFILE_VC1_SIMPLE   0
#define FF_PROFILE_VC1_MAIN     1
#define FF_PROFILE_VC1_COMPLEX  2
#define FF_PROFILE_VC1_ADVANCED 3
 
#define FF_PROFILE_MPEG4_SIMPLE                     0
#define FF_PROFILE_MPEG4_SIMPLE_SCALABLE            1
#define FF_PROFILE_MPEG4_CORE                       2
#define FF_PROFILE_MPEG4_MAIN                       3
#define FF_PROFILE_MPEG4_N_BIT                      4
#define FF_PROFILE_MPEG4_SCALABLE_TEXTURE           5
#define FF_PROFILE_MPEG4_SIMPLE_FACE_ANIMATION      6
#define FF_PROFILE_MPEG4_BASIC_ANIMATED_TEXTURE     7
#define FF_PROFILE_MPEG4_HYBRID                     8
#define FF_PROFILE_MPEG4_ADVANCED_REAL_TIME         9
#define FF_PROFILE_MPEG4_CORE_SCALABLE             10
#define FF_PROFILE_MPEG4_ADVANCED_CODING           11
#define FF_PROFILE_MPEG4_ADVANCED_CORE             12
#define FF_PROFILE_MPEG4_ADVANCED_SCALABLE_TEXTURE 13
#define FF_PROFILE_MPEG4_SIMPLE_STUDIO             14
#define FF_PROFILE_MPEG4_ADVANCED_SIMPLE           15

七丶FFmpeg 结构体: AVIOContext 分析

在上文我们学习了AVCodecContext结构体的相关内容。本文,我们将讲述一下AVIOContext。

AVIOContext是FFMPEG管理输入输出数据的结构体。下面我们来分析一下该结构体里重要变量的含义和作用。

1.源码整理

首先我们先看一下结构体AVIOContext的定义的结构体源码(位于libavformat/avio.h):

 /**
 * Bytestream IO Context.
 * New fields can be added to the end with minor version bumps.
 * Removal, reordering and changes to existing fields require a major
 * version bump.
 * sizeof(AVIOContext) must not be used outside libav*.
 *
 * @note None of the function pointers in AVIOContext should be called
 *       directly, they should only be set by the client application
 *       when implementing custom I/O. Normally these are set to the
 *       function pointers specified in avio_alloc_context()
 */
typedef struct {
    /**
     * A class for private options.
     *
     * If this AVIOContext is created by avio_open2(), av_class is set and
     * passes the options down to protocols.
     *
     * If this AVIOContext is manually allocated, then av_class may be set by
     * the caller.
     *
     * warning -- this field can be NULL, be sure to not pass this AVIOContext
     * to any av_opt_* functions in that case.
     */
    AVClass *av_class;
    unsigned char *buffer;  /**< Start of the buffer. */
    int buffer_size;        /**< Maximum buffer size */
    unsigned char *buf_ptr; /**< Current position in the buffer */
    unsigned char *buf_end; /**< End of the data, may be less than
                                 buffer+buffer_size if the read function returned
                                 less data than requested, e.g. for streams where
                                 no more data has been received yet. */
    void *opaque;           /**< A private pointer, passed to the read/write/seek/...
                                 functions. */
    int (*read_packet)(void *opaque, uint8_t *buf, int buf_size);
    int (*write_packet)(void *opaque, uint8_t *buf, int buf_size);
    int64_t (*seek)(void *opaque, int64_t offset, int whence);
    int64_t pos;            /**< position in the file of the current buffer */
    int must_flush;         /**< true if the next seek should flush */
    int eof_reached;        /**< true if eof reached */
    int write_flag;         /**< true if open for writing */
    int max_packet_size;
    unsigned long checksum;
    unsigned char *checksum_ptr;
    unsigned long (*update_checksum)(unsigned long checksum, const uint8_t *buf, unsigned int size);
    int error;              /**< contains the error code or 0 if no error happened */
    /**
     * Pause or resume playback for network streaming protocols - e.g. MMS.
     */
    int (*read_pause)(void *opaque, int pause);
    /**
     * Seek to a given timestamp in stream with the specified stream_index.
     * Needed for some network streaming protocols which don't support seeking
     * to byte position.
     */
    int64_t (*read_seek)(void *opaque, int stream_index,
                         int64_t timestamp, int flags);
    /**
     * A combination of AVIO_SEEKABLE_ flags or 0 when the stream is not seekable.
     */
    int seekable;
 
    /**
     * max filesize, used to limit allocations
     * This field is internal to libavformat and access from outside is not allowed.
     */
     int64_t maxsize;
} AVIOContext;

2.AVIOContext 重点字段

AVIOContext中有以下几个变量比较重要:

unsigned char *buffer:缓存开始位置

int buffer_size:缓存大小(默认32768)

unsigned char *buf_ptr:当前指针读取到的位置

unsigned char *buf_end:缓存结束的位置

void *opaque:URLContext结构体

在解码的情况下,buffer用于存储ffmpeg读入的数据。例如打开一个视频文件的时候,先把数据从硬盘读入buffer,然后再送给解码器用于解码。

其中opaque指向了URLContext。注意,这个结构体并不在FFMPEG提供的头文件中,而是在FFMPEG的源代码中。从FFMPEG源代码中翻出的定义如下所示:

typedef struct URLContext {
    const AVClass *av_class; ///< information for av_log(). Set by url_open().
    struct URLProtocol *prot;
    int flags;
    int is_streamed;  /**< true if streamed (no seek possible), default = false */
    int max_packet_size;  /**< if non zero, the stream is packetized with this max packet size */
    void *priv_data;
    char *filename; /**< specified URL */
    int is_connected;
    AVIOInterruptCB interrupt_callback;
} URLContext;

URLContext结构体中还有一个结构体URLProtocol。注:每种协议(rtp,rtmp,file等)对应一个URLProtocol。这个结构体也不在FFMPEG提供的头文件中。从FFMPEG源代码中翻出其定义:

typedef struct URLProtocol {
    const char *name;
    int (*url_open)(URLContext *h, const char *url, int flags);
    int (*url_read)(URLContext *h, unsigned char *buf, int size);
    int (*url_write)(URLContext *h, const unsigned char *buf, int size);
    int64_t (*url_seek)(URLContext *h, int64_t pos, int whence);
    int (*url_close)(URLContext *h);
    struct URLProtocol *next;
    int (*url_read_pause)(URLContext *h, int pause);
    int64_t (*url_read_seek)(URLContext *h, int stream_index,
        int64_t timestamp, int flags);
    int (*url_get_file_handle)(URLContext *h);
    int priv_data_size;
    const AVClass *priv_data_class;
    int flags;
    int (*url_check)(URLContext *h, int mask);
} URLProtocol;

在这个结构体中,除了一些回调函数接口之外,有一个变量const char *name,该变量存储了协议的名称。每一种输入协议都对应这样一个结构体。

比如说,文件协议中代码如下(file.c):

URLProtocol ff_file_protocol = {
    .name                = "file",
    .url_open            = file_open,
    .url_read            = file_read,
    .url_write           = file_write,
    .url_seek            = file_seek,
    .url_close           = file_close,
    .url_get_file_handle = file_get_handle,
    .url_check           = file_check,
};

libRTMP中代码如下(libRTMP.c):

URLProtocol ff_rtmp_protocol = {
    .name                = "rtmp",
    .url_open            = rtmp_open,
    .url_read            = rtmp_read,
    .url_write           = rtmp_write,
    .url_close           = rtmp_close,
    .url_read_pause      = rtmp_read_pause,
    .url_read_seek       = rtmp_read_seek,
    .url_get_file_handle = rtmp_get_file_handle,
};
