How to Understand the Implementation of printf's Variadic Arguments

Posted by lida2003


Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers how printf's variadic arguments are implemented, and we hope you find it useful.


1. The Question of printf's Variable Arguments

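More than ten years have passed since I wrote "Linux Application Programming: Hello World for Beginners" in 2011.

Anyone with a bit of C background, or who has even glanced at C, probably knows that Hello World code well enough to recite it from memory.

C courses include a chapter on declaring functions: their parameters (pass by value, pass by address) and return values.

After finishing that chapter, did anyone ever ask how printf itself is declared and defined?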

2. The printf Declaration and Its Documentation

2.1 printf declaration

int printf(const char *format, ...);

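2.2 Linux man page

Note that plain man printf opens section 1. That section documents the printf shell command from GNU coreutils, not the C library function. The C function is documented in section 3 (man 3 printf), as the SEE ALSO entry at the end of the output points out. Section 1 is still instructive, because its FORMAT string follows the same conventions as C printf.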

$ man printf

PRINTF(1)                                                                  User Commands                                                                  PRINTF(1)

NAME
       printf - format and print data

SYNOPSIS
       printf FORMAT [ARGUMENT]...
       printf OPTION

DESCRIPTION
       Print ARGUMENT(s) according to FORMAT, or execute according to OPTION:

       --help display this help and exit

       --version
              output version information and exit

       FORMAT controls the output as in C printf.  Interpreted sequences are:

       \"     double quote

       \\     backslash

       \a     alert (BEL)

       \b     backspace

       \c     produce no further output

       \e     escape

       \f     form feed

       \n     new line

       \r     carriage return

       \t     horizontal tab

       \v     vertical tab

       \NNN   byte with octal value NNN (1 to 3 digits)

       \xHH   byte with hexadecimal value HH (1 to 2 digits)

       \uHHHH Unicode (ISO/IEC 10646) character with hex value HHHH (4 digits)

       \UHHHHHHHH
              Unicode character with hex value HHHHHHHH (8 digits)

       %%     a single %

       %b     ARGUMENT as a string with '\' escapes interpreted, except that octal escapes are of the form \0 or \0NNN

       %q     ARGUMENT is printed in a format that can be reused as shell input, escaping non-printable characters with the proposed POSIX $'' syntax.

       and all C format specifications ending with one of diouxXfeEgGcs, with ARGUMENTs converted to proper type first.  Variable widths are handled.

       NOTE:  your  shell  may have its own version of printf, which usually supersedes the version described here.  Please refer to your shell's documentation for
       details about the options it supports.

AUTHOR
       Written by David MacKenzie.

REPORTING BUGS
       GNU coreutils online help: <https://www.gnu.org/software/coreutils/>
       Report printf translation bugs to <https://translationproject.org/team/>

COPYRIGHT
       Copyright © 2018 Free Software Foundation, Inc.  License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
       This is free software: you are free to change and redistribute it.  There is NO WARRANTY, to the extent permitted by law.

SEE ALSO
       printf(3)

       Full documentation at: <https://www.gnu.org/software/coreutils/printf>
       or available locally via: info '(coreutils) printf invocation'

GNU coreutils 8.30                                                         September 2019                                                                 PRINTF(1)

2.3 C standard library documentation

[1] cplusplus.com/reference/cstdio
[2] tutorialspoint.com/c_standard_library

int printf ( const char * format, ... );

Print formatted data to stdout
Writes the C string pointed by format to the standard output (stdout). If format includes format specifiers (subsequences beginning with %), the additional arguments following format are formatted and inserted in the resulting string replacing their respective specifiers.
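"The additional arguments following format are formatted and inserted" has a precise meaning. Each conversion specifier consumes the next argument in order. A * in the width or precision consumes one extra int argument before the value itself.

The sketch below shows this with snprintf, which follows the same format rules as printf but writes into a buffer. The helper names pad_int and clip_str are hypothetical and exist only for this article. The negative-width and negative-precision cases match the width_asterics and precision branches in the glibc excerpt later in this article.

```c
#include <stdio.h>

/* "%*d": the field width comes from the int argument `width`,
   consumed before `value`; a negative width means left-justify. */
static int pad_int(char *buf, size_t size, int width, int value)
{
    return snprintf(buf, size, "[%*d]", width, value);
}

/* "%.*s": the precision comes from the int argument `prec`;
   a negative precision is treated as if none had been given. */
static int clip_str(char *buf, size_t size, int prec, const char *s)
{
    return snprintf(buf, size, "[%.*s]", prec, s);
}
```

Example outputs:

- pad_int(buf, sizeof buf, 5, 42) writes "[   42]".
- pad_int(buf, sizeof buf, -5, 42) writes "[42   ]".
- clip_str(buf, sizeof buf, 3, "abcdef") writes "[abc]".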

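3. Function Definitions in C

Let's return to a C tutorial (C Tutorial), find the chapter on functions, and see how it defines parameters:

Note: there are plenty of tutorials, and they mostly say the same thing. I picked an English one because its definitions and explanations tend to be more precise than those of most self-styled domestic "experts". After all, computer programming was invented over there.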

Parameters − A parameter is like a placeholder. When a function is invoked, you pass a value to the parameter. This value is referred to as actual parameter or argument. The parameter list refers to the type, order, and number of the parameters of a function. Parameters are optional; that is, a function may contain no parameters.

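Read literally, this definition describes an "actual parameter or argument", that is, the value actually passed in. It says nothing about variable arguments.

So something already feels different. Do printf's variable arguments contradict this definition of parameters?

The three dots in printf's declaration sit where a second parameter would go, and we usually call them variadic arguments. They stand for zero or more additional arguments whose number and types the prototype does not fix. The real question, then, is what these variadic arguments are, and how C itself makes them work.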

int printf(const char *format, ...);
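Within C itself, the answer is the <stdarg.h> interface. Below is a minimal sketch of a variadic function declared the same way as printf: one named parameter, then the three dots. The name sum_ints is hypothetical.

The called function cannot discover by itself how many arguments were passed or what their types are, so it needs a convention. Here the caller passes an explicit count. printf instead derives both the number and the types of its arguments from the format string. At least until C23, va_start also needs a named parameter before the three dots, which is why the count comes first.

```c
#include <stdarg.h>

/* Adds `count` int arguments that follow the named parameter. */
static long sum_ints(int count, ...)
{
    va_list ap;
    long total = 0;

    va_start(ap, count);              /* position ap after the last named parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);     /* fetch the next argument, read as an int */
    va_end(ap);                       /* finish with ap before returning */
    return total;
}
```

For example, sum_ints(3, 10, 20, 12) returns 42, and sum_ints(0) returns 0.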

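4. Where printf Comes From

Take the binary from "Linux Application Programming: Hello World for Beginners" and check where the only printf in its main program comes from.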

$ ldd hello_world
        linux-vdso.so.1 (0x00007ffeb5b65000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe405634000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fe405853000)

linux-vdso.so: the virtual dynamic shared object (vDSO), which the kernel maps into every process.
ld-linux-x86-64.so: the dynamic linker/loader.

This shows that printf comes from libc.so. Since this is an Ubuntu 20.04 LTS system, the library in use is libc.so.6.

In the early 1990s, the developers of the Linux kernel forked glibc. Their fork, called “Linux libc”, was maintained separately for years and released versions 2 through 5.
When FSF released glibc 2.0 in January 1997, … At this point, the Linux kernel developers discontinued their fork and returned to using FSF’s glibc.[6]
The last used version of Linux libc used the internal name (soname) libc.so.5. Following on from this, glibc 2.x on Linux uses the soname libc.so.6

Different Linux systems use different C libraries. On x86 it is mostly glibc, while some embedded systems avoid glibc in favor of picolibc, uClibc, klibc and the like.

5. The printf Implementation

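Since we are on Ubuntu 20.04, which uses glibc, we will read the source of glibc 2.35, the latest official release at the time of writing. A stock Ubuntu 20.04 actually ships glibc 2.31, so the installed library may differ in detail from the code shown here.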
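The glibc on your system and the source tree you are reading can differ. It can therefore help to check which glibc a program is built against and which one it loads at run time.

This is a minimal sketch, assuming a glibc system:

- __GLIBC__ and __GLIBC_MINOR__ are glibc's compile-time version macros.
- gnu_get_libc_version() from <gnu/libc-version.h> reports the version loaded at run time.

The helper names runtime_libc_version and compile_time_glibc are hypothetical. On other C libraries the code falls back to "unknown".

```c
#include <stdio.h>   /* on glibc this pulls in <features.h>, which defines __GLIBC__ */

#ifdef __GLIBC__
#include <gnu/libc-version.h>
#endif

/* Version string of the C library loaded at run time (glibc only). */
static const char *runtime_libc_version(void)
{
#ifdef __GLIBC__
    return gnu_get_libc_version();
#else
    return "unknown";
#endif
}

/* Writes the glibc version the program was compiled against into buf. */
static int compile_time_glibc(char *buf, size_t size)
{
#ifdef __GLIBC__
    return snprintf(buf, size, "%d.%d", __GLIBC__, __GLIBC_MINOR__);
#else
    return snprintf(buf, size, "unknown");
#endif
}
```

On a stock Ubuntu 20.04, both helpers should report 2.31.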

A first look at the code shows that the key pieces are va_arg/va_start/va_end and __vfprintf_internal. Each of these may be a macro or a function.

glibc-2.35/stdio-common/printf.c
#include <libioP.h>
#include <stdarg.h>
#include <stdio.h>

#undef printf

/* Write formatted output to stdout from the format string FORMAT.  */
/* VARARGS1 */
int
__printf (const char *format, ...)
{
  va_list arg;
  int done;

  va_start (arg, format);
  done = __vfprintf_internal (stdout, format, arg, 0);
  va_end (arg);

  return done;
}

#undef _IO_printf
ldbl_strong_alias (__printf, printf);
ldbl_strong_alias (__printf, _IO_printf);
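__printf does very little itself. It packs its variable arguments into a va_list and hands them to a function whose last parameter is that va_list.

The same pattern can be written against the public API. In the sketch below, vsnprintf and vprintf stand in for the internal __vfprintf_internal. The names format_into and my_printf are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

/* Same shape as glibc's __printf: gather the variable arguments into a
   va_list, pass them to a function that takes a va_list, then clean up. */
static int format_into(char *buf, size_t size, const char *format, ...)
{
    va_list ap;
    int done;

    va_start(ap, format);
    done = vsnprintf(buf, size, format, ap);   /* public va_list-taking counterpart */
    va_end(ap);

    return done;
}

/* The stdout variant: identical, except that vprintf does the work. */
static int my_printf(const char *format, ...)
{
    va_list ap;
    int done;

    va_start(ap, format);
    done = vprintf(format, ap);
    va_end(ap);

    return done;
}
```

With GCC or Clang, you can declare such wrappers with __attribute__((format(printf, 3, 4))), or (printf, 1, 2) for my_printf. The compiler then checks the arguments against the format string, just as it does for printf.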

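From printf.c we can pick out the following leads:

  1. The macros va_arg/va_start/va_end from stdarg.h
  2. The external function __vfprintf_internal, declared in libioP.h
  3. stdarg.h is not part of the glibc-2.35 source tree. It ships with the compiler, so we need to look in the GCC project.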
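Item 3 matters because these macros are not ordinary library code. How variadic arguments are laid out depends on the calling convention. On x86-64 Linux, for instance, the first few arguments travel in registers, and va_list is a small structure that tracks a register save area and the stack. Only the compiler can implement that.

GCC's stdarg.h defines va_list, va_start, va_arg and va_end as thin wrappers around compiler builtins such as __builtin_va_start and __builtin_va_arg. The exact mapping can vary between GCC versions.

The sketch below assumes GCC or Clang and calls the builtins directly to make this visible. The name max_of is hypothetical, and real code should simply include <stdarg.h>.

```c
/* Assumes GCC or Clang, which provide these builtins without any header. */
static int max_of(int count, ...)             /* assumes count >= 1 */
{
    __builtin_va_list ap;                     /* what va_list expands to */
    __builtin_va_start(ap, count);            /* what va_start expands to */

    int best = __builtin_va_arg(ap, int);     /* what va_arg expands to */
    for (int i = 1; i < count; i++) {
        int v = __builtin_va_arg(ap, int);
        if (v > best)
            best = v;
    }

    __builtin_va_end(ap);                     /* what va_end expands to */
    return best;
}
```

For example, max_of(4, 3, 9, -2, 7) returns 9.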

5.1 __vfprintf_internal

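Searching further in glibc 2.35, we find that vfprintf implements __vfprintf_internal. Its main job is to format the string and write it out through the stdio layer. The code is long, and readers who want the details can follow it further. Notice, however, that this function is no longer variadic. Its arguments arrive as a single va_list parameter, so it fits the textbook definition of a C function.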

So the variadic machinery itself must live in the va_list type and the va_start/va_arg/va_end macros defined in GCC's stdarg.h header.
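One more stdarg.h macro helps in reading the vfprintf excerpt: va_copy. A va_list can be walked only once, and on some ABIs it is not a plain scalar. On x86-64 it is an array type, so ap_save = ap would not even compile.

That is why vfprintf saves the original argument pointer with __va_copy, the older spelling of C99's va_copy, and falls back to plain assignment only when that macro is missing. The saved copy appears to exist so that the positional-argument path (goto do_positional) can start over from the first argument.

The sketch below walks the same arguments twice: once on a copy to measure the total length, once on the original to copy the strings. The name concat is hypothetical.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Joins `count` strings into a newly allocated buffer (caller frees).
   The arguments are walked twice, so the first pass runs on a copy. */
static char *concat(int count, ...)
{
    va_list ap, measure;
    size_t total = 0;

    va_start(ap, count);

    va_copy(measure, ap);                     /* independent cursor over the same arguments */
    for (int i = 0; i < count; i++)
        total += strlen(va_arg(measure, const char *));
    va_end(measure);                          /* every va_copy needs its own va_end */

    char *out = malloc(total + 1);
    if (out != NULL) {
        char *p = out;
        for (int i = 0; i < count; i++) {     /* second pass, on the original ap */
            const char *s = va_arg(ap, const char *);
            size_t len = strlen(s);
            memcpy(p, s, len);
            p += len;
        }
        *p = '\0';
    }
    va_end(ap);
    return out;
}
```

For example, concat(3, "va", "_", "copy") returns a new string "va_copy".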

stdio-common/vfprintf-internal.c

# define vfprintf	__vfprintf_internal

int
vfprintf (FILE *s, const CHAR_T *format, va_list ap, unsigned int mode_flags)
{
  /* The character used as thousands separator.  */
  THOUSANDS_SEP_T thousands_sep = 0;

  /* The string describing the size of groups of digits.  */
  const char *grouping;

  /* Place to accumulate the result.  */
  int done;

  /* Current character in format string.  */
  const UCHAR_T *f;

  /* End of leading constant string.  */
  const UCHAR_T *lead_str_end;

  /* Points to next format specifier.  */
  const UCHAR_T *end_of_spec;

  /* Buffer intermediate results.  */
  CHAR_T work_buffer[WORK_BUFFER_SIZE];
  CHAR_T *workend;

  /* We have to save the original argument pointer.  */
  va_list ap_save;

  /* Count number of specifiers we already processed.  */
  int nspecs_done;

  /* For the %m format we may need the current `errno' value.  */
  int save_errno = errno;

  /* 1 if format is in read-only memory, -1 if it is in writable memory,
     0 if unknown.  */
  int readonly_format = 0;

  /* Orient the stream.  */
#ifdef ORIENT
  ORIENT;
#endif

  /* Sanity check of arguments.  */
  ARGCHECK (s, format);

#ifdef ORIENT
  /* Check for correct orientation.  */
  if (_IO_vtable_offset (s) == 0
      && _IO_fwide (s, sizeof (CHAR_T) == 1 ? -1 : 1)
      != (sizeof (CHAR_T) == 1 ? -1 : 1))
    /* The stream is already oriented otherwise.  */
    return EOF;
#endif

  if (UNBUFFERED_P (s))
    /* Use a helper function which will allocate a local temporary buffer
       for the stream and then call us again.  */
    return buffered_vfprintf (s, format, ap, mode_flags);

  /* Initialize local variables.  */
  done = 0;
  grouping = (const char *) -1;
#ifdef __va_copy
  /* This macro will be available soon in gcc's <stdarg.h>.  We need it
     since on some systems `va_list' is not an integral type.  */
  __va_copy (ap_save, ap);
#else
  ap_save = ap;
#endif
  nspecs_done = 0;

#ifdef COMPILE_WPRINTF
  /* Find the first format specifier.  */
  f = lead_str_end = __find_specwc ((const UCHAR_T *) format);
#else
  /* Find the first format specifier.  */
  f = lead_str_end = __find_specmb ((const UCHAR_T *) format);
#endif

  /* Lock stream.  */
  _IO_cleanup_region_start ((void (*) (void *)) &_IO_funlockfile, s);
  _IO_flockfile (s);

  /* Write the literal text before the first format.  */
  outstring ((const UCHAR_T *) format,
	     lead_str_end - (const UCHAR_T *) format);

  /* If we only have to print a simple string, return now.  */
  if (*f == L_('\0'))
    goto all_done;

  /* Use the slow path in case any printf handler is registered.  */
  if (__glibc_unlikely (__printf_function_table != NULL
			|| __printf_modifier_table != NULL
			|| __printf_va_arg_table != NULL))
    goto do_positional;

  /* Process whole format string.  */
  do
    {
      STEP0_3_TABLE;
      STEP4_TABLE;

      int is_negative;	/* Flag for negative number.  */
      union
      {
	unsigned long long int longlong;
	unsigned long int word;
      } number;
      int base;
      union printf_arg the_arg;
      CHAR_T *string;	/* Pointer to argument string.  */
      int alt = 0;	/* Alternate format.  */
      int space = 0;	/* Use space prefix if no sign is needed.  */
      int left = 0;	/* Left-justify output.  */
      int showsign = 0;	/* Always begin with plus or minus sign.  */
      int group = 0;	/* Print numbers according grouping rules.  */
      /* Argument is long double/long long int.  Only used if
	 double/long double or long int/long long int are distinct.  */
      int is_long_double __attribute__ ((unused)) = 0;
      int is_short = 0;	/* Argument is short int.  */
      int is_long = 0;	/* Argument is long int.  */
      int is_char = 0;	/* Argument is promoted (unsigned) char.  */
      int width = 0;	/* Width of output; 0 means none specified.  */
      int prec = -1;	/* Precision of output; -1 means none specified.  */
      /* This flag is set by the 'I' modifier and selects the use of the
	 `outdigits' as determined by the current locale.  */
      int use_outdigits = 0;
      UCHAR_T pad = L_(' ');/* Padding character.  */
      CHAR_T spec;

      workend = work_buffer + WORK_BUFFER_SIZE;

      /* Get current character in format string.  */
      JUMP (*++f, step0_jumps);

      /* ' ' flag.  */
    LABEL (flag_space):
      space = 1;
      JUMP (*++f, step0_jumps);

      /* '+' flag.  */
    LABEL (flag_plus):
      showsign = 1;
      JUMP (*++f, step0_jumps);

      /* The '-' flag.  */
    LABEL (flag_minus):
      left = 1;
      pad = L_(' ');
      JUMP (*++f, step0_jumps);

      /* The '#' flag.  */
    LABEL (flag_hash):
      alt = 1;
      JUMP (*++f, step0_jumps);

      /* The '0' flag.  */
    LABEL (flag_zero):
      if (!left)
	pad = L_('0');
      JUMP (*++f, step0_jumps);

      /* The '\'' flag.  */
    LABEL (flag_quote):
      group = 1;

      if (grouping == (const char *) -1)
	{
#ifdef COMPILE_WPRINTF
	  thousands_sep = _NL_CURRENT_WORD (LC_NUMERIC,
					    _NL_NUMERIC_THOUSANDS_SEP_WC);
#else
	  thousands_sep = _NL_CURRENT (LC_NUMERIC, THOUSANDS_SEP);
#endif

	  grouping = _NL_CURRENT (LC_NUMERIC, GROUPING);
	  if (*grouping == '\0' || *grouping == CHAR_MAX
#ifdef COMPILE_WPRINTF
	      || thousands_sep == L'\0'
#else
	      || *thousands_sep == '\0'
#endif
	      )
	    grouping = NULL;
	}
      JUMP (*++f, step0_jumps);

    LABEL (flag_i18n):
      use_outdigits = 1;
      JUMP (*++f, step0_jumps);

      /* Get width from argument.  */
    LABEL (width_asterics):
      {
	const UCHAR_T *tmp;	/* Temporary value.  */

	tmp = ++f;
	if (ISDIGIT (*tmp))
	  {
	    int pos = read_int (&tmp);

	    if (pos == -1)
	      {
		__set_errno (EOVERFLOW);
		done = -1;
		goto all_done;
	      }

	    if (pos && *tmp == L_('$'))
	      /* The width comes from a positional parameter.  */
	      goto do_positional;
	  }
	width = va_arg (ap, int);

	/* Negative width means left justified.  */
	if (width < 0)
	  {
	    width = -width;
	    pad = L_(' ');
	    left = 1;
	  }
      }
      JUMP (*f, step1_jumps);

      /* Given width in format string.  */
    LABEL (width):
      width = read_int (&f);

      if (__glibc_unlikely (width == -1))
	{
	  __set_errno (EOVERFLOW);
	  done = -1;
	  goto all_done;
	}

      if (*f == L_('$'))
	/* Oh, oh.  The argument comes from a positional parameter.  */
	goto do_positional;
      JUMP (*f, step1_jumps);

    LABEL (precision):
      ++f;
      if (*f == L_('*'))
	{
	  const UCHAR_T *tmp;	/* Temporary value.  */

	  tmp = ++f;
	  if (ISDIGIT (*tmp))
	    {
	      int pos = read_int (&tmp);

	      if (pos == -1)
		{
		  __set_errno (EOVERFLOW);
		  done = -1;
		  goto all_done;
		}

	      if (pos && *tmp == L_('$'))
		/* The precision comes from a positional parameter.  */
		goto do_positional;
	    }
	  prec = va_arg (ap, int);

	  /* If the precision is negative the precision is omitted.  */
	  if (prec < 0)
	    prec = -1;
	}
      else if (ISDIGIT (*f))
	{
	  prec = read_int (&f);

	  /* The precision was specified in this case as an extremely
	     large positive value.  */
	  if (prec == -1)
	    {
	      __set_errno (EOVERFLOW);
	      done = -1;
	      goto all_done;
	    }
	}
      else
	prec = 0;
      JUMP (*f, step2_jumps);

      /* Process 'h' modifier.  There might another 'h' following.  */
    LABEL (mod_half):
      is_short = 1;
      JUMP (*++f, step3a_jumps);

      /* Process 'hh' modifier.  */
    LABEL (mod_halfhalf):
      is_short = 0;
      is_char = 1;
      JUMP (*++f, step4_jumps);

      /* Process 'l' modifier.  There might another 'l' following.  */
    LABEL (mod_long):
      is_long = 1;
      JUMP (*++f, step3b_jumps);

      /* Process 'L', 'q', or 'll' modifier.  No other modifier is
	 allowed to follow.  */
    LABEL (mod_longlong):
      is_long_double = 1;
      is_long = 1;
      JUMP (*++f, step4_jumps);

    LABEL (mod_size_t):
      is_long_double = sizeof (size_t) > sizeof (unsigned long int);
      is_long = sizeof (size_t) > sizeof (unsigned int);
      JUMP (*++f, step4_jumps);

    LABEL (mod_ptrdiff_t):
      is_long_double = sizeof (ptrdiff_t) > sizeof (unsigned long int);
      is_long = sizeof (ptrdiff_t) > sizeof (unsigned int);
      JUMP (*++f, step4_jumps);

    LABEL (mod_intmax_t):
      is_long_double = sizeof (intmax_t) > sizeof (unsigned long int);
      is_long = sizeof (intmax_t) > sizeof (unsigned int);
      JUMP (*++f, step4_jumps);

      /* Process current format.  */
      while (1)
	{
#define process_arg_int() va_arg (ap, int)
#define process_arg_long_int() va_arg (ap, long int)
#define process_arg_long_long_int() va_arg (ap, long long int)
#define process_arg_pointer() va_arg (ap, void *)
#define process_arg_string() va_arg (ap, const char *)
#define process_arg_unsigned_int() va_arg (ap, unsigned int)
#define process_arg_unsigned_long_int() va_arg (ap, unsigned long int)
#define process_arg_unsigned_long_long_int() va_arg (ap, unsigned long long int)
#define process_arg_wchar_t() va_arg (ap, wchar_t)
#define process_arg_wstring() va_arg (ap, const wchar_t *)
	  process_arg ();
	  process_string_arg ();
#undef process_arg_int
#undef process_arg_long_int
#undef process_arg_long_long_int
#undef process_arg_pointer
#undef process_arg_string
#undef process_arg_unsigned_int
#undef process_arg_unsigned_long_int
#undef process_arg_unsigned_long_long_int
#undef process_arg_wchar_t
#undef process_arg_wstring

	LABEL (form_float):
	LABEL (form_floathex):
	  {
	    if (__glibc_unlikely ((mode_flags & PRINTF_LDBL_IS_DBL) != 0))
	      is_long_double = 0;

	    struct printf_info info =
	      {
		.prec = prec,
		.width = width,
		.spec = spec,
		.is_long_double = is_long_double,
		/* ... (the rest of vfprintf is omitted from this excerpt) ... */
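The process_arg_* macros in the excerpt show the core idea: the format string, not the function signature, decides which type each va_arg call reads.

The toy formatter below is a minimal sketch of that idea only:

- It supports %d, %u, %x, %c, %s and %%.
- It uses a plain switch where glibc uses jump tables (STEP0_3_TABLE, JUMP).
- It omits flags, width and precision.

The names tiny_format, put and put_unsigned are hypothetical.

Note that %c reads an int. Arguments passed through the three dots undergo the default argument promotions: char and short become int, and float becomes double. That is also why the excerpt's is_char comment speaks of a "promoted (unsigned) char".

```c
#include <stdarg.h>
#include <stddef.h>

/* Stores c if there is room (keeping one byte for the final '\0')
   and always counts it, as snprintf does. */
static void put(char *buf, size_t size, size_t *len, char c)
{
    if (*len + 1 < size)
        buf[*len] = c;
    (*len)++;
}

/* Writes v in base 10 or 16, most significant digit first. */
static void put_unsigned(char *buf, size_t size, size_t *len,
                         unsigned long v, unsigned base)
{
    char digits[32];
    int n = 0;

    do {
        digits[n++] = "0123456789abcdef"[v % base];
        v /= base;
    } while (v != 0);
    while (n > 0)
        put(buf, size, len, digits[--n]);
}

/* Toy printf core: the format string alone decides which type each
   va_arg call fetches. Returns the untruncated output length. */
static size_t tiny_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;

    va_start(ap, fmt);
    for (const char *f = fmt; *f != '\0'; f++) {
        if (*f != '%') {
            put(buf, size, &len, *f);          /* literal text */
            continue;
        }
        if (*++f == '\0')                      /* lone '%' at the end: stop */
            break;
        switch (*f) {
        case 'd': {
            int v = va_arg(ap, int);
            /* magnitude via unsigned arithmetic, safe even for INT_MIN */
            unsigned long mag = v < 0 ? 0UL - (unsigned long)v : (unsigned long)v;
            if (v < 0)
                put(buf, size, &len, '-');
            put_unsigned(buf, size, &len, mag, 10);
            break;
        }
        case 'u':
            put_unsigned(buf, size, &len, va_arg(ap, unsigned int), 10);
            break;
        case 'x':
            put_unsigned(buf, size, &len, va_arg(ap, unsigned int), 16);
            break;
        case 'c':                              /* a char argument arrives promoted to int */
            put(buf, size, &len, (char)va_arg(ap, int));
            break;
        case 's':
            for (const char *s = va_arg(ap, const char *); *s != '\0'; s++)
                put(buf, size, &len, *s);
            break;
        case '%':
            put(buf, size, &len, '%');
            break;
        default:                               /* unknown spec: copy it through */
            put(buf, size, &len, '%');
            put(buf, size, &len, *f);
            break;
        }
    }
    va_end(ap);
    if (size > 0)
        buf[len < size ? len : size - 1] = '\0';
    return len;
}
```

For example, tiny_format(buf, sizeof buf, "%s=%d", "x", -42) writes "x=-42". Like snprintf, it returns the full length even when the buffer is too small to hold the result.

The price of this format-driven design is that nothing checks the arguments against the format. Passing an argument whose type does not match its specifier is undefined behavior, both here and in printf.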
