Format numbers with million (M) and billion (B) suffixes

Posted 2023-04-14

【Title】: Format numbers with million (M) and billion (B) suffixes
【Posted】: 2015-01-26 22:29:53
【Question】:

I have a lot of numbers, e.g. currency amounts in dollars:

1 6,000,000
2 75,000,400
3 743,450,000
4 340,000
5 4,300,000

I want to format them using suffixes such as M (million) and B (billion):

1 6.0 M
2 75.0 M
3 743.5 M
4 0.3 M
5 4.3 M

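【Question comments】:

I guess you could do something like paste(as.numeric(gsub(",", "", x))/1e6, "M"), but I'm not sure how pretty that is...

Engineering notation is a subset of scientific notation that tries to keep the exponent of 10 a multiple of 3. Someone has written some R code for it: r.789695.n4.nabble.com/… - I'd suggest starting there and altering the print statements.

@Paul I actually saw that post before asking this question... but couldn't figure out what was going on...

If you have a numeric vector, you could look at this answer by Spacedman and adapt it to your needs. The advantage is that the numeric values are not changed, they are just printed "nicely".

See also Convert numbers to SI prefix and sitools.

【Answer 1】: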

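Obviously you first need to get rid of the commas in the formatted numbers, and gsub("\\,", ...) is the way to go. The function below uses findInterval to select the appropriate suffix for the label and to determine the denominator for a more compact display. It can easily be extended in either direction if you want to go below 1.0 or above 1 trillion: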

comprss <- function(tx) {
      # findInterval picks the magnitude bucket: 1 = units, 2 = K, 3 = M, 4 = B, 5 = T
      div <- findInterval(as.numeric(gsub("\\,", "", tx)), 
         c(0, 1e3, 1e6, 1e9, 1e12) )  # modify this if negative numbers are possible
      paste(round( as.numeric(gsub("\\,","",tx))/10^(3*(div-1)), 2), 
           c("","K","M","B","T")[div] )
}

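If the input is already numeric, you don't need to remove the as.numeric or gsub calls; they are admittedly superfluous in that case, but the function still succeeds. This is the result with Gregor's example (big_x, which is defined in a later answer):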

> comprss (big_x)
 [1] "123 "     "500 "     "999 "     "1.05 K"   "9 K"     
 [6] "49 K"     "105.4 K"  "998 K"    "1.5 M"    "20 M"    
[11] "313.4 M"  "453.12 B"

And with the original input (which is probably a factor variable if it was read in with read.table or read.csv, or created with data.frame):

comprss (dat$V2)
[1] "6 M"      "75 M"     "743.45 M" "340 K"    "4.3 M"

Of course, these can be printed without the quotes, either with an explicit print call using quote = FALSE or by using cat.
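As a quick illustration of those options, here is a small sketch that uses a made-up character vector in place of the function's output (labels is just a placeholder name):

```r
# Placeholder vector standing in for the formatted output shown above.
labels <- c("6 M", "75 M", "340 K")

print(labels, quote = FALSE)  # same layout as the default print, minus the quotes
noquote(labels)               # marks the vector so that it always prints unquoted
cat(labels, sep = "\n")       # writes the raw strings, one per line, no index prefix
```

noquote is a third base R option alongside the two mentioned above; it is handy when the same vector will be printed several times.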

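【Comments】:

Nice solution, but you need to add "T" to the suffix array for trillions. Also, for completeness, you should apply findInterval to the absolute value in order to handle negative numbers (but note that this is not consistent with the ISO behaviour for rounding negative numbers).

Thanks for the perceptive comments. I'm guessing that ISO has no opinion on the differences in the definition of "billion" across the various English-speaking countries. (I do see that the *** article claims the English have abandoned their earlier position, but I never got that memo.)

Ah, sorry, I was only talking about handling negative numbers and rounding away from zero (which is what happens if you use findInterval on the absolute value of tx), as opposed to rounding toward positive infinity.

@42- I was using this function when I ran into some strange behaviour. I asked a question about it here: ***.com/q/46657442/1977587. Linking it back to this post for reference.

After looking at the comments on your question, I suspect your difficulty comes from assuming that R supports zero-based indexing... it does not. That is why items below the first interval boundary are dropped. If you want a vector of the same length as the first argument, you probably need a -Inf boundary and an appropriate value to cover that case.

【Answer 2】: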

If you begin with this numeric vector x,

x <- c(6e+06, 75000400, 743450000, 340000, 4300000)

you could do the following.

paste(format(round(x / 1e6, 1), trim = TRUE), "M")
# [1] "6.0 M"   "75.0 M"  "743.5 M" "0.3 M"   "4.3 M"

If you don't care about trailing zeros, just remove the format() call.

paste(round(x / 1e6, 1), "M")
# [1] "6 M"     "75 M"    "743.5 M" "0.3 M"   "4.3 M"

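Alternatively, you could assign an S3 class with a print method, keeping x numeric underneath. Here I use paste0() to make the result a little more legible.

print.million <- function(x, quote = FALSE, ...) {
    # build the display strings, then hand them to the default print method
    x <- paste0(round(x / 1e6, 1), "M")
    NextMethod(x, quote = quote, ...)
}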

## assign the 'million' class to 'x'
class(x) <- "million"
x
# [1] 6M     75M    743.5M 0.3M   4.3M  
x[] 
# [1]   6000000  75000400 743450000    340000   4300000

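You could do the same for billions and trillions as well. For information on how to put this into a data frame, see this answer, as you will need both a format() and an as.data.frame() method.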
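The linked answer is not reproduced here, so the following is only a minimal sketch of what such methods might look like for the 'million' class. The bodies of format.million and as.data.frame.million are assumptions for illustration, not code from the original answer.

```r
# Assumed S3 methods for the 'million' class (illustrative only).

# format() is what print.data.frame() uses to render each column.
format.million <- function(x, ...) {
    paste0(round(unclass(x) / 1e6, 1), "M")
}

# Reuse base R's generic vector method so that data.frame() accepts the class.
as.data.frame.million <- as.data.frame.vector

x <- c(6e6, 75000400, 340000)
class(x) <- "million"

df <- data.frame(id = 1:3, amount = x)
df                    # the 'amount' column is displayed with the M suffix
unclass(df$amount)    # the stored values are still numeric
```

Row subsetting would also need a `[` method to keep the class; that is left out to keep the sketch short.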

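【Comments】:

This is exactly what I needed. So if my numbers are already numeric, I would take out the gsub bit and keep: round(x/1e6,1)?

No problem! I do understand that I will get character values as output. But yes, my input is numeric.

【Answer 3】: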

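Recent versions of the scales package include functions for printing readable labels. If you use ggplot or the tidyverse, scales is probably already installed, although you may need to update the package.

In this case, label_number_si can be used: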

> library(scales)
> inp <- c(6000000, 75000400, 743450000, 340000, 4300000)
> label_number_si(accuracy=0.1)(inp)
[1] "6.0M"   "75.0M"  "743.4M" "340.0K" "4.3M"
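In newer scales releases (1.2.0 and later, as far as I know), label_number_si() is deprecated in favour of label_number() with a scale_cut argument. Below is a minimal sketch of an equivalent, assuming such a version is installed. label_short is a hypothetical wrapper name, and its prefix argument shows one way a currency symbol could be added.

```r
# Hypothetical wrapper; assumes scales >= 1.2.0 is installed.
label_short <- function(x, prefix = "") {
  scales::label_number(
    accuracy  = 0.1,                        # one decimal place
    prefix    = prefix,                     # e.g. "$" for currency
    scale_cut = scales::cut_short_scale()   # short-scale K, M, B, T breakpoints
  )(x)
}
```

Calling label_short(c(6000000, 340000), prefix = "$") should then return dollar-prefixed labels with M and K suffixes.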

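【Comments】:

In my experience, this is the simplest approach. I don't think a custom function is the right recommendation.

Note that for currency you may want to use prefix = "$" or similar.

【Answer 4】:

Another option, starting from numeric (rather than character) values, that works for millions and billions (and below). You can pass more arguments to formatC to customize the output, and extend it to trillions if needed.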

m_b_format = function(x) {
    # logical masks for the billion and million ranges
    b.index = x >= 1e9
    m.index = x >= 1e5 & x < 1e9

    # default: plain integers with thousands separators
    output = formatC(x, format = "d", big.mark = ",")
    output[b.index] = paste(formatC(x[b.index] / 1e9, digits = 1, format = "f"), "B")
    output[m.index] = paste(formatC(x[m.index] / 1e6, digits = 1, format = "f"), "M")
    return(output)
}

your_x = c(6e6, 75e6 + 400, 743450000, 340000, 43e6)
> m_b_format(your_x)
[1] "6.0 M"   "75.0 M"  "743.5 M" "0.3 M"   "43.0 M" 

big_x = c(123, 500, 999, 1050, 9000, 49000, 105400, 998000,
          1.5e6, 2e7, 313402182, 453123634432)
> m_b_format(big_x)
 [1] "123"     "500"     "999"    "1,050"   "9,000"    "49,000"
 [7] "0.1 M"   "1.0 M"   "1.5 M"  "20.0 M"  "313.4 M"  "453.1 B"
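Extending the same pattern to trillions could look like the sketch below. m_b_t_format is a hypothetical name, and the 1e12 cut-off and the "T" suffix are assumptions that simply mirror the existing billion branch.

```r
# Sketch: the function above with an extra (assumed) trillion branch.
m_b_t_format <- function(x) {
    t.index <- x >= 1e12
    b.index <- x >= 1e9 & x < 1e12
    m.index <- x >= 1e5 & x < 1e9

    # default: plain integers with thousands separators
    output <- formatC(x, format = "d", big.mark = ",")
    output[t.index] <- paste(formatC(x[t.index] / 1e12, digits = 1, format = "f"), "T")
    output[b.index] <- paste(formatC(x[b.index] / 1e9, digits = 1, format = "f"), "B")
    output[m.index] <- paste(formatC(x[m.index] / 1e6, digits = 1, format = "f"), "M")
    output
}

m_b_t_format(c(340000, 2e7, 5e9, 2.5e12))
# expected: "0.3 M" "20.0 M" "5.0 B" "2.5 T"
```

The billion mask now needs an upper bound so that trillion values are not also labelled as billions.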

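【Comments】:

【Answer 5】:

This borrows from the other answers and adds to them, with the main aim of producing pretty labels for ggplot2 axes. And yes, it handles positive values only (negative values are left as they are), since I usually want these suffixes only for positive quantities. It is easy to extend to negative numbers.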

# Format numbers with suffixes K, M, B, T and optional rounding. Vectorized
# Main purpose: pretty formatting axes for plots produced by ggplot2
#
# Usage in ggplot2: scale_x_continuous(labels = suffix_formatter)

suffix_formatter <- function(x, digits = NULL)
{
    intl <- c(1e3, 1e6, 1e9, 1e12);
    suffixes <- c('K', 'M', 'B', 'T');

    i <- findInterval(x, intl);

    result <- character(length(x));

    # Note: for ggplot2 the last label element of x is NA, so we need to handle it
    ind_format <- !is.na(x) & i > 0;

    # Format only the elements that need to be formatted 
    # with suffixes and possible rounding
    result[ind_format] <- paste0(
        formatC(x[ind_format]/intl[i[ind_format]], format = "f", digits = digits)
        ,suffixes[i[ind_format]]
    );
    # And leave the rest with no changes
    result[!ind_format] <- as.character(x[!ind_format]);

    return(invisible(result));
}

And a usage example.

library(ggplot2)
x <- seq(1:10);
d <- data.frame(x = x, y = 10^x);
ggplot(aes(x=x, y=y), data = d) + geom_line() + scale_y_log10()

without suffix formatter

ggplot(aes(x=x, y=y), data = d) + geom_line() + scale_y_log10(labels = suffix_formatter)

with suffix formatter

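【Comments】:

Thanks! I added handling for negative numbers and posted it below: ***.com/a/56449202/496209

【Answer 6】:

Like @Alex Poklonskiy, I needed a formatter for charts, but I also needed a version that supports negative numbers. Here is his function with my adjustments (although I'm no R programming expert):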

number_format <- function(x, digits = NULL)
{
  intl <- c(1e3, 1e6, 1e9, 1e12)
  suffixes <- c(' K', ' M', ' B', ' T')

  i <- findInterval(x, intl)

  i_neg <- findInterval(-x, intl)

  result <- character(length(x))

  # Note: for ggplot2 the last label element of x is NA, so we need to handle it
  ind_format <- !is.na(x) & i > 0
  neg_format <- !is.na(x) & i_neg > 0

  # Format only the elements that need to be formatted
  # with suffixes and possible rounding
  result[ind_format] <- paste0(
    formatC(x[ind_format] / intl[i[ind_format]], format = "f", digits = digits),
    suffixes[i[ind_format]]
  )
  # Format negative numbers
  result[neg_format] <- paste0(
    formatC(x[neg_format] / intl[i_neg[neg_format]], format = "f", digits = digits),
    suffixes[i_neg[neg_format]]
  )

  # To the rest only apply rounding
  result[!ind_format & !neg_format] <- as.character(
    formatC(x[!ind_format & !neg_format], format = "f", digits = digits)
  )

  return(invisible(result))
}

I also changed it so that the digits argument is used to round values that get no suffix (e.g. 1.23434546).

Example usage:

> print( number_format(c(1.2325353, 500, 132364584563, 5.67e+9, -2.45e+7, -1.2333, -55)) )
[1] "1.2325"     "500.0000"   "132.3646 B" "5.6700 B"   "-24.5000 M" "-1.2333"    "-55.0000"  
> print( number_format(c(1.2325353, 500, 132364584563, 5.67e+9, -2.45e+7, -1.2333, -55), digits = 2) )
[1] "1.23"     "500.00"   "132.36 B" "5.67 B"   "-24.50 M" "-1.23"    "-55.00"

【Comments】:

【Answer 7】:

dplyr's case_when now offers a friendlier solution for this, for example:

library(dplyr)

format_bignum = function(n) {
  # conditions are checked in order, so test the largest magnitude first
  case_when(
    n >= 1e12 ~ paste(round(n/1e12), 'Tn'),
    n >= 1e9 ~ paste(round(n/1e9), 'Bn'),
    n >= 1e6 ~ paste(round(n/1e6), 'M'),
    n >= 1e3 ~ paste(round(n/1e3), 'K'),
    TRUE ~ as.character(n))
}

Alternatively, you could embed the case_when bit in a mutate call.
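A minimal sketch of the mutate variant follows, assuming a data frame with a numeric column named revenue. Both that column name and the helper name add_revenue_label are made up for illustration.

```r
# Hypothetical helper: adds a character `label` column next to a numeric
# `revenue` column, using case_when inside mutate (requires dplyr).
add_revenue_label <- function(df) {
  dplyr::mutate(
    df,
    label = dplyr::case_when(
      revenue >= 1e12 ~ paste(round(revenue / 1e12), "Tn"),
      revenue >= 1e9  ~ paste(round(revenue / 1e9), "Bn"),
      revenue >= 1e6  ~ paste(round(revenue / 1e6), "M"),
      revenue >= 1e3  ~ paste(round(revenue / 1e3), "K"),
      TRUE            ~ as.character(revenue)
    )
  )
}
```

For instance, add_revenue_label(data.frame(revenue = c(950, 4300000))) would keep the numeric column intact and add the labels alongside it.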

【Comments】:

【Answer 8】:

I rewrote @42-'s function to accommodate percentages, like this:

compress <- function(tx) {
  tx <- as.numeric(gsub("\\,", "", tx))
  # breakpoints double as divisors: values in [0.01, 1) are shown as percentages
  int <- c(1e-2, 1, 1e3, 1e6, 1e9, 1e12)
  div <- findInterval(tx, int)
  paste(round( tx/int[div], 2), c("%","", "K","M","B","T")[div] )
}

>tx
 total_reads  total_bases     q20_rate     q30_rate   gc_content 
3.504660e+05 1.051398e+08 6.648160e-01 4.810370e-01 5.111660e-01 
> compress(tx)
[1] "350.47 K" "105.14 M" "66.48 %"  "48.1 %"   "51.12 %"

This may be useful for similar problems.
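One caveat with this approach: findInterval returns 0 for values below the first breakpoint (here anything under 0.01, including negative numbers), and indexing with 0 selects nothing in R, so such values are not labelled correctly. A possible workaround, sketched under the assumption that those values should simply be printed without a suffix, is to add a -Inf breakpoint. compress_safe and its lookup vectors are illustrative names.

```r
# Sketch: a -Inf breakpoint guarantees every non-NA value gets an index >= 1.
compress_safe <- function(tx) {
  tx <- as.numeric(gsub(",", "", tx, fixed = TRUE))
  breaks   <- c(-Inf, 1e-2, 1, 1e3, 1e6, 1e9, 1e12)
  divisors <- c(1,    1e-2, 1, 1e3, 1e6, 1e9, 1e12)  # first bucket: leave value as is
  suffixes <- c("",   "%",  "", "K", "M", "B", "T")
  idx <- findInterval(tx, breaks)
  paste(round(tx / divisors[idx], 2), suffixes[idx])
}

compress_safe(c(-42, 0.5, 350466))
# expected: "-42 " "50 %" "350.47 K"
```

Keeping breakpoints, divisors and suffixes in three parallel vectors makes it easier to see which label belongs to which range.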

【Comments】:

【Answer 9】:

Another option from the scales package is unit_format:

inp <- c(6000000, 75000400, 743450000, 340000, 4300000)

scales::unit_format(unit = 'M', scale = 1e-6)(inp)
# "6.0 M"   "75.0 M"  "743.4 M" "0.3 M"   "4.3 M"

【Comments】:
