Extract phrases from a cell in Power BI Desktop/Queries

【Posted】: 2021-10-12 17:19:05

【Question】:

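I am completely new to Power BI and a bit lost with the tool.

I am trying to extract phrases from cells that contain a very long string. Each phrase I want to extract starts with "DI" followed by 4 random digits, so the format is DIXXXX. Each cell contains a random number of these phrases, placed at random positions in the string. I need each phrase extracted into a separate cell.

Please see the example of a long string below

Also, I used to do this in Excel, but unfortunately it kept crashing because of the large amount of data.

The final result in Excel looks like this

Can anyone tell me how to achieve this in Power BI?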

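【Comments】:

I would suggest using a regular expression. You can implement it with Python or JavaScript. "Extract numbers from text by Minimum Length of Number String" shows an example that uses JavaScript to make regular expressions available in Power Query.

【Answer 1】: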

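You can split the column by delimiter (using "DI"), then extract the first 4 characters of each resulting column, and then add the prefix back as needed.

Something like this: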

let
    // Excel.CurrentWorkbook reads a table from the workbook that hosts the query (Power Query in Excel).
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    // Split on "DI"; Column1.1 receives the text before the first "DI".
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Column1", Splitter.SplitTextByDelimiter("DI", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", type text}, {"Column1.4", type text}}),
    // Keep only the 4 characters that follow each "DI" (Column1.4 included so it is trimmed like the others).
    #"Extracted First Characters" = Table.TransformColumns(#"Changed Type1", {{"Column1.2", each Text.Start(_, 4), type text}, {"Column1.3", each Text.Start(_, 4), type text}, {"Column1.4", each Text.Start(_, 4), type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Extracted First Characters",{"Column1.1"}),
    // Put the "DI" prefix back on each extracted value.
    #"Added Prefix" = Table.TransformColumns(#"Removed Columns", {{"Column1.2", each "DI" & _, type text}}),
    #"Added Prefix1" = Table.TransformColumns(#"Added Prefix", {{"Column1.3", each "DI" & _, type text}}),
    #"Added Prefix2" = Table.TransformColumns(#"Added Prefix1", {{"Column1.4", each "DI" & _, type text}})
in
    #"Added Prefix2"
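
For readers who want to trace the split-and-trim logic outside Power Query, here is a minimal Python sketch of the same idea. It is not part of the original answer: the function name `extract_by_split` and the sample string are made up for illustration. It also makes one limitation visible: the four characters after every "DI" are kept whether or not they are digits.

```python
def extract_by_split(text, width=4):
    """Split on "DI", keep the first `width` characters of each piece, re-add the prefix."""
    pieces = text.split("DI")
    # pieces[0] is the text before the first "DI" (Column1.1 in the query), so it is dropped.
    return ["DI" + piece[:width] for piece in pieces[1:]]


# Hypothetical input: two real codes plus a word that merely starts with "DI".
sample = "order DI1234 shipped, see DI5678; DIY kit"
print(extract_by_split(sample))  # ['DI1234', 'DI5678', 'DIY ki']
```

The last item shows why a pattern-based match can be safer when "DI" may also occur in ordinary words.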

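【Comments】:

I apologize for the late reply. I tried this and received the following error: Expression.Error: We couldn't find an Excel table named "Table1". Details: Table1

Table1 is the name of the table in my workbook. Yours may be different.

No, my table is also called Table1. I don't know what went wrong and I can't find an answer anywhere.

That's odd. It fails at the very first step? Instead of pasting the code into the Advanced Editor, maybe try clicking Data -> From Table/Range and then selecting your table. See whether it throws the same error.

OK, I had some time to play with it. It works when I run it in an Excel query, but it does not work when I try to run it in a Power BI query. On top of that, there are some minor problems in the output. I am pasting my final code as an answer to this question.

【Answer 2】:

Try this: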

let
  Source = Table.FromRows(
    Json.Document(
      Binary.Decompress(
        Binary.FromText(
          "i45WKsnILFYAokSF4pKizLx0BRdPQyNjE1MgDaLMgLSZuYWlAUxcKVYnWgmhxsTY1MxcAWYIjJ+XX64UGwsA",
          BinaryEncoding.Base64
        ),
        Compression.Deflate
      )
    ),
    let
      _t = ((type nullable text) meta [Serialized.Text = true])
    in
      type table [Column1 = _t]
  ),
  #"Changed Type" = Table.TransformColumnTypes(Source, {{"Column1", type text}}),
  // Run a JavaScript regular expression through Web.Page and return the rows written to the page body.
  // The pattern matches "DI" followed by five digits; the question describes four, for which {4} would apply.
  fx =
    let
      fx = (input) =>
        Web.Page(
          "<script>
            var x='"
            & input
            & "';
            var b = x.match(/DI[0-9]{5}/gm);
            document.write(b);
        </script>"
        ){0}[Data]{0}[Children]{1}[Children]
    in
      fx,
  #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each fx([Column1])),
  #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Text"}, {"Text"}),
  // document.write joins the matches with commas, so split them back into a list.
  #"Added Custom1" = Table.AddColumn(#"Expanded Custom", "Custom", each Text.Split([Text], ",")),
  #"Removed Columns" = Table.RemoveColumns(#"Added Custom1", {"Text"}),
  #"Expanded Custom1" = Table.ExpandListColumn(#"Removed Columns", "Custom"),
  #"Added Index" = Table.AddIndexColumn(#"Expanded Custom1", "Index", 1, 1, Int64.Type),
  #"Grouped Rows" = Table.Group(
    #"Added Index",
    {"Column1"},
    {{"ad", each _, type table [Column1 = nullable text, Custom = text, Index = number]}}
  ),
  // For each source string, number its matches and pivot them into Value1, Value2, ... columns.
  #"Added Custom2" = Table.AddColumn(
    #"Grouped Rows",
    "Custom",
    each
      let
        x = [ad],
        #"Sorted Rows" = Table.Sort(x, {{"Index", Order.Ascending}}),
        #"Added Index1" = Table.AddIndexColumn(#"Sorted Rows", "Index.1", 1, 1, Int64.Type),
        #"Removed Columns1" = Table.RemoveColumns(#"Added Index1", {"Index"}),
        #"Added Prefix" = Table.TransformColumns(
          #"Removed Columns1",
          {{"Index.1", each "Value" & Text.From(_, "en-US"), type text}}
        ),
        #"Pivoted Column" = Table.Pivot(
          #"Added Prefix",
          List.Distinct(#"Added Prefix"[Index.1]),
          "Index.1",
          "Custom"
        )
      in
        #"Pivoted Column"
  ),
  #"Removed Other Columns" = Table.SelectColumns(#"Added Custom2", {"Custom"}),
  #"Expanded Custom2" = Table.ExpandTableColumn(
    #"Removed Other Columns",
    "Custom",
    {"Column1", "Value1", "Value2", "Value3", "Value4"},
    {"Column1", "Value1", "Value2", "Value3", "Value4"}
  )
in
  #"Expanded Custom2"
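
The regular expression is the core of this approach, and the comment on the question notes that Python could be used for it as well. As a minimal sketch (not part of the original answer), the same kind of match can be tried in plain Python before wiring it into a query. The function name `extract_codes` and the sample string are hypothetical, and the pattern assumes four digits, following the DIXXXX format described in the question rather than the digit count used in the query above.

```python
import re

# "DI" followed by exactly four ASCII digits, as described in the question.
PATTERN = re.compile(r"DI[0-9]{4}")


def extract_codes(text):
    """Return every DIXXXX code found in text, in order of appearance."""
    return PATTERN.findall(text)


# Hypothetical input: two valid codes and one that is too short to match.
sample = "ref DI1234 and DI5678, then DI90 (too short)"
print(extract_codes(sample))  # ['DI1234', 'DI5678']
```

Unlike splitting on "DI", this only returns text where the prefix is actually followed by four digits.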

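【Comments】:

I apologize for the late reply. I added your code to a custom column formula and it returned a table for me. Unfortunately, that table contains some random values. I assume your code references a column, but I am not sure which part of the code does that.

【Answer 3】:

The final corrected code that worked for me: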

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Merged", type text}}),
    // Split on "DI" into up to 10 columns; Column1.1 receives the text before the first "DI".
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Merged", Splitter.SplitTextByDelimiter("DI", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", type text}, {"Column1.4", type text}, {"Column1.5", type text}, {"Column1.6", type text}, {"Column1.7", type text}, {"Column1.8", type text}, {"Column1.9", type text}, {"Column1.10", type text}}),
    // Keep only the 4 characters that follow each "DI".
    #"Extracted First Characters" = Table.TransformColumns(#"Changed Type1", {{"Column1.2", each Text.Start(_, 4), type text}, {"Column1.3", each Text.Start(_, 4), type text}, {"Column1.4", each Text.Start(_, 4), type text}, {"Column1.5", each Text.Start(_, 4), type text}, {"Column1.6", each Text.Start(_, 4), type text}, {"Column1.7", each Text.Start(_, 4), type text}, {"Column1.8", each Text.Start(_, 4), type text}, {"Column1.9", each Text.Start(_, 4), type text}, {"Column1.10", each Text.Start(_, 4), type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Extracted First Characters",{"Column1.1"}),
    // Put the "DI" prefix back on each extracted value.
    #"Added Prefix" = Table.TransformColumns(#"Removed Columns", {{"Column1.2", each "DI" & _, type text}}),
    #"Added Prefix1" = Table.TransformColumns(#"Added Prefix", {{"Column1.3", each "DI" & _, type text}}),
    #"Added Prefix2" = Table.TransformColumns(#"Added Prefix1", {{"Column1.4", each "DI" & _, type text}}),
    #"Added Prefix3" = Table.TransformColumns(#"Added Prefix2", {{"Column1.5", each "DI" & _, type text}}),
    #"Added Prefix4" = Table.TransformColumns(#"Added Prefix3", {{"Column1.6", each "DI" & _, type text}}),
    #"Added Prefix5" = Table.TransformColumns(#"Added Prefix4", {{"Column1.7", each "DI" & _, type text}}),
    #"Added Prefix6" = Table.TransformColumns(#"Added Prefix5", {{"Column1.8", each "DI" & _, type text}}),
    #"Added Prefix7" = Table.TransformColumns(#"Added Prefix6", {{"Column1.9", each "DI" & _, type text}}),
    #"Added Prefix8" = Table.TransformColumns(#"Added Prefix7", {{"Column1.10", each "DI" & _, type text}})
in
    #"Added Prefix8"

【Comments】:
