How to convert a paragraph HTML string to plain text without HTML tags in Google Apps Script?

[Posted] 2021-12-24 13:25:08

[Question]

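This is a follow-up to my previous question. I ran into a problem when converting an HTML string to plain text without HTML tags in Google Apps Script, using the approach referenced in this question. This time, however, the HTML is in paragraph format.

Here is the script I use: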

function pullDataFromWorkday() {
  // CSV link of the Workday report (double quotes because the placeholder contains an apostrophe)
  var url = "https://services1.myworkday.com/ccx/service/customreport2/[company name]/[owner's email]/[Report Name]?format=csv";
  var b64 = 'asdfghjklkjhgfdfghj=='; // placeholder: this is supposed to be our Workday password in Base64
  var response = UrlFetchApp.fetch(url, {
    headers: {
      Authorization: 'Basic ' + b64
    }
  });

  // Parse
  if (response.getResponseCode() >= 200 && response.getResponseCode() < 300) {
    var blob = response.getBlob();
    var string = blob.getDataAsString();
    var data = Utilities.parseCsv(string, ",");

    // Start at row 1; columns 0 and 1 are left unchanged.
    for (var i = 1; i < data.length; i++) {
      data[i][2] = toStringFromHtml(data[i][2]);
      data[i][3] = toStringFromHtml(data[i][3]);
      data[i][4] = toStringFromHtml(data[i][4]);
      data[i][5] = toStringFromHtml(data[i][5]);
    }

    // Paste it in
    var ss = SpreadsheetApp.getActive();
    var sheet = ss.getSheetByName('Sheet1');
    sheet.clear();
    sheet.getRange(1, 1, data.length, data[0].length).setValues(data);
  } else {
    return;
  }
}

function toStringFromHtml(html) {
  html = '<div>' + html + '</div>';
  html = html.replace(/<br>/g, "");
  var document = XmlService.parse(html);
  var strText = XmlService.getPrettyFormat().format(document);
  strText = strText.replace(/<[^>]*>/g, "");
  return strText.trim();
}

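Here is a sample of the data I want:

Alternatively, you can use this sample spreadsheet.

Is there a step I missed or did wrong?

Thank you for answering my previous question.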

[Answer 1]

I think you can use this library: cheerio for Google Apps Script

function htmltotext() {
  let html = `<p><span>Hi Katy</span></p><p></p><p><span>The illustration (examples) paragraph is useful when we want to explain or clarify something, such as an object, a person, a concept, or a situation. Sample Illustration Topics:</span></p><p></p><p></p><p><span>1. Examples of annoying habits people have on the Skytrain.</span></p><p><span>2. Positive habits that you admire in other people.   </span></p><p><span>3. Endangered animals in Asia. </span></p>`
  const $ = Cheerio.load(html)
  let paragraph = []
  let lines = $('p')
  // Collect the text of every non-empty <p> element
  for (let i = 0; i < lines.length; i++) {
    let line = lines.get(i)
    let line_text = $(line).text();
    if (line_text) {
      paragraph.push(line_text)
    }
  }
  Logger.log(paragraph.join('\n'))
  return paragraph.join('\n')
}

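[Comments]

Could you provide more information about what this package is, what it does, and how it helps the OP?

Cheerio is a library that helps GAS users parse HTML. @Tyler2P Please check the last comment. I have updated my script.

[Answer 2]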

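In your situation, how about modifying toStringFromHtml as follows?

Modified script:

function toStringFromHtml(html) {
  html = '<div>' + html + '</div>';
  // Remove <br>, collapse doubled empty paragraphs, and unwrap <span> before parsing as XML
  html = html.replace(/<br>/g, "").replace(/<p><\/p><p><\/p>/g, "<p></p>").replace(/<span>|<\/span>/g, "");
  var document = XmlService.parse(html);
  // An empty indent keeps the pretty-printed lines flush left
  var strText = XmlService.getPrettyFormat().setIndent("").format(document);
  strText = strText.replace(/<[^>]*>/g, "");
  return strText.trim();
}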

With this modified script, your sample HTML is converted as follows.

From

  <p><span>Hi Katy</span></p>
  <p></p>
  <p><span>The illustration (examples) paragraph is useful when we want to explain or clarify something, such as an object, a person, a concept, or a situation. Sample Illustration Topics:</span></p>
  <p></p>
  <p></p>
  <p><span>1. Examples of annoying habits people have on the Skytrain.</span></p>
  <p><span>2. Positive habits that you admire in other people. </span></p>
  <p><span>3. Endangered animals in Asia. </span></p>

To

  <div>
    <p>Hi Katy</p>
    <p></p>
    <p>The illustration (examples) paragraph is useful when we want to explain or clarify something, such as an object, a person, a concept, or a situation. Sample Illustration Topics:</p>
    <p></p>
    <p>1. Examples of annoying habits people have on the Skytrain.</p>
    <p>2. Positive habits that you admire in other people. </p>
    <p>3. Endangered animals in Asia. </p>
  </div>

With this conversion, the following result is obtained.

  Hi Katy

  The illustration (examples) paragraph is useful when we want to explain or clarify something, such as an object, a person, a concept, or a situation. Sample Illustration Topics:

  1. Examples of annoying habits people have on the Skytrain.
  2. Positive habits that you admire in other people.
  3. Endangered animals in Asia.
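
The markup clean-up that runs before XmlService.parse can be checked on its own, since it is plain string manipulation. The following is a minimal sketch under stated assumptions: normalizeHtml is a hypothetical helper name and the sample string is shortened for readability. Only the chain of replace() calls is taken from the modified script.

```javascript
// Hypothetical helper: isolates the pre-processing step of the modified script.
// The three replace() calls are the ones used in toStringFromHtml above.
function normalizeHtml(html) {
  return ('<div>' + html + '</div>')
    .replace(/<br>/g, '')                      // drop <br>, which is not well-formed XML
    .replace(/<p><\/p><p><\/p>/g, '<p></p>')   // collapse two adjacent empty paragraphs into one
    .replace(/<span>|<\/span>/g, '');          // unwrap attribute-less <span> tags
}

// Shortened sample (an assumption for illustration, not the full data from the question)
var sample = '<p><span>Hi Katy</span></p><p></p>' +
  '<p><span>Sample Illustration Topics:</span></p><p></p><p></p>' +
  '<p><span>1. Examples</span></p>';
var normalized = normalizeHtml(sample);
```

Running the clean-up in isolation like this makes it easier to inspect what is actually handed to the XML parser when a cell fails to convert.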

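Note:

With the sample HTML shown in your question, the modified script achieves your goal. However, I cannot see your other HTML data, so I am not sure whether this modified script works for your actual data. Please be careful about this.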
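
If the actual data is not well-formed XML (for example unclosed tags, or entities such as &nbsp;), XmlService.parse is likely to throw. One possible fallback is a regex-only conversion that avoids the XML parser entirely. The sketch below is an alternative technique and not part of either answer: htmlToPlainText is a hypothetical name, the lists of block tags and entities are deliberately small assumptions, and regex stripping is not a full HTML parser.

```javascript
// Hypothetical fallback: converts paragraph-style HTML to plain text using only regexes.
function htmlToPlainText(html) {
  // Small, assumed set of entities; extend it to match the real data.
  var entities = { '&nbsp;': ' ', '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#39;': "'" };
  return String(html)
    .replace(/<br\s*\/?>/gi, '\n')              // any <br> variant becomes a line break
    .replace(/<\/(p|div|li)\s*>/gi, '\n')       // closing block tags end a line
    .replace(/<[^>]*>/g, '')                    // strip every remaining tag
    .replace(/&(nbsp|amp|lt|gt|quot|#39);/g, function (m) { return entities[m]; })
    .split('\n')
    .map(function (line) { return line.trim(); })
    .join('\n')
    .replace(/\n{3,}/g, '\n\n')                 // keep at most one blank line between paragraphs
    .trim();
}
```

Because it uses only standard string methods, it can be swapped in for toStringFromHtml in the column loop without other changes. The trade-off is that malformed or nested markup is handled only approximately.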

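[Comments]

Hi @tanaike, thank you so much! It works fine now. I modified the script slightly by removing "replace(/<span>|<\/span>/g, "")", because apparently not all of the raw data is wrapped in "<span>".

@Nadila Thank you for replying. I'm glad your issue was resolved.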
