Convert several YAML files to CSV

Posted: 2019-01-23 14:37:24

Question:

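I'm very new to Python and have several YAML files that I need to convert to CSV. They contain notes, comments, and emails from our CRM (Highrise). I only need the notes and comments, not the emails. Below are a couple of examples.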

Test_Co_1.txt

---
- ID: 273679215
  Name: Test Company 1
  Tags: 
  - Sample tag 1
  - Sample tag 2
  - Sample tag 3
  - Sample tag 4
- Contact: 
  - 
    - Addresses
    - 
      - "123 W Elm Street, Anywhere, FL, 11111, United States"
  - 
    - Phone_numbers
    - 
      - 555-111-2222
- Background: sample text
- Note 424169327: 
  - 
    Author: Diane S.
  - 
    Written: "May 16, 2017 19:32"
  - 
    About: Jeff Smith
  - 
    Body: Called 5/16/17 - Receptionist indicated Jeff was unavailable. She said they are not interested in attending any webinars hung up.
- Note 424598243: 
  - 
    Author: Jenny S.
  - 
    Written: "May 18, 2017 15:45"
  - 
    About: Test Company 1
  - 
    Body: |-
      email sent to TM: Pete

      Pete,

      Can you help us with this prospective customer to determine if he is interested?

      We made some outbound calls this week, inviting dealers to the prospective dealer Summer Series webinars, with the first one being this Friday.  Can you see if Jeff is interested?  We do not have an email for him.  Do you have that?

      This is the note from earlier this week:
      Called 5/16/17 - Receptionist indicated Jeff was unavailable. She said they are not interested in attending any webinars hung up.

      Thanks for your help.
      photo

      Jenny
- Comment 424601588: 
  - 
    Author: Jenny S.
  - 
    Written: "May 18, 2017 15:56"
  - 
    About: Test Company 1
  - 
    Body: |-
      email back from TM: Jenny,

      Yes.  I will reach out to them. 

      Thanks!
      Pete

Another example: Fake_Co_2

---
- ID: 306184746
  Name: Fake Company 2
  Tags: 
  - Sample Tag 1
- Contact: 
  - 
    - Addresses
    - 
      - "444 N Oak St, Faketon City, MI, 22222, United States"
  - 
    - Phone_numbers
    - 
      - 333-333-3333
- Note 473905168: 
  - 
    Author: Robin S.
  - 
    Written: "February 20, 2018 22:19"
  - 
    About: Fake Company 2
  - 
    Body: "1:1 with Steven 2/27/18"
- Email 476444812: 
  - 
    Author: Aaron N.
  - 
    Written: "March 06, 2018 16:30"
  - 
    About: Jose Viago
  - 
    Subject: Welcome Call
  - 
    Body: |-
      Hello Jose,



      We just talked and we scheduled your welcome call.  I noticed after we hung
      up that time changes this weekend.  Unfortunately Arizona
      doesn't change time and we will now be 2 hours behind you.  Are you
      available on at 10:30 AM CST on Tuesday, March 13th?  Otherwise I will need
      to schedule at a different time.  



      I apologize for the error and inconvenience. 




       <http://fakedomain.com/> 

      Support Team Lead 
      D: xxx-xxx-xxxx | C: xxx-xxx-xxxx | F: xxx-xxx-xxxx 
       <mailto:noreply@fakedomain.com> noreply@fakedomain.com 




       <http://fakedomain.com/> Website |
      <https://www.youtube.com/watch?v=xxx> Our Story


      Confidentiality Disclaimer: This email may contain confidential and/or
      private 
      information. If you received this email in error please delete and notify
      sender.
- Note 476458623: 
  - 
    Author: Jamie H.
  - 
    Written: "March 06, 2018 17:12"
  - 
    About: Fake Company 2
  - 
    Body: ""
- Note 476460268: 
  - 
    Author: Aaron N.
  - 
    Written: "March 06, 2018 17:18"
  - 
    About: Fake Company 2
  - 
    Body: |-
      Called and talked to Jose and scheduled the Welcome Call for Tuesday, March 13 at 9:30 AM.  After I hung up I realized that time changes this weekend.  I left him a voice mail and emailed to see if doing the appointment at 10:30 AM would be ok.  

      Prep for appointment: Monday, March 12 at 2:30 PM 
      Welcome Call: Tuesday, March 13 at 10:30 AM CST

      Jose emailed back and said that 10:30 is fine.  

      Michael H has been scheduled
- Comment 476460532: 
  - 
    Author: Aaron N.
  - 
    Written: "March 06, 2018 17:18"
  - 
    About: Jose Viago
  - 
    Body: |-
      From: Jose Viago [mailto:fakecompany2@gmail.com] 
      Sent: Tuesday, March 6, 2018 10:01 AM
      To: admin@fakecompany.com
      Subject: Re: Welcome Call

      Yes that is fine.  Thank you! 
      Jose Viago
      Fake Company 2
      xxx-xxx-xxxx
- Note 477585004: 
  - 
    Author: Laura H.
  - 
    Written: "March 12, 2018 23:46"
  - 
    About: Fake Company 2
  - 
    Body: |-
      Welcome call prep complete. Roadmap & workbook have been saved to their profile in BOX, and updated per their provided information. 
      03/12/18 (LH)
- Note 477740716: 
  - 
    Author: Michael H.
  - 
    Written: "March 13, 2018 16:47"
  - 
    About: Fake Company 2
  - 
    Body: |-
      03-13-2018. Did a welcome call with Jose. Jose now has access to the box. We will have a follow up call for Dashboard roll out.

      03-13-2018. Did a follow up with Jose. He now has owner and tech role to the App and Dashboard. We also reviewed Online portal and help center. (MH)
- Note 502997603: 
  - 
    Author: Laura H.
  - 
    Written: "August 06, 2018 17:14"
  - 
    About: Fake Company 2
  - 
    Body: |-
      Received a text from Jose letting me know there is a leak in his office, and he needs to reschedule our call today. I moved him to Thursday 08/09/18 @ 9:00AM CDT. 
      08/06/18 (LH)

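Some of these text files are up to 1,000 lines long and contain every internal note, comment, and email ever logged for that particular customer (or for contacts who work for that customer).

We are migrating to a different CRM and only need to import the notes and comments. I would like to generate a CSV (or several CSV files) like this:

output.csv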

Name,Author,Written,About,Body
"Fake Company 2"|"Robin S."|"February 20, 2018 22:19"|"Fake Company 2"|"1:1 with Steve 2/27/18"
"Fake Company 2"|"Aaron N."|"March 06, 2018 17:18"|"Fake Company 2"|"Called and talked to Jose and scheduled the Welcome Call for Tuesday, March 13 at 9:30 AM.  After I hung up I realized that time changes this weekend.  I left him a voice mail and emailed to see if doing the appointment at 10:30 AM would be ok.  

      Prep for appointment: Monday, March 12 at 2:30 PM 
      Welcome Call: Tuesday, March 13 at 10:30 AM CST

      Jose emailed back and said that 10:30 is fine.  

      Michael H has been scheduled"

I found the code in "Need a script that extracts from a yaml file content and output as a csv file", but I don't know enough Python to get it working without syntax errors.

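Comments:

Which part are you struggling with? What have you tried in order to fix the syntax errors? (And what are the syntax errors?)

This looks like one big file containing multiple YAML documents, the second of which is invalid YAML (it combines a list and a multi-line plain scalar at the root level). You won't be able to parse it with a YAML parser. (If this is not a single file containing multiple YAML documents, please put some effort into the formatting and don't present it as one file.)

Answer 1:

I would use the Python YAML library to help with this. It can be installed using: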

pip install pyyaml

The files you provided can then be converted to CSV as follows:

import csv
import yaml

fieldnames = ['Name', 'Author', 'Written', 'About', 'Body']

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames)
    csv_output.writeheader()

    # Adjust these names to match your own files
    for filename in ['Test_Co_1.txt', 'Test_Co_2.txt']:
        with open(filename) as f_input:
            data = yaml.safe_load(f_input)

        # The first list entry holds the company details
        name = data[0]['Name']

        for entry in data:
            # Each entry is a dictionary; its first key identifies the record type
            key = next(iter(entry))

            # Keep notes and comments, skip emails and everything else
            if key.startswith(('Note', 'Comment')):
                row = {'Name': name}

                # Each record is a list of single-key dictionaries
                for d in entry[key]:
                    for get in ['Author', 'Written', 'About', 'Body']:
                        try:
                            row[get] = d[get]
                        except KeyError:
                            pass

                csv_output.writerow(row)

This produces standard CSV output (i.e. commas between fields, with quotes around any field that contains a newline or a comma).
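As a small illustration of that quoting behaviour, the sketch below writes one row to an in-memory buffer rather than a file and then reads it back. The sample values are made up for the example; only the standard-library csv and io modules are used.

```python
import csv
import io

fieldnames = ['Name', 'Author', 'Written', 'About', 'Body']

# Made-up row: 'Written' contains a comma, 'Body' contains a newline.
row = {
    'Name': 'Fake Company 2',
    'Author': 'Aaron N.',
    'Written': 'March 06, 2018 17:18',
    'About': 'Fake Company 2',
    'Body': 'Line one\nLine two',
}

# newline='' leaves line endings to the csv module, as with open(..., newline='').
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerow(row)

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original values, embedded newline included.
rows_back = list(csv.DictReader(io.StringIO(csv_text, newline='')))
print(rows_back[0]['Body'])
```

Only the fields that need quoting are quoted by default. If the pipe-separated, fully quoted layout shown in the question is wanted instead, csv.DictWriter also accepts delimiter='|' and quoting=csv.QUOTE_ALL.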

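To understand how this works, I suggest adding some print statements to see what is going on. For example, data holds the whole file's contents as nested lists and dictionaries. It is then a matter of extracting the bits you need.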
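To make that structure concrete, here is a sketch that uses a hand-written literal mirroring what yaml.safe_load would be expected to return for a trimmed version of Test_Co_1.txt. The literal (with the Contact entry and most notes left out) and the helper name describe_entries are illustrative assumptions, not part of the code above; with real data you would print the result of yaml.safe_load instead.

```python
from pprint import pprint

# Hand-written stand-in for the parsed file (assumption: trimmed and shortened).
data = [
    {'ID': 273679215, 'Name': 'Test Company 1', 'Tags': ['Sample tag 1']},
    {'Background': 'sample text'},
    {'Note 424169327': [
        {'Author': 'Diane S.'},
        {'Written': 'May 16, 2017 19:32'},
        {'About': 'Jeff Smith'},
        {'Body': 'Called 5/16/17 - Receptionist indicated Jeff was unavailable.'},
    ]},
]


def describe_entries(data):
    """Return (first key, type name of its value) for each top-level entry."""
    summary = []
    for entry in data:
        key = next(iter(entry))
        summary.append((key, type(entry[key]).__name__))
    return summary


pprint(data)                    # the whole nested structure
pprint(describe_entries(data))  # one tuple per top-level entry
```

The summary shows why the conversion loop looks at the first key of each entry: notes and comments are the entries whose key starts with 'Note' or 'Comment' and whose value is a list of single-key dictionaries.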

To apply this to all of your YAML files, I would replace the list of filenames with a call to glob.glob('*.txt').
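A minimal sketch of that replacement is below. The helper name find_yaml_exports and its folder parameter are hypothetical, added only so the example can be pointed at any directory; sorting is also an addition, used here so the rows come out in the same order on every run.

```python
import glob
import os


def find_yaml_exports(folder='.'):
    """Return the paths of all .txt files in folder, in sorted order."""
    return sorted(glob.glob(os.path.join(folder, '*.txt')))


# In the conversion script this loop would replace the hard-coded list.
for filename in find_yaml_exports():
    print(filename)
```

Because the pattern only matches names ending in .txt, output.csv is not picked up as input even when it sits in the same folder.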

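Comments:

Thanks Martin. This is very helpful. It correctly converts most of my files to CSV, but I did get errors on some files that contain special characters. I'm fairly sure it is an encoding problem, but I haven't solved it yet. Thanks again for taking the time to help someone who is new to Python.

You could try opening the file with with open(filename, encoding='utf-8') as f_input: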
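The effect of the encoding argument can be seen without any YAML at all. The sketch below writes a small UTF-8 file containing non-ASCII characters to a temporary directory and reads it back twice: once as UTF-8 and once as cp1252, which is used here only as an example of a different platform default. The file name and text are made up, and whether this matches the errors described above is an assumption.

```python
import os
import tempfile

text = 'Body: Café meeting with José\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'Sample_Co.txt')

    # Write the bytes exactly as a UTF-8 export would contain them.
    with open(path, 'wb') as f_raw:
        f_raw.write(text.encode('utf-8'))

    # Explicit encoding: the result does not depend on the platform default.
    with open(path, encoding='utf-8') as f_input:
        decoded = f_input.read()

    # The same bytes decoded as cp1252 come back as different characters.
    with open(path, encoding='cp1252') as f_input:
        wrong = f_input.read()

print(decoded == text)
print(wrong == text)
```

If the input files really are UTF-8, it may also be worth passing encoding='utf-8' when opening output.csv so that the characters are written out the same way they were read.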
