Print the first N lines of a CSV where quoted fields can contain newlines
【Title】: Print the first N lines of a CSV where quoted fields can contain newlines 【Posted】: 2021-09-29 18:03:16 【Question】: The CSV file can contain fields with embedded newlines, and this can happen in any column. Some rows contain no embedded newlines at all, so the solution has to work in both cases.
Sample input:
ID,username,mobile,city,Message,Address,city
'11111111',TestUSer,1234567890,test,"Hi how are you? Well: we will connnect
Thanks for your time!
With Joy.
Test",Address test,City test
11111116,TestUser,1234567891,test,hello msg,Address test1,City test1
'111111167',TestUSer,1234567890,test,"Hi how are you one? Well: we will connnect
Thanks for your time!
With Joy.
Test",Address test,City test
11111112,TestUser,1234567891,test1,hello msg1,Address test2,City test2
11111113,TestUser,1234567891,test1,hello msg1,Address test2,City test2
11111114,TestUser,1234567891,test1,hello msg1,Address test2,City test2
I am using the following command to read the first 5 records of the CSV:
awk -v RS='("[^"]*")?\r?\n' 'NF {ORS = gensub(/\r?\n(.)/, "\\\\n\\1", "g", RT); ++n; print} n==5 {exit}' file.csv
Actual output:
ID,username,mobile,city,Message,Address,city
'11111111',TestUSer,1234567890,test,"Hi how are you? Well: we will connnect\nThanks for your time!\nWith Joy.\Test",Address test,City test
11111116,TestUser,1234567891,test,hello msg,Address test1,City test1
'111111167',TestUSer,1234567890,test,"Hi how are you one? Well: we will connnect\nThanks for your time!\nWith Joy.\nTest",Address test,City test
11111112,TestUser,1234567891,test1,hello msg1,Address test2,City test2
11111113,TestUser,1234567891,test1,hello msg1,Address test2,City test2
11111114,TestUser,1234567891,test1,hello msg1,Address test2,City test2
Desired output:
ID,username,mobile,city,Message,Address,city
'11111111',TestUSer,1234567890,test,"Hi how are you? Well: we will connnect\nThanks for your time!\nWith Joy.\nTest",Address test,City test
11111116,TestUser,1234567891,test,hello msg,Address test1,City test1
'111111167',TestUSer,1234567890,test,"Hi how are you one? Well: we will connnect\nThanks for your time!\nWith Joy.\nTest",Address test,City test
11111112,TestUser,1234567891,test1,hello msg1,Address test2,City test2
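【Answer 1】: Based only on the samples you have shown, try the following awk code, written and tested with GNU awk. It sets RS to a regex that matches a double-quoted field, replaces every newline in RT (the text that matched RS) with a literal \n, assigns the result to ORS, and then prints each record.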
awk -v RS='"[^"]*"' '{gsub(/\n/,"\\n",RT); ORS=RT} 1' Input_file
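Each record now sits on a single line, so to keep only the first 10 lines (the header counts as one), pipe the result through head:
awk -v RS='"[^"]*"' '{gsub(/\n/,"\\n",RT); ORS=RT} 1' Input_file | head -n 10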
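If GNU awk is not available, or you would rather stop after N records without piping through head, roughly the same effect can be sketched in portable awk by counting double quotes. A physical line that leaves the running quote count odd ends inside a quoted field, so it is joined to the next line with a literal \n. This is only a minimal sketch and not part of the answer above: the function name first_records, the variables n, buf, quotes and count, and the file sample.csv are all made up for illustration. It assumes fields are quoted with double quotes only (doubled "" escapes come in pairs, so they do not disturb the parity) and it does not strip carriage returns.

```shell
# Build a small sample file (hypothetical data, shaped like the question's input).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.csv" <<'EOF'
ID,username,Message,city
1,UserA,"Hi there
Thanks!
Bye",City A
2,UserB,hello msg,City B
3,UserC,"two
lines",City C
4,UserD,hello msg,City D
EOF

# Print the first n logical records of a CSV file, joining physical lines
# that end inside an open quoted field with a literal backslash-n.
# Usage: first_records N FILE
first_records() {
    awk -v n="$1" '
    {
        # gsub() returns the number of matches, so this counts the quotes
        quotes += gsub(/"/, "&")
        if (quotes % 2) {        # odd count: still inside a quoted field
            buf = buf $0 "\\n"
            next
        }
        print buf $0             # the record is complete
        buf = ""; quotes = 0
        if (++count == n) exit
    }' "$2"
}

out=$(first_records 3 "$tmpdir/sample.csv")
printf '%s\n' "$out"
rm -rf "$tmpdir"
```

With the sample file above, this prints the header and the first two records, with the three-line message collapsed onto one line. Because the record count is kept inside awk, the script exits as soon as N records have been printed instead of reading the rest of the file.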
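【Answer 2】: Warning: self-promotion ahead!
I wrote an awk-like utility called tawk. It uses tcl as its scripting language and has a mode that reads CSV data without relying on regular expressions, so it copes with records that contain embedded newlines and quotes (this feature was actually my main motivation for writing it).
Using it: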
$ tawk -csv 'line {$NR <= 5} { puts [regsub -all {\n+} $F(0) "\\n"]; if {$NR == 5} exit }' input.csv
ID,username,mobile,city,Message,Address,city
'11111111',TestUSer,1234567890,test,"Hi how are you? Well: we will connnect\nThanks for your time!\nWith Joy.\nTest",Address test,City test
11111116,TestUser,1234567891,test,hello msg,Address test1,City test1
'111111167',TestUSer,1234567890,test,"Hi how are you one? Well: we will connnect\nThanks for your time!\nWith Joy.\nTest",Address test,City test
11111112,TestUser,1234567891,test1,hello msg1,Address test2,City test2