How to remove extra double quotes, other than the opening and closing double quotes, in a line of text using a bash script

I have a text file that I want to turn into a CSV file, and then copy that CSV file into a PostgreSQL table.

My input text file (old_sample.txt) is:

SVCOP,"12980","2019"0627","1DEX","LUBE, OIL & FILTER - DEXOS "1"","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00","","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"

I used the following code:

cat old_sample.txt
printf "\n"
echo "____________________________________"
printf "\n"
sed ': again
s/\("[^",]*\)"\([^",]*"\)/\1\2/g
t again
s/""/"/g' old_sample.txt

The output is:

SVCOP,"12980","2019"0627","1DEX","LUBE, OIL & FILTER - DEXOS "1"","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00","","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"
SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS "1","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00",","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"

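The problem is "LUBE, OIL & FILTER - DEXOS "1"".

The double quotes around "1" are not removed because that field contains a comma, whereas the stray quote in "2019"0627" is removed. I want to remove every double quote inside a field, keeping only the opening and closing double quotes that enclose it. Otherwise, the import fails with a database error.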
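To see why the comma matters, here is the question's loop (with the backslashes the scraped page lost restored) run on two fragments of the sample line. The function name `fix_fields` is hypothetical and only used for this demonstration. Because `[^",]*` cannot cross the comma in `LUBE, OIL`, sed never pairs the field's real opening quote, and the quote before `1` is taken as an opening quote instead:

```shell
# The question's sed loop with its backslashes restored.
# fix_fields is a hypothetical name used only for this demonstration.
fix_fields() {
  sed -e ':again' \
      -e 's/\("[^",]*\)"\([^",]*"\)/\1\2/g' \
      -e 't again'
}

# No comma inside the field: the stray quote is removed
printf '%s\n' '"2019"0627"' | fix_fields
# prints "20190627"

# Comma inside the field: [^",]* stops at ", " so the quote before 1
# is treated as an opening quote and "1" keeps its quotes
printf '%s\n' '"LUBE, OIL "1"","I"' | fix_fields
# prints "LUBE, OIL "1","I"
```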

Here is my code:

nl -ba -nln -s, < old_sample.txt | sed ': again
                                      s/\("[^",]*\)"\([^",]*"\)/\1\2/g
                                      t again' | grep 'SVCPTS' > old_sample.csv
psql_local <<SQL || die "Failed to import parts data"
        \copy sample_table from 'old_sample.csv' with (format csv, header false)
SQL

My target output is:

SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS 1","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00","","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"
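
A minimal sketch of the rule described above, written as an awk function (the name `fix_quotes` is hypothetical). Instead of trying to pair quotes with a regex, it splits each line on the `","` sequence that separates the quoted fields, deletes every `"` left inside each piece, and wraps each piece in a single pair of quotes again. It assumes the layout of the sample: an unquoted first field, every other field quoted, no field that itself contains `","`, and no embedded newlines.

```shell
# Sketch: strip stray double quotes inside fields, keeping one outer pair.
# Assumptions (not from the original post): the first field is unquoted,
# every other field is wrapped in quotes, fields are separated by the
# three-character sequence "," and no field contains a newline.
fix_quotes() {
  awk '
  {
    head = $0
    sub(/,.*/, "", head)                  # unquoted first field, e.g. SVCOP
    rest = substr($0, length(head) + 3)   # skip the first field, the comma and the opening quote
    sub(/"$/, "", rest)                   # drop the final closing quote
    n = split(rest, f, /","/)             # split on the quote-comma-quote delimiter
    out = head
    for (i = 1; i <= n; i++) {
      gsub(/"/, "", f[i])                 # delete every quote left inside the field
      out = out ",\"" f[i] "\""           # re-wrap the field in one pair of quotes
    }
    print out
  }' "$@"
}
```

Usage: `fix_quotes old_sample.txt > old_sample.csv`. Input that breaks any of the assumptions would need a real CSV-aware tool, as the answers suggest.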

Answer

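Personally, I would reach for a utility. You might be able to do it by finding the right regex, but that could end up getting very complicated.

Using something like csvkit, specifically the csvformat command, seems a lot easier. It would also be more reliable if you need to reuse this script with other data in the future (some fields might contain newlines, or there may be other cases to account for).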

Another answer

Could you please try the following:

str=$(<old_sample.txt)             # slurp the file into the variable "str"
while true; do                     # loop until nothing changes
   str2=$(sed 's/\([^,]\)"\([^,]\)/\1\2/g' <<< "$str")
   [[ $str2 == "$str" ]] && break  # no change: exit the loop
   str=$str2                       # update "str" for the next iteration
done
echo "$str"

Output:

SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS 1","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00","","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"
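  • The regex \([^,]\)"\([^,]\) matches a double quote that has a non-comma character on each side.
  • It keeps looping until all the extra double quotes have been removed.
  • The script above works for the provided example, but it may not be robust enough for arbitrary input. I'd recommend using a tool that can actually parse CSV files, as chrisputnam9 suggested, to get reliable results.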
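
The same substitution can also loop inside a single sed invocation instead of a bash while loop: `:a` defines a label and `ta` branches back to it whenever the preceding `s` command changed the line. A sketch (the function name `strip_inner_quotes` is hypothetical), with the label in its own `-e` expression so it also works with BSD sed, where a label runs to the end of the line:

```shell
# Same expression as the answer's bash loop, but the loop runs inside sed.
# strip_inner_quotes is a hypothetical name used only for illustration.
strip_inner_quotes() {
  # :a  sets a label
  # s   deletes a quote that has a non-comma character on both sides
  # ta  jumps back to :a if the s command changed the line, so sed repeats
  #     until no such quote is left (needed for runs such as 1"")
  sed -e ':a' \
      -e 's/\([^,]\)"\([^,]\)/\1\2/g' \
      -e 'ta' "$@"
}
```

Usage: `strip_inner_quotes old_sample.txt > old_sample.csv`. It shares the loop's limitation: a stray quote that sits right next to a comma inside a field is left alone.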

Another answer

I couldn't do it in a single substitution, so I chained several:

 $ sed "s/['\"]//g; s/,/\",\"/g; s/\",\" /, /g; s/,,/,\"\",/g; s/$/\"/; s/\"//" file
SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS 1","I,0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS 1","91","LANE","LANE","L,LA MERE","125.00","125.00,"",0.00","0.00","0,0","0,||||||||||||||||||||||||","N"
