How to use regex replace to replace special characters?

【Title】: How to use regex replace to replace a special character? 【Posted】: 2020-03-24 06:45:50 【Question】:

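I am trying to use regex replace to replace "\" with \, but I have not found a working solution. I want to remove the double quotes that surround the backslash. Could you help me with how to do this?

Example: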

"\""warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"

Expected result:

\"warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"

【Comments】:

Has your problem been solved? 【Answer 1】:

Does this solve your problem?

re.sub(r'"\\"', r'\\', text)
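The one-liner above can be tried on its own with Python's standard re module. The following is a minimal sketch, assuming the text is a plain Python string; the sample sentence is shortened from the question, and the variable names text and cleaned are illustrative only.

```python
import re

# Sample input shortened from the question; its first four characters are: "\""
text = '"\\""warfarin was discontinued 3 days ago, should Xarelto be continued or stopped?"'

# Raw-string pattern: in the regex, \\ matches one literal backslash,
# so the whole pattern matches the three characters "\"
# Raw-string replacement: re.sub reads \\ as one literal backslash
cleaned = re.sub(r'"\\"', r'\\', text)

print(cleaned)
```

Raw strings keep Python from consuming a layer of backslashes before the regex engine sees the pattern, which is why the same pattern written as an ordinary string literal can behave differently.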

【Comments】:

Hi Alex, thanks for the reply. I am still facing the same issue. I used this - df = df.withColumn('QSTN', regexp_replace(col('QSTN'), '"\\"', '\\')) and for saving the dataframe I am using - df.repartition(1).write.format('com.databricks.spark.csv').mode('overwrite').save(output_path, escape= '\"', sep='|',header='True',nullValue=None)【Answer 2】:

Try the following solution:

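import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_replace, col

# Reuse the active session (already available as `spark` in notebooks and the pyspark shell)
spark = SparkSession.builder.getOrCreate()
pd.set_option('display.max_colwidth', 200)  # show the full text when printing

df = spark.createDataFrame([
    (1, '"\\""warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"')
], ("ID","textVal"))

# Pattern: the regex engine receives \"\\" and matches quote, backslash, quote.
# Replacement: the engine receives \\ and writes a single literal backslash.
df2 = df.withColumn('textVal', regexp_replace(col('textVal'), '\\"\\\\\"', '\\\\'))
df2.toPandas()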


ID  textVal
0   1   \"warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"

Hope this helps!
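The difference between the pattern tried in the comment and the one in this answer comes down to how many backslashes survive Python's own string parsing. The sketch below is only an approximation: it uses Python's re module in place of Spark's Java regex engine, on the assumption that both treat these simple patterns the same way, and the names sample, attempt and working are illustrative.

```python
import re

# First characters of the column value from the question: "\""warfarin
sample = '"\\""warfarin'

# Pattern from the comment, as an ordinary string: Python collapses \\ to one backslash,
# so the regex engine receives "\" and reads it as: quote, then an escaped quote
attempt = '"\\"'

# Pattern from this answer: the regex engine receives \"\\" and reads it as:
# quote, literal backslash, quote
working = '\\"\\\\\"'

print(attempt)                              # the characters the regex engine receives
print(working)
print(re.search(attempt, sample).group())   # matches two adjacent quotes
print(re.search(working, sample).group())   # matches quote, backslash, quote
```

Printing a pattern before passing it to regexp_replace is a quick way to see what the regex engine will actually receive.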
