How do I use regex replace to replace a special character?

Posted 2023-04-15

【Title】: How to use regex replace to replace a special character? 【Posted】: 2020-03-24 06:45:50 【Question】:

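I am trying to use regex replace to replace "\" (a double quote, a backslash, and a double quote) with a single \, but I have not found a working solution. I want to remove the extra double quotes that appear in the value. Can you help me with how to do it?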

Example:

"\""warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"

Expected result:

\"warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"

【Comments】:

Has your problem been solved?

【Answer 1】:

Does this solve your problem?

re.sub(r'"\\"', r'\\', text)
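As a quick check, the call above can be run against a shortened copy of the sample value from the question. This is only a sketch: the names `text` and `cleaned` are illustrative, and the sample sentence is truncated for brevity.

```python
import re

# Shortened sample from the question. The value starts with four characters:
# quote, backslash, quote, quote.
text = r'"\""warfarin was discontinued 3 days ago"'

# Inside a raw string, \\ is the regex for one literal backslash, so the
# pattern matches exactly three characters: quote, backslash, quote.
# In the replacement, \\ also stands for one literal backslash.
cleaned = re.sub(r'"\\"', r'\\', text)

print(cleaned)  # \"warfarin was discontinued 3 days ago"
```

Note that `re.sub` returns a new string, so the result has to be assigned to a variable to be used.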

【Comments】:

Hi Alex, thanks for your reply. I am still facing the same issue. I am using this:

df = df.withColumn('QSTN', regexp_replace(col('QSTN'), '"\\"', '\\'))

and to save the DataFrame I am using:

df.repartition(1).write.format('com.databricks.spark.csv').mode('overwrite').save(output_path, escape= '\"', sep='|',header='True',nullValue=None)

【Answer 2】:

Try the following solution:

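import pandas as pd
from pyspark.sql.functions import regexp_replace, col

# Show long strings in full when the result is displayed via pandas.
pd.set_option('max_colwidth', 200)

# Assumes an existing SparkSession named `spark`.
df = spark.createDataFrame([
    (1, '"\\""warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"')
], ("ID","textVal"))

# The pattern reaches the regex engine as \"\\" (quote, literal backslash, quote);
# the replacement \\ produces a single backslash.
df2 = df.withColumn('textVal', regexp_replace(col('textVal'), '\\"\\\\\"', '\\\\'))
df2.toPandas()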


   ID  textVal
0   1  \"warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"
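The number of backslashes matters because the string is interpreted twice: once as a Python string literal and once more by the regex engine. The sketch below illustrates this with Python's own `re` module rather than Spark, on the assumption that both engines treat these particular escapes the same way (Spark's `regexp_replace` uses Java regular expressions). The variable names and the shortened sample text are illustrative only.

```python
import re

# What the regex engine receives after Python has processed each literal.
failing_pattern = '"\\"'        # 3 characters: " \ "
working_pattern = '\\"\\\\\"'   # 5 characters: \ " \ \ "

print(len(failing_pattern), failing_pattern)
print(len(working_pattern), working_pattern)

# Shortened sample value: quote, backslash, quote, quote, then the text.
text = '"\\""warfarin'

# In the 3-character pattern the backslash escapes the second quote, so the
# regex means "two consecutive quotes" and the backslash is left in place.
print(re.sub(failing_pattern, 'X', text))   # "\Xwarfarin

# In the 5-character pattern the doubled backslash is a literal backslash, so
# the regex means quote, backslash, quote, which is the intended match.
print(re.sub(working_pattern, 'X', text))   # X"warfarin

# A replacement consisting of one lone backslash is rejected by Python's re;
# the replacement string needs its own doubled backslash as well.
try:
    re.sub(working_pattern, '\\', text)
    lone_backslash_rejected = False
except re.error:
    lone_backslash_rejected = True
print(lone_backslash_rejected)
```

This would explain why the attempt with `'"\\"'` and `'\\'` from the comment under the first answer does not give the expected result, while the version with the extra backslashes does.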

Hope this helps!
