How to split text to match double quotes plus trailing text to dot?
Posted: 2017-10-20 07:51:32
Question:
How can I get a sentence that is in double quotes and contains a dot, when the text has to be split on dots?
An example document looks like this:
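“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.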
I want to get output like this:
Array
(
[0] =>"Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen.
[1] =>"On a chess board you are fighting. as we are also fighting the hardships in our daily life," he said.
)
My code still explodes on every dot:
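function sample($string)
{
    // explode() splits on every ".", including the one inside the quotation,
    // which is why the quoted sentence gets broken apart.
    $data = array();
    $break = explode(".", $string);
    array_push($data, $break);
    print_r($data);
}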
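I'm still confused about splitting on two delimiters, double quotes and dots, because inside the double quotes there is a sentence that contains the dot delimiter.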
Answer 1:
A perfect example for (*SKIP)(*FAIL):
“[^“”]+”(*SKIP)(*FAIL)|\.\s*
# looks for strings in double quotes
# throws them away
# matches a dot literally, followed by optional whitespace
In PHP:
$regex = '~“[^“”]+”(*SKIP)(*FAIL)|\.\s*~';
$parts = preg_split($regex, $your_string_here);
This yields:
Array
(
[0] => “Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen
[1] => “On a chess board you are fighting. as we are also fighting the hardships in our daily life.”
)
See a demo on regex101.com and a demo on ideone.com.
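For reference, here is a self-contained sketch of the same (*SKIP)(*FAIL) idea run on the full sample text from the question. Two tweaks are my own assumptions, not part of the answer above: the u flag, so that “ and ” are treated as single UTF-8 characters, and splitting on (?<=\.)\s+ (whitespace preceded by a dot) instead of \.\s*, so each piece keeps its final dot as in the desired output. The function name splitOutsideQuotes is hypothetical.

```php
<?php
// Hypothetical helper: split sentences on dots, but never inside “...”.
function splitOutsideQuotes(string $text): array
{
    // Alternative 1: a whole “...” quotation. (*SKIP)(*FAIL) discards it and
    // resumes the search after its closing quote, so inner dots never split.
    // Alternative 2 (assumed variant): whitespace preceded by a dot, so the
    // dot itself stays at the end of the previous piece.
    // The u flag makes PCRE treat “ and ” as single UTF-8 characters.
    $regex = '~“[^“”]+”(*SKIP)(*FAIL)|(?<=\.)\s+~u';
    return preg_split($regex, $text, -1, PREG_SPLIT_NO_EMPTY);
}

$text = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';
$parts = splitOutsideQuotes($text);
print_r($parts);
```

Unlike \.\s*, the lookbehind version does not leave an empty last element when the text ends with a dot, because nothing follows that dot; PREG_SPLIT_NO_EMPTY is kept as a safeguard for input that ends in ". ".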
Comments:
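Can you tell me what the ~ character in your regex means? I'm trying to learn regex, but I couldn't find the ~ character in the regex syntax. Or could you give me a reference for learning regex characters? Thanks.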
@Rachmad: These are delimiters, like / or #, and they are required on both sides of the regex string.
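To illustrate the delimiter point: in PHP's preg_* functions the character that wraps the pattern is the delimiter, and it can be any non-alphanumeric, non-backslash, non-whitespace character (bracket pairs such as [ ] are also accepted). A minimal sketch with made-up subjects:

```php
<?php
// The wrapping characters are delimiters, not part of the pattern.
// These three patterns are equivalent: each matches the literal "a.b".
$subject   = 'x a.b y';
$withTilde = preg_match('~a\.b~', $subject);
$withSlash = preg_match('/a\.b/', $subject);
$withHash  = preg_match('#a\.b#', $subject);

// If the delimiter also occurs inside the pattern it must be escaped:
// with "/" as delimiter, the slash in "1/2" is written as \/ ...
$escaped = preg_match('/1\/2/', 'ratio 1/2');
// ... while a different delimiter avoids the escape entirely.
$unescaped = preg_match('~1/2~', 'ratio 1/2');

var_dump($withTilde, $withSlash, $withHash, $escaped, $unescaped);
```

Since the pattern in Answer 1 contains no /, swapping its ~ delimiters for / would not change what it matches.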
Oh.. so if I change ~ to /, it's fine? @Jan
Answer 2:
Here is a simpler pattern for preg_split(), followed by preg_replace() to convert the left and right double quotes (Demo):
$in = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';
$out = preg_split('/ (?=“)/', $in, 0, PREG_SPLIT_NO_EMPTY);
//$out = preg_match_all('/“.+?(?= “|$)/', $in, $out) ? $out[0] : null;
$find = '/[“”]/u'; // unicode flag is essential
$replace = '"';
$out = preg_replace($find, $replace, $out); // replace curly quotes with standard double quotes
var_export($out);
Output:
array (
0 => '"Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen.',
1 => '"On a chess board you are fighting. as we are also fighting the hardships in our daily life." he said.',
)
preg_split() matches a space followed by “ (a left double quote).
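Because (?=“) is a zero-width lookahead, only the space is consumed and each piece keeps its opening quote. If the real input might contain newlines or repeated spaces between sentences (an assumption; the sample below is made up), a sketch would use \s+ instead of a literal space:

```php
<?php
// Made-up input with a newline and two spaces between the sentences.
$in = "“One. Two.” he said.\n  “Three.” she said.";

// \s+ consumes any run of whitespace; the lookahead (?=“) is zero-width,
// so the opening quote stays at the start of the next piece.
$out = preg_split('/\s+(?=“)/u', $in, -1, PREG_SPLIT_NO_EMPTY);
print_r($out);
```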
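The preg_replace() step needs the u modifier on its pattern to ensure that the left and right double quotes in the character class are recognized. Using '/“|”/' means you could drop the u modifier, but it roughly doubles the number of steps the regex engine has to take (in this case my character class takes only 189 steps, while the pipe version takes 372).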
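To see why the u modifier matters for the character class, here is a small check on a made-up string. Without u, PCRE works on bytes, so [“”] becomes a class of the individual UTF-8 bytes that make up the two quote characters, and each of those bytes is replaced on its own:

```php
<?php
$sample = '“a”';

// Without u: [“”] is a set of single bytes, and each curly quote is three
// bytes long, so every quote turns into three straight quotes: """a"""
$bytes = preg_replace('/[“”]/', '"', $sample);

// With u: each curly quote is one character and is replaced once: "a"
$chars = preg_replace('/[“”]/u', '"', $sample);

// Alternation compares whole byte sequences, so it also works without u: "a"
$alternation = preg_replace('/“|”/', '"', $sample);

var_dump($bytes, $chars, $alternation);
```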
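Also, regarding the choice between preg_split() and preg_match_all(): I chose preg_split() because the goal is simply to split the string on the spaces that precede a left double quote. If the goal were to omit substrings that are not adjacent to the delimiting space characters, preg_match_all() would be the more practical choice.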
If, despite my reasoning, you still want to use preg_match_all(), my preg_split() line can be replaced with:
$out = preg_match_all('/“.+?(?= “|$)/', $in, $out) ? $out[0] : null;
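A quick way to check that the two lines are interchangeable for this input is to run both on the question's sample text and compare the results. The u flag on both patterns, and the variable names, are my additions:

```php
<?php
$in = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';

// Approach 1: split on a space that is followed by a left double quote.
$viaSplit = preg_split('/ (?=“)/u', $in, -1, PREG_SPLIT_NO_EMPTY);

// Approach 2: lazily match from each “ up to the next ' “' or the end.
$viaMatch = preg_match_all('/“.+?(?= “|$)/u', $in, $m) ? $m[0] : [];

var_dump($viaSplit === $viaMatch);
```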
Comments:
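Perfect solution!
Also nice.. but how do we print the double quotes in PHP? Oh.. I found my problem: I just had to edit .htaccess and add the charset directives AddDefaultCharset UTF-8 and AddCharset UTF-8 .php. Thanks also to @mickmackusa
Answer 3: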
Alternatively:
regex101 (16 steps)
“.[^”]+”(?:.[^“]+)?
“.[^”]+” matches everything between “ and ”.
(?:.[^“]+)? optionally matches (which is why there is a final ?) everything that is not an opening “; ?: marks a non-capturing group.
PHP - PHPfiddle: click "Run-F9" [updated to replace “ and ” with "]
<?php
$str = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.”';
if (preg_match_all('/“.[^”]+”(?:.[^“]+)?/', $str, $matches)) {
    echo '<pre>';
    // '[“|”]' uses [ ] as bracket-style delimiters, so the pattern is “|”
    // (either curly quote), and each one is replaced with a straight quote.
    print_r(preg_replace('[“|”]', '"', $matches[0]));
    echo '</pre>';
}
?>
Output:
Array ( [0] => "Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen. [1] => "On a chess board you are fighting. as we are also fighting the hardships in our daily life." )
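Here is a sketch of the same matching approach on the question's full text (which ends with "he said."), with two assumed tweaks that are not part of the answer: the u flag, so that “ and ” are single characters, and trim() on each match, because (?:.[^“]+)? also captures the space just before the next opening quote:

```php
<?php
$str = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';

$sentences = [];
if (preg_match_all('/“.[^”]+”(?:.[^“]+)?/u', $str, $matches)) {
    // Drop the space that the optional group picks up before the next “.
    $sentences = array_map('trim', $matches[0]);
    // Straighten the curly quotes, as in the desired output.
    $sentences = preg_replace('/[“”]/u', '"', $sentences);
}
print_r($sentences);
```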