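PHP: highlighting keyword matches in a string. English keywords are split on spaces, Chinese keywords are split into single characters; the matched text is unchanged, only its color turns red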


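For example, given the keywords "hello word", every occurrence of "hello" and of "word" in the target string should be highlighted, whether the two appear together or apart. Chinese keywords are matched one character at a time.
For example, given "世界你好", every occurrence of 世, 界, 你, and 好 in the target string should be highlighted, whether the characters are adjacent or apart.

I hope that is clear. Advice from experts is welcome, and I will add bonus points.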

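Reference answer A: This mainly relies on a recursive function. Chinese and English are handled the same way; only the arguments change (for Chinese, pass each character as a separate array element). The method is as follows:
function changes($str, $arr)
{
    // Base case: no keywords left to highlight.
    if (empty($arr)) {
        return $str;
    }
    // Wrap every occurrence of the first keyword in a red <font> tag.
    $val = array_shift($arr);
    $str = str_replace($val, "<font color='red'>" . $val . "</font>", $str);
    // Recurse on the remaining keywords.
    return changes($str, $arr);
}

$str = "hello word and so hello good word yes ones";
$arr = array("hello", "word");
echo changes($str, $arr);

Follow-up question: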

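Thank you! Why didn't I think of using recursion? This kept me stuck for a long time.

Follow-up answer:

You're welcome. It is perfectly normal to miss something now and then.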
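A possible alternative, sketched in Python rather than PHP: replacing keywords one after another can also match text inside the <font> tags inserted earlier (for instance, if a keyword is "red" or "font"). A single regular-expression pass avoids that. The function names split_keywords and highlight are hypothetical, the CJK range is limited to U+4E00 through U+9FFF as a simplifying assumption, and the sketch does not HTML-escape the input.

```python
import re

# One CJK ideograph, or a run of anything else (basic CJK block only; an assumption).
TOKEN_PARTS = re.compile(r'[\u4e00-\u9fff]|[^\u4e00-\u9fff]+')

def split_keywords(query):
    """Split a query on whitespace; CJK text is further split into single characters."""
    keywords = []
    for token in query.split():
        for part in TOKEN_PARTS.findall(token):
            if part not in keywords:
                keywords.append(part)
    return keywords

def highlight(text, query):
    """Wrap every keyword occurrence in a red <font> tag, in a single pass."""
    keywords = split_keywords(query)
    if not keywords:
        return text
    # Longest first, so a longer keyword wins over a shorter one that is its prefix.
    keywords.sort(key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(k) for k in keywords))
    return pattern.sub(lambda m: "<font color='red'>" + m.group() + "</font>", text)

print(highlight("hello word and so hello good word yes ones", "hello word"))
print(highlight("你好,世界", "世界你好"))
```

Because the text is scanned only once, the markup added for one keyword is never searched again for another.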

Split a large string into multiple substrings containing 'n' number of words via python

【Posted】: 2010-12-30 05:11:43 【Question】:

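Source text: United States Declaration of Independence

How can I split the source text above into multiple substrings, each containing 'n' words?

I use split(' ') to extract the individual words, but I do not know how to do this for several words at a time in a single operation.

I could iterate over my list of words and build a second list by gluing words from the first list together (adding the spaces back as I go). But that approach is not very Pythonic.

【Question discussion】:

【Answer 1】:

For large strings, iterators are recommended, both for speed and for a low memory footprint.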

import re, itertools

# Original text
text = "When in the course of human Events, it becomes necessary for one People to dissolve the Political Bands which have connected them with another, and to assume among the Powers of the Earth, the separate and equal Station to which the Laws of Nature and of Nature's God entitle them, a decent Respect to the Opinions of Mankind requires that they should declare the causes which impel them to the Separation."
n = 10

# An iterator which will extract words one by one from text when needed
# (Python 3: a generator expression replaces itertools.imap)
words = (m.group() for m in re.finditer(r'\w+', text))
# The final iterator that combines words into n-length groups;
# the last group is padded with None (Python 3: zip_longest replaces izip_longest)
word_groups = itertools.zip_longest(*(words,)*n)

for g in word_groups: print(g)

This produces the following result:

('When', 'in', 'the', 'course', 'of', 'human', 'Events', 'it', 'becomes', 'necessary')
('for', 'one', 'People', 'to', 'dissolve', 'the', 'Political', 'Bands', 'which', 'have')
('connected', 'them', 'with', 'another', 'and', 'to', 'assume', 'among', 'the', 'Powers')
('of', 'the', 'Earth', 'the', 'separate', 'and', 'equal', 'Station', 'to', 'which')
('the', 'Laws', 'of', 'Nature', 'and', 'of', 'Nature', 's', 'God', 'entitle')
('them', 'a', 'decent', 'Respect', 'to', 'the', 'Opinions', 'of', 'Mankind', 'requires')
('that', 'they', 'should', 'declare', 'the', 'causes', 'which', 'impel', 'them', 'to')
('the', 'Separation', None, None, None, None, None, None, None, None)
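If the None padding in the last group is unwanted, a lazy variant can be sketched with itertools.islice. The function name iter_word_chunks is hypothetical; the word pattern is the same r'\w+' used above.

```python
import re
from itertools import islice

def iter_word_chunks(text, n):
    """Lazily yield strings of up to n words each; the last chunk may be shorter."""
    words = (m.group() for m in re.finditer(r'\w+', text))
    while True:
        # Pull at most n words from the shared iterator.
        chunk = list(islice(words, n))
        if not chunk:
            return
        yield ' '.join(chunk)

for chunk in iter_word_chunks("When in the course of human Events, it becomes necessary", 4):
    print(chunk)
```

Only one chunk of words is held in memory at a time, so the low-memory property of the iterator approach is kept.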

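【Discussion】:

So how do I glue the words in each tuple together with spaces?

Yes, just print ' '.join(g) instead of g. (The last tuple is padded with None, so filter those out first: ' '.join(w for w in g if w is not None).)

【Answer 2】: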
text = """
When in the course of human Events, it becomes necessary for one People to dissolve the Political Bands which have connected them with another, and to assume among the Powers of the Earth, the separate and equal Station to which the Laws of Nature and of Nature's God entitle them, a decent Respect to the Opinions of Mankind requires that they should declare the causes which impel them to the Separation.

We hold these Truths to be self-evident, that all Men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty, and the pursuit of Happiness--That to secure these Rights, Governments are instituted among Men, deriving their just Powers from the Consent of the Governed, that whenever any Form of Government becomes destructive of these Ends, it is the Right of the People to alter or abolish it, and to institute a new Government, laying its Foundation on such Principles, and organizing its Powers in such Form, as to them shall seem most likely to effect their Safety and Happiness. Prudence, indeed, will dictate that Governments long established should not be changed for light and transient Causes; and accordingly all Experience hath shewn, that Mankind are more disposed to suffer, while Evils are sufferable, than to right themselves by abolishing the Forms to which they are accustomed. But when a long Train of Abuses and Usurpations, pursuing invariably the same Object, evinces a Design to reduce them under absolute Despotism, it is their Right, it is their Duty, to throw off such Government, and to provide new Guards for their future Security. Such has been the patient Sufferance of these Colonies; and such is now the Necessity which constrains them to alter their former Systems of Government. The History of the Present King of Great-Britain is a History of repeated Injuries and Usurpations, all having in direct Object the Establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid World.
"""

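# Split on any whitespace; punctuation stays attached to the words
words = text.split()
subs = []
n = 4
# Step through the word list n words at a time and re-join each slice
for i in range(0, len(words), n):
    subs.append(" ".join(words[i:i+n]))
print(subs[:10])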

This prints:

['When in the course', 'of human Events, it', 'becomes necessary for one', 'People to dissolve the', 'Political Bands which have', 'connected them with another,', 'and to assume among', 'the Powers of the', 'Earth, the separate and', 'equal Station to which']

Or, as a list comprehension:

subs = [" ".join(words[i:i+n]) for i in range(0, len(words), n)]

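【Discussion】:

This looks very Pythonic.

Oh. Most n-gram applications need overlapping groups such as ['When in the course', 'in the course of', 'the course of human'], and so on.

【Answer 3】:

Are you trying to create n-grams? Here is how I do it, using NLTK.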

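import re
import nltk

# Strips leading and trailing non-alphanumeric characters from a token
punct = re.compile(r'^[^A-Za-z0-9]+|[^a-zA-Z0-9]+$')
# A token counts as a word only if it contains at least one letter
is_word = re.compile(r'[a-z]', re.IGNORECASE)
sentence_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
# Note: PunktWordTokenizer comes from older NLTK releases and may be missing in newer ones
word_tokenizer = nltk.tokenize.punkt.PunktWordTokenizer()

def get_words(sentence):
    return [punct.sub('', word) for word in word_tokenizer.tokenize(sentence) if is_word.search(word)]

def ngrams(text, n):
    # Work sentence by sentence so that no n-gram spans a sentence boundary
    for sentence in sentence_tokenizer.tokenize(text.lower()):
        words = get_words(sentence)
        for i in range(len(words)-(n-1)):
            yield ' '.join(words[i:i+n])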

Then:

for ngram in ngrams(sometext, 3):
    print(ngram)
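For readers without NLTK installed, the same sliding-window idea can be sketched with the standard library alone. The function name word_ngrams and its word pattern are assumptions made for this sketch. Unlike the NLTK version above, it does not split the text into sentences first, so n-grams can cross sentence boundaries.

```python
import re

def word_ngrams(text, n):
    """Yield every run of n consecutive words in text, lowercased."""
    # Letters, digits and apostrophes count as word characters here (an assumption).
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Slide a window of width n across the word list, one word at a time.
    for i in range(len(words) - n + 1):
        yield ' '.join(words[i:i + n])

print(list(word_ngrams("When in the course of human Events", 3)))
```

A text with fewer than n words yields nothing, because the range is empty.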

【Discussion】:

Interesting link! I will definitely consider using that toolkit in the future.
