Match URL pattern in PHP using a regular expression

Posted 2023-02-24


【Asked】: 2011-04-23 16:47:46 【Question】:

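I want to match URLs in wall posts and replace each one with an anchor tag. For this I use the regular expression below.

I want to match four types of URL: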

http://example.com

https://example.com

www.example.com

example.com

preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@',
             '<a href="$1">$1</a>', $subject);

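This expression matches only the first two types of URL.

If I use this expression instead, '@(www?([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', it matches only the third URL type.

How can I match all four URL types with a single regular expression?

【Question comments】:

【Answer 1】:

A complete working example using the regex from the link Nev Stokes gave: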

public function clickableUrls($html)
{
    // Wrap every URL that starts with a scheme (e.g. http://) or with "www." in an anchor tag.
    return preg_replace(
        '%\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s',
        '<a href="$1">$1</a>',
        $html
    );
}
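To see what this pattern does and does not pick up, here is a minimal sketch that uses the same regex in a standalone function. The name `linkifyUrls` and the sample sentence are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Standalone variant of the method above; the function name is hypothetical.
function linkifyUrls(string $text): string
{
    return preg_replace(
        '%\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s',
        '<a href="$1">$1</a>',
        $text
    );
}

// The trailing "." and "!" stay outside the generated links.
echo linkifyUrls('See http://example.com/page. Also www.example.com!'), PHP_EOL;
```

Two things are worth noting. A match that starts with `www.` is copied into `href` unchanged, so browsers will treat it as a relative link unless a scheme is prepended. A bare `example.com` is not matched at all, because the pattern requires either a scheme or a leading `www.`.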

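【Comments】:

My goodness, this one finally works... I had been trying all sorts of approaches that people posted, and they either had syntax problems or only partly worked (what I needed to fix was URLs being extracted together with a trailing period, e.g. t.co/123213...). Works perfectly.

【Answer 2】:

Honestly, I would use a different regex, such as this one that Gruber posted in 2009: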

\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))

Or the updated version that Gruber posted in 2010 (thanks, @IMSoP):

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
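The answer gives the pattern only, so here is a rough sketch of one way it could be applied in PHP. The constant `GRUBER_URL_PATTERN`, the helper `gruberLinkify`, the choice of `~` as delimiter, and the rule of prepending `http://` to scheme-less matches are all assumptions made for illustration. The input is assumed to be plain text, not HTML.

```php
<?php
// The 2010 pattern, kept in a nowdoc so no quote or backslash needs escaping.
// "~" is the delimiter; the "u" flag is there for the typographic quotes in the last class.
const GRUBER_URL_PATTERN = <<<'REGEX'
~(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))~u
REGEX;

// Hypothetical helper: wraps each match in an anchor tag.
function gruberLinkify(string $text): string
{
    $result = preg_replace_callback(
        GRUBER_URL_PATTERN,
        function (array $m): string {
            $url = $m[1];
            // Assumption: matches without a scheme (www.…, example.com/…) get http://.
            $href = preg_match('~^[a-z][\w-]+:~i', $url) ? $url : 'http://' . $url;
            return '<a href="' . htmlspecialchars($href, ENT_QUOTES) . '">'
                . htmlspecialchars($url, ENT_QUOTES) . '</a>';
        },
        $text
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $text;
}

echo gruberLinkify('Docs at www.example.com.'), PHP_EOL;
```

With this pattern a bare domain is matched only when a slash follows it (`example.com/`), so a plain `example.com` is still left alone.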

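【Comments】:

Note that there is an updated version of that regex here: daringfireball.net/2010/07/improved_regex_for_matching_urls. A PHP implementation: http://***.com/a/10002262/1055533

【Answer 3】:

I looked around and didn't see anything that did quite what I needed. I found this one, which was close, so I modified it as follows: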

^((([hH][tT][tT][pP][sS]?)\:\/\/)?([\w\\-]+(\[\w\.\&%\$\-]+)*)?((([^\s\(\)\<\>\\\"\.\[\]\,;:]+)(\.[^\s\(\)\<\>\\\"\.\[\]\,;:]+)*(\.[a-zA-Z]{2,4}))|((([01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}([01]?\d{1,2}|2[0-4]\d|25[0-5])))(\b\:(6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3}|0)\b)?((\/[^\/][\w\.\,\?\'\\\/\+&%\$#\=~_\-]*)*[^\.\,\?\"\'\(\)\[\]!;<>\s\x7F-\xFF])?)$

See it on Debuggex.

【Comments】:

【Answer 4】:

Use:

preg_match("/^((https|http|ftp)\:\/\/)?([a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-zA-Z]{2,4}|[a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-zA-Z]{2,4}|[a-z0-9A-Z]+\.[a-zA-Z]{2,4})$/i", $url)
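Because this pattern is anchored with `^` and `$`, it validates a whole string and cannot find URLs inside a longer text. The sketch below runs it against the four forms from the question plus a few edge cases. The wrapper name `looksLikeHostUrl` and the sample inputs are assumptions made for illustration.

```php
<?php
// Hypothetical wrapper around the pattern above.
function looksLikeHostUrl(string $url): bool
{
    $pattern = '/^((https|http|ftp)\:\/\/)?('
        . '[a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-zA-Z]{2,4}'
        . '|[a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-zA-Z]{2,4}'
        . '|[a-z0-9A-Z]+\.[a-zA-Z]{2,4})$/i';

    return preg_match($pattern, $url) === 1;
}

$candidates = [
    'http://example.com',
    'https://example.com',
    'www.example.com',
    'example.com',
    'http://example.com/path', // a path is not allowed by the pattern
    'my-site.com',             // hyphens are not in the character classes
];

foreach ($candidates as $candidate) {
    printf("%-26s %s\n", $candidate, looksLikeHostUrl($candidate) ? 'match' : 'no match');
}
```

All four forms from the question pass, but anything with a path, query string, port, or hyphenated host name is rejected, so the pattern is better suited to checking a single host-like value than to linkifying free text.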

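【Comments】:

【Answer 5】:

I just came across this post (two years later). You may already have your answer, but for beginners: you can use the following regular expression to strip every type of URL or query string:

(https|http|ftp)\:\/\/|([a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-zA-Z]{2,4})|([a-z0-9A-Z]+\.[a-zA-Z]{2,4})|\?([a-zA-Z0-9]+[\&\=\#a-z]+)

It strips all types of URLs. Take a look at the list below. I used a variety of domains for anyone who wants to ask "will it strip domains like .us, .in, .pk, and so on?"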

ftp://www.web.com web.net www.website.info website.us web.ws?query=true www.web.biz?query=true ftp://web.in?query=true media.google.com ns.google.pk ww1.smart.au www3.smart.br w1.smart.so ?ques==two&t=p http://website.info?ques==two&t=p https://www.weborwebsite.com

Working example (tested with PHP5+, Apache2+):

$str = "ftp://www.web.com, web.net, www.website.info, website.us, web.ws?query=true, www.web.biz?query=true, ftp://web.in?query=true, media.google.com hello world, working more with ns ns.google.pk or ww1.smart.au and www3.smart.br w1.smart.so ?ques==two&t=p http://website.info?ques==two&t=p https://www.weborwebsite.com and ftp://www.hotmail.br";
echo preg_replace("/(https|http|ftp)\:\/\/|([a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-zA-Z]{2,4})|([a-z0-9A-Z]+\.[a-zA-Z]{2,4})|\?([a-zA-Z0-9]+[\&\=\#a-z]+)/i", "", $str);

It returns:

, , , , , , , hello world, working more with ns or and and

【Comments】:

【Answer 6】:

This works well for me, including a mailto check:

function LinkIt($text)
{
    // Link http(s)://, ftp:// and www. URLs. Note that the href is always rebuilt as http:// or https://, even for ftp:// matches.
    $t = preg_replace("/(\b(?:(?:http(s)?|ftp):\/\/|(www\.)))([-a-züöäß0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|])/im", '<a target="_blank" href="http$2://$3$4" class="external-link" title="External Link">$1$4</a>', $text);
    // Link e-mail addresses with mailto:.
    return preg_replace("/([\w+\.\-]+@[\w+\-]+\.[a-zA-Z]{2,4})/im", strtolower('<a href="mailto:$1" class="mail" title="E-Mail">$1</a>'), $t);
}

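【Comments】:

【Answer 7】:

If you want that one expression to work, you need to make the "https?://" part optional. Since you seem to have a fairly good grasp of regular expressions, I won't show you how. That is an exercise for the reader :)

But I generally agree with Nev. It is overly complicated for what it does.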
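For readers who want to check their solution to that exercise, here is one possible sketch, not the answerer's own code. It makes the scheme optional with `(?:https?://)?` and, so that ordinary words are not linked, requires at least one dot followed by a letters-only top-level domain. The function name `linkifyAll`, the simplified path and query classes, and the rule of prepending `http://` are assumptions made for illustration.

```php
<?php
// Hypothetical helper covering all four forms from the question:
// http://example.com, https://example.com, www.example.com, example.com
function linkifyAll(string $text): string
{
    $pattern = '~\b((?:https?://)?(?:[\w-]+\.)+[a-z]{2,}(?::\d+)?(?:/[\w/.\-]*(?:\?[^\s<]*)?)?)~i';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            $url = $m[1];
            // Prepend a scheme so that scheme-less matches do not become relative links.
            $href = preg_match('~^https?://~i', $url) ? $url : 'http://' . $url;
            return '<a href="' . htmlspecialchars($href, ENT_QUOTES) . '">'
                . htmlspecialchars($url, ENT_QUOTES) . '</a>';
        },
        $text
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $text;
}

echo linkifyAll('Visit example.com or www.example.com today'), PHP_EOL;
```

The price of an optional scheme is false positives: anything shaped like `name.tld`, such as a file name or the domain part of an e-mail address, is linked too. A trailing period directly after a path or query string is also swallowed into the link. The stricter patterns in the other answers trade some of that convenience for precision.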

【Comments】:

【Answer 8】:

Use this pattern:

$regex = "(https?\:\/\/|ftp\:\/\/|www\.|[a-z0-9-]+)+([a-z0-9-]+)\.+([a-z]{2,4})((\/|\.)+([a-z0-9-_.\/]*)$|$)";

【Comments】:
