Generate SEO-friendly URLs (slugs) [closed]

Posted 2023-02-24


Asked: 2011-07-15 10:06:31

Question:

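Definition

From Wikipedia:

A slug is the part of a URL that identifies a page using human-readable keywords.

To make the URL easier for users to type, special characters are often removed or replaced as well. For instance, accented characters are usually replaced by letters from the English alphabet; punctuation marks are generally removed; and spaces (which have to be encoded as %20 or +) are replaced by dashes (-) or underscores (_), which are more aesthetically pleasing.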
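The three rules in the quote can be sketched in a few lines of PHP. This is an illustration only, not the question's function: `rules_slug()` is a hypothetical name, the accent map covers just a handful of characters, and `mb_strtolower()` assumes the mbstring extension is installed.

```php
<?php
// Minimal sketch of the three quoted rules (hypothetical name; tiny sample accent map).
function rules_slug(string $title): string
{
    // Rule 1: accented characters -> plain English letters (sample map only).
    $accents = [
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ç' => 'c',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'î' => 'i', 'ï' => 'i', 'ô' => 'o', 'ù' => 'u', 'û' => 'u', 'ü' => 'u',
    ];
    $slug = strtr(mb_strtolower($title, 'UTF-8'), $accents);

    // Rule 2: punctuation (anything other than a letter, digit or space) is removed.
    $slug = preg_replace('/[^a-z0-9\s]/', '', $slug);

    // Rule 3: each run of whitespace becomes a single dash.
    return preg_replace('/\s+/', '-', trim($slug));
}

echo rules_slug("Crème brûlée, s'il vous plaît!"); // creme-brulee-sil-vous-plait
```

`strtr()` is given an array here so that each multibyte key is replaced as a whole string.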

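Context

I developed a photo-sharing website on which users can upload, share and view photos.

All pages are generated automatically, without my having any control over the titles. Because the title of a photo or the name of a user may contain accented characters or spaces, I needed a function that automatically creates slugs and keeps the URLs readable.

I created the following function, which replaces accented characters (âèêëçî), removes punctuation and bad characters (#@&~^!), and converts spaces into dashes.

Question

What do you think of this function? Do you know of any other functions for creating slugs?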

Code

PHP:

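```php
function sluggable($str)
{
    $before = array(
        'àáâãäåòóôõöøèéêëðçìíîïùúûüñšž', // accented characters...
        '/[^a-z0-9\s]/',                  // ...anything that is not a letter, digit or space...
        array('/\s/', '/--+/', '/---+/')  // ...whitespace and runs of dashes
    );

    $after = array(
        'aaaaaaooooooeeeeeciiiiuuuunsz', // ...become plain letters
        '',                               // ...are removed
        '-'                               // ...become a single dash
    );

    $str = strtolower($str);
    // strtr() with two string arguments maps single bytes, so this line only
    // works if the file and $str use a single-byte encoding such as ISO-8859-1.
    $str = strtr($str, $before[0], $after[0]);
    $str = preg_replace($before[1], $after[1], $str);
    $str = trim($str);
    $str = preg_replace($before[2], $after[2], $str);

    return $str;
}
```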
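One likely problem with the function above on a UTF-8 site: `strtr()` with two string arguments translates single bytes, and every accented letter in the first string is two bytes in UTF-8, so the two strings no longer line up. `strtolower()` also does not lowercase multibyte capitals such as `É`. A sketch that keeps the question's two strings but pairs them character by character (the name `sluggable_utf8()` is hypothetical, and the mbstring extension is assumed):

```php
<?php
// Sketch (hypothetical name): the question's sluggable() with the accent
// strings paired per character, so it works on UTF-8 input. Requires mbstring.
function sluggable_utf8(string $str): string
{
    $from = 'àáâãäåòóôõöøèéêëðçìíîïùúûüñšž';
    $to   = 'aaaaaaooooooeeeeeciiiiuuuunsz';

    // Split the UTF-8 string into characters: ['à' => 'a', 'á' => 'a', ...]
    $map = array_combine(preg_split('//u', $from, -1, PREG_SPLIT_NO_EMPTY), str_split($to));

    $str = mb_strtolower($str, 'UTF-8');              // also lowercases É, Ç, Š
    $str = strtr($str, $map);                         // accented letters -> ASCII
    $str = preg_replace('/[^a-z0-9\s]/', '', $str);   // drop punctuation and leftovers
    return preg_replace('/\s+/', '-', trim($str));    // whitespace runs -> one dash
}

echo sluggable_utf8('Élève Façade #@&~^! déjà'); // eleve-facade-deja
```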

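Comments on the question:

French, à la escargot? How about using code that has already been written: code.google.com/p/php-slugs?
You may want to delete this question here and repost it on codereview.stackexchange.com, where feedback and improvements are more on topic.
@maniator: wiki: Slug
French doesn't have áâãäåòóõöøðìíñšž. (Swedish, Czech, etc., but not French.)

Answer 1: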

I like the php-slugs code from the Google Code solution. But if you want a simpler one that supports UTF-8:

```php
function format_uri( $string, $separator = '-' )
{
    $accents_regex = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';
    $special_cases = array( '&' => 'and', "'" => '');
    $string = mb_strtolower( trim( $string ), 'UTF-8' );
    $string = str_replace( array_keys($special_cases), array_values( $special_cases), $string );
    // htmlentities() turns é into &eacute;, which $accents_regex reduces to e
    $string = preg_replace( $accents_regex, '$1', htmlentities( $string, ENT_QUOTES, 'UTF-8' ) );
    $string = preg_replace("/[^a-z0-9]/u", "$separator", $string);
    $string = preg_replace("/[$separator]+/u", "$separator", $string);
    return $string;
}
```

So

```php
echo format_uri("#@&~^!âèêëçî");
```

outputs

```
-and-aeeeci
```
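Two things are visible in that output: the leading dash is kept, and an HTML entity that is not an accented letter (for example `&hellip;` for …) would survive as letters such as `hellip`. A sketch of a variant that handles both (`format_uri_trimmed()` is a hypothetical name; mbstring is assumed, as in the original):

```php
<?php
// Sketch (hypothetical name): format_uri() plus removal of leftover entities
// and trimming of separators at both ends. Requires mbstring, like the original.
function format_uri_trimmed(string $string, string $separator = '-'): string
{
    $accents_regex = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';
    $special_cases = ['&' => 'and', "'" => ''];

    $string = mb_strtolower(trim($string), 'UTF-8');
    $string = strtr($string, $special_cases);

    // é -> &eacute; -> e, ç -> &ccedil; -> c, æ -> &aelig; -> ae
    $string = preg_replace($accents_regex, '$1', htmlentities($string, ENT_QUOTES, 'UTF-8'));

    // Drop entities that are not accented letters (&hellip;, &quot;, &euro;)
    $string = preg_replace('~&[a-z0-9#]+;~i', '', $string);

    // Every run of other characters becomes one separator...
    $string = preg_replace('~[^a-z0-9]+~', $separator, $string);

    // ...and separators at either end are removed.
    $sep = preg_quote($separator, '~');
    return preg_replace('~^(?:' . $sep . ')+|(?:' . $sep . ')+$~', '', $string);
}

echo format_uri_trimmed("#@&~^!âèêëçî"); // and-aeeeci
```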

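Comments:

Here's gets converted to here-039-s. A better option would be to simply remove the apostrophe.

Answer 2:

Some people have linked to "php-slugs" on Google Code, but that page now looks rather messy, so here is the code in case anyone needs it: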

```php
// source: https://code.google.com/archive/p/php-slugs/

function my_str_split($string)
{
    $sArray=array();
    $slen=strlen($string);
    for($i=0; $i<$slen; $i++)
    {
        $sArray[$i]=$string[$i];
    }
    return $sArray;
}

function noDiacritics($string)
{
    //cyrylic transcription
    $cyrylicFrom = array('А', 'Б', 'В', 'Г', 'Д', 'Е', 'Ё', 'Ж', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р', 'С', 'Т', 'У', 'Ф', 'Х', 'Ц', 'Ч', 'Ш', 'Щ', 'Ъ', 'Ы', 'Ь', 'Э', 'Ю', 'Я', 'а', 'б', 'в', 'г', 'д', 'е', 'ё', 'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы', 'ь', 'э', 'ю', 'я');
    $cyrylicTo   = array('A', 'B', 'W', 'G', 'D', 'Ie', 'Io', 'Z', 'Z', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'F', 'Ch', 'C', 'Tch', 'Sh', 'Shtch', '', 'Y', '', 'E', 'Iu', 'Ia', 'a', 'b', 'w', 'g', 'd', 'ie', 'io', 'z', 'z', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'ch', 'c', 'tch', 'sh', 'shtch', '', 'y', '', 'e', 'iu', 'ia');

    $from = array("Á", "À", "Â", "Ä", "Ă", "Ā", "Ã", "Å", "Ą", "Æ", "Ć", "Ċ", "Ĉ", "Č", "Ç", "Ď", "Đ", "Ð", "É", "È", "Ė", "Ê", "Ë", "Ě", "Ē", "Ę", "Ə", "Ġ", "Ĝ", "Ğ", "Ģ", "á", "à", "â", "ä", "ă", "ā", "ã", "å", "ą", "æ", "ć", "ċ", "ĉ", "č", "ç", "ď", "đ", "ð", "é", "è", "ė", "ê", "ë", "ě", "ē", "ę", "ə", "ġ", "ĝ", "ğ", "ģ", "Ĥ", "Ħ", "I", "Í", "Ì", "İ", "Î", "Ï", "Ī", "Į", "Ĳ", "Ĵ", "Ķ", "Ļ", "Ł", "Ń", "Ň", "Ñ", "Ņ", "Ó", "Ò", "Ô", "Ö", "Õ", "Ő", "Ø", "Ơ", "Œ", "ĥ", "ħ", "ı", "í", "ì", "i", "î", "ï", "ī", "į", "ĳ", "ĵ", "ķ", "ļ", "ł", "ń", "ň", "ñ", "ņ", "ó", "ò", "ô", "ö", "õ", "ő", "ø", "ơ", "œ", "Ŕ", "Ř", "Ś", "Ŝ", "Š", "Ş", "Ť", "Ţ", "Þ", "Ú", "Ù", "Û", "Ü", "Ŭ", "Ū", "Ů", "Ų", "Ű", "Ư", "Ŵ", "Ý", "Ŷ", "Ÿ", "Ź", "Ż", "Ž", "ŕ", "ř", "ś", "ŝ", "š", "ş", "ß", "ť", "ţ", "þ", "ú", "ù", "û", "ü", "ŭ", "ū", "ů", "ų", "ű", "ư", "ŵ", "ý", "ŷ", "ÿ", "ź", "ż", "ž");
    $to   = array("A", "A", "A", "AE", "A", "A", "A", "A", "A", "AE", "C", "C", "C", "C", "C", "D", "D", "D", "E", "E", "E", "E", "E", "E", "E", "E", "E", "G", "G", "G", "G", "a", "a", "a", "ae", "a", "a", "a", "a", "a", "ae", "c", "c", "c", "c", "c", "d", "d", "d", "e", "e", "e", "e", "e", "e", "e", "e", "e", "g", "g", "g", "g", "H", "H", "I", "I", "I", "I", "I", "I", "I", "I", "IJ", "J", "K", "L", "L", "N", "N", "N", "N", "O", "O", "O", "OE", "O", "O", "O", "O", "OE", "h", "h", "i", "i", "i", "i", "i", "i", "i", "i", "ij", "j", "k", "l", "l", "n", "n", "n", "n", "o", "o", "o", "oe", "o", "o", "o", "o", "oe", "R", "R", "S", "S", "S", "S", "T", "T", "T", "U", "U", "U", "UE", "U", "U", "U", "U", "U", "U", "W", "Y", "Y", "Y", "Z", "Z", "Z", "r", "r", "s", "s", "s", "s", "ss", "t", "t", "b", "u", "u", "u", "ue", "u", "u", "u", "u", "u", "u", "w", "y", "y", "y", "z", "z", "z");

    $from = array_merge($from, $cyrylicFrom);
    $to   = array_merge($to, $cyrylicTo);

    $newstring=str_replace($from, $to, $string);
    return $newstring;
}

function makeSlugs($string, $maxlen=0)
{
    $newStringTab=array();
    $string=strtolower(noDiacritics($string));
    if(function_exists('str_split'))
    {
        $stringTab=str_split($string);
    }
    else
    {
        $stringTab=my_str_split($string);
    }

    $numbers=array("0","1","2","3","4","5","6","7","8","9","-");
    //$numbers=array("0","1","2","3","4","5","6","7","8","9");

    foreach($stringTab as $letter)
    {
        if(in_array($letter, range("a", "z")) || in_array($letter, $numbers))
        {
            $newStringTab[]=$letter;
        }
        elseif($letter==" ")
        {
            $newStringTab[]="-";
        }
    }

    if(count($newStringTab))
    {
        $newString=implode($newStringTab);
        if($maxlen>0)
        {
            $newString=substr($newString, 0, $maxlen);
        }

        $newString = removeDuplicates('--', '-', $newString);
    }
    else
    {
        $newString='';
    }

    return $newString;
}

function checkSlug($sSlug)
{
    if(preg_match("/^[a-zA-Z0-9]+[a-zA-Z0-9\-]*$/", $sSlug) == 1)
    {
        return true;
    }

    return false;
}

function removeDuplicates($sSearch, $sReplace, $sSubject)
{
    $i=0;
    do
    {
        $sSubject=str_replace($sSearch, $sReplace, $sSubject);
        $pos=strpos($sSubject, $sSearch);

        $i++;
        if($i>100)
        {
            die('removeDuplicates() loop error');
        }
    }
    while($pos!==false);

    return $sSubject;
}
```
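Comments:

Instead of providing a huge, horrible and incomplete list of replacements, you should normalize the string and then remove the non-ASCII characters.
@BlueRaja-DannyPflughoeft Since this is Google's original code, I don't intend to edit it. I encourage you to add another answer that improves on this code.
I edited the matching of German umlauts. I think Ä should be AE, Ü should be UE, and so on.
@SirDerpington I wonder whether this answer should be editable at all, since it's really a copy-paste of code.google.com/archive/p/php-slugs
@rybo111 Yes, I see what you mean. I think it should be, because (I don't know why) some characters in the $to and $from arrays were missing; it just said "?" instead of the actual characters.

Answer 3: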

```php
setlocale(LC_ALL, 'en_US.UTF8');

function slugify($text)
{
    // replace non letter or digits by -
    $text = preg_replace('~[^\\pL\d]+~u', '-', $text);

    // trim
    $text = trim($text, '-');

    // transliterate
    $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);

    // lowercase
    $text = strtolower($text);

    // remove unwanted characters
    $text = preg_replace('~[^-\w]+~', '', $text);

    if (empty($text))
    {
        return 'n-a';
    }

    return $text;
}

$slug = slugify($var);
```
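The output of `iconv()` with `//TRANSLIT` depends on the iconv implementation and on the current locale (hence the `setlocale()` call), so results can differ between servers, glibc and musl-based systems being a common example; `iconv()` also returns `false` when a conversion fails. A defensive sketch of the same pipeline: `slugify_safe()` is a hypothetical name, and keeping the untransliterated text on failure is an assumption (non-ASCII letters are then simply dropped). Call `setlocale()` first, as the answer does, if you want accents transliterated rather than dropped.

```php
<?php
// Sketch (hypothetical name): slugify() made tolerant of iconv() failing.
// Assumes the iconv extension; falling back to the raw text is an assumption.
function slugify_safe(string $text, string $fallback = 'n-a'): string
{
    // Replace every run of non-letters/non-digits with a single dash
    $text = trim(preg_replace('~[^\pL\d]+~u', '-', $text), '-');

    // Transliterate; iconv() returns false on failure
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);
    if ($ascii === false) {
        $ascii = $text;
    }

    // Lowercase, keep only a-z, 0-9 and '-', then tidy dashes that may be
    // left next to characters transliteration could not handle
    $ascii = preg_replace('~[^a-z0-9-]+~', '', strtolower($ascii));
    $ascii = trim(preg_replace('~-{2,}~', '-', $ascii), '-');

    return $ascii === '' ? $fallback : $ascii;
}

echo slugify_safe('Hello, World!'); // hello-world
```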


Answer 4:

I found this online. It does exactly what you want, but it preserves case.

```php
function sluggable($p)
{
    // The u modifier makes ranges such as À-Å match characters rather than
    // individual bytes when the source file and the input are UTF-8.
    $ts = array("/[À-Å]/u","/Æ/u","/Ç/u","/[È-Ë]/u","/[Ì-Ï]/u","/Ð/u","/Ñ/u","/[Ò-ÖØ]/u","/×/u","/[Ù-Ü]/u","/[Ý-ß]/u","/[à-å]/u","/æ/u","/ç/u","/[è-ë]/u","/[ì-ï]/u","/ð/u","/ñ/u","/[ò-öø]/u","/÷/u","/[ù-ü]/u","/[ý-ÿ]/u");
    $tn = array("A","AE","C","E","I","D","N","O","X","U","Y","a","ae","c","e","i","d","n","o","x","u","y");
    return preg_replace($ts,$tn, $p);
}
```

source
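Because the mapping is fixed, the same case-preserving folding can also be written as one `strtr()` call over an explicit map, instead of running 22 regular expressions per string. A sketch (`fold_latin1()` is a hypothetical name) that spells out the ranges above and keeps the original's choices, such as mapping `Þ` and `ß` to `Y`:

```php
<?php
// Sketch (hypothetical name): the same mapping as sluggable($p) above,
// done with a single strtr() call over an explicit map.
function fold_latin1(string $p): string
{
    static $map = null;

    if ($map === null) {
        // Single characters, exactly as in the original $ts/$tn arrays
        $map = ['Æ' => 'AE', 'Ç' => 'C', 'Ð' => 'D', 'Ñ' => 'N', '×' => 'X',
                'æ' => 'ae', 'ç' => 'c', 'ð' => 'd', 'ñ' => 'n', '÷' => 'x'];

        // The original character ranges, spelled out
        $groups = [
            'A' => 'ÀÁÂÃÄÅ', 'E' => 'ÈÉÊË', 'I' => 'ÌÍÎÏ', 'O' => 'ÒÓÔÕÖØ', 'U' => 'ÙÚÛÜ', 'Y' => 'ÝÞß',
            'a' => 'àáâãäå', 'e' => 'èéêë', 'i' => 'ìíîï', 'o' => 'òóôõöø', 'u' => 'ùúûü', 'y' => 'ýþÿ',
        ];
        foreach ($groups as $ascii => $chars) {
            foreach (preg_split('//u', $chars, -1, PREG_SPLIT_NO_EMPTY) as $char) {
                $map[$char] = $ascii;
            }
        }
    }

    // strtr() with an array replaces whole (multibyte) keys
    return strtr($p, $map);
}

echo fold_latin1('Ångström Æsir Façade'); // Angstrom AEsir Facade
```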

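Comments:

This isn't very robust, since it only handles the characters listed. What about Cyrillic? Hebrew? Other obscure non-ASCII symbols such as ², º, ‘ and so on?
Also, preg_replace() is slower than strtr().

Answer 5:

This works really well and returns a correct, clean URL slug.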

```php
$string = '(1234) S*m@#ith S)&+*t `Exam)ple?>land   - - 1!_2)#3)(*4""5';

// remove all non alphanumeric characters except spaces
$clean =  preg_replace('/[^a-zA-Z0-9\s]/', '', strtolower($string)); 

// replace one or multiple spaces into single dash (-)
$clean =  preg_replace('!\s+!', '-', $clean); 

echo $clean; // 1234-smith-st-exampleland-12345
```
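Wrapped in a function, the same whitelist can also trim the string before the spaces are converted, so input that starts or ends with spaces or symbols doesn't yield a slug with leading or trailing dashes. A sketch (`whitelist_slug()` is a hypothetical name):

```php
<?php
// Sketch (hypothetical name): Answer 5's whitelist as a function, trimming
// before spaces are converted so the slug has no leading/trailing dashes.
function whitelist_slug(string $string): string
{
    // Keep only a-z, 0-9 and whitespace
    $clean = preg_replace('/[^a-z0-9\s]/', '', strtolower($string));

    // Trim, then turn each run of whitespace into a single dash
    return preg_replace('/\s+/', '-', trim($clean));
}

echo whitelist_slug('  !Hello, World!  '); // hello-world
```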

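Comments:

This code removes every character that isn't in the regex, so it works like a whitelist. But be careful: most international programmers need a solution that turns "café" into "cafe", not into "caf" as this code does.

Answer 6: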

```php
function seourl($phrase, $maxLength = 100000000000000)
{
    $result = strtolower($phrase);

    $result = preg_replace("~[^A-Za-z0-9-\s]~", "", $result);
    $result = trim(preg_replace("~[\s-]+~", " ", $result));
    $result = trim(substr($result, 0, $maxLength));
    $result = preg_replace("~\s~", "-", $result);

    return $result;
}
```
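`seourl()` truncates with `substr()`, so a long title can end in half a word. A sketch of one way to cut at the last dash within the limit instead (`seourl_words()` and its default of 60 are hypothetical choices, not part of the answer):

```php
<?php
// Sketch (hypothetical name and default): like seourl(), but a slug longer
// than $maxLength is cut at the last dash instead of in the middle of a word.
function seourl_words(string $phrase, int $maxLength = 60): string
{
    $result = strtolower($phrase);
    $result = preg_replace('~[^a-z0-9\s-]~', '', $result);       // whitelist
    $result = trim(preg_replace('~[\s-]+~', '-', $result), '-'); // spaces/dashes -> one dash

    if (strlen($result) > $maxLength) {
        // Look one character past the limit so a word ending exactly at the limit is kept
        $cut = substr($result, 0, $maxLength + 1);
        $pos = strrpos($cut, '-');
        // No dash at all: fall back to a hard cut, as seourl() does
        $result = ($pos !== false) ? substr($cut, 0, $pos) : substr($result, 0, $maxLength);
    }

    return $result;
}

echo seourl_words('The quick brown fox jumps', 15); // the-quick-brown
```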


Answer 7:

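```php
function remove_accents($string)
{
    // Latin-1 letters only: strtr() with two strings maps single bytes, so both
    // sides are converted to ISO-8859-1 first (utf8_decode()/utf8_encode() did
    // this, but they are deprecated as of PHP 8.2).
    $a = 'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ';
    $b = 'aaaaaaaceeeeiiiidnoooooouuuuybsaaaaaaaceeeeiiiidnoooooouuuuyby';
    $string = strtr(mb_convert_encoding($string, 'ISO-8859-1', 'UTF-8'), mb_convert_encoding($a, 'ISO-8859-1', 'UTF-8'), $b);
    return mb_convert_encoding($string, 'UTF-8', 'ISO-8859-1');
}

function format_slug($title)
{
    $title = remove_accents($title);
    $title = trim(strtolower($title));
    $title = preg_replace('#[^a-z0-9\\-/]#i', '_', $title);
    return trim(preg_replace('/-+/', '-', $title), '-/');
}
```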

Usage: echo format_slug($var);


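Answer 8:

Here is the class we use. Besides performing the individual operations, it can convert a string (or a path) into its slug version (only a-z, 0-9 and - appear in the final output). It also does a few extra things, such as converting the ampersand (&) into the word "and".

Usage:

```php
echo (new Str('My Cover Letter & Résumé'))->slugify()->__toString();
```

```
my-cover-letter-and-resume
```

The Str class: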

```php
<?php

use RuntimeException;
use Transliterator;

class Str
{
    /**
     * Will hold an instance of Transliterator
     * for removing accents from characters.
     * Same instance for all instances of this class is fine.
     */
    private static $accent_transliterator;
    private $string;

    public function __construct(string $string)
    {
        $this->string = $string;
    }

    public function __toString()
    {
        return $this->string;
    }

    public function cleanForUrlPath(): self
    {
        $path = '';

        // Loop through path sections (separated by `/`)
        // and slugify each section.
        foreach (explode('/', $this->string) as $section) {
            $section = (new static($section))->slugify()->__toString();
            if ($section !== '') {
                $path .= "/$section";
            }
        }

        // Save the cleaned path
        $this->string = "$path/";

        return $this;
    }

    public function cleanUpSlugDashes(): self
    {
        // Remove extra dashes
        $this->string = preg_replace('/--+/', '-', $this->string);

        // Remove leading and trailing dashes
        $this->string = trim($this->string, '-');

        return $this;
    }

    /**
     * Replace symbols with word replacements.
     * Eg, `&` becomes ` and `.
     */
    public function convertSymbolsToWords(): self
    {
        $this->string = strtr($this->string, [
            '@' => ' at ',
            '%' => ' percent ',
            '&' => ' and ',
        ]);

        return $this;
    }

    public static function getSpacerCharacters(
        array $with = [],
        array $without = []
    ): array {
        return array_unique(array_diff(array_merge([
            ' ', // space
            '…', // ellipsis
            '–', // en dash
            '—', // em dash
            '/', // slash
            '\\', // backslash
            ':', // colon
            ';', // semi-colon
            '.', // period
            '+', // plus sign
            '#', // pound sign
            '~', // tilde
            '_', // underscore
            '|', // pipe
        ], array_values($with)), array_values($without)));
    }

    public function lower(): self
    {
        $this->string = strtolower($this->string);

        return $this;
    }

    /**
     * Replaces all accented characters
     * with similar ASCII characters.
     */
    public function removeAccents(): self
    {
        // If no accented characters are found,
        // return the given string as-is.
        if (!preg_match('/[\x80-\xff]/', $this->string)) {
            return $this;
        }

        // Instantiate Transliterator if we haven't already
        if (!isset(self::$accent_transliterator)) {
            self::$accent_transliterator = Transliterator::create(
                'Any-Latin; Latin-ASCII;'
            );

            if (self::$accent_transliterator === null) {
                // @codeCoverageIgnoreStart
                throw new RuntimeException(
                    'Could not create a transliterator'
                );
                // @codeCoverageIgnoreEnd
            }
        }

        // Save transliterated string
        $this->string = (self::$accent_transliterator)->transliterate(
            $this->string
        );

        return $this;
    }

    public function replace($search, $replace)
    {
        $this->string = str_replace($search, $replace, $this->string);

        return $this;
    }

    public function replaceRegex($pattern, $replacement): self
    {
        $this->string = preg_replace($pattern, $replacement, $this->string);

        return $this;
    }

    /**
     * @param int $length number of bytes to shorten the string to
     */
    public function shorten(int $length): self
    {
        // If the string is already `$length` or shorter,
        // return it as-is.
        if (strlen($this->string) <= $length) {
            return $this;
        }

        // Shorten by 2 additional characters
        // to account for the three periods that are appended.
        // Only need to shorten by 2
        // as there's always at least one character (space) removed
        // when the last word is popped off of the array.
        $length -= 2;

        // Shorten the string to `$length` and split into words
        $words = explode(' ', substr($this->string, 0, $length));

        // Discard the last word as it's a partial word,
        // or empty if the last character happened to be a space.
        // If there's only one word,
        // then it was longer than `$length`
        // and the truncated version should be returned.
        if (count($words) > 1) {
            array_pop($words);
        }

        // Save the shortened string with "..." appended
        $this->string = rtrim(implode(' ', $words), ':').'...';

        return $this;
    }

    public function slugify(): self
    {
        // If the string is already a slug
        if (preg_match('/^[a-z0-9\\-]+$/', $this->string)) {
            return $this;
        }

        // - Normalize accents
        // - Normalize symbols
        // - Lowercase
        // - Replace space characters with dashes
        // - Remove non-slug characters
        // - Clean up leading, trailing, and consecutive dashes
        return $this
            ->removeAccents()
            ->convertSymbolsToWords()
            ->lower()
            ->spacersToDashes()
            ->replaceRegex('/([^a-z0-9\\-]+)/', '')
            ->cleanUpSlugDashes();
    }

    public function spacersToDashes(): self
    {
        return $this->replace(static::getSpacerCharacters(), '-');
    }
}
```
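Comments:

@NorbertBoros It's been more than 7 years since I posted this. Most of it has stayed the same (some cleanup, and it now lives in a standalone class), but the biggest change is that remove_accents() has been completely rewritten to use PHP's Transliterator class. Keep the first if statement; the rest of the function can then be replaced with $transliterator = Transliterator::create('Any-Latin; Latin-ASCII;'); return $transliterator->transliterate($string);. I'll try to update the answer as well. I actually save $transliterator on the class to avoid rebuilding it every time.
@NorbertBoros Answer updated, if you want the cleaned-up version. At a glance, I think it works with PHP 7.0+.
Thanks! I'll test it as soon as I can.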
