Keyword Full-Text Search in PHP: Installing and Configuring Sphinx with the Coreseek Chinese Word Segmenter

Posted by Genghai_Y

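I. Requirements

  Find articles whose title or category (or even body text) contains the search term, and display them ranked by a weight based on how often the term occurs.

II. Environment

  nginx + PHP + MySQL on CentOS 7.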

III. Installation

  1. Install dependencies

yum  -y  install  make  gcc  gcc-c++  libtool  autoconf  automake  imake  mariadb  mariadb-server  mariadb-devel libxml2-devel expat-devel
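
Before moving on, it can help to confirm that the build tools from the yum command above are actually on PATH. The following is a minimal sketch, not part of Coreseek; the helper name missing_tools is hypothetical, and the tool list is an assumption about which commands those packages provide.

```shell
#!/bin/sh
# Print every command from the argument list that cannot be found on PATH.
# (missing_tools is a hypothetical helper name, not part of Coreseek.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: check the build tools that step 1 is expected to provide.
# missing_tools make gcc g++ libtoolize autoconf automake
```

If the example call prints nothing, every listed tool was found; any name it prints still needs to be installed.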

  2. Download the source package

git clone https://github.com/wanqianworld/coreseek4.1.git
cd coreseek4.1 # enter the directory once the download completes

  3. Unpack coreseek

tar  -xzf  coreseek-4.1-beta.tar.gz

  4. Install mmseg

cd  coreseek-4.1-beta/mmseg-3.2.14
./bootstrap
./configure --prefix=/usr/local/mmseg3
make && make install

  5. Install coreseek

  5.1. Edit the build configuration

cd  ../csft-4.1
vim configure.ac
Change the line
AM_INIT_AUTOMAKE([-Wall -Werror foreign])
to
AM_INIT_AUTOMAKE([-Wall foreign])
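
The same edit can be made without opening an editor. The sketch below assumes GNU sed (as shipped with CentOS 7) for the -i option, and that the line in configure.ac reads exactly as shown above; drop_werror is a hypothetical helper name.

```shell
#!/bin/sh
# Remove -Werror from the AM_INIT_AUTOMAKE line of the configure.ac given as $1.
# Assumes GNU sed, whose -i option edits the file in place.
# (drop_werror is a hypothetical helper name.)
drop_werror() {
    sed -i 's/AM_INIT_AUTOMAKE(\[-Wall -Werror foreign])/AM_INIT_AUTOMAKE([-Wall foreign])/' "$1"
}

# Example, run from the csft-4.1 directory:
# drop_werror configure.ac
```

If the line has already been changed, the substitution simply matches nothing and the file is left as it is.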

  5.2. Install the patch utility

yum  -y  install  patch

  5.3. Apply the patch

patch  -p1  <  /yourpath/sphinx/sphinxexpr.cpp-csft-4.1-beta.patch
If patch prompts for the file to patch, enter:
/yourpath/sphinx/coreseek-4.1-beta/csft-4.1/src/sphinxexpr.cpp

  5.4. Build and install

sh  buildconf.sh
./configure  --prefix=/usr/local/coreseek  --without-unixodbc  --with-mmseg  --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/  --with-mmseg-libs=/usr/local/mmseg3/lib/  --with-mysql
make  &&  make install

  6.1. Test Chinese word segmentation

cd  ../testpack/
/usr/local/mmseg3/bin/mmseg  -d  /usr/local/mmseg3/etc var/test/test.xml # segment the sample document

  6.2. Build the index

/usr/local/coreseek/bin/indexer  -c  etc/csft.conf  --all

  6.3. Run a test search

/usr/local/coreseek/bin/search  -c  etc/csft.conf  李彦宏

  7. Connect PHP to Sphinx

cd  ../csft-4.1/api/libsphinxclient/ # enter the client library directory

aclocal
libtoolize --force
automake --add-missing
autoconf
autoheader
make clean
./configure  --prefix=/usr/local/sphinxclient
make  &&  make install  # build and install

cd  ../../../../ # back to the package directory
tar  -xzf  sphinx-1.3.0.tgz  # unpack the PHP extension source
yum  -y  install  php  php-devel # php-devel provides phpize

cd  sphinx-1.3.0  # build and install the extension
phpize
./configure --with-php-config=/usr/bin/php-config --with-sphinx=/usr/local/sphinxclient
make && make install

  7.1. Enable the php-sphinx extension

vim /etc/php.ini
Append at the end:
[sphinx]
extension=sphinx.so
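
To confirm that PHP picked up the extension, the module list printed by php -m can be searched for sphinx. This is a minimal sketch; has_module is a hypothetical helper that reads a module list on standard input, and it only reflects the PHP command-line binary.

```shell
#!/bin/sh
# Succeed if the module name given as $1 appears as a whole line on stdin.
# -i ignores case, -x matches the whole line, -F treats $1 as a fixed string.
# (has_module is a hypothetical helper name.)
has_module() {
    grep -qixF -- "$1"
}

# Example: ask the PHP command-line binary for its loaded modules.
# php -m | has_module sphinx && echo "sphinx extension loaded"
```

Since the environment here serves PHP through nginx, the PHP service behind it (php-fpm, if that is what is in use) most likely has to be restarted before web requests see the new extension.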

  8. Testing

  8.1. Load the test data

mysql  -uroot  -p123456  <  /usr/local/coreseek/etc/example.sql

  8.2. Copy the configuration files

cp  /usr/local/coreseek/etc/sphinx.conf.dist  /usr/local/coreseek/etc/csft.conf
cp  /yourpath/sphinx/coreseek-4.1-beta/mmseg-3.2.14/data/*  /usr/local/mmseg3/etc/

  8.3. Edit the configuration file

vim /usr/local/coreseek/etc/csft.conf
source src1
{
type            = mysql
sql_host        = 127.0.0.1
sql_user        = root
sql_pass        = 123456
sql_db          = test
sql_port        = 3306  # optional, default is 3306
sql_query_pre = SET NAMES utf8
sql_sock = /var/lib/mysql/mysql.sock
sql_query       = \
    SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
    FROM documents
    FROM documents
sql_attr_uint       = group_id
sql_attr_timestamp  = date_added
sql_ranged_throttle = 0
sql_query_info_pre = SET NAMES utf8
sql_query_info      = SELECT * FROM documents WHERE id=$id
}
source src1throttled : src1
{
sql_ranged_throttle = 100
}
index test1
{
source          = src1
path            = /usr/local/coreseek/var/data/test1
docinfo         = extern
mlock           = 0
morphology      = none
min_word_len        = 1
html_strip      = 0
charset_dictpath = /usr/local/mmseg3/etc/
charset_type        = zh_cn.utf-8
}
indexer
{
mem_limit       = 128M
}
searchd
{
listen          = 9312
listen          = 9306:mysql41
log         = /usr/local/coreseek/var/log/searchd.log
query_log       = /usr/local/coreseek/var/log/query.log
read_timeout        = 5
client_timeout      = 300
max_children        = 30
pid_file        = /usr/local/coreseek/var/log/searchd.pid
max_matches     = 1000
seamless_rotate     = 1
preopen_indexes     = 1
unlink_old      = 1
mva_updates_pool    = 1M
max_packet_size     = 8M
max_filters     = 256
max_filter_values   = 4096
max_batch_queries   = 32
workers         = threads # for RT to work
}

  8.4. Copy the binaries

cp  /usr/local/coreseek/bin/*  /usr/bin/

  8.5. Build the index

indexer  --rotate  --all

  8.6. Start the service

searchd
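
Because the configuration above also declares listen = 9306:mysql41, a running searchd can be queried with the ordinary mysql client over SphinxQL. The sketch below only builds the statement; sphinxql_match is a hypothetical helper, the LIMIT value is an arbitrary choice, and escaping single quotes with a backslash is an assumption about what the SphinxQL parser accepts.

```shell
#!/bin/sh
# Print a SphinxQL SELECT that runs full-text query $2 against index $1.
# Single quotes in the query are backslash-escaped for the string literal.
# (sphinxql_match is a hypothetical helper name.)
sphinxql_match() {
    escaped=$(printf '%s' "$2" | sed "s/'/\\\\'/g")
    printf "SELECT * FROM %s WHERE MATCH('%s') LIMIT 20;\n" "$1" "$escaped"
}

# Example: send the statement to searchd's MySQL-protocol listener.
# sphinxql_match test1 one | mysql -h127.0.0.1 -P9306
```

Passing -h127.0.0.1 rather than localhost makes the mysql client connect over TCP to port 9306 instead of using the MySQL server's Unix socket.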

  8.7. Stop the service

searchd  --stop
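
Whether searchd is currently running can be checked through the pid_file set in csft.conf. This is a minimal sketch; pid_alive is a hypothetical helper, and kill -0 only gives a reliable answer when run as root or as the user that owns the searchd process.

```shell
#!/bin/sh
# Succeed if the PID stored in the file given as $1 belongs to a live process.
# kill -0 sends no signal; it only tests whether the process can be signalled.
# (pid_alive is a hypothetical helper name.)
pid_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Example, using the pid_file path from csft.conf:
# pid_alive /usr/local/coreseek/var/log/searchd.pid && echo "searchd is running"
```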

  9. Test from PHP

  Write a test script:

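vim test.php
<?php
// Connect to the searchd instance configured in csft.conf
$sphinx = new SphinxClient();
$sphinx->SetServer("127.0.0.1", 9312);
// Require every query word to match
$sphinx->SetMatchMode(SPH_MATCH_ALL);
// Offset 0, up to 20 results returned, at most 1000 matches kept
$sphinx->SetLimits(0, 20, 1000);
// Return matches as a plain array instead of a hash keyed by document ID
$sphinx->SetArrayResult(true);
// Search the test1 index for "one"
$result = $sphinx->query("one", "test1");
var_dump($result);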

  Run the script:

php test.php

 
