Configuring Incremental Indexes and Index Merging in Sphinx

Posted Mr.Coder by LEE


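Configuring the incremental index

1. Edit the csft.conf file.

base is the parent source; src1 and tmp_src1 both inherit from it. The configuration is as follows.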

searchd{
        listen = 9312
        listen = 9306:mysql41
        read_timeout =5
        max_children = 30
        max_matches = 1000
        seamless_rotate = 0
        preopen_indexes = 0
        unlink_old = 1
        pid_file = /usr/local/coreseek/var/log/searchd.pid
        log = /usr/local/coreseek/var/log/searchd.log
        query_log = /usr/local/coreseek/var/log/query.log
        binlog_path = 
}
# Shared settings, inherited by the sources below
source base
{
        type         = mysql
        sql_host     = 127.0.0.1
        sql_user     = root
        sql_pass    = 
        sql_db         = test
        sql_port     = 3306
        sql_query_pre = SET NAMES utf8
        sql_query        =
}

source src1: base
{
        sql_query            = SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content FROM documents
        # id is the first column of sql_query, so it serves as the document ID and is not declared as an attribute
        sql_attr_uint        = group_id
        sql_attr_timestamp     = date_added
        sql_field_string    = title
        sql_field_string    = content
        sql_query_info_pre    = SET NAMES utf8
}

index src1{
        source = src1
        path = /usr/local/coreseek/var/data/test1
        docinfo = extern
        mlock =0
        morphology = none
        min_word_len =1
        html_strip =0
        #index_sp          = 1
        charset_type = zh_cn.utf-8
        charset_dictpath = /usr/local/mmseg3/etc/
}

source tmp_src1 : base
{
        sql_query            = SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content FROM documents WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
        # id is the first column of sql_query, so it serves as the document ID and is not declared as an attribute
        sql_attr_uint        = group_id
        sql_attr_timestamp     = date_added
        sql_field_string    = title
        sql_field_string    = content  
}

index tmp_src1{
        source = tmp_src1        
        path = /usr/local/coreseek/var/data/tmp_src1        
        docinfo = extern
        mlock =0
        morphology = none
        min_word_len =1
        html_strip =0
        charset_type = zh_cn.utf-8
        charset_dictpath = /usr/local/mmseg3/etc/
}

 

 

The corresponding SQL file:

SET FOREIGN_KEY_CHECKS=0;

-- ----------------------------
-- Table structure for documents
-- ----------------------------
DROP TABLE IF EXISTS `documents`;
CREATE TABLE `documents` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `group_id` int(11) NOT NULL,
  `date_added` datetime NOT NULL,
  `title` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `content` text NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=utf8mb4;

-- ----------------------------
-- Records of documents
-- ----------------------------
INSERT INTO `documents` VALUES ('1', '1', '2017-06-06 21:45:58', '中国', '中国真美,地大物博');
INSERT INTO `documents` VALUES ('2', '1', '2017-06-06 21:45:58', '中国美食', '台北小吃,各地美食');
INSERT INTO `documents` VALUES ('3', '2', '2017-06-06 21:45:58', '美女之家', '美女之国');
INSERT INTO `documents` VALUES ('4', '2', '2017-06-06 21:45:58', 'hello', 'this is to test groups');
INSERT INTO `documents` VALUES ('5', '3', '2017-07-27 17:00:09', '熊猫', '中国国宝');
INSERT INTO `documents` VALUES ('6', '3', '2017-07-14 17:00:04', '竹子', '熊猫吃竹子');
INSERT INTO `documents` VALUES ('7', '4', '2017-07-14 17:30:36', '猫科动物', '老虎吃人');
INSERT INTO `documents` VALUES ('8', '4', '2017-07-14 17:30:36', '猫科动物2', '东北虎');
INSERT INTO `documents` VALUES ('9', '5', '2017-07-14 17:34:24', '动物园', '老鼠');
SET FOREIGN_KEY_CHECKS=1;






SET FOREIGN_KEY_CHECKS=0;

-- ----------------------------
-- Table structure for sph_counter
-- ----------------------------
DROP TABLE IF EXISTS `sph_counter`;
CREATE TABLE `sph_counter` (
  `counter_id` int(11) NOT NULL AUTO_INCREMENT,
  `max_doc_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`counter_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4;

-- ----------------------------
-- Records of sph_counter
-- ----------------------------
INSERT INTO `sph_counter` VALUES ('1', '6');
SET FOREIGN_KEY_CHECKS=1;

 

 

 

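In the sph_counter table, max_doc_id holds the largest document id that has already been indexed in coreseek. The tmp_src1 source in the configuration file contains this line:

sql_query            = SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content FROM documents WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )

It selects only the rows whose id is greater than max_doc_id, so only data inserted after the last recorded id goes into tmp_src1. With the sample data above, max_doc_id is 6, so the query returns the rows with id 7, 8 and 9. Changes to rows that are already indexed are not picked up by this query.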

Merging indexes

Index the latest data from MySQL into tmp_src1:
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft.conf tmp_src1 --rotate --quiet

 

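Merge the indexes. Here src1 is the destination index and tmp_src1 is merged into it. The --merge-dst-range deleted 0 0 option keeps only those documents of src1 whose deleted attribute is 0; the configuration above does not define a deleted attribute, so this filter may need to be dropped or the attribute added. If searchd is already serving src1, the Sphinx documentation also calls for --rotate on this command.

/usr/local/coreseek/bin/indexer --merge src1 tmp_src1 --merge-dst-range deleted 0 0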

 

After this, the newly added data is part of the src1 index.
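Nothing in the configuration shown above advances max_doc_id in sph_counter, so the next delta build would select the same rows again. The following is a minimal sketch of one way to chain the two indexer commands and then move the counter forward. It assumes the mysql command-line client is installed and reuses the connection settings from the base source. The script layout, the DRY_RUN switch, the run and query helper names, the explicit -c and --rotate on the merge, and leaving out --merge-dst-range are all assumptions of this sketch, not part of the original setup.

```shell
#!/bin/sh
# Sketch only: rebuild the delta index, merge it into the main index,
# then advance the counter so the next delta starts after the merged rows.
set -eu

INDEXER=/usr/local/coreseek/bin/indexer
CONF=/usr/local/coreseek/etc/csft.conf
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: run one SQL statement against the test database.
# -N suppresses the column header, -e executes the given statement.
query() {
    mysql -h 127.0.0.1 -u root -N -e "$1" test
}

refresh_index() {
    # 1. Remember the highest id that exists before the delta build starts.
    #    Rows inserted later are picked up again by the next delta build.
    if [ "$DRY_RUN" = "1" ]; then
        new_max=9   # placeholder: highest id in the sample data
    else
        new_max=$(query 'SELECT MAX(id) FROM documents')
    fi

    # 2. Rebuild the delta index from rows with id > max_doc_id.
    run "$INDEXER" -c "$CONF" tmp_src1 --rotate --quiet

    # 3. Merge the delta into the main index.
    #    With --rotate, searchd picks up new files asynchronously, so a
    #    wait between steps 2 and 3 may be needed in practice.
    run "$INDEXER" -c "$CONF" --merge src1 tmp_src1 --rotate --quiet

    # 4. Advance the counter to the id recorded in step 1.
    run query "UPDATE sph_counter SET max_doc_id = $new_max WHERE counter_id = 1"
}

refresh_index
```

By default the script only prints the three commands; setting DRY_RUN=0 executes them. Recording the highest id before the delta build, rather than after the merge, is a deliberate choice in this sketch: a row inserted while indexer is running is never skipped, only indexed again on the next run.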


