A Small Cyberspace Search Engine Built on ElasticSearch

Posted 2022-04-29 blackxu


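My technical skills have been stagnant for a long time, and I kept wondering why. It comes down to weak willpower: when I hit a problem, I did not think through how to solve it myself. I just searched Baidu for an answer, and gave up as soon as I could not find one. No wonder I was not improving.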

For me, this project is small but complete, and its features are powerful. It uses ElasticSearch, a distributed full-text search engine. So what is that?

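Here is the encyclopedia description:

ElasticSearch is a search server based on Lucene. It provides a distributed, multi-tenant full-text search engine with a RESTful web interface. Elasticsearch is developed in Java and released as open source under the terms of the Apache License, and it is a popular enterprise search engine. It is designed for cloud computing and offers real-time search that is stable, reliable, fast, and easy to install and use. Official clients are available for Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.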

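Put simply, it makes searching your data very powerful. Now for how to set up an Elasticsearch environment (JDK 1.8 must be installed first; I will not cover that here).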

Download the following three files and extract them all to the root of a drive (for example, D:/):

https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.0.0-windows-x86_64.zip

https://artifacts.elastic.co/downloads/kibana/kibana-7.0.1-windows-x86_64.zip

https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.2.0/elasticsearch-analysis-ik-7.2.0.zip
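The three archives above carry different version numbers (7.0.0, 7.0.1 and 7.2.0). Elasticsearch plugins such as IK are built against one specific Elasticsearch release, so a mismatch is worth catching before startup. The sketch below only compares the versions embedded in the file names; the helper names `extract_version` and `find_mismatches` are made up for illustration and are not part of any Elasticsearch tooling.

```python
import re

# Archive names taken from the download links above.
ARCHIVES = [
    "elasticsearch-7.0.0-windows-x86_64.zip",
    "kibana-7.0.1-windows-x86_64.zip",
    "elasticsearch-analysis-ik-7.2.0.zip",
]


def extract_version(filename):
    """Return the first x.y.z version found in a file name as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", filename)
    if match is None:
        raise ValueError(f"no version found in {filename!r}")
    return tuple(int(part) for part in match.groups())


def find_mismatches(archives):
    """List the archives whose version differs from the first archive's version."""
    expected = extract_version(archives[0])
    return [name for name in archives[1:] if extract_version(name) != expected]


mismatched = find_mismatches(ARCHIVES)
```

If `mismatched` is not empty, it is probably safer to pick the plugin and Kibana releases whose version matches the Elasticsearch archive.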

Move kibana-7.0.1-windows-x86_64 into the elasticsearch-7.0.0-windows-x86_64 directory.

Move elasticsearch-analysis-ik-7.2.0 into the elasticsearch-7.0.0-windows-x86_64/plugins directory.

Run elasticsearch.bat in the elasticsearch-7.0.0-windows-x86_64\bin directory.

Then run kibana.bat in the elasticsearch-7.0.0-windows-x86_64\kibana-7.0.1-windows-x86_64\bin directory.

The Elasticsearch environment is now running. You can visit http://127.0.0.1:9200 to check that it started successfully.
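The same check can be scripted instead of opening a browser. This is a minimal sketch using only the standard library; it assumes the node answers on the address given above and returns its usual JSON document with `cluster_name` and `version.number` fields. The function names are illustrative.

```python
import json
import urllib.error
import urllib.request


def summarize_root(payload):
    """Pick the cluster name and version number out of the root response."""
    return {
        "cluster_name": payload.get("cluster_name"),
        "version": payload.get("version", {}).get("number"),
    }


def check_elasticsearch(url="http://127.0.0.1:9200", timeout=3):
    """Return a summary dict if the node answers, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return summarize_root(json.loads(response.read().decode("utf-8")))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return None
```

Calling `check_elasticsearch()` right after starting elasticsearch.bat may return `None` for a few seconds while the node is still booting.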

Since this project is written in Python, I searched Baidu for material on working with Elasticsearch from Python. In the end it all comes down to CRUD:

First, create a client object:

from elasticsearch import Elasticsearch
es = Elasticsearch(["http://127.0.0.1:9200"])

Create an index:

es.indices.create(index='sadness', ignore=400)  # ignore=400: do not raise if the index already exists

Insert a document:

from datetime import datetime
es.index(index="sadness", doc_type="doc", id=1, body={"name": "zhanshan", "timestamp": datetime.now()})

Fetch a document with get:

res = es.get(index='indexName', doc_type='typeName', id='idValue')

Delete a document:

es.delete(index='indexName', doc_type='typeName', id='idValue')

Delete by query:

query = {'query': {'match': {'sex': 'female'}}}
es.delete_by_query(index='indexName', body=query, doc_type='typeName')

Bulk index, delete, and update:

# Form 1: empty action metadata; the index and type come from the bulk() arguments.
doc = [
    {"index": {}},
    {'name': 'jackaaa', 'age': 2000, 'sex': 'female', 'address': '北京'},
    {"index": {}},
    {'name': 'jackbbb', 'age': 3000, 'sex': 'male', 'address': '上海'},
    {"index": {}},
    {'name': 'jackccc', 'age': 4000, 'sex': 'female', 'address': '广州'},
    {"index": {}},
    {'name': 'jackddd', 'age': 1000, 'sex': 'male', 'address': '深圳'},
]
es.bulk(index='indexName', doc_type='typeName', body=doc)

# Form 2: each action line names its own index, type, and id.
doc = [
    {'index': {'_index': 'indexName', '_type': 'typeName', '_id': 'idValue'}},
    {'name': 'jack', 'sex': 'male', 'age': 10},
    {'delete': {'_index': 'indexName', '_type': 'typeName', '_id': 'idValue'}},
    {'create': {'_index': 'indexName', '_type': 'typeName', '_id': 'idValue'}},
    {'name': 'lucy', 'sex': 'female', 'age': 20},
    {'update': {'_index': 'indexName', '_type': 'typeName', '_id': 'idValue'}},
    {'doc': {'age': '100'}},
]
es.bulk(index='indexName', doc_type='typeName', body=doc)
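Writing the alternating action and document lines by hand gets tedious for more than a few records. The sketch below builds the same list shape from ordinary dictionaries; `build_bulk_index_actions` and its `id_field` parameter are hypothetical names introduced here, not part of the elasticsearch client.

```python
def build_bulk_index_actions(documents, index_name, id_field=None):
    """Interleave an "index" action line with each document body."""
    actions = []
    for document in documents:
        meta = {"_index": index_name}
        # Use a field of the document as its _id when one is requested.
        if id_field is not None and id_field in document:
            meta["_id"] = document[id_field]
        actions.append({"index": meta})
        actions.append(document)
    return actions


people = [
    {"name": "jackaaa", "age": 2000},
    {"name": "jackbbb", "age": 3000},
]
actions = build_bulk_index_actions(people, "indexName")
# With a live client this list could then be sent as: es.bulk(body=actions)
```

Only the "index" action is covered; delete and update lines would need their own metadata and are left out to keep the sketch short.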

Search all documents:

es.search(index="my_index", doc_type="test_type")

or

body = {
    "query": {
        "match_all": {}
    }
}
es.search(index="my_index", doc_type="test_type", body=body)
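`es.search` returns a nested dictionary, and the matching documents sit under `hits.hits[*]._source`. A small helper along these lines can flatten it. The sample response is hand-written to illustrate the shape and is not real output; `extract_sources` is an illustrative name.

```python
def extract_sources(response):
    """Return (total, documents) from a search response dict."""
    hits = response.get("hits", {})
    total = hits.get("total", 0)
    # Elasticsearch 7 reports the total as {"value": n, "relation": "eq"};
    # earlier versions return a plain integer.
    if isinstance(total, dict):
        total = total.get("value", 0)
    documents = [hit.get("_source", {}) for hit in hits.get("hits", [])]
    return total, documents


# Hand-written sample shaped like a search response (not real output).
sample = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_id": "1", "_source": {"name": "python"}},
            {"_id": "2", "_source": {"name": "android"}},
        ],
    }
}
total, documents = extract_sources(sample)
```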

term query:

body = {
    "query": {
        "term": {
            "name": "python"
        }
    }
}
# Find all documents where name="python"
es.search(index="my_index", doc_type="test_type", body=body)

terms query:

body = {
    "query": {
        "terms": {
            "name": [
                "python", "android"
            ]
        }
    }
}
# Find all documents where name="python" or name="android"
es.search(index="my_index", doc_type="test_type", body=body)

match and multi_match:

# match: find documents whose name contains the keyword python
body = {
    "query": {
        "match": {
            "name": "python"
        }
    }
}
es.search(index="my_index", doc_type="test_type", body=body)

# multi_match: find documents whose name or addr contains the keyword 深圳
body = {
    "query": {
        "multi_match": {
            "query": "深圳",
            "fields": ["name", "addr"]
        }
    }
}
es.search(index="my_index", doc_type="test_type", body=body)
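The practical difference between term and match is that match runs the query text through an analyzer first, while term looks up the value exactly as given. The toy model below illustrates this with a tiny in-memory inverted index. It is a deliberate simplification: the analyzer here just lowercases and splits on non-alphanumeric characters, which is only an assumption standing in for whatever analyzer the real field uses, and none of these functions belong to Elasticsearch.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase and split on non-alphanumeric characters."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_query(index, value):
    """Look the value up exactly as given, without analysis."""
    return set(index.get(value, set()))


def match_query(index, text):
    """Analyze the query text, then return docs containing any resulting token."""
    result = set()
    for token in analyze(text):
        result |= index.get(token, set())
    return result


docs = {1: "Python Tutorial", 2: "Android and Python", 3: "Java basics"}
index = build_inverted_index(docs)
```

In this model `term_query(index, "Python")` finds nothing because only the lowercased token `python` was indexed, whereas `match_query(index, "Python")` finds documents 1 and 2.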

ids query:

body = {
    "query": {
        "ids": {
            "type": "test_type",
            "values": [
                "1", "2"
            ]
        }
    }
}
# Find all documents whose id is 1 or 2
es.search(index="my_index", doc_type="test_type", body=body)

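Compound bool query:
bool supports three kinds of clauses: must (every clause must match), should (at least one clause should match), and must_not (no clause may match).

body = {
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "name": "python"
                    }
                },
                {
                    "term": {
                        "age": 18
                    }
                }
            ]
        }
    }
}
# Find all documents where name="python" and age=18
es.search(index="my_index", doc_type="test_type", body=body)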
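The example above only uses must. As a rough sketch of how the three clause types combine, the helper below assembles a bool body from optional lists of clauses; `bool_query` is a hypothetical convenience function, not something the client library provides.

```python
def bool_query(must=None, should=None, must_not=None):
    """Build a bool query body, leaving out clauses that were not supplied."""
    clauses = {}
    if must:
        clauses["must"] = list(must)
    if should:
        clauses["should"] = list(should)
    if must_not:
        clauses["must_not"] = list(must_not)
    return {"query": {"bool": clauses}}


# name must be "python" and age must not be 18
body = bool_query(
    must=[{"term": {"name": "python"}}],
    must_not=[{"term": {"age": 18}}],
)
```

The resulting `body` can be passed to `es.search` in the same way as the hand-written dictionaries.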

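Sliced (paginated) query:

body = {
    "query": {
        "match_all": {}
    },
    "from": 2,    # skip the first 2 hits (the offset is 0-based)
    "size": 4     # return 4 hits
}
# Skip the first 2 hits, then return the next 4
es.search(index="my_index", doc_type="test_type", body=body)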
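Since from is a 0-based offset rather than a page number, a little arithmetic is needed to turn "page N" into a request body. This sketch shows one way to do it; `page_body` is an illustrative helper and the 1-based page convention is an assumption made here.

```python
def page_body(page, size, query=None):
    """Build a search body for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must both be at least 1")
    return {
        "query": query if query is not None else {"match_all": {}},
        "from": (page - 1) * size,  # number of hits to skip
        "size": size,               # number of hits to return
    }


# "from": 2 with "size": 4 corresponds to the list slice [2:6].
hits = ["h0", "h1", "h2", "h3", "h4", "h5", "h6"]
window = hits[2:2 + 4]
```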

Range query:

body = {
    "query": {
        "range": {
            "age": {
                "gte": 18,       # >= 18
                "lte": 30        # <= 30
            }
        }
    }
}
# Find all documents where 18 <= age <= 30
es.search(index="my_index", doc_type="test_type", body=body)

Prefix query:

body = {
    "query": {
        "prefix": {
            "name": "p"
        }
    }
}
# Find all documents whose name starts with "p"
es.search(index="my_index", doc_type="test_type", body=body)

Wildcard query:

body = {
    "query": {
        "wildcard": {
            "name": "*id"
        }
    }
}
# Find all documents whose name ends with "id"
es.search(index="my_index", doc_type="test_type", body=body)
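To make the prefix and wildcard semantics concrete, here is a local approximation of how a pattern is tested against a single indexed term: `*` stands for any run of characters (including none) and `?` for exactly one character. This is only a model of the matching rule written for illustration. It is not how Elasticsearch executes these queries internally.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * (any run of characters) and ? (one character) to a regex."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")
        elif char == "?":
            parts.append(".")
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, term):
    """Return True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


def prefix_match(prefix, term):
    """Return True if the term starts with the prefix."""
    return term.startswith(prefix)


terms = ["python", "android", "idea", "grid"]
ending_in_id = [term for term in terms if wildcard_match("*id", term)]
```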

Those are the basic Elasticsearch operations from Python; my project uses some of the code above.

Project preview: http://www.iqiyi.com/w_19saegbh6x.html

Project repository: https://github.com/BlackXu007/CrawlerEngine

Reference article: https://blog.csdn.net/u013429010/article/details/81746179
