NLP Text Analysis with Elasticsearch and BERT

Posted by shiter

What's new in ES 8.0

https://www.elastic.co/cn/blog/whats-new-elastic-8-0-0

Machine learning algorithms added in recent ES versions (for example, anomaly detection)

  • https://www.elastic.co/guide/en/machine-learning/current/anomaly-examples.html

Trying out ES with Docker under WSL2

For how to install Docker under WSL2, see my earlier blog post.

curl https://get.docker.com | sh
# gpasswd manages the Linux group files /etc/group and /etc/gshadow; it adds a user to a group or removes a user from one.
sudo gpasswd -a <your-username> docker

sudo service docker start

Pulling the Elasticsearch Docker image

Obtaining Elasticsearch for Docker is as simple as issuing a docker pull command against the Elastic Docker registry.

docker pull docker.elastic.co/elasticsearch/elasticsearch:8.2.0

Starting a single ES node

Start a single-node cluster with Docker
If you’re starting a single-node Elasticsearch cluster in a Docker container, security will be automatically enabled and configured for you. When you start Elasticsearch for the first time, the following security configuration occurs automatically:

  • Certificates and keys are generated for the transport and HTTP layers.
  • The Transport Layer Security (TLS) configuration settings are written to elasticsearch.yml.
  • A password is generated for the elastic user.
  • An enrollment token is generated for Kibana.

You can then start Kibana and enter the enrollment token, which is valid for 30 minutes. This token automatically applies the security settings from your Elasticsearch cluster, authenticates to Elasticsearch with the kibana_system user, and writes the security configuration to kibana.yml.

The following commands start a single-node Elasticsearch cluster for development or testing.

Create a new docker network for Elasticsearch and Kibana

docker network create elastic

Start Elasticsearch in Docker. A password is generated for the elastic user and output to the terminal, plus an enrollment token for enrolling Kibana.

docker run --name es01 --net elastic -p 9200:9200 -p 9300:9300 -it docker.elastic.co/elasticsearch/elasticsearch:8.2.0

You might need to scroll back a bit in the terminal to view the password and enrollment token.

Copy the generated password and enrollment token and save them in a secure location. These values are shown only when you start Elasticsearch for the first time.

If you need to reset the password for the elastic user or other built-in users, run the elasticsearch-reset-password tool. This tool is available in the Elasticsearch /bin directory of the Docker container. For example, for the elastic user:

docker exec -it es01 /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic

Copy the http_ca.crt security certificate from your Docker container to your local machine.

docker cp es01:/usr/share/elasticsearch/config/certs/http_ca.crt .

Open a new terminal and verify that you can connect to your Elasticsearch cluster by making an authenticated call, using the http_ca.crt file that you copied from your Docker container. Enter the password for the elastic user when prompted.

curl --cacert http_ca.crt -u elastic https://localhost:9200
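
The same check can also be made from Python, which is handy once the NLP experiments start. The following is only a minimal sketch using the standard library: the helper names (basic_auth_header, build_request, fetch_cluster_info) are made up for illustration, and it assumes http_ca.crt sits in the current directory and that the node listens on https://localhost:9200 as in the commands above.

```python
import base64
import json
import ssl
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build the HTTP Basic Authorization header value, like curl -u does."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_request(password: str, url: str = "https://localhost:9200") -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET for the cluster root endpoint."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header("elastic", password))
    return req


def fetch_cluster_info(password: str, ca_path: str = "http_ca.crt") -> dict:
    """Send the request, trusting only the CA certificate copied out of the container."""
    context = ssl.create_default_context(cafile=ca_path)
    with urllib.request.urlopen(build_request(password), context=context) as resp:
        return json.load(resp)
```

With the container running, fetch_cluster_info("<generated password>") should return the same JSON document that the curl command prints.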

Installing ES with Docker

Main reference:

  • https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html

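Docker's official image registry is fairly slow, so before working with images, set the registry mirror to a site inside China.

Create the file /etc/docker/daemon.json with the following content: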


{
    "registry-mirrors" : [
        "https://registry.docker-cn.com",
        "https://docker.mirrors.ustc.edu.cn",
        "http://hub-mirror.c.163.com",
        "https://cr.console.aliyun.com/"
    ]
}
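
Docker reads daemon.json as a single JSON object, so a missing brace or a stray comma can stop the daemon from starting after the restart. A small sanity check before restarting might look like the sketch below. The function names are hypothetical, the mirror URLs are simply the ones listed above (whether each is still reachable is not checked), and the script only renders and validates text rather than writing to /etc/docker.

```python
import json

# Mirror URLs copied from the article; reachability is not verified here.
MIRRORS = [
    "https://registry.docker-cn.com",
    "https://docker.mirrors.ustc.edu.cn",
    "http://hub-mirror.c.163.com",
    "https://cr.console.aliyun.com/",
]


def render_daemon_json(mirrors) -> str:
    """Render the daemon.json text as one JSON object with a registry-mirrors array."""
    return json.dumps({"registry-mirrors": list(mirrors)}, indent=4) + "\n"


def check_daemon_json(text: str) -> list:
    """Return the mirror list if text is a valid config; raise ValueError otherwise."""
    config = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(config, dict):
        raise ValueError("daemon.json must contain a JSON object at the top level")
    mirrors = config.get("registry-mirrors")
    if not isinstance(mirrors, list) or not all(isinstance(m, str) for m in mirrors):
        raise ValueError("'registry-mirrors' must be a list of URL strings")
    return mirrors


if __name__ == "__main__":
    text = render_daemon_json(MIRRORS)
    check_daemon_json(text)
    print(text, end="")
```

The printed text can then be saved as /etc/docker/daemon.json with sudo.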

Then restart the Docker service:

systemctl restart docker

Earlier approach: bert-server

https://towardsdatascience.com/elasticsearch-meets-bert-building-search-engine-with-elasticsearch-and-bert-9e74bf5b4cf2

https://github.com/Hironsan/bertsearch
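
As far as I understand the approach in the two links above, a separate BERT server turns each document and each query into a fixed-length vector, the document vectors are stored in a dense_vector field, and search ranks documents with a script_score query based on cosine similarity. The sketch below only builds the request body and shows what the score means. The field name text_vector, the result size, and the helper names are assumptions for illustration, and the exact script syntax can differ between ES versions.

```python
import math


def cosine_similarity(a, b) -> float:
    """Plain-Python cosine similarity, the quantity the ES script computes per document."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def build_script_score_query(query_vector, field: str = "text_vector", size: int = 10) -> dict:
    """Build a search body that ranks all documents by similarity to query_vector."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # + 1.0 keeps the score non-negative, which ES requires.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
    }
```

The resulting dictionary would be sent as the body of a _search request against the index that holds the vectors; the query vector itself has to come from the same BERT model that encoded the documents.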


ES 8.0 approach

To be continued.

ES and NLP

https://www.elastic.co/guide/en/machine-learning/master/ml-nlp.html


References

Introduction to modern natural language processing with PyTorch in Elasticsearch

  • https://www.elastic.co/cn/blog/introduction-to-nlp-with-pytorch-models
  • https://eland.readthedocs.io/en/v8.1.0/
