自然语言处理 - 标记化 1. NLP 零到英雄 Natural Language Processing - Tokenization

Posted 2022-02-17 AI架构师易筋

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了自然语言处理 - 标记化 1. NLP 零到英雄 Natural Language Processing - Tokenization相关的知识，希望对你有一定的参考价值。

https://bit.ly/tfw-nlp1

from tensorflow.keras.preprocessing.text import Tokenizer

sentences = [
    'i love my dog',
    'I, love my cat',
    'You love my dog!'
]

tokenizer = Tokenizer(num_words = 100)
tokenizer.fit_on_texts(sentences)
word_index = tokenizer.word_index
print(word_index)

参考

https://youtu.be/fNxaJsNG3-s

以上是关于自然语言处理 - 标记化 1. NLP 零到英雄 Natural Language Processing - Tokenization的主要内容，如果未能解决你的问题，请参考以下文章

排序 - 将句子转化为数据 2 NLP 零到英雄 Sequencing - Turning sentences into data

训练 AI 创作诗歌 6 NLP 从零到英雄 Training an AI to create poetry

带有RNN循环神经网络的机器学习 4 NLP 从零到英雄 ML with Recurrent Neural Networks

训练模型以识别文本中的情绪 3 NLP 零到英雄 Training a model to recognize sentiment in text

入门 | 自然语言处理是如何工作的？一步步教你构建 NLP 流水线

在PyCharm中安装nltk，以及nltk data的下载。