markdown: Building a Rasa NLU training dataset


#' Generate rasa NLU training data file
#'
#' Take a data frame and export a JSON dataset in Rasa NLU's training format. The data frame must contain a column
#' called `text`, holding the questions that you want to tag with intents and entities in the Rasa NLU
#' trainer tool: https://rasahq.github.io/rasa-nlu-trainer/
#' Data format reference: https://nlu.rasa.com/dataformat.html
#' @param dat The data frame holding the text data for questions
#' @param path Character. The path/name of the file to be exported. Defaults to train.json in the current directory
#' @export
#' write_rasa_nlu

write_rasa_nlu = function(dat, path="train.json") {
  ## ensure that the text column is present
  stopifnot("text" %in% colnames(dat))
  ## ensure there is at least one row
  stopifnot(nrow(dat) > 0)
  ## keep only the text column
  dat2 = dplyr::select(dat, text)
  ## for each row, in row order, build an entry with empty intent and entities
  ## TODO: vectorize and remove for-loop
  rasa = vector("list", nrow(dat2))
  for (i in seq_len(nrow(dat2))) {
    ## as.character() guards against text stored as a factor
    rasa[[i]] = list(text=as.character(dat2$text[i]), intent="", entities=list())
  }
  ## wrap the entries in the top-level rasa_nlu_data structure
  rasa_json = list(rasa_nlu_data=list(common_examples=rasa))
  ## write the file
  jsonlite::write_json(rasa_json, path, auto_unbox=TRUE)
}
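
The TODO in the function mentions removing the for-loop. A minimal sketch of one way to do that, using `lapply()` in place of the loop, is shown here. The name `write_rasa_nlu_vec` is hypothetical and not part of the original script, and the sketch uses only `jsonlite`.

```r
library(jsonlite)

# Hypothetical loop-free variant of write_rasa_nlu(); the name is an assumption.
write_rasa_nlu_vec <- function(dat, path = "train.json") {
  stopifnot("text" %in% colnames(dat), nrow(dat) > 0)
  # One entry per row, in row order, built with lapply() instead of a for-loop.
  examples <- lapply(as.character(dat$text), function(txt) {
    list(text = txt, intent = "", entities = list())
  })
  rasa_json <- list(rasa_nlu_data = list(common_examples = examples))
  write_json(rasa_json, path, auto_unbox = TRUE)
  # Return the path invisibly so the caller can read the file back.
  invisible(path)
}

dat <- data.frame(id = 1:3, text = c("test 1", "i like turtles", "where are you"))
out <- write_rasa_nlu_vec(dat, tempfile(fileext = ".json"))
```
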
# About

This `R` script provides a helper function that takes a data frame and builds a
Rasa NLU training file in JSON format. It is useful when you want to:

- export records from a database 
- import the training data file into the webapp found here: https://rasahq.github.io/rasa-nlu-trainer/
- tag your data with the tool and export the result
- re-import the JSON into R and associate it with the database ID

## Why

The data is exported in row order so that it can be matched back to the original file. The matching is not shown in full here, but because the order is preserved, it is just a matter of aligning the tagged data, once it has been read back in from the JSON file, with the original rows.
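
The R side of the order-preservation claim can be checked with a quick round trip. The sketch below writes the same structure the helper produces (built inline here so the block runs on its own), reads it back, and compares the text position by position. It only covers writing and reading in R; that the webapp keeps the order is the assumption stated above, not something this code verifies.

```r
library(jsonlite)

dat <- data.frame(id = 1:3,
                  text = c("test 1", "i like turtles", "where are you"),
                  stringsAsFactors = FALSE)

# Build the same structure the helper writes: one entry per row, in row order.
examples <- lapply(dat$text, function(txt) {
  list(text = txt, intent = "", entities = list())
})
path <- tempfile(fileext = ".json")
write_json(list(rasa_nlu_data = list(common_examples = examples)),
           path, auto_unbox = TRUE)

# Read the file back and compare the text with the original rows.
back <- fromJSON(path, flatten = TRUE)$rasa_nlu_data$common_examples
order_preserved <- identical(as.character(back$text), dat$text)
```
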

## Example

Build a dataset and write a JSON data file that can be imported into https://rasahq.github.io/rasa-nlu-trainer/

```
dat = data.frame(id = 1:3, text = c("test 1", "i like turtles", "where are you"))
write_rasa_nlu(dat)
```

From here, tag your data in the webapp, read the result back into `R`, and align the records 1:1. Typically, you would save the raw file and then match the tagged data with each row of that raw file.

```
library(jsonlite)  # fromJSON()
library(dplyr)     # glimpse()

## bring in the data tagged in the webapp
x = fromJSON("~/Downloads/2018-07-train.json", flatten = TRUE)
y = x$rasa_nlu_data$common_examples
glimpse(y)
```

This assumes that you exported the JSON file from rasa-nlu-trainer to the default downloads directory on your local machine.

Then merge the tagged data with the original data frame:

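```
library(dplyr)  # %>%, rename(), glimpse()

## rename the tagged text column so it does not clash with dat$text
z = y %>% rename(text2 = text)
msgs = cbind(dat, z)
glimpse(msgs)
```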
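
Because `cbind()` pairs rows purely by position, it can be worth confirming that the two data frames line up before binding them. The sketch below is one way to do that in base R. The helper `align_tagged` is hypothetical (not part of the original script), and the `tagged` data frame with its intents is made-up stand-in data for what would be read back from the exported JSON.

```r
# Hypothetical helper: bind tagged examples to the source rows only after
# confirming that both have the same number of rows and identical text.
align_tagged <- function(dat, tagged) {
  stopifnot(nrow(dat) == nrow(tagged))
  stopifnot(identical(as.character(dat$text), as.character(tagged$text)))
  # Rename the tagged text column so it does not clash with dat$text.
  names(tagged)[names(tagged) == "text"] <- "text2"
  cbind(dat, tagged)
}

dat <- data.frame(id = 1:3,
                  text = c("test 1", "i like turtles", "where are you"),
                  stringsAsFactors = FALSE)
# Stand-in for the tagged examples; these intents are invented for illustration.
tagged <- data.frame(text = dat$text,
                     intent = c("test", "statement", "question"),
                     stringsAsFactors = FALSE)
msgs <- align_tagged(dat, tagged)
```

The text comparison is deliberately strict. If the tagging tool changes whitespace or casing, the check would need to be relaxed accordingly.
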
