#' Generate rasa NLU training data file
#'
#' Take a dataframe and export a JSON dataset in Rasa NLU's training data format. The dataframe must contain a column
#' called text, which holds the questions that you want to tag with intents and entities in the Rasa NLU
#' training tool: https://rasahq.github.io/rasa-nlu-trainer/.
#' Data format reference: https://nlu.rasa.com/dataformat.html
#' @param dat The dataframe holding the text data for questions
#' @param path Character. The path/name of the file to be exported. Defaults to train.json in the current directory.
#' @export
#' write_rasa_nlu
write_rasa_nlu = function(dat, path="train.json") {
## ensure that the text column is present
stopifnot("text" %in% colnames(dat))
## ensure there is at least one row
stopifnot(nrow(dat) > 0)
## keep only the text column
dat2 = dplyr::select(dat, text)
## build one entry per row, in the original row order
## as.character() guards against text having been read in as a factor
rasa = lapply(as.character(dat2$text), function(txt) list(text=txt, intent="", entities=list()))
## finish the file
rasa_json = list(rasa_nlu_data=list(common_examples=rasa))
## write the file
jsonlite::write_json(rasa_json, path, auto_unbox=TRUE)
}
# About
This `R` script provides a helper function that takes a `dataframe` and builds a valid
Rasa NLU training file in JSON format. This is helpful when you want to:
- export records from a database
- import the training data file into the webapp found here: https://rasahq.github.io/rasa-nlu-trainer/
- tag your data with the tool and export
- re-import the JSON into R and associate with the database ID
## Why
The export preserves row order, so the tagged data can be matched back to the original records. The matching step is not part of the helper function: once the data is tagged in the webapp and read back in from the JSON file, it is just a matter of aligning the rows.
## Example
Build a dataset and write a JSON data file that can be imported into https://rasahq.github.io/rasa-nlu-trainer/
```
dat = data.frame(id = 1:3, text = c("test 1", "i like turtles", "where are you"))
write_rasa_nlu(dat)
```
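For orientation, the file written above should have roughly the following shape. This is a minimal sketch that rebuilds the same nested list in memory and serialises it with `jsonlite::toJSON` instead of calling `write_rasa_nlu`, so that it runs on its own; the variable names `examples`, `json`, and `parsed` are only illustrative.

```r
library(jsonlite)

## same example data as above
dat = data.frame(id = 1:3, text = c("test 1", "i like turtles", "where are you"), stringsAsFactors = FALSE)

## one entry per row, with intent and entities left empty for the tagging tool
examples = lapply(dat$text, function(txt) list(text = txt, intent = "", entities = list()))

## serialise in memory rather than to disk, so the structure can be inspected
json = toJSON(list(rasa_nlu_data = list(common_examples = examples)), auto_unbox = TRUE, pretty = TRUE)
print(json)

## parse it back and pull out the text of each example
parsed = fromJSON(json)
parsed$rasa_nlu_data$common_examples$text
```

Comparing the parsed text against `dat$text` is a quick way to confirm that the row order survives the round trip.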
From here, tag your data, read it back into `R`, and align the records 1:1. Typically, you would save the raw file and match the tagged data from the webapp against each row of that raw file.
```
## load the packages used below
library(jsonlite)
library(dplyr)
## bring in the data tagged in the webapp
x = fromJSON("~/Downloads/2018-07-train.json", flatten = TRUE)
y = x$rasa_nlu_data$common_examples
glimpse(y)
```
This assumes that you exported the JSON file from rasa-nlu-trainer to the default downloads directory on your local machine.
Finally, merge the tagged data with the original records:
```
z = y %>% rename(text2 = text)
msgs = cbind(dat, z)
glimpse(msgs)
```
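Because `cbind` aligns purely by position, it may be worth checking that the tagged file still lines up with the original rows before merging. The following is a minimal sketch of such a check in base `R`; the `dat` and `z` data frames here are small stand-ins for the ones above, and the intent labels are made up for illustration.

```r
## stand-in for the original records
dat = data.frame(id = 1:3, text = c("test 1", "i like turtles", "where are you"), stringsAsFactors = FALSE)

## stand-in for the tagged data read back from the webapp (hypothetical intent labels)
z = data.frame(text2 = dat$text, intent = c("greet", "chitchat", "locate"), stringsAsFactors = FALSE)

## stop early if rows were added, dropped, or reordered during tagging
stopifnot(nrow(dat) == nrow(z))
stopifnot(identical(as.character(dat$text), as.character(z$text2)))

## safe to align by position
msgs = cbind(dat, z)
str(msgs)
```

If the text was edited in the webapp, the second check will fail; in that case the rows would need to be matched some other way, since the exported JSON does not carry the database ID.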