Getting started with Rasa
bash
rasa init
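File overview
actions/actions.py
Code file where custom actions are defined
data/nlu.yml
Training data for Rasa NLU
data/rules.yml
Rasa rule data
data/stories.yml
Rasa story data
domain.yml
The domain specifies the intents, entities, slots, responses, forms, and actions that Rasa should know about. It also defines the configuration for conversation sessions.
models
The initially trained model data
config.yml
Configuration file for Rasa NLU and Rasa Core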
YAML details
- nlu.yml
yml
version: "2.0"
nlu:
- intent: greet  # intent entry; the value of intent is the intent name
  examples: |
    - hey
    - hello
    - hi
- synonym: 水果  # synonym entry; the synonym key marks this object as synonym data
  examples: |
    - 苹果
    - 草莓
- lookup: 城市  # lookup table entry; the lookup key marks this object as a lookup table
  examples: |
    - 北京
    - 上海
- regex: help  # regular expression entry
  examples: |
    - \bhelp\b
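The regex entry above is an ordinary regular expression. As a minimal sketch of what the pattern `\bhelp\b` does and does not match, the snippet below uses Python's standard `re` module. The helper name `matches_pattern` and the sample messages are assumptions made for illustration; this is not how Rasa itself applies the pattern internally.

```python
import re

# Pattern copied from the regex entry in nlu.yml.
HELP_PATTERN = re.compile(r"\bhelp\b")


def matches_pattern(message: str) -> bool:
    """Return True when the message contains 'help' as a whole word."""
    return HELP_PATTERN.search(message) is not None


if __name__ == "__main__":
    # Hypothetical sample messages, only for trying the pattern out.
    for text in ["I need help", "helpful tips", "help!"]:
        print(text, "->", matches_pattern(text))
```

The word boundaries mean that "helpful" is not matched, which is usually what you want from a keyword-style pattern.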
- rules.yml
yml
version: "2.0"
# Rules require RulePolicy
rules:
- rule: Say goodbye anytime the user says goodbye
  steps:
  - intent: goodbye
  - action: utter_goodbye
- rule: Say 'I am a bot' anytime the user challenges
  steps:
  - intent: bot_challenge
  - action: utter_iamabot
- stories.yml
yml
version: "2.0"
# Rasa learns dialogue management from stories
# each story is a dictionary and must have the story and steps keys
# the value of the story key is a descriptive label for the story
# steps holds the body of the story
stories:
- story: happy path
  steps:
  - intent: greet
  - action: utter_greet
  - intent: mood_great
  - action: utter_happy
- story: sad path 1
  steps:
  - intent: greet
  - action: utter_greet
  - intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - intent: affirm
  - action: utter_happy
- story: sad path 2
  steps:
  - intent: greet
  - action: utter_greet
  - intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - intent: deny
  - action: utter_goodbye
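To build some intuition for what learning from stories can mean in the simplest case, here is a toy sketch that memorizes which action follows a given recent history. It is only loosely inspired by the idea of a memoizing policy and is not Rasa's implementation; `build_lookup`, `predict_next_action`, and the tuple encoding of events are hypothetical names chosen for this example.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, str]  # ("intent" or "action", name)

# Two of the stories from stories.yml, written as event lists.
STORIES: Dict[str, List[Event]] = {
    "happy path": [
        ("intent", "greet"), ("action", "utter_greet"),
        ("intent", "mood_great"), ("action", "utter_happy"),
    ],
    "sad path 2": [
        ("intent", "greet"), ("action", "utter_greet"),
        ("intent", "mood_unhappy"), ("action", "utter_cheer_up"),
        ("action", "utter_did_that_help"),
        ("intent", "deny"), ("action", "utter_goodbye"),
    ],
}


def build_lookup(stories: Dict[str, List[Event]],
                 max_history: int = 3) -> Dict[Tuple[Event, ...], str]:
    """Map the last `max_history` events before each action to that action."""
    lookup: Dict[Tuple[Event, ...], str] = {}
    for steps in stories.values():
        for i, (kind, name) in enumerate(steps):
            if kind == "action":
                history = tuple(steps[max(0, i - max_history):i])
                lookup[history] = name
    return lookup


def predict_next_action(lookup: Dict[Tuple[Event, ...], str],
                        history: List[Event],
                        max_history: int = 3) -> Optional[str]:
    """Return the memorized action for this history, or None if unseen."""
    return lookup.get(tuple(history[-max_history:]))


if __name__ == "__main__":
    table = build_lookup(STORIES)
    print(predict_next_action(table, [("intent", "greet")]))
```

A pure lookup like this cannot generalize to histories it has never seen, which is one reason a configuration typically combines it with a learned policy such as TEDPolicy.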
- domain.yml
version: '2.0'
config:
  store_entities_as_slots: true
session_config:
  session_expiration_time: 60
  carry_over_slots_to_new_session: true
intents:  # intents
- greet:
    use_entities: true
- goodbye:
    use_entities: true
- affirm:
    use_entities: true
- deny:
    use_entities: true
- mood_great:
    use_entities: true
- mood_unhappy:
    use_entities: true
- bot_challenge:
    use_entities: true
entities: []  # list of entities
slots: {}  # slots
responses:  # responses
  utter_greet:  # response names start with the utter_ prefix
  - text: Hey! How are you?
  utter_cheer_up:
  - image: https://i.imgur.com/nGF1K8f.jpg
    text: 'Here is something to cheer you up:'
  utter_did_that_help:
  - text: Did that help you?
  utter_happy:
  - text: Great, carry on!
  utter_goodbye:
  - text: Bye
  utter_iamabot:
  - text: I am a bot, powered by Rasa.
actions: []  # custom actions
forms: {}  # forms
e2e_actions: []
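Because stories and rules refer to responses by name, a misspelled `utter_` name is an easy mistake to make. The following is a minimal sketch of a consistency check, assuming the response names have been copied by hand from the domain above. The function `missing_responses` and the action name `utter_thanks` are hypothetical and only serve the example.

```python
from typing import Iterable, List, Set

# Response names copied from the responses section of domain.yml.
DOMAIN_RESPONSES: Set[str] = {
    "utter_greet", "utter_cheer_up", "utter_did_that_help",
    "utter_happy", "utter_goodbye", "utter_iamabot",
}


def missing_responses(story_actions: Iterable[str],
                      responses: Set[str]) -> List[str]:
    """Return utter_* actions that have no response defined, sorted by name."""
    return sorted(
        {a for a in story_actions
         if a.startswith("utter_") and a not in responses}
    )


if __name__ == "__main__":
    # "utter_thanks" is a made-up name that the domain does not define.
    used = ["utter_greet", "utter_thanks", "action_slot"]
    print(missing_responses(used, DOMAIN_RESPONSES))
```

Names without the `utter_` prefix, such as custom actions, are ignored here because they are declared under actions rather than responses.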
- config.yml
yml
recipe: default.v1
language: en
pipeline:  # pipeline: configures the NLU components
# # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
# # If you'd like to customize it, uncomment and adjust the pipeline.
# # See https://rasa.com/docs/rasa/tuning-your-model for more information.
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1
policies:  # policies: learn from stories in order to predict the next action
  - name: MemoizationPolicy
  - name: RulePolicy
  # - name: UnexpecTEDIntentPolicy
  #   max_history: 5
  #   epochs: 100
  - name: TEDPolicy
    max_history: 10
    epochs: 100
    constrain_similarities: true
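The two FallbackClassifier settings in the pipeline are easier to reason about with a small example. The sketch below assumes the usual reading of the parameters: fall back when the best intent confidence is below `threshold`, or when the two best intents are closer together than `ambiguity_threshold`. The function `choose_intent` is a hypothetical helper written for this note, not part of Rasa.

```python
from typing import Dict

NLU_FALLBACK = "nlu_fallback"  # intent name used when the classifier falls back


def choose_intent(confidences: Dict[str, float],
                  threshold: float = 0.3,
                  ambiguity_threshold: float = 0.1) -> str:
    """Pick the top intent, or the fallback intent when it is weak or ambiguous."""
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return NLU_FALLBACK
    top_name, top_conf = ranked[0]
    # Case 1: even the best intent is not confident enough.
    if top_conf < threshold:
        return NLU_FALLBACK
    # Case 2: the two best intents are too close to call.
    if len(ranked) > 1 and top_conf - ranked[1][1] < ambiguity_threshold:
        return NLU_FALLBACK
    return top_name


if __name__ == "__main__":
    print(choose_intent({"greet": 0.9, "goodbye": 0.05}))
    print(choose_intent({"greet": 0.45, "goodbye": 0.40}))
```

Raising `threshold` makes the bot ask for clarification more often; lowering it makes the bot commit to an intent more readily.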
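Pipeline component configuration
An NLU application usually has two tasks: entity recognition and intent recognition. To handle them, a typical Rasa NLU configuration includes the following kinds of components.
Language model components
- SpacyNLP
Tokenizer components
- SpacyTokenizer
- JiebaTokenizer
Featurizer components
- SpacyFeaturizer
- LanguageModelFeaturizer
Named entity recognition (NER) components
- CRFEntityExtractor
- SpacyEntityExtractor
Intent classification components
- FallbackClassifier
Joint intent and entity extraction components
- DIETClassifier
Response selectors
- ResponseSelector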
Two Chinese pipeline configurations I am currently using:
yml
pipeline:
  - name: JiebaTokenizer
    dictionary_path: "pipline/jieba_userdict"
  - name: LanguageModelFeaturizer
    model_name: "bert"
    model_weights: "bert-base-chinese"
  - name: RegexFeaturizer
  - name: DIETClassifier
    epochs: 100
    learning_rate: 0.001
  - name: ResponseSelector
    epochs: 100
    learning_rate: 0.001
  - name: FallbackClassifier
    threshold: 0.3
  - name: EntitySynonymMapper
yml
pipeline:
  - name: MitieNLP
    model: "pipline/total_word_feature_extractor_zh.dat"
  - name: JiebaTokenizer
    dictionary_path: "pipline/jieba_userdict"
  - name: MitieEntityExtractor
  - name: EntitySynonymMapper
  - name: RegexFeaturizer
  - name: MitieFeaturizer
  - name: SklearnIntentClassifier
  - name: DIETClassifier
    epochs: 100
  - name: DucklingEntityExtractor
    dimensions: ["number"]
  - name: ResponseSelector
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.3
Policies
Built-in policies
- TEDPolicy
- MemoizationPolicy
- AugmentedMemoizationPolicy
- RulePolicy
Custom actions
A custom action must inherit from the Rasa SDK Action class. Here is an example:
python
# actions.py
from typing import Any, Text, Dict, List

from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher


class ActionSlot(Action):
    # Name of the action, as referenced in the domain and in stories
    def name(self) -> Text:
        return "action_slot"

    # Receives the current conversation state (tracker) and uses it
    # to carry out the business logic
    def run(self, dispatcher: CollectingDispatcher,
            tracker: Tracker,
            domain: Dict[Text, Any]) -> List[Dict[Text, Any]]:
        # Read the value of the "firstslot" slot (None if it is not set)
        data = tracker.get_slot("firstslot")
        print(data)  # debug output in the action server log
        # Send the slot value back to the user
        dispatcher.utter_message(text=data)
        return []
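One thing to watch for in the action above is that `tracker.get_slot` returns `None` when the slot has not been filled. A minimal sketch of a guard follows, assuming you want a fixed fallback sentence in that case. `build_slot_reply` and its wording are hypothetical and not part of the Rasa SDK.

```python
from typing import Any, Optional


def build_slot_reply(value: Optional[Any], slot_name: str = "firstslot") -> str:
    """Turn a slot value into a message, with a fallback for unset slots."""
    if value is None:
        return f"The slot '{slot_name}' has not been set yet."
    # Slot values are not always strings, so convert explicitly.
    return str(value)


if __name__ == "__main__":
    print(build_slot_reply(None))
    print(build_slot_reply("Beijing"))
```

Inside `run`, the helper could be used as `dispatcher.utter_message(text=build_slot_reply(tracker.get_slot("firstslot")))`.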
Common commands
- Validate data
bash
rasa data validate stories  # check that the stories are written correctly
- Train
bash
rasa train
rasa train --domain domain  # train with multiple domain files; just put the yml files in the domain folder
rasa train --num-threads 4  # use multiple threads
rasa train nlu  # train Rasa NLU only
rasa train core  # train Rasa Core only
- Run the action server
bash
rasa run actions --debug
- Start a conversation
bash
rasa shell
rasa interactive  # interactive learning session
rasa x  # conversation in a visual interface
- Test
bash
rasa data split nlu  # split the NLU data into training and test sets
rasa shell nlu  # try out the NLU model interactively
rasa test nlu --nlu train_test_split/test_data.yml  # evaluate on the test set
rasa test nlu --nlu data/nlu.yml --cross-validation  # evaluate with cross-validation
rasa test nlu --nlu data/nlu.yml --config config_1.yml config_2.yml  # compare NLU pipelines
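To make the idea behind splitting NLU data concrete, here is a rough sketch of a per-intent train/test split in plain Python. It only illustrates the concept and is not what `rasa data split nlu` runs internally. The function `split_examples`, the 0.8 training fraction, the seed, and the sample data are all assumptions chosen for the example.

```python
import random
from typing import Dict, List, Tuple

Examples = Dict[str, List[str]]


def split_examples(examples_by_intent: Examples,
                   training_fraction: float = 0.8,
                   seed: int = 42) -> Tuple[Examples, Examples]:
    """Shuffle each intent's examples and split them into train and test parts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: Examples = {}
    test: Examples = {}
    for intent, examples in examples_by_intent.items():
        shuffled = list(examples)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * training_fraction)
        train[intent] = shuffled[:cut]
        test[intent] = shuffled[cut:]
    return train, test


if __name__ == "__main__":
    # Made-up examples, only to show the resulting sizes.
    data = {
        "greet": [f"hi {i}" for i in range(10)],
        "goodbye": [f"bye {i}" for i in range(5)],
    }
    train_set, test_set = split_examples(data)
    for intent in data:
        print(intent, len(train_set[intent]), len(test_set[intent]))
```

Splitting per intent keeps every intent represented in both parts, which matters when some intents have only a handful of examples.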