Getting started with Rasa

bash
rasa init

File overview

actions/actions.py

The code file where custom actions are defined

data/nlu.yml

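Training data for Rasa NLU

data/rules.yml

Rule data for Rasa

data/stories.yml

Story data for Rasa

domain.yml

The domain specifies the intents, entities, slots, responses, forms, and actions that Rasa should know about. It also defines the configuration for conversation sessions.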

models

Model data from the initial training run

config.yml

Configuration file for Rasa NLU and Rasa Core

The YAML files in detail

  • nlu.yml
yml
version: "2.0"

nlu:
  - intent: greet  # intent entry: the value of intent is the intent name
    examples: |
      - hey
      - hello
      - hi

  - synonym: 水果  # synonym entry: the synonym key marks this object as holding synonym data
    examples: |
      - 苹果
      - 草莓

  - lookup: 城市  # lookup table entry: the lookup key marks this object as holding a lookup table
    examples: |
      - 北京
      - 上海

  - regex: help  # regular expression entry
    examples: |
      - \bhelp\b
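To make the regex entry concrete, the pattern `\bhelp\b` can be tried with Python's `re` module. This only illustrates what the pattern itself matches (the helper name `mentions_help` is made up for the example). In Rasa the pattern is consumed by RegexFeaturizer as a feature for the classifier, not applied as a hard rule.

```python
import re

# Same pattern as the regex entry named "help" in nlu.yml
HELP_PATTERN = re.compile(r"\bhelp\b")


def mentions_help(text: str) -> bool:
    """Return True if "help" appears in the text as a whole word."""
    return HELP_PATTERN.search(text) is not None


if __name__ == "__main__":
    for message in ["I need help", "helpful tips", "help!"]:
        print(message, "->", mentions_help(message))
```

The word boundaries keep the pattern from firing on longer words such as "helpful".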
  • rules.yml
yml
version: "2.0"

# Rules only take effect when RulePolicy is enabled
rules:
  - rule: Say goodbye anytime the user says goodbye
    steps:
      - intent: goodbye
      - action: utter_goodbye

  - rule: Say 'I am a bot' anytime the user challenges
    steps:
      - intent: bot_challenge
      - action: utter_iamabot
  • stories.yml
yml
version: "2.0"

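# Rasa learns dialogue management from stories
# Each story is a dictionary that must have the story and steps keys
# The value of the story key is a descriptive name for the story
# steps holds the body of the story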

stories:
  - story: happy path
    steps:
      - intent: greet
      - action: utter_greet
      - intent: mood_great
      - action: utter_happy

  - story: sad path 1
    steps:
      - intent: greet
      - action: utter_greet
      - intent: mood_unhappy
      - action: utter_cheer_up
      - action: utter_did_that_help
      - intent: affirm
      - action: utter_happy

  - story: sad path 2
    steps:
      - intent: greet
      - action: utter_greet
      - intent: mood_unhappy
      - action: utter_cheer_up
      - action: utter_did_that_help
      - intent: deny
      - action: utter_goodbye
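One way to picture how a policy can learn from these stories is simple memoization: remember which action followed each conversation prefix, then look the current prefix up at prediction time. The sketch below is a toy illustration in that spirit, using the three stories above. `build_memo`, `predict_next_action`, and the way `max_history` truncates the prefix are assumptions made for the example, not how MemoizationPolicy is implemented.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, str]  # ("intent" | "action", name)

STORIES: Dict[str, List[Event]] = {
    "happy path": [
        ("intent", "greet"), ("action", "utter_greet"),
        ("intent", "mood_great"), ("action", "utter_happy"),
    ],
    "sad path 1": [
        ("intent", "greet"), ("action", "utter_greet"),
        ("intent", "mood_unhappy"), ("action", "utter_cheer_up"),
        ("action", "utter_did_that_help"),
        ("intent", "affirm"), ("action", "utter_happy"),
    ],
    "sad path 2": [
        ("intent", "greet"), ("action", "utter_greet"),
        ("intent", "mood_unhappy"), ("action", "utter_cheer_up"),
        ("action", "utter_did_that_help"),
        ("intent", "deny"), ("action", "utter_goodbye"),
    ],
}


def build_memo(stories: Dict[str, List[Event]],
               max_history: int = 5) -> Dict[Tuple[Event, ...], str]:
    """Map each (truncated) conversation prefix to the action that followed it."""
    memo: Dict[Tuple[Event, ...], str] = {}
    for events in stories.values():
        for i, (kind, name) in enumerate(events):
            if kind == "action":
                memo[tuple(events[:i][-max_history:])] = name
    return memo


def predict_next_action(memo: Dict[Tuple[Event, ...], str],
                        history: List[Event],
                        max_history: int = 5) -> Optional[str]:
    """Return the memorized next action, or None if the prefix was never seen."""
    return memo.get(tuple(history[-max_history:]))
```

A prefix that never appeared in the stories returns `None`, which is where a generalizing policy such as TEDPolicy would have to take over.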
  • domain.yml
yml
version: '2.0'
config:
  store_entities_as_slots: true
session_config:
  session_expiration_time: 60
  carry_over_slots_to_new_session: true
intents:  # intents
- greet:
    use_entities: true
- goodbye:
    use_entities: true
- affirm:
    use_entities: true
- deny:
    use_entities: true
- mood_great:
    use_entities: true
- mood_unhappy:
    use_entities: true
- bot_challenge:
    use_entities: true
entities: []  # list of entities
slots: {}  # slots
responses:  # responses
  utter_greet:  # response name, prefixed with utter_
  - text: Hey! How are you?
  utter_cheer_up:
  - image: https://i.imgur.com/nGF1K8f.jpg
    text: 'Here is something to cheer you up:'
  utter_did_that_help:
  - text: Did that help you?
  utter_happy:
  - text: Great, carry on!
  utter_goodbye:
  - text: Bye
  utter_iamabot:
  - text: I am a bot, powered by Rasa.
actions: []  # actions
forms: {}  # forms
e2e_actions: []
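The session_config values above can be read as follows: a new session starts once the user has been inactive for longer than session_expiration_time (in minutes), and carry_over_slots_to_new_session decides whether slot values survive into it. The sketch below illustrates that logic only. The function names are hypothetical and this is not Rasa's internal code.

```python
from typing import Any, Dict

SESSION_EXPIRATION_MINUTES = 60  # session_expiration_time from domain.yml
CARRY_OVER_SLOTS = True          # carry_over_slots_to_new_session from domain.yml


def session_expired(last_event_ts: float, now_ts: float,
                    expiration_minutes: float = SESSION_EXPIRATION_MINUTES) -> bool:
    """Timestamps are in seconds; True once the inactivity window is exceeded."""
    # A value of 0 is treated here as "sessions never expire"
    if expiration_minutes <= 0:
        return False
    return (now_ts - last_event_ts) > expiration_minutes * 60


def slots_for_new_session(old_slots: Dict[str, Any],
                          carry_over: bool = CARRY_OVER_SLOTS) -> Dict[str, Any]:
    """Keep the old slot values, or reset every slot to None."""
    if carry_over:
        return dict(old_slots)
    return {name: None for name in old_slots}
```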
  • config.yml
yml
recipe: default.v1
language: en

pipeline: # pipeline: configures the NLU components
  # # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
  # # If you'd like to customize it, uncomment and adjust the pipeline.
  # # See https://rasa.com/docs/rasa/tuning-your-model for more information.
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1

policies: # policies: learn from stories to predict the next action
  - name: MemoizationPolicy
  - name: RulePolicy
  # - name: UnexpecTEDIntentPolicy
  #   max_history: 5
  #   epochs: 100
  - name: TEDPolicy
    max_history: 10
    epochs: 100
    constrain_similarities: true
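The two FallbackClassifier settings in this config can be understood as a pair of checks on the intent ranking: fall back when the best confidence is below threshold, or when the top two intents are closer together than ambiguity_threshold. A simplified sketch of that decision follows. `pick_intent` is a hypothetical helper written for illustration, not the component's real code.

```python
from typing import Dict


def pick_intent(confidences: Dict[str, float],
                threshold: float = 0.3,
                ambiguity_threshold: float = 0.1) -> str:
    """Return the winning intent name, or "nlu_fallback" if the result is unreliable."""
    if not confidences:
        return "nlu_fallback"
    ranked = sorted(confidences.items(), key=lambda item: item[1], reverse=True)
    top_name, top_conf = ranked[0]
    # Check 1: the best guess is not confident enough
    if top_conf < threshold:
        return "nlu_fallback"
    # Check 2: the two best guesses are too close to tell apart
    if len(ranked) > 1 and (top_conf - ranked[1][1]) < ambiguity_threshold:
        return "nlu_fallback"
    return top_name
```

With the values from the config, a ranking of 0.45 versus 0.40 falls back because the gap is below 0.1, even though 0.45 clears the 0.3 threshold.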

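Pipeline component configuration

An NLU application usually has two tasks: entity recognition and intent recognition. To carry them out, a typical Rasa NLU configuration includes the following kinds of components: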

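  • Language model components

    Component    Notes
    SpacyNLP
  • Tokenizer components

    Component    Notes
    SpacyTokenizer
    JiebaTokenizer
  • Featurizer components

    Component    Notes
    SpacyFeaturizer
    LanguageModelFeaturizer
  • Named entity recognition (NER) components

    Component    Notes
    CRFEntityExtractor
    SpacyEntityExtractor
  • Intent classification components

    Component    Notes
    FallbackClassifier
  • Joint entity and intent extraction components

    Component    Notes
    DIETClassifier
  • Response selectors

    Component    Notes
    ResponseSelector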
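The components in a pipeline run in the order they are listed: each one reads what earlier components attached to the message and adds its own output, which is why a tokenizer comes before the featurizers and the featurizers before the classifier. A toy sketch of that idea, using made-up stand-in functions rather than real Rasa components:

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Component = Callable[[Message], Message]


def whitespace_tokenizer(message: Message) -> Message:
    # Step 1: split the raw text into tokens
    message["tokens"] = message["text"].split()
    return message


def bag_of_words_featurizer(message: Message) -> Message:
    # Step 2: needs the tokens produced by the tokenizer
    counts: Dict[str, int] = {}
    for token in message["tokens"]:
        key = token.lower()
        counts[key] = counts.get(key, 0) + 1
    message["features"] = counts
    return message


def keyword_intent_classifier(message: Message) -> Message:
    # Step 3: needs the features produced by the featurizer
    greetings = {"hey", "hello", "hi"}
    is_greeting = any(word in greetings for word in message["features"])
    message["intent"] = "greet" if is_greeting else "unknown"
    return message


def run_pipeline(text: str, pipeline: List[Component]) -> Message:
    """Pass one message through every component, in order."""
    message: Message = {"text": text}
    for component in pipeline:
        message = component(message)
    return message


PIPELINE: List[Component] = [
    whitespace_tokenizer,
    bag_of_words_featurizer,
    keyword_intent_classifier,
]
```

Reordering the list so that the classifier runs first would fail with a missing "features" key, which mirrors why component order matters in config.yml.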

The two Chinese pipeline configurations I currently use:

yml
pipeline:
  - name: JiebaTokenizer
    dictionary_path: "pipline/jieba_userdict"
  - name: LanguageModelFeaturizer
    model_name: "bert"
    model_weights: "bert-base-chinese"
  - name: RegexFeaturizer
  - name: DIETClassifier
    epochs: 100
    learning_rate: 0.001
  - name: ResponseSelector
    epochs: 100
    learning_rate: 0.001
  - name: FallbackClassifier
    threshold: 0.3
  - name: EntitySynonymMapper
yml
pipeline:
  - name: MitieNLP
    model: "pipline/total_word_feature_extractor_zh.dat"
  - name: JiebaTokenizer
    dictionary_path: "pipline/jieba_userdict"
  - name: MitieEntityExtractor
  - name: EntitySynonymMapper
  - name: RegexFeaturizer
  - name: MitieFeaturizer
  - name: SklearnIntentClassifier
  - name: DIETClassifier
    epochs: 100
  - name: DucklingEntityExtractor
    dimensions: ["number"]
  - name: ResponseSelector
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.3

Policies

  • Built-in policies

    Policy    Notes
    TEDPolicy
    MemoizationPolicy
    AugmentedMemoizationPolicy
    RulePolicy
  • Custom actions: a custom action must subclass the Rasa SDK Action class. Here is an example:

python
# actions.py
from typing import Any, Text, Dict, List

from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher


class ActionSlot(Action):
    # Name of the action, as referenced in the domain and in stories
    def name(self) -> Text:
        return "action_slot"

    # Receives the current conversation state (tracker) and uses it to carry out the business logic
    def run(self, dispatcher: CollectingDispatcher,
            tracker: Tracker,
            domain: Dict[Text, Any]) -> List[Dict[Text, Any]]:

        # Read the value of the "firstslot" slot from the tracker
        data = tracker.get_slot("firstslot")
        print(data)
        # Reply with the slot value; skip the reply if the slot is not set
        if data is not None:
            dispatcher.utter_message(text=str(data))

        return []
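The logic of run can be exercised without starting Rasa by passing in small stand-in objects. The sketch below is only an illustration: FakeTracker, FakeDispatcher, and run_slot_action are hypothetical helpers that mimic just the two methods used above (get_slot and utter_message) and are not part of rasa_sdk. The function mirrors the steps of run, with a guard for an unset slot.

```python
from typing import Any, Dict, List, Optional


class FakeTracker:
    """Stand-in for the tracker: only knows how to return slot values."""

    def __init__(self, slots: Dict[str, Any]) -> None:
        self._slots = slots

    def get_slot(self, key: str) -> Optional[Any]:
        return self._slots.get(key)


class FakeDispatcher:
    """Stand-in for the dispatcher: records messages instead of sending them."""

    def __init__(self) -> None:
        self.messages: List[Dict[str, Any]] = []

    def utter_message(self, text: Optional[str] = None) -> None:
        self.messages.append({"text": text})


def run_slot_action(dispatcher: FakeDispatcher,
                    tracker: FakeTracker,
                    domain: Dict[str, Any]) -> List[Dict[str, Any]]:
    # Same steps as the action: read the slot, echo it back, return no events
    data = tracker.get_slot("firstslot")
    if data is not None:
        dispatcher.utter_message(text=str(data))
    return []


if __name__ == "__main__":
    dispatcher = FakeDispatcher()
    run_slot_action(dispatcher, FakeTracker({"firstslot": "hello"}), domain={})
    print(dispatcher.messages)
```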

Common commands

  • Validate data
bash
rasa data validate stories  # check that the stories are written correctly
  • Train
bash
rasa train
rasa train --domain domain  # train with multiple domain files: just put the yml files in the domain folder
rasa train --num-threads 4  # multi-threaded
rasa train nlu  # train Rasa NLU only
rasa train core  # train Rasa Core only
  • Run the action server
bash
rasa run actions --debug
  • Start a conversation
bash
rasa shell
rasa interactive  # interactive learning session
rasa x  # conversation in a visual UI
  • Test
bash
rasa data split nlu  # split the NLU data into training and test sets
rasa shell nlu  # check the NLU model's predictions interactively
rasa test nlu --nlu train_test_split/test_data.yml  # evaluate on the test set
rasa test nlu --nlu data/nlu.yml --cross-validation  # evaluate with cross-validation
rasa test nlu --nlu data/nlu.yml --config config_1.yml config_2.yml  # compare NLU pipelines
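Conceptually, splitting NLU data means holding out part of each intent's examples for testing so that every intent appears in both sets. A rough sketch of such a per-intent split, assuming an 80/20 ratio and a fixed random seed. `split_examples` is a hypothetical helper and is not what `rasa data split nlu` runs internally.

```python
import random
from typing import Dict, List, Tuple

Examples = Dict[str, List[str]]


def split_examples(examples_by_intent: Examples,
                   training_fraction: float = 0.8,
                   seed: int = 42) -> Tuple[Examples, Examples]:
    """Shuffle each intent's examples and cut them into train and test parts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: Examples = {}
    test: Examples = {}
    for intent, examples in examples_by_intent.items():
        shuffled = list(examples)
        rng.shuffle(shuffled)
        # Keep at least one training example per intent
        cut = max(1, int(len(shuffled) * training_fraction))
        train[intent] = shuffled[:cut]
        test[intent] = shuffled[cut:]
    return train, test


if __name__ == "__main__":
    data = {"greet": ["hey", "hello", "hi", "good morning", "hi there"]}
    train, test = split_examples(data)
    print(len(train["greet"]), "training /", len(test["greet"]), "test examples")
```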