IrisTK

Java-based dialogue system framework


Creating an application

In this tutorial, you will learn the basics of how to set up an application and how a simple dialogue system can be defined.

Note: this tutorial assumes that:

IrisTK provides tools for rapidly setting up a skeleton for an application based on templates. We will use this to set up a simple dialogue system based on a template called "simple_dialog", which is useful for speech-only applications. It creates a very simple application: a game in which you are asked to guess a number between 1 and 10. Since the template provides stubs for the flow and the grammar, you can use it in the future to set up skeletons for new applications under any name of your choice. Wherever the name "guess" is used below, substitute the name of your own application.

Open a command window and type the following command:

iristk create simple_dialog guess

This will create an application called "guess" based on the "simple_dialog" template in the "app" folder under the IrisTK installation.

Now type:

iristk eclipse

This will set up Eclipse properly. Open Eclipse, or refresh the IrisTK project if Eclipse is already open. The new application should show up as a source folder.

Have a headset or microphone ready before running the application. Locate the file GuessSystem.java in the source folder, right-click it and choose "Run As" -> "Java application".

Once Eclipse has compiled the Java code, you can also run the application from the command line:

iristk guess

Understanding IrisSystem

Open the file GuessSystem.java. The constructor should contain the following:

// Create the system
IrisSystem system = new IrisSystem();
// Add a monitor window
new IrisMonitorWindow(system).setVisible(true);
// Add the flow
system.addModule(new FlowModule(new GuessFlow()));
// Create a recognizer module (using the windows recognizer)
RecognizerModule recognizer = new RecognizerModule(new WindowsRecognizer(Language.ENGLISH_US));
// Load a grammar in the recognizer
recognizer.loadGrammar("default", getClass().getResource("grammar.xml").toURI());
// Set the default grammar for the recognizer
recognizer.setDefaultGrammars("default");
// Add the recognizer to the system
system.addModule(recognizer);
// Add a synthesizer to the system      
system.addModule(new SynthesizerModule(new WindowsSynthesizer(Language.ENGLISH_US)));
// Start the system
system.start();

An IrisTK system (iristk.system.IrisSystem) consists of a number of modules (subclasses of iristk.system.IrisModule) that send and receive events. Events can represent anything that updates the system, typically some action that the user has done which is perceived by a module in the system or an action that some module wants some other module to execute. By default, IrisSystem relays all events to all modules and it is up to each module whether to process the event or not.
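
The relay behaviour described above can be illustrated without IrisTK itself. The following is a minimal sketch in plain Java, not IrisTK's actual implementation: the names RelaySketch, EventModule, Hub and PrefixModule, the use of plain strings as events, and the event name "action.example" are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT IrisTK classes.
class RelaySketch {

    // A module receives every event and decides for itself whether to react.
    interface EventModule {
        void onEvent(String eventName);
    }

    // The hub relays each event to all registered modules, without filtering.
    static class Hub {
        private final List<EventModule> modules = new ArrayList<>();

        void addModule(EventModule module) {
            modules.add(module);
        }

        void send(String eventName) {
            for (EventModule module : modules) {
                module.onEvent(eventName);
            }
        }
    }

    // Example module that only processes events whose name starts with a prefix.
    static class PrefixModule implements EventModule {
        private final String prefix;
        final List<String> handled = new ArrayList<>();

        PrefixModule(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public void onEvent(String eventName) {
            if (eventName.startsWith(prefix)) {
                handled.add(eventName);
            }
        }
    }

    public static void main(String[] args) {
        Hub hub = new Hub();
        PrefixModule senses = new PrefixModule("sense.");
        PrefixModule actions = new PrefixModule("action.");
        hub.addModule(senses);
        hub.addModule(actions);

        // Both modules see both events; each keeps only the ones it cares about.
        hub.send("sense.user.speech");
        hub.send("action.example"); // hypothetical event name

        System.out.println("sense module handled: " + senses.handled);
        System.out.println("action module handled: " + actions.handled);
    }
}
```

The point of the design is that the sender does not need to know which modules exist: adding a new module never requires changing the modules that produce events.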

Understanding the flow

Open the file GuessFlow.xml. It starts like this:

<flow name="GuessFlow" package="iristk.app.guess" 
    initial="Start" xmlns="iristk.flow" xmlns:p="iristk.flow.param" xmlns:dialog="iristk.flow.SimpleDialog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="iristk.flow flow.xsd iristk.flow.SimpleDialog SimpleDialog.xsd">
    
    <var name="number" type="Integer"/>
    <var name="guesses" type="Integer"/>
    
    <state id="Start">
        <onentry>
            <exec>number = new java.util.Random().nextInt(10) + 1</exec>
            <exec>guesses = 0</exec>
            <dialog:say>I am thinking of a number between 1 and 10, let's see if you can guess which one it is.</dialog:say>
            <goto state="Guess"/>
        </onentry>
    </state>

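The top-level <flow> element starts by defining the name of the flow (GuessFlow), the Java package it belongs to (iristk.app.guess), the initial state (Start), and the XML namespaces and schema locations used in the file.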

Then two flow-level variables are defined: number (the number that the system is thinking of) and guesses (the number of guesses that the user has made).

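The initial state Start contains one event handler, <onentry>, which is triggered when the state is entered. The event handler in turn contains a set of actions that are executed in order: the two <exec> actions pick a random number between 1 and 10 and reset guesses to 0, <dialog:say> makes the system speak, and <goto> moves the flow to the Guess state. The Guess state and its parent state Dialog are defined as follows: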

<state id="Guess" extends="Dialog">
    <onentry>
        <dialog:listen/>
    </onentry>
    <onevent name="sense.user.speech" cond="event?:sem:number">
        <exec>guesses++</exec>
        <if cond="asInteger(event:sem:number) == number">
            <dialog:say>That was correct, you only needed <expr>guesses</expr> guesses.</dialog:say>
            <goto state="CheckAgain"/>
        <elseif cond="asInteger(event:sem:number) &gt; number"/>
            <dialog:say>That was too high, let's try again.</dialog:say>
            <reentry/>
        <else/>
            <dialog:say>That was too low, let's try again.</dialog:say>
            <reentry/>
        </if>
    </onevent>
</state>
    
<state id="Dialog">
    <onevent name="sense.user.speech">
        <dialog:say>I am sorry, I didn't get that.</dialog:say>
        <reentry/>
    </onevent>
    <onevent name="sense.user.speech.silence">
        <dialog:say>I am sorry, I didn't hear anything.</dialog:say>
        <reentry/>
    </onevent>
</state>

As can be seen, the Guess state has an extends attribute. This means that all event handlers in the Dialog state are inherited by the Guess state, but they are checked after the event handlers in the Guess state. This is an important feature: it allows you to define generic behaviour across states. In this case, the Dialog state defines what will happen if the user says something the system doesn't understand, or doesn't say anything. In both cases, the user will first be informed, after which the <reentry> action re-triggers the <onentry> event handler of the current state.
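
The lookup order described above (the state's own handlers first, inherited handlers afterwards) can be sketched in plain Java. This is only an illustration of the idea, not the code that IrisTK generates from the flow: the classes InheritanceSketch, Handler and State are hypothetical, and the semantics of an event are modelled as a simple map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration of handler lookup order; NOT IrisTK's implementation.
class InheritanceSketch {

    // A handler pairs a condition on the event's semantics with a label for its reaction.
    static class Handler {
        final Predicate<Map<String, Integer>> condition;
        final String reaction;

        Handler(Predicate<Map<String, Integer>> condition, String reaction) {
            this.condition = condition;
            this.reaction = reaction;
        }
    }

    static class State {
        private final State parent; // null when the state extends nothing
        private final List<Handler> handlers = new ArrayList<>();

        State(State parent) {
            this.parent = parent;
        }

        State on(Predicate<Map<String, Integer>> condition, String reaction) {
            handlers.add(new Handler(condition, reaction));
            return this;
        }

        // Own handlers are tried first; the parent is consulted only if none matched.
        String dispatch(Map<String, Integer> sem) {
            for (Handler handler : handlers) {
                if (handler.condition.test(sem)) {
                    return handler.reaction;
                }
            }
            return parent == null ? null : parent.dispatch(sem);
        }
    }

    // Mirrors the Guess/Dialog pair: Guess reacts to a number, Dialog is the fallback.
    static State buildGuessState() {
        State dialog = new State(null).on(sem -> true, "I am sorry, I didn't get that.");
        return new State(dialog).on(sem -> sem.containsKey("number"), "check the guess");
    }

    public static void main(String[] args) {
        State guess = buildGuessState();
        System.out.println(guess.dispatch(Collections.singletonMap("number", 7)));
        System.out.println(guess.dispatch(Collections.<String, Integer>emptyMap()));
    }
}
```

An utterance with a number is handled by the Guess-like state, while anything else falls through to the generic fallback in the parent.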

Upon (re-)entry of the Guess state, the speech recognizer is instructed to start listening for input. When a speech recognition result is received, a sense.user.speech event is raised. This is caught by the <onevent> event handler. Its cond attribute specifies that for this event handler to trigger, the event must contain the semantic field number (if not, the event will be caught by the Dialog state, as described above). The event handler then checks whether the guessed number was correct, too high, or too low. Since the characters < and > have a special meaning in XML, &lt; and &gt; are used instead.
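
For readers more used to Java than to the flow XML, the logic of the Start and Guess states can be written out as ordinary methods. This is a hand-written sketch for comparison, not the output of the flow compiler; the class GuessLogicSketch and its method names are made up for this example.

```java
import java.util.Random;

// Hand-written plain-Java counterpart of the flow logic; names are hypothetical.
class GuessLogicSketch {

    // Same expression as in the Start state: a number from 1 to 10 inclusive.
    static int pickNumber(Random random) {
        return random.nextInt(10) + 1;
    }

    // Same three branches as the <if>/<elseif>/<else> in the Guess state.
    static String respond(int guess, int number, int guesses) {
        if (guess == number) {
            return "That was correct, you only needed " + guesses + " guesses.";
        } else if (guess > number) {
            return "That was too high, let's try again.";
        } else {
            return "That was too low, let's try again.";
        }
    }

    public static void main(String[] args) {
        int number = pickNumber(new Random());
        int guesses = 0;
        // Simulate a user who simply counts upwards until the guess is correct.
        for (int guess = 1; guess <= 10; guess++) {
            guesses++;
            System.out.println(guess + ": " + respond(guess, number, guesses));
            if (guess == number) {
                break;
            }
        }
    }
}
```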

Finally, the CheckAgain state checks whether the user wants to play again. Notice how the behaviour in the Dialog state is reused:

<state id="CheckAgain" extends="Dialog">
    <onentry>
        <dialog:say>Do you want to play again?</dialog:say>
        <dialog:listen/>
    </onentry>
    <onevent name="sense.user.speech" cond="event?:sem:yes">
        <dialog:say>Okay, let's play again.</dialog:say>
        <goto state="Start"/>
    </onevent>
    <onevent name="sense.user.speech" cond="event?:sem:no">
        <dialog:say>Okay, goodbye</dialog:say>
        <exec>System.exit(0)</exec>
    </onevent>      
</state>

Compiling the flow

If you change the flow XML, it needs to be compiled to Java code for the change to take effect. You can do this from the command line like this:

cd %IrisTK%\app\guess
iristk cflow GuessFlow.xml

Then refresh the "guess" source folder (or the whole IrisTK project) in Eclipse.

If you want to compile the flow from within Eclipse, there are two other ways of doing it:

  1. Use the Ant task that has been set up for you. Locate the build.xml file in the app/guess folder in Eclipse (not the source folder!), right-click it and choose "Run As" -> "Ant Build". Remember to refresh afterwards.
  2. Download the Eclipse plug-in and compile the flow by right-clicking on it and choosing "Compile Flow" from the context menu. This is the most convenient way of compiling the flow, since it also notifies Eclipse, so you don't have to refresh.

Understanding the grammar

The grammar of what the user can say is defined in grammar.xml, according to SRGS, the Speech Recognition Grammar Specification. It also specifies the semantics of the utterance using <tag> elements.

<?xml version="1.0" encoding="utf-8"?>
<grammar xml:lang="en-US" version="1.0" root="root"
    xmlns="http://www.w3.org/2001/06/grammar" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2001/06/grammar http://www.iristk.net/xml/srgs.xsd" tag-format="semantics/1.0">

  <rule id="root" scope="public">
      <one-of>
          <item>one <tag>out.number=1</tag></item>
          <item>two <tag>out.number=2</tag></item>
          <item>three <tag>out.number=3</tag></item>
          <item>four <tag>out.number=4</tag></item>
          <item>five <tag>out.number=5</tag></item>
          <item>six <tag>out.number=6</tag></item>
          <item>seven <tag>out.number=7</tag></item>
          <item>eight <tag>out.number=8</tag></item>
          <item>nine <tag>out.number=9</tag></item>
          <item>ten <tag>out.number=10</tag></item>
          <item>yes <tag>out.yes=1</tag></item>
          <item>no <tag>out.no=1</tag></item>
      </one-of>
  </rule>
  
</grammar>

It is also possible to use the ABNF format for grammars. In that case, the grammar should be loaded like this:

recognizer.loadGrammar("default", new ABNFGrammar(getClass().getResource("grammar.abnf").toURI()));

The corresponding ABNF grammar would look like this:

#ABNF 1.0 ISO-8859-1;

language en-US;
root $root;

public $root = 
((one   {out.number=1})  | 
 (two   {out.number=2})  | 
 (three {out.number=3})  | 
 (four  {out.number=4})  | 
 (five  {out.number=5})  | 
 (six   {out.number=6})  | 
 (seven {out.number=7})  | 
 (eight {out.number=8})  | 
 (nine  {out.number=9})  | 
 (ten   {out.number=10}) | 
 (yes   {out.yes=1})     | 
 (no    {out.no=1}));
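
Both grammars above describe the same mapping from a spoken word to semantic fields, which is what the flow later tests with conditions such as event?:sem:number. As a rough mental model, that mapping can be written as a small lookup in plain Java. This sketch assumes single-word utterances and does not use the recognizer or any IrisTK class; GrammarSketch and interpret are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java mental model of the grammar's semantics; NOT how the recognizer works.
class GrammarSketch {

    // The position in this list, plus one, is the value assigned to out.number.
    private static final List<String> NUMBER_WORDS = Arrays.asList(
            "one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine", "ten");

    // Returns the semantic fields for an utterance, or an empty map if it is not covered.
    static Map<String, Integer> interpret(String utterance) {
        String word = utterance.trim().toLowerCase(Locale.ROOT);
        int index = NUMBER_WORDS.indexOf(word);
        if (index >= 0) {
            return Collections.singletonMap("number", index + 1);
        }
        if (word.equals("yes")) {
            return Collections.singletonMap("yes", 1);
        }
        if (word.equals("no")) {
            return Collections.singletonMap("no", 1);
        }
        return Collections.emptyMap();
    }

    public static void main(String[] args) {
        System.out.println(interpret("seven"));
        System.out.println(interpret("yes"));
        System.out.println(interpret("hello"));
    }
}
```

An utterance outside the grammar yields no semantic fields, which corresponds to the case where the generic handler in the Dialog state takes over.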


Copyright © Gabriel Skantze, 2013-