This article was originally published @ https://davide.dev:
Experiments with HoloLens, Bot Framework and LUIS: adding text to speech
Experiments with HoloLens, Bot Framework, LUIS and Speech Recognition
Recently, I had the opportunity to use a HoloLens device for some personal training and building some simple demos.
One of the scenarios that I find very intriguing is the possibility of integrating Mixed Reality and Artificial Intelligence (AI) in order to create immersive experiences for the user.
I decided to perform an experiment by integrating a Bot, Language Understanding Intelligent Services (LUIS), Speech Recognition and Mixed Reality via a Holographic 2D app.
The idea was to create a sort of "digital assistant"� of myself that can be contacted using Mixed Reality: the first implementation contains only basic interactions (answering questions like "What are your favourite technologies"� or "What's your name") but these could be easily be expanded in the future with features like time management (via the Graph APIs) or tracking projects status, etc.
To start, I created a new LUIS application in the portal with a list of intents that needed to be handled:
In the future, this could be further extended with extra capabilities.After defining the intents and utterances, I trained and published my LUIS app to Azure and copied the key and URL for usage in my Bot:
I proceeded with the creation of the Bot using Microsoft Bot framework downloading the Visual Studio template and creating a new project:
The Bot template already defined a dialog named RootDialog so I extended the generated project with the classes required for parsing the JSON from the LUIS endpoint:
public class LUISDavideBot { public static async Task<DavideBotLUIS> ParseUserInput(string strInput) { string strRet = string.Empty; string strEscaped = Uri.EscapeDataString(strInput); using (var client = new HttpClient()) { string uri = "<LUIS url here>" + strEscaped; HttpResponseMessage msg = await client.GetAsync(uri); if (msg.IsSuccessStatusCode) { var jsonResponse = await msg.Content.ReadAsStringAsync(); var _Data = JsonConvert.DeserializeObject<DavideBotLUIS>(jsonResponse); return _Data; } } return null; } } public class DavideBotLUIS { public string query { get; set; } public Topscoringintent topScoringIntent { get; set; } public Intent[] intents { get; set; } public Entity[] entities { get; set; } public Dialog dialog { get; set; } } public class Topscoringintent { public string intent { get; set; } public float score { get; set; } public Action[] actions { get; set; } } public class Action { public bool triggered { get; set; } public string name { get; set; } public object[] parameters { get; set; } } public class Dialog { public string contextId { get; set; } public string status { get; set; } } public class Intent { public string intent { get; set; } public float score { get; set; } public Action1[] actions { get; set; } } public class Action1 { public bool triggered { get; set; } public string name { get; set; } public object[] parameters { get; set; } } public class Entity { public string entity { get; set; } public string type { get; set; } public int startIndex { get; set; } public int endIndex { get; set; } public float score { get; set; } }
And then processed the various LUIS intents in RootDialog (another option is the usage of the LuisDialog and LuisModel classes as explained here):
[Serializable] public class RootDialog : IDialog<object> { public Task StartAsync(IDialogContext context) { context.Wait(MessageReceivedAsync); return Task.CompletedTask; } private async Task MessageReceivedAsync(IDialogContext context, IAwaitable<object> result) { var activity = await result as Activity; if (activity != null) { await ProcessLuis(activity, context); } context.Wait(MessageReceivedAsync); } private async Task ProcessLuis(Activity activity, IDialogContext context) { if (string.IsNullOrEmpty(activity.Text)) { return; } var errorResult = "Try something else, please."; var stLuis = await LUISDavideBot.ParseUserInput(activity.Text); var intent = stLuis?.topScoringIntent?.intent; if (!string.IsNullOrEmpty(intent)) { switch (intent) { case "FavouriteTechnologiesIntent": await context.PostAsync("My favourite technologies are Azure, Mixed Reality and Xamarin!"); break; case "FavouriteColorIntent": await context.PostAsync("My favourite color is Blue!"); break; case "WhatsYourNameIntent": await context.PostAsync("My name is Davide, of course :)"); break; case "HelloIntent": await context.PostAsync(@"Hi there! Davide here :) This is my personal Bot. Try asking 'What are your favourite technologies?'"); return; case "GoodByeIntent": await context.PostAsync("Thanks for the chat! See you soon :)"); return; case "None": await context.PostAsync(errorResult); break; default: await context.PostAsync(errorResult); break; } } return; } }
Then, I tested the implementation using the Bot Framework Emulator:
And created a new Bot definition in the framework portal.
After that, I published it to Azure with an updated Web.config with the generated Microsoft App ID and password:
Since the final goal was the communication with an UWP HoloLens application, I enabled the Diret Line channel:
Windows 10 UWP apps are executed on the HoloLens device as Holographic 2D apps that can be pinned in the environment.
I created a new project using the default Visual Studio Template: And then added some simple text controls in XAML to receive the input and display the response from the Bot:<Page x:Class="HoloLensBotDemo.MainPage" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" xmlns:d="http://schemas.microsoft.com/expression/blend/2008" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="d">I decided to use the SpeechRecognizer APIs for receiving the input via voice (another option could be the usage of Cognitive Services):<Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}"> <Grid.ColumnDefinitions> <ColumnDefinition Width="10"/> <ColumnDefinition Width="Auto"/> <ColumnDefinition Width="10"/> <ColumnDefinition Width="*"/> <ColumnDefinition Width="10"/> </Grid.ColumnDefinitions> <Grid.RowDefinitions> <RowDefinition Height="50"/> <RowDefinition Height="50"/> <RowDefinition Height="50"/> <RowDefinition Height="Auto"/> </Grid.RowDefinitions> <TextBlock Text="Command received: " Grid.Column="1" VerticalAlignment="Center" /> <TextBox x:Name="TextCommand" Grid.Column="3" VerticalAlignment="Center"/> <Button Content="Start Recognition" Click="Button_Click" Grid.Row="1" Grid.Column="1" VerticalAlignment="Center" /> <TextBlock Text="Status: " Grid.Column="1" VerticalAlignment="Center" Grid.Row="2" /> <TextBlock x:Name="TextStatus" Grid.Column="3" VerticalAlignment="Center" Grid.Row="2"/> <TextBlock Text="Bot response: " Grid.Column="1" VerticalAlignment="Center" Grid.Row="3" /> <TextBlock x:Name="TextOutputBot" Foreground="Red" Grid.Column="3" VerticalAlignment="Center" Width="Auto" Height="Auto" Grid.Row="3" TextWrapping="Wrap" /> </Grid>
</Page>
private async void Button_Click(object sender, Windows.UI.Xaml.RoutedEventArgs e) { TextStatus.Text = "Listening...."; // Create an instance of SpeechRecognizer. var speechRecognizer = new Windows.Media.SpeechRecognition.SpeechRecognizer(); // Compile the dictation grammar by default. await speechRecognizer.CompileConstraintsAsync(); // Start recognition. Windows.Media.SpeechRecognition.SpeechRecognitionResult speechRecognitionResult = await speechRecognizer.RecognizeWithUIAsync(); TextCommand.Text = speechRecognitionResult.Text; await SendToBot(TextCommand.Text); }
The SendToBot() method makes use of the Direct Line APIs which permit communication with the Bot using the channel previously defined:
private async Task SendToBot(string inputText) { if (!string.IsNullOrEmpty(inputText)) { var directLine = new DirectLineClient("<Direct Line Secret Key here>"); var conversation = await directLine.Conversations.StartConversationWithHttpMessagesAsync(); var convId = conversation.Body.ConversationId; var httpMessages = await directLine.Conversations.GetActivitiesWithHttpMessagesAsync(convId); var command = inputText; TextStatus.Text = "Sending text to Bot...."; var postMessage = await directLine.Conversations.PostActivityWithHttpMessagesAsync(convId, new Activity() { Type = "message", From = new ChannelAccount() { Id = "Davide" }, Text = command }); var result = await directLine.Conversations.GetActivitiesAsync(convId); if (result.Activities.Count > 0) { var firstOrDefault = result .Activities .FirstOrDefault(a => a.From != null && a.From.Name != null && a.From.Name.Equals("Davide Personal Bot")); if (firstOrDefault != null) { TextOutputBot.Text = "Bot response: " + firstOrDefault.Text; TextStatus.Text = string.Empty; } } } }
And then I got the app running on HoloLens and interfacing with a Bot using LUIS for language understanding and Speech recognition:
Happy coding!