Video Demo on YouTube
"JARVIS Web" is a web voice assistant app that resembles the fictional character "JARVIS", a virtual AI assistant from the comic/movie Iron Man. The app is built with DialogFlow
, Node.js
, Firebase Functions
, actions-on-google
, Socket.io
and Pixabay API
. Part of the project is enabled by a previous project. For more information please find the blog for more information.
Welcoming React to waking Introduce what he is Tell the user what he can do Image search Ask the user to pick and play the corresponding sound effects from the movie Respond to compliments Schedule an event Read out events Ask about a day Ask about how many days until the day React to unavailable request Goodbye
The task for this final project is to essentially connect the dots, giving the Dialogflow
a front end, meanwhile replacing SSML
with native JavaScript. In addition to just text, having a frontend means I will be able to provide image contents, that is why I am also including an image search function.
Code below is a direct reference from Nicole He.
const sessionClient = new dialogflow.SessionsClient({
keyFilename: process.env.KEYFILENAME
});
const projectId = "jarvis-ai-oxwx";
app.use(express.static("public"));
const PORT = process.env.PORT || 8080;
const server = app.listen(PORT, () => {
console.log(`listening on port ${PORT}`);
});
const io = socketIO(server);
io.on("connection", (socket) => {
console.log("new user: ", socket.id);
// receive what the person said from the browser
socket.on("send to dialogflow", (data) => {
console.log(data);
sessionClient
.detectIntent({
session: sessionClient.projectAgentSessionPath(projectId, "1"),
queryInput: {
text: {
text: data.query,
languageCode: "en-US"
}
},
})
.then((response) => {
const result = response[0].queryResult;
console.log(result);
let params = result.parameters.fields;
let text = result.fulfillmentText;
let intent = result.intent.displayName;
socket.emit("stuff from df", {
params,
text,
intent
});
});
});
});
A holly grail hamburger layout is incorporated into the frontend design - a top bar, the main content window, and the footer. While the conversation goes on, the text bubble will be populated and appended to the window. Using jQuery
, a very simple fade-in effect is implemented.
// speech synthesis
const speak = (text) => {
// create British male voice
let utterThis = new SpeechSynthesisUtterance(text);
var voices = synth.getVoices();
var selectedOption = voices[50];
utterThis.voice = selectedOption;
// create dialog bubble
let itemDiv = document.createElement("div");
itemDiv.className = "item";
let computerSpeechDiv = document.createElement("div");
computerSpeechDiv.className = "computer-speech-div";
computerSpeechDiv.textContent = text;
itemDiv.appendChild(computerSpeechDiv);
// show dialog bubble, and scroll down
$(itemDiv).hide().appendTo(".conversation").fadeIn(500, () => {
conversation.scrollTop = conversation.scrollHeight;
});
synth.speak(utterThis);
// toggle mic button status
mic.id = "mic";
};
and the below line of CSS is very handy for getting a hamburger layout in a sec
grid-template-rows: auto 1fr auto;
In order to play a sequence of audio files, a recursion method is possibly the most neat way to do so.
const playNextSounds = (sounds) => {
if (sounds.length > 0) {
const audio = new Audio();
console.log(sounds);
audio.src = sounds[0];
audio.currentTime = 0;
audio.play();
sounds.shift();
audio.addEventListener('ended', function () {
return playNextSounds(sounds);
})
}
}
// front end js code
const getImage = (phrase) => {
let url = `/image/${phrase}`;
let result = fetch(url, {
mode: "no-cors"
})
.then((response) => response.text())
.then((result) => {
console.log(result);
let imgUrl = result;
let computerSpeechDiv = document.querySelector(".conversation").lastChild.lastChild;
let br = document.createElement("hr");
let img = document.createElement("img");
img.src = imgUrl;
// computerSpeechDiv.appendChild(img);
$(br).appendTo(computerSpeechDiv);
$(img).hide().appendTo(computerSpeechDiv).fadeIn(500, () => {
conversation.scrollTop = conversation.scrollHeight;
});
});
}
// backend nodejs code
const fetch = require("node-fetch");
app.get("/image/:phrase", async (req, res) => {
let phrase = req.params.phrase;
phrase = phrase.replace(/\s+/g, '+').toLowerCase();
console.log(phrase);
let url = `https://pixabay.com/api/?key=${process.env.PIXABAYKEY}&q=${phrase}&image_type=photo&pretty=true`;
fetch(url, {
mode: "cors"
})
.then((response) => response.json())
.then((result) => {
console.log(result);
let imgUrl = result.hits[randomNumber(0,3)].webformatURL;
res.send(imgUrl);
});
});
Please, never ever upload your authentication files or API key to a public environment, like GitHub. What I experienced at the very last minute was my Google Cloud Project
got suspended, as possibly some hacker exploited my service-account.json
file and used it for bitcoin mining. Therefore Google shutdown my project and I have no access to the original project anymore. I had to copy all the DialogFlow
intents line by line manually.
So the rule of thumb is, start the .gitignore
file right away whenever you creat authentication files in your working folder. If it is an API key, be very careful to delete that in the code and replace that with your corresponding process.env.Something
variable name.
As the authentication file is no longer available to the public on GitHub, the only way to have your project hosted online will be pushing your project straight to Heroku
with the terminal CLI tool. Which might be less convenient then having GitHub
to auto deploy changes to Heroku
, but this is the way, to have the authentication file living on your host without security issues...
Like how Nicole suggested, I can easily turn this AI to be something more interesting by incorporating a more "Iron Man" like context, by giving some interesting functions that might appear in the movie, instead of something that the Google Assistant can do. I can easily come up with more examples for such case.
- ask about the status of the iron man armor
- ask JARVIS to perform some very complicated scientific task
- call Pepper Potts
- be even more creative
Definitely I was so bounded by the idea of a practical voice assistant that works for us in real life.