Chatsim

@ UBC Emerging Media Lab

 

Chatsim is a language learning application that harnesses the immersive qualities of VR to improve how quickly and how well students pick up a new language.


client

Director of the UBC English Language Institute  
Department of Central, Eastern and Northern European Studies
 

contribution

UX Design
Conversation Design

team

Sandy Co, Julia Chu, Felicia Chan, Dante Cerron


Emerging Media Lab

EML is an experimental space where faculty, students, and staff from all disciplines collaborate with industry and the community. Its mission is to evolve learning by creating tools and techniques using emerging media. For this particular project, we worked alongside the UBC English Language Institute and German Language Faculty.

Our goal with Chatsim was to harness the immersive benefits of VR to enhance the quality and rate of language learning. 

In the longer term, Chatsim would have grown to include full teaching modules for varying levels of linguistic proficiency, as well as tailored tasks and environments.

Problem


Graphical User Interface

I was brought in after the first version of the application had been completed. We wanted to identify the main issues in the current design and understand why students were reluctant to use the application in their language learning studies. 

After compiling and organizing the results and insights from the alpha user test, my team and I identified the two main issues that stumped our users: unfamiliar vocabulary and mistake identification.

For new language learners, the openness of the VR scene was overwhelming given their limited vocabulary; they were unfamiliar not only with the language but also with the technology. We therefore identified features that were needed to guide the user through the experience.

 


Conversation Design 

The original plan proposed by the Principal Investigators (PIs) was to place 2D Canadian geographical features in the background and 3D Canadian food models across the scene to make the interaction uniquely Canadian and immersive. However, as a conversation designer, I believed that the sense of immersion would be determined primarily by the dialogue design between the user and the scene's NPC. I proposed these changes to the PIs and the design and technical leads, and was chosen to spearhead the [Immersion] functionality of this sprint.

As the conversation designer for this project, I also began to integrate voice UX design principles and explore the changes that could be made to the existing dialogue interactions. In doing so, I identified a few issues with the current dialogue design.

The user is only notified of a mistake when the NPC says: “I don’t understand.” Without any help identifying the mistake, it is hard for the student to self-correct: e.g. “Which word did I pronounce wrong?” Users are thus generally left in the dark about what to do next. It is equally difficult to differentiate between a pronunciation error and a semantic one: e.g. “Did I pronounce the word wrong, or did I use the wrong word altogether?” There is no visual help or guidance when it comes to language learning.

Challenges

Graphical User Interface

The current VR scene is set in a convenience store, where the user must interact with the store clerk to order a bottle of water or a cup of coffee. The biggest challenge in this sprint was to design a visual guide that helps the user navigate both the program and their language learning, while following the user without taking up too much visual real estate. Because we are working in VR, it is important to keep the interface simple and straightforward without overwhelming the user.


Conversation Design

Feature 1: Fine-tuning Error Handling


There was previously only one default fallback for every case where our chatbot could not understand the user. This gave rise to unnatural interactions between the user and the chatbot and discouraged our users from trying again. In other words, whether the chatbot had difficulty identifying the words spoken by the user or the user made utterances outside the chatbot's library, the same fallback was used, and the user would not know how to move the interaction forward. For example, when a user asks "Can I get a cup of coffee?" and mispronounces the word coffee in such a way that the chatbot cannot recognize it, the NPC simply responds with "I don't understand". With that blanket response, the user does not know which word they mispronounced, what was wrong with the sentence, or how to fix the issue moving forward. 

With this in mind, I worked with the tech team to find a way to fine-tune our error handling so that we could not only encourage our target users to self-correct and provide a more natural, immersive experience, but also stay frugal with the resources we had. 

 

Feature 2: Canadian Flairs


Per our principal investigators' hypothesis and goals, the scene was to include Canadian flairs for “immersiveness”. The PIs proposed 3D models of uniquely Canadian goods, such as ketchup and all-dressed chips, to let international students studying at our Canadian university feel immersed in Canadian culture. However, these minute details may not necessarily provide the experience the PIs were aiming for. For one, new international students may not recognize that these goods are uniquely Canadian, and may not notice the details at all, since they will be focused primarily on the task at hand and on the dialogue interactions with the chatbot. 


Solutions

Graphical User Interface

AR Within VR - “Learning Mode”

We wanted to solve the two main issues that stumped our users: unfamiliar vocabulary and mistake identification. By creating a “Learning Mode”, we hoped to alleviate our users' frustrations while staying true to our researchers' and language instructors' pedagogical goals. This “Learning Mode” is presented as an AR function within the VR experience, activated when students require assistance.


After discussing with our principal investigators, we began to map out the interactions in the scene where this functionality would take place. We mulled over which items users would need in “Learning Mode” based on the results from the first user test and the professors' language learning goals. To make it clear when “Learning Mode” is active, the scene is tinted whenever the AR-within-VR function is turned on.

Items in the AR within VR feature:

  • Speech to text showing conversation between user and NPC
  • Mistake identification (Different colours for different types of mistakes)
  • Legend for the representative colours
  • Vocabulary labeling

Based on line-of-sight principles for VR, we decided to place the speech-to-text panel right at the border of the green and yellow viewing zones, so the speech bubble would not get in the way of the interaction but would stay in view enough that the user would not have to strain to read it.
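
To make the colour-coded mistake identification concrete, below is a minimal sketch of how the Learning Mode overlay could tag each recognized word. The mistake categories, colours, confidence threshold, and the annotate_utterance helper are illustrative assumptions written in Python; the actual Chatsim feature runs in Unity.

```python
# Illustrative sketch only: mistake categories, colours, and threshold are
# assumptions for this write-up, not the shipped Chatsim implementation.

MISTAKE_COLOURS = {
    "pronunciation": "orange",  # word recognized, but with low confidence
    "vocabulary": "red",        # word not part of the expected vocabulary
    "ok": "white",              # no issue detected
}

def annotate_utterance(words, confidences, expected_vocab, threshold=0.6):
    """Return (word, colour) pairs for the Learning Mode speech-to-text panel."""
    annotated = []
    for word, confidence in zip(words, confidences):
        if confidence < threshold:
            colour = MISTAKE_COLOURS["pronunciation"]
        elif word.lower() not in expected_vocab:
            colour = MISTAKE_COLOURS["vocabulary"]
        else:
            colour = MISTAKE_COLOURS["ok"]
        annotated.append((word, colour))
    return annotated

# Example: "coop" comes back from the recognizer with low confidence,
# so it is highlighted as a pronunciation issue in the overlay.
print(annotate_utterance(
    ["Can", "I", "have", "a", "coop", "of", "coffee"],
    [0.95, 0.97, 0.90, 0.92, 0.41, 0.90, 0.88],
    expected_vocab={"can", "i", "have", "a", "cup", "of", "coffee", "water"},
))
```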


Conversation Design

Using Per-Word Confidence Levels

Leveraging our existing speech recognition setup in Unity, I fine-tuned the branching so that a sentence with one below-threshold lexical item warrants a response from one pool, whereas a sentence with two or more below-threshold lexical items warrants a response from a different pool. With this change, dialogue between the chatbot and the user flowed more seamlessly.

 

User: “Can I have a cup of cooffee?”

NPC: “Sorry, did you say you wanted a coffee?”

————————————————————————————

User: “Can I haf a coop of cooffee?”

NPC: “Sorry, I couldn’t quite catch what you said! Could you repeat your order please?”
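
As a rough illustration of this branching, the sketch below counts how many recognized words fall below the confidence threshold and picks a fallback from the matching pool. It is a simplified Python mock-up rather than the actual Unity code; the threshold value, the pool contents, and the recognizer output format are assumptions.

```python
# Minimal sketch of the branching logic, not the real Unity/C# implementation.
import random

CONFIDENCE_THRESHOLD = 0.6

# One low-confidence word: ask a targeted clarification question.
SINGLE_MISS_POOL = [
    "Sorry, did you say you wanted a {guess}?",
    "A {guess}, was it?",
]

# Two or more low-confidence words: fall back to a generic repeat request.
MULTI_MISS_POOL = [
    "Sorry, I couldn't quite catch what you said! Could you repeat your order please?",
    "Hmm, I didn't get that. Could you say your order again?",
]

def choose_fallback(recognized):
    """recognized: list of (word, confidence, best_guess) tuples from the recognizer."""
    misses = [entry for entry in recognized if entry[1] < CONFIDENCE_THRESHOLD]
    if not misses:
        return None  # no fallback needed; continue the normal dialogue branch
    if len(misses) == 1:
        return random.choice(SINGLE_MISS_POOL).format(guess=misses[0][2])
    return random.choice(MULTI_MISS_POOL)

# "Can I have a cup of cooffee?" -> only "cooffee" is below threshold,
# so the NPC asks a targeted clarification question.
print(choose_fallback([("Can", 0.95, "can"), ("I", 0.97, "i"), ("have", 0.90, "have"),
                       ("a", 0.92, "a"), ("cup", 0.90, "cup"), ("of", 0.90, "of"),
                       ("cooffee", 0.45, "coffee")]))
```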

Dialogue flowchart

Refinement


 

Feedback UX

Given that our project is for language instruction, our VUI feedback differs slightly from convention:

Users are assured that their utterances have been processed by the response the NPC gives; otherwise, the NPC resorts to one of several fallback streams. On top of this, we implemented real-time text as VUI feedback for the AR-within-VR mode, triggered after two rounds of attempted utterances.

This is done to encourage our target users, the students, to keep attempting the task before receiving the answer. The real-time text lives in the AR-within-VR Learning Mode, where it helps students identify the errors they have made.
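
A minimal sketch of that trigger rule follows. The counter class and the two-attempt limit mirror the behaviour described above, but the names and structure are illustrative assumptions, not the Unity implementation.

```python
# Sketch of the trigger rule only; the class name and overlay hook are assumptions.

MAX_ATTEMPTS_BEFORE_HINT = 2

class AttemptTracker:
    """Counts consecutive fallback rounds and decides when to show real-time text."""
    def __init__(self):
        self.failed_rounds = 0

    def record_round(self, fallback_used: bool) -> bool:
        """Return True when the AR-within-VR real-time text should be shown."""
        if fallback_used:
            self.failed_rounds += 1
        else:
            self.failed_rounds = 0  # a successful utterance resets the counter
        return self.failed_rounds >= MAX_ATTEMPTS_BEFORE_HINT

tracker = AttemptTracker()
print(tracker.record_round(True))   # False: first failed attempt, keep trying
print(tracker.record_round(True))   # True: second failed attempt triggers Learning Mode text
```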

Testing

Originally, the lab was to collaborate with the UBC English Language Institute for testing. We would have conducted testing over two English class blocks, with 32 participants, all beginner English learners.

The schedule would have been as follows:

  • Introduce Project
  • Complete consent form
  • Introduce the dialogue practice as conversation designer
  • Testing with VR
  • Complete Qualtrics questionnaire

However, due to COVID-19, the testing (set for March 23, 2020) was canceled.
