2020-12-28: Nutri-Assist: A Personal Voice-Enabled Assistant to Find Food Nutrition Values

This Fall I took the CS 795/895 Natural Language Processing offered by Dr. Ashok. This was a research-focused and project-based course where we were briefly introduced to the tools and techniques in NLP. We learned about Language Models, POS Tagging, Word Embedding, Dialog Systems, along with other topics. Later we were assigned to build up different systems based on our interests. Dr. Ashok encouraged us to build something that would solve some problems that do not exist in the market. Being fascinated with dialog systems and information extraction techniques, I chose to work on building a personal assistant that would help users to find food nutrition values using natural language voice commands.



Figure 1: A snapshot of the JSON response returned by the FoodData Central API

So what problem does Nutri-Assist really solve?

My target was to build a website for desktop browsers as well as a progressive web app for mobile devices, to maximize user accessibility.

On a PC, we generally open a web browser, go to a search engine, and type (or speak) something like "apple nutrition" to get nutrition information about an apple. On mobile, there are currently two types of systems to choose from: general assistants like Google Assistant, Siri, and Alexa, and dedicated mobile applications. A general assistant like Google Assistant acts just like the search engine in a web browser and does not let users give a casual command like "I had an apple" to get the information. On the other hand, there are a handful of mobile applications where users need to type their choice of food and then pick from an enormous list of search results. Moreover, no existing system (assistant or application) can process more than one food item at a time and show the nutrition values together in an organized fashion.

Hence, my goal was to build a system that can take a casual voice command like "I ate an apple and an orange", process it, and then show the user the nutrition values for both the apple and the orange on the same page in an organized way, instead of throwing out a list of search results or redirecting to a search page. This essentially fills a gap that currently exists in the industry.

System Architecture

Figure 2: Nutri-Assist architecture and workflow diagram


When a user gives a command, it is received by the Automatic Speech Recognizer (ASR), which converts the speech into text. This operation takes a couple of seconds, after which the text is sent to the Interaction Manager.

The Interaction Manager does a couple of things. First, it extracts the food items from the text. Then it sends them, one by one, to the Database Manager. When a food item name reaches the Database Manager, it is checked against the Food Database to find related items and send back a proper response. The Food Database contains a large number of food items with detailed information such as nutrition values and ingredients.
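The extraction step can be sketched roughly as follows. This is a minimal illustration assuming a simple lexicon lookup; the actual system matches against the full Food Database rather than a hard-coded list.

```python
import re

# Tiny illustrative food lexicon; the real system would match against
# the Food Database, not a hard-coded set.
FOOD_LEXICON = {"apple", "orange", "soup", "banana", "rice"}

def extract_food_items(command: str) -> list[str]:
    """Pull known food words out of a transcribed voice command."""
    tokens = re.findall(r"[a-z]+", command.lower())
    seen, items = set(), []
    for tok in tokens:
        # Preserve command order and drop duplicates.
        if tok in FOOD_LEXICON and tok not in seen:
            seen.add(tok)
            items.append(tok)
    return items

print(extract_food_items("I ate an apple and an orange"))  # ['apple', 'orange']
```

Each extracted item is then forwarded to the Database Manager individually.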

After getting a response from the Database Manager, the Interaction Manager decides whether the food item name (extracted from the voice command) is clear enough by ranking the returned items. If it is not, Nutri-Assist asks the user to be more specific about that item via the Dialog Generator. Otherwise, it provides the nutrition information of the highest-ranked match. Currently, the ranking is computed using cosine similarity.
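A minimal sketch of that ranking step, using cosine similarity over bag-of-words vectors (the real system may vectorize differently, e.g. with embeddings):

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def rank_candidates(query: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Rank database hits by similarity to the extracted food name."""
    return sorted(((c, cosine_similarity(query, c)) for c in candidates),
                  key=lambda pair: pair[1], reverse=True)

hits = rank_candidates("tomato soup",
                       ["tomato soup", "chicken noodle soup", "apple pie"])
```

Here "tomato soup" scores 1.0 against itself, lower against the partial match, and 0.0 against "apple pie", so the exact match comes out on top.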

Case Studies

I have divided user commands into several case scenarios:

  1. The user asks for only one item, which matches directly with the food database. E.g. "I ate an apple."
  2. The user asks for only one item that matches directly with the food database, but there are many options for the same match, which makes the command vague. E.g. "I had soup." There can be hundreds of types of soup, so which one is requested? In this case the system asks the user to clarify the choice.
  3. The user asks for multiple food items, each of which matches directly with an item in the database. E.g. "I ate an apple and an orange."
  4. The user asks for multiple food items, but one of them (or each of them) has many matches in the database, which again makes the command vague. E.g. "I ate an apple and I had soup for lunch."

Also, there can be two types of user scenarios:

  1. If the user uses the system for the first time
  2. If it’s a returning user, who may give the same or a different command

For first-time users, the flow is straightforward: the system just shows the results. For returning users, there may be a history of choices, which the system checks before showing the result. For example, if a returning user chose tomato soup when asked to be more specific after saying "soup", then the next time they say "soup" the system will show the result for tomato soup without asking again.
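That history lookup could be as simple as a per-user mapping from vague names to past disambiguation choices. A minimal sketch (the class and method names are mine, for illustration only):

```python
class UserHistory:
    """Remembers a user's past disambiguation choices, e.g. soup -> tomato soup."""

    def __init__(self) -> None:
        self._choices: dict[str, str] = {}

    def remember(self, vague: str, specific: str) -> None:
        """Record which specific item the user picked for a vague name."""
        self._choices[vague] = specific

    def resolve(self, item: str) -> str:
        """Return the remembered choice, or the item unchanged if none exists."""
        return self._choices.get(item, item)

history = UserHistory()
history.remember("soup", "tomato soup")
print(history.resolve("soup"))   # tomato soup
print(history.resolve("apple"))  # apple (no history, unchanged)
```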

Demo

Before the final presentation, I made a small demo to show how Nutri-Assist works. For the demonstration, I used apple (which gives a direct result) and soup (which invokes the Dialog Generator to get a more specific choice).


The response is in JSON format, so I will just have to extract the values and display them nicely in the UI (which is yet to be done). Storing user profiles to provide a personalized experience is also yet to be implemented.
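Flattening that response for the UI could look something like this. The field names below (`foods`, `description`, `foodNutrients`, `nutrientName`, `value`, `unitName`) are assumed from the public FoodData Central API documentation and should be checked against the actual response in Figure 1:

```python
def summarize(response: dict) -> list[dict]:
    """Flatten a FoodData Central-style search response into UI-ready rows.
    Field names are assumptions based on the public API docs."""
    rows = []
    for food in response.get("foods", []):
        rows.append({
            "name": food.get("description", "unknown"),
            "nutrients": {n["nutrientName"]: f'{n["value"]} {n["unitName"]}'
                          for n in food.get("foodNutrients", [])},
        })
    return rows

# Toy response mimicking the assumed shape.
sample = {"foods": [{"description": "Apple, raw",
                     "foodNutrients": [{"nutrientName": "Energy",
                                        "value": 52,
                                        "unitName": "KCAL"}]}]}
print(summarize(sample))
```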

Github: https://github.com/rayansami/nutrition-assistant

 

--Sami 
