Making Sense of Senses


Restructuring the Dictionary - Sumo Card Devblog 4

Changes to the Sumo Card Dictionary

I wanted to make a few changes to the dictionary API that will make it more accurate and provide a better user experience for everyone. Initially, the English meanings for each word were simply thrown together, without regard for which meaning or part of speech they belonged to. We decided to rewrite the dictionary subsystem to keep track of each “sense”, which will improve learning and allow us to be better stewards of the data we’re mirroring from our generous dictionary source, jisho.org.

Tracking the senses

A “sense” is a linguistic term for an individual meaning of a word. Previously, we were not storing how the various English meanings of each Japanese word are grouped, which is poor form. We’ve decided to alter our internal storage of words to mirror the structure of the data we receive from Jisho.org’s API. This will not only give us a more accurate database, but also let us provide an external Japanese language API for people to use. While we do not save 100% of the data that Jisho.org provides, we do offer a fair amount of dictionary data that may be useful to others.
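To make the difference concrete, here is a minimal sketch of sense-grouped storage in Python. The class and field names (`Sense`, `Word`, `english_definitions`, `parts_of_speech`, `tags`) are assumptions chosen for illustration, not Sumo Card's actual schema, and the glosses for the sample word are illustrative rather than copied from Jisho.org.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sense:
    """One individual meaning of a word (hypothetical schema)."""
    english_definitions: List[str]
    parts_of_speech: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


@dataclass
class Word:
    """A dictionary entry whose English meanings stay grouped by sense."""
    kanji: str
    reading: str
    senses: List[Sense] = field(default_factory=list)

    def flat_meanings(self) -> List[str]:
        """Mimic the old storage: every gloss in one undifferentiated list."""
        return [d for sense in self.senses for d in sense.english_definitions]


# Illustrative entry: two distinct senses of the same verb.
kakeru = Word(
    kanji="掛ける",
    reading="かける",
    senses=[
        Sense(["to hang up", "to suspend"], ["verb"]),
        Sense(["to make a phone call"], ["verb"]),
    ],
)

# The flat view loses the boundary between the two senses;
# the grouped view keeps it.
print(kakeru.flat_meanings())
print(len(kakeru.senses))
```

Keeping the grouping means the flat list can always be derived when needed, while the reverse is not possible once the sense boundaries have been discarded.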

The upgrade process

To upgrade the dictionary system, we’ve had to make a few design decisions that deviate from jisho.org’s dictionary in order to keep the quiz application easy to use. Here are the changes:

  • Words that have more than 50% of their meanings tagged “Usually written using Kana Only” are now permanently tagged as kana only. Kana-only words will not show their Kanji in the quiz system. We do store the alternative Kanji readings and plan to add a feature that will let users control how these Kanji are presented in quizzes.

  • Certain words have had their primary Kanji changed to match the primary key in Jisho.org’s data. This should make future imports easier for words that might otherwise not match the Jisho data. You will see these changes in the coming days.
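The kana-only rule from the first change can be sketched as a small Python function. This is only an illustration under stated assumptions: the function names, the representation of a word as a list of per-sense tag lists, and the exact tag string (taken from the wording above) are all hypothetical and may not match the real import code.

```python
from typing import List

# Tag text as quoted in this post; the real data may spell it differently.
KANA_ONLY_TAG = "Usually written using Kana Only"


def is_kana_only(sense_tags: List[List[str]]) -> bool:
    """Return True when strictly more than 50% of the senses carry the tag.

    `sense_tags` holds one list of tags per sense (hypothetical layout).
    """
    if not sense_tags:
        return False
    tagged = sum(1 for tags in sense_tags if KANA_ONLY_TAG in tags)
    # Integer comparison avoids floating-point rounding at exactly 50%.
    return tagged * 2 > len(sense_tags)


def quiz_prompt(kanji: str, reading: str, sense_tags: List[List[str]]) -> str:
    """Pick what the quiz shows: the kana reading for kana-only words."""
    return reading if is_kana_only(sense_tags) else kanji


# Two of three senses tagged: treated as kana only.
mostly_kana = [[KANA_ONLY_TAG], [KANA_ONLY_TAG], []]
# Exactly half tagged: not "more than 50%", so the Kanji is kept.
half_kana = [[KANA_ONLY_TAG], []]

print(is_kana_only(mostly_kana))
print(is_kana_only(half_kana))
```

Note that a word with exactly half of its senses tagged keeps its Kanji under this reading of the rule, since the threshold is "more than 50%".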

Why?

We’re making this change for a few reasons. First, we really appreciate the jisho.org API and want to ensure that we use this valuable resource carefully and economically. We are doing our best to reduce the number of queries that hit their servers and to preserve the data correctly so we can offer our own data to users. Second, properly separating the senses will allow us to add more features, such as example sentences and an improved quiz interface, and provide an overall better experience for our community.