Anki Chinese Sentence Suggester
Enhancing Chinese vocabulary learning through contextual sentence review
The "Sentence Suggester" Project
Background
Pure spaced repetition vocabulary review for language acquisition (using, e.g., Anki) is boring! And even if you have the discipline to sit through hundreds of card reviews each day, you're leaving learning outcomes on the table by not reviewing the vocabulary within a natural language context. Yet, a scheduled and tracked review system gives back the motivation of visualized progress towards goals and the comfort of knowing you're "on track" and not wasting time. Other methods of language acquisition, such as graded readers or pre-recorded and subtitled audio/video, give the most natural learning experience, but they also frustrate with their non-optimal and impersonal approach to content. What if these approaches could be combined? They can. (Obviously.)
I began pondering that question in 2020 after years spent studying thousands of Anki flashcards for Chinese, totalling hundreds of hours (no joke!) doing recall word-by-word. At the time, just on the cusp of GPT-3's debut, I hatched a plan to brute force through scraped Chinese web content to surface natural Chinese sentences that contained a high density of due-to-be-reviewed flashcards: a review of a single sentence would count as a review of all of its constituent words. I started tinkering with a possible Anki extension to allow this. By the end of 2020, I was working on a standalone R Shiny application (the only web framework I knew) with the OpenAI API to create novel sentences on the fly. [Life happens.] At the end of 2024 I picked up the project again. Not only had LLMs exploded in capabilities, but I could now plausibly write software applications for myself, aided by coding assistants like Aider and Claude. And now at the beginning of 2025, my dream seems within reach...
The highlight of this project is a standalone Anki "extension"
The Anki Chinese Sentence Suggester is an Anki "extension" that suggests natural language sentences to review based on your existing Anki word deck. It is an extension in the sense that it expects to find an Anki installation and collection on your local machine; however, it is in every other way a separate application. The app assumes that the user is an English speaker trying to learn Chinese, reviewing Chinese vocabulary flashcards on a daily basis. The normal flashcard review process within Anki is to be shown a Chinese word written in simplified Chinese characters, with the challenge to read the word aloud in Chinese and recall its English equivalents. The app promises to significantly improve the process of reading and vocabulary acquisition. Because the app talks with Anki's database, it is able to credit you with the words that you review in the app, so you won't need to review them in Anki later on. This lets you spend more time reading natural language and less time reviewing atomic vocabulary words, while still keeping track of those words in need of review.
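To make "talks with Anki's database" a little more concrete, here is a minimal read-only sketch of listing the words that are due. It is not the app's actual code. It assumes the collection is a SQLite file with `cards` and `notes` tables, that review cards have `queue = 2` with `due` stored as a day number counted from the collection's creation, and that the Chinese word sits in the first field of each note. Check all of this against your own Anki version, and only ever experiment on a copy of your collection. The function name `due_review_words` and the demo data are hypothetical.

```python
import sqlite3

FIELD_SEPARATOR = "\x1f"  # assumed: Anki joins a note's fields with this character


def due_review_words(conn, today):
    """Return (card_id, word) pairs for review cards due on or before `today`.

    `today` is a day number on the same scale as the cards' `due` column.
    Assumes the word to review is the first field of the note.
    """
    rows = conn.execute(
        """
        SELECT c.id, n.flds
        FROM cards AS c
        JOIN notes AS n ON n.id = c.nid
        WHERE c.queue = 2 AND c.due <= ?
        ORDER BY c.due
        """,
        (today,),
    ).fetchall()
    return [(card_id, flds.split(FIELD_SEPARATOR)[0]) for card_id, flds in rows]


# Demo against an in-memory stand-in that has only the columns used above.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE notes (id INTEGER PRIMARY KEY, flds TEXT);
    CREATE TABLE cards (id INTEGER PRIMARY KEY, nid INTEGER, queue INTEGER, due INTEGER);
    """
)
demo.executemany(
    "INSERT INTO notes VALUES (?, ?)",
    [(1, "天气\x1fweather"), (2, "公园\x1fpark"), (3, "学校\x1fschool")],
)
demo.executemany(
    "INSERT INTO cards VALUES (?, ?, ?, ?)",
    [(10, 1, 2, 100), (11, 2, 2, 105), (12, 3, 0, 3)],
)
due_cards = due_review_words(demo, today=100)
print(due_cards)
```

The sketch only reads. Crediting a review means writing scheduling data back, which is best left to Anki's own methods rather than hand-written SQL.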
The key problem that the app addresses: spaced repetition vocabulary review (e.g. with Anki) of a total vocabulary numbering 1,000+ cards is laborious, easily entailing 200-400+ reviews per day. The solution: prepare a custom reading sample each day that uses most or all of the words that are due for review. The benefits here are manifold:
- The reviewer gets to see their needed vocabulary in a natural language context.
- The reviewer gets to absorb Chinese culture, since the reading sample can be geared towards any relevant topic, including Chinese cultural topics.
- The reviewer likely spends less time in review, and yet still sees all of their cards.
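One way to picture "a reading sample that uses most or all of the due words" is as a covering problem: keep picking the sentence that contains the most due words not yet covered. The sketch below is only an illustration under that assumption, not a description of how the app selects text. It expects sentences that have already been split into words, and the function name `pick_sentences` and the tiny corpus are made up for the example.

```python
def pick_sentences(sentences, due_words):
    """Greedily pick sentences until no sentence covers a new due word.

    sentences: list of token lists (already segmented into words).
    due_words: set of words due for review.
    Returns (indices of chosen sentences, due words left uncovered).
    """
    remaining = set(due_words)
    chosen = []
    while remaining:
        # Find the sentence containing the most still-uncovered due words.
        best_index, best_hits = None, set()
        for i, tokens in enumerate(sentences):
            hits = remaining.intersection(tokens)
            if len(hits) > len(best_hits):
                best_index, best_hits = i, hits
        if best_index is None:
            break  # nothing in the corpus covers any remaining word
        chosen.append(best_index)
        remaining -= best_hits
    return chosen, remaining


corpus = [
    ["我", "喜欢", "喝", "茶"],
    ["他", "今天", "去", "学校"],
    ["我", "今天", "喜欢", "去", "学校", "喝", "茶"],
]
due = {"喜欢", "茶", "今天", "学校", "电脑"}
chosen, uncovered = pick_sentences(corpus, due)
print(chosen, uncovered)
```

Here one sentence stands in for four separate card reviews, and the word no sentence contains ("电脑") is reported back so it can be reviewed the usual way or handed to a text generator.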
This project is designed to provide a more context-rich and efficient approach to language learning, particularly for those managing large Anki decks. Updates will follow as the tool evolves.
Key features of this system are:
- Interactive Review Interface:
  - Web-based platform for studying Chinese sentences
- Multiple Corpus Sources:
  - AI-generated custom sentences using OpenAI
  - Existing corpus filtering and optimization
  - Manual text input with sentence separation
- Translation and Dictionary Tools:
  - Real-time translation via Google Cloud
  - MDBG dictionary integration for word lookups
- Anki Integration:
  - Direct database connection
  - Review progress tracking
  - Tag and deck-based filtering
- Progress Tracking:
  - Comprehensive statistics and visualization
  - Session progress monitoring
  - Review history insights
The key embedded technologies are:
- Text segmentation (currently using the Jieba Python package), to identify words within a Chinese text.
- Text generation
  - Using LLMs (OpenAI/ChatGPT, Anthropic/Claude, Meta/Llama, etc.) to build totally custom reading passages from a word list
  - Using a "corpus sorter" (custom built) to sort and recommend pre-stored text passages.
  - Allowing user-input of a reading passage
- Text translation (using cloud services, e.g. Google Cloud here) to assist the reader in semantically grokking the reading passage
- Word translation (using HTTP requests to MDBG) to help the reader understand individual words in great depth
- Spaced repetition (using Anki's existing database and methods) to identify words for review, and to mark them as reviewed
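Segmentation matters because Chinese is written without spaces, and naive substring matching can find "words" that straddle real word boundaries (for example 天天 inside 今天天气). As a rough illustration of the segment-then-match step, the sketch below uses forward maximum matching against a known vocabulary. This is a deliberately simple stand-in and is not Jieba, which is considerably more sophisticated. The names `segment` and `words_to_credit`, the `max_len` parameter, and the sample vocabulary are all hypothetical.

```python
def segment(text, vocabulary, max_len=4):
    """Forward maximum matching: at each position take the longest known word."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing longer is known.
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


def words_to_credit(text, due_words, vocabulary):
    """Return the due words that appear as whole tokens in the text."""
    return set(segment(text, vocabulary)) & set(due_words)


deck_vocabulary = {"今天", "天气", "很", "好", "我们", "公园", "去", "天天"}
due_today = {"天气", "公园", "天天", "图书馆"}
passage = "今天天气很好我们去公园"

tokens = segment(passage, deck_vocabulary)
credited = words_to_credit(passage, due_today, deck_vocabulary)
print(tokens)
print(credited)
```

Even though 天天 is in the vocabulary and occurs as a substring of the passage, it is not credited, because the segmenter reads the passage as 今天 followed by 天气. Only the words that genuinely appear would be marked as reviewed.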
Resources and Bibliography
Tech to build from
Other good Chinese-learning apps and approaches out there
- Anki, of course
- Hack Chinese
  - A great "Anki Reskin", but without the natural language functionality I am interested in.
- LingQ
  - Very close to what I want, but bloated and slow, with features locked behind an obtrusive paywall.