
Can you please share either the frequency list or the method you used to automate the deck generation? I've been looking to do something like this, perhaps with genanki.



I'm learning Italian, and I used this list:

https://www.internazionale.it/opinione/tullio-de-mauro/2016/...

It's based on the most frequent words in a collection of texts, with additional manual curation. Each word is tagged with its grammatical category, but there are no translations.

I basically wrote a Node.js script using two npm libraries: one that parses PDFs and another that drives headless Chrome.
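A minimal sketch of the parsing step, assuming the text extracted from the PDF has one entry per line in the form `word category[, category]` (e.g. `casa s.f.`). That line layout and the name `parseWordList` are assumptions for illustration: the real PDF may be laid out differently, and the call to the PDF library that produces the raw text is left out.

```javascript
// Hypothetical line format: "<word> <category>[, <category>...]", e.g. "bello agg., s.m."
function parseWordList(text) {
  const entries = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and page furniture such as bare page numbers.
    if (line === "" || /^\d+$/.test(line)) continue;
    // First token is the headword, the rest is its grammatical category list.
    const match = line.match(/^(\S+)\s+(.+)$/);
    if (!match) continue; // a line with no category is not a word entry
    const [, word, categoryText] = match;
    const categories = categoryText
      .split(",")
      .map((c) => c.trim())
      .filter(Boolean);
    entries.push({ word, categories });
  }
  return entries;
}
```

Keeping every category for a word (e.g. both `agg.` and `s.m.` for `bello`) is what later lets you look up the matching part of speech in the dictionary.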

I parsed the words from the list, then looked up each one in an online dictionary using headless Chrome. I pulled the translation out of the DOM, making sure I got the right part of speech for each word in the list (sometimes the same word can be both an adjective and a noun, for example).
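The part-of-speech matching could look roughly like this. It assumes the dictionary entries for a word have already been scraped (for example with Puppeteer's `page.$$eval`) into objects with a `pos` label and a `translation`. The dictionary site isn't named, so the label tables and the `pickSense` name are hypothetical. Returning `null` instead of guessing leaves the word for manual fixing.

```javascript
// Hypothetical mapping from the frequency list's abbreviations to a common label.
const LIST_POS = {
  "s.m.": "noun",
  "s.f.": "noun",
  "agg.": "adjective",
  "avv.": "adverb",
  "v.tr.": "verb",
  "v.intr.": "verb",
};

// Hypothetical mapping from the dictionary page's own part-of-speech labels.
const DICT_POS = {
  sostantivo: "noun",
  aggettivo: "adjective",
  avverbio: "adverb",
  verbo: "verb",
};

// Return the first scraped sense whose part of speech matches the list entry,
// or null when the category is unknown or no sense matches.
function pickSense(senses, listCategory) {
  const wanted = LIST_POS[listCategory];
  if (!wanted) return null;
  return senses.find((s) => DICT_POS[s.pos] === wanted) ?? null;
}
```

For `bello`, calling `pickSense(senses, "s.m.")` picks the noun sense and `pickSense(senses, "agg.")` the adjective sense, so each list entry gets its own card.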

The dictionary I used also has example sentences, the grammatical gender of nouns, a definition of the word in Italian, and, very importantly, a phonetic transcription in IPA. I grabbed those as well.

Then I just dumped the whole thing to a CSV file, imported it into Anki, and created a custom card template to show all this information. I also added text-to-speech, but it's wrong a lot of the time, so I don't rely on it.
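A sketch of the CSV step, assuming one object per card with the hypothetical field names below. Fields containing commas, double quotes, or line breaks are quoted per RFC 4180, which matters here because translations and example sentences often contain commas. The column order is an assumption and has to match the fields of the Anki note type you import into.

```javascript
// Quote a field per RFC 4180 when it contains a comma, quote, or line break,
// doubling any embedded double quotes. Missing values become empty fields.
function csvField(value) {
  const s = value == null ? "" : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Hypothetical column order; it must match the fields of the Anki note type.
const COLUMNS = ["word", "translation", "pos", "gender", "ipa", "definition", "example"];

// One CSV line per card, each terminated by a newline.
function toCsv(cards) {
  return cards
    .map((card) => COLUMNS.map((col) => csvField(card[col])).join(",") + "\n")
    .join("");
}
```

Writing the file is then `fs.writeFileSync("italian.csv", toCsv(cards), "utf8")`; in Anki's import dialog you should be able to pick comma as the field separator and map each column to a note field.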

I only use Italian-to-English cards. I make sure I know the pronunciation and, for nouns, the grammatical gender, not just the spelling.

The whole thing took maybe a weekend to write (I don't even use JavaScript regularly). The learning was the hard part: I've spent at least an hour a day on it the whole time. That was only possible because I'm working from home.

The cards are perfect in most cases, but some (I'd say about 5%) need manual fixes.



