
With this solution, it looks like Django is keeping the Elasticsearch index in sync with the SQL database. What happens if the Django process crashes after having committed data to the SQL database but just before having updated the Elasticsearch index? How do you reconcile the index with the database after the fact?



Three options, depending on how demanding your users are:

1.) Don't. So what? That entry will never show up in search results, which is probably what would have happened anyway if you used a search engine with poor ranking, and exactly what happens if you don't provide search at all.

2.) Blow away your index periodically and re-create it at off-peak times, or after a crash. Works as long as your data set is small enough to re-read in full during the off-peak window.

3.) When you read your search results back, check them against the source of truth and re-index anything that's inconsistent. Relatively easy if there's a 1:1 correspondence between Elasticsearch documents and RDBMS rows; gets more difficult when a document is built from complex joins.
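
To make option 3 concrete, here is a rough sketch of read-time repair. Everything in it is hypothetical: the `version` field, and the `load_row`, `reindex` and `delete_from_index` callables stand in for whatever your ORM and Elasticsearch client actually provide. It assumes every row carries a version (or updated-at) value that is also stored in the indexed document.

```python
from typing import Callable, Dict, List, Optional

Doc = Dict[str, object]


def read_repair(
    hits: List[Doc],
    load_row: Callable[[object], Optional[Doc]],
    reindex: Callable[[Doc], None],
    delete_from_index: Callable[[object], None],
) -> List[Doc]:
    """Check search hits against the database and repair stale index entries.

    Returns the database rows for the hits that still exist, in hit order.
    """
    fresh: List[Doc] = []
    for hit in hits:
        row = load_row(hit["id"])
        if row is None:
            # The row was deleted in the database but is still indexed.
            delete_from_index(hit["id"])
            continue
        if row["version"] != hit["version"]:
            # The indexed copy is out of date: push the current row again.
            reindex(row)
        # Always serve the database copy, never the indexed one.
        fresh.append(row)
    return fresh
```

Note the limit of this approach: it only repairs documents that come back as hits. A row that was committed but never indexed will not appear in any result, so it still needs option 2 or a separate sync pass.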


From memory, Haystack gives you a couple of manage.py commands for synchronisation (rebuild_index and update_index, if I recall correctly). So you'd run one of those when spinning things back up.
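
Conceptually, a sync pass like that boils down to diffing the two stores. The sketch below is not Haystack's implementation; it uses plain dicts keyed by primary key in place of the database and the index, and the `version` field is an assumption for illustration.

```python
from typing import Dict, List

Doc = Dict[str, object]


def reconcile(db_rows: Dict[int, Doc], index_docs: Dict[int, Doc]) -> Dict[str, List[int]]:
    """Bring index_docs in line with db_rows (the source of truth), in place.

    Returns the ids that were added, refreshed, or removed.
    """
    db_ids = set(db_rows)
    index_ids = set(index_docs)

    # Committed to the database but never indexed (the crash case).
    missing = sorted(db_ids - index_ids)
    # Still indexed although the row no longer exists.
    orphaned = sorted(index_ids - db_ids)
    # Present in both, but the indexed copy is out of date.
    stale = sorted(
        pk for pk in db_ids & index_ids
        if db_rows[pk]["version"] != index_docs[pk]["version"]
    )

    for pk in missing + stale:
        index_docs[pk] = dict(db_rows[pk])
    for pk in orphaned:
        del index_docs[pk]

    return {"added": missing, "refreshed": stale, "removed": orphaned}
```

On a real data set you would page through both sides rather than load every id into memory, but the three cases stay the same.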

All the usual caveats about the inherent complexity of distributed systems apply, but it's still pretty convenient.



