Hacker News
NLP analysis of Pride and Prejudice (github.com/cytora)
78 points by hribo on Dec 14, 2016 | 13 comments


Hmm, why isn't the cell output included?

Additionally, Jupyter Notebooks on GitHub do not render; instead, you should visit the site of the original post: http://www.cytora.com/insights/2016/11/30/natural-language-p...


They do if you don't strip the results before publishing, which seems to be what happened here.


From reading the commands, the cells were never even executed.


Cell output is now included in the repository. Thanks for the feedback.


The original post on Cytora seems quite a bit better to me than the GitHub link, since it gives more context. Thanks!


Check out textacy, a higher-level text library built on top of spaCy, for better keyword extraction:

https://github.com/chartbeat-labs/textacy

https://textacy.readthedocs.io/en/latest/api_reference.html#...


spaCy is a great library, and tutorials like this give a clear and simple path for testing it out. They even included a function for reading a file, rather than assuming the audience are all Python programmers.

(Not that reading a file is hard, but it's an extra few minutes as you google how to read a file in X language.)
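For anyone who wants to skip that search, here is a minimal sketch of a file-reading helper in Python. The function name `read_file` and the UTF-8 default are assumptions for illustration; this is not the tutorial's own helper.

```python
from pathlib import Path


def read_file(path: str, encoding: str = "utf-8") -> str:
    """Return the full contents of a text file as a single string."""
    # Path.read_text opens the file, reads it, and closes it again.
    return Path(path).read_text(encoding=encoding)


# Usage (the file name is hypothetical):
# text = read_file("pride_and_prejudice.txt")
```

Reading the whole file into one string is fine for a single novel; for much larger corpora you would probably want to stream line by line instead.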


spaCy is awesome. The API is clean and powerful. I have been using it heavily over the past few months, and now use it almost exclusively for feature extraction. I'm currently working on extracting subject-verb-object tuples, which is amazingly easy to do. I am finding these to be much more powerful than unigrams or n-grams for classification.

Named entity extraction in spaCy is another killer feature. It's been instrumental in some fraud detection work I've been doing as well.


> working on extracting subject-verb-object tuples, which is amazingly easy to do

Can you share any of your work or give examples?


This is not my work, but it will show you how it's done: https://nicschrading.com/project/Intro-to-NLP-with-spaCy/
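To give a rough idea of the technique without needing a model installed, here is a small sketch that works on a hand-written dependency parse. It is not the code from the linked post. The `Token` class, the `extract_svo` function, and the example sentence are all hypothetical, and the labels `nsubj` and `dobj` are assumed to match what the parser emits. spaCy exposes comparable information on its tokens (`dep_`, `head`, `pos_`), so the same loop should carry over with small changes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    """A stand-in for a parsed token (hypothetical, not spaCy's class)."""
    text: str  # surface form
    dep: str   # dependency label, e.g. "nsubj", "dobj", "ROOT"
    head: int  # index of the syntactic head within the sentence
    pos: str   # coarse part-of-speech tag, e.g. "VERB"


def extract_svo(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Collect (subject, verb, object) triples from one parsed sentence."""
    triples = []
    for i, tok in enumerate(tokens):
        if tok.pos != "VERB":
            continue
        # Direct children of this verb that act as subject or direct object.
        subjects = [t.text for t in tokens if t.head == i and t.dep == "nsubj"]
        objects = [t.text for t in tokens if t.head == i and t.dep == "dobj"]
        triples.extend((s, tok.text, o) for s in subjects for o in objects)
    return triples


# Hand-written parse of "Elizabeth refused Darcy"; a real parser would produce this.
sentence = [
    Token("Elizabeth", "nsubj", 1, "PROPN"),
    Token("refused", "ROOT", 1, "VERB"),
    Token("Darcy", "dobj", 1, "PROPN"),
]

print(extract_svo(sentence))  # [('Elizabeth', 'refused', 'Darcy')]
```

This only handles the simplest active-voice case. Passives, conjunctions, and prepositional objects need extra rules.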


They forgot to run the notebook before uploading the file to GitHub, so there are no results. What a shame.

Anyway, this is the second part of a three-part tutorial. Full setup instructions and the two other parts are available here: https://github.com/cytora/pycon-nlp-in-10-lines


Output is not included in the notebook. Hope you still like the tutorial.


* Output is NOW included in the notebook



