Hacker News | spandan-madan's comments

I think people have done something similar with Reddit, actually. They tried to generate Reddit conversations, and, as expected, they were pretty funny. Try a Google search; you might be able to find it.



To everyone: I literally came back home from the lab to find this had blown up. A little overwhelming, frankly.

I built the bot as a gift for a friend, and didn't really see the Black Mirror angle, even though I have seen the series.

Anyway, if anyone is interested in building extensions of this, please write to me at smadan@mit.edu; I'd be happy to collaborate and guide :)


Yup! That version was in Keras. It has now been rewritten in PyTorch as well, thanks to https://github.com/AnshulBasia.


Any update on the NLP tutorial? I keep checking this repo [0], but it seems it hasn't been updated lately. I hope you didn't abandon this project.

[0] - https://github.com/Spandan-Madan/NLP-Intuition-and-Applicati...



For feature requests, please create an issue on the GitHub repo!

For future tutorial suggestions, mail me at smadan@mit.edu. A new one on NLP is coming soon!


Is your code intentionally verbose (for the sake of being explicit)? It seems like it could be condensed a lot by using Pythonic constructs. For example, you could replace block 39 with a one-liner:

    Genre_ID_to_name = dict([(g['id'], g['name']) for g in list_of_genres])

In other places, you would benefit a lot from the enumerate() function, which yields (index, item) tuples when called on a list.
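For readers weighing the two styles, here is a small sketch showing that an explicit loop and the suggested one-liner build the same mapping, plus enumerate() in use. The `list_of_genres` data is made up for illustration; it only assumes each genre is a dict with `'id'` and `'name'` keys, as the one-liner implies.

```python
# Hypothetical sample data: each genre is a dict with 'id' and 'name' keys.
list_of_genres = [
    {'id': 28, 'name': 'Action'},
    {'id': 35, 'name': 'Comedy'},
    {'id': 18, 'name': 'Drama'},
]

# Explicit version: build the mapping one entry at a time.
Genre_ID_to_name_verbose = {}
for g in list_of_genres:
    genre_id = g['id']
    genre_name = g['name']
    Genre_ID_to_name_verbose[genre_id] = genre_name

# Condensed version: dict() over a list of (key, value) pairs.
Genre_ID_to_name = dict([(g['id'], g['name']) for g in list_of_genres])

# Both approaches produce the same dictionary.
assert Genre_ID_to_name == Genre_ID_to_name_verbose

# enumerate() pairs each item with its position, so no manual index bookkeeping is needed.
for index, g in enumerate(list_of_genres):
    print(index, g['name'])  # prints: 0 Action / 1 Comedy / 2 Drama (one per line)
```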


Precisely. I strongly believe that the purpose of tutorials is to be inclusive of all people. That's something I realized as a TA: making things explicit never hurts. There's always someone who can gain from more detail :)


I'm going to throw out a plug for that mindset going beyond tutorials.

It is a suspect proposition that anything is gained by turning 3 lines of code into one line of code, unless it is JavaScript for the Google homepage or some such, where the bytes matter. Moving code from a bad data model to a good one usually correlates with a big reduction in line count, but the gain comes from choosing more appropriate data structures, not from the number of lines removed.

Every reader of code, including the author after 3 months, is going to have to read and understand the code from scratch. One line doing a multidimensional transform of the data will scan easily for only a small fraction of people. That one-liner would take about 3 times as long to understand as any single line of the tutorial code. The data model hasn't changed either. If anything, I'd argue that the nature of the transform being done is clearer in 3 lines.


Readability of code is always relative to the reader. In every language I've mastered, I preferred more verbose options at first, but with experience I came to find condensed versions more elegant and time-saving.

In the end, choose a code style that matches the level of your audience. If you write a one-off Python script that automates some build process in a mostly non-Python codebase, it should probably be very verbose and easy to understand. If, on the other hand, you're writing code in a fairly advanced codebase and most of your colleagues are fluent in the language (or at least are supposed to be), it makes sense to use as much condensed syntactic sugar as possible.


I agree: most of the time, code should be optimized for whoever reads it. But when you say

> It is a suspect proposition that anything is gained by turning 3 lines of code into one line of code.

Some code is way more verbose than its description would be. A named function signals intent:

  function multidimensional_transform(the_data)

And for JavaScript, ES2015 added new syntax (like spread syntax) that improves code the same way.
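Applied to the Python one-liner from earlier in the thread, the named-function idea might look like this minimal sketch. The name `genre_names_by_id` and the sample data are hypothetical, chosen only for illustration.

```python
def genre_names_by_id(genres):
    """Return a dict mapping each genre's 'id' to its 'name'."""
    return {g['id']: g['name'] for g in genres}


# Hypothetical sample data with the same shape as in the thread.
list_of_genres = [{'id': 28, 'name': 'Action'}, {'id': 35, 'name': 'Comedy'}]

# The call site reads as a statement of intent rather than as a transform.
id_to_name = genre_names_by_id(list_of_genres)
print(id_to_name[35])  # Comedy
```

The body is still a one-liner, but a reader of the call site only needs the function name to know what it produces.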


Also, the code for the PyTorch version was contributed by https://github.com/AnshulBasia, but it is basically a port of my original Keras version, which was equally verbose :)


Couldn't agree more. I come from a world where I strive to come up with great self-explanatory naming conventions, so Python code most often looks minified to me, and I have a hard time reading it...


I agree with the sentiment in general, but that example from the OP is actually pretty clear, no? Some might argue it is at least as clear as the code it is intended to replace.


A more Pythonic way to do this would be

    id_to_name = {g['id']: g['name'] for g in list_of_genres}
And

    for i in range(len(list_of_genres)):
is really a dangerous antipattern better replaced with

    for genre in list_of_genres:


And if you need the index:

    for idx, genre in enumerate(list_of_genres):
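A short sketch contrasting the three loop styles discussed above: they visit the same items in the same order, but the last two avoid manual indexing. The genre names are made-up sample data.

```python
list_of_genres = ['Action', 'Comedy', 'Drama']  # hypothetical sample data

# Index-based loop: works, but every access goes through list_of_genres[i].
by_index = []
for i in range(len(list_of_genres)):
    by_index.append(list_of_genres[i])

# Direct iteration: the loop variable is the item itself.
direct = []
for genre in list_of_genres:
    direct.append(genre)

# enumerate(): yields (index, item) pairs when the position is also needed.
numbered = []
for idx, genre in enumerate(list_of_genres):
    numbered.append(f"{idx}: {genre}")

assert by_index == direct
print(numbered)  # ['0: Action', '1: Comedy', '2: Drama']
```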


Out of curiosity, how is that dangerous?


My guess is that it's based on Python 2, where range(n) returned a fully materialized list of 0..(n-1). If the list is large, you could be allocating a giant temporary list just to iterate through it once. Using xrange() was the Python 2 solution (it returned a lazy xrange object, not a generator), and Python 3 fixed this so that range() returns a lazy range object instead of a list.
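A quick way to check this in Python 3: a `range` object stays small no matter how long it is, while materializing it with `list()` allocates every element. Unlike a generator, a `range` also supports `len()`, indexing, and repeated iteration. Exact byte counts vary by interpreter and platform, so this sketch only compares them rather than printing them.

```python
import sys

n = 1_000_000

lazy = range(n)          # Python 3: a lazy range object, not a list
eager = list(range(n))   # explicitly builds a list of n integers

# The range object's size does not grow with n; the list's does.
print(sys.getsizeof(lazy) < sys.getsizeof(eager))  # True

# A range behaves like a read-only sequence, which a generator does not.
print(len(lazy), lazy[10], 999_999 in lazy)  # 1000000 10 True

# It can be iterated more than once; a generator is exhausted after one pass.
assert sum(lazy) == sum(lazy)

gen = (i for i in range(3))
print(list(gen), list(gen))  # [0, 1, 2] []
```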


Just started an ML course this semester. I'm not sure I'll even have time to use this as an additional resource, but it looks super awesome after skimming through it. Definitely going into my favorites, and if I don't use it as an additional resource now, I will read it later. Thanks for making all this work public!


Does the ML course have videos that can be accessed online?


Sure, I would love to get in touch about my work over email! What's your email address, Andrew?


andrew@pair3d.com


Working on them already! The next one is going to be on word embeddings for natural language processing: basically, how we convert words and sentences into numbers so that a computer can work with them. Applications like text classification and sentiment analysis all depend on this one fundamental backbone!
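As a rough illustration of turning words into numbers, here is a minimal sketch: build a vocabulary, map each word to an integer index, look up a vector for each index, and average the vectors into one fixed-size sentence vector (a common simple baseline). The vectors here are random placeholders; real word embeddings such as word2vec or GloVe are learned from large text corpora so that related words end up with similar vectors. The helper names and `EMBEDDING_DIM` are hypothetical.

```python
import random

EMBEDDING_DIM = 4  # hypothetical; real embeddings typically use tens to hundreds of dimensions


def build_vocab(sentences):
    """Assign each distinct lowercase word an integer index, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def random_embeddings(vocab_size, dim, seed=0):
    """Placeholder embedding table: one random vector per word index (not trained)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def encode(sentence, vocab):
    """Turn a sentence into a list of word indices (unknown words are skipped)."""
    return [vocab[w] for w in sentence.lower().split() if w in vocab]


def sentence_vector(indices, table):
    """Average the word vectors to get one fixed-size vector for the sentence."""
    dim = len(table[0])
    if not indices:
        return [0.0] * dim
    return [sum(table[i][d] for i in indices) / len(indices) for d in range(dim)]


corpus = ["the movie was great", "the plot was dull"]
vocab = build_vocab(corpus)
table = random_embeddings(len(vocab), EMBEDDING_DIM)

ids = encode("The movie was dull", vocab)
print(ids)  # [0, 1, 2, 5]
print(sentence_vector(ids, table))
```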


That sounds great! Hope to catch it on here when you post it, thanks again for this tutorial - it's a fantastic resource.


To quote Pedro Domingos, one of the greatest professors in ML: "First-timers are often surprised by how little time in a machine learning project is spent actually doing machine learning. But it makes sense if you consider how time-consuming it is to gather data, integrate it, clean it and pre-process it, and how much trial and error can go into feature design.....Learning is often the quickest part of this, but that’s because we’ve already mastered it pretty well! Feature engineering is more difficult because it’s domain-specific, while learners can be largely general-purpose."


Hi!

I couldn't find much; that's why I stressed it in the tutorial. Scraping is a fun hobby, but it's also extremely useful. I strongly suggest spending time with Python's Selenium and Beautiful Soup libraries. The former is good for automating pages with JavaScript elements, and the latter for parsing HTML!
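For the parsing half, here is a dependency-free sketch that uses Python's built-in `html.parser.HTMLParser` as a stand-in for Beautiful Soup, pulling link text and `href` values out of a small made-up HTML snippet. Beautiful Soup offers a friendlier API for the same task, and a page whose content is rendered by JavaScript would first need a browser-driving tool like Selenium to produce the final HTML. The `LinkCollector` class and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for <a> tags that have an href attribute."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._current_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self._current_href = dict(attrs).get("href")
            self._current_text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link we are tracking.
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            self.links.append(("".join(self._current_text).strip(), self._current_href))
            self._current_href = None


# Hypothetical page fragment, e.g. a list of movie pages to scrape next.
html = """
<ul>
  <li><a href="/movie/1">First Movie</a></li>
  <li><a href="/movie/2">Second Movie</a></li>
</ul>
"""

parser = LinkCollector()
parser.feed(html)
print(parser.links)  # [('First Movie', '/movie/1'), ('Second Movie', '/movie/2')]
```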


And people like you, who take time out to read and learn, are exactly the reason people like me write such articles! I'm absolutely thrilled that people liked it and that I'll be helping people learn this beautiful field of science I do research in :)


Thanks for putting so much time and effort into this. This is definitely not "Yet-another-intro-to-ML".


I'm gonna go through it as well, that was a significant amount of work you put in.


Respect!

