I think people have done something similar with Reddit, actually. They tried to generate Reddit conversations, and, as expected, they were pretty funny. Try searching Google; you might be able to find it.
Is your code intentionally verbose (for the sake of being explicit)? It seems like it could be condensed a lot by using Pythonic structures. For example, you could replace block 39 with a one-liner:
Genre_ID_to_name=dict([(g['id'], g['name']) for g in list_of_genres])
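For readers weighing the trade-off, here is a sketch of the explicit loop next to the one-liner and an equivalent dict comprehension. The `list_of_genres` sample is made up for illustration. It assumes each genre is a dict with `id` and `name` keys, as the one-liner implies, and it is not the tutorial's actual data.

```python
# Hypothetical sample data: assumes each genre is a dict with 'id' and 'name' keys.
list_of_genres = [
    {"id": 28, "name": "Action"},
    {"id": 35, "name": "Comedy"},
    {"id": 18, "name": "Drama"},
]

# Explicit version: build the mapping one entry at a time.
genre_id_to_name_loop = {}
for g in list_of_genres:
    genre_id_to_name_loop[g["id"]] = g["name"]

# The one-liner from the comment: dict() over a list of (key, value) tuples.
genre_id_to_name_pairs = dict([(g["id"], g["name"]) for g in list_of_genres])

# An equivalent dict comprehension, which skips the intermediate list.
genre_id_to_name_comp = {g["id"]: g["name"] for g in list_of_genres}

print(genre_id_to_name_comp[35])  # Comedy
```

All three produce the same mapping. Which one reads best is exactly the point debated below.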
In other places, you would benefit a lot from the enumerate(..) function, which yields (index, item) tuples when called on a list (or any other iterable).
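Here is a minimal sketch of the `enumerate(..)` pattern next to the index-based loop it usually replaces. The `genres` list is a made-up example.

```python
genres = ["Action", "Comedy", "Drama"]  # hypothetical example list

# Index-based loop: each item is looked up through its position.
indexed_lines = []
for i in range(len(genres)):
    indexed_lines.append(f"{i}: {genres[i]}")

# enumerate() yields (index, item) tuples, so the lookup disappears.
enumerated_lines = []
for i, genre in enumerate(genres):
    enumerated_lines.append(f"{i}: {genre}")

# enumerate() also accepts a start value, e.g. for 1-based numbering.
one_based = list(enumerate(genres, start=1))
print(one_based)  # [(1, 'Action'), (2, 'Comedy'), (3, 'Drama')]
```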
Precisely. I strongly believe that the purpose of tutorials is to be inclusive of all people. That's something I realized as a TA: making things explicit never hurts. There's always someone who can gain from more detail :)
I'm going to throw out a plug for that mindset going beyond tutorials.
It is a suspect proposition that anything is gained by turning 3 lines of code into one line of code, unless it is JavaScript for the Google homepage or something similar where the bytes matter. Moving code from a bad data model to a good one usually correlates with a big reduction in line count, but the gain is in choosing more appropriate data structures and not in the number of lines removed.
Every reader of code, including the author after 3 months, is going to have to read and understand the code from scratch. One line doing a multidimensional transform of the data will read easily for only a small fraction of people. That one-liner would take about 3 times as long to understand as any one line of the tutorial code. The data model hasn't changed either. If anything, I'd argue that the nature of the transform being done is clearer in 3 lines.
Readability of the code is always relative to the reader. In any language that I've mastered, I've always preferred more verbose options at first, but with experience I've found condensed versions to be more elegant and time-saving.
In the end, choose a code style that matches the level of your audience. If you write a one-off Python script that automates some build process in a mostly non-Python codebase, it should probably be very verbose and easy to understand. If, on the other hand, you're writing code in a decently advanced codebase and most of your colleagues are fluent in the language (or at least are supposed to be), it makes sense to use as much concise syntactic sugar as possible.
Also, the code for the PyTorch version has been contributed by https://github.com/AnshulBasia. But it is basically a port of my original version in Keras, which was equally verbose :)
Couldn't agree more. I come from a world where I strive to come up with great self-explanatory naming conventions, so Python code most often looks minified to me, and I have a hard time reading it...
I agree with the sentiment in general, but that example from the OP is actually pretty clear, no? Some might argue it is at least as clear as the code it is intended to replace.
My guess is that this is based on Python 2, where range(n) would return a fully materialized list of 0..(n-1). If len(x) is large, you could be allocating giant temporary lists just to iterate through them once. Using xrange() was the Python 2 solution (it returns a lazy xrange object instead of a list), and Python 3 fixed this so that range() itself returns a lazy range object rather than a list.
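Here is a quick way to check the Python 3 behavior. `range()` returns a lazy `range` object. Strictly speaking, that is a sequence rather than a generator, so its size does not grow with `n`, while `list(range(n))` builds every element up front. Exact byte counts from `sys.getsizeof` vary by interpreter and platform, so this sketch compares the sizes instead of printing specific values.

```python
import sys
import types

n = 1_000_000

lazy = range(n)          # Python 3: constant-size range object
eager = list(range(n))   # fully materialized list of n ints

# The range object stays small no matter how large n is.
print(sys.getsizeof(lazy) < sys.getsizeof(eager))  # True

# Unlike a generator, a range object supports len(), indexing, and reuse.
print(isinstance(lazy, types.GeneratorType))  # False
print(len(lazy), lazy[10], lazy[-1])          # 1000000 10 999999
```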
Just started an ML course this semester. I am not sure if I even have time to use this as an additional resource, but it looks super awesome after skimming through it.
Definitely going into my favorites, and if I don't use it as an additional resource now, I will read it later. Thanks for making all this work public!
Working on them already! The next one is going to be on Word Embeddings for Natural Language Processing. Basically, how do we convert words and sentences to numbers so that a computer can work with them? Applications like text classification and sentiment analysis all depend on this one fundamental backbone!
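As a rough illustration of "converting words and sentences to numbers", here is a minimal sketch of an embedding lookup table. Each vocabulary word gets an integer id, each id indexes a vector, and a sentence is represented by averaging its word vectors. The vectors are random, purely for illustration. Real embeddings such as word2vec or GloVe are learned from large corpora, and this is not the method from the upcoming tutorial. The vocabulary, the dimension, and the helper names are all hypothetical.

```python
import random

# Hypothetical tiny corpus; real vocabularies have tens of thousands of words.
sentences = ["the movie was great", "the plot was boring"]
vocab = {}
for sentence in sentences:
    for word in sentence.split():
        vocab.setdefault(word, len(vocab))  # word -> integer id

EMBED_DIM = 4           # hypothetical embedding size
rng = random.Random(0)  # seeded so the random vectors are reproducible

# Embedding table: row i is the vector for the word with id i.
# Random here; in practice these values are learned during training.
embeddings = [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in vocab]

def encode(sentence):
    """Map a sentence to its list of word ids, skipping unknown words."""
    return [vocab[w] for w in sentence.split() if w in vocab]

def sentence_vector(sentence):
    """Average the word vectors (one simple way to embed a whole sentence)."""
    ids = encode(sentence)
    if not ids:
        return [0.0] * EMBED_DIM  # no known words: fall back to a zero vector
    return [sum(embeddings[i][d] for i in ids) / len(ids) for d in range(EMBED_DIM)]

print(encode("the movie was boring"))             # [0, 1, 2, 5]
print(len(sentence_vector("the movie was boring")))  # 4
```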
To quote one of the greatest professors in ML, Pedro Domingos: "First-timers are often surprised by how little time in a machine learning project is spent actually doing machine learning. But it makes sense if you consider how time-consuming it is to gather data, integrate it, clean it and pre-process it, and how much trial and error can go into feature design. ... Learning is often the quickest part of this, but that’s because we’ve already mastered it pretty well! Feature engineering is more difficult because it’s domain-specific, while learners can be largely general-purpose."
I couldn't find much; that's why I stressed it in the tutorial. Scraping is a fun hobby, but it's also extremely useful. I strongly suggest spending time with Python's Selenium and Beautiful Soup libraries. The former is good for automating pages with JavaScript elements, and the latter for parsing HTML!
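For a taste of the parsing half, here is a minimal sketch that pulls link text and URLs out of an HTML snippet. It uses Python's standard-library `html.parser.HTMLParser` instead of Beautiful Soup so that it runs without extra installs. Beautiful Soup offers a higher-level API for the same job. The HTML snippet and the `LinkCollector` class are made up for illustration. Fetching pages, or driving JavaScript-heavy ones with Selenium, is not covered here.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Hypothetical helper: collect (text, href) pairs for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._current_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) tuples
            self._current_href = dict(attrs).get("href")
            self._current_text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._current_text).strip()
            self.links.append((text, self._current_href))
            self._current_href = None

# Hypothetical snippet standing in for a scraped page.
html = """
<ul class="movies">
  <li><a href="/title/1">The Matrix</a></li>
  <li><a href="/title/2">Inception</a></li>
</ul>
"""

parser = LinkCollector()
parser.feed(html)
print(parser.links)  # [('The Matrix', '/title/1'), ('Inception', '/title/2')]
```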
And people like you who take time out to read and learn are exactly the reason why people like me write such articles! Absolutely thrilled that people liked it and that I will be contributing to people learning this beautiful field of science that I do research in :)