Hacker News | kaffeinecoma's comments

People are knocking this guy for not being an expert and maybe getting some details wrong. Maybe it's a little bit like watching a non-programmer stumble their way through a blog post about learning to program: experienced programmers may cringe a bit.

But I really appreciate these kinds of write-ups: he declares his non-expertise up-front, and then proceeds to document his understanding as he goes along. There's something useful about this kind of blog post for non-experts.

I'm working my way through Karpathy's writeup on RNNs (http://karpathy.github.io/2015/05/21/rnn-effectiveness). I've mechanically translated his Python to Go, and even managed to make it work. But I still don't entirely understand the math behind it. Now obviously Karpathy IS an expert, but despite his extremely well-written blog post, a lot of it is still somewhat impenetrable to me ("gradient descent"? I took Linear Algebra oh, about 25 years ago). So sometimes it's nice to see other people who are a bit bewildered by things like tanh(), yet still press on and try to understand the overall process.

And FWIW I had the same reaction as the author when I started toying around with neural nets: it's shocking how small the hidden layer can be and still do useful stuff. It seems like magic, and sometimes you have to run through it step-by-step to understand it.
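For anyone curious just how small "small" can be, here's a rough sketch in plain Python. Everything about it (the 3-unit hidden layer, the learning rate, the iteration count) is an arbitrary choice for illustration, not anything from Karpathy's post: a 2-3-1 tanh network learning XOR by gradient descent.

```python
import math, random

random.seed(0)

# A 2-3-1 network: 2 inputs, 3 tanh hidden units, 1 linear output,
# trained on XOR with plain per-example gradient descent.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
Y = [0.0, 1.0, 1.0, 0.0]

H = 3
w1 = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0
lr = 0.2

def forward(x):
    h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(H)]
    return h, sum(w2[j] * h[j] for j in range(H)) + b2

def loss():
    return sum((forward(x)[1] - y) ** 2 for x, y in zip(X, Y)) / len(X)

start = loss()
for _ in range(5000):
    for x, y in zip(X, Y):
        h, o = forward(x)
        d_o = 2.0 * (o - y)  # d(loss)/d(output) for squared error
        for j in range(H):
            # tanh'(z) = 1 - tanh(z)^2, read w2[j] before updating it
            d_h = d_o * w2[j] * (1.0 - h[j] ** 2)
            w2[j] -= lr * d_o * h[j]
            w1[j][0] -= lr * d_h * x[0]
            w1[j][1] -= lr * d_h * x[1]
            b1[j] -= lr * d_h
        b2 -= lr * d_o

print("loss before/after:", start, loss())
```

Three hidden units is enough here because XOR only needs the hidden layer to carve the plane into a couple of regions; watching the loss fall step-by-step is the "run through it" part.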


Sorry about that! There's a lot to cover for one blog post to do it satisfyingly. I encourage you to check out CS231n for a more thorough treatment, where we also discuss, for example, the tradeoffs of different activation functions like tanh(), give a gentler introduction to gradient descent, and devote a whole lecture to char-rnn; assignment #1 (they are available) would demystify the backward pass, etc.

Also definitely +1 for not putting down people who write similar posts. I encourage everyone who is trying to learn to do it through blog posts because it lets you explain/organize thoughts. I also enjoy reading them quite a bit because it illustrates the kinds of conceptual problems beginners face (which is not at all obvious once you've been in the area for a few years). And it's also interesting to see many different interpretations of the same concepts, as everyone has different background and the way they reason through things is usually quite unique. Granted, this one could have been named something more appropriate!


No need to apologize- I learned SO much from your blog, thank you. I didn't realize the course was online (https://www.youtube.com/watch?v=NfnWJUyUJYU). Also, looks like there's a subreddit for it as well: https://www.reddit.com/r/cs231n

It's really wonderful that all of this is freely available, thank you.


The lecture that covers gradient descent in the Youtube list you linked there is the first time gradient descent actually clicked for me, and I made it through the entire Andrew Ng Coursera ML course. Highly highly recommend it.


The video became private. Does anyone know the title of the video, or is there another copy of it somewhere else?


> People are knocking this guy for not being an expert and maybe getting some details wrong.

I think this style of teaching has great value. Someone who's learning something themselves is the person most suitable to teach it to others, since they know exactly what a novice user doesn't know. For example, I wanted to write something up for monads the other day, since it's a simple concept that's made super confusing by people who dive into mathematical notation right away. The downside with this approach is that the novice lacks experience, so what they're learning may not be entirely accurate.

I think the best approach is a hybrid: Someone who is learning the material explains it, and someone who already knows it points out mistakes. In this case, HN can serve as the expert, and we all end up with a very informative post.


One of my great regrets with leaving my last job was that: I had little FP experience, and we had one guy with a lot of it. We had done informal teaching sessions, and had planned to try to write blog posts / record podcasts of our little sessions, precisely for the reasons you mentioned, hoping that it would help others absorb the material more easily.


Why can't you still do it? I'd read that.


There's probably no reason I can't still do it, aside from physical distance problems and getting the free time lined up on both sides, which was far easier when we could just schedule it during working hours as 20% time.

If we/I ever do it, I'll make sure to send you a link. :)


Nando De Freitas has a great youtube channel with videos that you might find helpful, including this one in particular on unconstrained optimization: https://www.youtube.com/watch?v=QGOda9mz_yA&list=PLE6Wd9FR--...


FYI, gradient descent is covered in one of the very first weeks of Andrew Ng's Coursera machine learning class, so perhaps just watch those lessons (free).

Gradient descent is basically an approximate solution: getting the exact solution requires computing inverse matrices well, which is apparently not yet doable at scale (it's too slow).


I think the reason people do gradient descent is that the datasets are too large to solve for all inputs simultaneously. It isn't impossible in theory, really.


Do you mean to say that it is possible to design your parameters over all inputs without gradient descent? I'm somewhat confused, as I think that that would not be possible in the general case (e.g. nonlinear problems are hard to crack without resorting to an iterative procedure like gradient descent). I can see that gradient descent might still make sense for problems that do have clean analytic solutions (if that's what you meant), as those solutions often turn out to be junk at scale. Linear regression is a good example, as it has a nice closed form expression if the solution exists. But the complexity scales poorly as the naive implementation requires a matrix inversion, so a different method might be employed for a large problem - gradient descent could be a candidate.

I think gradient descent is attractive because it's a memoryless process at the batch level - you can process training data in batches instead of processing the entire dataset in one go, without any explicit tracking of the previous batch history. This is a great feature when the scale of your dataset is mind-boggling. I think this is what you were suggesting?
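To make the linear-regression point concrete, here's a toy in pure Python (the data and line y = 3x + 1 are made up for illustration): the closed-form normal-equation solution and gradient descent land on the same answer, but only the iterative route generalizes to problems with no closed form.

```python
# One-variable least squares y ~ w*x + b. The closed form exists
# (a tiny normal-equation solve), and gradient descent reaches the
# same answer iteratively. Toy data on a known line: y = 3x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3 * x + 1 for x in xs]
n = len(xs)

# Closed form, derived by setting the gradient of the loss to zero.
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
w_cf = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b_cf = (sy - w_cf * sx) / n

# Gradient descent on mean squared error, from a cold start.
w, b, lr = 0.0, 0.0, 0.02
for _ in range(20000):
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    w -= lr * gw
    b -= lr * gb

print(w_cf, b_cf)  # exactly 3.0, 1.0
print(w, b)        # approximately 3.0, 1.0
```

At this size the closed form is obviously better; the iterative version is what survives when the "matrix inversion" (here just a 2x2 solve) stops being affordable.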


Strictly speaking, if you split the parameter set into batches and iterate over the batches, optimizing each set of parameters with a gradient, it is not really gradient descent; it is more a combination of coordinate descent (because you select the subset of coordinates to optimize first) and gradient descent.


Ah yes - that sounds like the stochastic gradient descent I've been hearing about. That makes a lot of sense for very expensive models. Thanks for the response nshm - I've recently taken an interest in ML (coming in with some familiarity with optimization), and it's much appreciated to have some 'REPL' in the learning process.
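A minimal sketch of the batch-at-a-time idea discussed above, in plain Python on a made-up one-parameter problem (all the numbers are inventions): each minibatch is used for one gradient step and then thrown away, with no history kept.

```python
import random

random.seed(1)

# Minibatch gradient descent fitting a single parameter m to minimize
# squared error against a data stream centered on 5.0. Each batch is
# seen once and discarded: no record of earlier batches is kept.
m, lr = 0.0, 0.1

def batches(n_batches, batch_size):
    for _ in range(n_batches):
        yield [5.0 + random.gauss(0, 1) for _ in range(batch_size)]

for batch in batches(500, 32):
    # Gradient of mean squared error over just this batch.
    grad = sum(2 * (m - x) for x in batch) / len(batch)
    m -= lr * grad

print(m)  # close to the true mean, 5.0
```

The generator never materializes the full dataset, which is the "memoryless at the batch level" property: the only state carried between batches is the parameter itself.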



Re-writing Git in C++? Can't tell if you're trolling or not. http://harmful.cat-v.org/software/c++/linus


Yet given the actual state of Gtk development, even Linus is using C++ nowadays via Qt.

https://github.com/torvalds/subsurface


Argument 1 is nontechnical, and comes from someone who does not have a lot of experience writing userspace software.

Argument 2 is valid, but only applies to kernels, and also only applies to 1992 (it works a lot better now).


My comment wasn't meant to knock C++; it was to point out that Linus' opposition to the language is well-documented, and an official git-rewrite in C++ is very unlikely.


Linus is almost entirely uninvolved with git these days, though. `git log --author=Torvalds` finds no commits in 2015 or so far this year, three patches in 2014, none in 2013, five in 2012, four in 2011... if the git core team felt using C++ was a possible course, Linus would not be involved in the decision.

(That's probably why I misunderstood your statement as an argument against C++ on its own merits.)


I doubt it. I did Java for over a decade, but jumped at the opportunity to use Go with its statically compiled binaries. The JVM is great, but being tied to it is kind of a hassle. Being able to hand a small binary to someone and say "here, run this" with no worry about dependencies: it's a beautiful thing.


The downsides are enormous.

You can use the 'javapackager' tool in Java 8 to make a tarball that contains a JRE. It's not small, but that's a result of the age of the platform; it just has a lot of features (in Java 9 there is a tool that can make a bundled but stripped and optimised package).

Go binaries are getting larger and the compiler is getting slower over time, as they add features. They don't have any magical solution for bloat: eventually, they'll add the ability to break the runtime out into a dynamic library as it gets fatter and fatter.

Or of course they can just refuse to add features and doom their users to a mediocre, frustrating experience forever.


Actually, the compiler is getting faster in 1.7, and binaries are getting smaller: http://dave.cheney.net/2016/04/02/go-1-7-toolchain-improveme...


Go 1.7 (tip) binaries are smaller than Go 1.4 binaries.


I dread tax time every year. I'm self-employed, so I don't exactly expect it to be easy, but I generally end up spending 3-4 full days working on my return. And that's not including the meticulous record-keeping I amortize over the year with a simple spreadsheet.

Every time I think I've finally got it figured out, there's always some weird new situation that throws a wrench in things. Common things like buying/selling a house, moving to a new state or locality, death of a parent, etc. I honestly don't know how most people ever manage to do it correctly, much less optimally.

This year it was 2 different banks that got IRA reporting wrong. Last year, Turbotax "interview mode" would not let me enter some crucial figure correctly. Previous year it was clients not sending 1099s (and agonizing over whether to just report manually or wait for the form). Two years before that it was a RITA (local tax authority) screwup.

I tried hiring a local CPA to take away some of the pain, but he ended up making a $5K mistake, and it took hours of my time to correct. I've had a good accountant in the past, so I know they can make a huge difference, but they are very hard to find.


If we can digitally describe something as a sequence of bytes, and we stack those bytes end-to-end, can we not say that the bytes together form a (very large) integer, and that the integer already appears in the set of natural numbers?
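That correspondence is easy to make concrete with Python's arbitrary-precision integers (a sketch; the one caveat is that leading zero bytes would be lost unless you also record the length):

```python
# Read a byte string end-to-end as one big natural number, then back.
data = b"any digital artifact"
n = int.from_bytes(data, "big")
print(n.bit_length(), "bit integer")

# Round trip. Caveat: if the data began with zero bytes, you'd also
# need to store its length, since leading zeros vanish from the int.
back = n.to_bytes((n.bit_length() + 7) // 8, "big")
assert back == data
```

So yes, every file is in a formal sense "already" a natural number; the interesting question is whether that observation carries any legal or practical weight.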


Apparently the distinction isn't so clear.

https://en.m.wikipedia.org/wiki/Illegal_number


See also: library of babel [1].

[1]: https://libraryofbabel.info/


What color are your bits? http://ansuz.sooke.bc.ca/entry/23


The problem is that the US system doesn't view software as a series of mathematical manipulations. Rather, software is considered a thing, like a tractor. The internals are therefore patentable.

The US system does prevent math from being claimed. Others have tried to explain the inherently mathematical nature by making products with functional languages of varying purity. These claims fell on deft ears.


A lot of us here are software engineers, right? Why should a mechanical engineer's work product be protectable, while a software engineer's work product not? Aren't both the result of creativity and labor?


>These claims fell on deft ears.

On deaf ears, you mean? (Or perhaps daft?)


deaf


A succinct explanation of why that's true and irrelevant: https://qntm.org/number

tl;dr: Abstract "existence", as numbers have, is not the same thing legally or practically as concrete "existence", as files have.


You know I think there is this phenomenon of "listening in the wrong language". As an American in France, sometimes I'd meet people and they'd want to try out their English with me. Every once in a while I'd meet someone and be totally confused by their French, and I'd have to ask for help. "Dude, he's speaking English to you". Then we'd both feel bad.


That's great. Yeah, if I don't expect a language, the first 10 words are just totally lost on me. Which is strange, because in normal English conversation back home, sometimes someone will say something and I'll completely miss it, but then be able to stop and replay what they said in my head. It's like some part of my subconscious records it.

On the same trip, I was on a train from Italy to France. In my car was a French woman living in England, and an Italian woman. They were practicing on each other so the Italian would talk to the French woman in French, and she'd reply in Italian. This went on for some time. As we prepared to depart the train, the woman's child and husband joined her. I talked to the French woman's husband for 5 minutes or so. Eventually he asked me where I lived and I told him I was from California but studying in France. He turned out to be British, and switched to English and said "oh, well then I guess we can just speak in English then."

I'm sure our French pronunciation wouldn't have fooled a native, but it was good enough for a couple of foreigners.


For a while now I've been thinking about something like this: once a month I put some amount of money in my "browser account". Maybe it's $25, or $10, maybe sometimes it's nothing at all. This money gets automatically distributed to the sites I visit, potentially subject to a white/black list that I control. Payments automated, cryptographically secure. I never see an ad or run ad-related JS. And the sites that I support get my patronage as directly as possible. Edit: oh, and somehow my privacy is magically preserved. Small detail. :-)

Maybe that's too close to the "public radio" model, and would never work at scale, but I'd pay for it.
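Mechanically, the distribution part of that idea is simple. Here's a purely hypothetical sketch (the site names and the attention-proportional split are my inventions, and it punts entirely on the payments, crypto, and privacy parts):

```python
# Split a fixed monthly budget across visited sites in proportion to
# attention (e.g. minutes spent), after applying a user blacklist.
def distribute(budget, visits, blacklist=frozenset()):
    eligible = {site: t for site, t in visits.items() if site not in blacklist}
    total = sum(eligible.values())
    if total == 0:
        return {}
    return {site: budget * t / total for site, t in eligible.items()}

payouts = distribute(
    25.00,
    {"news.example": 120, "blog.example": 60, "ads.example": 20},
    blacklist={"ads.example"},
)
print(payouts)  # news.example gets ~16.67, blog.example ~8.33
```

The hard problems are all elsewhere (verifying attention, resisting gaming, preserving privacy); the actual allocation is a one-liner.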



The only thing I fear with this (Brave) is that it is gameable. Whenever you create a pot of money, it will be gamed. Someone will come up with the Brave equivalent of a "One weird trick slideshow!" to extract maximum dollars.

Subscriptions, however, are consumer-empowering. And everyone gets how they work. (Netflix, Prime, etc.)


I think you just described the basic concept of Brave, Brendan Eich's browser project.

https://www.brave.com/about.html


It's just like cable used to be! Before they started serving ads anyway...


Most cable networks always showed ads, other than the premium channels (HBO, Showtime, etc.). They used to show fewer ads, though, and shows are now mostly slotted to 44 minutes (give or take), closer to 36 for reality TV. So the rest gets filled with ads... because money.


Particularly for "small" languages like Go and C, there is a benefit to sitting down with a comprehensive reference and reading it cover-to-cover. A few hours/days spent in quiet study will pay dividends over just searching Google/Stackoverflow for how to fix whatever your current problem is.


Do you rely on Time Machine? Drop to the shell right now and try "tmutil compare -n" to see which files are not making it onto your backups. For me, Time Machine randomly ignores certain files, for no discernible reason. Files that have sat on my disk literally for years will never get backed up. It's happened to me on two different Macs, and I've lost data because of it.


I ran into the same issue. After a restore from my Time Machine backup, I discovered that some files were missing (with no apparent pattern). Fortunately, I had a non-Time Machine backup, but phew, that was a close one...


I discovered it when visiting an old project after a whole OS wipe-and-restore from TM. A week later I found that 'git' wouldn't recognize a project dir due to some missing pieces. I had a 2nd TM drive, and was able to manually restore files.

For my other machine, no such luck. I had a 2nd backup offsite (Backblaze), but it was months before I noticed the damage, so the old files were no longer available via the offsite backup. I think the worst part is not knowing what you've lost. I have thousands of music files and photos on that machine, and every once in a while I come across something that's gone.


"-n" doesn't seem to be a valid flag for "tmutil compare". (OS X 10.8). What flag or comparison property did you mean?

I'm running the unflagged command now. Thanks for the heads up. I appreciate it. If this comes up bad I'll need a second backup solution.

Edit: Thanks again, parent: Time Machine was indeed missing some of my music!


For those confused by 25 year old Scala: https://en.wikipedia.org/wiki/Scala_(company).


Thank you. This being HN, I immediately was thinking the FP-ish programming language, which would seem to indicate that it predates the JVM...

