Hacker News | 00ajcr's comments

There's a Cython implementation (the .pyx file). The Cython code is compiled to C code (then that C code is compiled).


Less ambitious than where?

I don't think UK people are inherently less ambitious than people anywhere else, but the very low social mobility here has always dragged down potential for some, while the punishing austerity and lack of investment over the past decade have limited other people's opportunities further.


My interpretation of the point in the blog post was that explicitly spelling out variable names makes APIs and the underlying code much more accessible to a wider audience.

Sure, there'll be a subset of users of these libraries that have read ML/textbooks and are familiar with what η means in this context.

Today, many (most?) users of ML libraries will probably not know what η means without looking it up. Adhering to mathematical notation puts up an unnecessary barrier to using the API/code and ultimately limits wider engagement/collaboration.

To attract a bigger slice of the ML community, choosing names that the ML hobbyist can read, understand and use without pause is the better path forward.


You are saying most people don't know what η in that context means (=people who likely haven't read a book or a paper on stochastic gradient, and don't know how it actually works), but they would somehow magically figure out what it actually does if we call it "learning_rate" in ASCII letters. How does that work?

FYI, the documentation of the function https://fluxml.ai/Flux.jl/stable/training/optimisers/ explicitly says it is learning rate:

> Learning rate (η): Amount by which gradients are discounted before updating the weights.

so this is already explicit to anyone who reads the documentation. The quibble in the post is about the named parameter.


> How does that work?

You can look up "learning rate" much more easily than you can look up "what is this Greek letter on my screen", followed by "what is the use of this Greek letter in my context", and only then search for "learning rate".

More importantly, it's possible to know what a learning rate is without knowing what Greek letter it's commonly denoted as. Especially since mathematical notation is so inconsistent across authors. I want less ambiguity in code, not more. Explicit is better than implicit.

Mathematical notation is notorious for being an absolute mess of inconsistencies. Who in their right mind looked at it and went "yep, I want more of this in my source code".


This depends a lot on the target audience for your code.

For research-focused code, it is likely that whatever you're implementing was initially described in terms of mathematical notation (e.g., in a paper or book). It can be helpful to have variables that unambiguously match that canonical source. In fact, a lot of my Julia code has docstrings containing references/links to the original paper and a comment noting that it uses the notation therein.

This sidesteps the problem where textual descriptions like `learning_rate` can sometimes be ambiguous: is it the original learning rate, or perhaps the current rate after applying some sort of schedule or decay? I think the Flux documentation is pretty close to ideal, in that it's got a symbol you can match against equations (though no reference to them) as well as text that you can search to learn more.
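
To make that ambiguity concrete, here is a minimal Python sketch. The function names and the inverse-time decay schedule are assumptions chosen for illustration, not anything taken from Flux. It shows two different quantities that could both plausibly be called `learning_rate`:

```python
def decayed_learning_rate(initial_learning_rate, decay, step):
    """Inverse-time decay: one hypothetical schedule among many."""
    return initial_learning_rate / (1.0 + decay * step)


def sgd_step(weights, gradients, current_learning_rate):
    """Plain gradient descent update: w <- w - rate * gradient."""
    return [w - current_learning_rate * g for w, g in zip(weights, gradients)]


# The rate the user configured...
initial_learning_rate = 0.1
# ...versus the rate actually applied at step 2 under the schedule.
current_learning_rate = decayed_learning_rate(initial_learning_rate, decay=0.5, step=2)

new_weights = sgd_step([1.0, 2.0], [0.5, -1.0], current_learning_rate)
```

Either of the two rates could be "the learning rate" depending on who is reading, which is why tying the name to a symbol in a specific paper can remove the guesswork.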


You are not answering the (rhetorical) question that you quoted, and the answer to your response is already in the paragraph that followed it:

As I said, the necessary keywords for Googling it, along with a brief description, are already present in the documentation.

The quibble here is about the necessity of reproducing all the necessary keywords for an accurate Google search during every single function call.

> Mathematical notation is notorious for being an absolute mess of inconsistencies.

According to whom? What exactly is inconsistent?


> The quibble here is about the necessity of reproducing all the necessary keywords for an accurate Google search during every single function call.

No, that is not the quibble. My quibble is with choosing identifiers that make code less legible when taken by itself. The best code teaches future readers about how it works.


The struct's field name is `eta`, but this is an internal detail. Its constructor takes a positional-only argument, no public name.

The Greek letter is used in the documentation. And the reason is that every optimiser's documentation links to the original paper, and tries to follow that. If the Adam paper calls the two decay rates β1, β2, then staying close to that seems the least confusing option.
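
For readers more at home in Python, a rough analogy of "positional-only argument, internal field name" might look like the sketch below. This is a hypothetical Python stand-in, not Flux's Julia code; the class name and default value are only illustrative.

```python
class Descent:
    """Hypothetical stand-in for an optimiser; not Flux's actual code."""

    def __init__(self, eta=0.1, /):
        # The "/" makes `eta` positional-only: callers cannot pass it by name,
        # so the parameter name is not part of the public call signature.
        self.eta = eta  # the stored field is still reachable as `.eta`


opt = Descent(0.01)  # the learning rate is simply the first positional argument

try:
    Descent(eta=0.01)  # passing it by keyword is rejected
    keyword_rejected = False
except TypeError:
    keyword_rejected = True
```

Under this reading, whatever symbol appears in the docs is a label for "the first argument", not a name the caller ever has to type.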


Perhaps I'm missing your point, but I think you're focussing too much on the specific case that someone who isn't me came up with.

My most general point is that the identifiers we use in our code are almost never just convention or taste when we are sharing that code with anyone else (and for most, "anyone else" includes our future selves). Getting a little more specific, I'm specifically interested in Julia and look forward to working in it further, but I've personally felt pain around scientific/mathematical notation when trying to understand code I've found on GitHub. tagrun dismissing my pain as nonsense and the people who argue for my ilk as perpetrators of bikeshedding is dipshitted. Yeah, I'm probably the asshole for being a college dropout trying to leverage modern scientific computing for my own ends (snoogins), but I'm also willing to bet tagrun is probably the member of a team that talks down to junior members and complains they haven't read enough papers or the right papers to see the magnificence of their code ;).

It's fine to write code that demands a domain expert to understand, but don't pretend like it's good across all dimensions. There are tradeoffs involved.

Personally I find the preponderance of scientific/mathematical notation (whatever you want to call it) in Julia to be cute; it certainly does bind the code to linked papers in a pretty cool way when it all fits together properly. That said, it's a pain in the ass when it doesn't fit together properly and I've personally had a journey into Julia spoiled due to the frequency at which I had to figure out how to notate something or what word to use when regarding some squiggle I haven't encountered before. I look forward to having a better intuition for the Greek alphabet but until then Julia will often be harder to read, let alone understand, when compared to Ruby or JavaScript or Go or C# or any other of the roughly dozen programming languages I've worked with and feel comfortable translating between.


> > Learning rate (η): Amount by which gradients are discounted before updating the weights.

> so this is already explicit to anyone who reads the documentation. The quibble in the post is about the named parameter.

As far as I can tell it's a documentation complaint. He has to remember "η" from the line with the signature, past the line "Gradient descent optimizer with learning rate η ...", and a heading "Parameters" until the line quoted which explains this in full.

He says this is the API, but that's inaccurate. The API being explained is that the first positional argument is the learning rate. It's not a keyword argument, so you cannot supply it by name. What variable names are used in the code is private, and in fact the struct's field name is `eta` so that you can access it without typing Greek.

If this makes the top 10 list (even the top 10 list of documentation complaints) then Flux is doing OK. Especially the top 10 list of a guy with a PhD in a mathematical field. (From the sort of university which used to require students to know Latin & Greek, too.)


> Unlike the example in the link you give, η isn't a generic random name like a,x that can mean anything. If you ever read a paper on stochastic gradient optimization, you'd know that η means learning rate in the context.

Why should reading a paper on stochastic gradient optimization be a prerequisite to understanding your idiosyncratic choices for identifiers? The fact of the matter is that I can understand code much better than academic prose. I'll learn the code and then supplement with the paper as needed. By using idiosyncratic identifiers you're gating off your code from people who haven't jumped through the same specific hoops you have and don't have the same mental muscles you have developed.

> It is bikeshedding because it is analogous to insisting that using "angle" instead of "θ", or "radius" instead of "r" in a 2D geometry library is superior and takes your code from being a lackluster to something that shines (in the words of the original author), while not having anything useful to say anything about the mathematical/technical aspects of the code itself.

No. To someone who doesn't have an established mental muscle for mathematical notation it is analogous to using Thai script to write for an audience that primarily reads English: I can still use Google Translate, but the cognitive load is much higher than it is for members of the audience who are native Thai readers. That isn't bikeshedding, that's caring about understandability.

> You are saying most people don't know what η in that context means (=people who likely haven't read a book or a paper on stochastic gradient, and don't know how it actually works), but they would somehow magically figure out what it actually does if we call it "learning_rate" in ASCII letters. How does that work?

I learn from code a whole lot faster than I do from academic prose. I usually start with code then read the whitepapers the code refers to as I go. I learn slower from code that uses symbology I'm not familiar with. In the context of learning a new code base, unfamiliar symbols are bad in several ways.

> so this is already explicit to anyone who reads the documentation. The quibble in the post is about the named parameter.

My quibble is with the assumptions I think you make about what comprises good code quality while at the same time having suffered through code from people who share your attitudes. Naming matters to me in more ways than you're apparently versed in. The article I linked is just one small discussion on naming but not comprehensive by any means. I was linking it more in hopes that you would do further thinking of your own about a pretty wide subject. Just my opinion.

Furthermore, the page on FluxML demonstrates the problem I'm referring to. Just down from where you linked you'll find an entry like `RMSProp(η = 0.001, ρ = 0.9, ϵ = 1.0e-8)` in which `ϵ` is described nowhere in the entry. It's a random symbol that I understand to be used usually in set membership notation, but in this context (the context of some random link some random person posted in the interwebs) I have no clue what it means and thus it is a barrier to my understanding.


Unicode should not be in public APIs. This is a standard around Julia. Flux is breaking the standard. Yes, it's not a good thing.


The Unicode epsilon isn't in the public API, it's describing the 3rd positional argument.

This was added recently, and for some reason the PR (1840) didn't fix the docs, which is bad. The Optimisers.jl version has an explanation: https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.RMSProp


Interesting! Do you have any links talking about that standard? I'm super interested in Julia and this seems like a good opportunity to learn something I've been missing so far.


I'm not sure if/where it's formalized, but it's just generally something that's been enforced throughout Julia's Base, along with many of the package organizations (like SciML among others). It's something that would be mentioned at code review time by most contributors. It's why you don't see Unicode keyword arguments. There are a lot of reasons. I think the best one is that you want the API to be compatible with the old terminals you tend to get on HPCs, which do not tend to support Unicode. We should probably make it a part of the standard formatter rules or something at this point.


Word. Thank you for the response.


Aside from significant whitespace (which I suppose takes a little getting used to), is there anything that marks out Python's syntax as particularly rigid compared to other languages?

I'm struggling to understand what the author is trying to argue here.


> Aside from significant whitespace (which I suppose takes a little getting used to),

If Python's whitespace takes getting used to, then you're probably poorly formatting your code to begin with.


One thing I do find mildly annoying about the whitespace is that you can't jump to the end of a block with e.g. % in Vi (goes to the matching bracket). Not a huge deal but something I notice.


To broadcast operations (such as addition) between arrays in NumPy, trailing dimensions have to be equal (or be of length 1).

In the example given above the 3D array and 2D array have shape (lengths of dimensions):

   (2, 3, 4)
      (2, 3)
That is - the suffixes do not agree (4 != 3 and 3 != 2) and NumPy raises an error.

However, for the same operation in J the prefixes agree:

   (2, 3, 4)
   (2, 3)
and the addition gives the expected result.

To add the arrays with these shapes in NumPy, one method is to transpose each array (reverse the order of the dimensions), add these arrays, and then transpose back:

  (a.T + b.T).T
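
As a quick check of the above, here is a small NumPy sketch. The array contents are arbitrary, and the comparison against `b[:, :, np.newaxis]` is my own addition as an alternative way to line up the leading dimensions:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)  # shape (2, 3, 4)
b = np.arange(6).reshape(2, 3)      # shape (2, 3)

# Trailing dimensions do not agree (4 vs 3), so direct addition fails.
try:
    a + b
    direct_add_failed = False
except ValueError:
    direct_add_failed = True

# Reverse the axes, add (shapes (4, 3, 2) and (3, 2) now broadcast), reverse back.
via_transpose = (a.T + b.T).T

# Alternative: give b a trailing length-1 axis so the shapes are (2, 3, 4) and (2, 3, 1).
via_new_axis = a + b[:, :, np.newaxis]
```

Both routes give an array of shape (2, 3, 4) in which element `[i, j, k]` is `a[i, j, k] + b[i, j]`.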


Thinking about these as a "verbose imperative language" programmer, I'd say suffix agreement seems to make more sense to me at first glance, and I'd like to hear more about why it might be worse.

For why I think it makes sense: I think of multidimensional arrays as being arrays of arrays, and "normal" index lookup operating on the first dimension. If I have a

    float[100][3]
in some context I might think of it as 100 vec3s, and I might want to do some vec3 operation on each of them. I might want to dot them all with some other vector, or add them all to some other vector. I almost never have 100 scalars and want to apply one scalar to all elements of the corresponding vec3.

But I guess maybe this is all widely agreed on, and maybe the contentious part is just index order? Like, maybe you'd say "100 vec3s" is actually

    float[3][100]
in which case prefix agreement would make more sense.
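
In NumPy terms (used here only as a stand-in for the `float[100][3]` picture; the variable names are made up for illustration), the two layouts behave roughly like this:

```python
import numpy as np

offset = np.array([1.0, 2.0, 3.0])  # a single vec3

# Layout "100 vec3s" as shape (100, 3): suffix agreement just works.
points = np.zeros((100, 3))
shifted = points + offset  # (100, 3) + (3,) broadcasts over the 100 rows
dots = points @ offset     # dot every vec3 with the same vector, shape (100,)

# Layout "100 vec3s" as shape (3, 100): suffix agreement no longer lines up.
points_t = np.zeros((3, 100))
try:
    points_t + offset
    suffix_rule_failed = False
except ValueError:
    suffix_rule_failed = True

# Here the vec3 has to be matched against the leading axis explicitly.
shifted_t = points_t + offset[:, np.newaxis]  # (3, 100) + (3, 1)
```

So which rule feels natural does seem to depend on which axis you treat as "the vector" and which as "the batch".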


For fans of crime/detective novels, I would not hesitate to recommend 'The Chain of Chance'.

Utterly gripping, fascinating, and a gateway to Lem's sci-fi works.


I think it's the `--skip-string-normalization` flag which means "Don't normalize string quotes or prefixes".


But I want them normalized. Normalized to single quotes.


Yes, this. I have early-stage cataracts, which by and large are not an issue on my 40 inch monitor. But it does mean that my astigmatism drifts around monthly. My prescription changes faster than I can get new lenses cut. The BIGGEST ergonomic issue for me with Python is single versus double quotes. Double quotes are the difference between happily coding all day versus a headache at lunch time and 5% less done. Fuck Black.


Can you say more about how single quotes vs double quotes is helpful for you? I rarely write python, but I have a passion for accessibility and I'm curious about this.


Surely the [sic] annotations are to indicate that the quotations are unchanged from the original source text, which contains somewhat archaic language and misspellings of words when compared to modern English.


You're right. I'm so used to seeing it used in a snarky way that I was overlooking the original meaning of "sic erat scriptum" (thus was it written).


Philip Guo has an excellent set of video lectures on CPython internals, which includes an overview of parts of ceval.c: http://pgbovine.net/cpython-internals.htm


I've watched several of those, and they are good. Though for me, playing with code is a different type of learning than watching videos. I should go back and watch the one on ceval though.


Useful tips and a nice skeleton!

There's also a great guide to cleaning bones from animals (in various states of decay) on the blog Jake's Bones that I've referred to a couple of times: http://www.jakes-bones.com/p/how-to-clean-animal-bones.html

