I cut my teeth on Perl 5 in the early 2000s, and I've been curious about Perl 6 for a long time.
Last year I sat down with Perl 6 for long enough to form opinions on the language's merits, which are many. I'm told many of my criticisms have been addressed in the five months since the article was published, but perhaps some of you will enjoy reading the original review. Cheers!
https://www.evanmiller.org/a-review-of-perl-6.html
I applaud you for that. I feel it would do good for most of the people here who are so quick to voice their opinions to actually try Perl 6 out, as I get the distinct feeling that the majority probably haven't. I definitely feel the emotion here:
"current of frustration that Perl 6 isn’t more widely adopted. There’s a kind of righteous indignation that the language is very good — so dammit (I’m projecting), when will people start taking us seriously? "
despite not even being part of the Perl 6 community in any way. So many are attacking it on circumstantial, or even irrelevant, grounds instead of discussing the merits and failings of the language itself.
This is exactly the sort of thing that frustrates me about the tech world: people, myself included, are scared to jump to new things, or even to try them out, which leads to a strong herd mentality. I suspect it's because everything has such strong network effects these days. And of course HN is kind of the worst because of the business angle; there is always the question of whether this makes commercial sense, which I'll admit might not be the strongest aspect of Perl 6.
In this formulation, s_k is the utility assigned to a k-star rating. Like the Wilson score formula (and unlike the linked article), the provided equation takes into account the variance of the expected utility.
I find that article very hard to follow -- there are lots of detailed formulas, but no obvious place where the prior distribution is discussed, or the utility score given to different star ratings. And the examples are all very abstract.
Edit to add: ah, I think I see, the utility of N stars is assumed to be N, and the prior is all ones. But aren't those the most important things to tune in a Bayesian model?
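For what it's worth, here is a minimal Python sketch of that reading of the model (utility of N stars equal to N, and an all-ones Dirichlet prior, i.e. one pseudo-vote per star level). The function name and example counts are mine, not the article's:

    def expected_utility(counts, utilities=(1, 2, 3, 4, 5), prior=1.0):
        """Posterior mean utility for an item with counts[k] ratings at star level k+1."""
        total = sum(counts) + prior * len(utilities)
        return sum(u * (n + prior) for u, n in zip(utilities, counts)) / total

    print(expected_utility([0, 1, 2, 10, 25]))  # well-reviewed item with many ratings
    print(expected_utility([0, 0, 0, 0, 1]))    # a single 5-star review gets pulled toward the middle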
With star ratings, I think an important point that often gets ignored is: different people use stars in different ways. One user might 5-star most things, but give the occasional 4- or 3-star review if they have a problem. But another user might 3-star by default, and save their 4- and 5-star reviews for exceptionally good cases.
I wonder if a simple way to fix that might be to reinterpret everyone's star ratings as percentiles, based on the overall distribution of stars in their reviews. "This user gives 5 stars 10% of the time, so we'll interpret a 5-star review from them as anything in the range 90-100 -- assume 95%."
You would probably also want to reinterpret the results for each user. "This item's review scores average out to 84%. For user A, that's 4.5 stars, but for user B, it's only 3.5 stars."
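A rough sketch of that reinterpretation, with the percentile-midpoint rule being my own assumption for illustration:

    from collections import Counter

    def star_to_percentile(user_ratings):
        """Map each star level to the midpoint of the percentile range it
        occupies in this user's own rating history."""
        counts = Counter(user_ratings)
        total = len(user_ratings)
        mapping, cumulative = {}, 0
        for star in sorted(counts):
            lo = cumulative / total
            cumulative += counts[star]
            hi = cumulative / total
            mapping[star] = 100 * (lo + hi) / 2
        return mapping

    # A user who gives 5 stars only 10% of the time:
    history = [3] * 40 + [4] * 50 + [5] * 10
    print(star_to_percentile(history))  # 3 -> 20.0, 4 -> 65.0, 5 -> 95.0

Mapping an aggregate percentile back to stars for a given user would just invert that user's mapping.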
The big downside is that star ratings become subjective. But they're already subjective, and ignoring that problem doesn't make the results any better. Average star ratings on all the big websites and app stores right now are garbage -- they'll usually warn you if some Amazon product is terrible, but that's about all.
If you crunch all the review data and figure out the best possible recommendations, you end up with collaborative filtering and the Netflix Prize. It's a shame that so much great work was done for that competition, but nobody seems to be using it now. Netflix themselves just use a trivial upvote scheme now.
But I wonder if there's some much simpler approach that still gets pretty good results.
I wrote this a couple of years ago [1]. I think we need to remove subjectivity on ratings by asking more specific questions and only allowing a binary answer.
1. Is the food good?
2. Is the service good?
3. Is the atmosphere good?
Those are pretty simple questions to answer. Often when I see 1-star reviews, it's because of a single element of the experience rather than the overall experience.
It's easier to leave a review because there's less cognitive load. It's easier to search for what you want: if I have my foodie hat on, I don't particularly care about the service. If it's a night out with a customer, that becomes more important all of a sudden.
And then you can generate some sort of average score based on the answers to these questions to calculate the 5 star rating.
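Concretely, the aggregation could be as simple as this (the 1-5 rescaling is my own choice for illustration):

    def star_rating(answers):
        """answers: list of (question, yes_count, total_count) tuples.
        Average the per-question 'yes' rates and rescale onto 1-5 stars."""
        yes_rates = [yes / total for _, yes, total in answers]
        overall = sum(yes_rates) / len(yes_rates)  # fraction of 'good' answers, 0..1
        return 1 + 4 * overall

    print(star_rating([("food", 90, 100), ("service", 70, 100), ("atmosphere", 80, 100)]))  # 4.2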
I do prefer that over stars, but I think it potentially misses some information. Let's say most people answer "good" for all the categories. Does that just mean the place is good overall, or is it fantastic?
To put it another way, how do you distinguish the 4.0-star places from the 4.9-star places?
With conventional star ratings, you're reliant on most people using stars consistently. With a series of yes/no questions, you're relying on a potentially small pool of "no" answers to give you a useful signal.
I think stack ranking would be much more powerful. "How does this place compare to others? Average, better than average, in your all time top 5?" Everybody's feedback would be completely clear. It's not obvious how to aggregate that into a single rating number though.
Given a set of questions - e.g. "how's the food" "how's the atmosphere" "how's the service" etc. - you could figure out how the restaurant scores relative to others by stack ranking based on the % of answers to a particular question that got a "Yes". The numbers should hopefully reflect a normal distribution and from there you get your /5 rating.
If everybody answers "yes" to all of the questions - good value, service, food, atmosphere - then that suggests to me that it's a great restaurant. And you can have a lot of questions that are even asked randomly to limit the number of questions per user.
I rate a lot of places highly that have a lot of great things but not great service, because I don't think the service is bad enough to bring the place down. But that's data that is being lost.
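A sketch of that ranking-by-percentile idea, assuming (my choice for illustration) that a restaurant's score on each question is its percentile among all restaurants on that question, then averaged and scaled onto 5:

    def percentile_rank(value, population):
        """Fraction of the population this value beats or ties."""
        return sum(v <= value for v in population) / len(population)

    def stack_rank(yes_rates_by_question):
        """yes_rates_by_question: {question: {restaurant: fraction of 'yes' answers}}.
        Returns a 0-5 score per restaurant based on its average percentile."""
        restaurants = next(iter(yes_rates_by_question.values())).keys()
        scores = {}
        for r in restaurants:
            percentiles = [percentile_rank(rates[r], list(rates.values()))
                           for rates in yes_rates_by_question.values()]
            scores[r] = 5 * sum(percentiles) / len(percentiles)
        return scores

    data = {
        "food":       {"A": 0.95, "B": 0.80, "C": 0.60},
        "service":    {"A": 0.70, "B": 0.90, "C": 0.50},
        "atmosphere": {"A": 0.85, "B": 0.75, "C": 0.65},
    }
    print(stack_rank(data))  # A ~ 4.4, B ~ 3.9, C ~ 1.7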
I like your idea of stack ranking but with a different flavour. I think that "in your all time top 5" is a hard question to answer. How about this though - if we know you've been to Taco Place X and now you're going to Taco Place Y, maybe the question is "are the tacos at Y better than X", "is the atmosphere at Y better than X" or even "is Y better than X" (but I like the idea of collecting more granular data).
If you collect this data to stack rank, it definitely gives you a better distribution of restaurants relative to each other in each category.
As a consumer, with this level of granularity, I can select what I care about tonight. If I'm grabbing takeout for lunch at work, does a five star rating even matter? I should ask Siri "show me the top fast and delicious takeout restaurants near me" and she should do: "select name from restaurants where distance < 500m order by (speed + flavour) limit 3;" and from there I will pick something from that list that looks nice. That seems like a nice UX.
There's a body of research on this, and it suggests that ratings are more meaningful as you add options, up to about 5 or 6.
That is, if you asked people to do the ratings once, and then asked them 1 hour later, there would be more consistency across time as you add options from 2 to 3 to 4, up to about 5 or 6.
The problem with binary ratings is that, as much as you might think otherwise, you're forcing a kind of hazy, grey experiential assessment into 0 or 1. And in doing so, people near the boundary (whatever that might be) will vacillate between them. E.g., people who feel "meh" about something are forced to choose something else, and sometimes they'll say 0 and sometimes 1. The more options you give, the more reliable / meaningful the ratings will be.
This example is interesting to me because it's something most people can relate to and illustrates the complications of utility-based and Bayesian formulations of the problem. You end up having to decide on utilities and/or priors.
To me the answer is to weight the data maximally in forming a posterior, in which case you end up using a reference prior. Similar kinds of arguments about utilities lead to reference priors. Reference priors can be complicated to compute, but for things like multinomials over ordinal ratings, reference priors have been worked out fairly well.
To me it always made sense to allow people to sort by the center of the estimate, or the lower bound (maybe using different language).
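One concrete way to get both numbers, using the Jeffreys Dirichlet(1/2) prior as a simple stand-in for a reference prior (the exact reference prior for a multinomial is more involved), and Monte Carlo sampling rather than a closed form:

    import numpy as np

    def posterior_summary(counts, utilities=(1, 2, 3, 4, 5), alpha=0.5, q=0.05, draws=20000):
        """Posterior mean and lower q-quantile of expected utility, under a
        Dirichlet(alpha, ..., alpha) prior over the rating distribution."""
        rng = np.random.default_rng(0)
        samples = rng.dirichlet(np.asarray(counts) + alpha, size=draws)
        utility = samples @ np.asarray(utilities, dtype=float)
        return utility.mean(), np.quantile(utility, q)

    print(posterior_summary([0, 1, 2, 10, 25]))  # many ratings: center and lower bound are close
    print(posterior_summary([0, 0, 0, 0, 1]))    # one 5-star rating: wide gap between the two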
I think 1-4 stars is the ideal rating style. I wish that were used more often.
A choice of 1-4 stars gives you enough freedom to express your opinion, without being overwhelming. It's a small enough range to be reasonably objective (almost everybody will interpret it as 1 star = bad, 2 = passable, 3 = good, 4 = great). And with an even number of choices there's no middle "meh" option -- you're forced to make a choice between 2 and 3.
Of course it's important not to ruin it by adding extra options, like 0 stars or half-stars. (That was Ebert's big mistake!)
Edit to add: to relate this to the parent post, I'm thinking that maybe ranking things as 1-4 stars in several categories could be the best of both worlds.
This Economist article's a bit chatty and superficial (surprise), but in an age of mass anxiety and digital distraction, I think the goal of the Epicureans is as important as ever: How does one go about producing a calm mind? It's not a simple task, and I think doing it correctly requires analyzing the mind in relation to everything else.
The atomic hypothesis of the Epicureans seems like a side quest into physics, but the fruit of the journey is that everything's just combinations of atoms and the mind must be made of atoms too, so let's think of it as a physical system with inputs and outputs, and forget about any grander god-narratives. With this perspective comes some very practical advice; Lucretius for instance has an extended passage on how to deal with a "crush". I'll paraphrase but he points out that your crush exists purely as an image in your head, and you really have no idea what the person behind the image is like, and if you finally get together the sex will probably be very awkward, so it's better to direct your mind and amorous intentions elsewhere. I believe the phrase he used was to find smaller pleasures that carry no penalty -- because seeking the larger rewards almost always leads to misery.
Lucretius is a good read and the Latham translation has some felicitous turns of phrase. It's fun imagining arguing with the ancient philosophers about their physical theories, which they support (as best they can) with the available evidence about what wind, liquids, lightning, thunder, earthquakes, smells, tastes, sights, etc. are made of. It's a shame philosophy got distracted with "higher things" for so long (i.e. 2,000 years) because here we are realizing again that everything is made of atoms, and it sure would be nice to have more advice on living life in the face of this fact.
Nice interactive examples but I'm afraid the basic setup here doesn't make sense to me. The "atom" is defined as the average encoding of inputs with the feature ("faces with a smile"), but I'd think the proper definition should subtract off inputs without the feature (i.e. "smile" = "faces with a smile" minus "faces without a smile"). The way it's defined you end up adding an extra "average face" along with the feature of interest, which is clearly seen in "The Geometry of Thought Vectors" example -- the non-smiling woman isn't so much forced to smile as to have her face merged with that of a generic smiling woman.
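To make the distinction concrete, here's a toy numpy sketch with synthetic latent vectors standing in for the encoder's outputs (all of the data here is made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    dim = 64

    # Pretend latent codes: "smiling" faces differ from the rest only along axis 0.
    smiling     = rng.normal(size=(100, dim)) + 2.0 * np.eye(dim)[0]
    non_smiling = rng.normal(size=(100, dim))

    atom_as_defined = smiling.mean(axis=0)                             # average smiling face
    smile_direction = smiling.mean(axis=0) - non_smiling.mean(axis=0)  # "smile" minus the generic face

    target = rng.normal(size=dim)                      # latent of a face we want to edit
    edited_with_atom      = target + atom_as_defined   # also drags in an "average face"
    edited_with_direction = target + smile_direction   # mostly changes the smile axis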
I'm a fan of this library -- I used it to build Hecate (https://github.com/evanmiller/hecate), a terminal hex editor. If you get creative with Unicode block and box-drawing characters, you can build some interesting interfaces (tabs, progress bars, etc.).
The method described here is simple because it's only looking at the mean of the belief about each item; it uses the prior belief as a way either to sandbag new items or to give them a bump. I tend to advocate methods that take into account the variance of the belief in order to minimize the risk of showing bad stuff at the top of the heap.
I have a newer article (not mentioned here) that ranks 5-star items using the variance of the belief. It ends up yielding a relatively simple formula, or at least a formula that doesn't require special functions. Like the OP I use a Dirichlet prior, but then I approximate the variance of the utility in addition to the expected utility.
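In rough Python form (a sketch of the idea rather than a drop-in implementation, assuming the all-ones Dirichlet prior and s_k = k), the score is the expected utility minus a multiple of its approximate standard deviation:

    from math import sqrt

    def lower_bound_score(counts, utilities=(1, 2, 3, 4, 5), z=1.65):
        """Expected utility minus z standard deviations, under a Dirichlet
        prior of all ones. counts[k] = number of ratings at star level k+1."""
        K = len(utilities)
        N = sum(counts)
        probs = [(n + 1) / (N + K) for n in counts]             # posterior mean rating probabilities
        mean = sum(s * p for s, p in zip(utilities, probs))     # expected utility
        second_moment = sum(s * s * p for s, p in zip(utilities, probs))
        variance = (second_moment - mean * mean) / (N + K + 1)  # approx. variance of the expected utility
        return mean - z * sqrt(variance)

    print(lower_bound_score([0, 1, 2, 10, 25]))  # many mostly-positive ratings
    print(lower_bound_score([0, 0, 0, 0, 1]))    # one 5-star rating is penalized for uncertainty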
The weakness of the approach (as well as the OP) is that it doesn't really define a loss function for decision-making (i.e. doesn't properly account for the costs of an incorrect belief), which one might argue is the whole point of being a Bayesian in the first place. In practice it seems that using a percentile point on the belief ends up approximating a multi-linear loss function, but I haven't worked out why that is.
This is interesting stuff, but I wonder whether anyone has verified the results in practice? These methods are all quite simple. They assume, for example, that the quality of an item is independent of the quality of the surrounding content. This is clearly not true. When Steve Jobs died, for example, no other news in the tech community was going to get air time. There is also the need for a variety of content. I think we all know how boring it is to read endless "I wrote X in Y" posts on HN, where X is some simple software system like a blog and Y is the language du jour (Node.js / Go / whatever).
In the machine learning community the above problems are addressed with submodular loss functions, bandit algorithms, and no doubt other methods I don't know about. Now I don't value complexity for its own sake, so I wonder if the additional power these approaches bring is warranted.
> I tend to advocate methods that take into account the variance of the belief in order to minimize the risk of showing bad stuff at the top of the heap.
Penalizing variance would be the opposite of my intuition. Given a boring low-variance item with 10 3-star votes, and a divisive item with 5 1-star votes and 5 5-star votes, I'd think you'd want the one at the top to be the one with a medium chance that they'll "love" it rather than the one with a high chance they'll find it passable.
If you further assume that the average person is going to check out the top few results but only "buy" if they find something they really like, the risky approach seems even more appealing. A list topped by known mediocre choices has a low chance of "success". What's the scenario you are envisioning?
The kind of divisive item you describe is rare, at least on Amazon. What happens most commonly is that everyone loves something or everyone hates it, with some noise (e.g. 10% 1 or 2 star reviews). In this case, it makes sense to promote the item that has a 4.5 mean score and 100 reviews over one that has a 4.7 mean score and only 5 reviews. You want to account for the uncertainty when there are few ratings. If you don't, all the items at the top of your search results will be 5-star 1-review products.
I saw this post's headline and it reminded me of a "how not to sort by average rating" post I read many years ago. I looked it up on Google [1], read it, clicked through to this HN thread... lo and behold, the top comment was written by the author of the article I just looked up. Great stuff.
This is a compelling little essay. I think the author left out a couple of important points, though.
1. Design systems may be "algorithmic," but they're primarily mathematical, and equations remain stubbornly hard to use. Metafont failed to attract designers because no one wanted to cook up high-order polynomials to express their visual ideas. (In contrast, Adobe came up with a good-enough interface for Bezier curves, and now the world uses non-algorithmic fonts.) The new class of designers will need solid grounding in at least high school algebra to get their curves and easing functions right.
2. Any argument about "XXX should learn to code", where XXX is anything other than "aspiring professional software developers," means that there is a significant market opportunity for creating usable software that does not require coding. If people are willing to spend thousands of dollars on bootcamps to learn to code -- when they'd really rather be focusing on their domain problem -- then they're theoretically willing to pay thousands of dollars to not have to learn how to code.
I don't know the state of design software, but if it's anything like other professional desktop tools, it's horrible, creaky software stuck in the early 1990s with very little competition in sight. When I read this essay I can't help but think there's an opening for usable algorithmic design software -- whatever that may look like.
I agree with your first point. I don't see designers learning to represent ideas as equations. Not even most of the people who studied Computer Science know the difference between a function and the graph of a function. So I don't see how more difficult concepts can be taught to designers who have no foundation in maths whatsoever. And I don't think that high school maths is really enough. You need a solid knowledge of Linear Algebra to understand how to display a 3D object properly.
Expressing a design as a mathematical expression that generates new representations that follow some guidelines means the designer needs to rigorously understand the very essence of what it is he's trying to do. And then he still needs to know the maths to actually formulate it. This is a tough problem if you start from scratch and I doubt that many designers are really capable of pulling it off.
AFAIK, creating layouts for iOS apps involves exactly this kind of mathematical reasoning. When you design your app, you basically specify rules that anchor each object on the screen. When the app draws its interface, a linear system of equations gets solved, and the solution is the best way to place all the objects on the screen given the rules that were defined. This gives you an extremely powerful design tool for specifying your layout for arbitrary screen sizes.
I very much hope they do the same for CSS sometime soon.
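To make the idea concrete, here's a toy version of the kind of linear system involved -- two equal-width views in a 320pt-wide superview with 20pt margins and 10pt spacing. (Real Auto Layout uses the Cassowary solver with inequalities and priorities; this is just the plain-equations flavor of the idea.)

    import numpy as np

    # Unknowns: x = [left1, width1, left2, width2]
    #   left1               = 20          (leading margin)
    #   left1 + width1 + 10 = left2       (spacing between the two views)
    #   left2 + width2      = 320 - 20    (trailing margin)
    #   width1              = width2      (equal widths)
    A = np.array([
        [1, 0,  0,  0],
        [1, 1, -1,  0],
        [0, 0,  1,  1],
        [0, 1,  0, -1],
    ], dtype=float)
    b = np.array([20, -10, 300, 0], dtype=float)

    left1, width1, left2, width2 = np.linalg.solve(A, b)
    print(left1, width1, left2, width2)  # 20.0 135.0 165.0 135.0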
> Not even most of the people who studied Computer Science know the difference between a function and the graph of a function.
I am surprised by how often I hear this example. Not because people can't define the graph of a function, but because in usual set-theoretic terms[1] a function and its graph are the same thing. They are both the set of points (x, f(x)) where x ranges over the domain of f. This is an identification that isn't even made by most mathematicians, as they usually denote the graph of a function as Γ(f), implying it is different from f itself as a function. Of course they know the equivalence, but prefer to separate the objects notationally for clarity's sake.
[1]: I'm ignoring the notion of a function in category theory, since CS students who are taught math are primarily taught naive set theory, not category theory. Of course, when you aren't working in a category of sets then the graph of a function doesn't make sense, I think.
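Written out, the two definitions under discussion (standard set theory, nothing beyond what's stated above):

    % A function f from X to Y is itself a set of ordered pairs:
    f \subseteq X \times Y \quad\text{with}\quad \forall x \in X\ \exists!\, y \in Y : (x, y) \in f
    % and its graph is, by definition, the same set:
    \Gamma(f) = \{\, (x, f(x)) : x \in X \,\} = f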
> I am surprised by how often I hear this example.
Huh? That surprises me, too. I didn't know it was so common.
Anyway, I agree that there is an equivalence between a function and its graph. Knowing one implies you know the other. But just because it's equivalent doesn't mean it's the same. A two-dimensional graph is, after all, a set of tuples in R^2. A function, however, is a mapping from an element of one set to an element of another set.
Mathematically they're very different. I can't draw functions, I can only draw sets. I can't apply sets, I can only apply functions. It makes sense to treat them as different things.
Actually, vector graphics are mathematically specified by graphs of functions, not pixel sets. This makes it possible to zoom in without pixelation, since the function in the graph can be defined analytically (like Bezier curves). If you started out with sets instead of functions, you'd get pixel graphics, since it's not possible to save an infinite set. So the equivalence of a function and its graph holds until you try to implement something.
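For instance, as a toy sketch: a cubic Bezier segment is stored as four control points plus an evaluation rule, and the "infinite set" of points is produced on demand at whatever resolution you need:

    def cubic_bezier(p0, p1, p2, p3, t):
        """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
        u = 1 - t
        return tuple(u**3 * a + 3 * u**2 * t * b + 3 * u * t**2 * c + t**3 * d
                     for a, b, c, d in zip(p0, p1, p2, p3))

    # The same four control points can be sampled at 10 points or 10,000,
    # which is why zooming in never pixelates.
    curve = [cubic_bezier((0, 0), (0, 1), (1, 1), (1, 0), i / 9) for i in range(10)]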
For whatever reason, I hear it used as a "challenge" to try to show someone they don't understand math.
But again, I would argue that you're ignoring the difference between an object and its representation for the purpose of communication. A function is defined to be a set of tuples (a relation) with some extra properties. These tuples are the input-output pairs of the function, which is exactly how you define the graph of a function. They are literally the same object, regardless of whether you choose to represent them as a picture or a mapping. The choice of language suggests they are different things because you want to treat them differently (and this is a good thing), but when you get down to the definitions they're the same.
Your example of vector graphics is not a counterexample to this because the sets that define those functions are uncountably infinite. When you render them on the screen you have to approximate them by a finite pixel set, but that is unrelated to the underlying function's mathematical nature.
These set-theoretic definitions of functions and relations are standard parts of an introductory course on set theory.
I really like the analysis of finding versus maximizing product cycles. There are a few aspects of "Moral authority" that I think have been left out.
1. The rank and file know (or at least think) that a professional CEO is less likely than a founding CEO to be around in 5 years. So when a new professional CEO says "Jump" it's much harder to get people to change their way of doing things, particularly if they suspect the next CEO will undo all the changes. When a founding CEO says "Jump," you might as well get with the program because the change is likely to be permanent. (For the economists in the audience, this is a version of the Lucas critique applied to organizations.)
2. The "knowledge pyramid" isn't just about the CEO's epistemological state and decision-making apparatus; it's also about communicating the values and identity of the company back to the company. Founding CEOs are in a superior rhetorical position because they can say "I hired you all because each of you ________", and give everyone the warm fuzzies. The professional CEO on the other hand has to speculate or impute motivations to the previous CEOs, which is less motivational. As a recent example, I think Satya Nadella has done a relatively poor job of communicating what makes Microsoft employees different from other employees; instead of saying, "You guys understand the full stack better than anyone" (or something), he can't help but to talk about market opportunities and cloud-first productivity blah blah blah. (Which is to say -- not only was he in an inevitably weakened rhetorical position with respect to BillG, he squandered it when it came to articulating the company's identity.)
3. Founding CEOs are in a better rhetorical position when it comes to navigating value conflicts between quarterly earnings and something else ("changing the world"). When communicating with immediate subordinates they can do a kind of good-cop bad-cop routine with the board of directors ("The board really wanted X, but I thought that'd be bad for customers, so we're doing Y."), whereas a board-picked CEO will be assumed to act only in the interest of the stock price. For many companies (e.g. organizations that profess to be on some kind of higher mission) this sort of CEO will be rather uninspiring, and therefore the professional CEO will be less capable of effecting change throughout the organization.
4. Professional CEOs have to navigate a more precarious political situation because there's a good chance that one or more subordinates want the CEO's job. (I would hazard to guess that insiders usually replace outsiders and outsiders usually replace insiders.)
One thing this article leaves out is a deeper analysis of why the conventional wisdom is to replace a founder with a professional when there are so many famous counterexamples. It could be that at a certain stage in a company's growth investors prefer to be more risk-averse and will give up a potential grand slam in order to get a two-run double, or something. (And by extension the OP is willing to take larger risks than the average VC.) Or that there are hidden social dynamics at play.
>One thing this article leaves out is a deeper analysis of why the conventional wisdom is to replace a founder with a professional when there are so many famous counterexamples. It could be that at a certain stage in a company's growth investors prefer to be more risk-averse and will give up a potential grand slam in order to get a two-run double, or something. (And by extension the OP is willing to take larger risks than the average VC.) Or that there are hidden social dynamics at play.
I think one of the reasons this conventional wisdom is becoming less common is that 'technology startup' is now more likely to mean a company with a product and customers, whereas in the past it frequently meant a company that had brought a new technology to the commercialisation stage.
In companies of the latter type, it can make sense to move the founding CEO to a CTO role because that has been their actual role in the development phase anyway. Their skillset may not match well with running a company with marketing and sales departments and getting the product sold.
Companies of the former type, like Facebook, are different because by the time they get real traction, they are already dealing with customers and are out there in the market. They have those skillsets by definition because they're already doing that.
VirtualHostX was my first app, the one that grew slowly over a few years and then took off. Hostbuddy is a small, companion app that does moderately well. Hobo is my newest, still in beta, that I hope to kick off in February.
I received an interesting reply over email which I wanted to share here. With the author's permission I've reproduced it below but blanked out some details:
> I (or we as a company) faced a pretty similar situation just a few weeks ago.
> The app is called ______, and it was our first app ever, our child - the app that constituted our company brand and is still pretty useful for many small business owners (_______ is an invoicing app, making something like $20k/year on our domestic market).
> Unfortunately(?) business is business, so finally I decided to decline it. Now we're making some final touches and will release it as an open source project - again facing similar problems - the codebase is almost 6 years old, the app is not trivial, the build procedure is not a single click, etc.
> Besides all those risks and problems, I still believe that opening the source code is worth doing. That way we can help other (less advanced) programmers start their own Mac products/businesses. I'm sure that you'll agree that after a certain point you need to look inside something bigger than a trivial app from an examples folder, something that is/was a real thing, something 'alive'. That's IMHO a single priceless source of practical knowledge.