Hacker News | willis77's comments

The fine folks at <checks notes> FOPEAS would never tarnish their good name by stooping to such a stunt. I mean, we might expect such shenanigans from the likes of SMURGBLOZ, KINSURGE, or GSIROOZ, but not FOPEAS, fine purveyors of `FOPEAS an AI Language Model I do not Have Access to The Context of The SFD You are referring to. Can You Please Provide me with More Information so That I can Assist You Better`


it's a shocking accusation, truly.


The year is 1900. College-bound nephew says "what's the point of a physics major anymore?" I asked "what do you mean?" and he goes "Newton's Laws".


Each year Kaggle creates a for-fun, Santa-themed optimization problem. This year's is about controlling a robotic arm to efficiently print cards.


The digits of pi contain every pdf that ever could and ever will exist.


"Find the earliest valid pdf in consecutive digits of pi"


I mean, the answer is trivially zero: there exists a PDF-like structure somewhere in pi, and its offset doesn't have to be zero; it can start or end anywhere. So the range [0, N] is a valid PDF.


"Find the last byte of the first valid PDF in the binary digits of Pi"


Since the PDF also doesn't have to be end-aligned, the answer is trivially [0, infinity].

The first place a valid PDF could be ended, perhaps.


A pdf at [0, N] sorts before the one at [0, N+1], by "first valid pdf".


No, both start at 0. Also, [0, infinity] and [0, infinity+1] are the same thing.



your example fails to satisfy the invariant. 11 is less than infinity.

you're just pasting random python snippets at me now. It's time to move on.

again, just to summarize: PDF files do not have to be zero aligned, and they do not have to be end aligned. Therefore the answer to the question "what is the first segment of Pi that is a valid PDF file" is trivially (0,infinity). That is a correct statement. The non-greedy (in the regex sense) answer to that question will be different, however.


Why is this so hard? If the tuple (0,10) represents the range of a valid pdf, then the next tuple (0,11) is also a valid pdf. Or any after it up to and including (0,infinity).

Note the word "next", implying that (0,10) sorts before (0,11); you even say it yourself "11 is less than infinity". Where I'm from "first" and "less" are related (the first element in a unique sorted list is defined to be less than all other elements). So if there is any valid pdf in pi that can be identified by the range tuple (0,N), then the first valid pdf must occur before N -> infinity. Therefore (0,infinity) can never be the first valid pdf, even though it may be a valid pdf.

Maybe a picture would help:

    Potential pdf file ranges in pi: [(0,0),(0,1),(0,2),(0,3),(0,4),...,(0,N-1),(0,N),(0,N+1),(0,N+2),...,(0,infinity)]
    Is it a valid pdf?                 no    no    no    no    no  (no)  no      yes   yes     yes   (yes) yes
    Which one is first?                                                          ^^^

I thought linking to a python script that shows the order comparison of a tuple (0,N) as less than the tuple (0,N+1) would clearly demonstrate this, but it appears to have failed to communicate that to you. We don't need non-greedy regex rules to do a less-than comparison.
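
For reference, the tuple ordering described above can be sketched in a few lines. This is only an illustration: the value of `N` and the list of candidate ranges are made up, and `float("inf")` stands in for "infinity".

```python
# Hypothetical candidate ranges (start, end) that are all assumed to be valid PDFs.
N = 10  # illustrative value only
INF = float("inf")
candidates = [(0, N + 2), (0, INF), (0, N), (0, N + 1)]

# Python compares tuples lexicographically: first element, then second.
ordered = sorted(candidates)
first = min(candidates)

print(ordered)  # shortest range first, (0, inf) last
print(first)
```

With equal start offsets, the range with the smaller end offset sorts first, so `(0, inf)` can only ever be the last entry in the sorted list.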


Please don't give them any ideas.. the whiteboard interview coding tests are hard enough as it is


How else will we weed out the fakers and people coasting for 10 years? Our CRUD SaaS app needs top people.


Doesn't sound like that hard of a question, given you are provided the structure of the PDF header. I guess it really comes down to substring search.
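
A rough sketch of that substring search, under loud assumptions: it only looks for the `%PDF-` header marker and the `%%EOF` trailer marker in a byte string, and it does not validate anything in between. The function name and the sample bytes are hypothetical, and producing the bytes of pi is left out entirely.

```python
from typing import Optional, Tuple

PDF_HEADER = b"%PDF-"   # marker that opens a PDF file
PDF_TRAILER = b"%%EOF"  # marker that closes a PDF file


def first_pdf_candidate(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first header..trailer span, or None.

    This is a marker search only; it is NOT a PDF validity check.
    """
    start = data.find(PDF_HEADER)
    if start == -1:
        return None
    trailer_at = data.find(PDF_TRAILER, start + len(PDF_HEADER))
    if trailer_at == -1:
        return None
    return (start, trailer_at + len(PDF_TRAILER))


# Made-up sample bytes standing in for a slice of some digit stream.
sample = b"xx%PDF-1.4 stuff %%EOFyy"
span = first_pdf_candidate(sample)
print(span, sample[span[0]:span[1]])
```

A real answer would still need a parser to decide whether the span between the two markers is a well-formed PDF.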


Imagine if it was a PDF that simply rendered the number 42.


If that happens we know for a fact that we are in a simulation


Well maybe. We don't know if pi is a normal number.


Actually it only needs to be a disjunctive (or rich) number which is a weaker condition.

We don't know whether pi is that either for any integer base.


> We don't know if pi is a normal number.

Sure we do. There are plenty of proofs out there that pi is an irrational number.


Irrational does not imply normal. For example, 1.01001000100001... is irrational but it's certainly not normal.


Technically, 1.01001000100001... can be normal depending on what ... stands for. :)


Well, obviously. But presumably the ... is meant to imply that this is the summation of 1/(10^(x(x+3)/2)).
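
A small sketch of why that number is not normal, assuming the reading above (a 1 at decimal position x(x+3)/2 for x = 1, 2, 3, ... and 0 everywhere else). The function name is made up for illustration.

```python
from typing import List


def fractional_digits(n: int) -> List[int]:
    """First n digits after the decimal point of 1.01001000100001..."""
    digits = [0] * n
    x = 1
    # The x-th one sits at 1-indexed position x(x+3)/2: 2, 5, 9, 14, ...
    while x * (x + 3) // 2 <= n:
        digits[x * (x + 3) // 2 - 1] = 1
        x += 1
    return digits


head = fractional_digits(14)
print("".join(map(str, head)))

# The share of 1s shrinks as more digits are taken, and 2..9 never appear.
for n in (100, 10_000, 1_000_000):
    print(n, sum(fractional_digits(n)) / n)
```

In a number that is normal in base 10, every digit would approach a frequency of 1/10; here the frequency of 1 tends to zero and the digits 2 through 9 have frequency zero.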


Or what 1 or 0 or . stands for.


Actually I'd argue the example you provided is normal, as long as you authorise a particular encoding where every number n you're looking for is encoded as a string of n zeros.

It's then trivial to see that every number you can think of is encoded in there, and therefore any data, piece of music or movie that ever existed.

(I'm not sure we're allowed to fiddle with the encoding, but since we allow ourselves to represent a piece of music into a number, we're already talking about encoding anyway, so it doesn't seem like cheating to me...)


Normality of a number is with respect to number bases, so your trick with encoding is invalid. Otherwise, every computable number could be considered normal: take an algorithm for generating it, supply a random string (this is the encoding), disregard the random string, and you have a perfectly valid normal representation of your number. So it is cheating.


I agree that normality is a specific formalized concept, but you could always require that an encoding function like this is injective.


Encoding doesn't count. Normality is a very specific mathematical concept: https://en.wikipedia.org/wiki/Normal_number

Also, 1.01001000100001... is a good example of a number that is both irrational and transcendental but not normal.


Normal in this sense means that the frequencies of all digits approach a uniform distribution as the length of the sample increases towards infinity. Basically, if we could see "all of" π and count all the 0s, 1s, 2s, 3s, &c. to 9, all the counts would be equal.


That on its own can't be right, because 0.12345678901234..... would then count as normal, even though it just repeats.

According to Wikipedia, you gave a definition for "simply normal", and for normal numbers the distribution of any sequence of digits is uniform. So 00, 01, ..., 99 each occur uniformly too.
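
A quick sketch of that distinction, using a finite prefix of 0.1234567890 repeating as a stand-in for the infinite expansion (the helper name is made up):

```python
from collections import Counter


def block_counts(digits: str, k: int) -> Counter:
    """Count every length-k window of consecutive digits."""
    return Counter(digits[i:i + k] for i in range(len(digits) - k + 1))


prefix = "1234567890" * 100  # first 1000 digits of 0.12345678901234...

singles = block_counts(prefix, 1)
pairs = block_counts(prefix, 2)

print(sorted(singles.items()))  # every digit appears equally often
print(len(pairs), "of 100 possible two-digit blocks ever appear")
```

Single digits are perfectly uniform, but only ten of the hundred two-digit blocks ever occur, so the number passes the single-digit test while failing the block test that normality requires.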


Moreover you need to consider it with regards to all other bases than 10 too.


Is this correct, mathematically?

I understand the point that PI contains every possible piece of information, theoretically.

However, the chance of finding a given string in PI depends on the string’s length. The longer the string, the more the probability tends to 0.

The paradox therefore is that PI contains every PDF, but you will never find them, so in what sense does it really contain them at all?


No, all strings theoretically exist in 𝛑 given enough digits, so longer strings don't reduce probability of existence, they just mean that it will take more digits to find them.
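
A back-of-the-envelope sketch of that trade-off. It rests on assumptions that are not established for pi: digits are treated as independent and uniformly random, and overlapping windows are treated as independent, which is only an approximation. The function name is hypothetical.

```python
def approx_hit_probability(k: int, n: int, base: int = 10) -> float:
    """Heuristic chance that one fixed k-digit string shows up in n random digits.

    Assumes independent uniform digits and independent windows (an approximation).
    """
    windows = max(n - k + 1, 0)  # number of places the string could start
    p_window = base ** -k        # chance that one given window matches
    return 1.0 - (1.0 - p_window) ** windows


# For a fixed string length, more digits push the estimate towards 1.
for n in (10**2, 10**4, 10**6):
    print(3, n, approx_hit_probability(3, n))

# For a fixed number of digits, longer strings are less likely to be found.
for k in (3, 6, 9):
    print(k, 10**4, approx_hit_probability(k, 10**4))
```

Under this model the length of the string sets how many digits are needed (on the order of base**k), not whether the string can appear at all.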


See Borel-Cantelli lemma.


I looked this up but I’m not sure I grasp your point.

Are you saying that:

- given a long string, we might ask “can this string be found in PI?”

- the probability of finding a long string in PI is infinitely small

- the number of possible strings in PI is infinitely large

- it’s not possible to decide if the answer is yes or no?


If a tree falls in the forest and no one is around to hear it, does it make a sound? Or, a modern take: if a disease has no symptoms, is it really a disease?


<citation needed>

Including a PDF that generates the digits of pi


actually, if you find the citation, let me know, you might be in for an award


I'm not sure that's necessarily true. It is true (at least with a non-constructive proof) that if you pick a 'random' real number then it contains all possible PDFs with probability one (or that the set of numbers for which this is not true has Lebesgue measure zero). But I'm not sure it's known that pi has this property.


Pi is thought to be normal but it hasn't been proven yet, so we can't say that for sure, but it's likely true.


I don't think that is a proven fact.


Since a PDF can begin with non-PDF content, then pi itself is a valid PDF file.


Any dog ever would gladly trade a few zaps for a wide open place to explore outside. Cruelty is depriving dogs of exercise and outdoor time and stimulation and the chance to run, not invisible fences.


My anecdata shows one amongst 4 dogs who I used a shock collar on wanted to get outside regardless. And he was aptly named Loki for a reason. So no, not 'any dog', not by a large stretch.

You're making assumptions that I didn't exercise or allow my dog(s) outdoor time, whether it was daily visits to the dog park or visits to the beach or the hinterland on the weekend. You're wrong to assume that the dogs, whether adopted or fostered, were ever deprived of anything. Shock collars are barbaric but they are a means to an end, and an effective one once all options are exhausted.

Plus, letting your dogs roam free in a country like Australia would come at a significant cost to the local wildlife, where wild dogs and cats are pests, and not all of us live on farms. I'm not sure what you would have suggested for the dogs under my care. Or is your opinion here almost entirely biased?


Same issues here. I dread having to unplug or shutdown my MBP connected to 2 external monitors because it means I'll have to reconfigure the displays.


As of 2015, there were 9 car models without recorded deaths - https://www.nbcnews.com/business/autos/record-9-models-have-...

I'm not sure how statistically significant the numbers are, but it's something!


More complete driver death rate statistics can be found here.

http://www.iihs.org/iihs/topics/driver-death-rates


If I'm reading this correctly, those cars have no "driver deaths", which means that no one has died while driving those cars. This is a bit like a shotgun, which has potentially killed many people, but has not killed its owner.


Imagine if everyone was only allowed to drive those 9 car models.


Then there would be deaths. The stats are skewed significantly towards vehicles of which there are more, so you'd always have to normalize this figure by how many vehicles of that type there were to begin with and, to be even more accurate, by how many passenger miles were driven with those vehicles.
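
A sketch of that normalization with made-up numbers. Both models and their counts are hypothetical, and the zero-deaths bound uses a simple Poisson assumption (the "rule of three" style estimate, roughly 3 divided by the exposure).

```python
import math


def deaths_per_million(deaths: int, vehicle_years: float) -> float:
    """Observed deaths per million registered vehicle years."""
    return deaths / vehicle_years * 1_000_000


def zero_death_upper_bound(vehicle_years: float, confidence: float = 0.95) -> float:
    """Upper bound on the true rate (per million) when zero deaths were observed.

    Assumes deaths follow a Poisson process; illustrative only.
    """
    return -math.log(1.0 - confidence) / vehicle_years * 1_000_000


# Hypothetical data: a rare model with no deaths, a common model with some.
rare_model_years = 50_000
common_model_rate = deaths_per_million(40, 2_000_000)

print(common_model_rate)
print(zero_death_upper_bound(rare_model_years))
```

With these invented figures, the rare model's zero is still compatible with a true rate above the common model's observed rate, which is why exposure matters when reading a "no recorded deaths" list.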


I'm a pedestrian. I don't drive. I'm sure the cars don't have air bags on their bumpers.


Actually some Volvos do have pedestrian air bags!

(But your point still stands, of course.)


Anyone here have thoughts on why, all these years later, Amazon still doesn't have a sort option along the lines of these proposals? It seems like such an easy win and an easy technical change. Do they have some business reason not to change their default sort?


I'm not sure what you mean -- could you elaborate?

Amazon probably doesn't use straight score averaging to decide its "best" items sort, and these are just proposals for how to improve a sort by not relying on plain averages. So what is it you're looking for Amazon to add?

Disclaimer: work at Amazon, not on anything search related.


Amazon has the default "Featured" sort (I'm not sure what is behind this, but it intuitively seems like some combination of popularity + availability + rating). If this default doesn't fit your needs, your only option is to change to sort by "Avg. Customer Review", which gets you a list that is sorted by average rating regardless of the number of reviews. Evan called this out nearly 10 years ago in the post that OP's article mentioned - http://www.evanmiller.org/how-not-to-sort-by-average-rating..... The root problem is that one random obscure product with a single 5-star rating out-ranks something with 499 5-star ratings and 1 4-star rating.

I'm often looking for what is the best/highest-quality item in a category, meaning I want not just a high average, but a high average that is statistically meaningful. I'm just surprised Amazon hasn't offered a way to do that (and have read umpteen threads on HN in the past years expressing the same frustration).
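
One way to make "statistically meaningful" concrete is to rank by the lower bound of a Wilson score interval instead of the raw average. The sketch below is an illustration under assumptions, not a description of anything Amazon does: it collapses star ratings into "5 stars" versus "anything else", and the function name and the 95% z-value are choices made here.

```python
import math


def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a proportion.

    z = 1.96 corresponds to roughly 95% confidence (an assumed choice).
    """
    if total == 0:
        return 0.0
    p_hat = positive / total
    z2 = z * z
    centre = p_hat + z2 / (2 * total)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)


# The two products from the comment, counting only 5-star ratings as "positive".
obscure = wilson_lower_bound(1, 1)        # a single 5-star rating
popular = wilson_lower_bound(499, 500)    # 499 5-star ratings and one 4-star

print(round(obscure, 4), round(popular, 4))
```

The single-rating product has a perfect average but a low lower bound, so it drops well below the product with 500 ratings.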


....Default 'featured' sort?

When I go to Amazon.com and search, I see 'relevance' as my default, with 'featured', some price related ones, 'average', and 'new' as options. ('Featured' only seems to exist on some products, and be related to ads.)

Is it not the same for you?

-----

As for your main point (because I think that your complaint is still valid even with 'relevance' as the default), it sounds like what you want is a way to choose what factors are applied to your sort.

I'm not sure, but it seems likely that 'relevance' is doing more than just averaging, and so being able to select which parts you apply (e.g., only use a statistical notion of best, don't consider availability or shipping times) would cover your use case, right?

Well, you might want to be able to choose between a few models of 'best', but the real issue, the core need, is that you want control over the model that Amazon is using to sort what you see and to have some input on what that looks like. (And not just have 'lolsux' or 'Amznsort', to be a little glib.)

Gotta say, that actually sounds like a pretty reasonable ask. I'm not sure why it doesn't work that way, either.


Yeah, my above comment was not using a text search, hence no "relevant" option (i.e. if you just drilled down the department hierarchy to, say, the TVs department).

> the core need, is that you want control over the model that Amazon is using to sort what you see and to have some input on what that looks like

Indeed, but I'm not even looking to have that much granular control over it. I just want "sort by rating, but toss out all the obscure crap that has 1 or 2 ratings, because that rating is meaningless."
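
That simpler request amounts to a threshold filter followed by a sort. A minimal sketch, where the `Product` type, the sample data, and the cutoff of three ratings are all invented for illustration:

```python
from typing import List, NamedTuple


class Product(NamedTuple):
    name: str
    avg_rating: float
    num_ratings: int


def sort_by_rating(products: List[Product], min_ratings: int = 3) -> List[Product]:
    """Drop products with too few ratings, then sort by average, best first."""
    eligible = [p for p in products if p.num_ratings >= min_ratings]
    return sorted(eligible, key=lambda p: p.avg_rating, reverse=True)


# Hypothetical catalogue.
catalogue = [
    Product("obscure-tv", 5.0, 1),
    Product("popular-tv", 4.998, 500),
    Product("decent-tv", 4.2, 80),
]

ranked = sort_by_rating(catalogue)
print([p.name for p in ranked])
```

A hard cutoff is cruder than a confidence-based score, but it matches the "toss out anything with 1 or 2 ratings" behaviour being asked for.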


Nothing warms our icy, cold, statistical hearts quite like hearing that a randomly chosen person is near the median. <3


Unlike the statistician who has his legs in the freezer and his head in the oven but who is on average the right temperature.


I don't know who this James Mickens fellow is, but I like the cut of his jib.

