What does "ratio (bpb)" mean? I'd guess bytes-per-byte or something, like how many bytes of original you get for each byte of compression, but it doesn't work out: the original size is 1e9 bytes, compressed (rounded) 3.2e8, so that's a ratio of 3.1 (1e9/3.2e8=3.09989). The program size amounts to a rounding error on that figure. The bpb value given is 2.58, nowhere near 3.1.
Edit: the paper defines it as "bits per input byte". What kinda measure is that, it's like "how well did it compress as compared to a factor 8", why 8?!
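For what it's worth, the two figures describe the same measurement in different units: bits per input byte is just 8 divided by the byte-for-byte ratio. A rough sketch of the arithmetic, using only the numbers quoted above (the function names are made up for illustration, and the program size is ignored as a rounding error):

```python
def bits_per_byte(original_bytes: float, compressed_bytes: float) -> float:
    """Compressed size in bits, divided by the original size in bytes."""
    return 8 * compressed_bytes / original_bytes


def bpb_from_ratio(ratio: float) -> float:
    """Convert an original/compressed size ratio into bits per input byte."""
    return 8 / ratio


# The unrounded ratio from the comment reproduces the reported value.
print(round(bpb_from_ratio(3.09989), 2))  # 2.58

# The rounded compressed size (3.2e8 bytes) lands slightly lower.
print(round(bits_per_byte(1_000_000_000, 320_000_000), 2))  # 2.56
```

So 8 bpb means no compression at all, and lower is better.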
The bit is the most fundamental unit of information. A base-e unit might be more elegant from a certain mathematical perspective, but the connections to formal logic and the ease of implementation make the base-2 bit the natural choice. At least when talking about things like information, entropy, and compression.
Bytes, on the other hand, are entirely arbitrary. At some point, the industry converged on using groups of 8 bits as the primary semantically meaningful unit smaller than a word. Probably because people at that time thought that having 256 distinct characters would be more or less the right choice. And because groups of power-of-2 bits are convenient at the hardware level.
Entropy is usually expressed as bits per symbol (or bits per character), because that's what you get when you sum -P(c) log P(c) over all symbols c. People who are used to that convention often extend it to representing compression ratios. Using bits per byte is rare, because bytes are rarely semantically meaningful.
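As a minimal sketch of that sum, here is the empirical entropy of a byte string in bits per symbol, treating each byte as a symbol and using base-2 logs. The function name is hypothetical, and this is only a zeroth-order estimate: it looks at symbol frequencies and ignores context, so a real compressor can do better than this number.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(data: bytes) -> float:
    """Sum of -P(c) * log2(P(c)) over the distinct symbols c in data."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return sum(-(k / n) * math.log2(k / n) for k in counts.values())


print(entropy_bits_per_symbol(b"abab"))            # 1.0
print(entropy_bits_per_symbol(bytes(range(256))))  # 8.0
```

Two equally likely symbols cost 1 bit each, and 256 equally likely byte values cost the full 8 bits, which is where the "compared to 8" baseline comes from.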
It's a common way to represent entropy (the information content). One could measure bits per x for any x, of course, but bits per character (née byte) is quite common and goes back to Shannon.
Deceptive habits like this give ML a really bad reputation and give me so little confidence that this technology will be used responsibly as ML becomes increasingly powerful.
Didn't Google use to incorporate this into their search rankings? There's no way they still do that today, right? Imagine how amazing it'd be if they suddenly turned on the button to downrank pages as a function of how many cookie banners and autoplaying videos etc. you have to click away to get to the content.
I can't tell if you're being sarcastic. Until a few weeks ago, both monkeypox and smallpox were listed as airborne diseases. That's been changed in public health docs since monkeypox started to spread, but not because the disease has changed. They're just trying to downplay it.
"Human-to-human transmission occurs through close proximity or direct physical contact (e.g., face-to-face, skin-to-skin, mouth-to-mouth, mouth-to-skin contact including during sex) with skin or mucous membranes that may have recognized or unrecognized infectious lesions such as mucocutaneous ulcers, respiratory droplets (and possibly short-range aerosols), or contact with contaminated materials (e.g., linens, bedding, electronics, clothing)."[1]
"Intubation and extubation, and any procedures likely to spread oral secretions should be performed in an airborne infection isolation room."[2]
Yes, it is, but IIRC the term "airborne" can also refer to disease particles that can survive in the air unencapsulated (such as certain fungi), and can therefore travel quite some distance, and can remain hanging in the air for hours.
Aerosols are heavier than air, and therefore have a very limited range and duration in which the virus can remain "airborne" in common parlance.
(edit: expanded the definition to include more than just viruses as I couldn't find an example of a virus that can survive unencapsulated)
"Shadowbanning" is when you do that without telling the user that they are banned, with the goal of them not realizing they are banned for a while, so they waste time instead of trying to circumvent the ban.
> Writers have a responsibility to ensure that they take reasonable steps to ensure the accuracy of their statements, and that those statements are clear.
What a crazy take. Writers can write whatever they want. A book of poetry or prose doesn't have to meet some HN rationalist level of accuracy and rigour.
Writers can do whatever they want, sure, but actions which make their intention harder to parse also make it harder to engage with their work. Since writers who publish to the public want others to read their work, it becomes their responsibility, in some capacity, to write in a way that makes their intended message comprehensible.
Said another way, there is no law that forces me to write in English. I could interleave uncommon French and German across this comment (borrowing terms from other languages is nothing new!).
But if I flexed language proficiency without regard to which words are commonly known, it may make my comment less accessible, and therefore you may choose not to engage with what I'm saying or respond. That might not be a problem for me, but regardless the decision is my responsibility.
The responsibility for clarity does not come from someone else - it is an extension of the decision to publish.
This does extend to prose and poetry by the way. Both are written to conventions within the disciplines and follow patterns. Those patterns are very different to ordinary speech but they are certainly there, and readers judge writers on those merits.
It isn't to say a writer is lesser as a person for writing out of convention, but the convention provides a framework to interpret the art. If the convention isn't broken for a meaningful reason, it can detract from the wider message.
I was initially annoyed by this title but now I'm gonna switch my perspective to being happy that ideas like this are floating around since it acts as a really cheap signal to tell if someone knows what they're talking about or not when it comes to ML.
Another great option is https://textsynth.com/playground.html (made by the very impressive developer Fabrice Bellard, of Linux-in-JavaScript and world-record pi digit calculation fame). He deserves some money funneled through that site for his efforts over the decades (and the output is about as good as GPT-3 imo).
If you copy a piece of GPL code but change the whitespace or some arbitrary number that produces the same output, does that still break the license (if you don't make the source public)?
I'm not sure what you mean by that, the GPL only has constraints for redistribution. If you don't distribute anything (publicly or not), there is no way to break the GPL.
Assuming the code you modify is protectible in the first place, then yes, this sort of trivial modification or derivative would be covered by GPL and have to abide by its rules.
But just because you slap a license on something doesn't mean you can actually enforce it.