Warning: This is AI generated, probably by a low-end model, as some of the content is outright nonsense, e.g.:
"""
concept of MoE is quite prevalent (refer Outrageously Large Neural Networks: the Sparsely-Gated Mixture-of-Experts Layer), with Langchain’s high-level implementation of an LLMRouterChain, and notable low-level integrated examples
"""
GPT-4 was clearly trained to fix typos and handle poorly written requests. That much is visible just from using it in the ChatGPT UI in normal usage, and it fits common user scenarios (e.g. "fix my bad draft"). We know it was trained on social media data from Reddit, much of which is not great writing either. Now I'm wondering if it was trained on (imperfectly) OCRed data too...
I wonder if it's more of an emergent property you get for free with LLMs rather than something that needs specific training. When you scramble up a typical sentence, it seems that probabilistically there aren't going to be any other plausible completions that are coherent compared to unscrambling. It's basically unscrambling vs. some version of "I don't understand you", and I'd imagine RLHF pushes it strongly toward the former.
I haven't read the paper so I'm not sure if they did this, but it would be interesting to see at what point it breaks down. Just scrambling up letters within words makes it pretty easy for the LLM; what if you also start moving letters between words, or take out the spaces between words?
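For what it's worth, those levels of scrambling are easy to generate if anyone wants to try. A rough Python sketch (the function names and the three levels are my own, not from the paper):

    import random

    random.seed(0)

    def scramble_within_words(text):
        # Level 1: shuffle the interior letters of each word, keeping first/last fixed.
        def shuffle_word(w):
            if len(w) <= 3:
                return w
            mid = list(w[1:-1])
            random.shuffle(mid)
            return w[0] + "".join(mid) + w[-1]
        return " ".join(shuffle_word(w) for w in text.split())

    def scramble_across_words(text):
        # Level 2: shuffle all letters across the sentence, keeping word lengths.
        words = text.split()
        letters = list("".join(words))
        random.shuffle(letters)
        out, i = [], 0
        for w in words:
            out.append("".join(letters[i:i + len(w)]))
            i += len(w)
        return " ".join(out)

    def remove_spaces(text):
        # Level 3: drop word boundaries entirely.
        return text.replace(" ", "")

    s = "the quick brown fox jumps over the lazy dog"
    for fn in (scramble_within_words, scramble_across_words, remove_spaces):
        print(fn.__name__, "->", fn(s))

My guess is an LLM handles level 1 almost perfectly and degrades somewhere between levels 2 and 3, but that's exactly the breakdown point I'd want the paper to measure.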
> Now I'm wondering if it was trained on (imperfectly) OCRed data too...
Or perhaps they inserted typos automatically into the training set as data augmentation. Tactics like that are known to increase the robustness of some models, so why not?
Yup, totally plausible. Things like word (token) dropout, inserting random uniform noise into embeddings, or just edit-distance perturbations to the tokens are all well known, but Figure 1 still looks extremely impressive.
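As a concrete illustration of that kind of augmentation, here's a minimal sketch; the keyboard-neighbor map and probabilities are made up for the example, not from any known training recipe:

    import random

    random.seed(42)

    # Tiny illustrative subset of a QWERTY adjacency map.
    KEYBOARD_NEIGHBORS = {
        "a": "qws", "e": "wrd", "o": "ipl", "t": "rgy", "n": "bhm",
    }

    def insert_typos(text, p=0.1):
        # Randomly delete, substitute, or transpose characters with probability p.
        out = []
        for ch in text:
            r = random.random()
            if r < p / 3:
                continue  # deletion
            if r < 2 * p / 3 and ch in KEYBOARD_NEIGHBORS:
                out.append(random.choice(KEYBOARD_NEIGHBORS[ch]))  # fat-finger substitution
            elif r < p and out:
                prev = out.pop()
                out.extend([ch, prev])  # transposition with the previous character
            else:
                out.append(ch)
        return "".join(out)

    def word_dropout(text, p=0.1):
        # Drop whole words with probability p (token-level dropout).
        return " ".join(w for w in text.split() if random.random() >= p)

    clean = "please fix the spelling in this draft before sending"
    print(insert_typos(clean))
    print(word_dropout(clean))

Sprinkling perturbations like these over even a fraction of the training data would plausibly teach a model to map noisy input back to clean output.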
It is trained on data which may include typos, but that is very different from fixing typos. It knows what words likely come after typos in the same way it knows what words likely come after regular words.
No, that's not what I meant. I meant that in its reinforcement learning phase, GPT saw examples of "fix this text" style requests and was rewarded for doing a good job. That's different from seeing examples of typos and still predicting the right word, which happens during the self-supervised language model training. Both likely help it be good at this.
Transformer is the architecture. "Generative Pretrained" is just a term the author made up for what everyone has called "Language Modelling" for decades and will keep calling for decades to come. It was just a new way of saying "Language Modelling Transformer" that sounded cooler and gave the model cool initials. Coming up with cool names for models is hard.
Yikes! Good luck, I hope it works out. This is scary. Gmail is still, by far, the best email app. Something like this could happen to me and, I suspect, a lot of other people.
These ToS, CSAM, moderation, etc. issues are orthogonal to the app itself. Why conflate them? Most people outside Hacker News are completely unaware of them, so they won't be a factor in their choice of platform. Are we even sure other apps aren't doing similar things, or don't have equally dangerous practices around password changes?
But after switching to Fastmail, I no longer agree that it is "by far" the best. Now I think Gmail is only better by a slight margin, and that margin is so small that it does not justify the drawbacks: potentially getting locked out with no recourse, certainly getting everything you receive scanned to deliver you the best possible ads, and contributing to the email monopoly where the big players decide the protocols.
I know it seems low-tech, but it's a good idea to print out the ten Gmail recovery codes and keep them in a safe place. I've done the same for GitHub too.
I find this opinion absurd. It is an e-mail client. Even if it did perform the tasks that an e-mail client needs to perform somewhat better than all other clients, the loss of autonomy makes it a terrible deal, even for non-technical users.
I don't know what being in the "tools for thought space" means, but if it just means "Roam or similar user", I can give my personal answer: I set up 2 or 3 templates when I started. The most-used one I've updated once since then. That's it. The time I spent doing this is essentially 0% of my Roam use time.
I can see how someone who checks out blogs/YouTube channels about Roam-like tools might get the wrong impression, but those are usually a business, so they have to output content.
It also comes down to personality; I suspect some people have a tendency to optimize tooling as a form of procrastination.
Must've been quite the statistical masterpiece to disentangle it from the World Wars and the Great Depression. Of course, extrapolating conclusions from early-20th-century education and career progression to the 21st century will be an even greater achievement.
In addition, I think it would be very hard to separate the effect of school closures from the effect of kids losing their parents to the flu. Those things might have traded against one another: More school closures, fewer parents catching the flu from their kids and dying or being incapacitated for a long time.
Research shows that doing the recall is the important part; the effect is by now well accepted. Of course, if you make the cards yourself, you can guarantee both their quality and their relevance to you. I've heard of research suggesting that learning things that are relevant to you helps. (If you are learning Spanish to speak with a relative or friend, a flashcard with an obscure literary word is probably more annoying than helpful.) Early episodes of the Learning Scientists podcast talk about using vs. creating flashcards if you want to know more.
I would if I could: I have an external GPU at home. Unfortunately, Apple is (not without reason) angry at Nvidia, so they dropped support for Nvidia in macOS. I'd have to use Windows, which is a big no-no for me. Obviously PyTorch can't support it.