Remember seeing this a few years ago and love the idea of "zapier but for developers". Having just been building our Zapier integration, I think I'm even more of a fan of the concept. Zapier is so clicky and feels so limited. (and expensive if we were to encourage our customers to use it!)
Can I make an integration for others? Or is that stuff all done by your team?
We do a little work to create the shell of the app integrations, e.g. we add the logo, for OAuth apps we configure the details of the OAuth authorization / refresh process, etc. Then you can develop any sources or actions for the app and publish them for all Pipedream users.
It allows us to leverage domain expertise much more quickly and efficiently. Since we can extrapolate from just a handful of examples, we can "solve" the easy parts of the dataset and get to "the crux" of the labelling problems much faster. The app also allows data scientists to interact deeply with the data in a way that is productive and meaningful, as it forces you to think like a model. This ultimately leads to faster model development cycles.
In a way, the weak labels generated are just a nice bonus for us!
Just like to clarify that this goes beyond a rule-based system. Rules can get you pretty far[1] but this improves on that by intelligently discounting the bad rules using weak supervision techniques. The end result here is a pile of labeled data which you train your model on. The model trained on this data can generalise well beyond those labels.
[1]: Aside: working at Alexa, I was surprised that something like 80% of utterances were covered by rules rather than an ML model. People have learned to use Alexa for a small handful of things, and you can cover those fairly well with rules generated from phrase patterns and catalogs of nouns.
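To make the weak supervision idea concrete, here's a minimal sketch in the labeling-function style (think Snorkel): each rule votes or abstains, rules that tend to disagree with the consensus get down-weighted, and the weighted votes become the (noisy) training labels. All the rules, names and the reweighting scheme below are invented for illustration, not this product's actual API.

```python
# Minimal illustration of weak supervision with labeling functions.
# The rules, task, and reweighting scheme are invented for illustration;
# real label models (e.g. Snorkel-style) estimate rule accuracies properly.

ABSTAIN, NEG, POS = -1, 0, 1

def lf_contains_refund(text):   # rule: "refund" suggests a complaint
    return POS if "refund" in text.lower() else ABSTAIN

def lf_contains_thanks(text):   # rule: "thanks" suggests not a complaint
    return NEG if "thanks" in text.lower() else ABSTAIN

def lf_exclamation(text):       # noisy rule: "!" suggests a complaint
    return POS if "!" in text else ABSTAIN

LFS = [lf_contains_refund, lf_contains_thanks, lf_exclamation]

def weak_label(texts):
    # Start by trusting every rule equally, then down-weight rules that
    # disagree with the majority vote on examples where several rules fire.
    weights = [1.0] * len(LFS)
    for _ in range(3):  # a few reweighting passes
        for i, lf in enumerate(LFS):
            agree = total = 0
            for t in texts:
                votes = [f(t) for f in LFS if f(t) != ABSTAIN]
                if lf(t) == ABSTAIN or len(votes) < 2:
                    continue
                majority = max(set(votes), key=votes.count)
                total += 1
                agree += (lf(t) == majority)
            weights[i] = agree / total if total else 1.0
    # Weighted vote per example gives the (noisy) training labels.
    labels = []
    for t in texts:
        pos = sum(w for w, f in zip(weights, LFS) if f(t) == POS)
        neg = sum(w for w, f in zip(weights, LFS) if f(t) == NEG)
        labels.append(POS if pos > neg else NEG if neg > pos else ABSTAIN)
    return labels

print(weak_label(["I want a refund!", "thanks, all good", "great service!"]))
```

The point is that the model you then train on these labels can generalise past the rules themselves, which a pure rule engine can't.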
I have respect for Andrew Gelman, but this is a bad take.
1. This is presented as humans hard coding answers to the prompts. No way is that the full picture. If you try out his prompts the responses are fairly invariant to paraphrases. Hard coded answers don't scale like that.
2. What is actually happening is far more interesting and useful. I believe that OpenAI are using the InstructGPT algo (RL on top of the trained model) to improve the general model based on human preferences.
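To be concrete about point 2: per the InstructGPT paper, labelers compare pairs of model responses, a reward model is trained on those comparisons, and the base model is then fine-tuned with RL against that reward. Here's a toy sketch of just the pairwise preference loss; the features and the "labeler" below are synthetic stand-ins, not anything OpenAI-specific.

```python
import numpy as np

# Toy sketch of the reward-modeling step in RLHF (as described in the
# InstructGPT paper): labelers compare pairs of responses, and a scalar
# reward model is trained so the preferred response scores higher, via
#   loss = -log sigmoid(r(preferred) - r(rejected)).
# Everything below (features, dimensions, the "labeler") is toy data.

rng = np.random.default_rng(0)
dim = 8
true_w = rng.normal(size=dim)   # hidden stand-in for human preferences
w = np.zeros(dim)               # reward model to be learned: r(f) = w . f

pairs = []
for _ in range(200):
    a, b = rng.normal(size=dim), rng.normal(size=dim)
    # the "labeler" prefers whichever response the hidden preference favors
    pairs.append((a, b) if true_w @ a > true_w @ b else (b, a))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(300):
    grad = np.zeros(dim)
    for f_pref, f_rej in pairs:
        margin = w @ f_pref - w @ f_rej
        # gradient of -log sigmoid(margin): push preferred responses up
        grad += -(1.0 - sigmoid(margin)) * (f_pref - f_rej)
    w -= lr * grad / len(pairs)

# Sanity check: the learned reward should rank held-out pairs like the labeler.
test = [(rng.normal(size=dim), rng.normal(size=dim)) for _ in range(100)]
acc = np.mean([(w @ a > w @ b) == (true_w @ a > true_w @ b) for a, b in test])
print(f"agreement with 'labeler' preferences on held-out pairs: {acc:.2f}")
# The actual RL step (PPO against this reward, with a KL penalty back to the
# original model) is omitted here.
```

The key point is that the humans shape a reward signal that generalises, rather than writing answer strings into a lookup table.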
>This is presented as humans hard coding answers to the prompts. No way is that the full picture. If you try out his prompts the responses are fairly invariant to paraphrases. Hard coded answers don't scale like that.
It's presented as humans hard coding answers to some specific prompts.
I feel like this is mostly people reacting to the title instead of the entire post. The author's point is:
>In some sense this is all fine, it’s a sort of meta-learning where the components of the system include testers such as Gary Smith and those 40 contractors they hired through Upwork and ScaleAI. They can fix thousands of queries a day.
>On the other hand, there does seem something funny about GPT-3 presenting this shiny surface where you can send it any query and it gives you an answer, but under the hood there are a bunch of freelancers busily checking all the responses and rewriting them to make the computer look smart.
>It’s kinda like if someone were showing off some fancy car engine but the vehicle is actually being powered by some hidden hamster wheels. The organization of the process is itself impressive, but it’s not quite what is advertised.
>To be fair, OpenAI does state that “InstructGPT is then further fine-tuned on a dataset labeled by human labelers.” But this still seems misleading to me. It’s not just that the algorithm is fine-tuned on the dataset. It seems that these freelancers are being hired specifically to rewrite the output.
> If you try out his prompts the responses are fairly invariant to paraphrases. Hard coded answers don't scale like that.
This is discussed:
>> Smith first tried this out:
>> Should I start a campfire with a match or a bat?
>> And here was GPT-3’s response, which is pretty bad if you want an answer but kinda ok if you’re expecting the output of an autoregressive language model:
>> There is no definitive answer to this question, as it depends on the situation.
>> The next day, Smith tried again:
>> Should I start a campfire with a match or a bat?
>> And here’s what GPT-3 did this time:
>> You should start a campfire with a match.
>> Smith continues:
>> GPT-3’s reliance on labelers is confirmed by slight changes in the questions; for example,
>> Gary: Is it better to use a box or a match to start a fire?
>> GPT-3, March 19: There is no definitive answer to this question. It depends on a number of factors, including the type of wood you are trying to burn and the conditions of the environment.
> This is presented as humans hard coding answers to the prompts. No way is that the full picture...
This is something of a misrepresentation of what is being proposed here, which is actually essentially what you suggest: "OpenAI are using the InstructGPT algo (RL on top of the trained model) to improve the general model based on human preferences."
One of the things that makes GPT-3 intriguing and impressive is its generality. InstructGPT is the antithesis of that: its purpose is to introduce highly targeted influences on GPT-3's output in specific cases (and sometimes in very similar ones), and its use improves those outputs at the cost of diminishing performance elsewhere. Furthermore, if the output is being polished in cases like those presented here, that would impede a frank assessment of its capabilities.
It depends what stage you hardcode. It's similar to how you can say "ok Google, what time is it" in any voice and get a different time every run: the speech recognition is not hardcoded, speaking the time is not hardcoded, but the action is.
Likewise, they can plug holes here and there by manually tweaking answers. The fact that it's not an exact-prompt-to-exact-result rule doesn't make it less of a fixed rule.
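A made-up illustration of hardcoding at a different stage: the learned part maps any phrasing to an intent, and a plain lookup table supplies the canned response. None of the intents or handlers below come from a real assistant.

```python
from datetime import datetime

# Hypothetical sketch: the recognizer is learned, the action is a fixed rule.
def recognize_intent(utterance: str) -> str:
    # Stand-in for an ML intent classifier; here, a crude keyword match.
    text = utterance.lower()
    if "time" in text:
        return "ask_time"
    if "campfire" in text and "match" in text:
        return "campfire_question"
    return "unknown"

HANDLERS = {
    # Hardcoded behaviors, patched whenever a hole is found.
    "ask_time": lambda: datetime.now().strftime("It is %H:%M."),
    "campfire_question": lambda: "You should start a campfire with a match.",
    "unknown": lambda: "Sorry, I don't know.",
}

for u in ["what time is it", "Should I start a campfire with a match or a bat?"]:
    print(HANDLERS[recognize_intent(u)]())
```

The responses stay invariant to paraphrases because the paraphrase handling lives in the recognizer, yet the answer is still a fixed rule.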
It makes sense for GPT-3 to thoroughly explore a search space only after repeated and similar questions.
The answers to, "Why did Will Smith slap Chris Rock?" will be much different five seconds after the event compared to five days after. Of course you would expect the Academy Awards to be part of the answer five days later, because practically every news article would mention the venue.
Going even further, a simple (undergrad-level) language model would detect the nominative and accusative, so you might even get a correction as an answer if you ask, "Why did Chris Rock slap Will Smith?"
Seven thousand people might ask this same question, while nobody wonders what the best rugby ball chili recipe is. GPT-3 will never try to organically link those ideas unless people start asking!
I'd even venture that negative follow-up feedback is factored in. If your first reaction to an answer is, "That was WRONG, idiot!" this is useful info!
Then again, if a negative feedback function exists, adding a human to the loop should be simple (and effective).
-----
Is 40 a weak army? It depends on whether they are classifying questions randomly/sequentially or if they hammer away at the weakest points... grading Q/A pairs (pass/fail) based on a mix of high question importance and strong uncertainty of the answer.
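That prioritisation is easy to sketch: score each Q/A pair by importance times uncertainty and send only the top of the list to the graders. The fields, numbers and scoring rule here are made up to illustrate the idea.

```python
# Sketch of the "hammer the weakest points" idea: send graders the questions
# that are both frequently asked and answered with low confidence.
# The scoring rule, fields, and per-grader throughput are invented.

queue = [
    {"q": "Should I start a campfire with a match or a bat?",
     "asked_per_day": 900, "model_confidence": 0.35},
    {"q": "Best rugby ball chili recipe?",
     "asked_per_day": 1, "model_confidence": 0.20},
    {"q": "What is 2 + 2?",
     "asked_per_day": 5000, "model_confidence": 0.99},
]

def priority(item):
    importance = item["asked_per_day"]
    uncertainty = 1.0 - item["model_confidence"]
    return importance * uncertainty

daily_capacity = 40 * 100   # 40 contractors, ~100 gradings each (a guess)
to_review = sorted(queue, key=priority, reverse=True)[:daily_capacity]
for item in to_review:
    print(round(priority(item), 1), item["q"])
```

With that kind of triage, 40 people can cover a surprisingly large fraction of the queries users actually see.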
I agree. I suppose as an outsider learning about AI, first thoughts might be “wow look at all the things it can’t do”. But as someone who follows closely all I notice is how rapidly the list of things it can’t do is shrinking.
What are the risks of doing this? I would love to ramp up the nits for outside work, but presumably it's been limited to 500 nits for SDR for a reason.
> 1. If most of the screen is not full bright white, (e.g. white text on dark background), then LEDs will have plenty of time to cool down
Note that the display is not an OLED display, but a regular IPS LCD with local dimming zones for the backlight[1]. Thus only the dimming zones not covered by white text would get to cool off.
This also points to another downside of pushing up the nits: it will likely increase the bleed-through of the backlight, driving up the black level especially in white-text-on-black scenarios.
They added this 500 nits limitation to always leave room for HDR, so that SDR and HDR "coexist" properly. But it looks terrible imo; I think they failed on that front.
Also they want HDR to be a selling point. Not many phones take HDR videos and images. "It goes brighter" is what makes HDR pop.
What makes it look terrible? The majority of consumer displays max out at 300-400 nits. This is the whole point of HDR; webpages are not designed with “sunlight-white” as the intended background.
I don't know what the cause is (the iPhone camera or the display methodology), but whenever I watch an HDR video shot on my iPhone on the MacBook Pro XDR display it looks really unnatural and bad. HDR demo YouTube videos in full screen look fine, so maybe it's the iPhone, or it's the integration with the SDR UI, not sure.
This is not the right way to think about it. What nits are books designed for? Do books look bad when you read next to a window despite being well past 1,600 nits?
The comfortable amount of nits is entirely dependent on ambient lighting conditions.
Next to a window, 1600 nits can look dim.
What Apple is doing now looks terrible because SDR and HDR do not coexist well at all. Any HDR video, even one you wouldn't think would be "bright", blows up the screen while the SDR content next to it becomes hard to read. HDR and SDR should have a comparable average brightness level with only "highlights" going "brighter than bright", but that's not what they do: they make the entire video brighter. People want uniform brightness. If I want my entire screen to go bright, I can set the damn brightness myself.
> HDR and SDR should have a comparable average brightness level with only "highlights" going "brighter than bright"
This is exactly how it works. When I open a video and switch between 500 nits and 1600 nits in Display Preferences, everything looks exactly the same, except for the highlights in the video. Of course outdoors daylight scenes will have a higher average brightness.
There is no point to 'high dynamic range' if a bright sky is the same brightness as a sheet of paper, or your website background; it's supposed to represent real-life brightness and contrast in photography & video, and there is no reason for your graphical interfaces to reach those light levels, i.e. Slack's white background should not be close in brightness to a cloudy sky. If they didn't "limit" non-HDR content to 500 nits (again, already above the average monitor), how would it be possible to have HDR at all?
I don't see how the content could become harder to read, when the light output is exactly the same. Your description reminds me of a shitty Benq 'HDR400' monitor, which would artificially limit the light output for non-HDR content making it gray and dull. That is not the case with the mini-led macs or better HDR monitors.
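For what it's worth, here is a simplified sketch of why pinning SDR at 500 nits is exactly what makes HDR possible: SDR content is mapped into a fixed slice of the panel's range, and only HDR values above SDR reference white use the remaining headroom. The linear mapping below is a deliberate simplification (real pipelines use PQ/HLG transfer functions); the 500/1600 figures are the ones discussed above.

```python
# Toy illustration of SDR and HDR coexisting on a 1600-nit panel:
# SDR reference white is pinned at ~500 nits, and only HDR luminance above
# that uses the remaining headroom. Simplified: no PQ/HLG transfer curve.

PANEL_PEAK_NITS = 1600
SDR_WHITE_NITS = 500

def sdr_to_nits(signal: float) -> float:
    """signal in [0, 1]; SDR content never exceeds SDR reference white."""
    return max(0.0, min(signal, 1.0)) * SDR_WHITE_NITS

def hdr_to_nits(scene_nits: float) -> float:
    """HDR content carries absolute luminance; clip to the panel peak."""
    return max(0.0, min(scene_nits, PANEL_PEAK_NITS))

print(sdr_to_nits(1.0))    # 500.0  -> a white webpage background
print(hdr_to_nits(1600))   # 1600.0 -> a specular highlight in an HDR video
```

If SDR white were allowed to float up to the panel peak, there would be no headroom left for highlights and HDR would collapse back into SDR.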
Some examples of decent HDR - these look fine next to a browser or anything else on a MBP 14", and also in Windows with HDR on:
Not in my experience. I've seen iPhone videos that were entirely in the HDR range despite being indoor scenes in ambient lighting lit from a single window.
> There is no reason for your graphical interfaces to reach those light levels,
Again, I'm using Lunar right now and I find it useful sitting next to my window. Not sure how that can be considered "no reason". It's literally useful to me right now as I use it.
> how would it be possible to have HDR at all?
It doesn't have to be possible! I don't use HDR. I don't care that theoretically HDR content would clip. It's not on my screen.
I'd rather HDR just use the same brightness range as SDR. Let me choose myself where HDR maps compared to SDR. I really do not want HDR as it is implemented today.
> I don't see how the content could become harder to read,
My eyes adjust to the HDR average picture level, and then SDR appears too dim.
Some of these problems stem from bad HDR tone mapping. But regardless, I want to control where SDR white maps because my workspace is bright. And I enjoy well lit rooms.
I have a Samsung monitor that does this too, but it has an override in its menus. In normal mode it’s stupidly dim. HDR looks like shit, so I just want SDR with proper brightness.
It always looks washed out. Is it perhaps only for certain games or image/video editors that support it properly? Or does the support overall just suck?
I wrote this to try to clarify the space, as people often talk about different things with HITL. Some mean active learning, others mean 'workers in the loop', and researchers sometimes mean 'users in the loop'.
So, three main categories to HITL:
- HITL training -- e.g. active learning and interactive machine learning development
- workers in the loop -- the old school mechanical turk idea, but with a model only falling back to the worker when it's unsure
- users in the loop -- getting the user to steer the AI response, e.g. Smart Reply in gmail.
Although they're not applicable in all cases, we're starting to see far more companies adopt these approaches, as they solve several problems with AI.
For example, Amazon Alexa (worked there) can't do standard user-in-the-loop. With voice interaction, the user doesn't have the patience to be read out a list of options. However, it does get weaker signals, like when the user stops the current action ("alexa stop!"), and that informs the next action. Active learning is getting adopted there too: with millions of live utterances coming through, reducing annotation effort can be a huge cost saving.
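For the active-learning category, the core loop is small enough to sketch: the current model scores a pool of unlabeled examples, and only the ones it's least confident about go to annotators. The model, data and batch sizes below are placeholders, not any production setup.

```python
# Minimal uncertainty-sampling sketch of "HITL training" / active learning.
# Model, data, and batch size are placeholders, not a production setup.
from sklearn.linear_model import LogisticRegression
import numpy as np

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(20, 5))
y_labeled = (X_labeled[:, 0] > 0).astype(int)    # toy "ground truth" labels
X_pool = rng.normal(size=(1000, 5))              # unlabeled pool ("live utterances")

for round_ in range(3):
    model = LogisticRegression().fit(X_labeled, y_labeled)
    probs = model.predict_proba(X_pool)[:, 1]
    uncertainty = 1.0 - np.abs(probs - 0.5) * 2   # 1.0 at p=0.5, 0.0 at p=0 or 1
    ask = np.argsort(-uncertainty)[:10]           # send the 10 hardest to annotators
    new_labels = (X_pool[ask, 0] > 0).astype(int) # stand-in for human answers
    X_labeled = np.vstack([X_labeled, X_pool[ask]])
    y_labeled = np.concatenate([y_labeled, new_labels])
    X_pool = np.delete(X_pool, ask, axis=0)
    print(f"round {round_}: labeled {len(y_labeled)} examples")
```

The annotation budget goes where the model is least sure, which is where each label buys the most improvement.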
Not how I intended to kick off the discussion, but is anyone else seeing really messed up formatting? Like this https://ibb.co/5LF2fY0 (bit of a mare today getting ghost on a subdirectory...)
This does feel like a big part of the future. An AI coach, which expands on your text and tailors it to your style and to the p̵r̵e̵f̵e̵r̵e̵n̵c̵e̵s̵ optimized for persuasion of the recipient.
Does feedback fine tune or customise the model behind this?