I hope the competition will inspire people to make breakthroughs in the open, so I won't take any rights to the IP; instead, the winning solutions must use open source code and open weight models.
1) since we are creating a contamination-free version of SWE-bench (i.e., scraping a new test set after submissions are frozen), it is guaranteed that agents in this contest can't "cheat", i.e., models can't have trained on the benchmark and agents can't memorize answers.
2) as a general rule in life, don't cheat on things (not that there aren't exceptions)
(reposting from LocalLLaMA and lower down here) yep, that's true.
one of my goals is to inspire and honor those that work on open source AI. Those people tend to be motivated by things like impact and the excitement of being part of something big. i know that's how i always feel when i'm around Berkeley and get to meet or work with OG BSD hackers or the people who helped invent core internet protocols.
those people are doing this kind of OSS work and sharing it with the world anyway, without any cash prize. i think of this as a sort of thank you gift for them. and also a way to maybe convince a few people to explore that path who might not have otherwise.
yeah i agree. one of my goals is to inspire and honor those that work on open source AI.
people who work on open source tend to be motivated by things like impact and the excitement of being part of something bigger than themselves - at least that's how i always feel when i'm around Berkeley and get to meet or work with OG BSD hackers, people who helped invent core internet protocols, or the guys who invented RISC and, more recently, RISC-V
those people are going to do this kind of OSS work and share it with the world anyway, without any cash prize. i think of this as a sort of thank you gift for them. and also a way to maybe convince a few people to explore that path who might not have otherwise.
I tweeted this on stage at NeurIPS on Wednesday: "I'll give $1M to the first open source AI that gets 90% on this sweet new contamination-free version of SWE-bench - http://kprize.ai" - K Prize is a new Kaggle competition
I tried to contrast ht with `expect` in a couple of my other responses, but yeah, this is my sense too after looking briefly at `expect`: ht always transparently sets up a terminal for you under the hood and you interact with that, so you can always grab a screenshot of any terminal UI.
I don’t think `expect` is targeted at this use case (though I am only learning about `expect` right now so could be wrong)
andyk here. it's clear our readme is lacking use cases! adding some now. When we introduced ht on twitter I gave a little more context -- https://x.com/andykonwinski/status/1796589953205584234 -- but that should have been in the project readme.
Also, a few people are comparing ht to `expect`. I haven't used `expect` before, but it looks very cool. Their docs/readme seem only slightly more fleshed out than ours :-D
Looks like the main way to use expect is via:
spawn ...
expect ...
send ...
expect ...
etc.
so, the expect syntax seems targeted more towards testing, where you simultaneously get the output from the underlying binary and then check whether it's what you expect (thus the name, I guess). I can't tell whether there is a way to just get the current terminal "view" (aka a text screenshot) via an expect command.
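To make the spawn / expect / send dialogue concrete, here is a minimal sketch in Python using only the standard library. The `Dialogue` class, `CHILD` program and `run_demo` function are hypothetical illustrations, not part of `expect` or ht. The sketch assumes plain pipes rather than a pseudo-terminal (Tcl `expect` and Python's pexpect both use a pty), so programs that check `isatty` or draw full-screen UIs would not behave as they do in a real terminal. It also has no timeout, which real `expect` provides.

```python
import os
import re
import subprocess
import sys


class Dialogue:
    """Hypothetical expect-style helper: spawn a process, then alternate expect/send."""

    def __init__(self, argv):
        # spawn: start the child with stdin/stdout on plain pipes (NOT a pty)
        self.proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        self.buffer = b""

    def expect(self, pattern):
        # expect: keep reading output until the regex matches, then discard
        # everything up to the end of the match and return the matched text.
        # No timeout here; this blocks until a match or end of output.
        regex = re.compile(pattern.encode())
        while True:
            match = regex.search(self.buffer)
            if match:
                self.buffer = self.buffer[match.end():]
                return match.group(0).decode()
            chunk = os.read(self.proc.stdout.fileno(), 1024)
            if not chunk:
                raise EOFError(f"output ended before {pattern!r} appeared")
            self.buffer += chunk

    def send(self, text):
        # send: write text to the child's stdin as if it were typed
        self.proc.stdin.write(text.encode())
        self.proc.stdin.flush()

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


# A tiny interactive child program: prompt for a name, then greet.
CHILD = 'name = input("Name? ")\nprint(f"Hello, {name}!")'


def run_demo():
    # -u keeps the child's stdout unbuffered so the prompt arrives immediately
    session = Dialogue([sys.executable, "-u", "-c", CHILD])
    session.expect(r"Name\? ")
    session.send("andy\n")
    greeting = session.expect(r"Hello, \w+!")
    session.close()
    return greeting


if __name__ == "__main__":
    print(run_demo())  # prints: Hello, andy!
```

Note that every step matches against the output stream as it arrives; nothing in this pattern keeps a model of what the screen currently looks like.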
ht is more geared towards scripting (or otherwise programmatically accessing) the terminal as a UI (aka Terminal UI). So ht always runs a terminal for you and gives you access to the current terminal state. I need to try out expect myself, but from what I can tell, it doesn't seem to always transparently run a terminal for you.
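The difference between the output stream and "the current terminal state" is easiest to see with output that redraws itself. Below is a hypothetical, heavily simplified virtual screen in Python. It is not ht's implementation, only an illustration. It applies carriage return, line feed, `ESC[2J` (erase display) and `ESC[H` (cursor home) to a character grid, ignores scrolling and line wrapping, and prints any other escape sequence as plain text. A progress line rewritten with `\r` leaves every intermediate value in the raw stream, which is what expect-style matching sees, while the snapshot holds only what is on screen now.

```python
class Screen:
    """Hypothetical, heavily simplified virtual terminal (a text 'screenshot' model)."""

    def __init__(self, cols=80, rows=24):
        self.cols, self.rows = cols, rows
        self.row = self.col = 0
        self.clear()

    def clear(self):
        self.grid = [[" "] * self.cols for _ in range(self.rows)]

    def feed(self, data):
        i = 0
        while i < len(data):
            if data.startswith("\x1b[2J", i):    # ESC [ 2 J: erase the whole display
                self.clear()
                i += 4
            elif data.startswith("\x1b[H", i):   # ESC [ H: cursor to top-left corner
                self.row = self.col = 0
                i += 3
            else:
                self._put(data[i])
                i += 1

    def _put(self, ch):
        if ch == "\r":                           # carriage return: back to column 0
            self.col = 0
        elif ch == "\n":                         # line feed: down one row (no scrolling here)
            self.row = min(self.row + 1, self.rows - 1)
        elif self.col < self.cols:               # printable: overwrite the cell, advance
            self.grid[self.row][self.col] = ch
            self.col += 1
        # characters past the right edge are dropped; real terminals wrap instead

    def snapshot(self):
        # one string per row, trailing spaces and trailing blank rows removed
        lines = ["".join(cells).rstrip() for cells in self.grid]
        return "\n".join(lines).rstrip("\n")


if __name__ == "__main__":
    raw = "Loading 10%\rLoading 55%\rLoading 100%\r\nDone"
    screen = Screen()
    screen.feed(raw)
    print(repr(raw))          # the stream still contains every intermediate state
    print(screen.snapshot())  # the "view" shows only the final state
```

Real terminal emulators interpret many more control sequences than this, which is part of why scripting a full-screen TUI by matching against the raw stream tends to get convoluted.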
There might already be some other existing tool that overlaps with the ht functionality, but we couldn't find it when we looked around a bunch before building ht.
Sorry, my wording wasn't very clear. I wasn't trying to imply that ht is more geared towards scripting than `expect` (in fact I'd say `expect` is more scripting-oriented being an extension of a scripting language) but rather that ht is more geared towards scripting the terminal as a UI than `expect`.
Am I wrong about that? (I may very well be since I haven't used `expect` before)
Based on my understanding/recollection of `expect`, the concept is that you're scripting a command/process (or command sequence) via a "terminal connection" (or basic stdin/stdout), based on the (either complete or partial) "expected" dialogue response.
I guess it might in theory be possible to script a TUI with it, but I suspect it'd get pretty convoluted over an extended period of time.
(BTW I mentioned this in a comment elsewhere in this thread but check out https://crates.io/crates/termwiz to avoid re-inventing the wheel for a bunch of terminal-related functionality.)
I see what you're saying. When I was writing scripts in `expect`, I never really tried to automate TUI programs. So, this could absolutely be a better way to script the terminal as a UI, as you said.