andyk · on Dec 16, 2024

I hope the competition will inspire people to make breakthroughs in the open, so I won't take any rights to the IP. Instead, the winning solutions must use open source code and open weight models.

andyk · on Dec 16, 2024

yes the prize money is from me to the winners

andyk · on Dec 16, 2024

That has a double meaning - half tongue in cheek.

1) since we are creating a contamination-free version of SWE-bench (i.e., scraping a new test set after submissions are frozen), it is guaranteed that agents in this contest can't "cheat", i.e., models can't have trained on the benchmark and agents can't memorize answers.

2) as a general rule in life, don't cheat on things (not that there aren't exceptions)

andyk · on Dec 16, 2024

(reposting from locallama and lower down here) yep that's true.

one of my goals is to inspire and honor those that work on open source AI. Those people tend to be motivated by things like impact and the excitement of being part of something big. i know that's how i always feel when i'm around Berkeley and get to meet or work with OG BSD hackers or the people who helped invent core internet protocols.

those people are doing this kind of OSS work and sharing it with the world anyway, without any cash prize. i think of this as a sort of thank you gift for them. and also a way to maybe convince a few people to explore that path who might not have otherwise.

andyk · on Dec 16, 2024

andy here - happy to answer questions.

Also, I answered a bunch of questions yesterday on LocalLLaMA that people here might find interesting https://www.reddit.com/r/LocalLLaMA/comments/1hdfng5/ill_giv...

andyk · on Dec 14, 2024

yeah i agree. one of my goals is to inspire and honor those that work on open source AI.

people who work on open source tend to be motivated by things like impact and the excitement of being part of something bigger than themselves - at least that's how i always feel when i'm around Berkeley and get to meet or work with OG BSD hackers and people who helped invent core internet protocols or the guys who invented RISC or more recently RISC-V

those people are going to do this kind of OSS work and share it with the world anyway, without any cash prize. i think of this as a sort of thank you gift for them. and also a way to maybe convince a few people to explore that path who might not have otherwise.

andyk · on Dec 13, 2024

I tweeted this on stage at NeurIPS on weds: "I'll give $1M to the first open source AI that gets 90% on this sweet new contamination-free version of SWE-bench - http://kprize.ai" - K Prize is a new kaggle competition

andyk · on June 4, 2024

I tried to contrast to `expect` in a couple of my other responses, but yeah this is my sense too after looking briefly at `expect` - that ht always transparently sets up a terminal for you under the hood and you interact with that so you can always grab a screenshot of any terminal UI.

I don’t think `expect` is targeted at this use case (though I am only learning about `expect` right now so could be wrong)

sesm · on June 5, 2024

Is it fair to say that 'expect vs ht' is the same as 'curl vs headless browser'?

ku1ik · on June 5, 2024

I believe it’s a pretty good analogy, yes.

andyk · on June 4, 2024

andyk here. it's clear our readme is lacking use cases! adding some now. When we introduced ht on twitter I gave a little more context -- https://x.com/andykonwinski/status/1796589953205584234 -- but that should have been in the project readme.

Also, a few people are comparing ht to `expect`. I haven't used `expect` before, but it looks very cool. Their docs/readme seem only slightly more fleshed out than ours :-D Looks like the main way to use expect is via:

  spawn ...
  expect ...
  send ...
  expect ...
  etc.

so, the expect syntax seems targeted more towards testing where you simultaneously get the output from the underlying binary and then check if it's what you expect (thus the name I guess). I can't see if there is a way to just get the current terminal "view" (aka text screenshot) via an expect command?

ht is more geared towards scripting (or otherwise programmatically accessing) the terminal as a UI (aka Terminal UI). So ht always runs a terminal for you and gives you access to the current terminal state. Need to try out expect myself, but from what I can tell, it doesn't seem to always transparently run a Terminal for you.

There might already be some other existing tool that overlaps with the ht functionality, but we couldn't find it when we looked around a bunch before building ht.
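
To make the "output stream" vs. "current terminal view" distinction concrete, here is a minimal Python sketch. It is not ht's implementation or API: `TinyScreen`, `feed`, and `snapshot` are hypothetical names, and the emulation only handles carriage return, newline, cursor-position (`ESC[row;colH`), and a simplified clear (`ESC[...J`).

```python
import re


class TinyScreen:
    """Hypothetical, heavily simplified screen model (not ht's API)."""

    # Matches only the two escape sequences this sketch understands.
    _CSI = re.compile(r"\x1b\[(\d*)(?:;(\d*))?([HJ])")

    def __init__(self, cols=20, rows=4):
        self.cols, self.rows = cols, rows
        self.row = self.col = 0
        self._clear()

    def _clear(self):
        self.grid = [[" "] * self.cols for _ in range(self.rows)]

    def feed(self, data):
        """Apply program output to the grid, tracking the cursor."""
        i = 0
        while i < len(data):
            m = self._CSI.match(data, i)
            if m:
                a, b, cmd = m.groups()
                if cmd == "H":  # move cursor (1-based row;col)
                    self.row = min(max(int(a or 1) - 1, 0), self.rows - 1)
                    self.col = min(max(int(b or 1) - 1, 0), self.cols - 1)
                else:  # simplification: treat every "J" as a full clear
                    self._clear()
                i = m.end()
                continue
            ch = data[i]
            if ch == "\r":
                self.col = 0
            elif ch == "\n":
                self.row = min(self.row + 1, self.rows - 1)
            elif self.col < self.cols:  # drop text past the right edge
                self.grid[self.row][self.col] = ch
                self.col += 1
            i += 1

    def snapshot(self):
        """Return the current view as text (a 'text screenshot')."""
        return "\n".join("".join(line).rstrip() for line in self.grid)


screen = TinyScreen()
screen.feed("progress: 10%")
screen.feed("\rprogress: 99%")
print(screen.snapshot().splitlines()[0])
```

The raw stream contains both progress messages, but the snapshot contains only what a person looking at the terminal would see. A real terminal emulator handles far more (scrolling, colors, alternate screens), which is the part a tool like ht takes care of for you.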

metadat · on June 4, 2024

Expect is The Original Way, and has been the standard since before I learned to program more than 20 years ago. :-D

Expect is also extra cool because of `autoexpect`.

  generate an Expect script from observing a (shell) session

https://manpages.ubuntu.com/manpages/focal/en/man1/autoexpec...

m0shen · on June 4, 2024

`expect` is absolutely geared towards scripting, as it's an extension of TCL. Though as far as getting a "current terminal view" `expect` has `term_expect`: https://core.tcl-lang.org/expect/file?name=example/term_expe...

andyk · on June 4, 2024

Sorry, my wording wasn't very clear. I wasn't trying to imply that ht is more geared towards scripting than `expect` (in fact I'd say `expect` is more scripting-oriented being an extension of a scripting language) but rather that ht is more geared towards scripting the terminal as a UI than `expect`.

Am I wrong about that? (I may very well be since I haven't used `expect` before)

follower · on June 5, 2024

Based on my understanding/recollection of `expect`, the concept is that you're scripting a command/process (or command sequence) via a "terminal connection" (or basic stdin/stdout), based on the (either complete or partial) "expected" dialogue response.

e.g.

1. make initial connect over ssh (e.g. spawn ssh cli process)
2. expect "login: " response
3. send "admin"
4. expect "password: " response
5. send "password"
6. expect "$ "
7. send "whoami\n"
8. etc etc

I guess that might in theory be possible to script a TUI with but I suspect it'd get pretty convoluted over an extended period of time.
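
The expect/send pattern described above can be sketched in Python over plain stdin/stdout pipes. This is not `expect` itself and it does not allocate a terminal: the `expect` and `send` helpers are hypothetical, and `CHILD` is a made-up stand-in for the ssh login dialogue so the example runs anywhere.

```python
import subprocess
import sys

# Hypothetical child program standing in for the remote login dialogue.
CHILD = r'''
import sys

def say(text):
    sys.stdout.write(text)
    sys.stdout.flush()

say("login: ")
user = sys.stdin.readline().strip()
say("password: ")
sys.stdin.readline()
say("$ ")
sys.stdin.readline()  # the "whoami" command
say(user + "\n$ ")
'''


def expect(proc, pattern):
    """Read child output byte by byte until it ends with `pattern`."""
    seen = b""
    while not seen.endswith(pattern):
        chunk = proc.stdout.read(1)
        if not chunk:
            raise EOFError(f"child exited before sending {pattern!r}")
        seen += chunk
    return seen


def send(proc, text):
    """Write a reply to the child's stdin and flush it immediately."""
    proc.stdin.write(text)
    proc.stdin.flush()


def run_dialogue():
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    transcript = expect(proc, b"login: ")
    send(proc, b"admin\n")
    transcript += expect(proc, b"password: ")
    send(proc, b"password\n")
    transcript += expect(proc, b"$ ")
    send(proc, b"whoami\n")
    transcript += expect(proc, b"$ ")
    proc.stdin.close()
    proc.wait()
    proc.stdout.close()
    return transcript


if __name__ == "__main__":
    print(run_dialogue().decode())
```

Each step waits on the tail of a linear output stream, which works for prompt-and-reply dialogues. A full-screen TUI redraws in place with cursor movement, so matching on the stream tail tells you little about what is currently on screen.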

(BTW I mentioned this in a comment elsewhere in this thread but check out https://crates.io/crates/termwiz to avoid re-inventing the wheel for a bunch of terminal-related functionality.)

m0shen · on June 5, 2024

I see what you're saying. When I was writing scripts in `expect`, I didn't really ever try to automate TUI programs. So, this could absolutely be a better way to script the terminal as a UI as you said.

I'll certainly tuck it into my toolbox. Thanks :)

dheera · on June 5, 2024

Thank you for this, this is exactly what I was needing last week and got into a hellhole of wrapping Python subprocess pipes in a class.

andyk · on June 4, 2024

thanks for surfacing `expect` to our attention. I'll add a compare/contrast to the ht readme