For pass@1: HumanEval tells you how well the model solves a task from a fixed set when given only one chance to solve it. It's not a perfect metric; there are others like DS-1000 and MBPP (we've included them on the HuggingFace model card). HumanEval is good for benchmarking against other models because it gives a fast idea of how powerful a model is.
my understanding is that there are 2 usages of the pass@{number} syntax. the HumanEval/Codex paper interprets the {number} as the number of attempts [0]. however, language modelers also seem to use it to denote the number of few-shot example demonstrations given in the context. these are starkly different and i wish the syntax wasn't overloaded
> Kulal et al. (2019) evaluate functional correctness using the pass@k metric, where k code samples are generated per problem, a problem is considered solved if any sample passes the unit tests, and the total fraction of problems solved is reported.
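For concreteness, here's a minimal sketch of the unbiased pass@k estimator described in the Codex paper: generate n samples per problem, count how many (c) pass the unit tests, and estimate the chance that at least one of k samples would pass. The numbers in the example calls are made up.

    import numpy as np

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Estimate pass@k as 1 - C(n - c, k) / C(n, k), computed in a
        numerically stable way (product form instead of factorials)."""
        if n - c < k:
            return 1.0  # every size-k subset contains at least one passing sample
        return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

    # e.g. 200 samples for one problem, 37 of them pass the tests
    print(pass_at_k(n=200, c=37, k=1))   # 0.185
    print(pass_at_k(n=200, c=37, k=10))  # higher, since any of 10 tries may pass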
We want to help developers who need either an on-premise or a permissively licensed code assistant; Copilot offers neither. We also wanted to lower the barrier to self-hosting, so the model can run on most GPUs with just 3 GB of RAM. And we wanted to make code completions fast and efficient (understanding the entire context, not just the previous tokens).
We've finished training a new code model, Refact LLM, which took us about a month. The main use case is blazing-fast code completion with fill-in-the-middle (sketch of the prompt format below); additionally, the model can reply to chat prompts.
It performs much better than all code models of similar size, and almost reaches the same HumanEval score as StarCoder while being 10x smaller.
Thanks to the small size, it works on most modern GPUs, requiring just 3 GB of RAM.
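To make "fill-in-the-middle" concrete, here is a rough sketch of what a FIM completion request can look like with transformers. The checkpoint id and the <fim_prefix>/<fim_suffix>/<fim_middle> special tokens are assumptions on my part (StarCoder-style FIM), so check the HuggingFace model card for the exact format.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    checkpoint = "smallcloudai/Refact-1_6B-fim"  # assumed model id, verify on HF
    tok = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)

    # The model sees the code before AND after the cursor and generates the middle.
    prefix = "def median(values):\n    sorted_vals = sorted(values)\n    "
    suffix = "\n    return sorted_vals[len(sorted_vals) // 2]\n"
    prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=32)
    # Decode only the newly generated middle part
    print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))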
How does it compare to Copilot? A metric I'd like to see is the % of proposed completions accepted by a human user. If you had an extension that 50% of the time proposed a Copilot completion and 50% of the time proposed a Refact completion (blind to the user), you could compute a metric like this.
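A rough sketch of how that blind comparison could be scored; the engine names and log format here are hypothetical, just to illustrate the acceptance-rate metric.

    import random
    from collections import defaultdict

    ENGINES = ["copilot", "refact"]
    log = []  # (engine, accepted) pairs collected by the editor extension

    def route_request() -> str:
        """Blindly assign each completion request to one engine, 50/50."""
        return random.choice(ENGINES)

    def record(engine: str, accepted: bool) -> None:
        log.append((engine, accepted))

    def acceptance_rates(log):
        shown, accepted = defaultdict(int), defaultdict(int)
        for engine, ok in log:
            shown[engine] += 1
            accepted[engine] += int(ok)
        return {e: accepted[e] / shown[e] for e in shown}

    # after enough sessions: print(acceptance_rates(log))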
Is it possible to run it as an LSP so that it can be used in editors other than VSCode and JetBrains? (sorry if this question is completely mad, my understanding of how these things work is extremely limited)
Hi, I tried to fine-tune the Refact model using evolve code alpaca, but the loss is always bigger than 2. I tried some different params but it doesn't help. Can you give me some advice?
We try to eliminate this problem by using code models trained only on permissively licensed code; then you can run them locally without sending code anywhere.
We're going in this direction for code models with Refact https://github.com/smallcloudai/refact/ - right now you can self-host code models, fine-tune them on local files, and get the model running locally inside your IDE.