Do you think this would be appropriate for a command line tool that hits various APIs for its function calls? E.g. "what's the weather in SF tomorrow?" or "daily price change of Apple and Tesla stock for the past week"? (Let's assume I have documented the APIs thoroughly somewhere the model has access to, or fine-tuned it on this data.)
Hi, also on the FunctionGemma team! Something like this would be a good use case for the model. Depending on how complicated the API is, you might need to finetune it (we released a Colab that guides you through the experience + how to export/run it locally). Generally, better tool descriptions help, although if it is something very complicated, finetuning would be better.
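To make "tool descriptions" concrete, the weather example could look something like the sketch below. The function name, parameter schema, and JSON-schema-style format are all illustrative assumptions, not FunctionGemma's actual wire format:

```python
# Hypothetical tool description for the weather example. All names and
# the schema layout here are illustrative, not FunctionGemma's real format.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the weather forecast for a city on a given date.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'San Francisco'"},
            "date": {"type": "string", "description": "ISO date, e.g. '2025-01-02'"},
        },
        "required": ["city", "date"],
    },
}

# The model's job is then to map "what's the weather in SF tomorrow?" to a
# structured call like:
#   {"name": "get_weather", "arguments": {"city": "San Francisco", "date": "..."}}
# which the CLI dispatches to the real API.
```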
Does anyone know what the state-of-the-art industry solvers do for these problems? I had dabbled a bit in ML approaches to combinatorial optimization with great interest a few years back, but I don't think any of these RL-based methods ended up being used in production.
The state-of-the-art solvers are the proprietary ones like Gurobi, FICO, CPLEX, Mosek, etc. A major part of the proprietary "sauce" is in the heuristics they use. For example, all solvers have a "presolve" phase which attempts to eliminate redundant constraints/variables. There may be some ML they are using behind the scenes to derive these heuristics, I'm not sure, although I know it is a major research area.
Otherwise, the basic underlying algorithms are all the same, as in the textbook: branch-and-bound and so on.
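To make the textbook part concrete, here's a minimal branch-and-bound sketch for 0/1 knapsack, using a fractional (LP-relaxation-style) bound to prune. The commercial solvers layer presolve, cutting planes, and tuned heuristics on top of this same skeleton:

```python
# Minimal branch-and-bound for 0/1 knapsack: the textbook skeleton under
# solvers like Gurobi/CPLEX, minus presolve, cuts, and heuristics.

def fractional_bound(items, i, cap, value):
    """Upper bound: greedily fill remaining capacity, allowing fractions."""
    for v, w in items[i:]:
        if w <= cap:
            cap -= w
            value += v
        else:
            return value + v * cap / w  # fractional piece (relaxation)
    return value

def knapsack(items, capacity):
    # Sort by value density so the greedy bound is valid and tight.
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    best = 0

    def branch(i, cap, value):
        nonlocal best
        best = max(best, value)
        if i == len(items):
            return
        # Prune: if even the relaxed bound can't beat the incumbent, stop.
        if fractional_bound(items, i, cap, value) <= best:
            return
        v, w = items[i]
        if w <= cap:
            branch(i + 1, cap - w, value + v)  # take item i
        branch(i + 1, cap, value)              # skip item i

    branch(0, capacity, 0)
    return best

print(knapsack([(60, 10), (100, 20), (120, 30)], 50))  # 220
```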
For the sake of comparison, you can train a 124M model on a 3090 (see nanoGPT). In that case, each batch ends up being about 500,000 tokens and takes roughly 10 seconds to run forward and backward. At that rate, the 6 trillion tokens this model was trained on would take about 4 years. Or just "too long", for a shorter answer.
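Spelling the arithmetic out (all numbers are the rough assumptions above):

```python
# Back-of-envelope estimate; every input here is a rough assumption.
tokens_per_batch = 500_000
seconds_per_batch = 10            # forward + backward, 124M params on a 3090
tokens_per_second = tokens_per_batch / seconds_per_batch  # 50,000 tok/s

total_tokens = 6e12               # 6 trillion training tokens
seconds = total_tokens / tokens_per_second
years = seconds / (60 * 60 * 24 * 365)
print(f"{years:.1f} years")       # ~3.8 years
```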
The word "reasonable" is vague, but assuming you mean something that could be run in a residential unit, it would take a very long time if training from pure scratch.
This is part of the rationale for releasing this model. Now you don't have to start from scratch, and finetuning is feasible on a wide variety of hardware, including modest GPU setups (and smaller).
True in most countries. The president, or more generally the chief of the executive, often has legal immunity. It makes some sense, because they are the law, at least in part.
In democracies there is usually some protection against abuse of that power (e.g. impeachment).
You're conflating a president (highest executive) with a monarch. Perhaps on purpose, given current goings-on, but a key distinction between monarchies and democracies is precisely that all people in the country are subject to the same laws and there is no sovereign immunity.
The Monarch also needs permission from the Mayor of the City of London to enter the city, so we do need to make a distinction between de jure and de facto law here.
> It is sometimes asserted that the Lord Mayor may exclude the monarch from the City of London. This legend is based on the misinterpretation of the ceremony observed each time the sovereign enters the City at Temple Bar, when the Lord Mayor presents the City's Pearl Sword to the sovereign as a symbol of the latter's overlordship. The monarch does not, as is often purported, wait for the Lord Mayor's permission to enter the City. When the sovereign enters the City, a short ceremony usually takes place where the Lord Mayor presents a sword to the monarch, symbolically surrendering their authority. If the sovereign is attending a service at St Paul's Cathedral this ceremony would take place there rather than at the boundary of the City, simply for convenience.
From what I've heard, the llama3 models are fairly easy to finetune (please correct me if I'm wrong or if there are more amenable models). How easy is it to finetune smollm3? I know a lot of the MoE LLMs have been quite fickle in this regard.
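For reference, the path I'd expect is a LoRA-style finetune, something like this sketch with transformers + peft. The model id, target modules, and hyperparameters are guesses on my part; check the model card for the real values:

```python
# Minimal LoRA finetuning sketch. Assumptions: the HF repo id, the
# target_modules list, and all hyperparameters; consult the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

lora = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumption; depends on the architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a small fraction of weights train

# ...then train with transformers.Trainer or trl's SFTTrainer as usual.
```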
I think this is still an incredible outcome, given how many dice rolls you can take in parallel with multiple claude/o3/gemini attempts at a problem with slightly different prompts. Granted, each rollout does not come for free given the babysitting you need to do, but the cost is much lower than going down the path yourself or having junior colleagues make the attempt.
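Mechanically, the parallel dice rolls can be as simple as the sketch below. run_attempt and score are hypothetical placeholders for the actual LLM call and whatever verification you do:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_attempt(prompt: str) -> str:
    # Hypothetical placeholder: one claude/o3/gemini attempt at the problem.
    # Swap in a real API call here.
    return f"candidate solution for: {prompt!r}"

def score(result: str) -> float:
    # Hypothetical placeholder for the babysitting step: run the tests,
    # eyeball the diff, etc. Random here, just to keep the sketch runnable.
    return random.random()

base = "Fix the flaky test in test_upload.py."  # example task
variants = [base, base + " Focus on race conditions.", base + " Keep the diff minimal."]

# Roll the dice in parallel and keep the best-scoring attempt.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(run_attempt, variants))
print(max(results, key=score))
```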
I guess some prefer to stick with the stdlib instead of third party libs.
Also, dataclasses feel more straightforward and less "magic" to me, in the sense that they are more or less "just" a way to avoid boilerplate in class definitions, while pydantic does way more "magic" stuff like de-/serialization and validation, and adds numerous methods and attributes to the classes.
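A small illustration of the difference (pydantic v2 shown; the type coercion and built-in serialization are the "magic" I mean):

```python
from dataclasses import dataclass
from pydantic import BaseModel

@dataclass
class UserDC:
    id: int

class UserPD(BaseModel):
    id: int

UserDC(id="1")               # accepted silently: dataclasses don't validate types
UserPD(id="1")               # pydantic coerces "1" -> 1 (and rejects "abc")
UserPD(id="1").model_dump()  # {'id': 1} -- serialization comes built in
```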
When I started to implement typedload, when type hints had just been introduced, I supported NamedTuple, and then, as more things were added, also attrs, dataclasses, TypedDict…
What would be the point of requiring users to migrate their whole codebase to something different just to use your library?
On the other hand, if you wrote your code from scratch to use BaseModel, you're pretty much stuck with pydantic.
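That's the point of working against the types you already have. A minimal sketch of typedload with a plain stdlib dataclass:

```python
from dataclasses import dataclass
import typedload

@dataclass
class Point:
    x: int
    y: int

# No base class required: load/dump work on stdlib types directly.
p = typedload.load({"x": 1, "y": 2}, Point)  # Point(x=1, y=2)
d = typedload.dump(p)                        # {'x': 1, 'y': 2}
```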