Hacker News | eachro's comments

Do you think this would be appropriate for a command-line tool that hits various APIs as the function calls? E.g., "what's the weather in SF tomorrow?" or "daily price change of Apple and Tesla stock for the past week"? (Let's assume I have documented the APIs thoroughly somewhere the model has access to, or fine-tuned it on this data.)


Hi, also on the FunctionGemma team! Something like this would be a good use case for the model. Based on how complicated the API is you might need to finetune it (we released a colab that guides you through the experience + how to export/run it locally). Generally better tool descriptions help although if it is something very complicated finetuning would be better.
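To make the "better tool descriptions" point concrete, here is a minimal sketch of how a command-line tool might describe its tools and dispatch a call the model emits. Everything in it is an assumption for illustration: the tool names (`get_weather`, `get_price_change`), the JSON call shape (`{"name": ..., "arguments": ...}`), and the `dispatch` helper are hypothetical and are not FunctionGemma's actual output format. The tool bodies are stubs that return placeholders instead of calling a real API.

```python
import json
from typing import Any

# Hypothetical tools. Real versions would call the documented APIs;
# these stubs only echo their arguments with a placeholder result.
def get_weather(city: str, day: str) -> dict[str, Any]:
    return {"city": city, "day": day, "forecast": "placeholder"}

def get_price_change(tickers: list[str], days: int) -> dict[str, Any]:
    return {"tickers": tickers, "days": days, "changes": "placeholder"}

# Tool registry: the descriptions are what the model would be shown.
TOOLS: dict[str, dict[str, Any]] = {
    "get_weather": {
        "fn": get_weather,
        "description": "Forecast for a city on a given day, e.g. 'tomorrow'.",
        "parameters": {"city": "string", "day": "string"},
    },
    "get_price_change": {
        "fn": get_price_change,
        "description": "Daily price change for stock tickers over the past N days.",
        "parameters": {"tickers": "list of strings", "days": "integer"},
    },
}

def dispatch(model_output: str) -> dict[str, Any]:
    """Parse an assumed JSON function call and run the matching tool."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    spec = TOOLS[name]
    args = call.get("arguments", {})
    # Reject arguments the tool description never mentioned.
    unexpected = set(args) - set(spec["parameters"])
    if unexpected:
        raise ValueError(f"unexpected arguments for {name}: {sorted(unexpected)}")
    return spec["fn"](**args)
```

Usage: `dispatch('{"name": "get_weather", "arguments": {"city": "SF", "day": "tomorrow"}}')` runs the weather stub. Validating the tool name and arguments before calling anything is a cheap guard against a small model hallucinating a tool or parameter.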


Both of your examples require Internet access, so there's no reason not to use a cloud-hosted model, which would work orders of magnitude better.


Does anyone know what the state-of-the-art industry solvers do for these problems? I dabbled a bit in ML approaches to combinatorial optimization with great interest a few years back, but I don't think any of these RL-based methods ended up being used in production.


The state of the art solvers are the proprietary ones like Gurobi, FICO, Cplex, Mosek, etc. A major contributor to the proprietary "sauce" is in the heuristics they use. For example, all solvers will have a "presolve" phase which attempts to eliminate redundant constraints/variables. There may be some ML they are using behind the scenes to derive these heuristics, I'm not sure, although I know it is a major research area.

Otherwise, the basic underlying algorithms are all the same, as in the textbook: branch-and-bound and so on.
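For anyone who hasn't seen the textbook algorithm, here is a toy sketch of branch-and-bound on a 0/1 knapsack problem. It is not what any of the solvers named above actually implement; the function name and the choice of knapsack are mine, and it assumes all weights are positive. The bound comes from the fractional (LP) relaxation, and a branch is pruned when that bound cannot beat the best solution found so far.

```python
def knapsack_branch_and_bound(values, weights, capacity):
    """Best total value for a 0/1 knapsack, via depth-first branch-and-bound.

    Assumes every weight is positive.
    """
    # Sort by value density so the greedy fractional bound is valid.
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    best = 0

    def bound(i, value, room):
        # Upper bound from the LP relaxation: take whole items greedily,
        # then a fraction of the first item that does not fit.
        for v, w in items[i:]:
            if w <= room:
                value += v
                room -= w
            else:
                return value + v * room / w
        return value

    def branch(i, value, room):
        nonlocal best
        if value > best:
            best = value
        # Prune: no items left, or this subtree cannot beat the incumbent.
        if i == len(items) or bound(i, value, room) <= best:
            return
        v, w = items[i]
        if w <= room:
            branch(i + 1, value + v, room - w)  # branch: take item i
        branch(i + 1, value, room)              # branch: skip item i

    branch(0, 0, capacity)
    return best
```

Usage: `knapsack_branch_and_bound([60, 100, 120], [10, 20, 30], 50)` returns `220`. Production solvers layer presolve, cutting planes, and search heuristics on top of this skeleton.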


I know of only one such library, and it works great for toy problems: PuLP [0][1].

[0]: https://coin-or.github.io/pulp/

[1]: https://pypi.org/project/PuLP/


Didn't Amazon acquihire Adept Labs?


If you wanted to train it from scratch, how long would it take on a reasonable GPU setup?


For the sake of comparison, you can train a 124M model on a 3090 (see nanoGPT). In that case, each batch ends up having about 500,000 tokens and takes maybe around 10ish seconds to run forward and backward. Then the 6 trillion tokens that this model was trained on would take about 4 years, approximately. Or just "too long" for a shorter answer.
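The back-of-the-envelope estimate above can be reproduced in a few lines. The inputs are the rough figures from the comment (about 500,000 tokens per batch, about 10 seconds per batch, 6 trillion tokens); the function name is just for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def training_years(total_tokens: float, tokens_per_batch: float, seconds_per_batch: float) -> float:
    """Wall-clock years to process total_tokens at a fixed batch size and speed."""
    batches = total_tokens / tokens_per_batch
    return batches * seconds_per_batch / SECONDS_PER_YEAR

# 6e12 tokens / 5e5 tokens per batch = 12 million batches,
# at 10 s each = 120 million seconds.
years = training_years(6e12, 500_000, 10)
print(f"{years:.1f} years")
```

That comes out to roughly 3.8 years, consistent with "about 4 years".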


The word "reasonable" is vague, but assuming you mean something that could be run in a residential unit, it would take a very long time if training purely from scratch.

This is part of the rationale for releasing this model. Now you don't have to start from scratch, and finetuning is reasonable on a wide variety of hardware, including reasonable GPU setups (and smaller).


I'm reminded of the Nixon quote: "When the president does it, that means it's not illegal."


It was aspirational then, but after 50 years of working to create the Unitary Executive it is now fact.


Only if we let it be fact. Surely there’s a line.


That line was crossed when we re-elected Mr. January 6th.


Will no one rid me of this meddlesome priest!


At least he seemingly pretended afterwards not to have meant to order "go kill him".

The knights who murdered the archbishop weren't so lucky... my direct ancestor fled to Ireland afterwards (as family legend has it).


The German people of the 1930s would like to have a word...


There isn't.


True in most countries. The president, or more generally the head of the executive, often has legal immunity. It makes sense because they are the law, at least in part.

In democracies there is usually some protection against abuse of that power (e.g., impeachment).


The UK has sovereign immunity.

The monarch is literally above the law. They cannot be arrested, questioned, tried, or punished for any reason.

Of course it would raise eyebrows if King Charles went on a shooting spree. But what happens behind closed doors is none of the public's business.


Reality: If His Maj just doesn't look fit for purpose, then he can be suspended. Or forced off the throne entirely:

https://en.wikipedia.org/wiki/Regency_Act_1811#Care_of_King_...

https://en.wikipedia.org/wiki/Abdication_of_Edward_VIII


You're conflating a president (highest executive) with a monarch. Perhaps on purpose given current goings on, but a key distinction between monarchies and democracies is explicitly that all people in the country are subject to the same laws and there is no sovereign immunity.


The Monarch also needs permission from the Mayor of the City of London to enter the city, so we do need to make a distinction between de jure and de facto law here.


From https://en.wikipedia.org/wiki/Lord_Mayor_of_London

> It is sometimes asserted that the Lord Mayor may exclude the monarch from the City of London. This legend is based on the misinterpretation of the ceremony observed each time the sovereign enters the City at Temple Bar, when the Lord Mayor presents the City's Pearl Sword to the sovereign as a symbol of the latter's overlordship. The monarch does not, as is often purported, wait for the Lord Mayor's permission to enter the City. When the sovereign enters the City, a short ceremony usually takes place where the Lord Mayor presents a sword to the monarch, symbolically surrendering their authority. If the sovereign is attending a service at St Paul's Cathedral this ceremony would take place there rather than at the boundary of the City, simply for convenience.


"I'm not gonna do it, but I need the legal ability to murder innocent people"


We used to have the concept of the divine right of kings. The current arrangements are a step down from that. Your framing has it back to front.


What would it take to make NYC more like Tokyo, where you have consumer/retail-level things above the ground floor?


This already exists, especially in the outer boroughs. But of course I'd love to see more of it!


I've seen some of this around ktown. The elevators are always tiny and dingy. Not a fan at all.


Among other things, a culture of shoppers who know to look upstairs


From what I've heard, the llama3 models are fairly easy to fine-tune (please correct me if I'm wrong or if there are more amenable models here). How easy is it to finetune smollm3? I know a lot of the MoE LLMs have been quite fickle in this regard.


"And 50% of the time they work 50% of the time."

I think this is still an incredible outcome given how many dice rolls you can take in parallel with multiple claude/o3/gemini attempts at a problem with slightly different prompts. Granted, each rollout does not come for free given the babysitting you need to do but the cost is much lower than going down the path yourself/having junior colleagues make the attempt.
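A rough way to quantify the "many dice rolls" argument: if each attempt succeeds with probability p and attempts are independent, the chance that at least one of n attempts succeeds is 1 - (1 - p)^n. Both parts are assumptions. Reading the quote as p = 0.25 is my interpretation, and independence is optimistic, since attempts with slightly different prompts tend to fail in correlated ways.

```python
def p_at_least_one_success(p: float, n: int) -> float:
    """Chance that at least one of n independent attempts succeeds."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n

# Assumed per-attempt success rate of 25% (50% of 50%).
for attempts in (1, 2, 4, 8):
    print(attempts, round(p_at_least_one_success(0.25, attempts), 3))
```

Under those assumptions, four parallel attempts already give better than a two-in-three chance of at least one success, which is the intuition behind running several rollouts.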


Is there a reason to use data classes over pedantic base models anymore?


I guess some prefer to stick with the stdlib instead of third party libs.

Also, dataclasses feels more straightforward and less "magic" to me (in the sense that it is more or less "just" a way to avoid boilerplate for class definition, while pydantic does way more "magic" stuff like de-/serialization and validation, and adding numerous methods and attributes to the classes).
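A small stdlib-only sketch of that difference: a plain dataclass generates `__init__`, `__repr__`, and `__eq__` but does not check its type annotations at runtime, so any validation has to be written by hand. The class names here are made up for illustration.

```python
from dataclasses import asdict, dataclass

@dataclass
class Point:
    x: int
    y: int

# No error: dataclass annotations are not enforced at runtime.
unchecked = Point("1", 2)

@dataclass
class ValidatedPoint:
    x: int
    y: int

    def __post_init__(self) -> None:
        # Hand-written check. Note that bool passes, since it subclasses int.
        for name in ("x", "y"):
            value = getattr(self, name)
            if not isinstance(value, int):
                raise TypeError(f"{name} must be int, got {type(value).__name__}")

# Plain dict conversion is built in; there is no parsing or coercion step.
as_dict = asdict(Point(1, 2))
```

Here `unchecked.x` stays the string `"1"`, `ValidatedPoint("1", 2)` raises `TypeError`, and `as_dict` is `{"x": 1, "y": 2}`. If you find yourself writing many `__post_init__` checks like this, that is roughly the point where a validation library starts to pay for itself.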


I’ve never really gotten along with Pydantic. Something about it just doesn’t feel ergonomic.

If I need something more than dataclasses, I’ll normally go for attrs/cattrs. Dataclasses were originally based on attrs, so it’s not much of a leap.


I never understood why basemodel even exists.

When I started to implement typedload, when types were just introduced, I supported NamedTuple, and then as more things were added, also attrs, dataclasses, typed dict…

What would be the point to require migrating the whole codebase to use something different to use your library?

On the other hand, if you wrote your code from scratch to use basemodel you're pretty much stuck with pydantic.


Speed and size, mainly. If you don't need the data validation there's no reason to use pydantic, it's a huge dependency


did you mean: "pydantic base models" ?


Yeah haha I got autocorrected


speed? Not pulling in a huge dependency?


A lot of people are saying 12gb is too small to do anything interesting with. What's the most useful thing people __have__ gotten to work?

