Yet it's interesting how we put the blame and punishment on the people being taken advantage of, and not on the employers who are exploiting them. If both parties are breaking the law, shouldn't we at the very least ensure that the business owner who is exploiting any number of workers is held to the same standard as an undocumented person whose only crime was not having the proper paperwork?
I don't blame them. If I were them, I would do the same thing. However, as someone with the ability to vote and influence (to a very small degree) public policy, I would prefer we move toward a system in which strong labor rights exist in this country, and this is simply impossible in an environment in which employers are free to hire labor off the books for "pennies". To be clear, I think both political parties in the US are terrible, and all of this debate serves the interests of the employers that benefit from this situation.
Because it would hurt our little elitist exceptionalist hearts if we gave an H1B to a construction worker. There are low-wage industries that could use such a program, but our little hearts can't take it because "it's not the best and brightest".
This is a technology demo, not a model you'd want to use. Because BitNet models average only 1.58 bits per weight, you'd expect to need a much larger parameter count than your fp8/fp16 counterparts. Plus, this is only a 2-billion-parameter model in the first place, and even fp16 2B models generally perform pretty poorly.
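For context on where 1.58 comes from: BitNet b1.58 weights are ternary (-1, 0, +1), so each weight carries log2(3) ≈ 1.58 bits of information. A quick back-of-the-envelope sketch of the weight-storage implications:

```python
import math

# BitNet b1.58 weights are ternary: each weight is one of {-1, 0, +1},
# so the information content per weight is log2(3) bits.
bits_per_weight = math.log2(3)
print(f"bits per ternary weight: {bits_per_weight:.2f}")  # ~1.58

# Back-of-the-envelope weight storage for a 2B-parameter model
# (weights only; ignores packing overhead, activations, KV cache).
params = 2e9
for name, bits in [("bitnet b1.58", bits_per_weight), ("fp8", 8), ("fp16", 16)]:
    print(f"{name:>12}: {params * bits / 8 / 1e9:.2f} GB")
```

So the weight savings are real (~0.4 GB vs 4 GB for fp16), but you'd need a correspondingly larger parameter count to claw back the lost precision.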
I'm also confused by that, but it could just be the model being agreeable. I've seen multiple examples posted online, though, where it's fairly clear that the CoT output is not included in subsequent turns. I don't believe Anthropic is public about it (could be wrong), but I know that the Qwen team specifically recommends against including CoT tokens from previous inferences.
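For illustration, here's a minimal sketch of what that recommendation amounts to in practice, assuming a Qwen-style format where the reasoning is wrapped in `<think>...</think>` tags (the exact delimiters vary by model):

```python
import re

# Matches a Qwen-style reasoning span plus any trailing whitespace.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_cot(messages):
    """Remove CoT spans from prior assistant turns before the history
    is sent back to the model for the next inference."""
    cleaned = []
    for msg in messages:
        if msg["role"] == "assistant":
            msg = {**msg, "content": THINK_BLOCK.sub("", msg["content"])}
        cleaned.append(msg)
    return cleaned
```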
Claude has some awareness of its CoT. As an experiment, it's easy, for example, to ask Claude to "think of a city, but only reply with the word 'ready'", and then to ask "what is the first letter of the city you thought of?"
Oops! I tried a couple experiments after writing this, and I believe I was mistaken, though I don't know how. It appears Claude was simply playing along, and convinced me it could remember the choices it secretly made. I must either have given it a tell, or perhaps it guessed the same answers twice in a row.
This administration, through Elon, is pushing to cut 50% of NASA's science funding. Mapping galaxies we'll never visit is a purely scientific endeavor. Trump seems to care more about military expansion, or, for lack of a better term, a more "masculine" expansion into space. The science stuff is not interesting to him, and I'm honestly not sure Musk cares about it that much anymore either.
I'm not a hater (or OP), but I think it's because GLP-1s are often talked about as if they're a kind of miracle drug. And it's very rare that drugs don't have some kind of side effects, especially after long-term use. Those side effects might not even be universal, and we might not know what they are for years.
Yes, I understand there's an obesity epidemic, and I also understand that GLP-1 drugs can have benefits beyond treating overeating. But with any drug, it's worth being thoughtful about its use.
Although I haven’t used these new models, the censorship you describe hasn’t historically been baked into the models as far as I’ve seen. It exists solely as a filter on the hosted version. IOW, it’s doing exactly what Gemini does when you ask it an election-related question: it just refuses to send it to the model and gives you back a canned response.
This is incorrect - while it's true that most cloud providers have a filtering pass on both inputs and outputs these days, the model itself is also censored via RLHF, which can be observed when running locally.
That said, for open-weights models this is largely irrelevant, because you can always "uncensor" one simply by starting to write its response for it so that it agrees to fulfill your request (e.g. in text-generation-webui, you can specify a prefix for the response, and it will automatically insert those tokens before spinning up the LLM). I've yet to see a locally available model that is not susceptible to this simple workaround. E.g., with QwQ-32, just having it start the response with "Yes sir!" is usually sufficient.
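For anyone who wants to try this outside text-generation-webui, here's a rough sketch of the same prefill trick with Hugging Face transformers; the model name and request are placeholders, and the key idea is just appending the forced prefix after the chat template's generation prompt:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B"  # placeholder: any local chat model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [{"role": "user", "content": "<your request here>"}]
# Render the chat template up to the assistant turn, then append the
# forced opening so the model continues from "Yes sir!" instead of
# choosing how to begin its reply.
prompt = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
prompt += "Yes sir!"

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```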
Didn’t go there but worked with a lot of great folks who did. The main thing I think is that they require their students to get industry experience before they graduate.
I don’t remember the actual requirement, unfortunately.
Thanks! Is it something different from co-op? I checked their graduation requirement and it is indeed interesting:
> Students admitted at the 1A level (except for those in Business Administration and Computer Science double degree), will normally have eight academic terms and six work terms.
Six work terms seems like a LOT. Most co-ops I know of are just a few summers.
The 32B parameter model size seems like the sweet spot right now, imho. It's large enough to be very useful (Qwen 2.5 32B and its Coder variant are outstanding models), and those models run on consumer hardware much more easily than the 70B models.
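Rough weight-only arithmetic behind that, ignoring KV cache and runtime overhead:

```python
def weight_gb(params_b: float, bits: float) -> float:
    """Approximate weight-only footprint in GB."""
    return params_b * 1e9 * bits / 8 / 1e9

for params_b in (32, 70):
    for bits in (4, 8):
        print(f"{params_b}B @ {bits}-bit: ~{weight_gb(params_b, bits):.0f} GB")
# 32B @ 4-bit: ~16 GB -> fits on a single 24 GB consumer GPU with headroom
# 70B @ 4-bit: ~35 GB -> needs multiple GPUs or aggressive CPU offloading
```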
I hope Llama 4 reintroduces that mid-sized model size.