
These just seem like over-engineered solutions trying to guarantee job security. When the dataflows are this straightforward, just replicate into the OLAP of your choice and transform there.


I came from traditional engineering into data engineering by accident and had a similar view. But every time I tried to build a pipeline from first principles, it eventually turned into something like this, for a reason. This is especially true when trying to bridge many teams and skillsets - everyone wants their favourite tool.


Our approach wasn't about over-engineering; we were trying to leverage our existing investments (like Confluent BYOC) while optimizing for flexibility, cost, and performance. We wanted to stay loosely coupled to adapt to cloud restrictions across multiple geographic deployments.


What is the current state of the art (open source) for OLTP-to-OLAP pipelines these days? I don't mean a one-off ETL-style load at night, but a continuous process with relatively low latency.


Idk what the state of the art is, but I've used change data capture with Debezium and Kafka, sunk into Snowflake. Not sure Kafka is the right tool, since you don't really need its persistence, and having replication slots makes a lot of operations (e.g. DB engine upgrades) a lot harder.
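
For anyone curious what that looks like in practice, here's a rough sketch of the CDC-source half of such a pipeline: registering a Debezium Postgres connector with Kafka Connect's REST API. Hostnames, credentials, and table names are placeholders, the exact connector properties vary by Debezium version, and the Snowflake/ClickHouse sink is left out - treat this as a shape, not a recipe.

    import json
    import requests  # assumes the requests package is available

    # Debezium Postgres source: streams row-level changes from a logical
    # replication slot into Kafka topics named <prefix>.<schema>.<table>.
    connector = {
        "name": "orders-cdc",  # hypothetical connector name
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",           # uses a replication slot on the source DB
            "database.hostname": "db.internal",  # placeholder
            "database.port": "5432",
            "database.user": "cdc_user",
            "database.password": "********",
            "database.dbname": "app",
            "topic.prefix": "app",
            "table.include.list": "public.orders",
        },
    }

    resp = requests.post(
        "http://kafka-connect:8083/connectors",  # Kafka Connect REST endpoint
        headers={"Content-Type": "application/json"},
        data=json.dumps(connector),
    )
    resp.raise_for_status()

    # A Snowflake (or ClickHouse) sink connector is registered the same way
    # and consumes the topics Debezium produces.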


I would say ClickHouse.


Hmm, wasn't there also supposed to be SM re-allocation? It doesn't look like it was included; I may be misremembering the explanation.


Holy unnecessary use of terminology to explain a reverse graph traversal. “Loss”, “gradients”, “differentiating”— no! stop!

This must be what AI hype actually is: completely incoherent language to explain a very straightforward concept.

This is just: LLMs judging intermediate node outputs, and reverse traversing the graph while doing so until it modifies the original prompt.
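
In code, the idea as I read it is roughly the following - a minimal sketch of LLM-as-critic plus reverse graph traversal, not the paper's actual API; call_llm and the Node shape are stand-ins for whatever you'd actually use.

    from dataclasses import dataclass, field

    def call_llm(prompt: str) -> str:
        raise NotImplementedError  # plug in your chat-completion client here

    @dataclass
    class Node:
        text: str                      # this node's prompt or intermediate output
        parents: list = field(default_factory=list)

    def backprop_text(node: Node, downstream_feedback: str = "") -> None:
        # Ask the LLM to judge this node's output in light of downstream feedback.
        critique = call_llm(
            f"Output:\n{node.text}\n\nDownstream feedback:\n{downstream_feedback}\n\n"
            "How should this output change to improve the final result?"
        )
        if not node.parents:
            # Root reached: fold the accumulated critique back into the original prompt.
            node.text = call_llm(
                f"Rewrite this prompt to address the feedback.\n\n"
                f"Prompt:\n{node.text}\n\nFeedback:\n{critique}"
            )
            return
        for parent in node.parents:   # reverse traversal toward the original prompt
            backprop_text(parent, critique)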


I am old enough to remember the opposite: people would try to sell deep learning to the mainstream ML community by pointing out that backprop is just message-passing on a Bayesian network with modified sum/product operations.


Valid point, but at least that was mathematics. This paper isn't even math; it's data control flow masquerading as math.


Interestingly, backpropagation is the natural way I'd describe this process, not reverse graph traversal.

Background difference I suppose.

> This must be what AI hype actually is: completely incoherent language to explain a very straightforward concept.

True, a lot of papers overdo the jargon just for hype purposes. My favorite (and funniest) example is this one from Google Research (and some universities); I've linked the paper review video below.

https://youtu.be/Pl8BET_K1mc

See the YouTube chapter about “Multidiffusion” (around 38 minutes).

They spent multiple paragraphs formulating an "optimisation problem" which when peeled down amounts to taking the mean, just to be able to superficially cite their own previous paper.
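
To make it concrete (my paraphrase of the gist, not their exact formulation): a least-squares objective of that shape has the mean as its closed-form minimizer,

    \arg\min_x \sum_{i=1}^{n} \lVert x - y_i \rVert^2
    \quad\Longrightarrow\quad
    x^\star = \frac{1}{n} \sum_{i=1}^{n} y_i

so all the optimisation framing buys you is a fancier way of writing "take the mean".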

Quite the sorry state of things.


Without taking a position on unipolar vs. multi-polar:

Dario makes an astounding implicit assumption:

- Labs originating in China cannot, without the US, acquire chips providing 80-90% of comparable utility within the next 2-3 years.

I'll make an observation, re: DeepSeek's incentives that drove them to create the innovations from the V2 and V3 papers.

DeepSeek, compared to American AI labs, is much more compute-constrained, but in a unique way: their chips are more memory-bandwidth-constrained (depending on the type, anywhere from 50% to 80% less bandwidth).

Therefore, each dollar/hour of investment towards memory optimization is worth MORE to DeepSeek than to American labs.

In the V2/V3 papers, they demonstrated exactly that with these memory optimization techniques:

1. MLA -> reduces the KV cache by nearly 80% compared to GQA (rough back-of-envelope sketch after this list). By the way, this was published in V2 back in May 2024.

2. FP8 matmul (while still accumulating gradients in FP32) without losing significant quality.

3. DualPipe scheduling and reworking the allocation of Hopper SMs between communication and computation -> DeepSeek's V3 paper has two full pages of hardware suggestions for “hardware designers” (read: NVIDIA).
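
To illustrate point 1 with a rough back-of-envelope (the dimensions below are made up for illustration, not DeepSeek's actual config):

    # KV cache per token, in bytes, under three attention schemes.
    # All dimensions are illustrative assumptions, not DeepSeek's real numbers.
    n_layers, n_heads, head_dim, bytes_per = 60, 56, 128, 2  # bf16

    mha = n_layers * 2 * n_heads * head_dim * bytes_per   # full K and V per head
    gqa = n_layers * 2 * 8 * head_dim * bytes_per         # 8 shared KV heads
    mla = n_layers * (512 + 64) * bytes_per               # one compressed latent + small decoupled RoPE key

    print(mha, gqa, mla)  # ~1.7 MB vs ~240 KB vs ~69 KB per token

At these (made-up) sizes MLA's cache is roughly 70-80% smaller than GQA's, which is exactly the kind of win that matters most when your chips are bandwidth-starved.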

Export controls in a global market create different incentives for different parties. As incentives change, agents (using the term in its traditional economics sense) will change their capital allocation strategies.


You should still be mocked.

1. ChatGPT data is widely available on the internet; just google the ShareGPT dataset and you can scrape 200k+ conversations with a few Hugging Face commands. These were then used by open-source models like Vicuna, and there was a period of several months when RLAIF was all the rage in the open-source community, so this data populated the internet. If a company is crawling and scraping the internet, this will eventually end up in its dataset.

2. The DeepSeek V3 model was trained on 15T tokens. Please educate yourself and calculate how long it would take (in latency terms, inference for a 1k-token output takes almost 30 seconds) and how much it would cost to extract 15T tokens from the ChatGPT / Azure API. Given that API accounts all have spend limits and would trip fraud detection in OAI billing, how long would the subterfuge have had to go on? With which model? At what time? Wouldn't they have to keep repeating this for each subsequent generation of OAI models?
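
To put rough numbers on that (the price and throughput below are my own assumptions for illustration, not OpenAI's actual figures):

    # Back-of-envelope for extracting a 15T-token corpus via an API.
    # All constants are assumed, illustrative values.
    tokens_needed  = 15e12      # 15T tokens
    usd_per_mtok   = 10.0       # assumed $ per 1M output tokens
    tokens_per_sec = 100_000    # assumed aggregate throughput across many accounts

    cost_musd = tokens_needed / 1e6 * usd_per_mtok / 1e6
    years     = tokens_needed / tokens_per_sec / (3600 * 24 * 365)

    print(f"~${cost_musd:.0f}M and ~{years:.1f} years of continuous extraction")
    # -> roughly $150M and ~4.8 years under these assumptions

Even with generous assumptions, the scale alone makes "they distilled the whole corpus out of ChatGPT" implausible.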

3. OAI didn't invent MLA, they didn't invent multi-token prediction with decoupled RoPE, and they didn't invent FP8 matmul training (while accumulating in FP32) without significant quality loss.

So go away


#1 is a valid and important point that would legitimately explain the model-name issue, and on that I am duly mocked.

#2 You wouldn't want to extract all 15T tokens via the API, as it wouldn't be desirable to have that as your only source of ground truth. A fraction of that, why not - 1T tokens is just $5 million at the batch API price, so the cost isn't a problem, nor a meaningful fraction of OpenAI's revenue, though it would take some doing to route it, likely through enterprise Azure customers.

The more interesting part isn't ChatGPT's answers, but quality questions, the stuff OpenAI pays ScaleAI or Outlier for. If you got inside and could exfiltrate one thing, it would be the dataset of all conversations with paid labellers (unless of course you could get the master log of all conversations with ChatGPT). Even the weights aren't as useful as that to a replication effort.

#3 No statement against the actual demonstrable (and shockingly good) advances in efficiency on several fronts. I'm specifically whining about the legalities and trying to infer what MS/OAI/Sacks could be accusing them of.


@dang please link to either the GitHub https://github.com/Jiayi-Pan/TinyZero

or the primary source twitter thread: https://x.com/jiayi_pirate/status/1882839370505621655


Thanks, I just commented on their sub to give credit!


Commented where? I don't see any source links in the substack article.


Go to the bottom, you'll find my comment with the links: my name is Elie Berreby.


The other way is certainly also true. Your short piece is rational, but it lacks insight into the inference and training dynamics of unconstrained ML adoption.

The rate of ML progress is spectacularly compute-constrained today. Every step in today's scaling program is set up to de-risk the next scale-up, because the opportunity cost of compute is so high. If the opportunity cost of compute were not so high, you could skip the 1B-to-8B scale-ups and grid-search data mixes and hyperparameters.

The market/concentration risk premium drove most of today's volatility. If it were truly value-driven, this would have happened 6 months ago, when DeepSeek released V2, which already contained the vast majority of the cost optimizations.

Cloud data center CapEx is backstopped by a growth outlook driven by the technology, not by GPU manufacturers. Dollars will shift just as quickly (like how Meta literally tore down a half-built data center in 2023 and restarted it to meet new designs).


Everyone can say things that sound smart. When it comes to markets, the only thing that matters is whether your portfolio was green or red.


Since when does gambling on an RNG's output make you smart?


It seems to me like both of you are saying the same thing.


My entry into Nvidia was in 2016; my portfolio has never been red since.


No, the only thing that matters is if the portfolio delivers returns in excess of your cost of capital.

If your portfolio is green, you can still be a poor performer.


Okay, I'll help him humble brag:

Bryan is in his last year of high school.

</end>

Keep building!


This is incredible work for anyone, let alone a high schooler. Seriously impressive!

I hope this turns into something I can buy (maybe a diy kit), in the future!


Thanks! I've been considering it (or enough detailed instructions to build one) since starting the project. I need to get a working model first though ;)


We are going full circle, Woz will be proud.


You study quantum mechanics in High School in the USA?


We discussed wave functions, probability, fermions/bosons, did calculations for particle in a box, the Schrödinger model, and went just up to deriving the hydrogen atom. Nothing super fancy, but it was one heck of an experience!


But did you win the Putnam?


For those who don't know, stackghost is referencing this classic moment on HN:

https://news.ycombinator.com/item?id=35079


I wonder how @sanj feels about their moment of fame (they’re still active on HN).


As I've pointed out before, his concession at https://news.ycombinator.com/item?id=35350 was both witty and graceful. It's great that he's still active here! and anyway he's done a ton of things that are a lot more important than that bit.


I hope that’s in the highlights :)


It's really interesting; in the UK I don't think we did (though I later studied Physics at university), but we did have Further Maths, which covered more advanced mathematics.

Also your project is incredible btw, maybe look into robotics too.


> in the UK I don't think we did

Perhaps you didn't go to a high school quite like this one: https://exeter.edu/admissions/financial-aid/tuition-costs/


> High school quantum theory

> Nothing super fancy

Yeah, that's college-level stuff; it's pretty fancy for high school. You go to a nice place :)


Some do - he thanks Phillips Exeter at the bottom of the project page, which is a very fancy private high school, probably the best in the US.


I went to a peer school that had at least a couple of math teachers with PhDs—my friends at the time who took their classes were, if I recall, nationally competitive in math olympiads.


It's more possible than you'd think! The options are basically:

- Go to a fancy private school like Phillips Exeter

- Really luck out and get into a great public STEM magnet school

- Homeschool and take private classes / have very smart parents


All I did was provide him the space and time to work on the project ... his parents funded the entire project, but will get reimbursement soon. It's the great minds, and the desire to have meaningful projects that make Exeter such an awesome place. Byran is one of a kind!!!!


Oh, or:

- Concurrently enroll at a community college (a really great option that I think every country should have)


I tested out of high school and went to community college instead, one of the best decisions of my life.


Some public schools in very wealthy counties will teach some basic quantum mechanics in honors/AP classes, too. All you have to do is acquire parents that can afford the shittiest neighborhood in those districts!


They did in mine in the Netherlands. Also electronics and programming (this was a long time ago so it was all pretty new); it was a special class to prep for university more than the regular curriculum does, but it was a public school and not even a very good one; just a few really good and switched on teachers (physics, math and chem).


Can't tell if this is sarcasm.


The community college option is available to anyone who’s willing to spend a couple evenings a week taking classes, so I don’t think it’s really that out of reach. Most countries don’t offer their high school students any opportunity to study material that advanced.


Your first 3 options are mostly “be born to the right parents”, so I couldn't tell if your remark of “it's more possible than you'd think!” was serious or not.

Hell, I went to a really selective school. But even there, the top students, of whom I was not one, got to do some extra stuff that would have greatly interested me and that I would have been able to do. But my grades in the humanities weren't good enough to be among the best.


Community college course options often won't include quantum mechanics.


In the high school I attended in Poland, I lucked into being in a class with a university TA assigned as the physics teacher, and he did manage to sneak in QM - more or less the same stuff as 'Hello9999901 listed in their reply.

(He also taught us differentiation in the first semester, and basic integrals in the second, because as he said, you cannot learn physics properly without those tools. This annoyed the heck out of our math teacher; she ended up deciding that, if we're learning this anyway, we might as well learn it properly - and gave us a much heavier intro to calculus in the last months of the last year.)


The USA has some great schools. OP goes to Phillips Exeter Academy, which is an exclusive private school that ranks among the best high schools in the country.


Not all high schools, but the US has some schools that let you take very advanced material and even get a head start on your college credits.


We had at least a cursory introduction about 15 years ago in Germany; it's not that far off.


You don't study the basics of QM in your high schools?


HOF HN post.


Care to explain?


I think the message means that this post is worthy of a Hall of Fame on HackerNews.


^


Attention is just communication? It’s orthogonal to the space of the representation.


Email notice sent Friday, Dec 20, 2024, requiring 30 days' notice to cancel. The addendum link was also intentionally not hyperlinked in the email notice.

We are writing to inform you of a change to our subscription terms. To ensure a smooth transition, we have updated our Subscription Addendum (available at https://ramp.com/legal/subscription-addendum) to implement a new process for customers on annual plans to cancel automatic renewal on Ramp subscription plans.

Key changes:

- To request cancellation, you must give written notice to your account manager.

- Cancellation requests must be submitted at least 30 days before your next renewal date.

The changes above will take effect on December 13, 2024. If you have any questions, please contact your account manager.

Best, The Ramp Team

