
I had wondered if they run their inference at high batch sizes to get better throughput and keep their inference costs lower.

They do have a priority tier at double the cost, but I haven't seen any benchmarks on how much faster it actually is.

The flex tier was an underrated feature with GPT-5: batch pricing with a regular API call. GPT-5.1 on the flex tier is an amazing price/intelligence tradeoff for non-latency-sensitive applications, without needing the extra plumbing of most batch APIs.
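For reference, flex is just one extra parameter on a normal request. A minimal sketch with the openai Node SDK (the model name and service_tier value here are from my reading of the docs, so double-check them):

  import OpenAI from "openai";

  const client = new OpenAI();

  // Same request shape as a regular call, just flagged for the cheaper,
  // slower flex tier instead of a separate batch API.
  const response = await client.chat.completions.create({
    model: "gpt-5.1",
    service_tier: "flex",
    messages: [{ role: "user", content: "Summarise this changelog: ..." }],
  });

  console.log(response.choices[0].message.content);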


I'm sure they do something like that. I've noticed Azure has way faster GPT-4.1 than OpenAI.


OpenAI and Nvidia's 10GW datacenter agreement and Sam Altman's new blog post about wanting to build 1GW of infra a week made me think of this article again, so I'm posting it up.

Here are a couple of interesting paragraphs from it.

  Instead of the transistor, the basic element in superconducting logic is the Josephson-junction.
  
  For logic, a Josephson-junction loop without a persistent current indicates a logical 0, while a loop with one single flux quantum’s worth of current represents a logical 1. For memory, two Josephson junction loops are connected together. An SFQ’s worth of persistent current in the left loop is a memory 0, and a current in the right loop is a memory 1.
  
  In classical CMOS-based technology, it is very challenging to stack computational chips on top of each other because of the large amount of power, and therefore heat, that is dissipated within the chips. In superconducting technology, the little power that is dissipated is easily removed by the liquid helium. Logic chips can be directly stacked using advanced 3D integration technologies resulting in shorter and faster connections between the chips, and a smaller footprint.
  
   It is also straightforward to stack multiple boards of 3D superconducting chips on top of each other, leaving only a small space between them. We modeled a stack of 100 such boards, all operating within the same cooling environment and contained in a 20- by 20- by 12-centimeter volume, roughly the size of a shoebox. We calculated that this stack can perform 20 exaflops (in BF16 number format), 20 times the capacity of the largest supercomputer today. What’s more, the system promises to consume only 500 kilowatts of total power. This translates to energy efficiency one hundred times as high as the most efficient supercomputer today.
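Back-of-the-envelope on that last claim (my own arithmetic, not from the article):

  20 exaflops / 500 kW = 20e18 FLOPS / 5e5 W = 4e13 FLOPS/W ≈ 40 TFLOPS per watt (BF16)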


I added a key rotator to my AI coder and asked a couple of friends to make keys for me. That helped code a good chunk of http://typedai.dev when 2.5 Pro came out.
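The rotator itself is trivial; a hypothetical sketch of the idea (names made up, not the actual TypedAI code): round-robin through the keys and move to the next one when a request gets rate limited.

  // Hypothetical sketch, not the actual TypedAI implementation.
  class KeyRotator {
    private index = 0;

    constructor(private readonly keys: string[]) {}

    // Round-robin: hand out the next key on every call.
    next(): string {
      const key = this.keys[this.index];
      this.index = (this.index + 1) % this.keys.length;
      return key;
    }
  }

  const rotator = new KeyRotator([process.env.KEY_1!, process.env.KEY_2!, process.env.KEY_3!]);

  async function generate(prompt: string): Promise<string> {
    // On a 429, retry with the next key before giving up.
    for (let attempt = 0; attempt < 3; attempt++) {
      try {
        return await callModel(rotator.next(), prompt); // callModel: whatever provider client you use
      } catch (err: any) {
        if (err?.status !== 429) throw err;
      }
    }
    throw new Error("All keys are rate limited");
  }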


Google actually does provide that service! https://cloud.google.com/vertex-ai/generative-ai/docs/model-...


Looking forward to using this on Cerebras!


I've been thinking about adding an agent to our Codex/Jules-like platform which goes through the git history for the main files being changed, extracts the Jira ticket IDs, looks through them for additional context, and also analyzes the changes to the other files in those commits.
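A rough sketch of the extraction step (generic Jira key regex and a hypothetical file path, nothing specific to our platform):

  import { execSync } from "node:child_process";

  // Pull the commit messages that touched a file, then grab Jira keys like "PROJ-1234".
  function jiraTicketsForFile(filePath: string): string[] {
    const log = execSync(`git log --follow --format=%s%n%b -- "${filePath}"`, {
      encoding: "utf8",
    });
    const matches = log.match(/\b[A-Z][A-Z0-9]+-\d+\b/g) ?? [];
    return [...new Set(matches)];
  }

  // e.g. feed these IDs to the Jira API for extra context before planning a change
  console.log(jiraTicketsForFile("src/payments/invoice.ts"));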


There is a huge focus on training LLMs to reason, and that ability will slowly (or not so slowly, depending on your timeframe!) but surely improve given the gargantuan amount of money and talent being thrown at the problem. To what level, we'll have to wait and see.


There isn't a TypeScript runtime; TypeScript is just a compiler/transpiler to JavaScript/ECMAScript plus a type checker and language server.
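For example, given a minimal add.ts:

  export function add(a: number, b: number): number {
    return a + b;
  }

the compiled add.js from a plain tsc run is the same code with the annotations simply erased (exact formatting varies with compiler options):

  export function add(a, b) {
      return a + b;
  }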


The price will come down over time as they apply all the distillation techniques to shrink it to a smaller-parameter model, just like GPT-4 pricing came down significantly.


I got the same feeling when I first looked at the LangChain documentation, back when I wanted to start tinkering with LLM apps.

I built my own TypeScript AI platform https://typedai.dev with an extensive feature list where I've kept iterating on what I find the most ergonomic way to develop, using standard constructs as much as possible. I've coded enough Java streams, RxJS chains, and JavaScript callbacks and Promise chains to know what kind of code I like to read and debug.

I was having a peek at XState, but after I came across https://docs.dbos.dev/ here recently I'm pretty sure that's the path I'll go down for durable execution, to keep building everything with a simple programming model.
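The appeal is that durable execution stays as plain annotated TypeScript functions. Roughly, from my memory of the DBOS docs (so treat the exact decorator and method names as approximate):

  import { DBOS } from "@dbos-inc/dbos-sdk";

  class ReportPipeline {
    @DBOS.step()
    static async fetchData() { /* call an external API */ }

    @DBOS.step()
    static async summarise() { /* call the LLM */ }

    // If the process crashes mid-run, the workflow resumes from the last completed step.
    @DBOS.workflow()
    static async run() {
      await ReportPipeline.fetchData();
      await ReportPipeline.summarise();
    }
  }

  await DBOS.launch();
  await ReportPipeline.run();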


Kind of a similar camp: I checked LangChain and others and ultimately thought, well, it's not really doing much, is it? It's just adding abstraction on top of what is essentially basic loops and conditional statements, and tbh it feels like my use cases are rarely similar enough that a framework's abstraction would help over just writing some helper functions myself.
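The core "agent loop" most of these frameworks wrap is really just something like this hand-rolled sketch (callLLM and runTool stand in for whatever client and tool dispatch you already have):

  // The "agent" is a loop, a conditional, and two helper functions you own.
  async function agentLoop(task: string, maxSteps = 10): Promise<string> {
    const history: string[] = [`Task: ${task}`];

    for (let step = 0; step < maxSteps; step++) {
      const reply = await callLLM(history.join("\n")); // your provider client

      if (reply.startsWith("FINAL:")) {
        return reply.slice("FINAL:".length).trim();
      }

      // Otherwise treat the reply as a tool request and feed the result back in.
      const toolResult = await runTool(reply); // your own tool dispatch
      history.push(reply, `Tool result: ${toolResult}`);
    }

    throw new Error("Gave up after maxSteps");
  }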

I don't think from first principles there's any broad framework that makes sense to be honest. I'll reach for a specific vector DB, or logging library, but beyond that you'll never convince me your "query-builder" API is going to make me build a better thing when I have the full power of TypeScript already.

Especially when these products start throwing in proprietary features and add-ons with fancy names on top.


TypedAI looks solid, was not aware of it! Bookmarked for further research.

Personally I am not fond of the decorator approach and decided to not use it in pgflow (my soon-to-be-released workflow orchestration engine on top of Postgres).

1. I wanted it to be simple to reason about and explicit (being more verbose as a trade-off)

2. There are some issues with supporting decorators (Svelte https://github.com/sveltejs/svelte/issues/11502, and a lot of others).

3. I decided to only support directed acyclic graphs (no loops!) in order to promote simplicity. Will be supporting conditional recursive sub-workflows to provide a way to repeat some steps and be able to branch.

Cheers!


Can DBOS work with Cloudflare Durable Objects?

