With a sufficiently advanced type system, you could lay out high level building blocks, such as function definitions, and let the LLM go wild, and then if things compile there's a high chance it's correct. Or maybe even a formal proof things are correct.
I was blown away when I realized some Haskell functions have only one possible definition, for example. I think most people haven't worked with type systems like this, and there are type systems far more powerful than Haskell's, such as dependent types.
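A minimal sketch of what I mean (the names here are made up, but any total implementation of these signatures is forced by parametricity):

    -- This signature knows nothing about the type `a`, so the only total,
    -- non-crashing implementation is to return the argument unchanged.
    pick :: a -> a
    pick x = x

    -- Likewise, the only total function of this type is the first projection.
    firstOf :: (a, b) -> a
    firstOf (x, _) = x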
There's not much reason to worry about low-level quality standards so long as you know it's correct from a high level. I don't think we've seen what a deep integration between an LLM and a programming language can do, where the type system helps validate the LLM output, and the LLM has a type checker integrated into its training process.
if there is only one valid translation of the type constraints into executable code then what you have is a slow, expensive, and occasionally argumentative compiler
it merely remains to build a debugger for your Turing-complete type system, and the toolchain will be ready for production
> Is this a brother or a cousin of the "sufficiently advanced compiler"? :-)
I believe the claim was that a sufficiently advanced compiler could do a lot of optimization and greatly improve performance. Maybe my claim here will turn out the same.
Because we're using languages that give us flexibility but no correctness guarantees. The vast performance advantage AI programming has over us could instead be used to manage the formal proofs for a verified toolchain.
We're not quite there yet, but while regular programming is quite tough for AI due to how fuzzy it is, formal proofs are something AI is already very good at.
The functions for which there's only one implementation are trivial examples. It's not going to work for anything even slightly more complicated, like a function that returns a float.
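To make the contrast concrete, a small sketch (names are mine): once the types are concrete rather than polymorphic, the signature rules out almost nothing.

    -- Infinitely many definitions satisfy this signature; the type
    -- checker can't tell the intended one from any of the others.
    scale :: Double -> Double
    scale x = x * 2
    -- scale x = x + 2        -- also type-checks
    -- scale x = sqrt (abs x) -- so does this
    -- scale _ = 0 / 0        -- and even NaN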
Even if you could, you probably wouldn't want to make any change a breaking change by exposing implementation details.
* built a codegen for Idris2 and a Rust RT (a parallel stack "typed" VM)
* a full application in Elm, while asking it to borrow ideas from dependent types to make it "correct-by-construction", use zippers for some data structures… etc. And it worked!
* While I was at it, I rebuilt Elm in Idris2, improving on the rendering part (this is WIP)
* data collators and iterators to handle some ML training runs, with pausing features so that I can just Ctrl-C and resume if needed/possible/it makes sense.
* etc.
In the end I had to completely rewrite some parts, but I would say 90% of the boring work was done correctly and I only had to focus on the interesting bits.
However, it didn’t deliver the kind of thorough prep work a painter would do before painting a house. It simply did exactly what I asked, meaning it did the painting and no more.
I keep seeing folks who say they’ve built a “full application” using or deeply collaborating with an LLM but, aside from code that is only purported to be LLM-generated, I’ve yet to see any evidence that I can consider non-trivial. Show me the chat sessions that produced these “full applications”.
An application powered by a single source file consisting of only 400 lines of code is, by my definition, trivial. My needs are more complex than that, and I’d expect that the vast majority of folks who are trying to build or maintain production-quality, revenue-generating code have the same or similar needs.
Again, please don’t be offended; what you’re doing is great and I dearly appreciate you sharing your experience! Just be aware that the stuff you’re demonstrating isn’t (hasn’t been, for me at least) capable of producing the kind of complexity I need while using the languages and tooling required in my environment. In other words, while everything of yours I’ve seen has intellectual and perhaps even monetary value, that doesn’t mean your examples or strategies work for all use-cases.
LLMs are restricted by their output limits. Most LLMs can output 4096 tokens - the new Claude 3.5 Sonnet release this week brings that up to 8192.
As such, for one-shot apps like these there's a strict limit to how much you can get done purely through prompting in a single session.
I work on plenty of larger projects with lots of LLM assistance, but for those I'm using LLMs to write individual functions or classes or templates - not for larger chunks of functionality.
> As such, for one-shot apps like these there's a strict limit to how much you can get done purely through prompting in a single session.
That’s an important detail that is (intentionally?) overlooked by the marketing of these tools. With a human collaborator, I don’t have to worry much about keeping collab sessions short—and humans are dramatically better at remembering the context of our previous sessions.
> I work on plenty of larger projects with lots of LLM assistance, but for those I'm using LLMs to write individual functions or classes or templates - not for larger chunks of functionality.
Good to know. For the larger projects where you use the models as an assistant only, do the models “know” about the rest of the project’s code/design through some sort of RAG or do you just ask a model to write a given function and then manually (or through continued prompting in a given session) modify the resulting code to fit correctly within the project?
There are systems that can do RAG for you - GitHub Copilot and Cursor for example - but I mostly just paste exactly what I want the model to know into a prompt.
In my experience, effective LLM usage mostly comes down to carefully designing the contents of the context.
My experience with Copilot (which is admittedly a few months outdated; I never tried Cursor but will soon) shows that it’s really good at inline completion and producing boilerplate for me but pretty bad at understanding or even recognizing the existence of scaffolding and business logic already present in my projects.
> but I mostly just paste exactly what I want the model to know into a prompt.
Does this include the work you do on your larger projects? Do those larger projects fit entirely within the context window? If not, without RAG, how do you effectively prompt a model to recognize or know about the relevant context of larger projects?
For example, say I have a class file that includes dozens of imports from other parts of the project. If I ask the model to add a method that should rely upon other components of the project, how does the model know what’s important without RAG? Do I just enumerate every possible relevant import and include a summary of their purpose? That seems excessively burdensome given the purported capabilities of these models. It also seems unlikely to result in reasonable code unless I explicitly document each callable method’s signature and purpose.
For what it’s worth, I know I’ve been pretty skeptical during our conversations but I really appreciate your feedback and the work you’ve been doing; it’s helping me recognize both the limitations of my own knowledge and the limitations of what I should reasonably expect from the models. Thank you, again.
Yes, I paste stuff in from larger projects all the time.
I'm very selective about what I give them. For example, if I'm working on a Django project I'll paste in just the Django ORM models for the part of the codebase I'm working on - that's enough for it to spit out forms and views and templates, it doesn't need to know about other parts of the codebase.
Another trick I sometimes use is Claude Projects, which allow you to paste up to 200,000 tokens into persistent context for a model. That's enough to fit a LOT of code, so I've occasionally dumped my entire codebase (using my https://github.com/simonw/files-to-prompt/ tool) in there, or selected pieces that are important like the model and URL definitions.
That's the approach I'm taking with https://www.flyde.dev.
Build the higher-level abstraction that allows LLMs to shine and us humans to enjoy a more effective way to view and control their output.
I know it's not what you meant, but tbh I don't think going deeper than Haskell on type-system power is the route to achieve mass adoption.
> With a sufficiently advanced type system, you could lay out high level building blocks, such as function definitions, and let the LLM go wild, and then if things compile there's a high chance it's correct. Or maybe even a formal proof things are correct.
A formal proof is simply a type that allows only the correct implementation at the term level. And specifying this takes ages, as in hundreds of times longer than just writing it out. So you want to use an LLM to save this 1% of time after you've done all the hard work of specifying what the only correct implementation is?
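As a rough illustration of that cost, here's a hedged Haskell sketch (GADTs and type families, names my own): even just forcing append to preserve list lengths needs more type-level scaffolding than the two-line function it constrains, and a spec strong enough to pin down the one correct implementation would need far more.

    {-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

    -- Type-level naturals and length-indexed vectors.
    data Nat = Z | S Nat

    data Vec (n :: Nat) a where
      VNil  :: Vec 'Z a
      VCons :: a -> Vec n a -> Vec ('S n) a

    type family Add (n :: Nat) (m :: Nat) :: Nat where
      Add 'Z     m = m
      Add ('S n) m = 'S (Add n m)

    -- The length index rules out implementations that drop or duplicate
    -- elements, but all the machinery above exists only to state that property.
    append :: Vec n a -> Vec m a -> Vec (Add n m) a
    append VNil         ys = ys
    append (VCons x xs) ys = VCons x (append xs ys)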
I'm not sure how much things have changed, but I tried to use GPT-4 when it first came out to build a library on top of Scala + Shapeless and it utterly failed. Unless you can somehow wire in an LSP per language as an agent and have it work through the type errors as it tries to create code, I can't see how we'll ever get to a place where LLMs can work with strongly typed languages and produce compliant code.
Even with the aforementioned "have an LSP agent work through type errors", it may be faster to just do it yourself than to wait for an LLM to spit out what may be correct.
Oh, I’ve found Claude 3.5 to be better but still pointless. To be specific, it generates code that does what I ask (roughly) but…
- Has obvious bugs, many of which only show up at runtime.
- Has subtle bugs.
- Is inefficient.
All of which it will generally fix when asked. But how useful is it if I need to know all the problems with its code beforehand? Then it responds with the same-ish wrong answer the next time.
Claude 3.5 is good, however its "type comprehension" is really basic. That was my experience with it when using Rust. It still can't create an internal mental model of the types and how to link them together. It'll also, at some point, start to heavily hallucinate functions and the like.
Glad to know I’m not the only one who has had difficulties getting reasonable Scala code from an LLM. While I agree that LSP integration would likely help with generating or debugging production-quality code, I’d also suggest that the need for an LSP demonstrates these models’ inability to reason—at least about code. Perhaps they were trained more deeply and carefully on the marketing or “getting started” pages for the libraries and languages I need to use and not as much on examples of production-quality code repositories that make use of those languages/libs/tools.