Whenever I see a new model, I always check whether it can do engineering diagrams (e.g. "two square boxes at a distance of 3.5mm"). Still no dice on this one. https://x.com/seveibar/status/1819081632575611279
Would love to see an AI company attack engineering diagrams head on, my current hunch is that they just aren't in the training dataset (I'm very tempted to make a synthetic dataset/benchmark)
It'll probably come suddenly. It has been fascinating to me watching the journey from Stable Diffusion 1 to 3. SD1 was a very crude model, where putting a word in the prompt might or might not add representations of the word to the image. E.g., using the word "hat" somewhere in the prompt might do literally nothing, or suddenly there were hats everywhere. The context of the word didn't mean much to SD1.
SD2 was more consistent about the word appearing in the image. "hat" would add hats more reliably. Context started to matter a little bit.
SD3 seems to be getting a lot better at the idea of scene composition, so now specific entities can be prompted to wear hats. Not perfect, but noticeably improved from SD2.
Extrapolating from that, we're still a few generations from being able to describe things with the precision of an engineering diagram - but we're heading in the right direction at a rapid clip. I doubt there needs to be any specialist work yet, just time and the improvement of general purpose models.
Can’t you get this done via an LLM and have it generate code for Mermaid or D2 or something? I’ve been fiddling around with that a bit in order to create flowcharts and data models, and I’m pretty sure I’ve seen at least one of those languages handle absolute positioning of objects.
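To make the idea concrete, here is a rough sketch of the kind of code an LLM could be asked to emit for the "two square boxes at a distance of 3.5mm" prompt. It uses plain SVG with millimetre units rather than Mermaid or D2, since I'm not certain of their positioning syntax. Everything beyond the 3.5mm figure is my assumption: the 10mm box side, the 2mm margin, reading "distance" as the edge-to-edge gap, and the function names `two_boxes_svg` and `horizontal_gap_mm`, which are made up for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def two_boxes_svg(side_mm=10.0, gap_mm=3.5, margin_mm=2.0):
    """Return an SVG string with two squares separated by gap_mm (edge to edge).

    side_mm and margin_mm are illustrative assumptions; only the 3.5mm
    gap comes from the example prompt.
    """
    width = 2 * margin_mm + 2 * side_mm + gap_mm
    height = 2 * margin_mm + side_mm
    # Left edge of each square, in millimetres.
    xs = [margin_mm, margin_mm + side_mm + gap_mm]
    rects = "\n".join(
        f'  <rect x="{x}" y="{margin_mm}" width="{side_mm}" height="{side_mm}" '
        f'fill="none" stroke="black" stroke-width="0.2"/>'
        for x in xs
    )
    # The viewBox spans the same numbers as the mm size, so 1 user unit = 1 mm.
    return (
        f'<svg xmlns="{SVG_NS}" width="{width}mm" height="{height}mm" '
        f'viewBox="0 0 {width} {height}">\n{rects}\n</svg>'
    )


def horizontal_gap_mm(svg_text):
    """Measure the edge-to-edge gap between the two rects in the SVG."""
    root = ET.fromstring(svg_text)
    rects = sorted(root.iter(f"{{{SVG_NS}}}rect"), key=lambda r: float(r.get("x")))
    left, right = rects
    return float(right.get("x")) - (float(left.get("x")) + float(left.get("width")))


svg = two_boxes_svg()
print(svg)
print(horizontal_gap_mm(svg))
```

The second function hints at why the code route is attractive: the dimensions are explicit numbers that can be checked mechanically, which is not possible with a raster image from a diffusion model.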
I have likewise been utterly unable to get it to generate images that look like preliminary rapid pencil sketches. Suggestions by experienced prompters welcome!
>> Would love to see an AI company attack engineering diagrams head on, my current hunch is that they just aren't in the training dataset (I'm very tempted to make a synthetic dataset/benchmark)
That seems like a good use for a speech-driven assistant that knows how to use PC desktop software. Just talk to a CAD program and say what you want. This seems like a long way off but could be very useful.