whenever I see a new model I always see if it can do engineering diagrams (e.g. ...

roenxi · on Aug 2, 2024

It'll probably come suddenly. It has been fascinating to me watching the journey from Stable Diffusion 1 to 3. SD1 was a very crude model, where putting a word in the prompt might or might not add representations of the word to the image. E.g., using the word "hat" somewhere in the prompt might do literally nothing, or suddenly there would be hats everywhere. The context of the word didn't mean much to SD1.

SD2 was more consistent about the word appearing in the image. "hat" would add hats more reliably. Context started to matter a little bit.

SD3 seems to be getting a lot better at the idea of scene composition, so now specific entities can be prompted to wear hats. Not perfect, but noticeably improved from SD2.

Extrapolating from that, we're still a few generations from being able to describe things with the precision of an engineering diagram - but we're heading in the right direction at a rapid clip. I doubt there needs to be any specialist work yet, just time and the improvement of general-purpose models.

fennecbutt · on Aug 6, 2024

At a rapid clip is a great unintentional pun here.

napoleongl · on Aug 1, 2024

Can’t you get this done via an LLM and have it generate code for Mermaid or D2 or something? I’ve been fiddling around with that a bit in order to create flowcharts and data models, and I’m pretty sure I’ve seen at least one of those languages handle absolute positioning of objects.
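
A rough sketch of what "diagram as code with absolute positioning" could look like, using plain SVG written from Python instead of Mermaid or D2. The function name `two_boxes_svg`, the default dimensions, and the edge-to-edge reading of "gap" are all illustrative assumptions, not something from a particular tool:

```python
def two_boxes_svg(side_mm: float = 100.0, gap_mm: float = 3.5) -> str:
    """Return an SVG document with two equal squares, gap_mm apart edge to edge.

    Hypothetical helper for illustration. One user unit in the viewBox
    equals one millimetre, so coordinates can be read directly as mm.
    """
    total_w = 2 * side_mm + gap_mm   # overall drawing width in mm
    second_x = side_mm + gap_mm      # left edge of the second square
    rect = (
        '<rect x="{x}" y="0" width="{s}" height="{s}" '
        'fill="none" stroke="black" stroke-width="0.5"/>'
    )
    # Note: the stroke is centred on each edge, so the outermost 0.25 mm
    # of the outline falls outside the viewBox and may be clipped.
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{total_w}mm" height="{side_mm}mm" '
        f'viewBox="0 0 {total_w} {side_mm}">'
        + rect.format(x=0, s=side_mm)
        + rect.format(x=second_x, s=side_mm)
        + "</svg>"
    )


if __name__ == "__main__":
    # Write the drawing to disk; open it in a browser or vector editor.
    with open("two_boxes.svg", "w", encoding="utf-8") as fh:
        fh.write(two_boxes_svg())
```

The point of going through code is that the geometry is computed, not guessed: the model only has to get the numbers and the structure right, and the renderer takes care of placing things.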

seveibar · on Aug 1, 2024

It usually isn't accurate. LLMs generally have very little spatial awareness.

zellyn · on Aug 1, 2024

I have likewise been utterly unable to get it to generate images that look like preliminary rapid pencil sketches. Suggestions by experienced prompters welcome!

phkahler · on Aug 1, 2024

>> Would love to see an AI company attack engineering diagrams head on, my current hunch is that they just aren't in the training dataset (I'm very tempted to make a synthetic dataset/benchmark)

That seems like a good use for a speech-driven assistant that knows how to use PC desktop software. Just talk to a CAD program and say what you want. This seems like a long way off but could be very useful.

csomar · on Aug 2, 2024

> https://fal.media/files/kangaroo/FwO3j7xFIgpIXepqKDj6h.png

Prompt: two square boxes at a distance of 3.5mm. Both boxes have the same size, 10cm.

9dev · on Aug 2, 2024

It’s like spelling out wishes for a djinn—better be crystal clear what you’re thinking of…

lovethevoid · on Aug 1, 2024

I hope you find manual tagging of diagrams interesting, as that is what you'll be doing a lot of!