The author doesn't explain (or is ignorant about) why this happens. These are special tokens that the model is trained on, and are part of its vocab. For example, here are the <think> and </think> tokens defined in the [Qwen3 tokenizer config](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507/bl...).
The model runtime recognizes these as special tokens. It can be configured, via the chat template, to replace these tokens with something else. That is how one provider is modifying the xml namespace, while llama.cpp and vLLM move the content between <think> and </think> tags into a separate field of the response JSON called `reasoning_content`.
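A quick way to verify this yourself (a minimal sketch assuming the `transformers` library is installed and the tokenizer files download without gating; the model name comes from the link above):

```python
from transformers import AutoTokenizer

# Loads only the tokenizer files, not the model weights.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B-Thinking-2507")

# "<think>" maps to a single id in the vocab, not to the ids of the
# characters "<", "t", "h", ... -- that is what makes it "special".
print(tok.convert_tokens_to_ids("<think>"))
print(tok.convert_tokens_to_ids("</think>"))

# Encoding text that contains the tag yields that one id as well.
print(tok.encode("<think>", add_special_tokens=False))
```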
I only did it once, some 15 years back (a happy memory), using LFS. It took about a week to get to a functional system with basic necessities. A code-finetuned model can now write a functional chat UI with all the common features and a decent UX in under a minute.
Gemma 3 4B (QAT quant):
Yes, Paul Newman was indeed known to have struggled with alcohol throughout his life. While he maintained a public image of a charming, clean-cut star, he privately battled alcoholism for many years. He sought treatment in the late 1980s and early 1990s and was reportedly very open about his struggles and the importance of seeking help.
^^ I have vague memories of Clippy, but I remember it as obnoxious, often consuming precious screen real estate on the low-res monitors of the day without offering anything valuable. But tooltips on the web, with CSS and libraries like floating-ui, can be much more compact, lightweight, and barely noticeable.
I have tried showing help text in an internal app using tooltips that appear when the user hovers over the target element (with a small icon as the trigger on touch devices), and the feedback was good: the tooltips were never in the way but were easily available when help was needed. (Accessibility for keyboard users needed some thought, but for that app's limited audience it was not a problem.) And while you're at it, maybe make it more engaging than a plain text-only tooltip (which can be done without any intelligence), and let the host customize it and offer more complex workflows.
No it won't (most likely). VTracer (which the authors compare against) is fast, runs in the browser via wasm, consumes far fewer resources, and can even convert natural images quite decently.
But the model seems cool for the use case of prompt-to-logo or prompt-to-icon (compared to my current workflow of getting a jpg from Flux and passing it through VTracer). I hope someone over at llama.cpp notices this (at least for the text-to-svg use case, if not multimodal).
Author of VTracer here. Finally managed to comment on Hacker News before the thread got locked.
Would be interested in learning about your workflow. Is it a logo generation app?
I feel like this is an example of "Machine learning is eating software". Raster-to-vector conversion is a perfect problem for it, because we can generate datasets of unlimited size and easily validate them with vectorize-rasterize roundtrips.
I did have an idea of performing tracing iteratively: adjusting the output SVG bit by bit until it matches the original image within a certain margin of error, and optimizing the output size of the SVG by simplifying curves where doing so does not degrade quality. But VTracer in its current state is one-shot and probably uses 1/100th of the computational resources. A sketch of the roundtrip check that would drive such a loop is below.
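A minimal sketch of that vectorize-rasterize roundtrip check (assuming the `vtracer` and `cairosvg` Python packages plus Pillow and NumPy; the function names are from their published APIs, but treat the exact signatures as assumptions):

```python
import cairosvg
import numpy as np
import vtracer
from PIL import Image

def roundtrip_error(png_path: str, svg_path: str) -> float:
    # Vectorize the raster input (one-shot, like the current VTracer).
    vtracer.convert_image_to_svg_py(png_path, svg_path)

    # Rasterize the SVG back at the original resolution.
    original = Image.open(png_path).convert("RGB")
    cairosvg.svg2png(url=svg_path, write_to="roundtrip.png",
                     output_width=original.width,
                     output_height=original.height)
    rendered = Image.open("roundtrip.png").convert("RGB")

    # Mean absolute pixel error; an iterative tracer would adjust the
    # SVG and repeat until this falls within the margin of error.
    a = np.asarray(original, dtype=np.float32)
    b = np.asarray(rendered, dtype=np.float32)
    return float(np.abs(a - b).mean())

print(roundtrip_error("input.png", "traced.svg"))
```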
VTracer seems to perform badly on all the examples. I suspect the results could be drastically improved simply by upscaling the input image (via traditional interpolation, or machine learning) and picking different parameters. But I am glad that it was cited!
Thanks for noticing this, and yes, I have also noticed what you're pointing out, but it is workable for many use cases. I use this workflow for making images for marketing or the web, so the images are more artistic than photorealistic to begin with. Think of the stuff you can find on undraw, but generated by image models from prompts. Then I run them through VTracer. The reproductions are not perfect, but often good enough (conversion can be slow depending on how sharp you want the curves, and the files are often very large, as you mentioned). Then I make any changes in Inkscape and convert back to raster for publishing.
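For reference, a rough sketch of that jpg-to-SVG step, including the upscale-first idea from upthread (assuming the `vtracer` Python bindings and Pillow; the tuning keyword arguments mirror VTracer's CLI options, so verify them against the current API):

```python
import vtracer
from PIL import Image

# Upscale the Flux jpg first -- tracing tends to produce smoother
# curves when the input has more pixels to work with.
img = Image.open("flux_output.jpg").convert("RGB")
img = img.resize((img.width * 2, img.height * 2), Image.LANCZOS)
img.save("upscaled.png")

# Trace to SVG; these options trade file size against curve fidelity.
vtracer.convert_image_to_svg_py(
    "upscaled.png",
    "traced.svg",
    colormode="color",     # keep the palette instead of binarizing
    filter_speckle=4,      # drop tiny noise blobs
    color_precision=6,     # fewer bits per channel -> fewer layers
    corner_threshold=60,   # higher = smoother corners
)
# The result can then be cleaned up in Inkscape and re-rasterized.
```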
> logo generation app
For logo generation, I would actually prefer code gen. I thought of this problem when reading about diffusion language models recently (assuming there is enough training data available in the form of text-vector-raster triplets).
On the contrary, I have found reasoning models (DS R1, mostly) to be very good at complex positioning and transition problems. They can't "visualize" anything, so they can't design well unless you explain the design to them well (which is the real problem in design: most people have vague ideas but can't express them in CSS terms).
And geometry, which is mostly high-school level IMO, but combined with cascading in the third (z) dimension. Introduce `relative` with its own coordinate system and then do transforms in it (to be fair, it's only complex in some cases, like when the transformed parent is different from the relative parent). And then get into the time domain for transitions. It's math after all, but not the same math that most programming courses teach.
Leaving aside the distortions from inflated and unrealistic expectations (case in point: people expecting AGI to evolve have not yet even defined what AGI is), I also think that in the mid-to-long run the current state of LLMs will bloom into an entire economy around migrating legacy apps to conversational APIs. The same investors will then have a new gold rush to chase, as always happens.
But that's the story of every entrepreneur. Whether you're the smartest builder or salesperson, nobody goes big in a single attempt; you keep improving. An attempt doesn't necessarily mean changing businesses, but improvising until you find what works. What seemed implicit to me is that the OP is looking for responses from early-career entrepreneurs.