Each gpu_* call emits SPIR-V and dispatches via Vulkan
compute. Data stays resident in VRAM between calls — no
round-trips to CPU unless you need the result.
No thread_id exposed. The runtime handles thread indexing
internally — gpu_add(a, b) means "one thread per element,
each does a[i] + b[i]." Workgroup sizing and dispatch
dimensions are automatic.
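To make that concrete, here is a rough CPU-side sketch in plain Python of what "one thread per element" with automatic dispatch sizing amounts to. This is not OctoFlow code: `elementwise_add`, `dispatch_size`, and the workgroup size of 256 are hypothetical names and values chosen for illustration, and the real runtime may size things differently.

```python
import math

def dispatch_size(n, workgroup_size=256):
    """Number of workgroups needed so that every element gets one thread."""
    return math.ceil(n / workgroup_size)

def elementwise_add(a, b, workgroup_size=256):
    """CPU emulation of 'one thread per element, each does a[i] + b[i]'."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    n = len(a)
    out = [0.0] * n
    for group in range(dispatch_size(n, workgroup_size)):
        for local in range(workgroup_size):
            # The global thread index that the user never sees.
            i = group * workgroup_size + local
            if i < n:  # threads past the end of the data do nothing
                out[i] = a[i] + b[i]
    return out

print(elementwise_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

The bounds check is the part a hand-written kernel usually has to get right: the last workgroup is rarely full, so some threads must be masked off.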
The tradeoff: you can't write custom kernels with shared
memory or warp-level ops. OctoFlow targets the 80% of
GPU work that's embarrassingly parallel. For the other
20% you still want CUDA/Vulkan directly.
"Low-hanging" is relative, at least from my perspective. A significant part of my work involves cleaning up structured and unstructured data.
An example: more than ten years ago, a friend of mine was fascinated by the German edition of the book "A Cultural History of Physics" by Károly Simonyi. He scanned the book (600+ pages) and created a PDF with (nearly) the same layout.
Against my advice, he used Adobe tools for it instead of creating an EPUB or something like DocBook.
The PDF looks great, but the text inside is impossible to use as training data for a small LLM. The lines from the two columns are interleaved, and many spaces are inserted at random positions, which is particularly hard to clean up because mathematical formulas often appear inline in the text.
After many attempts (with regex and LLMs), I gave up, rendered each page, and had a large LLM extract the text.
Even the documentation search is available:
```bash
/Applications/Wolfram.app/Contents/MacOS/WolframKernel -noprompt -run '
Needs["DocumentationSearch`"];
result = SearchDocumentation["query term"];
Print[Column[Take[result, UpTo[10]]]];
Exit[]'
```