More

mlochbaum · on Oct 6, 2024

The most uncompromisingly APL-ish code I've written is the BQN compiler[0]. Hard to write, hard to extend, hard to refactor. I generally recommend against writing this way in [1]. But... it's noticeably easy to debug. There's no control flow, I mean, with very few exceptions every line is just run once, in order. So when the output is wrong I skim the comments and/or work backwards through the code to find which variable was computed wrong, print stuff (possibly comparing to similar input without the bug) to see how it differs from expectations, and at that point can easily see how it got that way.

The compiler's whole state is a bunch of integer vectors, and •Show [a,b,c] prints some equal-length vectors as rows of a table, so I usually use that. The relevant code is usually a few consecutive lines, and the code is composed of very basic operations like boolean logic, reordering arrays with selection, prefix sum, and so on, so they're not hard to read if you're used to them. There are a few tricks, which almost all are repeated patterns (e.g. PN, "partitioned-none" is common enough to be defined as a function). And fortunately, the line prefaced with "Permutation to reverse each expression: more complicated than it looks" has never needed to be debugged.

Basically, when you commit to writing in an array style (you don't have to! It might be impossible!) you're taking an extreme stance in favor of visible and manipulable data. It's more work up front to design the layout of this data and figure out how to process it in the way you want, but easier to see what's happening as a result. People (who don't know APL, mostly) say "write only" but I haven't experienced it.

[0] https://github.com/mlochbaum/BQN/blob/master/src/c.bqn

[1] https://mlochbaum.github.io/BQN/implementation/codfns.html#i...

upghost · on Oct 6, 2024

God bless, my hat goes off to you sir. I have trouble wrapping my head around the concept of first class functions in ndarrays, let alone implementing it in hardcore APL. That has to be a feat on par with Hsu's Co-Dfns.

Don't suppose you can point to any resources to help wrap your head around BQN, do you?

mlochbaum · on Oct 6, 2024

Well this is pretty much the goal of the BQN website so my best attempts are there. I might point to the quick start page https://mlochbaum.github.io/BQN/doc/quick.html as a way to feel more comfortable with the syntax right away. And the community page https://mlochbaum.github.io/BQN/community/index.html collects links by others; Sylvia's blog in particular focuses on the sorts of flat array techniques that are useful for a compiler.

BoiledCabbage · on Oct 8, 2024

While I've seen BQN mentioned previously on pages that discuss APL, K & J I finally took a look at it.

I've got to say, it's a really impressive language. Very well thought through, it brings some nice ideas. And as someone still newer to the space, it seems to do a great job of eliminating some of the unnecessary complexity of other languages. The straightforward approach on syntax / parsing is really fresh air.

upghost · on Oct 6, 2024

Just looked at the github -- wait, you wrote BQN? My God. Is there any prior art on this -- arraylangs with first class functions? I don't think very many people realize how incredible the semantic power of BQN is. The idea of an arraylang with first class functions... it truly staggers the imagination.

I feel like if I were able to wrap my head around it I would never want to code in anything else. Thanks again and excited to take another look at it!

mlochbaum · on Oct 6, 2024

K, for a start. Whitney's earlier dialect A+ too. See https://aplwiki.com/wiki/First-class_function .

smabie · on Oct 6, 2024

Don't most array languages have first class functions?

upghost · on Oct 6, 2024

They have functions but not first class functions. Think (the ability to make) a vector/matrix of functions rather than just numbers :O

What could you do with that?

I don't know, but I bet some pretty cool stuff.

bear8642 · on Oct 6, 2024

> What could you do with that?

Just a couple quick ideas:

    fns ← f g h         ⍝ array of function

    fns[condition] args ⍝ select function to run
    (3 0 0⌿fns)    args ⍝ f f f args

notfed · on Oct 6, 2024

What is this witchcraft? I fear that I have seen something that I cannot unsee...

mlochbaum · on July 10, 2024

Huh. This is the commit that introduces the register model: https://github.com/Co-dfns/Co-dfns/commit/f89919144f22441f21...

In the compiler, it's working with dense adjacency matrix representations, so this will be at best O(n^2) in whatever the relevant sort of node is (expressions? functions?). "at present is not optimized" seems a bit of an understatement here: I've never heard anything about these sorts of graph algorithms being possible with good asymptotics in an array style. In practice, I think any value of Co-dfns would be more in the fact that it emits GPU code than that it does it quickly, but this calls into question the value of writing it in APL, I would say (for what it's worth, I believe Co-dfns is still not able to compile itself).

The phrase "allocate all intermediate arrays" seems fairly confusing: what's actually being allocated must be space for the pointers to these arrays and other metadata, not the data of the array itself. As the data is variable-size, it can't be fully planned out at compile time, and I'm fairly sure the register allocation is done when there's not enough shape or type information to even make an attempt at data allocation. This change can only improve constant overhead for primitive calls, and won't be relevant to computations where most of the work is on large arrays. I think it's helpful for Co-dfns in a practical sense, but not too exciting as it only helps with bad cases: if this is important for a user then they'd probably end up with better performance by switching to a statically-typed language when it comes time to optimize.

It does mention that "For most small array values, we will use static allocation types, which prevents the need for allocating additional space for an array above and beyond its header.", so small arrays are taken care of by register allocation. Tradeoff there between being able to fit larger arrays in this space versus wasting space for values that don't use the full header.

Constant lifting within the compiler is pretty cool, I'll have to look into that.

moonchild · on July 10, 2024

> I've never heard anything about these sorts of graph algorithms being possible with good asymptotics in an array style

yeah—idk graph algos really, but have heard parallelising combinatorial search in general (eg sat) is hard because forks and joins happen heterogeneously and erratically. this 2001 vintage has a bunch of completely sequential graph colouring algorithms in j https://dl.acm.org/doi/pdf/10.1145/570406.570416 (and j at least has sparse matrices!)

> Constant lifting within the compiler is pretty cool, I'll have to look into that.

hrm, it seems to refer to 2 things: 1) constants allocated in a special space; 2) interning. 2 is obviously worthwhile; 1 i would guess is related to memory being weird on the gpu?

mlochbaum · on June 26, 2024

These are written in a generally basic and clean style (avoiding tacit programming which is sometimes considered hard to understand, e.g. function {s↑⍺↓⍵} instead of the train (s↑↓)). They're nice to read and I'd have no trouble maintaining them. You just don't know the language.

conv looks like the hardest. s←1+(⍴⍵)-⍴⍺ is the result shape, number of subarrays with the length of ⍺ that will fit in ⍵ in each dimension. Looks like (⍳⍴⍺){s↑⍺↓⍵}¨⊂⍵ is missing the ¨; the inner function s↑⍺↓⍵ drops ⍺ elements (left argument) and then takes the first s, so it gets a length-s window of ⍵. This is called on each possible index into ⍺, together with the whole of ⍵, so it ends up getting all such windows. Presumably s is expected to be larger than ⍴⍺ so this is the more efficient way to slice things. ⊃+/,⍺× multiplies by ⍺ and sums, ravelling with , before applying +/ to collapse the dimensions and sum them all at once. Each element is an array of shape s, so summing them gives a result of shape s. There you go, multidimensional convolution!

lbatista · on June 27, 2024

Thanks for pointing the missing glyph. I will paste the correct code.

  backbias←{+/,⍵}
  logistic←{÷1+*-⍵}
  maxpos←{(,⍵)⍳⌈/,⍵}
  backavgpool←{2⌿2/⍵÷4}⍤2
  meansqerr←{÷∘2+/,(⍺-⍵)*2}
  avgpool←{÷∘4{+/,⍵}⌺(2 2⍴2)⍤2⊢⍵}
  conv←{s←1+(⍴⍵)-⍴⍺⋄⊃+/,⍺×(⍳⍴⍺){s↑⍺↓⍵}¨⊂⍵}
  backin←{(d w in)←⍵⋄⊃+/,w{(⍴in)↑(-⍵+⍴d)↑⍺×d}¨⍳⍴w}
  multiconv←{(a ws bs)←⍵⋄bs{⍺+⍵ conv a}⍤(0,(⍴⍴a))⊢ws}

mlochbaum · on June 19, 2024

I tried using search engines to find whether the fold function in Joy, a vastly better-known language, takes an initial element or not. Best I got, after some fiddling with search terms, was a Joy tutorial that did have an example of fold. Certainly much slower than using language documentation. Yes, this choice will prevent F from being the next Elixir or Kotlin, but it's hardly relevant to the usability of a niche language like this.

mlochbaum · on June 19, 2024

(I think we got off the rails a bit: the linked article looks to be ASCII-only! But I feel the need to defend myself...)

This is a bit of a change-up from the usual "write-only language" complaint! But it does have the benefit of accuracy: yes, if you know the language, the time savings in writing APL are smaller than the savings in reading it. Both are faster, once you've spent a week or two with the keyboard, since even the slow method of typing a prefix character followed by a letter is faster than typing out a keyword in another language.

Array languages are rarely "product"s. While I do offer it to others for free, I make the language I want to use, and having nice-to-read symbols for a one-time tooling cost is a nice convenience for me. Don't call me thoughtless for having different tradeoffs than you!

mlochbaum · on June 9, 2024

I was able to make a variant of the higher-base version that runs in a single pass, by stopping when one partition fills up and using a different method for the remaining (asymptotically few) elements. I described the idea, which is based on another effort called MergeShuffle, here: https://mlochbaum.github.io/BQN/implementation/primitive/ran...

And it is better when N gets large. My implementation set the cutoff at 2^19 elements, although the effect isn't too big for a few more powers of two. Here's the main radix loop: https://github.com/dzaima/CBQN/blob/v0.7.0/src/builtins/sysf...

orlp · on June 9, 2024

I found another in-place approach which also does a higher-base version described here: https://arxiv.org/pdf/2302.03317, with an open source implementation: https://crates.io/crates/rip_shuffle. Might want to compare it with your version.

mlochbaum · on June 9, 2024

Performance-oriented library with no benchmarking instructions, fun. I get 850ms to shuffle 32-bit integers up to 1e8 with this library versus 400ms in BQN (•rand.Deal•_timed 1e8). However, BQN also has a large advantage at smaller sizes, such as 425us versus 120us for 1e5 elements, so a lot of this may be the random number generator. I haven't figured out how to get PCG working yet. BQN uses wyrand which is faster but I now know has quality issues (can't even generate every possible output; I need to update the page I linked...).

It's substantially the same algorithm so any differences would just be down to implementation details. Other than multi-threading which BQN doesn't do. The usage is also a little different as BQN generates shuffled integers directly; generating the integers is 100ms of the 850ms for rip_shuffle but I'm not sure whether it makes sense to subtract that or not.

mlochbaum · on June 9, 2024

Not quite the same, rip_shuffle does have some contortions to be able to run in-place (I'm still scratching my head about who's running these sorts of high-performance algorithms with no auxiliary memory available), so if those cost anything they could contribute to the difference.

mlochbaum · on June 1, 2024

Functionality is, at this time, extremely limited (and the first kfun was out in January so I don't think there's really any intention to get this to usability on a short timeframe). No support for paired syntax like parentheses, functions in braces, and square brackets for indexing and function calls. No stranding or tacit functions. I doubt it's Turing complete. Many primitives are unimplemented and others are flaky: for instance calling count (#) on an atom can give an arbitrary number, print garbage, segfault, or allocate memory until it's killed. But it's got vector instruction support.

If you're looking for a practical k implementation, I recommend ngn/k, and several other implementations are listed at https://k.miraheze.org/wiki/Running_K .

mlochbaum · on June 1, 2024

Yes, when I took a look at shakti's database benchmarks before, they seemed entirely reasonable with typical array language implementation methods. I even compared shakti's sum-by benchmarks to BQN group followed by sum-each, which I'd expect to be much slower than a dedicated implementation, and it was around the same order of magnitude (like 3x slower or something) when accounting for multiple cores. I was surprised that something like Polars would do it so slowly, but that's what the h2o benchmark said... I guess they just don't have a dedicated sum-by implementation for each type. I think shakti may have less of an advantage with more complicated queries, and doing the easy stuff blazing fast is nice but they're probably only a small fraction of the typical database workload anyway.

mlochbaum · on April 21, 2024

In modern APLs a character scalar is just a Unicode code point, which you might consider UTF-32. It's no trouble to work with. Although interfacing with things like filenames that can be invalid UTF-8 is a bit of a mess; Dyalog encodes these using the character range that is reserved for UTF-16 surrogates and therefore unused. If you know you want to work with a byte sequence instead, you can use Unicode characters 0-255, and an array of these will be optimized to use one byte per character in dialects that care about performance.

mlochbaum · on April 21, 2024

There's a tutorial explaining BQN's combinators not directly linked: https://mlochbaum.github.io/BQN/tutorial/combinator.html

The symbols ⊸ and ⟜ are new in BQN and I think these are a good example of how APL's symbolic approach can make it easier to think about combinators. You have w F⊸G x meaning (F w) G x and w G⟜F x meaning w G (F x), that is, the function on the pointy side is a preprocessing step for the argument on that side. Once you get used to which side is which these become some of the most intuitive BQN combinators to work with.