
My main workstation has also been a Raptor Power9 running Void since November 2019, and I’m very happy with it, although, as the article mentions, “My experience with firmware is that open source does not mean necessarily better, rather the opposite”. OpenBMC (the firmware for the little ARM-based embedded computer that boots the big one) has a number of rough edges.

From time to time I submit patches to software to make it work (or work better) on this machine; last week it was these two trivial ones:

https://github.com/ciao-lang/ciao/pull/45

https://github.com/janet-lang/janet/pull/915

Typically that’s all that’s needed; in some cases it’s just more of the same:

https://github.com/oneapi-src/oneDNN/pull/767

But for most things, since it’s Linux, it’s just those small changes (one of my customers runs the software I write on AIX, and that’s a little more work):

https://github.com/IntelRealSense/librealsense/pull/5586

https://sourceforge.net/p/speed-dreams/tickets/1022/

For Rust programs, many packages used i8 for C strings (to interface with OpenGL, for instance), which broke on ppc64, where C’s plain char is unsigned, and had to be changed to c_char. That’s not a problem these days, I guess because many people now test on ARM.

https://github.com/femtovg/femtovg/pull/5
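
For context on the i8 breakage: Rust’s c_char follows the platform’s plain C char, which is unsigned on Linux for ppc64 and ARM but signed on x86. A minimal C sketch (the function names are made up for illustration) that shows which case applies on the machine it is compiled for:

```c
#include <limits.h>

/* Returns 1 when plain `char` is signed on this target, 0 when it is
   unsigned. CHAR_MIN is 0 exactly when plain char is unsigned. */
int plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}

/* Stores the byte 0xE9 in a plain char and widens it to int.
   With GCC this gives -23 where char is signed and 233 where it is
   unsigned, which is why code that assumes i8 breaks on ppc64. */
int widen_high_byte(void) {
    char c = (char)0xE9;
    return (int)c;
}
```

On x86 Linux these should return 1 and -23; on ppc64le, 0 and 233.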

Before Power10 was done, IBM actually asked us Raptor users about proposals for useful machine code instructions to add to it. I replied that I’d like to have hardware UTF-8 de-/encoding but they wanted a more detailed proposal and I never got around to writing it. I’m not even sure that this would be worthwhile, but I see UTF-8 de-/encoding everywhere in the code I write and would like it to approach memory read/write speeds.

I was very disappointed to learn that they had gone more proprietary with Power10 so I would not have been able to use those instructions anyway. What a pity! This machine still covers my needs so I’m not planning to replace it any time soon.



I don't manage the system I work on, so I don't see OpenBMC, but "a number of rough edges" sounds a lot better than the proprietary BMCs it's been my misfortune to use, which appear basically to be unsupported.

As far as Rust goes, I've just asked around for Rust expertise to try to get a feel for whether it's worth persisting with trying to make the stuff work that a user wants, since I know nothing about it. (The first issue I found was exactly i8 in a current crate.) That is rather the exception in building free software for ppc64le, though, in my experience of packaging HPC-type stuff. The real problems are with Mellanox/NVIDIA proprietary stuff for GPU support.


Interesting, never heard about them asking about ISA changes. Where was that done?

There are some lovely new instructions in Power10 but until they get the firmware source out fully I won't use them in the Firefox JIT.


It was through the IRC channel. I sent a direct message in response and exchanged a couple of sentences, then they gave me their IBM email address to submit the detailed proposal. My understanding was that it would have to be justified, and for that I would have to show that it could be more efficient than an implementation based on the existing SIMD instructions, which I’m not familiar with. I suspect that the kind of instructions that could actually be put there might not perform better than that.

I regret not sending at least an amateurish proposal; I still think it would be a good idea to have no-cost UTF-8 en-/decoding. Not just for text, but for general variable-length encoding of other kinds of data.
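
To make the cost concrete, here is a rough sketch of a scalar UTF-8 encoder in C. It is only an illustration (utf8_encode is a hypothetical name, not taken from any of the projects linked here), but it shows the data-dependent branches and shifts that a hardware instruction would have to replace:

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode scalar value as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp is a surrogate or
   lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, uint8_t out[4]) {
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)  /* surrogates are not encodable */
            return 0;
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes */
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

The output length depends on the value being encoded, which is the part that makes a straightforward SIMD formulation awkward.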


Very interesting. Which channel specifically?

VSX is really where the new development is happening, but it's become quite complete. The PC-direct instructions first made available in P9 also really closed a gap (beforehand you had to do bl with a weird flag to get PC in LR without trashing the history table).


The IRC channel was #talos-workstation on Freenode, now on Libera.Chat.

Do you know of some minimal example of calling VSX instructions from C?

Ideally just one .c file and one Makefile or README with the exact GCC command to compile it, plus a pointer to documentation describing each instruction. I’ve seen assembler inserted into C source code with GCC, but I’ve never done it and I assume there are some non-obvious details to take into account.


Sorry, didn't see this until now (out all day). Here is a very stupid example that uses `xxbrd` to byteswap a 64-bit quantity.

  #include <errno.h>
  #include <stdio.h>
  #include <stdint.h>
  #include <stdlib.h>
  
  int main(int argc, char **argv) {
        uint64_t v = 0;
        double o;
  
        if (argc != 2) { 
                fprintf(stderr, "usage: %s quantity\n", argv[0]);
                return 1;
        }       
  
        errno = 0; /* clear any stale value before checking it below */
        v = strtoull(argv[1], NULL, 0);
        if (errno == EINVAL || errno == ERANGE) {
                perror("strtoull");
                return 1;
        }       
  
        __asm__(
                "xxbrd %0, %1\n"
                :"=f"(o)
                :"f"(*(double *)&v)
        );      
        fprintf(stderr, "0x%lx\n", *(uint64_t *)&o);
        return 0;
  }
  
  % gcc -o xxbrd xxbrd.c
  % ./xxbrd 0x123456789abcdef
  0xefcdab8967452301


I don't know what you want of the VSX, but if you want vectorized code, what do you expect to gain over letting the compiler do it on your C? If you want an example, there are the kernels in OpenBLAS and FFTW (and BLIS, but that seems to be broken on POWER9).

There's an IBM web page somewhere with three(?) alternatives for using VSX, one of which is just using SSE intrinsics -- I don't know how well that works -- and another is a library that's now in Fedora, whose name I forget.

That said, it's obviously not competitive with AVX2 or, presumably, SVE, unless you can win on parallelization (or plain clock speed, which you probably can't).


What I’d like to do is a quick proof-of-concept to see whether whatever instructions are available in my CPU can be leveraged for UTF-8 en-/decoding.

For instance, does it work any better than my C implementation? https://github.com/Sentido-Labs/cedro/blob/master/src/cedro....

Maybe the compiler already compiles that to an optimal SIMD version, I don’t know. That’s what I would like to find out. And if the VSX instructions are not a good fit for this task, which instructions would be needed? Can I come up with a combination of logic gates that does that? Maybe not, there might be no way of implementing any significant part of the algorithm without branches or look up tables.
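
One small piece of UTF-8 handling that is branch-free is counting code points, since every byte that is not a continuation byte (10xxxxxx) starts one. The sketch below is only an illustration of the kind of loop a compiler might vectorize on its own; it is not full decoding, it assumes the input is already valid UTF-8, and the function name is made up:

```c
#include <stddef.h>
#include <stdint.h>

/* Counts code points in a buffer that is assumed to hold valid UTF-8.
   A byte starts a code point unless its top two bits are 10. */
size_t utf8_count_code_points(const uint8_t *s, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        /* The comparison yields 0 or 1, so the loop body has no branch. */
        count += (s[i] & 0xC0) != 0x80;
    }
    return count;
}
```

To see what GCC makes of it, something like `gcc -O3 -mcpu=power9 -fopt-info-vec -c utf8count.c` (the file name is arbitrary) reports which loops were vectorized, and `-S` instead of `-c` writes the assembly out for inspection. Whether this particular loop gets vectorized depends on the compiler version.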

The thing is that I need to start somewhere, and for that classichasclass’ example is exactly what I need.


Just keep in mind that the FPRs and vector registers are now aliased together (in VMX-only CPUs this wasn't necessarily the case). What is particularly stupid about my example is that it may have to spill to memory to move the uint64_t (a GPR) into the VSX register (an FPR) and then move it back because PowerPC famously had no direct GPR-FPR moves for quite a while. Since I didn't specify -mcpu=power8 (or higher), gcc doesn't issue the new instructions and I'm not sure it would know how to.

A better way would be to explicitly use the newer mtvsrd (mtfprd) and mfvsrd (mffprd) instructions and avoid the spill. So here's a revision 2.

  #include <errno.h>
  #include <stdio.h>
  #include <stdint.h>
  #include <stdlib.h>
 
  int main(int argc, char **argv) {
        uint64_t v = 0;
 
        if (argc != 2) {
                fprintf(stderr, "usage: %s quantity\n", argv[0]);
                return 1;
        }
 
        errno = 0; /* clear any stale value before checking it below */
        v = strtoull(argv[1], NULL, 0);
        if (errno == EINVAL || errno == ERANGE) {
                perror("strtoull");
                return 1;
        }
 
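        double t; /* scratch FPR, aliased to the low VSX registers */

        __asm__(
                "mtfprd %1, %2\n"  /* GPR v -> FPR t */
                "xxbrd %1, %1\n"   /* byte-swap the doubleword */
                "mffprd %0, %1\n"  /* FPR t -> GPR v */
                :"=r"(v), "=f"(t)
                :"r"(v)
        );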
        fprintf(stdout, "0x%lx\n", v);
        return 0;
  }
If v is already in a register, then it can just stay there.
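
A side note on the pointer casts in the first version: `*(double *)&v` breaks C's strict aliasing rules, so the compiler is allowed to miscompile it at higher optimization levels. A sketch of the usual alternative, copying the bits with memcpy, plus a reference byte swap using the GCC/Clang builtin to check the assembly against (all three function names are made up for illustration):

```c
#include <stdint.h>
#include <string.h>

/* Reinterprets the bits of a 64-bit integer as a double. The memcpy is
   normally optimized away; how the value travels from a GPR to an FPR
   (direct move or through memory) is left to the compiler and -mcpu. */
double bits_to_double(uint64_t v) {
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

/* The reverse direction: the raw bits of a double as an integer. */
uint64_t double_to_bits(double d) {
    uint64_t v;
    memcpy(&v, &d, sizeof v);
    return v;
}

/* Portable reference result for the xxbrd examples, using a builtin
   that both GCC and Clang provide. */
uint64_t bswap64_ref(uint64_t v) {
    return __builtin_bswap64(v);
}
```

For the input used above, `bswap64_ref(0x0123456789abcdef)` gives 0xefcdab8967452301, matching the xxbrd output.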



