Banana Pi to Launch 24-Core Arm Server (cnx-software.com)
145 points by ingve on Dec 26, 2018 | 97 comments


The sad thing about this chip is that you won’t even get a datasheet for it, nor a reference manual. At least Socionext didn’t want to share them with me. That’s why I prefer NXP or anything else that has NDA-free datasheets.


Seriously, why the f--k do vendors do that? Broadcom used to be notorious for that nonsense. They might as well keep their Manhattan Project chips to themselves.


> used to be

Sorry to be the one to wake you from your apparently very pleasant daydream, but I just picked three random chips from their website and the only links available are "request info" and "contact sales." I don't see any evidence to support a claim that they've changed at all. They're still the worst.


> Sorry to be the one to wake you from your apparently very pleasant daydream

This seems a bit uncharitable. Here's another way to look at it: perhaps the commenter you replied to was familiar with Broadcom's policies from some years ago, but had not kept up to date with them and didn't want to say one way or the other about Broadcom's current behavior.


A lot of the original Broadcom became part of Cypress Semiconductor.


If it's a Chinese company you can bet the datasheets are going to leak sooner or later.


Super niche use case. But I would absolutely love one of these for cross-compiled development work

Right now, if I want to cross-compile Gentoo for ARM and ARM64, I can build "most" packages with the included emerge-wrapper, which is great as it runs at basically 1:1 speed with native x86_64 compile jobs.

However a ton of packages still fail, many of their upstream developers also refuse to incorporate changes that might fix that https://dev.gnupg.org/T2370

For those packages, I have to use a QEMU usermode chroot. Which on my i7-4790 build host, is slower than native compilation on the Raspberry Pi itself
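For anyone unfamiliar with the setup, here's a rough sketch of what the QEMU usermode chroot involves (the mount point and rootfs path are assumptions; Gentoo's emerge-wrapper automates much of this):

```shell
# Minimal sketch of a qemu-user chroot for aarch64. Paths are assumptions.
# Commands are collected into a variable so they can be reviewed before
# being run as root on a real build host (binfmt_misc must already map
# aarch64 ELF binaries to qemu-aarch64).
qemu_chroot_cmds='
cp /usr/bin/qemu-aarch64 /mnt/gentoo-arm64/usr/bin/  # static emulator inside the rootfs
mount --bind /proc /mnt/gentoo-arm64/proc
mount --bind /sys  /mnt/gentoo-arm64/sys
chroot /mnt/gentoo-arm64 /bin/bash                   # arm64 binaries now run under qemu
'
printf '%s' "$qemu_chroot_cmds"
```

Every process in the chroot then goes through qemu's instruction-level translation, which is why it can end up slower than native compilation on the Pi itself.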

I'd love to be able to do direct, native builds to sidestep these flaws. But every "consumer" ARM board (Raspberry Pi, ODROID, ROCK64, etc.) is flawed in some way that makes it unusable for development:

- Raspberry Pi lacks enough RAM and will hang even when doing single-threaded builds (heavy packages like GCC and Rust usually hit this limit)

- The ODROID-C2 hits a similar limitation due to only having 2GB of memory itself

- The ROCK64 "can" complete full self-hosting builds (slowly) but has a staggering amount of kernel bugs relating to its Ethernet and USB 3, which frequently cause the system to hang


At work we have a Pi 3b+ build agent and another 3B running a JavaScript-heavy status board. Both had reliability problems (freezing/crashing) until we added heatsinks. YMMV.


I regularly compile big projects like Envoy Proxy, Bazel, CockroachDB, a bunch of Rust tools like ripgrep, bat, fd etc. on my RockPro64 without stability issues. Heatsink is key. I run Ayufan distro as the OS [0]. The additional cores and 4G RAM help with compilation speed a lot over the Rock64.

0. https://github.com/ayufan-rock64/linux-build


> - Raspberry Pi lacks enough RAM and will hang even when doing single-threaded builds (heavy packages like GCC and Rust usually hit this limit)

Can someone ELI5 what exactly the compiler is doing when it needs far more than the 2 gigs of an RPI's RAM to compile itself/Rust?

Also, how is it a "flaw" of the RPI that GCC must exceed 2 Gigs of RAM to compile itself/Rust?

Warning-- I may use the explanation to cudgel future Electron-so-fat threads...


Generally it is the linker that seems to be the issue AFAIK. 32 bit platforms can no longer link at all for large programs like Firefox. Link time optimization will probably make this worse. The gold linker (that Android uses) uses less memory than GNU ld; possibly the llvm linker is better too.
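If you want to experiment with this, the driver flag below is the usual way to switch linkers (whether gold or lld is actually installed on your system is an assumption):

```shell
# Ask the compiler driver to link with gold (or lld) instead of GNU ld.
# -fuse-ld= is a standard gcc/clang driver option.
LDFLAGS="-fuse-ld=gold"   # or: LDFLAGS="-fuse-ld=lld"
export LDFLAGS
echo "linking with: ${LDFLAGS#-fuse-ld=}"
```

Most build systems (autotools, CMake, Cargo via RUSTFLAGS) pick up LDFLAGS or an equivalent, so this is usually enough to try a lower-memory linker on a big link.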


Are there any articles about the algorithm GNU ld uses and why it can't work efficiently with less memory?


I’ve gotten around memory issues by adding a sufficiently large swapfile on raspberry pi
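In case it helps anyone, the usual recipe looks something like this (size and path are arbitrary; shown as a reviewable variable since the commands need root):

```shell
# Sketch: add a 2 GiB swapfile on a Raspberry Pi (run the commands as root).
swap_setup='
fallocate -l 2G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo "/swapfile none swap sw 0 0" >> /etc/fstab  # persist across reboots
'
printf '%s' "$swap_setup"
```

On SD-card-backed Pis heavy swapping will chew through the card's write endurance, so a USB disk is a better home for the swapfile.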


I do native builds of a relatively big C software project on an odroid-xu4 all the time. What are you building that needs more than 2gb for (presumably) linking? LLVM and maybe the kernel comes to mind? just curious


Running a build of Ayufan's 4.19 patches on a rock64 with latest dev device tree from the Armbian project. Eth runs close to theoretical, finally, and I've not observed any issues with USB3 either. Months of uptime, TB of traffic.

Here with the rock64 it was just a matter of time till hardware support settled in mainline, and it seems like we're there!


> The video below shows the server’s 24 cores fully utilized while building the Linux kernel, and as the title implies it’s running the recent Linux 4.19.

And how long does it take?


Even if it takes a while, it's much more convenient for Linux distro maintainers, and for anyone compiling ARM packages, to compile natively than to cross-compile.

Cross-compilation gives you performance, but a lot of other headaches... and currently a quad- or hex-core ARM takes quite a long time... so this would give a significant performance boost, especially with the 32GB of RAM it seems to have.


Speak for yourself. I'll take faster builds in exchange for managing a cross-compilation toolchain any day of the week.


Perhaps for building a webapp or something...

For linux distros, they download the source, apply some patches, then let their build server/farm have at it.

Managing a cross compilation toolchain in addition to cross compilation flags and what-not for joe-random-package with quirks quickly becomes a nightmare. Version upgrades will break everything, or pull in build-system linked libraries when they're not supposed to, etc.

This sort of high-core-count ARM server alleviates a lot of headaches - it gives you faster build times as well as native builds. It also does away with a lot of complexity in your build pipeline, since you don't have to carefully isolate the build environment to prevent rogue linking of x86 binaries.

Most linux distros that offer ARM versions already build natively... so this just gives them better build times, or more concurrent builds. That's a win.


That's a _really_ narrow use case for a business to make a product like this, and those distro vendors are already building on Cavium or Qualcomm platforms with significantly higher performance than this hardware.

Besides, with Cortex-A53, any single-threaded step in a build (linking, etc), is going to be very slow.

It's a sort-of cool product, especially if it comes at a good price point. But it's unlikely to make a huge difference in the overall market.


I wonder what the price point will be. I always wanted a low cost DIY NAS server with PLEX server capabilities. I tried with a RasPi 3 but had some problems with it hanging up all the time when I ran a SAMBA server concurrently.


Is it a problem with all raspis? I have two of them and they both conk off after a few days due to voltage issues, and I have to physically restart them.


> Is it a problem with all raspis?

My Pi currently has an uptime of 388 days. I use it as a file-server/backup-machine in my home network. It runs hourly backups for multiple boxes in the background (I use vhpi [0] for this). I have a usb battery attached to it for uninterruptible power supply. It works like a charm. I use NFS over Samba, though.

[0] https://github.com/feluxe/very-hungry-pi


I have had good uptime too on a pi which runs a UniFi controller. All the outages I have tracked in the last year have been power (maybe 2) or my ISP doing something, as the IP address changes after a 2 second outage. It sends me Slack notifications of what’s happened so it’s easy to see.


What are you using to power them? A proper 5V 2A supply should have no issues at load.


Yeah, that is a very likely cause. Make sure to get a quality power supply and a decent MicroSD card (as in, not the cheapest one on Amazon) so it doesn't fail after a few months. I have Raspberry Pis that have had uptimes of years+ with no issues.


And set up readonly rootfs. Or use a Banana Pi with onboard NAND to store rootfs. Or do both, just to be sure.
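For the read-only rootfs part, one common approach is just an fstab change plus tmpfs for the paths that must stay writable (device names and the mount list below are assumptions for a typical Raspbian image):

```shell
# Sketch of an /etc/fstab for a read-only root with tmpfs scratch areas.
ro_fstab='
/dev/mmcblk0p1  /boot     vfat   defaults,ro           0 2
/dev/mmcblk0p2  /         ext4   defaults,ro,noatime   0 1
tmpfs           /tmp      tmpfs  nosuid,nodev          0 0
tmpfs           /var/log  tmpfs  nosuid,nodev          0 0
'
printf '%s' "$ro_fstab"
```

For a quick test at runtime you can also `mount -o remount,ro /` once everything writing to disk is stopped.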


Yup.

You want to keep it plugged into a monitor for a while and watch out for the undervolt warning (either a lightning bolt or a rainbow in the corner of the screen). Improving the cooling, the quality of the USB cable, and the power supply are also really good ideas.


> Improving the cooling, the quality of the USB cable, and the power supply are also really good ideas.

What have you found so far to be the best options?

In my experience the quality of cabling and power supplies are the factors that most frequently affect stability of electronics hardware.


The power supply was definitely the biggest help, but I was using a dreadful one to begin with.


The problem is with bad power supplies and USB cables.

Make sure the USB port can pump out enough current. Make sure you use a short, thick USB cable. Once you do that, stability issues are gone.

I also put a heat sink on the CPU, but it probably has no effect on stability. It reduces the amount of time the CPU spends in thermal throttling.


I have a heat sink and always use SSH, but like others are suggesting it's probably an undervolt issue, although it's fed directly from a USB power port. Might be the long cables I use.


> Might be the long cables I use

The long cables where? I think I know what you're getting at but I would really love some more explanation if possible.


I'll take a guess at what they meant, and offer an explanation.

All wires have resistance, and all else being equal, a 2x longer wire will have 2x more resistance. Due to Ohm's law, there is a voltage drop across the cable when you are powering a load (such as the RPi), and this drop can eventually pull the voltage at the load below its minimum.

Example: the RPi needs 5 volts at 2 amps.

If your cable is 0.1 ohms, then the voltage at the Pi will be 5 - (2 * 0.1) == 4.8 volts.

If your cable is 1 ohm, the voltage at the Pi will be 5 - (2 * 1) == 3 volts.

But because the Pi doesn't always need the full 2 amps, that 1 ohm cable will work sometimes, making debugging harder. Only when you try to do something compute and/or RAM intensive will you notice the board randomly die.


This is right. The way I have set up my home office, I need cables roughly 3x the usual length for my Raspberry Pis, and this is probably causing an undervolt. I might have to connect a display/VNC and check for warnings.

The LED does go steady red, which is an indication that it has conked off and needs a hard reboot.


Take a look at journalctl; it records the under-voltage warnings. The text is red, on Raspbian at least, if I recall correctly.
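A couple of concrete checks, for anyone debugging this (vcgencmd is Raspberry-Pi-specific; the bit meanings below are from the firmware's get_throttled documentation):

```shell
# Commands to check a Pi for undervoltage, collected here for review:
#   vcgencmd get_throttled prints e.g. throttled=0x50005
#     bit 0  = under-voltage right now
#     bit 16 = under-voltage has occurred since boot
undervolt_checks='
vcgencmd get_throttled
journalctl -k | grep -i "under.voltage"
'
printf '%s' "$undervolt_checks"
```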


I've run a bunch of pi zero based cameras w/motioneye. Ignoring some QC problems with the camera modules they've proven to be very stable even in >100F ambient temps.

But I power them off dedicated wall warts.


> But I power them off dedicated wall warts.

... and what effect are you implying that this has in this particular context?


... that my experience is consistent with the others stating RPIs can be reliable with sufficient power supplies.


I blame bad USB micro cables. Replacing bad cables with new high quality solid cables fixed the unreliability for my pi3.


Macchiatobin is a pretty good alternative these days. Mini-ITX form factor, 4 relatively high-performance cores, good amount of RAM and PCIe/SATA/10GbE.


The MACCHIATObin line charges over €300 for an underpowered 4-core processor with 4GB of RAM, and €500 for a 16GB model. That's the price of a low/mid-range COTS x86-64 box.


What are the use-cases for such a server? It does not look good for desktop use.


Low power/high density virtualization hosting, website hosting, or even a native build server for linux arm distributions.

That's a lot of ARM64 power in one platform.


I'm not so sure. This is a company that sells sub $100 computers after all.

It depends on the type of ARM core and the manufacturing process. If it has 24 A53 cores then it wouldn't beat a 4 core desktop CPU from 2012. If it has 24 A72 cores but a manufacturing process older than 28nm then it still wouldn't be competitive with anything on 14nm or better.


A53 cores can use as little as tens of milliwatts... these aren't going to compete with Xeon or whatever high performance CPU's are being run in today's clouds... but for a static website at Wix (or whatever), you don't really need that performance majority of the time for majority of their customers.

Same goes with most long-running VPS'... bursts of activity here and there or once a day or whatever.

Being able to run those workloads on cheaper cores that consume far less power would be a huge win. Not all workloads would be a good fit, but those which are, this can be big.


They're A53 cores at 1GHz, assuming the article is right about it using the SC2A11. Multithreaded performance will probably be in the ballpark of a high-end Android phone or a dual-core laptop.


The parent mentioned it would be a good server for build operations that require an ARM cpu. Perhaps you meant cross compiling on an old CPU would still be faster, but there may be demand for native ARM build machines.


ARM has some interesting power and heat advantages over Intel/AMD at the moment. Power and cooling make up significant costs for cloud providers (and for many companies, for that matter), especially on the cloud storage side. For a fair number of services, particularly ones with very bursty workloads, that superior idle CPU usage and reduced heat output is _very_ interesting. It opens up a number of possibilities where "good enough" performance is actually good enough, and the extra rack density provides some strong advantages, _particularly_ in easily parallelisable workloads (or completely independent workloads happening in parallel, for example functions as a service).

Amazon isn't just providing ARM CPU servers because of customer demand, and nor is it just to get leverage with Intel. That's not the way they tend to work. They're providing them so that they can leverage them in-house as well. Amazon just likes to take the position that what is good for the goose is good for the gander. If it's good for them, there's bound to be customers that will find it good for them also, and as you sell it to customers, you get to those advantages of scale much quicker.


Cooling and power dictate just how much computation you can squeeze out of a data center. There’s even a fancy CS theorem about it. ARM really makes x86 look antiquated in this regard. I don’t think arm chips are where they need to be, yet, but the history of technology is full of this sort of thing.

Medium term, I expect Intel to look much more innovative than current. Long term, we’re either all dead because the robots won or uploaded to the cloud hive mind once our contributions to the universe outweigh our costs.


Arm has no intrinsic advantage over x86 ISA wise, for the most part. High performance arm cores consume similar energy to x86.


Not having to decode variable-length instructions is one advantage.


More cores means more flops. It’s going to take increased core “concentration” to get to exascale.


It would make a nice home file server.


An Atom-series CPU would be good enough for a home file server, as it has low power consumption too, plus excellent kernel support.


Might be good enough... but for a fair comparison, ARM has excellent kernel support, and can sip a few milliwatts per core instead of Atom's best of 2 watts per core.

Might not make a difference if you attach them to spinning disks or something, but you can really turn the thing into a wall-plug sort of thing and never worry about it again.


If it's an idle file server, the Atom can be put into sleep mode most of the time, so the power saving from ARM might not be as large.

If it's active 24x7, then the hard drive overshadows both the Atom and the ARM CPU in terms of current draw.

Atom supports multiple native SATA ports, and it is still hard for ARM to catch up. Marvell is the only ARM vendor that does a nice ARM+SATA combination, but its chips are hard to find; plus they're used in low-end NAS devices while Atom is in the mid-range, so it's more about cost-saving than power-saving.

I was attracted by ARM's low power in the past, until I realized the hard drive is what consumes the most power, and ARM does not do well with non-multimedia IO.


I’m not sure these statements are true, can you please link me to some data?

WD Red 10TB maxes out around 5.7W [1] under heavy read-write activity and can use much less when idling.

Intel Atom is typically in the 20W range for the parts commonly picked for NAS solutions, but can go as low as 8W too [2] and again when idling would use less.

I don’t think they have anything <5W, which would be needed to directly support your statement that the HDD uses more power than the CPU.

[1] https://www.wd.com/content/dam/wdc/website/downloadable_asse... [2] https://www.anandtech.com/show/11144/intel-launches-16core-a...


Does it also cost the same? I know that Synology opts for ARM CPUs across their NAS line and only opts for Intel in most expensive models. Why if Atom is so good?


Atoms are expensive, compared to ARM for budget needs.


Is 24 cores really useful for a home file server?


Probably not. A home server would likely not get as high a number of simultaneous requests that can be processed in parallel. I suspect that fewer, more powerful cores would lead to higher performance in that particular case.


If you can service fs calls on multiple cores, yes. The processing stack for a network filesystem server is TCP or UDP load, fs load, and asynchronous clients. If it's just you, a couple of cores does it. If it's you, your partner, a home PVR using network file streaming, and kids playing music, I could believe some parallelism would help.

Big memory for zfs arc would help too.
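If you do go the ZFS route, the ARC size is tunable; here's a sketch of capping it (the 8 GiB figure is an arbitrary example; zfs_arc_max is a ZFS-on-Linux module parameter):

```shell
# Compute an 8 GiB cap and emit the modprobe line that would set it
# (the emitted line belongs in /etc/modprobe.d/zfs.conf).
arc_max=$((8 * 1024 * 1024 * 1024))
echo "options zfs zfs_arc_max=$arc_max"
```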


Yes if you put your services into VMs.


24 cores & 32GB of RAM? This might just be enough for me to run Chromium with all my extensions enabled!


I had this idea about ARM servers that they were going to be significantly smaller, demand less power, and generate less heat than the existing x86 units. Where I work all of our servers fit in something like 22U and I expect a solid minority of them could be swapped out for something smaller.

Instead what we got are big chips with ARM ISA that are essentially scaled up to roughly Xeon sizes... which in hindsight I suppose is obvious coz that's probably where the money is.

Nevertheless I still think there is a niche for smaller servers with a finer granularity of compute capacity.


Not sure how small you want it but there are smaller and less power hungry x86 alternatives too. The more server-oriented atom chips (2-16 cores) are pretty decent and the much more expensive but more powerful Xeon-D (4-16 cores) can both be found in mini-ITX motherboards and pretty low-power configurations.

Have not kept myself up to date in case AMD offers decent alternatives in this segment.

Though my hope would be for ARM to lower the price point for these kind of servers.


I think it can't be a SC2A11, because the SC2A11 maxes out at 16GiB of memory, AFAIK.

I wonder what the TDP is on the SoC, the SC2A11 is only 5W (though they get a lot out of that!). I'd like to see what they could accomplish with a higher TDP target.


There is a link in the article to another one about a Linaro development platform, it states that the chip can use 64GB.


Hmm, maybe they're configured with different memory controllers? WikiChip and a couple others seem to say it's 16GiB, maybe that's 16GiB per DDR4 PHY.


A53 is a rather slow in-order core. I am not sure what use cases they imagine for this.


Acceptable server performance without worrying about Spectre?


I wonder what anyone would do with 1% increase in latency for long loops.


PHP shared hosting? :D


It would be interesting if consumer grade parts could hit a low enough price point for server boards without easily serviced parts [integrated CPU, mem, basic storage]. Maybe you could have cheap boards that deliver high reliability in aggregate.


They are not the only ones in Guangdong with hardware in this space. I saw an RK3399 ARM64 cluster board at Firefly a few months ago. http://en.t-firefly.com/


The RK3399 doesn't come anywhere close to this. The BPi board looks to be 24 cores on a single SoC (or at least running a single system image). The closest you could get is to hang 10-gig Ethernet off the PCIe bus on 4+ RK3399s. Then you also have to start thinking about fun things like cluster scheduling, and the overhead of running an OS on each device.


Yes, these were carrier boards with numerous (8-16?) sub-boards each of which had an RK3399 CPU, each of which has 6 CPU cores (2xCortex-A72 cores, 4xCortex-A53 cores) plus a NEON coprocessor and Mali T860 MP4 GPU. The article discusses 24 cores. The Firefly hardware I saw would therefore have at least 6x8 = 48 CPU cores, or double the ARM64 CPU cores discussed in the article, before even counting the coprocessor or the 8-16 GPUs also included.


Depending on workload, pure core count can be irrelevant when your interconnect is trash and you have no shared memory. You might see OK performance if the interconnect somehow supports RDMA, but that seems exceedingly unlikely. Almost all these cluster boards are using gigabit ethernet interconnect. Not to mention you spend a core or more per SoC to handle your os and interconnect/network interrupts.

NEON is implemented on a per-core basis, not as a separate coprocessor. Every A53 (and your handful of A72s) supports NEON.

I'll give you the GPUs... if you really want to put in all the effort for that sweet sweet OpenCL 1.2 action on an out of date vendor kernel.

An RK3399 cluster, and a single many-core system is a nonsensical apples to oranges comparison.


But the RK3399 has much better single-core performance with the A72 cores (which you can run at 2.2GHz), so if you're not running parallelizable tasks all the time (/not looking for a cluster) it's a better chip.


That socionext A53 chip again… BUT if they manage to make it way cheaper than the developerbox, that might actually be interesting. Otherwise, I'd rather have four A72 cores (see macchiatobin) than twenty-four A53s.


What about power consumption at full throttle? In watts, say, compiling the Linux kernel with -j24/-j48.


I'd love to see a Banana Pi with x86 on it. Or literally any SBC with good and affordable x86.


Odroid H2: dual gigabit NICs, SBC (but a bigger one), x86_64, SATA and USB 3. $111.


LattePanda Alpha maybe? I've ordered one to run OSX


I had success with Udoo.


I do want one....


If one can make a board with enough cheap high performance CPUs, it may be good for neural nets. The main reason we use NVIDIA hardware is because it's cheaper per FLOP.


> The main reason we use NVIDIA hardware is because it's cheaper per FLOP

... well... and largely because most neural network software only supports CUDA and/or has rudimentary OpenCL support.

Nvidia hardware isn't really cheaper per FLOP in all cases, AMD and Nvidia still leapfrog each other back and forth.


Sure, but every framework has full support for CPUs.


That's true to a point... the latest versions of TensorFlow, for example, have caused a few issues with instruction sets that not all CPUs (even ones produced today) have.


Is that because of CPUs that are only available at major cloud providers?


No, not entirely.

The AVX instruction set, for instance, seems to only be on select Intel and AMD CPUs, even modernly manufactured ones. TensorFlow changed to requiring AVX as of version 1.4 (Nov 2017), which has caused grief for many users... even those with recent/performant systems.
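A quick way to check whether a given Linux box is affected (the flag name in /proc/cpuinfo is standard; a SIGILL crash is the typical symptom of running an AVX build on an AVX-less CPU):

```shell
# Check /proc/cpuinfo for the avx feature flag.
if grep -qm1 '\bavx\b' /proc/cpuinfo 2>/dev/null; then
  avx_status="AVX supported"
else
  avx_status="AVX not supported"
fi
echo "$avx_status"
```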


A board with many CPUs would be nice, but communication between many CPUs is slower than communication within one chip (GPU).


Most NN tasks are embarrassingly parallel.


GPUs will generally have more cores. That's why the GPU was made: to do bunches of math (matrix multiplies, dot products, etc.).


Yeah we use graphics cards because they are cheaper per FLOP than a bunch of CPUs. Using cheap CPUs isn't really going to change that.



