Speaking of design challenges: it would be nice if the industry switched to an LLVM-like low-level HDL (or RTL) language that allowed easier cross-tool integration. So far every project implements everything itself. There is an idea[1] to adapt FIRRTL[2][3] for this purpose; here is a good paper[4] describing its features. Apparently Intel is also using it, at least in its research labs. This would help integrate all parts of the FPGA and ASIC design pipeline. For now chip design feels like programming in 1990: the quality of the tooling is very bad.
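To make that concrete, here is a minimal sketch in Chisel (a Scala DSL whose compiler elaborates designs into FIRRTL); the module, port names and widths are my own invention. The point is that a front end like this only has to produce FIRRTL, and separate tools can then optimize that IR or lower it to Verilog:

    import chisel3._

    // A trivial 8-bit adder. Chisel elaborates a class like this into a FIRRTL
    // circuit; downstream passes can transform that IR and emit Verilog from it,
    // so the IR is the glue between otherwise independent tools.
    class Adder extends Module {
      val io = IO(new Bundle {
        val a   = Input(UInt(8.W))
        val b   = Input(UInt(8.W))
        val sum = Output(UInt(8.W))
      })
      io.sum := io.a + io.b
    }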
It will stay like this for the foreseeable future. Chip design IP is too obscure for the mainstream oss people to force an oss ecosystem. There’s also too much proprietary foundry information needed to even have an oss ecosystem that actually works properly.
Ironically, the closedness of chip design tools and foundry information is also what prevents the field from truly growing. Entrenched players (the CAD companies) benefit from this, but the losers are the actual chip design companies. They are bleeding a generation of engineers who have forsaken this climate for the cozier CS/OSS one.
Fairly interesting article. I'm a bit baffled by the reference to a supposed attempt at abandoning the von Neumann architecture (quoted in full below) - what would these alternative architectures be? It doesn't seem at all likely to me, and I'm wondering if this is more a way to market rather minor changes as some kind of radical shift.
--------
"And the drive to put AI on the edge is causing new design architectures. “There has been a rethink on compute architectures for cache coherency, for heterogenous computing,” points out Synopsys’ Nandra. “In 2018, different approaches have been attempted to solve ML inference challenge. They are all trying to work out how to do inferencing on an edge device and being able to quickly process data so that they don’t have to upload information from the Cloud and to do things in real time. That has opened a big debate about von Neumann compute architectures to different approaches where you are separating memory, accelerator chips—be they dedicated FPGAs, dedicated GPUs, application specific processors—and new chips that have blocks that talk with each other.”"
A traditional von Neumann computer reads instructions and data from memory into something that does arithmetic, logic and control (call it a CPU).
But consider that the "memory" nowadays is composed of multiple 8 Gib DRAM chips. That's a lot of bits sitting there, read out perhaps 64 at a time by the CPU.
Instead, can some logic be embedded within each memory chip? Some way to bypass the "von Neumann bottleneck"?[1]
Ideas like that are the essence of "alternative architecture". Can a system composed of these, let's call them "non-von Neumann" elements, be faster or cheaper than the computer architecture we've been using since Johnny's 1945 seminal paper?[2]
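As a back-of-the-envelope sketch (bank count and sizes are made up, and it counts nothing but bus traffic), compare how many words must cross the memory bus to sum a large array when the CPU does all the adding versus when each DRAM bank could reduce its own slice and ship back only a partial sum:

    // Toy model of the von Neumann bottleneck for a reduction (summing N words).
    // A conventional CPU pulls every word across the memory bus; a hypothetical
    // processing-in-memory (PIM) setup lets each bank add up its own slice
    // locally and send only one partial sum to the CPU.
    object PimSketch {
      def cpuReduce(banks: Seq[Array[Long]]): (Long, Long) = {
        val wordsOverBus = banks.map(_.length.toLong).sum   // every word moves
        (banks.flatten.sum, wordsOverBus)
      }

      def pimReduce(banks: Seq[Array[Long]]): (Long, Long) = {
        val partials = banks.map(_.sum)                     // computed "in memory"
        (partials.sum, partials.length.toLong)              // only partials move
      }

      def main(args: Array[String]): Unit = {
        val banks = Seq.fill(16)(Array.fill(1 << 16)(1L))   // 16 banks x 64K words
        val (sumCpu, trafficCpu) = cpuReduce(banks)
        val (sumPim, trafficPim) = pimReduce(banks)
        println(s"CPU reduce: sum=$sumCpu, $trafficCpu words crossed the bus")
        println(s"PIM reduce: sum=$sumPim, $trafficPim words crossed the bus")
      }
    }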
I spent some time with a team building an ML processor based on associative processing. It still has a classical on-die, multi-tier cache system - built entirely in SRAM - but no ALU. It was designed to handle computations entirely in specialized SRAM arrays.
The concept was designed to handle massively parallel computation & minimize off-chip data transfers as much as possible. Still waiting to see how it will do in production.
> The confluence of the recent advances in technology and the ever-growing demand for large-scale data analytics created a renewed interest in a decades-old concept, processing-in-memory (PIM). PIM, in general, may cover a very wide spectrum of compute capabilities embedded in close proximity to or even inside the memory array. In this paper, we present an initial taxonomy for dividing PIM into two broad categories: 1) Near-memory processing and 2) In-memory processing.
Content addressable memories are a commonly used example.
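For a hardware-flavored sketch of that idea, here's a toy CAM in Chisel (entry count, widths and the single-cycle write/search behavior are my own simplifications): every stored entry gets its own comparator and is matched against the search key in the same cycle, which is the "compute where the data sits" trick in miniature.

    import chisel3._
    import chisel3.util._

    // Toy content-addressable memory: one comparator per entry, all active at
    // once, so a lookup is a single parallel match instead of a loop over
    // addresses.
    class Cam(entries: Int, width: Int) extends Module {
      val io = IO(new Bundle {
        val writeEn   = Input(Bool())
        val writeAddr = Input(UInt(log2Ceil(entries).W))
        val writeData = Input(UInt(width.W))
        val searchKey = Input(UInt(width.W))
        val hit       = Output(Bool())
        val hitIndex  = Output(UInt(log2Ceil(entries).W))
      })

      val store = Reg(Vec(entries, UInt(width.W)))
      val valid = RegInit(VecInit(Seq.fill(entries)(false.B)))

      when(io.writeEn) {
        store(io.writeAddr) := io.writeData
        valid(io.writeAddr) := true.B
      }

      // Compare the key against every valid entry in parallel.
      val matches = VecInit((store zip valid).map {
        case (value, ok) => ok && (value === io.searchKey)
      })
      io.hit      := matches.asUInt.orR
      io.hitIndex := PriorityEncoder(matches)
    }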
Harvard architecture machines are still used in embedded processors; among other things they can provide some hardware security impossible in a von Neumann architecture.
They are interpreting "von Neumann" somewhat loosely; on a stricter reading (though I consider it nonsense), the only real von Neumann machines were minicomputers like the PDP-11, or micros until recently. Mainframes had plenty of processing units beyond the CPU (what, after all, was a channel controller?), and modern systems have CPUs embedded in disk drives, in the PHYs of all sorts of devices, and so on. Not to mention that a modern "machine" can be a machine-room-scale system with lots of subunits.
Even at the level the semi guys are talking about, current processors are a far cry from what von Neumann wrote of; multiple functional units, asymmetric processors, caches and the like make a mockery of the simplistic CPU model of one ALU with a single path to memory.
Some alternatives are neuromorphic, heterogeneous computing and advanced packaging. Neuromorphic relies on neural networks, and one of the interesting things there is lower accuracy computing to speed things up because you have a whole network of sensors to provide context and collective accuracy. Heterogeneous relies on different types of processors and memories scattered around one chip, which could include GPUs, APUs or even embedded FPGAs. Advanced packaging uses different chips packaged together and connected through bridges, interposers or in the case of monolithic 3D, TSVs between different metal layers. (Here's another link on the same site: https://semiengineering.com/big-changes-for-mainstream-chip-...)
These are definitely major shifts, not marketing. The difficulty in continuing Moore's Law/Dennard scaling at 7nm and below makes it imperative to use alternatives. Instead of 50% improvements in power/performance at each new node, it's now 20% max, and probably significantly lower in some cases.
ML inference is not magic: by and large, it's just a combination of simple operations like matrix multiplications/dot products, element-wise nonlinearities, convolutions and other stuff that vector processors, GPUs and increasingly CPUs (thanks to SIMD) are very well optimized for. (In theory one could optimize a chip for some specific, well-defined ML architecture, even to the point of "wiring" the architecture in hardware, and people used to do such things back in the 1980s when this was needed in order to even experiment with e.g. neural network models. But given how fast ML is progressing these days, there's just no reason for doing anything like that nowadays!)
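To underline how plain that arithmetic is, here's a toy dense layer (weights, bias and sizes invented for the example); the y = relu(W·x + b) pattern below is exactly the kind of regular, data-parallel work that SIMD units, GPU kernels or a BLAS call chew through:

    // A dense layer: matrix-vector product plus an element-wise nonlinearity.
    object DenseLayer {
      def relu(x: Double): Double = math.max(0.0, x)

      // y = relu(W * x + b), computed row by row.
      def forward(w: Array[Array[Double]], b: Array[Double], x: Array[Double]): Array[Double] =
        w.zip(b).map { case (row, bias) =>
          relu(row.zip(x).map { case (wi, xi) => wi * xi }.sum + bias)
        }

      def main(args: Array[String]): Unit = {
        val w = Array(Array(0.5, -1.0), Array(2.0, 0.25))   // 2x2 weight matrix
        val b = Array(0.1, -0.2)
        val x = Array(1.0, 2.0)
        println(forward(w, b, x).mkString(", "))            // prints "0.0, 2.3"
      }
    }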
[1] https://github.com/SymbiFlow/ideas/issues/19
[2] https://bar.eecs.berkeley.edu/projects/firrtl.html
[3] https://github.com/freechipsproject/FIRRTL
[4] https://aspire.eecs.berkeley.edu/wp/wp-content/uploads/2017/...