Intel Ditching Hyper-Threading with New Core I7-9700k Coffee Lake Processor (wccftech.com)
130 points by jhack on July 25, 2018 | hide | past | favorite | 84 comments


In general I don't think blow-by-blow Intel SKU rumors are a good use of HN's time, but it's useful to understand the context here: https://www.computerbase.de/2018-07/spezifikationen-core-i9-... The i9 is rumored to be 8C/16T, so Intel has to make the i7 worse in some way. An 8C/8T i7-9xxx is presumably still a little faster than a 6C/12T i7-8xxx even though it has fewer threads.


OpenBSD disables Intel's hyperthreading due to security concerns: https://news.ycombinator.com/item?id=17350278


Note: it wasn't found to be insecure; it was merely suspected of being insecure.

(FWIW, this comes from people who tend to know what they are talking about. But also people who value security over everything else)


For the uninformed: concerns don't mean vulnerabilities, necessarily

At least there's no proof so far


It's more likely that HT was disabled on that particular system than that Intel is scrapping HT on i7s, unless they're shifting the entire product line down a peg and introducing a mainstream desktop i9 that isn't part of the HEDT platform.


That is the Occam's razor explanation, yeah.


HT is easy to implement but hard to debug. It wouldn't be at all surprising if an Intel engineering sample had HT disabled during the testing of some early samples. And really an Intel heavyweight core is wider than it needs to be in most cases where it isn't running two threads.


Assuming this is true (which I doubt, since HT was the main differentiator between the i5 and i7), this seems like the exact opposite of the approach you'd expect them to take with Ryzen breathing down their necks and boasting high core counts. AMD's i7 equivalents are all 8 cores / 16 threads; Intel would want to return fire in some way.


If they really wanted to make a consumer/prosumer divide between i5 and i7/9 then they’d make virtualization and ECC available only for the latter.


> Most applications can usually handle a maximum of 8 threads efficiently

I just wonder if I'm in the minority running more than one app at a time. Specifically, a dozen apps that are never closed and just stay in the background until I need them. Even more specifically, listening to music on YouTube while working in the IDE, wherein the browser pops up regularly on the CPU usage chart.

For the same reason, the norm of 4 or 8 GB of RAM is baffling to me.


Do you find that you're CPU limited in your workflow? Apps that are mostly idle don't need a dedicated CPU thread. Firefox playing a Youtube video on a rather old i7-5820K @3.3GHz takes about 20% of one CPU on my machine. Emacs takes a negligible amount of CPU most of the time, so do my terminals. Compilation is really the only situation where I max out my machine.

In general if you only care about raw performance you'll run one heavy app per machine. You won't use the same server to render video and build your code at the same time, it will be more efficient to use two different servers for that.


To me it seems that modern apps don't really stay frozen in the background like they did ten years ago, at least not on the Mac. Something is always bubbling up in the process manager. JS GUIs especially seem to constantly occupy the CPU with something, even if only a little, despite their supposedly event-driven nature.


Even then, the CPU utilisation is low; having one or two spare cores is enough to reap any latency benefits.


How do you arrive at that conclusion considering that nowadays browsers tend to run each page in dedicated threads and in some cases even processes? In that scenario your assertion would only hold if you assume no one has more than a couple of pages open at any given time.


"Running Threads" in Windows and Linux don't use any CPU time unless there is something to do.

Most of the time, threads and processes are in the blocked state. For example, waiting for mouse movements, or network traffic.

Check your CPU utilization; I bet you it's below 20% if you have anything close to a modern processor, even with 30+ tabs open.


what? read some OS books


I'm not CPU bound at home at all. I still have an i7 for those rare moments of RAW-to-JPEG export with Lightroom.

I do care about performance at work because I'm using it 8 hours a day, but I wouldn't mind an i5 at home at all. I don't think any normal person needs more, even with a few apps running in the background.


No one needs a V8 engine either, but it's nice to be able to tromp on the pedal and feel the g-forces every once in a while. And the price difference for higher-performance car engines is much more than the small bump in cost from an i5 to an i7.


I do, but even with 20 apps running I probably never used more than 3-4 simultaneously that each consumed a meaningful amount of CPU. With 8 cores I can probably easily do without HT for any workload.


This seems like a terrible move just as AMD is getting more competitive. AMD CPUs are just going to be that much bigger bang for the buck. I don't buy that they're going to limit HT to only their top-end CPUs.


This form of "segmentation" is something I've always found appalling. A part like this is no different from a high-end part; it's an identical design that is intentionally damaged so it doesn't cut into high-end sales. It comes out of the factory fully featured, and Intel charges you extra if you want them to skip breaking it.

The free market, in its beautiful efficiency, leads to the intentional crippling of millions of state-of-the-art chips.


I understand it seems weird with hardware, but I think it's similar to how we charge for software.

Try this thought experiment:

1. Based on market analysis, Intel decides it could sell a part with hyperthreading for $1K, and one without for $500.

2. Engineers start building both chips.

3. Because the chips themselves cost little to make ($45 and $50 for non-HT vs HT let's say?), and it costs $10M of engineering time to design each chip, engineers realize it would actually be far more efficient to design one chip with hyperthreading, and disable it for the lower-end SKU.

I'm curious which part you object to in this sequence. The sale price of chips is mostly amortizing very high R&D costs, not unit distribution costs. There are a variety of different types of customers to serve with different price points, while designing different chips is expensive, so it ends up being logical to make one chip with different features enabled.
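The economics of that thought experiment can be sketched with a few lines of arithmetic, using the commenter's made-up figures ($10M of engineering per design, $45 to $50 per unit; the volumes below are invented for illustration):

```python
# Hypothetical cost model: one shared die vs. two separate designs.
# All figures are illustrative (from the comment above), not real Intel costs.

def per_unit_cost(nre, unit_cost, units_sold):
    """Amortized cost per chip: fixed engineering cost spread over volume,
    plus the marginal cost of each unit."""
    return nre / units_sold + unit_cost

# Strategy A: design two chips, each carrying its own $10M of engineering.
two_designs_ht    = per_unit_cost(10_000_000, 50, 1_000_000)  # 60.0
two_designs_no_ht = per_unit_cost(10_000_000, 45, 1_000_000)  # 55.0

# Strategy B: design one HT chip and fuse HT off for the low-end SKU,
# amortizing a single $10M over the combined volume.
one_design = per_unit_cost(10_000_000, 50, 2_000_000)         # 55.0

print(two_designs_ht, two_designs_no_ht, one_design)
```

Under these (invented) numbers, one fused-off design matches the cheaper of the two dedicated designs while saving an entire $10M design effort, which is the point of step 3.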

Software and web services are the ultimate expression of this kind of economics. It costs almost 0 to serve an additional customer, but a lot of R&D and operations to build it.

Would you call a web service "intentionally damaged" when it doesn't give you all features for the same price (or for free)?


This market where they can build chips for $50 and sell them for $1000 only exists because there is a (near) duopoly on x86 (afaik mostly due to patents).

If the HT chip costs almost the same as the non-HT chip, then maybe they should just be selling HT chips for $500 instead; that's probably what would happen if there were actual competition in the x86 market.


What about the dies that aren't 100% perfect, and don't work at those high end specs? If there's a defect in a core or two, or in some area of cache, why not trim it down to a lower spec part and sell it for something (rather than scrapping it)?


I think the case of HT is more pure "segmentation", and a little different. I bet most of the parts that ship with HT disabled have all threads working.

The number of transistors involved in HT is quite small. There are probably very few cores where one thread works and the other doesn't, compared to the number of CPUs that work or don't work because their cache or something more fundamental is screwed up by a defect.

I wouldn't be surprised if they don't even test for one thread working and the other not, and the "defeaturing" is a separate step after yield binning.


On the other hand, I'm sitting here perfectly happy with my $90-$100 i3 at home, which would not have been as cheap without price bucketing and all the people buying expensive i7s and server chips subsidizing the R&D.

Do I ever use more than 2 cores? Rarely.

Oh actually it has 4 cores, guess that changed with the newer chips.


> Oh actually it has 4 cores, guess that changed with the newer chips.

Yup. That changed with the 8000-series. The current i3 chips are more like the previous generation's i5.

It's pretty nice because it means i3's are suitable for gaming.


Doesn't that viewpoint ignore real-world yields and binning?


By having different feature levels, they can downgrade parts that don't pass at a higher class of features. For example, if part of the cache is faulty, you can disable half of it and sell the chip as a lower-grade part.


Is this a trusted source? The previous post there about the i9 has different information altogether.


No, Wccftech has a reputation of posting any rumors and not bothering to verify them.


Or just making things up.


Nothing coming out of wccf is even marginally trustworthy, they're infamous for posting every single rumor they hear about.


This source [1] says:

Core i9-9900K with 8 cores and 16 threads

Core i7-9700K with 6 cores and 12 threads

Core i5-9600K with 6 cores and 6 threads

[1] https://hothardware.com/news/core-i9-9900k-coffee-lake-cpu-i...


Are there any scheduling experts on here? So I've always been under the impression that HT just schedules instructions on the unused functional units.

So say your CPU has 100 adders, and when you resolve all your dependencies for incoming instructions, you can only use 60 of those adders when running instructions in parallel (out of order), so the HT/logical core uses the other 40 adders for another thread (so the HT cores get a lower priority than the standard cores; hence why operating system that are HT aware can be more efficient by scheduling lower priority threads on the HT cores).

Is that correct or am I way off? (HT wasn't a thing when I took architecture class. I had a dual AthlonXP back then, where I had two physically separate processors).


HT can help fill unused execution units (which number on the order of a dozen, not hundreds, unfortunately), as they are over-provisioned compared to what a single thread could possibly use [1].

But the biggest win is when HT can fill pipeline holes caused by a single thread waiting for some long-latency operation (almost invariably memory accesses). Normally out-of-order execution can fill these holes by extracting parallelism from a single thread of execution, but that's not always possible when multiple high- or unpredictable-latency operations are chained together.


Yes I think it more or less works that way. A hyperthread has its own set of registers and stack and everything, but for doing any actual work it shares the same functional blocks as any other thread running on the same core. I do think that the hyperthreading parts increase the numbers of some functional blocks like load/store units and such compared to non-hyperthreading parts though, to reduce contention (someone correct me if I'm wrong).

What this means is that hyperthreading does not really work so well on sustained, homogeneous workloads. For example, doing very heavy computation on 8 threads of a 4-core CPU with 8 hardware threads can actually reduce performance, because all threads will be contending for the same functional blocks of the CPU.


> What this means is that hyperthreading does not really work so well on sustained, homogeneous workloads. For example, doing very heavy computation on 8 threads of a 4-core CPU with 8 hardware threads can actually reduce performance, because all threads will be contending for the same functional blocks of the CPU.

When I got my i7-3770K (4 cores, 8 threads) years ago, I was into POV-Ray rendering. I did a test using only 1, 2, 4, and 8 threads.

As expected, 2 threads was double the speed of 1, 4 threads was double the speed of 2, but 8 threads was only about 15% faster than 4.

Thinking about it now, I wonder what power consumption looked like.
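A scaling test like the one described can be sketched in a few lines. This is my rough reconstruction, not the original POV-Ray benchmark, and the workload and sizes are arbitrary:

```python
# Rough sketch of the thread-scaling test described above: time a fixed
# CPU-bound workload with 1, 2, 4, and 8 workers. Processes are used
# instead of threads because CPython threads share one GIL.
import time
from multiprocessing import Pool

def burn(n):
    """CPU-bound busywork: sum of squares up to n."""
    return sum(i * i for i in range(n))

def timed_run(workers, chunks=8, n=50_000):
    start = time.perf_counter()
    with Pool(workers) as pool:
        pool.map(burn, [n] * chunks)
    return time.perf_counter() - start

if __name__ == "__main__":
    base = timed_run(1)
    for w in (2, 4, 8):
        print(f"{w} workers: {base / timed_run(w):.2f}x speedup vs 1 worker")
```

On a 4C/8T part you would expect near-2x steps up to 4 workers and a much smaller gain from 4 to 8, matching the numbers reported above.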


> hyperthreading does not really work so well on sustained, homogeneous workloads

What would distinguish this description from, say, compression and compiling? Based on my experience on my machine (which of course is limited), HT does give a boost in those two cases, which could be classified as homogeneous and sustained.


>> hyperthreading does not really work so well on sustained, homogeneous workloads

> what would distinguish this description from, say, compression and compiling?

The real difference is between integer and floating point workloads.

Integer workloads typically have lots of branch mis-predicts and cache misses which cause pipeline stalls. Any time you've got pipeline stalls, hyper-threading will be great.

Floating Point workloads typically are just huge amounts of number crunching with very simple access patterns. There are few branches and memory accesses typically have a consistent stride and so are pre-fetch friendly. Typically, this kind of workload is memory bandwidth limited because your CPU can run full bore without any bubbles in the pipeline. Hyper-threading isn't much use in this case: if any functional units are going unused, it's only because the CPU can't vacuum data up fast enough. This is one of the big reasons GPUs are so popular for doing FP workloads: they have a ton of functional units and they are paired with very high speed on-board memory.

In the grandparent's defense, floating point workloads tend to be sustained and homogeneous. They are, however, not the only kind of sustained, homogeneous workload, as you so accurately point out.
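The compute-vs-bandwidth distinction can be made concrete with a back-of-the-envelope roofline check. All hardware numbers below are illustrative placeholders, not measurements of any real CPU:

```python
# Roofline-style estimate: is a kernel compute-bound or bandwidth-bound?
# The peak-FLOP and bandwidth figures are placeholders, not a real CPU's specs.

def bound_by(flops_per_byte, peak_gflops, mem_bw_gbs):
    """Compare a kernel's arithmetic intensity (FLOP per byte moved) to the
    machine balance point (FLOP/byte needed to keep the FPU fed)."""
    machine_balance = peak_gflops / mem_bw_gbs
    return "compute" if flops_per_byte >= machine_balance else "memory bandwidth"

# Streaming kernel (daxpy-like): ~2 FLOPs per 24 bytes moved.
print(bound_by(2 / 24, peak_gflops=100, mem_bw_gbs=25))  # memory bandwidth

# Blocked matrix multiply: heavy data reuse, tens of FLOPs per byte.
print(bound_by(50, peak_gflops=100, mem_bw_gbs=25))      # compute
```

A bandwidth-bound kernel leaves execution units idle only because data can't arrive fast enough, which is exactly the case where a second hyperthread sharing the same memory pipe has nothing extra to offer.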


Compiling isn't homogeneous IMO. Lots of branches, lots of waiting on RAM, lots of corner cases. Lots of ways to get those hyperthreads working in parallel.

Compression would be closer to homogeneous. But something REALLY homogeneous is like Prime95: you're hitting SSE and/or AVX instructions over and over again. All threads try to use only AVX instructions, so hyperthreading doesn't help too much.


HT also doesn't help as much on FP workloads simply because there are fewer FP execution units to share, e.g. Skylake: https://en.wikichip.org/wiki/intel/microarchitectures/skylak...


> For example doing very heavy computation on 8 threads of a 4-core CPU with 8 hardware threads, can actually reduce performance, because all threads will be contending for the same functional blocks of the CPU.

We have exactly this scenario on a server we're running (non-virtualised) and, if it didn't involve rebooting the machine, I'd already have run a test to benchmark whether we get better performance with or without hyperthreading. Unfortunately our test environments aren't similar enough to be representative in terms of testing yet, but this is going to change in the next few weeks so I'll finally be able to run my test without needing to take down the production box.


You could try offlining the HT cores in the linux scheduler, and see if it makes a difference (no reboot required).

Example: https://www.golinuxhub.com/2018/01/how-to-disable-or-enable-...
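For reference, Linux exposes the sibling topology in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. Here's a sketch of picking which siblings to offline; the helper function and example layout are hypothetical, though the sysfs paths are the standard ones:

```python
# Sketch: choose which logical CPUs to offline so each physical core keeps
# exactly one thread. The input strings mimic the contents of
# /sys/devices/system/cpu/cpuN/topology/thread_siblings_list (assumed to be
# comma-separated pairs, as on 2-way SMT parts).

def cpus_to_offline(sibling_lists):
    """Keep the lowest-numbered CPU of each sibling group; return the rest."""
    offline = set()
    for entry in sibling_lists:
        siblings = sorted(int(c) for c in entry.split(","))
        offline.update(siblings[1:])
    return offline

# A 4C/8T layout where CPU n and n+4 share a core (a common Intel numbering):
lists = ["0,4", "1,5", "2,6", "3,7", "0,4", "1,5", "2,6", "3,7"]
print(sorted(cpus_to_offline(lists)))  # [4, 5, 6, 7]
# The actual offlining would then be (as root):
#   echo 0 > /sys/devices/system/cpu/cpu4/online   ...and so on
```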


Unfortunately we're running Windows Server 2016. I may be wrong - be nice if I was, actually - but I thought it required a reboot.


I am certainly not an expert. My high level understanding of hyper-threading comes from having a Pentium 4 with it back in the early 00's [1]. Things may have changed since then, so I could be wrong...anyway, a simple model of hyperthreading would be a CPU with two operations: load and execute. Each operation takes one CPU cycle. Hyperthreading is possible when both load and execute can run each and every clock cycle.

  Cycle 0: Thread 1 loads
  Cycle 1: Thread 1 executes & thread 2 loads
  Cycle 2: Thread 1 loads & thread 2 executes
  Cycle 3: etc.
  
The advantage of hyperthreading is that there is little or no context-switching overhead, at least relative to traditional hardware threads. With a real CPU there are more steps in the pipeline and each step takes a variable amount of time. Later CPUs take advantage of those available cycles with parallel, speculative, and out-of-order execution, but those are generally abstractions over a single thread.

[1]: https://en.wikipedia.org/wiki/Pentium_4
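The interleaving above can be turned into a toy simulation. This model is my own simplification of the comment's two-stage example (real pipelines have many more stages and variable latencies):

```python
# Toy model of the load/execute interleave above: a core offering one load
# slot and one execute slot per cycle. A lone thread alternates between the
# two stages, so it occupies only one slot per cycle; a second thread,
# started one cycle later, fills the other slot.

def utilization(n_threads, cycles=100):
    """Fraction of slot-cycles used across the run (n_threads is 1 or 2)."""
    used = 0
    for cycle in range(cycles):
        for t in range(n_threads):
            if cycle >= t:      # thread t issues its first load on cycle t
                used += 1       # each active thread occupies one stage per cycle
    return used / (2 * cycles)  # two slots available every cycle

print(utilization(1))  # 0.5   -> half the core sits idle
print(utilization(2))  # 0.995 -> both slots busy from cycle 1 onward
```

The single-thread case wastes half the slots for exactly the reason the comment gives: each thread can only be in one stage at a time.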


AFAIK no, HT siblings do not get a lower priority. The core's resources are shared equally between the SMT threads. Operating systems that are SMT aware try to avoid this sharing when possible (a non-SMT-aware OS could schedule a pair of runnable threads on one core while another core is idle; a SMT-aware OS would use the idle core instead, so both threads get a full core for themselves).


It's funny how Hyper-Threading is such a widely used term, being an Intel marketing trademark. (SMT is the computer architecture term; cf. MMX vs. SIMD.)


They are different. HyperThreading is where one core acts as two by rapidly switching contexts. It isn't actually two hardware threads, and the one core never does two things at once.

SIMD is one thread doing one instruction on multiple pieces of data at the same time.

SIMD can give higher throughput from the CPU, and you must organize your data types to use SIMD.

Symmetric multithreading is where software takes advantage of multiple logical hardware threads to do multiple pieces of work per clock cycle.

SMT can use HyperThreads and/or multiple physical cores, and/or multiple physical CPUs with one or more logical hardware threads each.


> They are different. HyperThreading is where one core acts as two by rapidly switching contexts.

This would be quite the opposite of Intel's actual SMT implementation, which aims to keep all the parallel execution resources of an OoO core fed and busy.

There have been some machines that do what you describe too (eg Tera/Cray MTA, many GPUs, Sun's Niagara), to combat memory latency and reduce the need for cache. Those machines have a big thread count since they want to have a lot of outstanding memory operations in flight. You will notice that these machines are not called SMT, since the S stands for "Simultaneous".


> It isn't actually two hardware threads, and the one core never does two things at once.

I don't think that's quite true. Rather, the two "virtual cores" are mostly independent, but share some backend resources. For example, modern Intel CPUs can sustain execution of 4 instructions per cycle, and with hyperthreading, two of those instructions can be from one thread and two from the other. Or one and three, or four and zero. Both threads truly do execute at the same time; it's just that the competition for resources means it sometimes takes longer for them to execute on a shared core than on separate physical cores.


I wonder if they fixed Meltdown in these new processors. If they didn't, a lot of people would probably wait to upgrade (or switch to AMD).


Has AMD fixed meltdown on any processors?


AMD never had meltdown to begin with. That was an Intel exclusive. Spectre is the one that affects everyone


For a few seconds I thought you were all talking about back when Intel introduced thermal management causing AMD to get a reputation for burning up[1] if the heat sink wasn't properly installed.

https://www.youtube.com/watch?v=Xf0VuRG7MN4 (AMD results start at about one minute mark)


Has AMD fixed Spectre in its processors?


As much as Intel has.

The issue with Spectre is that it is a new "buffer overflow". You can't "fix" a buffer overflow through hardware alone. You need software + hardware... and at best, you only get mitigations.

And within the next few months, some researcher is going to come up with a new Spectre-based attack that current mitigations won't work on. It's a bit annoying. Just sit tight and stay up to date on Spectre; it's a moving target.


Not that I'm aware of, but neither has Intel. Intel recently published some patches that will enable Enhanced IBRS mode in future processors but those haven't even been talked about yet, I imagine that AMD will do something similar in the end but we haven't heard it yet.

http://lkml.iu.edu/hypermail/linux/kernel/1807.3/00923.html


Whoops, of course, mixed them up.


"Most applications can usually handle a maximum of 8 threads efficiently, any more than that is diminishing returns anyways."

That's only relevant if you only plan on running one application at a time.


But HT was the main (or only) difference between i5 and i7. What will be the difference among 5/7/9 in 9th generation?


Core counts from i3 through i7, threading for i9. According to the article, anyway.


This is very unlikely just when amd's making them panic with super competitive ryzen procs


is this because of meltdown?


> This is now going to be a feature limited to the i9 branding,

So their goal would be to limit meltdown to the flagship i9 series? Seems like a strange plan.


I suspect that without hyperthreading the OS kernel has more explicit knowledge and control over thread tenancy so it is easier to mitigate a cache timing attack. But that is just speculation on my part.


L1 cache is shared between threads on the same core so, with SMT, you can attack the code that runs on the sibling thread too.

However, that is more for Spectre, which abuses the reads done by foreign code to observe the effect they have on the cache. That foreign code can be the kernel, another process, the hypervisor, etc., running on the same core. Meltdown can read the entire contents of memory without needing foreign code, and therefore SMT doesn't matter for it.


Maybe my understanding is wrong, but with SMT the kernel is not informed of which virtual CPU is in residence at any given time. IIRC, with SMP only, the kernel is responsible for scheduling onto the processor, so it can invalidate the cache exactly when it needs to.


The threads look exactly like real processors to the kernel. The only difference as far as the kernel is concerned is that they share the L1 cache.


Exactly. So you can have insecure boundary between threads because they share L1 cache. Is my understanding of how meltdown works incorrect?


You're thinking of Spectre. Meltdown doesn't need to share cache, a thread can just read kernel mappings into its own L1. Kernel mappings typically include the whole physical memory (on 64-bit machines where virtual address space is huge).


It's because it's one of the few ways Intel can cut costs/remain profitable in the face of new AMD competition without disrupting the company's operations in the long term.

Of course this doesn't mean its brand/sales won't be hurt further due to this move.


Honest question: How does disabling Hyper-Threading cut costs?


It doesn't, but it allows them to have more products based on the same exact die, just like i7s and i5s share the same die, which reduces manufacturing costs.

That said, it would simply mean that the i7 becomes the new i5, and quite likely fills the same price range too.

They might run the i5, i7, and i9 in the mainstream line now and bring the i3 down to where the Pentium line currently is. However, I still think this is more likely a configuration issue than new segmentation, since the i9 is their HEDT platform now.


they already had too many products! you have to spend like 5 days figuring out which one is 1% faster than the other for your particular use case.

choice is bad! :D


The good news is that you can ignore the 1% differences. Here's your new decision tree:

1. How much can I afford?

2. In that budget, do I need maximum instructions per second on one core or maximum cores?

If maximum cores, AMD is over there.


Transistors, I guess; HT adds extra registers for each core. However, I have some doubts that removing them would make a tangible impact on the final price.


Not actually.

Remember that architectural registers (such as RAX or EAX) are "fake" and remapped to the real, physical, microcode registers. Code like "xor eax, eax" is translated into "physical-register #51 = 0" with regards to uops, and doesn't even use any execution units on modern Skylake or Zen processors!

Hyperthreads only need to contain one more remapping of physical registers to architectural registers.


Not registers per se. You want a large number of physical registers for renaming and speculative execution in any event. What you need with HT is another set of mappings from architectural to physical registers for the committed state.
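A minimal sketch of that "extra set of mappings" idea, with no speculation, rollback, or free-list recycling (all sizes and names below are made up for illustration):

```python
# Minimal register-rename sketch: one shared physical register file, plus one
# architectural->physical mapping table per hardware thread. A second thread
# costs only the second table, not a second register file.

class RenameTable:
    def __init__(self, n_threads=2, n_physical=180):
        self.free = list(range(n_physical))         # shared physical registers
        self.maps = [{} for _ in range(n_threads)]  # one map per thread

    def write(self, thread, arch_reg):
        """An architectural write allocates a fresh physical register."""
        phys = self.free.pop(0)
        self.maps[thread][arch_reg] = phys
        return phys

    def read(self, thread, arch_reg):
        return self.maps[thread][arch_reg]

rt = RenameTable()
rt.write(0, "rax")  # thread 0's rax lands in one physical register
rt.write(1, "rax")  # thread 1's rax lands in a different one
print(rt.read(0, "rax"), rt.read(1, "rax"))  # two distinct physical registers
```

Both threads see their own "rax" while drawing from the same shared pool, which is why the incremental hardware cost of a second hardware thread is so small.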


Intel claimed that HT used about 5% of the die area, but that was with HT Pentium 4. The numbers probably changed in 15 years.


It's probably less as a percentage now.


Or you could leave it all in there (cheaper and easier), just disabled, and instead enjoy better yields:

* Parts with broken HT can be shipped as fully working non-HT parts.

* High-leakage parts can be shipped as they now pass the power screening.


> Parts with broken HT can be shipped as fully working non-HT parts.

It's unlikely that they ever get parts with broken HT on an otherwise salvageable core. Too much of that functionality is simply partitioning existing resources in half. There aren't that many transistors that are simply HT overhead that the core can do without when HT is not in use.


This is the status quo with i5 v. i7 binning, AFAIK.



