Do you have any Intel references for it? I mean, Rust has its own memory model and it will not always give the same guarantees as when writing assembler.
“Because the WC protocol uses a weakly-ordered memory consistency model, a fencing operation implemented with the SFENCE or MFENCE instruction should be used in conjunction with VMOVNTDQ instructions if multiple processors might use different memory types to read/write the destination memory locations”
That is something I can agree with, but I can't in good faith just let "it's just a hint, they don't have anything to do with correctness" stand unchallenged.
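For what it's worth, a minimal Rust sketch of the pattern the manual describes (assuming x86_64 and the core::arch intrinsics; the function name and fill value are mine): a run of non-temporal stores followed by an explicit SFENCE before any other agent reads the buffer.

    #[cfg(target_arch = "x86_64")]
    fn nt_fill(dst: &mut [u8], byte: i8) {
        use core::arch::x86_64::{__m128i, _mm_set1_epi8, _mm_sfence, _mm_stream_si128};
        // _mm_stream_si128 needs a 16-byte-aligned destination and whole 16-byte chunks.
        assert!(dst.as_ptr() as usize % 16 == 0 && dst.len() % 16 == 0);
        unsafe {
            let v: __m128i = _mm_set1_epi8(byte);
            let mut p = dst.as_mut_ptr() as *mut __m128i;
            for _ in 0..dst.len() / 16 {
                _mm_stream_si128(p, v); // weakly-ordered non-temporal store
                p = p.add(1);
            }
            _mm_sfence(); // drain the WC buffers before anyone else reads dst
        }
    }

Drop the _mm_sfence() and another core (or a DMA agent) can legally observe the writes out of order, or not at all for a while, which is exactly the correctness issue the quoted paragraph is about.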
IIRC they used the write-combining buffer, which was also a cache.
A common trick is to cache it, but insert it directly into the last or second-to-last position in your pseudo-LRU order, so it's in the cache like normal but gets evicted quickly when you need to cache a new line in the same set. Other solutions can lead to complicated situations when the user was wrong and the line gets immediately reused by normal instructions; this way it just sits in the cache like a normal line and gets promoted toward most-recently-used if that happens.
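Rough Rust sketch of that insertion policy, treating one set as a recency-ordered list (purely illustrative; the names are mine, and real pseudo-LRU is a tree of bits, not a list):

    // Toy model of the insertion trick, not how real hardware implements it.
    struct CacheSet {
        lines: Vec<u64>, // tags ordered by recency, index 0 = most recently used
        ways: usize,
    }

    impl CacheSet {
        // Any hit promotes the line to MRU, exactly like a normal access.
        fn touch(&mut self, tag: u64) {
            if let Some(i) = self.lines.iter().position(|&t| t == tag) {
                let t = self.lines.remove(i);
                self.lines.insert(0, t);
            }
        }

        // Normal fill: evict the LRU line, insert the new one at MRU.
        fn fill_normal(&mut self, tag: u64) {
            if self.lines.len() == self.ways {
                self.lines.pop();
            }
            self.lines.insert(0, tag);
        }

        // Non-temporal fill: evict as usual, but insert near the LRU end so the
        // line is next in line for eviction unless it actually gets reused.
        fn fill_nontemporal(&mut self, tag: u64) {
            if self.lines.len() == self.ways {
                self.lines.pop();
            }
            let pos = self.lines.len().saturating_sub(1);
            self.lines.insert(pos, tag); // second-to-last slot
        }
    }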
A source on what? The Intel optimization manuals explain what MOVNTQ is for. I don't think they explain in detail how it is implemented behind-the-scenes.
“The non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD) allow data to be moved from the processor’s registers directly into system memory without being also written into the L1, L2, and/or L3 caches. These instructions can be used to prevent cache pollution when operating on data that is going to be modified only once before being stored back into system memory. These instructions operate on data in the general-purpose, MMX, and XMM registers.”
I believe that non-temporal moves basically work similarly to memory marked as write-combining, which is explained in 13.1.1: “Writes to the WC memory type are not cached in the typical sense of the word cached. They are retained in an internal write combining buffer (WC buffer) that is separate from the internal L1, L2, and L3 caches and the store buffer. The WC buffer is not snooped and thus does not provide data coherency. Buffering of writes to WC memory is done to allow software a small window of time to supply more modified data to the WC buffer while remaining as non-intrusive to software as possible. The buffering of writes to WC memory also causes data to be collapsed; that is, multiple writes to the same memory location will leave the last data written in the location and the other writes will be lost.”
In the old days (Pentium Pro and the like), I think there was basically a 4- or 8-way associative cache, and non-temporal loads/stores would go into only one of the ways, so you could waste at most 1/4 (or 1/8) of your cache on them.
That’s what I hope for, but everything that isn’t bananas expensive with unified memory has very low memory bandwidth. DGX (Digits), Framework Desktop, and non-Ultra Macs are all around 128 GB/s, and will produce single-digit tokens per second for larger models: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inferen...
So there’s a fundamental tradeoff between cost, inference speed, and hostable model size for the foreseeable future.
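Back-of-envelope, assuming decoding is memory-bandwidth bound and each generated token needs roughly one pass over the weights (ballpark figures, not measurements):

    70B params at 4-bit quantization ≈ 35-40 GB of weights
    128 GB/s ÷ 40 GB ≈ ~3 tokens/s

which is why those 128 GB/s boxes land in single digits on the larger models.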
This is almost perfect for my needs (communication and location sharing in areas without phone signal.) But it's missing GPS/GNSS. I'm not sure how niche that use case is, so more broadly I wish Lilygo offered something akin to Raspberry Pi hats -- i.e. some way to add extensions.
I’ve heard Beaglebone Blacks are great (never used them), but have to say the experience with Blue was awful. I had a hardware defect that’s apparently commonplace. It gave me the feeling (maybe unjustified) that there’s a low bar for slapping the Beaglebone label on a product. In contrast, none of my several RPi’s have had any hardware issues, and I’m pretty brutal with them.
Having said that, apart from what you also flagged, it's also a bit bland. Like "there's some legal stuff here, exercise caution".
Instead the badge could have symbolic character, an emotive icon ... something. Something that strongly implies "Danger Will Robinson" without explicitly saying so. Something any company would want to avoid having show up next to their logo unless it was absolutely necessary.
As it is now, all I'm getting is a bland "huh, something legal must have happened here".
I hear what you're saying, and I think it would be good to have clearer language, particularly for people new to white-collar employment (which, I imagine, is a good portion of Glassdoor's audience). People with more established careers can check Glassdoor AND ask people in their network, whereas people new to the industry have less of a professional network to lean on.
That said, at least here in the US, a carefully-bland legal statement strongly implies what you're looking for. Like, the more bland, the bigger the warning sign :)
I think it should also be below the ZURU logo. Right now it looks like a generic warning for the site (e.g. "Glassdoor goes offline for maintenance in 5 mins")
I imagine Glassdoor is reacting to the situation and going with "better to get part of the solution out now than wait to get everything out perfectly", which I would agree with.
Not only is he one of the “heavyweights of the Rust community”, he was literally one of the main designers of the language at Mozilla (he's still contributor #6 by commits[1] despite not having worked on it for the past 7 years!)
This is becoming pointless meta-discussion, but the parent comment didn't indicate in any way that I was talking to a "professor". The comment said: great that there are more languages with GC. I disagree with that, no matter who says it.
I'm not a "professor" but as a software engineer with 35 years in this industry I can say that new languages should avoid GC's (as in, generational and related) and stick to either ARC or Rust-like compile-time memory management.
Just because the original comment is by, let's say, a prominent figure, doesn't make it right.
P.S. I rarely downvote out of disagreement, only for comment quality.
> I'm not a "professor" but as a software engineer with 35 years in this industry I can say that new languages should avoid GC's
With respect, and much less experience than you, I really don’t think so. I believe the majority of languages are better off being managed. Low-level languages do have their place, and I am very happy that Rust brings some novel ideas to the field. But that low-level detail is very much not needed for the majority of applications. Also, ARC is much, much slower than a decent GC, so from a performance perspective as well it would make sense to prefer GC'd runtimes.
ARC is in fact faster than GC, and even more so on M1/M2 chips with the Swift runtime. There were benchmarks circulating here on Hacker News; unfortunately I can't find those posts now. GC requires more memory (normally double that of an ARC runtime) and is slower even with that extra memory.
How can more work, and synchronized work at that, be faster than a plain old pointer bump plus some asynchronous, amortized work done on another thread? Sure, it does take more memory, but in most cases (OpenJDK for example) allocation is simply a thread-local arena allocation, where it is literally an integer increment and an eventual copy of live objects to another region. You couldn’t make it any faster; malloc and ARC are both orders of magnitude slower.
ARC, while it can elide counts in certain cases, will still in most cases have to issue atomic increments/decrements, which are among the slowest operations on modern processors. And on top of that it doesn’t even solve the problem completely (circular references), mandating a solution very similar to a tracing GC (ref counting is in fact a form of GC: tracing looks at live edges between objects, ref counting looks at dead edges).
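For illustration only, a tiny Rust sketch of the two costs being compared (the Tlab type and retain helper are mine, not how OpenJDK or Swift actually implement it): the arena allocation is a bounds check plus an integer bump, while an ARC retain is an atomic increment on a shared counter.

    use std::sync::Arc;

    // A TLAB-style arena: allocation is a bounds check plus an integer bump.
    struct Tlab {
        buf: Vec<u8>,
        top: usize,
    }

    impl Tlab {
        fn alloc(&mut self, size: usize) -> Option<&mut [u8]> {
            let start = self.top;
            let end = start.checked_add(size)?;
            if end > self.buf.len() {
                return None; // slow path: request a new TLAB / trigger a collection
            }
            self.top = end; // "literally an integer increment"
            Some(&mut self.buf[start..end])
        }
    }

    // ARC-style retain: an atomic increment on a shared counter, which is the
    // cost called out above.
    fn retain<T>(obj: &Arc<T>) -> Arc<T> {
        Arc::clone(obj)
    }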
I'm not familiar with the details, but it is said that Swift's ARC is several times faster than ObjC's; it somehow doesn't always require atomic inc/dec. It also got even better specifically on the M1 processors. As for GCs, each cycle carries the overhead of going over the same objects that are not yet disposable.
Someone also conducted tests: for the same tasks, and on equivalent CPUs, Android requires 30% more energy and 2x the RAM compared to iOS. Presumably the culprit is the GC.
That’s a very strong presumably, on a very niche use case of mobile devices.
It is not an accident that on powerful server machines all FAANG companies use managed languages for their critical web services, and there is no change on the horizon.
It might be because on the server side they usually don't care about energy or RAM much. The StackOverflow dev team has an interesting blog post somewhere, where they explain that they figured at one point C#'s GC was the bottleneck and they had to do a lot of optimizations at the expense of extra code complexity to minimize the GC overhead.
It is actually quite rare that companies think about their infrastructure costs; it's usually just taken for granted. Plus, there aren't many ARC languages around.
Anyway I'm now rewriting one of my server projects from PHP to Swift (on Linux) and there's already a world of difference in terms of performance. For multiple reasons of course, not just ARC vs. GC, but still.
With all due respect, (big) servers care about energy costs a lot, at least as much as mobile phones. By the way, out of the managed languages Java has the lowest energy consumption. RAM takes the same energy whether filled or not.
Just because GC can be a bottleneck doesn’t mean it is bad, or that the alternatives wouldn’t have an analogous bottleneck. Of course one should try to decrease the number of allocations (the same way you have to in the case of RC as well), but there are certain allocation patterns that simply have to be managed. For those, a modern GC is the best choice in most use cases.
It usually solves this in under 6 guesses. The guessing could be improved; at the moment it's random, but it could select for words with non-repeating letters to narrow down the search space faster.
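A minimal sketch of that heuristic in Rust (the function name is mine; it assumes lowercase ASCII candidate words): score each remaining candidate by how many distinct letters it contains and guess the highest-scoring one instead of a random one.

    // Prefer guesses that probe the most unique letters per turn.
    fn pick_guess<'a>(candidates: &[&'a str]) -> Option<&'a str> {
        candidates.iter().copied().max_by_key(|w| {
            let mut seen = 0u32; // one bit per letter 'a'..='z'
            let mut distinct = 0;
            for b in w.bytes() {
                let bit = 1u32 << (b - b'a');
                if seen & bit == 0 {
                    seen |= bit;
                    distinct += 1;
                }
            }
            distinct
        })
    }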