More

NullCascade · 2025-10-01T21:25:19 1759353919

The Toit language and runtime is also ESP32-focused:

NullCascade · 2025-09-25T03:42:17 1758771737

It's just because a very significant amount of UK satellites are OneWeb Internet satellites.

NullCascade · 2025-09-23T23:10:01 1758669001

Elon Musk's takeover of X is already a good example of what happens with unlimited free speech and unlimited reach.

Neo-nazis and white nationalists went from their 3-4 replies per thread forums, 4chan posts, and Telegram channels, to now regularly reaching millions of people and getting tens of thousands of likes.

As a Danish person I remember how American media in the 2010s and early 2020s used to shame Denmark for being very right-wing on immigration. The average US immigration politics thread on X is worse than anything I have ever seen in Danish political discussions.

NullCascade · 2025-09-23T07:43:21 1758613401

It is actually illegal for EU companies and organizations to boycott Iran.

https://en.wikipedia.org/wiki/Blocking_statute

The EU has since the 1990s gone out of its way to support countries like Iran and Cuba against US/Israeli economic sanctions.

somezero · 2025-09-23T08:03:15 1758614595

This is not just a funny joke, but also timely given the recent sanctions.

https://en.wikipedia.org/wiki/International_sanctions_agains...

NullCascade · 2025-09-15T13:30:23 1757943023

One Swedish-Kurdish man in Iran who is working for the Iranian government is using Telegram/Signal and Monero to intentionally cause carnage in the streets of Sweden and has been attempting to expand to Denmark.

But instead of going directly after this man our tech inept governments are trying to do the mathematically impossible.

modo_mario · 2025-09-15T14:17:05 1757945825

One would wonder why this doesn't lead into inquiry into the obvious other things things that made one singular such man capable of causing such disturbance.

rockemsockem · 2025-09-15T13:34:56 1757943296

Right, to catch a predator managed to catch people without needing to backdoor stuff. These people are just lazy and incompetent, potentially intentionally.

tgv · 2025-09-15T13:40:36 1757943636

How would they go about that without violating other laws/rights? The state cannot act on rumors alone.

Yokolos · 2025-09-15T13:41:13 1757943673

Even worse, they're setting up Europe for a fascist takeover. To protect us. Just what?

You'd think we never had the Third Reich, Nazis or WW2 with how they're behaving.

cuntymaccunto · 2025-09-15T14:45:44 1757947544

[flagged]

Yokolos · 2025-09-15T15:06:27 1757948787

The US which has already turned into an authoritarian fascist dictatorship. Right. What a bastion of freedom there, lmao. "Don't tread on me" has literally turned into "Tread on me, daddy".

anthk · 2025-09-15T14:55:34 1757948134

As if your corporatocracy wasn't any better.

NullCascade · 2025-09-07T16:22:58 1757262178

Troy Hunt and HIBP is a good example in the other direction but Hunt has also been burned plenty of times by serverless.

https://www.troyhunt.com/closer-to-the-edge-hyperscaling-hav...

NullCascade · 2025-09-03T13:09:27 1756904967

What is currently the best model (or multi-model process) to go from text-to-3D-asset?

Ideally based on FOSS models.

neutronicus · 2025-09-03T13:17:41 1756905461

Piggybacking ... what about text-to-sprite-sheet? Or even text-and-single-source-image-to-sprite-sheet?

nzach · 2025-09-03T14:07:55 1756908475

I've never done this task specifically, but I imagine the new google model (Gemini 2.5 Flash Image) is what you want. It has really good character consistency, so you should be able to paste a single sprite and ask it to generate the rest.

SXX · 2025-09-03T15:47:49 1756914469

This is possible, but mostly for generating assets in the somewhat same style that you already have. Problem is that AI models are not good at tracking state of multiple entities on one image including 2.5 Flash Image.

If you actually want something consistent you should really generate images one by one and provide extensive description of what you expect to see on each frame

And if you want to make something like animation it's only really possible if you basically generate thousand of "garbage" images and then edit together what fits.

NullCascade · 2025-09-03T13:07:12 1756904832

Maybe private Chinese AI labs consider EU/UK regulators a bigger threat than US anti-China hawks.

NullCascade · 2025-09-02T11:15:01 1756811701

>Use the AGPL for server code, the GPL for tools and the LGPL for libraries.

If we are doing it for ideological reasons why not go all in and use AGPLv3 for everything, including libraries? It also has the added benefit that big corporations will not use your code.

NullCascade · 2025-08-18T15:14:20 1755530060

What is the actual process of identifying hotspots caused suboptimal compiler generated assembly?

Would it ever make sense to write handwritten compiler intermediate representation like LLVM IR instead of architecture-specific assembly?

astrange · 2025-08-18T18:48:16 1755542896

So the main issues here are not what people think they are. They generally aren't "suboptimal assembly", at least not what you can reasonably expect out of a C compiler.

The factors are something like:

- specialization: there's already a decent plain-C implementation of the loop, asm/SIMD versions are added on for specific hardware platforms. And different platforms have different SIMD features, so it's hard to generalize them.

- predictability: users have different compiler versions, so even if there is a good one out there not everyone is going to use it.

- optimization difficulties: C's memory model specifically makes optimization difficult here because video is `char *` and `char *` aliases everything. Also, the two kinds of features compilers add for this (intrinsics and autovectorization) can fight each other and make things worse than nothing.

- taste: you could imagine a better portable language for writing SIMD in, but C isn't it. And on Intel C with intrinsics definitely isn't it, because their stuff was invented by Microsoft, who were famous for having absolutely no aesthetic taste in anything. The assembly is /more/ readable than C would be because it'd all be function calls with names like `_mm_movemask_epi8`.

derf_ · 2025-08-18T23:51:41 1755561101

One time I spent a week carefully rewriting all of the SIMD asm in libtheora, really pulling out all of the stops to go after every last cycle [0], and managed to squeeze out 1% faster total decoder performance. Then I spent a day reorganizing some structs in the C code and got 7%. I think about that a lot when I decide what optimizations to go after.

[0] https://gitlab.xiph.org/xiph/theora/-/blob/main/lib/x86/mmxl... is an example of what we are talking about here.

saagarjha · 2025-08-19T08:28:34 1755592114

Unfortunately modern processors do not work how most people think they do. Optimizing for less work for a nebulous idea of what "work" is generally loses to bad memory access patterns or just using better instructions that seem most expensive if you look at them superficially.

astrange · 2025-08-21T21:15:06 1755810906

If you're important enough they'll design the next processor to run your code better anyway.

(Or at least add new features specifically for you to adopt.)

magicalhippo · 2025-08-19T06:46:15 1755585975

It can be sobering to consider how many instructions a modern CPU can execute in case of a cache miss.

In the timespan of a L1 miss, the CPU could execute several dozen instructions assuming a L2 hit, hundreds if it needs to go to L3.

No wonder optimizing memory access can work wonders.

ack_complete · 2025-08-19T00:37:16 1755563836

> And on Intel C with intrinsics definitely isn't it, because their stuff was invented by Microsoft, who were famous for having absolutely no aesthetic taste in anything.

Wouldn't Intel be the one defining the intrinsics? They're referenced from the ISA manuals, and the Intel Intrinsics Guide regularly references intrinsics like _allow_cpu_features() that are only supported by the Intel compiler and aren't implemented in MSVC.

astrange · 2025-08-19T03:22:59 1755573779

The _emm _epi8 stuff is Hungarian notation, which is from Microsoft.

ack_complete · 2025-08-19T03:31:54 1755574314

Uh, no, that's standard practice for disambiguating the intrinsic operations for different data types without overloading support. ARM does the same thing with their vector intrinsics, such as vaddq_u8(), vaddq_s16(), etc.

duped · 2025-08-18T16:34:33 1755534873

Normally you spin up a tool like vtune or uprof to analyze your benchmark hotspots at the ISA level. No idea about tools like that for ARM.

> Would it ever make sense to write handwritten compiler intermediate representation like LLVM IR instead of architecture-specific assembly?

IME, not really. I've done a fair bit of hand-written assembly and it exclusively comes up when dealing with architecture-specific problems - for everything else you can just write C (unless you hit one of the edge cases where C semantics don't allow you to express something in C, but those are rare).

For example: C and C++ compilers are really, really good at writing optimized code in general. Where they tend to be worse are things like vectorized code which requires you to redesign algorithms such that they can use fast vector instructions, and even then, you'll have to resort to compiler intrinsics to use the instructions at all, and even then, compiler intrinsics can lead to some bad codegen. So your code winds up being non-portable, looks like assembly, and has some overhead just because of what the compiler emits (and can't optimize). So you wind up just writing it in asm anyway, and get smarter about things the compiler worries about like register allocation and out-of-order instructions.

But the real problem once you get into this domain is that you simply cannot tell at a glance whether hand written assembly is "better" (insert your metric for "better here) than what the compiler emits. You must measure and benchmark, and those benchmarks have to be meaningful.

Sesse__ · 2025-08-18T18:00:39 1755540039

> Normally you spin up a tool like vtune or uprof to analyze your benchmark hotspots at the ISA level. No idea about tools like that for ARM.

perf is included with the Linux kernel, and works with a fair amount of architectures (including Arm).

godelski · 2025-08-18T19:58:28 1755547108

You may still need to install linux-tools to get the perf command.

Sesse__ · 2025-08-18T20:33:12 1755549192

It's included with the kernel as distributed by upstream. Your distribution may choose to split out parts of it into other binary packages.

godelski · 2025-08-18T20:38:04 1755549484

I'm not disagreeing, I just wanted to add so others might know why they can't just run the command.

duped · 2025-08-18T18:29:59 1755541799

perf doesn't give you instruction level profiling, does it? I thought the traces were mostly at the symbol level

Sesse__ · 2025-08-18T19:05:24 1755543924

Hit enter on the symbol, and you get instruction-level profiles. Or use perf annotate explicitly. (The profiles are inherently instruction-level, but the default perf report view aggregates them into function-level for ease of viewing.)

jcranmer · 2025-08-18T18:03:07 1755540187

> Would it ever make sense to write handwritten compiler intermediate representation like LLVM IR instead of architecture-specific assembly?

Not really. There are a couple of reasons to reach for handwritten assembly, and in every case, IR is just not the right choice:

If your goal is to ensure vector code, your first choice is to try slapping explicit vectorize-me pragmas onto the loop. If that fails, your next effort is either to use generic or arch-specific vector intrinsics (or jump to something like ISPC, a language for writing SIMT-like vector code). You don't really gain anything in this use case from jumping to IR, since the intrinsics will satisfy your code.

If your goal is to work around compiler suboptimality in register allocation or instruction selection... well, trying to write it in IR gives the compiler a very high likelihood of simply recanonicalizing the exact sequence you wrote to the same sequence the original code would have produced for no actual difference in code. Compiler IR doesn't add anything to the code; it just creates an extra layer that uses an unstable and harder-to-use interface for writing code. To produce the best handwritten version of assembly in these cases, you have to go straight to writing the assembly you wanted anyways.

astrange · 2025-08-18T18:49:39 1755542979

Loop vectorization doesn't work for ffmpeg's needs because the kernels are too small and specialized. It works better for scientific/numeric computing.

You could invent a DSL for writing the kernels in… but they did, it's x86inc.asm. I agree ispc is close to something that could work.