Hacker News | avallach's comments

Maybe I'm misunderstanding, but after reading it, this sounds to me less like "io_uring is faster than mmap" and more like "RAID0 with 8 SSDs has more throughput than 3-channel DRAM".


The title has been edited incorrectly. The original page title is "Memory is slow, Disk is fast", and it states exactly what you say: an NVMe RAID can offer more bandwidth than RAM.


No, the title edit is fair; the original title is misleading.

Obviously, no matter how you read from disk, it has to go through RAM. Disk bandwidth cannot exceed memory bandwidth.*

But what the article actually tests is a program that uses mmap() to read from page cache, vs. a program that uses io_uring to read directly from disk (with O_DIRECT). You'd think the mmap() program would win, because the data in page cache is already in memory, whereas the io_uring program is explicitly skipping cache and pulling from disk.

However, the io_uring program uses 6 threads to pull from disk, which then feed one thread that sequentially processes the data, whereas the mmap() program uses a single thread for everything. And even though the mmap() program is pulling from page cache, that single thread still gets interrupted by page faults as it reads, because the kernel does not proactively map pages from the cache even when they are available (unless, you know, you tell it to, with madvise() etc., but the test did not). So the mmap() test has one thread that keeps switching between kernel and userspace and, surprise, that is not as fast as a thread that just stays in userspace while 6 other threads feed it data.
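As a toy illustration of the page-fault point (a hedged Python sketch, not the article's actual benchmark; `madvise` on mmap objects needs Python 3.8+ on Linux):

```python
import mmap
import os
import tempfile

# Hypothetical sketch: a mmap()ed read faults pages in on first touch,
# trapping into the kernel each time; madvise(MADV_WILLNEED) asks the
# kernel to populate pages ahead of time so the reading thread stalls
# less -- the kind of hint the article's single-threaded test omitted.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * (1 << 20))  # 1 MiB test file
    path = f.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    if hasattr(mmap, "MADV_WILLNEED"):  # platform-dependent constant
        mm.madvise(mmap.MADV_WILLNEED)  # prefetch instead of faulting per page
    # Touch one byte per 4 KiB page, like a sequential scan would
    total = sum(mm[i] for i in range(0, len(mm), 4096))
    mm.close()
os.remove(path)
print(total)  # 256 pages x ord('x') = 30720
```

Without the hint, each first touch of a page can trap into the kernel; with it, the pages are (hopefully) already resident when the loop reaches them.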

To be fair, the article says all this, if you read it. Other than the title being cheeky it's not hiding anything.

* OK, the article does mention that there exist CPUs which can do I/O directly into L3 cache, which could theoretically beat memory bandwidth, but the article doesn't actually test this.


It's even weirder than that: mention of the shell GUI is just the intro, later they proceed with implementing a facsimile of the Windows NT kernel in... TypeScript. Even for purely self-educational purposes, this pairing seems to be very counterproductive.

I agree that the title is misleading and should be changed. I also expected LG webOS.


@viraptor above mentions that they actually do try first with an explicit perplexity-agent: https://news.ycombinator.com/item?id=44797682. So there's no ambiguity. The worst they could accuse Cloudflare of is that it doesn't give website owners an easy way to block only scrapers while allowing user-driven agents (do they?).


Cloudflare did explain a proper solution: "Separate bots for separate activities". E.g. here: one bot for scraping/indexing, and one for non-persistent user-driven retrieval.

Website owners have a right to block both if they wish. Isn't it obvious that bypassing a bot block is a violation of the owner's right to decide whom to admit?

Perplexity almost seems to believe that "robots.txt was only made for scraping bots, so if our bot is not scraping, it's fair for us to ignore it and bypass the enforcement". And their core business is a bot, so they really should have known better.


They're already doing that (https://docs.perplexity.ai/guides/bots): there's PerplexityBot and Perplexity-User.
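For reference, a site owner who wanted the split described upthread could sketch it in robots.txt like this (user-agent tokens taken from that Perplexity docs page; whether the user-driven agent actually honors it is a separate question):

```
# Block the crawler/indexer, allow the user-driven fetcher
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Allow: /
```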


And then once they see that the website operator blocked the perplexity-user, apparently instead of respecting that, they not only ignore robots.txt, but actively try to bypass the security measures established with the explicit purpose of limiting their access. If this was about bypassing DRM rather than AI-WAF, it would be plainly illegal.

To me this invalidates their whole claim that Cloudflare fails to tell the difference between scraper and user-driven agent. Instead, distinguishing them is trivial, and the block is intentional.


I use Perplexity regularly for research because it does a good job accessing, preprocessing and citing relevant resources. Which do you think is better: the service respects my desire for it to do a good job and ignores site owners who block agent access because they "don't like automated agents", or the service respects those site owners' (in my view unreasonable) desires and doesn't do a good job for me? Extend that to the inevitably growing LLM-for-research user base.


I can totally see your point. It's a bit like that fight of news agencies against the free snippets and aggregations on 3rd party websites. The Internet is supposed to be open after all.

But it also feels like essentially "pirating" the webpages while erasing their brand. Maybe it's even a tolerable transitional situation, but you can't even argue it's beneficial the way game piracy arguably can be according to some. In the long term, we need an incentive for content creators to willingly allow such processing. Otherwise, a lot of high-quality content will eventually become members-only, with DRM-like anti-agent protections.

The incentive doesn't have to be monetary. I could for example imagine some website owners allow AI agents that commit to upfront verbatim repeating some sort of mandatory headers/messages/acknowledgements from the content authors, before copying or summarizing, and are known to stick to this commitment.

You can also bypass the problem already now by accessing and copying the content manually, and then putting it in the context of a tool like NotebookLM. Nobody's hurt, because you have actually seen the source by yourself, and that's all the website owners can reasonably demand.

TL;DR: why even post quality content in the open if the audience won't see your ads, your donation button, or even your name? What do you think?


This kind of makes sense for chatgpt and others. But perplexity links to your content directly. I end up clicking more perplexity sources than search results in practice. I don't know how well that generalises, but the traffic is not just going away.


> In the long term, we need an incentive for the content creators to willingly allow such processing. Otherwise, a lot of high quality content will eventually become members-only with DRM-like anti agent protections.

I partially agree with this. Yes, some incentive is OK, for some cases. I wouldn't be OK with a mandatory header/message for example showing up in my output, unless there's some very direct relevance to the content. But there could be some kind of tipper markup/code embedded in the site metadata that my agent abstracts away as content rating feedback options, and tips automatically made on my behalf if I have it configured and selected the "useful" option. Of course source citation should also be a mandatory part of the output, for that branding and also in case there's desire to go beyond the output.

However, there will also always be content authors out there who share quality content freely with no expectation of any kind of return. The "problem" is that such content usually isn't SEO-optimized, and so likely won't be in the top results. There will be little lost if those optimizing for return start blocking their content as they'll also be automatically deranked, by virtue of content access issues, and the non-optimized content will then rise to the surface.

TL;DR: suggested configurable creator-tipping system abstracted behind feedback options, and the likely case that those who block access will be deranked in favor of those maintaining open access.


> bypassing a bot block is a violation of the owners right to decide whom to admit?

There is only a violation if the bot finds a way around a login block. Same for a human. But whatever is on the public web is... public. For all.


So it's ok to block someone "because you didn't include a session token I gave you in exchange for knowing the password" but it's not ok to block someone "because you didn't stick to manually-operated user agents as I told you via robots.txt"? What about not letting someone play level 42 "because you didn't complete level 41"?

A web server providing a response to your request is akin to a restaurant server doing the same. Except for specific situations related to civil rights, they are free to not deal with you for any reason.


Typically, when something is behind a login, it denotes a private space intended for a particular set of people given explicit access. It's senseless to block people from using agents if the same people would otherwise have access, unless there is abuse of that access, i.e. action to the detriment of the space. And though some of that does happen, it obviously isn't the full story. I have a Perplexica instance running locally that I sometimes use (but often don't, as Perplexity does a much better job). Should that also be blocked?

Hmm maybe a civil case could be potentially made here too, re disability. By blocking LLM use, sites are reducing the ability of select users to reasonably interact with the content. Just could become a thing in a few years if this nonsense continues.


The filename is 'poland.gif'; I wonder what the message is there.


It's the same GIF that's used in the Polish milk soup song video.


The post title is misleading. The algorithm did not leak, only the documentation listing all the signals that can possibly be used as inputs for that algorithm. It doesn't reveal which ones are actually used and how.


The magic lies in tessellation. Tessellation is an efficient GPU process that heavily subdivides your mesh so that displacement maps can add visible geometric detail afterwards. And because it's dynamic, you can selectively apply it only to the meshes that are close to the camera. These are the reasons it's better than subdividing the mesh at the preprocessing stage and "baking in" the displacement into vertex positions.
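A toy CPU-side sketch of the idea (hypothetical Python, not a real graphics API; real tessellation happens in the GPU's hull/domain shader stages):

```python
# Hypothetical sketch of tessellation + displacement: subdivide each
# triangle via edge midpoints, then push every vertex along a normal
# by a value sampled from a stand-in "displacement map".

def midpoint(a, b):
    return tuple((x + y) / 2.0 for x, y in zip(a, b))

def subdivide(tri):
    """One tessellation step: split 1 triangle into 4 via edge midpoints."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

def displace(v, normal, height):
    """Push one vertex along the surface normal by the sampled height."""
    return tuple(p + n * height for p, n in zip(v, normal))

base = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
tris = [base]
levels = 2  # in a real renderer this would depend on camera distance
for _ in range(levels):
    tris = [small for t in tris for small in subdivide(t)]

normal = (0.0, 0.0, 1.0)
sample = lambda v: 0.1  # stand-in for a displacement-map texture lookup
detailed = [tuple(displace(v, normal, sample(v)) for v in t) for t in tris]

print(len(detailed))  # 16 triangles from the original 1
```

The `levels` knob is the dynamic part: near the camera you'd subdivide more, far away not at all, which is what makes this cheaper than baking the detail into the mesh up front.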


Good point! It's not just the LOD; the parallelism also makes GPUs a better fit for subdivision than CPUs, and there's the matter of connection bandwidth to the GPU: vertex coordinates, along with lots of other per-vertex data, only need to be sent for the base vertices; the new vertices interpolate their values from the old ones, and textures like the displacement map encode the difference between the interpolated and desired values. Of course, a texture would just be a weird compromise on the resolution of the sent data, except you don't have to provide a texture for every attribute, and more importantly, such a texture can be static (e.g. if encoded in normal space it may work throughout an object's animation).


No, at least not in the "automatic" way Nvidia RTX Remix does it. You would not only need to generate the displacement maps for textures, but most importantly port the game to this new rendering engine. That's an extremely complicated task if done by reverse engineering and hacking the executable, without the ability to read and recompile the source code.


Perfectly clear. Who knows, maybe in the future this could be a step for a mechanism that does so; anyway, impressive results in itself! Kudos!


In the Android app I consistently get "Downloading model file" stuck at exactly -60830200%. I tried clearing data and caches and changing the connection.


Thank you for your feedback. Could you please email us the information of your phone model and system version? We will investigate promptly. In the meantime, you can try exiting the program and re-entering to see if that helps. Please also check your network connection.


In various trains, over 20 versions of the compiled firmware with unique variants of the locking algorithm were found. And to make matters worse, the trains were found to have something that appears to be a GSM-to-CAN bridge. It isn't reverse engineered yet but AFAIK shouldn't be there and in the worst case may be a remote control backdoor.


Both these points were clarified in the audience questions: it's a UDP-to-CAN bridge, so the Linux-based passenger information system knows the state of the train. And only the Linux system is GSM-connected (to get network announcements etc.). None of the firmwares were installed remotely, only when trains were physically sent back to the manufacturer.

