Hacker News | scottlamb's comments

Opened the link. Saw my own comment. I'm still as confused today as I was then about how this was ever supposed to work—either the quoted code is wrong or there's some weird unstated interface contract. I gather from other issues the maintainers are uninterested in a semver break any time soon. Unsure if they'd accept a performance regression (even if it makes the thing actually work). So I feel stuck. In the meantime, I don't use per-layer filtering. That's a trap.

I've got a whole list of puzzling bugs in the tracing <-> opentelemetry <-> datadog linkage.


Agree, and I would add that a bad abstraction, the wrong abstraction for the problem, and/or an abstraction misused is far worse than no abstraction. That was bugging me in another thread earlier today: <https://news.ycombinator.com/item?id=47350533>

I'm not sure Rust's `async fn` desugaring (which involves a data structure for the state machine) is inlineable. (To be precise: maybe the desugared function can be inlined, but LLVM isn't allowed to change the data structure, so there may be extra setup costs, duplicate `Waker`s, etc.) It's probably true that there is a performance cost. But I agree with the article's point that it's generally insignificant.

For non-async fns, the article already made this point:

> In release mode, with optimizations enabled, the compiler will often inline small extracted functions automatically. The two versions — inline and extracted — can produce identical assembly.


I am fairly doubtful that it makes sense to be using async function calls (or waits) inside of a hot loop in Rust. Pretty much anything you'd do with async in Rust is too expensive to be done in a genuinely hot loop where function call overhead would actually matter.

> The family has proof of residence (which is its own absurdity we won't discuss), and this third party can arbitrarily override that based on a black box argument.

Doesn't the family have a very straightforward libel claim against the third party? That the car was parked elsewhere may be true. "Although you are the owner of record of a house within our district boundaries, our license-plate recognition shows that is not the place where you reside" is a statement the family can disprove in court (to a civil standard) and demonstrate has financially damaged them ("her daughter is currently attending a private school 45 minutes away from her home"). If that statement came from the third party (rather than the school district misinterpreting the raw data themselves), the family will win. The straightforward financial damages (let alone any pain-and-suffering or punitive damages) likely exceed the company's payment from the school district ("a total of $41,904 for a 36-month-long contract"). It wouldn't take many of these claims before the company becomes insolvent, and good riddance.


I'd also expect them to win a lawsuit against the school district for falsely denying the basic right of education. Perhaps the individual school administrator also for libel. With any luck, a total legal bloodbath that warns any other school districts away from this conduct.


That depends on whether the third party makes the claim of non-residence, how they make it, and whether they disclaim warranty and reliance. I can show you a site with some graphs and data of who is parked where, when, and how often; I doubt they're directly saying, "This person definitely doesn't live at this residence, so deny her child entry."

That distinction is what I was getting at with "if that statement came from the third party (rather than the school district misinterpreting the raw data themselves)".

If the company just provided the raw data, they may be in better legal shape. But I'd say either they or the school administrator libeled the family. Maybe both. (Of course, I'm not a lawyer.) Even if the company did provide only the raw data, I wonder if libel is somehow implied in its contracted/intended use. And I'm really hoping for the legal bloodbath outcome, because this is unconscionable.

The family may not have time or money to pursue this, but there are lawyers who work on contingency or even pro bono, including the ACLU.


If they disclaim warranty and reliance that’s relevant to the person they are selling data to, but not to the harmed party.

> The cloud instances have network-attached disks

Props for identifying the issue immediately, but armed with that knowledge, why not redo the benchmark on a different instance type that has local storage? E.g. why not try a `c8id.2xlarge` or `c8id.4xlarge` (which bracket the `c6a.4xlarge`'s cost)?


IMO, there are a lot of smells in this code not addressed in the article. I only skimmed, and still, here are a few:

1. They represent a single room change with this sequence of three operations:

    VectorDiff::Set { index: 3, value: new_room } because of the new “preview”,
    VectorDiff::Remove { index: 3 } to remove the room… immediately followed by
    VectorDiff::PushFront { value: new_room } to insert the room at the top of the Room List.
and I don't see any mention of atomic sequences. I think the room will momentarily disappear from view before being placed into the correct spot. That kind of thing would drive me nuts as a user. It suggests to me this is not the right abstraction.

Also, if you are actually representing the result with a vector, it's O(n), so from a performance perspective, it's not great if the vector can be large: you're shifting everything from [3, n) one spot forward and then one spot back, unnecessarily. If there were a `VectorDiff::Move`, you'd only be shifting 3 elements (the distance moved). Could still be the full length of the list but probably usually not? Something like a `BTreeSet` would make it actually O(lg n).

2. Taking a lock in a comparison function (they call it `Sorter`, but the name is wrong) is a smell for correctness as well as performance. Can the values change mid-sort? Then the result is non-deterministic. (In C++ it's actually undefined behavior to use a non-deterministic comparator. In Rust it's safe but still a bad idea.) You just can't sort values while they're changing, full stop, so inner mutability in a list you're sorting is suss. [edit: and for what? within a client, are you seriously doing heavy mutations on many rooms at once? or is a single lock on all the rooms sufficient?]

3. The sorted adapter just degrades to insertion sort of changes right here: <https://docs.rs/eyeball-im-util/0.10.0/src/eyeball_im_util/v...> and decomposes what could have been an atomic operation (append) into several inserts. Even `Set` does a linear scan and then becomes a (non-atomic again) remove and an insert, because it can change the sort order.

4. The `.sort_by(new_sorter_lexicographic(vec![Box(...), Box(...), Box(...)]))` means that it's doing up to three dynamic dispatches on each comparison. The `new_sorter_lexicographic` is trivial, so inline those instead. And definitely don't take a separate lock on each, yuck, although see above anyway about how you just shouldn't have locks within the vec you're sorting.

I would never use these abstractions.


5. In their "dessert" section, they talk about a problem with sort when the items are shallow clones. It's an example of a broader problem: they put something into an `ObservableVector` but then semantically mutate it via inner mutability (defeating the "observable"). You just can't do that. The sort infinite loop is the tip of the iceberg. Everything relying on the observable aspect is then wrong. The lesson isn't just "jumping on an optimization can lead to a bug"; it's also that abstractions have contracts.

Probably for ints unconditionally. For floats in Sesse__'s example without `-ffast-math`, I count 10 muls, 2 muladds, 1 add. With `-ffast-math`, 1 mul, 3 muladds. <https://godbolt.org/z/dPrbfjzEx>

Isn't the faster approach SIMD [edit: or GPU]? A 1.05x to 1.90x speedup is great. A 16x speedup is better!

They could be orthogonal improvements, but if I were prioritizing, I'd go for SIMD first.

I searched for asin in Intel's intrinsics guide. They have an AVX-512 intrinsic `_mm512_asin_ps`, but it says "sequence" rather than single-instruction. Presumably the actual sequence they use is in some header file somewhere, but I don't know offhand where to look, so I don't know how it compares to a SIMDified version of `fast_asin_cg`.

https://www.intel.com/content/www/us/en/docs/intrinsics-guid...


I don’t know much about raytracing but it’s probably tricky to orchestrate all those asin calls so that the input and output memory is aligned and contiguous. My uneducated intuition is that there’s little regularity as to which pixels will take which branches and will end up requiring which asin calls, but I might be wrong.

I'd expect it to come down to data-oriented design: SoA (structure of arrays) rather than AoS (array of structures).

I skimmed the author's source code, and this is where I'd start: https://github.com/define-private-public/PSRayTracing/blob/8...

Instead of an `_objects`, I might try for a `_spheres`, `_boxes`, etc. (Or just `_lists` still using the virtual dispatch but for each list, rather than each object.) The `asin` seems to be used just for spheres. Within my `Spheres::closest_hit` (note plural), I'd work to SIMDify it. (I'd try to SIMDify the others too of course but apparently not with `asin`.) I think it's doable: https://github.com/define-private-public/PSRayTracing/blob/8...

I don't know much about ray tracers either (having only written a super-naive one back in college) but this is the general technique used to speed up games, I believe. Besides enabling SIMD, it's more cache-efficient and minimizes dispatch overhead.

edit: there's also stuff that you can hoist in this impl. Restructuring as SoA isn't strictly necessary to do that, but it might make it more obvious and natural. As an example, this `ray_dir.length_squared()` is the same for the whole list. You'd notice that when iterating over the spheres. https://github.com/define-private-public/PSRayTracing/blob/8...


It comes down to how "coherent" the rays are, and how much effort (compute) you want to put into sorting them into batches of rays.

With "primary" ray-tracing (i.e. camera rays, rays from surfaces to area lights), it's quite easy to batch them up and run SIMD operations on them.

But once you start doing global illumination, with rays bouncing off surfaces in all directions (and with complex materials, with multiple BSDF lobes, where lobes can be chosen stochastically), you start having to put a LOT of effort into sorting and batching rays such that they all (within a batch) hit the same objects or are going in roughly the same direction.


When I was working on this project, I was trying to restrict myself to the architecture of the original Ray Tracing in One Weekend book series. I am aware that things are not as SIMD-friendly and that becomes a major bottleneck. While I am confident that an architectural change could yield a massive performance boost, it's something I don't want to spend my time on.

I think it's also more fun sometimes to take existing systems and to try to optimize them given whatever constraints exist. I've had to do that a lot in my day job already.


I can relate to setting an arbitrary challenge for myself. FWIW, I don't know where you draw the line of an architectural change, but I think that switching AoS -> SoA may actually be an approachably-sized mechanical refactor, and then taking advantage of it to SIMDify object lists can be done incrementally.

The value of course is contingent on there being a decent number of objects of a given type in the list rather than just a huge number of rays being sent to a small number of objects; I didn't evaluate that. If it's the other way around, the structure would be better flipped, and I don't know how reasonable that is with bounces (that maybe then aren't all being evaluated against the same objects?).


This tracks with my experience and seems reasonable, yes. I tend to SoA all the things, sometimes to my coworkers’ amusement/annoyance.

The issue is that the algorithm is only half the story. The implementation (e.g. bytecode) is the other.

I've been trying to find ways to make the original graphics renderer of the CGA version of Elite faster as there have been dozens of little optimizations found over the decades since it was written.

I was buoyed by a video of Super Mario 64/Zelda optimizations where it was pointed out that sometimes an approx calculation of a trig value can be quicker than a table lookup depending on the architecture.

Based on that I had conversations with LLMs over what fast trig algorithms there are, but for 8088 you are cooked most of the time on implementing them at speed.


I don't do much float work, but I don't think there is a regular sine instruction, only the old x87 float-stack ones.

I was curious what "sequence" would end up being, but my compiler is too old for that intrinsic. Even godbolt didn't help for gcc or clang, but it did reveal that icc produced a call: https://godbolt.org/z/a3EsKK4aY


If you click libraries on godbolt, it's pulling in a bunch, including multiple SIMD libraries. You might have to fiddle with the libraries or build locally.

There are lots of reasons to read through source code you never edit or recompile: security audits, interoperability, learning from their techniques, etc. And I think many of those same ideas apply to seeing the training data of a LLM. It will help you understand quickly (without as much experimentation) what it's likely to be good at, where its biases may be, where some kind of supplement (transfer learning? RAG? whatever) might be needed. And the why.

> security audits

If you are unable to run the multimillion-dollar training run, then any kind of security audit of the training code is absolutely meaningless, because you have no way to verify that the weights were actually produced by this code.

Also, the analogy with source code/binary code fails really fast, considering that the model training process is non-deterministic. Even if you are able to run the training, you'll get different weights than those that were released by the model developers, and then... then what?


I probably shouldn't have led with that example because yeah, reproducible (and cheap) builds would be best for security audits. But I wouldn't say it's absolutely meaningless. At least it can guide your experimentation, and if results start differing radically from what you'd expect from the training data, that raises interesting questions.

If you're going through the effort to be open source you can probably set up fixed batch sizes and deterministic combination of batches without too much more effort. At least I hope it's not super hard.

> considering that model training process is non-deterministic

Why would it have to be? Just use PRNG with published seeds and then anyone can reproduce it.


I have zero actual experience in training models, but in general, when parallelizing work: there can be fundamental nondeterminism (e.g., some race conditions) that is tolerated, whose recording/reproduction can be prohibitive performance-wise.

Agree, this feels like a distinction that needs formalising...

Passive transparency: training data, technical report that tells you what the model learned and why it behaves the way it does. Useful for auditing, AI safety, interoperability.

Active transparency: being able to actually reproduce and augment the model. For that you need the training stack, curriculum, loss weighting decisions, hyperparameter search logs, synthetic data pipeline, RLHF/RLAIF methodology, reward model architecture, what behaviours were targeted and how success was measured, unpublished evals, known failure modes. The list goes on!


I'd also add training checkpoints to the list for active transparency. I think the Olmo models do a decent job, but it would be cool to see it for bigger models and for ones that are closer to state-of-the-art in terms of both architecture and algorithms.

Security audits, etc, are possible because binary code closely implements what the source code says.

In this case, you have no idea what the weights are going to "do", from looking at the source materials --- the training data and algorithm --- without running the training on the data.


Bit of a thread-jack, but has anyone reverse-engineered the UniFi camera adoption protocol? I was surprised to discover that, unlike the APs, the cameras can't be adopted through the Unifi Software Controller that you can just throw into a Docker container. You're supposed to do that through their NVR appliance (Unifi Protect). I was hoping to just use them with my open-source NVR. They seem to be about the only option for a reasonably priced, larger image sensor camera that is not made by a company participating in the Uyghur genocide (Hikvision, Dahua, Uniview, Huawei).

I found https://community.home-assistant.io/t/unifi-cameras-without-... in which someone sshed in, edited some config files by hand, and got streaming to work for the current boot. One could probably take that a bit further and, you know, save the config to flash. But it'd be nice to just do it the way their controller does and know it's going to work for future firmware updates and such.

They also stream by connecting to your NVR with a modified version of FLV, rather than you connecting to them with RTSP, which is annoying but can be worked around.


If you want to bypass Unifi Protect, what sort of "adoption" are you thinking of? AFAIK, "adoption" is a Unifi Protect thing. Otherwise it's just a device on your network that you can configure Frigate etc. to connect to and pull streams.

Changing the credentials for web access (firmware upgrade, janky jpeg-based live stream, etc.) and ssh access from the default ubnt:ubnt. Surprisingly, I don't see a page for this in the web UI, and the `password` command in the CLI is ineffective. I haven't looked around the filesystem.

Setting where it sends the video stream.

Configuring video settings, zone detections, etc. I found a video going through them here: <https://youtu.be/URam5XSFzuM?si=8WK4Yghh9kidZe6c&t=279> Just about any other camera lets you change this stuff through the camera's built-in web interface and/or ONVIF. Ubiquitis apparently don't.

> Otherwise it's just a device on your network that you can configure Frigate etc. to connect to and pull streams.

No, it connects to you!


You want to change the credentials of the camera, so Frigate can log into it while it is connected to your Unifi network?

I did that for 5 different cameras yesterday. You're saying Unifi's cameras don't allow user management? That sucks!

> No, it connects to you!

I thought frigate connects to the camera's RTSP stream (maybe with ONVIF in the mix)?


Unifi cams don't stream RTSP; they stream FLV v1 (Flash Video) over plain TCP on port 7550, three streams in total, one per quality channel. And yes, they stream that to the NVR that adopted them only... then the NVR re-encodes and serves RTSP (configurable).

For the adoption stage, UniFi cameras broadcast on UDP port 10001 using a proprietary TLV (Type-Length-Value) protocol. The Protect console listens on this port and picks up new cameras immediately. The discovery probe is 4 bytes, `\x01\x00\x00\x00`, sent as a UDP broadcast to `255.255.255.255:10001`.

The response then contains these fields:

  | Hex Code | Field | Data |
  |----------|-------|------|
  | `0x01` | MAC Address | 6-byte hardware address |
  | `0x02` | MAC + IP | Combined MAC and IPv4 address |
  | `0x03` | Firmware Version | String |
  | `0x0B` | Hostname | String |
  | `0x0C` | Platform (Short Model) | String |
  | `0x0A` | Uptime | 64-bit integer |
  | `0x13` | Serial | String |
  | `0x14` | Model (Full) | String |
  | `0x17` | Is Default | Boolean (adopted vs unmanaged) |
After discovery, the Protect console:

1. Connects to the camera via SSH (default credentials)

2. Configures the Inform URL (TCP 8080)

3. Camera registers with the controller

So conceivably at step 2 you could use your own modified URL to point to your own NVR and then grab the FLV streams from there.


Thanks!

> 1. Connects to the camera via SSH (default credentials) 2. Configures the Inform URL (TCP 8080)

Not what I expected, but okay. Looks like there's a `set-inform` command. It looks like it opens a TLS connection, doesn't check the certificate, and tries to open a websocket:

    GET /camera/1.0/ws HTTP/1.1
    Pragma: no-cache
    Cache-Control: no-cache
    Host: ...
    Origin: http://ws_camera_proto_secure_transfer
    Upgrade: websocket
    Connection: close, Upgrade
    Sec-WebSocket-Key: ...
    Sec-WebSocket-Protocol: secure_transfer
    Sec-WebSocket-Version: 13
    Camera-MAC: ...
    Camera-IP: ...
    Camera-Model: 0xa601
    Camera-Firmware: 5.0.83
    Device-ID: ...
    Adopted: false
    x-guid: be9d8e45-62a8-ae84-8b23-71723c7decaf
I might try accepting the websocket but I have a feeling I'll get stuck about there without knowing what the server is supposed to send over it. I'm debating if I'm willing to buy a Unifi Protect device or not.

...then again I did a search for a couple strings and ran across https://github.com/keshavdv/unifi-cam-proxy . It's the opposite direction of what I want (makes a standard camera work with Unifi Protect) but maybe contains the protocol details I'm looking for...


> ...then again I did a search for a couple strings and ran across https://github.com/keshavdv/unifi-cam-proxy . It's the opposite direction of what I want (makes a standard camera work with Unifi Protect) but maybe contains the protocol details I'm looking for...

Actually, yes. I got lazy and just asked Claude Code to write a server, using that as a reference...and it worked. It was able to change the password and have it start streaming flv video. Not exactly a production-quality implementation but as a proof-of-concept it's quite successful.


There you go! I wrote a proxy server to deal with Unifi cameras and also dewarp their 360 camera streams... and used Claude Code to reverse-engineer most of what's going on. I had it sniff the entire network traffic between their NVRs and cams via Wireshark/TShark, grab the NVR's websocket streams, and also write a custom Metal shader pipeline native to Apple silicon to replace ffmpeg, which was way too slow to deal with 5K 360 streams and dewarp them. All in a matter of hours. Amazing times ;)

I actually just registered here to comment for the first time after lurking for years ;) I had the displeasure of upgrading from a broken G3 Flex to a G5 Flex, then finding out they cut the RTSP stream functionality for unknown (probably business) reasons. I don't plan to buy a Protect appliance, and Frigate should still handle my NVR stuff, so this comment thread comes in really handy right now. Is there any possibility for you to publish this proxy to GitHub or similar?

If I'm understanding correctly moonlighter's proxy speaks Ubiquiti's protocol on both ends, "just" altering the video stream along the way. Which is pretty cool but not what you're looking for.

I'd like to make a production-quality version of what you're looking for: acts as an RTSP server, adopts Ubiquiti cameras. As a single-binary thing that you could run anywhere, maybe even on the camera itself. (self-adoption? emancipation, as in "emancipated minor"?) But it'll take me a bit. My RTSP library right now is client-only, so it needs a bit of expansion to do this. Server support is in my TODO list along with several other changes. https://github.com/scottlamb/retina/issues/89


> I thought frigate connects to the camera's RTSP stream (maybe with ONVIF in the mix)?

Right, that's the expectation of Frigate, my own Moonfire NVR, and basically every other NVR out there. Ubiquiti decided to think different.


Well thanks for the heads-up to avoid their cameras.

I hear you, but on the other hand, I'd take a bit of interop pain over supporting genocide any day. It looks like with the hints from moonlighter and from https://github.com/keshavdv/unifi-cam-proxy I'll be able to get this to work.

Honestly it might be less work than some other cameras that (allegedly) speak RTSP. You'd be shocked how low-quality these implementations are. Never advancing timestamps, setting the RTP MARK bit arbitrarily, writing uninitialized memory framed as audio packets (on cameras that don't have microphones), closing file descriptors then writing data to them anyway (and so having it show up on the next accepted connection to be assigned that fd even pre-auth), etc.


> writing uninitialized memory framed as audio packets..., closing file descriptors then writing data to them anyway...

Thanks for the reassurance that I'm not such an incompetent dev as I feel.

Funny how companies tend to be competent at either devices or software, and rarely both. This sounds vaguely like the automotive industry.


FWIW, there are multiple other camera brands who don't manufacture in Xinjiang (or China for that matter), like Axis or Vivotek.

Arecont Vision is another good brand. I've got a friend who got a bunch of Arecont domes stupid cheap, and they have amusing modes like "casino mode" (guaranteed 30fps recording for various gaming regulations).

https://www.arecontvision.com/news/arecont-vision-adds-casin...

(eBay deal sniping sometimes gets you some funny deals but YMMV — I picked up an Axis Q1700-LE license plate camera for under $200 for some experiments.)


I've eyed Axis cameras, but they're pricey (particularly for large sensors) and don't seem to come in the turret form factor I prefer. E.g. the AXIS M4317-PLVE is a dome and $717 at Newegg. Kind of a weird model, actually—180/360 degree view, which isn't what I'd want. But I haven't found anything that is at a price I'd like to pay myself, let alone recommend to others for home use.

Vivotek's a bit more reasonable but still. The (brand new?) Vivotek VIT04A-W is the closest I've found—1/1.8" sensor, 4MP, turret, $535 on jmac.com.

These Ubiquiti models seem really nice in terms of hardware specs and very reasonably priced. $200 for a 1/1.8" sensor turret, $479 for a 1/1.2" sensor turret with extra AI features. Same general price bracket as Dahua, I think.


Check secondhand. I'm finding quad sensor cams that retailed for $2k purchased off craigslist for $100. Wipe em, flash new firmware, deny all internet access at the network level, and you're good to go

Here's that YouTube link without the creepy Google tracking component:

https://youtu.be/URam5XSFzuM?t=279


I'm using a G3 Flex with Frigate without Unifi Protect; works fine.

I think newer models like the G4 Flex don't support this, though.

