When you think about it, it's actually pretty ridiculous that this is difficult in the first place, in the internet era. In principle all you really need is a standard way to tag each keyframe with an NTP-synced timestamp, and any camera with an internet connection would be able to achieve this effortlessly.
NTP time skew can be on the order of 100ms. If the audio/video is desynced by 30ms, it can definitely be noticeable. Some people are more sensitive than others, and it also depends on the particulars of the content.
Edit: Another commenter points out that in some situations, it's even necessary to align the time that the frames start, which necessarily requires sub-frame precision.
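To put those numbers in frame terms, here is a rough sketch of the arithmetic. The function name and the frame rates are just illustrative choices, not anything from a real camera API; the 1, 30, and 100 ms offsets are the figures mentioned in this thread.

```python
def offset_in_frames(offset_ms: float, fps: float) -> float:
    """Convert a clock offset in milliseconds to a count of video frames."""
    return offset_ms / 1000.0 * fps


# Illustrative frame rates (assumed); offsets are the ones discussed in the thread.
for fps in (24, 30, 60):
    for offset_ms in (1, 30, 100):
        frames = offset_in_frames(offset_ms, fps)
        print(f"{offset_ms:>4} ms at {fps} fps = {frames:.2f} frames")
```

At 30 fps a frame lasts about 33 ms, so a 100 ms skew is roughly three frames, while a 1 ms skew is a small fraction of one frame.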
It's almost never going to be that bad unless you're going out over a terrible internet connection. Worst case you could have a way to configure one camera to run its own NTP server and have the others sync to it over LAN. That would get you sub-frame (<1ms) precision easily.
this drove me insane back in ~2004/2005 as a longtime-NTP-loving sysadmin who was doing tech for an indie film company. they were ingesting tons of multicam MiniDV into FCP and their timecode discipline was NOT good. there was no simple answer to this back then, pure manual work. truly is kind of bonkers it has taken until ~2024 to solve this (kinda).
Is NTP precise enough for this? I thought timestamps are usually in milliseconds, but audio sample periods can be fractions of a millisecond? If you're off by a little, won't it create weird echoes or interference in the sound?
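A back-of-the-envelope sketch of that concern, assuming a 48 kHz sample rate (a common choice, but an assumption here) and the idealized case where two equal-level copies of the same signal are summed with a small time offset. In that case the first cancellation notch falls at 1 / (2 * delay). The function names are made up for illustration.

```python
def offset_in_samples(offset_ms: float, sample_rate_hz: float = 48_000) -> float:
    """How many audio samples a given time offset spans (48 kHz is an assumed default)."""
    return offset_ms / 1000.0 * sample_rate_hz


def first_comb_notch_hz(offset_ms: float) -> float:
    """Lowest cancelled frequency when a signal is summed with an
    equal-level copy of itself delayed by offset_ms (idealized comb filter)."""
    delay_s = offset_ms / 1000.0
    return 1.0 / (2.0 * delay_s)


for offset_ms in (0.1, 1.0, 10.0):
    print(
        f"{offset_ms} ms = {offset_in_samples(offset_ms):.1f} samples, "
        f"first notch near {first_comb_notch_hz(offset_ms):.0f} Hz"
    )
```

So a 1 ms error already spans about 48 samples at 48 kHz, and if the two tracks were mixed at equal level the first notch would sit around 500 Hz, well inside the audible range. Whether that is actually heard depends on how the tracks are mixed.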
But that's actually part of the signal (useful information), isn't it, like for stereo or spatial recording setups? Presumably each performer or instrument's delay is a function of its distance to different microphones.
vs NTP-introduced errors, which are just noise and not part of the intentional recording?
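A quick sketch of that distinction, assuming sound travels at roughly 343 m/s in air at room temperature. The acoustic delay is fixed by geometry, while a clock error is an extra term stacked on top of it. The names and the example distances are hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value for air at about 20 °C


def acoustic_delay_ms(distance_m: float) -> float:
    """Time for sound to travel distance_m, in milliseconds."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


def observed_delay_ms(distance_m: float, clock_error_ms: float) -> float:
    """Delay seen in the recording: geometric delay plus any clock sync error."""
    return acoustic_delay_ms(distance_m) + clock_error_ms


# Hypothetical setup: one source, microphones at 1 m and 4 m.
near = acoustic_delay_ms(1.0)
far = acoustic_delay_ms(4.0)
print(f"geometric difference: {far - near:.2f} ms")
print(f"with a 5 ms clock error on the far mic: {observed_delay_ms(4.0, 5.0) - near:.2f} ms")
```

One metre of extra distance is only about 2.9 ms, so a clock error of a few milliseconds is the same size as the spatial cue it is added to.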
Oh c’mon there are people who have thought about it for more than the ten seconds you have and it’s not feasible and not effortless - you want to give every cam an internet gateway??