Depending on your skillset, you could use a DCC tool like Blender together with three.js to make creating these visuals and interactions much simpler. Have a look at the gltfjsx + react-three-fiber [1] combination, which are themselves abstractions over vanilla three.js.
With that said, the raw WebGL approach here is arguably more educational, so goal achieved, I think!
Cool example, but all r3f is doing here is providing the three.js camera, the controls, and the text with emoji. The watch itself is loaded as a .glb file, and that's the part I'd assume most people would be interested in learning about.
Yeah, I think exporting a scene from Blender as glTF/GLB, and then using these tools to bring the exported 3D file to the web, is one of the more approachable abstractions.
The reason you'd use gltfjsx (which that example doesn't) is to get fine-grained control over every node in the scene graph. In the case of the watch, this would map to having a component for each mesh or gear, which can then be controlled with mechanics/physics.
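To make the "one handle per node" idea concrete, here is a rough, dependency-free sketch in TypeScript. It is not gltfjsx or three.js code: `SceneNode`, `indexNodes`, `stepGears`, the node names and the tooth counts are all made up for illustration. It only shows the general pattern of flattening a scene graph into a name-indexed lookup so each gear can be driven on its own.

```typescript
// Hypothetical stand-in for a scene graph node (not a three.js type).
interface SceneNode {
  name: string;
  rotationZ: number; // rotation around the gear axis, in radians
  children: SceneNode[];
}

// Flatten the tree into a name -> node lookup.
// Assumes node names are unique; a later duplicate would overwrite an earlier one.
function indexNodes(
  root: SceneNode,
  out: Record<string, SceneNode> = {}
): Record<string, SceneNode> {
  out[root.name] = root;
  for (const child of root.children) {
    indexNodes(child, out);
  }
  return out;
}

// Made-up scene: a watch body with two meshed gears.
const watch: SceneNode = {
  name: "watch",
  rotationZ: 0,
  children: [
    { name: "gearLarge", rotationZ: 0, children: [] },
    { name: "gearSmall", rotationZ: 0, children: [] },
  ],
};

const nodes = indexNodes(watch);

// Advance both gears by dt seconds. Meshed gears turn in opposite
// directions, and the small gear spins faster by the tooth ratio.
function stepGears(
  n: Record<string, SceneNode>,
  dt: number,
  teethLarge = 40,
  teethSmall = 10
): void {
  const speed = 1; // rad/s for the large gear (arbitrary)
  n.gearLarge.rotationZ += speed * dt;
  n.gearSmall.rotationZ -= speed * dt * (teethLarge / teethSmall);
}

stepGears(nodes, 0.5);
console.log(nodes.gearLarge.rotationZ, nodes.gearSmall.rotationZ);
```

In an actual r3f project the per-frame update would presumably live in a render-loop callback and the nodes would come from the loaded .glb rather than a hand-written object, but the shape of the idea is the same.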
[1] https://docs.pmnd.rs/react-three-fiber/getting-started/examp...
Edit: there's actually a 50 LOC watch example with r3f: https://codesandbox.io/s/bouncy-watch-qyz5r