This seems like it’s in response to last week’s congressional testimony, clarifying some things about their remote assistance systems.
It’s interesting that they only have 70 people for this - I can understand having some of them outside the US for nighttime assistance, and they’ll need to be able to scale for other countries in the future too.
What I’m still wondering is what’s limiting Waymo’s scaling - just the cars, or also the sensor systems? They’ve had their new test vehicles in SF for a while, but I still think most customers only get the Jaguars right now (and highway driving is still limited to specific customers in the Bay Area).
> What I’m still wondering is what is limiting the scaling for Waymo
I'm also very curious about this. Probably a mix of many things: training the driver to handle tricky conditions better (e.g. flooded roads), getting more Ohai vehicles imported and configured, configuring the backlog of Jaguar I-Paces and trucking them out to new markets, mapping roads and non-customer testing in new markets, getting regulatory approval/cooperation in other markets (e.g. DC), finding depot space, hiring maintenance teams, etc.
This GitHub readme was helpful in understanding their motivation, cheers for sharing it.
> Integrating agents into it prevents fragmentation of their service and allows them to keep ownership of their interface, branding and connection with their users
Looking at the contrived examples given, I just don't see how they're achieving this. In fact, it looks like creating MCP-specific tools will achieve exactly the opposite: there will immediately be two ways to accomplish a thing, and this will result in drift over time as developers have to account for two ways of interacting with a component on screen. There should be no difference, but there will be.
Having the LLM interpret and understand a page context would be much more in line with assistive technologies. It would require site owners to provide a more useful interface for people in need of assistance.
> Having the LLM interpret and understand a page context
The problem is fundamentally that it's difficult to create structured data that's easily presentable to both humans and machines. Consider: ARIA doesn't really help LLMs. What you're suggesting is much more in line with microformats and schema.org, both of which were essentially complete failures.
LLMs can already read web pages, just not efficiently. It's not an understanding problem, it's a usability problem. You can give a computer a schema and ask it to make valid API calls and it'll do a pretty decent job. You can't tell a blind person or their screen reader to do that. It's a different problem space entirely.
I'm currently a solo bootstrapped founder. I've done short stints in the past - 1 year solo in 2022, then a year as cofounder of a funded startup. Now I'm doing it again.
The question is how you stay motivated to keep at it - it looks like it took about 4 years before you were making something similar to your Google salary. Did family pressure or external pressure ever impact you? Or is it mainly just keeping your eyes on the longer-term goal?
I'm also quite lucky that I was aiming for lean-FIRE before I left Facebook, so I have the luxury of being able to keep at it, but sometimes it is demotivating seeing peers / others.
> The question is how you stay motivated to keep at it - it looks like it took about 4 years before you were making something similar to your Google salary. Did family pressure or external pressure ever impact you? Or is it mainly just keeping your eyes on the longer-term goal?
I found it helpful to go in with low expectations.
I was listening to a lot of podcasts about bootstrapping while I was still at Google in 2017-2018, and even the big success stories usually had 5+ years of failing or succeeding only marginally. So, I went in with the expectation that I'd probably fail for the first 5 years, and so there wasn't that feeling of disappointment from not earning much the first few years.
I also had a lot of lucky conditions that made it easy to take the risk at the time, including no family to support, lots of savings, low expenses.
> I'm also quite lucky that I was aiming for lean-FIRE before I left Facebook, so I have the luxury of being able to keep at it, but sometimes it is demotivating seeing peers / others.
Yeah, honestly I do sometimes think, "Wow, if I'd stayed at Google and kept getting that comp (which was about 50% equity IIRC), that would be a lot of money." But I also am very pleased with my life now, and I know I wouldn't have enjoyed my job nearly as much for the last 8 years had I stayed an employee. And that's a huge amount of my life to not do what I'd like to do.
Already have my own JS engine & the basics of three.js and pixi.js 8 working, with a roadmap to v1.0.0 posted in GitHub issues. Aiming to show it to folks at GDC in March.
So in theory it should be possible, but it might require customizing the Dawn or wgpu-native builds if they don't support it (MystralNative provides the JS bindings / wrapper around those two implementations of wgpu.h). I've already added a special C++ method to handle Draco compression natively, so adding some Mystral-native-only methods is not out of the question (however, I'd want to ensure that usage of those via JS is always feature-flagged so it doesn't break when run on the web).
Did you write your WebGPU chessboard using the raw JS APIs? Ideally it should work, but I just fixed up some missing APIs to get Three.js working in v0.1.0, so if there are issues, please open an issue on GitHub - I'll try to get it working so we can close any gaps.
Here's a Dawn implementation with support for ray tracing that was written a number of years ago but never integrated into browsers. Perhaps it will help?
Yes, chessboard3d.app is written with raw JS APIs and raw WebGPU. It does use the rapier physics library, which uses WASM, which might be an issue? It implements its own ray tracing but would probably run 10x faster with hardware ray tracing support.
I think you'd get a lot of attention if you had hardware ray tracing, since that's currently only available in DirectX 12 and Vulkan, which requires implementing against native desktop platforms. FWIW, if the path looks feasible, I would be interested in contributing.
WASM shouldn't be an issue since the Draco decoder uses it - but it may only work with V8 (it wouldn't work for QuickJS builds, but the default builds use V8 + Dawn). Obviously, with an alpha runtime there may be bugs.
I think it would be super cool to have some sort of extension before WebGPU (on the web) has it. I was taking a look at the prior example & it seems like there's good ongoing discussion about it linked here: https://github.com/gpuweb/gpuweb/issues/535. Also, I believe Metal has hardware ray tracing support now too?
Re: implementation, a few options exist - a separate Dawn fork with RT is one path (though Dawn builds are slow, 1-2 hours on CI). Another approach would be exposing custom native bindings directly from MystralNative alongside the WebGPU APIs - that might make iteration much faster for testing feasibility. The JS API would need to be feature-flagged so the same code gracefully falls back when running on the web (I did this for a native Draco impl too, which avoids having to load WASM: https://mystralengine.github.io/mystralnative/docs/api/nativ...).
Follow-up comment about Apple disallowing JIT - I'll need to confirm whether JSC is allowed to JIT at all, or only inside of a WebView. I was able to get JSC + wgpu-native rendering in an iOS build, but I'd need to confirm it can pass app review.
There are two other performance things you can do by controlling the runtime, though. One is adding special perf methods (which I did for Draco decoding - there is currently one non-standard __mystralNativeDecodeDracoAsync API), but the docs clearly lay out that you should feature-gate it if you're going to use it so you don't break web builds: https://mystralengine.github.io/mystralnative/docs/api/nativ...
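A minimal sketch of what that feature gating can look like from the app side - the __mystralNativeDecodeDracoAsync name comes from the docs above, but the exact signature and the decodeDracoWithWasm fallback helper are assumptions:

```ts
// Hedged sketch: gate the native-only decoder behind a runtime check so the
// same bundle still runs in a browser. Signature is assumed, not confirmed.
declare const __mystralNativeDecodeDracoAsync:
  | ((data: ArrayBuffer) => Promise<ArrayBuffer>)
  | undefined;

// Assumed app-provided fallback that loads the usual WASM Draco decoder on web.
declare function decodeDracoWithWasm(data: ArrayBuffer): Promise<ArrayBuffer>;

export async function decodeDraco(data: ArrayBuffer): Promise<ArrayBuffer> {
  // `typeof` is safe even when the global doesn't exist (i.e. on the web).
  if (typeof __mystralNativeDecodeDracoAsync === "function") {
    return __mystralNativeDecodeDracoAsync(data); // MystralNative fast path
  }
  return decodeDracoWithWasm(data); // web fallback
}
```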
The other thing is more experimental - writing an AOT compiler for a subset of TypeScript that converts it to C++ and then just compiles your code ("MystralScript"). This would be similar to Unity's C# AOT compiler and would kind of be its own separate project, but there is some prior work here with Porffor, AssemblyScript, and Static Hermes, so it's not completely just a research project.
Is AssemblyScript good for games, though? Last I checked it lacked too many features for game code coming directly from TS, but it might be better now? No idea how well Static Hermes behaves today (probably far better, given the RN heritage).
I've been down the TS->C++ road a few times myself, and the big issue is often how "strict" you can keep your TS code for real-life games, as well as how slow/messy the official TS compiler has been (and real life taking time away from these efforts).
It's better now, but I think one should probably target the Go port of the TS compiler directly (both for performance, and because Go is a slightly stricter language that's probably better suited for compilers).
I guess the point is that the TS->C++ compilation thing is potentially a rabbit hole - theoretically not too bad, but TS has moved quickly and has been hard to keep up with without using the official compiler, and even then a "game-oriented" TypeScript mode wants a slightly different semantic model from the official one, so you need either a mapping over the regular type-inference engine, a separate one, or a parallel one.
When mapping regular TS to "game variants", the biggest issue is how to handle numbers efficiently: even if you go full-double, you need conversion-point checks everywhere doubles go into unions with any other type (meaning you need boxing or a "fatter" union struct). And that's not even accounting for any vector-type accelerations.
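To make the conversion-point issue concrete, here's a tiny generic TypeScript illustration (nothing Mystral-specific):

```ts
// The moment a number can share a slot with another type, an AOT TS->C++
// compiler can no longer store it as a bare double; it needs a runtime tag
// (boxing or a "fat" tagged-union struct) wherever the union is used.
type Score = number | "unranked";

function formatScore(s: Score): string {
  // Compiled C++ needs a tag check here to know whether the union currently
  // holds a double or a string, even though plain numbers elsewhere could
  // stay as raw doubles.
  return typeof s === "number" ? s.toFixed(1) : "unranked";
}

console.log(formatScore(87.25));      // "87.3"
console.log(formatScore("unranked")); // "unranked"
```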
AssemblyScript was just mentioned as some prior work - I don't think AssemblyScript would work as-is for games.
I realize the major issues with TS->C++ though (or with compiling any language to C++ - Facebook has prior work converting PHP to C++, https://en.wikipedia.org/wiki/HipHop_for_PHP, which was eventually deprecated in favor of HHVM). I think iteratively improving the JS engine (Mystral.js, the one that is not open source yet but is the reason MystralNative exists) to work with the compiler would be the first step, and ensuring that games and examples are built on top of it with a subset of TS is a starting point here. I don't think the goal for MystralScript should be to support Three.js or any other engine to begin with, as that would end up going down the same compatibility pits that HipHop did.
Being able to update the entire stack here is actually very useful - in theory parts of Mystral.js could just be embedded into MystralNative (separate build flags, probably not a standard build), avoiding any TS->C++ compilation for core engine work, and then ensuring that games built on top use the strict subset of TS that does work well with the AOT compilation system. One option for numbers is actually using comment annotations (similar to how JSDoc types work for the TypeScript compiler - specifically, using annotations in comments so that the web builds don't change), sketched below.
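Purely as a hypothetical sketch (no such syntax exists yet - the @mystral tags below are invented for illustration), comment annotations could look something like this: the web build ignores the comments entirely, while the AOT compiler could read them to pick a C++ representation.

```ts
// Hypothetical annotation syntax, invented for illustration: a browser just
// sees plain numbers, while an AOT compiler could map these to f32/i32
// instead of boxed doubles.
class Particle {
  /** @mystral f32 */ x = 0;
  /** @mystral f32 */ y = 0;
  /** @mystral i32 */ ttlFrames = 120;

  update(/** @mystral f32 */ dt: number): void {
    this.x += dt * 60;   // identical behavior on web and in a native build
    this.ttlFrames -= 1;
  }
}
```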
Re: the TS compiler - I do have some basics started here, and I'm already seeing that tests are pretty slow. I don't think the tsgo compiler has a similar API for parsing & emitters right now, so as much as I would like to switch to it (I have for my web projects & the speed is awesome), I don't think I can yet until the API work is clarified: https://github.com/microsoft/typescript-go/discussions/455
I remember reading about Ejecta a long time ago! I had completely forgotten about it, but it is similar! The funny thing is that to support UI elements I also had to support Canvas2D through Skia (although not 100% yet), so maybe Impact could even work at some point (it would require extensive testing, obviously).
Phaser is not supported right now because Phaser is still using a WebGL renderer, from my understanding (maybe adding ANGLE + WebGL support in a v2.0.0 is an option, but I'm debating whether that's a good idea or not).
So I am stubbing parts of the DOM API (input handling like keydown, pointer events, etc.), so you shouldn't need to rewrite any of that.
Three.js and Pixi 8 with the WebGPU renderer are part of the v1.0.0 roadmap (verifying that they work correctly on all platforms); right now most of the testing has been done against my own engine (tentatively called Mystral.js, which will also be open sourced as part of v1.0.0 - it's already used for some of the examples, just as a minified bundle): https://github.com/mystralengine/mystralnative/issues/7
Hi, thanks! Yeah, for controls I'm emulating pointer events and keydown/keyup from SDL3 inputs & events. The goal is that the same JS you write for a browser should "just work". It's still very alpha, but I was able to get my own WebGPU game engine running in it & have a Sponza example that uses the exact same key and pointer events to handle WASD / mouse controls: https://mystraldev.itch.io/sponza-in-webgpu-mystral-engine (the web build there is older, but the downloads for Windows, Mac, and Linux use Mystral Native - you can clearly tell it's not Electron by the size; even Tauri on Mac didn't support WebP inside the WebGPU context, so I couldn't use Draco-compressed assets with WebP textures there).
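For reference, this is the kind of plain browser input code the runtime aims to run unchanged - standard DOM APIs only, with the events emulated from SDL3 underneath (the movement math is just a generic example, not code from the Sponza demo):

```ts
// Plain DOM input handling, identical to what a web build would use.
const pressed = new Set<string>();
let yaw = 0; // updated by mouse-look

window.addEventListener("keydown", (e) => pressed.add(e.code));
window.addEventListener("keyup", (e) => pressed.delete(e.code));

const canvas = document.querySelector("canvas")!;
canvas.addEventListener("pointermove", (e) => {
  yaw += e.movementX * 0.002; // mouse-look, same as with pointer lock on web
});

// Called once per frame by the game loop; returns a WASD movement delta.
function moveDelta(dt: number, speed = 3): [number, number] {
  const forward = (pressed.has("KeyW") ? 1 : 0) - (pressed.has("KeyS") ? 1 : 0);
  const strafe  = (pressed.has("KeyD") ? 1 : 0) - (pressed.has("KeyA") ? 1 : 0);
  const dx = (Math.sin(yaw) * forward + Math.cos(yaw) * strafe) * speed * dt;
  const dz = (Math.cos(yaw) * forward - Math.sin(yaw) * strafe) * speed * dt;
  return [dx, dz];
}
```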
I put up a roadmap to get Three.js and Pixi 8 (WebGPU renderer) fully working as part of a 1.0.0 release, but there's nothing my JS engine is doing that's that different from Three.js or Pixi: https://github.com/mystralengine/mystralnative/issues/7
I did have to pull in Skia for Canvas2D support because I was using it for UI elements inside the canvas, so right now it's a WebGPU + Canvas2D runtime. I'm debating whether I should also add ANGLE and WebGL bindings in v2.0.0 to support a lot of other use cases too. Font support is built in as part of the Skia support, so that's also covered. WebAudio is another thing that's currently supported, but it may need more testing to be fully compatible.
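To show what the Canvas2D side covers, here's ordinary 2D-context HUD code (standard web APIs; the separate overlay canvas and its element id are just the usual web pattern for illustration, not necessarily how Mystral structures it):

```ts
// Standard Canvas2D HUD drawing; in MystralNative this is backed by Skia,
// on the web it's the browser's own 2D context.
const hud = document.querySelector<HTMLCanvasElement>("canvas#hud")!; // hypothetical overlay canvas
const ctx = hud.getContext("2d")!;

function drawHud(fps: number, hp: number): void {
  ctx.clearRect(0, 0, hud.width, hud.height);
  ctx.font = "16px sans-serif";
  ctx.fillStyle = "#ffffff";
  ctx.fillText(`FPS: ${fps.toFixed(0)}`, 12, 24);
  ctx.fillText(`HP: ${hp}`, 12, 44);
}
```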