ttamslam · 2025-10-30T20:45:46 1761857146

We use Playwright to interact with the browser, so while it's not available by default, we do support bulk-exporting tests as Playwright scripts, both for customers moving off our platform and for customers who want to run deterministic versions of the tests on their own infra (you can also run them on ours!)

ttamslam · 2025-10-30T17:42:00 1761846120

This is interesting. I think we've shied away a bit from security-ish use cases since they're outside our core competencies. Do you have examples of tools that exist today for catching things like that, or is it totally ad hoc?

> can the agents check their email? other notification methods?

Yes to email (for paying customers, agents spin up with unique addresses). No to other notification methods for now, but as soon as a paying customer has a use case for SMS, etc., we'll build it.

dfsegoat · 2025-10-30T17:43:40 1761846220

OTP-protected flow verification

ttamslam · 2025-10-30T17:11:40 1761844300

> Are your agents good at testing other agents? e.g. I want your agent to ask our agent a few questions and complete a few UI interactions with the results.

I'd say this is one of our strong suits. Specifically, these UIs tend to be easy for browser agents to navigate, and the LLM-as-a-judge gives pretty good feedback on chat quality, which can inform later actions. (I'd be remiss not to mention, though, that a good LLM eval framework like Braintrust is probably the best first line of defense.)

> How do you handle testing onboarding flows?

We can step through most onboarding flows if you start from a logged-out state and give the agent the context it'll need (e.g. a Stripe test card). That said, setting up integrations that require multi-page hops is still a pain point in our system and leaves a lot to be desired.

Would love to talk more about your specific case and see if we can help! founders@propolis.tech

tommy18 · 2025-11-11T18:56:07 1762887367

Then how do you compare with Braintrust? Aren’t they doing the same thing for agents?

ttamslam · 2025-10-30T16:45:28 1761842728

Hey I'm Matt! Really excited to answer any questions.

To elaborate a little bit on the "canary" comment --

For a while at Airtable I was on the infra team that managed the deploy (basically click run and then sit and triage issues for a day). One of my first contributions on the team was adding a new canary analysis framework that made it easier to catch and roll back bugs automatically. Two things always bothered me about the standard canary release process:

1) It necessarily treats some users as lower value, and thus more acceptable to risk exposing bugs to (this makes sense for things like free-tier, etc., but the more you segment out, the less representative and thus less effective your canary is). When every customer interaction matters (as is the case for so many types of businesses), this approach is harder to justify.

2) Low-frequency / high-impact bugs are really difficult to catch in canary analysis. While it’s easy to write metrics that catch glaring drops or spikes, more subtle high-impact regressions are much harder and often require user reports (which we did not factor in as part of our canary). Example: how do you write a canary metric that auto-rolls back when an enterprise account owner (small % of overall users) logs in and a broken modal prevents them from interacting with your website?
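To make point 2 concrete, here's a rough sketch of the kind of threshold check I mean. Everything in it is made up for illustration (the function name, the 1% threshold, the traffic numbers); it is not the actual framework we had at Airtable.

```python
# Hypothetical sketch of a threshold-based canary check. All names and numbers
# are illustrative assumptions, not a real production implementation.

def canary_should_rollback(baseline_errors, baseline_total,
                           canary_errors, canary_total,
                           max_rate_increase=0.01):
    """Return True if the canary error rate exceeds baseline by more than the threshold."""
    baseline_rate = baseline_errors / baseline_total
    canary_rate = canary_errors / canary_total
    return (canary_rate - baseline_rate) > max_rate_increase

# Glaring regression: the canary error rate jumps from 1% to 5%.
glaring = canary_should_rollback(100, 10_000, 500, 10_000)

# Subtle regression: 0.1% of canary users (say, enterprise account owners)
# are fully blocked, adding only 10 errors on top of the usual 100.
subtle = canary_should_rollback(100, 10_000, 110, 10_000)

print(glaring, subtle)
```

The first check trips the rollback and the second does not, even though in the second case every affected user is completely blocked. You can tighten the threshold, but then normal noise starts triggering rollbacks.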

I view what we’re building at Propolis as an answer to both of these things. I envision a deploy process (very soon) that lets us roll out to simulated traffic and canary on THAT before you actually hit real users (and then do a traditional staged release, etc.)

bfeynman · 2025-10-30T20:45:04 1761857104

seems like you are misappropriating what canaries are useful and used for... they are designed to be lightweight and shallow... hence the name and whole analogy, canaries never were meant to determine if a mine was structurally unsafe etc

svnt · 2025-10-31T10:12:51 1761905571

I don’t see how they have it wrong?

Canaries are lightweight and shallow once they exist. Building a canary from the ground up is still beyond us, but if you don’t want to kill an actual bird that is pretty much the only way to go.

ttamslam · 2025-07-08T13:49:12 1751982552

I'm super curious to learn more about what AI/Innovation looks like for the MTA.

Is any of your/their work published?

ttamslam · on Nov 5, 2024

I thought about doing this as well but ended up viewing it personally as a "lose-lose" for myself rather than a "win-win". I wonder if that says anything about my risk-aversion.

mrguyorama · on Nov 5, 2024

It means you are not a degenerate gambler. You probably also do not enjoy slot machines, or the bullshit that passes for card games in a casino. I want to play blackjack, not "blackjack" where you change the allowed bets and rules specifically to increase the house edge thank you very much.

floobertoober · on Nov 6, 2024

Do you feel the same way about insurance in general?

ttamslam · on Oct 10, 2023

Wild to see some actual code from the FTX repo. Laughed at the takeaway of making sure you at least hide your fraud behind some messy code.

mmcwilliams · on Oct 10, 2023

Or more realistically compiled code where your fraudulent methods exist in patches you don't commit to git.

ttamslam · on Sept 21, 2023

I seem to be missing something during install -- Odin is not showing up in the list of Community Plugins. I'm running locally via Docker and have tried updating the Obsidian client + toggling restricted mode + restarting. Very excited to try it out if I can get the install to work.

ttamslam · on May 10, 2023

Did you replace it with a “dumb” phone for emergencies?

breck · on May 10, 2023

No. For emergencies I've had to rely on nearby human beings, like in the 1990s.

roycebranning · on May 10, 2023

Any crazy stories from this?

breck · on May 12, 2023

Many but I like to save them for real life.

One tamer one is the time I stopped to help a woman whose car broke down on the side of Route 5 near Camp Pendleton. Her phone had died and she was stuck there waiting for help. I didn't have a phone, but I knew there was a Marine base nearby, so I drove there to get help. Unfortunately it was dark and I went down the wrong road, and suddenly I saw a Marine running after my car with an M-16. Luckily he didn't shoot, laughed at my Hawaii driver's license, and directed me to a nearby gas station, where we were able to phone for help.

Not having a phone makes every day an adventure! :)

ttamslam · on Nov 24, 2022

Ah I should have searched before impulsively posting, sorry about that.

> This video is a feat of human intellect and ingenuity.

I couldn't agree more! Anyone reading this who can spare 20 minutes: it's certainly worth it, and it only escalates the longer you watch.

anm89 · on Nov 24, 2022

No, it was long dead. Happy you reposted!

Reposting this definitely improved your QPU alignment