We use Playwright to interact with the browser, so while it's not available by default, we do support bulk-exporting tests as Playwright, either for moving off our platform or for customers who want to run deterministic versions of the tests on their own infra (you can also run them on ours!).
This is interesting. I think we've shied away a bit from security-ish use cases since they're outside our core competencies. Do you have examples of what tools exist today for catching things like that, or is it totally ad hoc?
> can the agents check their email? other notification methods?
Yes to email (for paying customers, agents spin up with unique addresses). No to other notification methods for now, but as soon as a paying customer has a use case for SMS, etc., we'll build it.
> Are your agents good at testing other agents? e.g. I want your agent to ask our agent a few questions and complete a few UI interactions with the results.
I'd say this is one of our strong suits. Specifically, agent UIs tend to be easy for browser agents to navigate, and the LLM-as-a-judge offers pretty good feedback on chat quality, which can inform later actions. (I'd be remiss not to mention, though, that a good LLM eval framework like Braintrust is probably the best first line of defense.)
> How do you handle testing onboarding flows?
We can step through most onboarding flows if you start from a logged-out state and give the agent the context it'll need (e.g. a Stripe test card). That said, setting up integrations that require multi-page hops is still a pain point in our system and leaves a lot to be desired.
Would love to talk more about your specific case and see if we can help! founders@propolis.tech
Hey, I'm Matt! Really excited to answer any questions.
To elaborate a little bit on the "canary" comment --
For a while at Airtable I was on the infra team that managed the deploy (basically click run and then sit and triage issues for a day). One of my first contributions on the team was adding a new canary analysis framework that made it easier to catch bugs and roll them back automatically. Two things always bothered me about the standard canary release process:
1) It necessarily treats some users as lower value, and thus as more acceptable to expose to bugs (this makes sense for things like free-tier, etc., but the more you segment out, the less representative and thus less effective your canary is). When every customer interaction matters (as is the case for so many types of businesses), this approach is harder to justify.
2) Low-frequency / high-impact bugs are really difficult to catch in canary analysis. While it’s easy to write metrics that catch glaring drops/spikes, more subtle high-impact regressions are much harder and often require user reports (which we did not factor in as part of our canary). Example: how do you write a canary metric that auto-rolls back when an enterprise account owner (small % of overall users) logs in and a broken modal prevents them from interacting with your website?
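To make point 2 concrete, here's a rough sketch of why an aggregate error-rate check misses that kind of bug. This is not the framework we had at Airtable; the segment names, counts, and the 1% threshold are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    requests: int
    errors: int


def error_rate(stats):
    """Error rate across one or more segments (0.0 if there is no traffic)."""
    stats = list(stats)
    total = sum(s.requests for s in stats)
    return sum(s.errors for s in stats) / total if total else 0.0


def aggregate_canary_fails(baseline, canary, max_increase=0.01):
    """Classic check: compare the overall error rate of canary vs. baseline."""
    delta = error_rate(canary.values()) - error_rate(baseline.values())
    return delta > max_increase


def failing_segments(baseline, canary, max_increase=0.01):
    """Per-segment check: return the segments whose error rate regressed."""
    failing = []
    for name, canary_stats in canary.items():
        baseline_stats = baseline.get(name)
        if baseline_stats is None:
            continue  # nothing to compare against
        delta = error_rate([canary_stats]) - error_rate([baseline_stats])
        if delta > max_increase:
            failing.append(name)
    return failing


# Hypothetical traffic: enterprise owners are 0.1% of requests.
baseline = {
    "free": SegmentStats(requests=9000, errors=45),
    "paid": SegmentStats(requests=990, errors=5),
    "enterprise_owner": SegmentStats(requests=10, errors=0),
}
# Same traffic on the canary, but every enterprise owner hits the broken modal.
canary = {
    "free": SegmentStats(requests=9000, errors=45),
    "paid": SegmentStats(requests=990, errors=5),
    "enterprise_owner": SegmentStats(requests=10, errors=10),
}

print(aggregate_canary_fails(baseline, canary))  # overall rate moves 0.5% -> 0.6%
print(failing_segments(baseline, canary))
```

The aggregate check stays green because the overall rate barely moves, while the per-segment check flags the enterprise owners. The catch in practice is that with only a handful of requests in a segment, the per-segment numbers are very noisy, and a broken modal may not even register as an "error" in your metrics, which is exactly why this class of bug usually surfaces through user reports.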
I view what we’re building at Propolis as an answer to both of these things. I envision a deploy process (very soon) that lets you roll out to simulated traffic and canary on THAT before you actually hit real users (and then do a traditional staged release, etc.).
Seems like you are misappropriating what canaries are useful and used for... they are designed to be lightweight and shallow, hence the name and the whole analogy. Canaries were never meant to determine if a mine was structurally unsafe, etc.
Canaries are lightweight and shallow once they exist. Building a canary from the ground up is still beyond us, but if you don’t want to kill an actual bird that is pretty much the only way to go.
I thought about doing this as well but ended up viewing it personally as a "lose-lose" for myself rather than a "win-win". I wonder if that says anything about my risk-aversion.
It means you are not a degenerate gambler. You probably also do not enjoy slot machines, or the bullshit that passes for card games in a casino. I want to play blackjack, not "blackjack" where the allowed bets and rules are changed specifically to increase the house edge, thank you very much.
I seem to be missing something during install -- Odin is not showing up in the list of Community Plugins.
I'm running locally via Docker and have tried updating the Obsidian client + toggling restricted mode + restarting.
Very excited to try it out if I can get the install to work.
One of the tamer ones is the time I stopped to help a woman whose car had broken down on the side of Route 5 near Camp Pendleton. Her phone had died and she was stuck there waiting for help. I didn't have a phone, but I knew there was a Marine base nearby, so I drove there to get help. Unfortunately it was dark and I went down the wrong road, and suddenly I saw a Marine running after my car with an M-16. Luckily he didn't shoot, laughed at my Hawaii driver's license, and directed me to a nearby gas station, where we were able to phone for help.
Not having a phone makes every day an adventure! :)