itsderek23's comments | Hacker News

Nice work! I'm working on a similar standalone DevOps AI Agent (OpsTower.ai). This post shows how the agent is structured and how it performs against a 40-question evaluation dataset: https://www.opstower.ai/2023-evaluating-ai-agents/


That's an impressive article and a lot of good work put into it!


Thanks!

> Did you hit token limits?

While I used tiktoken to limit the message history (and stay below the token limit), I generally found that packing a lot of data into the context didn't produce better completions. Usually the completions got more confusing. I put a limited amount of info into the context and have generally stayed below the token limit.
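
For reference, the trimming is roughly this (a minimal sketch; the model name and token budget are placeholders, not OpsTower's actual values):

    import tiktoken

    def trim_history(messages, model="gpt-4", max_tokens=6000):
        # Drop the oldest messages until what's left fits under the budget.
        enc = tiktoken.encoding_for_model(model)
        def count(msgs):
            return sum(len(enc.encode(m["content"])) for m in msgs)
        while messages and count(messages) > max_tokens:
            messages.pop(0)  # oldest first; keep the most recent context
        return messages

This ignores the few tokens of per-message overhead the chat format adds, but it's close enough for staying under the limit.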

> Are you storing message/ chat histories between sessions

Right now, yes. It's pretty important to store everything (each request/response) to debug issues with the prompt, context, and the agent call loop.
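
The storage doesn't need to be fancy; appending each call to a JSON-lines file is enough to replay a session later. Something like this (an illustrative sketch, not OpsTower's actual schema):

    import json, time

    def log_agent_call(path, messages, response):
        # One record per LLM call so a session can be replayed while debugging.
        record = {"ts": time.time(), "messages": messages, "response": response}
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")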


+1. Jonathan is great to work with if you are in a similar position to Baremetrics.


This certainly looks like a cleaner way to deploy an ML model than SageMaker. Couple of questions:

* Is this really aimed at more intensive model inference applications that need a cluster? It feels like a cluster is overkill for a lot of my models.

* A lot of the ML deployment tools (Cortex, SageMaker, etc.) don't seem to rely on first pushing changes to version control, then deploying from there. Is there any reason for this? I can't come up with a reason why this shouldn't be the default. For example, this is how Heroku works for web apps (and this is a web app at the end of the day).


You're 100% right that Cortex is designed for the production use case. A lot of our users are running Cortex for "small" production use cases, since the Cortex cluster can include just a single EC2 instance for model serving (autoscaling allows deployed APIs to scale down to 1 replica). For ML use cases that don't need an API (a lot of data analysis work, for example), Cortex is probably overkill.

As for your second question, we definitely want to integrate tightly with version control systems. Since right now we are 100% open source and don't offer a managed service, we don't have a place to run the webhook listeners. That said, most of our users version control their code/configuration (we do that with our examples as well: https://github.com/cortexlabs/cortex/examples), and it should be straightforward to integrate Cortex into an existing CI/CD workflow; the Cortex CLI just needs to be installed, and then running `cortex deploy` with the updated code/configuration will trigger a rolling update.

If you're referring to version control for the actual model files, Cortex is unopinionated as to where they are hosted, so long as they can be accessed by your Predictor (what we call the Python file that initializes your model and serves predictions). If you're interested in implementing version control for your models, I'd recommend checking out DVC.
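
For context, a Predictor is just a small Python class. A rough sketch along these lines (illustrative only - the S3 bucket/key config and pickled scikit-learn-style model are placeholders; see the examples repo for the real interface):

    # predictor.py (illustrative sketch)
    import pickle
    import boto3

    class PythonPredictor:
        def __init__(self, config):
            # Fetch the model from wherever it's hosted/versioned (S3 here).
            s3 = boto3.client("s3")
            s3.download_file(config["bucket"], config["key"], "/tmp/model.pkl")
            with open("/tmp/model.pkl", "rb") as f:
                self.model = pickle.load(f)

        def predict(self, payload):
            # payload is the parsed JSON request body
            return self.model.predict([payload["features"]]).tolist()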


Is it possible to partner with you to offer a managed service for Cortex? We are looking at offering your solution to our clients for deployment.


Great, Caleb - makes sense. Thanks!


Scout also detects these for Django, ordering them by the most time-consuming N+1s: http://blog.scoutapp.com/articles/2018/04/30/finding-and-fix...
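
For anyone unfamiliar with what that looks like in Django (assuming an Article model with an author foreign key):

    # N+1: one query for the articles, then one query per article for its author.
    for article in Article.objects.all():
        print(article.author.name)

    # Fixed: select_related pulls the authors in with a JOIN - one query total.
    for article in Article.objects.select_related("author"):
        print(article.author.name)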


And bullet is used for Rails to detect N+1 queries: https://github.com/flyerhzm/bullet


A lightweight approach we've started using at my company:

1. Create a GitHub repo dedicated to user-facing issues (https://github.com/scoutapp/roadmap).

2. Customers can subscribe to the issues they are interested in.

3. When resolving an issue, we reference it in the git commit, which closes the issue and notifies the issue subscribers.

We're a developer tool, so it's a familiar flow for our customers.


Neat approach. I've also seen public-facing Trello boards used with varying success (at the very least they give users a hopefully clear picture of which features/issues are prioritized).


> Hi Derek! You've helped me with my employer's Scout configuration in Slack :)

Small world!

> Is there an automated way of getting the average of a performance metric (eg Time spent in AR) over N requests?

I'm assuming you mean w/Chrome dev tools + server timing?

Not that I'm aware of... DevTools is an area I'd like to explore more, though.


Author here.

The server timing metrics here are actually extracted from an APM tracing tool (Scout).

Tracing services generally do not give immediate feedback on the timing breakdown of a web request. At worst, the metrics are heavily aggregated. At best, you'll need to wait a couple of minutes for a trace.

The Server Timing API (which is how this works) gives immediate performance information, shortening the feedback loop and allowing you to do a quick gut-check on a slow request before jumping to your tracing tool.
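
Under the hood it's just a response header. A hand-rolled Django version looks roughly like this (Scout's integration reports real instrumented segments - db, view, etc. - automatically; this sketch only times the whole request):

    import time

    class ServerTimingMiddleware:
        # Minimal illustration of emitting the Server-Timing header; an APM
        # integration would report individually measured segments instead.
        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            start = time.monotonic()
            response = self.get_response(request)
            total_ms = (time.monotonic() - start) * 1000
            response["Server-Timing"] = f"app;dur={total_ms:.1f}"
            return response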


> but I think it's an API limitation

Author here - I believe that's the case. There isn't a way to specify start & end times: https://w3c.github.io/server-timing/#dom-performanceserverti...

That said, the spec also mentions:

> To minimize the HTTP overhead the provided names and descriptions should be kept as short as possible - e.g. use abbreviations and omit optional values where possible.

I could see significant issues if we tried to send data in a timeline fashion (such as creating a metric for each database call in an N+1 scenario).

One idea: pass down a URI (i.e. https://scoutapp.com/r/ID) that, when clicked, provides full trace information.
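
Syntactically that could ride along in the existing desc field without breaking the spec, e.g.:

    Server-Timing: trace;desc="https://scoutapp.com/r/ID"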


Author here.

Application instrumentation - whether via Prometheus, StatsD, Scout, or New Relic - solves a very different problem than this. The server timing metrics here are actually extracted from an APM tool (Scout), so you get the best of both worlds.

With those tools, you do not get immediate feedback on the timing breakdown of a web request. At worst, the metrics are heavily aggregated. At best, you'll need to wait a couple of minutes for a trace.

Profiling tools that give immediate feedback on server-side production performance have their place, just like those that collect and aggregate metrics over time.

