Hacker News | dluc's comments

For #3 I'd recommend https://github.com/microsoft/kernel-memory :)


We are also developing an open-source solution for those who would like to test it out and/or contribute; it can be consumed as a web service or embedded into .NET apps. The project is codenamed "Semantic Memory" (available on GitHub) and offers customizable external dependencies, such as Azure Queues, RabbitMQ, or other alternatives, plus options for Azure Cognitive Search and Qdrant (with plans to include Weaviate and more). The architecture is similar, with queues and pipelines.

We believe that enabling custom dependencies and logic, as well as the ability to add/remove pipeline steps, is crucial. As of now, there is no definitive answer to the best chunk size or embedding model, so our project aims to provide the flexibility to inject and replace components and pipeline behavior.
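The "add/remove pipeline steps" idea can be sketched in a few lines. This is illustrative Python, not the project's actual (C#) API; every name here is hypothetical, and the pipeline is just an ordered list of handler functions that can be reordered, removed, or swapped:

```python
# Illustrative sketch of a pluggable ingestion pipeline; all names are
# hypothetical, not the project's actual API.

def extract(doc):
    # Stand-in extraction: the input is already plain text here.
    return {"text": doc, "chunks": [], "embeddings": []}

def partition(state, chunk_size=20):
    # Chunk size is one of the knobs users may want to replace.
    text = state["text"]
    state["chunks"] = [text[i:i + chunk_size]
                       for i in range(0, len(text), chunk_size)]
    return state

def embed(state):
    # Stand-in embedding: chunk length instead of a real vector.
    state["embeddings"] = [len(c) for c in state["chunks"]]
    return state

def run_pipeline(doc, steps):
    state = extract(doc)
    for step in steps:
        state = step(state)
    return state

# Steps can be added, removed, or replaced with custom handlers.
default_steps = [partition, embed]
result = run_pipeline("some document text to index", default_steps)
print(len(result["chunks"]), result["embeddings"])  # → 2 [20, 7]
```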

Regarding scalability, LLM text generators and GPUs remain a limiting factor in this area as well. LLMs hold great potential for analyzing input data, and I believe the focus should be less on the speed of queues and storage and more on finding the optimal way to integrate LLMs into these pipelines.


The queues and storage are the foundation on which some of these other integrations can be built. Fully agree on the need for LLMs within the pipelines to help with data analysis.

Our current perspective has been on leveraging LLMs as part of async processes to help analyze data. This only really works when your data follows a template, so the same analysis can be applied to a vast number of documents. Otherwise it becomes too expensive to do on a per-document basis.
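The cost concern is easy to quantify with back-of-the-envelope math. A toy estimate, where the per-token price is a hypothetical placeholder and not any provider's actual rate:

```python
# Back-of-the-envelope cost of a per-document LLM analysis pass.
# The price per 1K tokens below is a hypothetical placeholder.

def analysis_cost(num_docs, tokens_per_doc, price_per_1k_tokens):
    """Total cost of one LLM analysis pass over a corpus."""
    return num_docs * (tokens_per_doc / 1000.0) * price_per_1k_tokens

# 1M documents at ~2K tokens each, at a hypothetical $0.01 / 1K tokens:
total = analysis_cost(1_000_000, 2_000, 0.01)
print(f"${total:,.0f}")  # → $20,000
```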

What types of analysis are you doing with LLMs? Have you started to integrate some of these into your existing solution?


Currently we use LLMs to generate a summary, used as an additional chunk. As you might guess, this can take time, so we postpone the summarization to the end (the current default pipeline is: extract, partition, generate embeddings, save embeddings, summarize, generate embeddings of the summary, save them).
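The "summary as an additional chunk, deferred to the end" ordering can be sketched like this (illustrative Python; the summarizer is a stub, not a real LLM call, and none of these names come from the project):

```python
# Sketch of deferring summarization to the end of ingestion and
# storing the summary as one additional chunk (illustrative only).

def summarize(text, max_len=30):
    # Stand-in for an LLM call: truncate instead of summarizing.
    return text[:max_len]

def ingest(text, chunk_size=25):
    chunks = [text[i:i + chunk_size]
              for i in range(0, len(text), chunk_size)]
    # Summarization runs last because it is the slowest step; its
    # output is just another chunk, embedded and saved like the rest.
    chunks.append(summarize(text))
    return chunks

chunks = ingest("a long document that we split into small pieces for retrieval")
print(len(chunks))  # → 4 (3 text chunks + 1 summary chunk)
```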

Initial tests, though, are showing that summaries affect the quality of answers, so we'll probably remove the step from the default flow and use it only for specific data types (e.g. chat logs).

There are a bunch of synthetic-data scenarios we want to leverage LLMs for. Without going too much into detail: sometimes "reading between the lines", some memory consolidation patterns (e.g. a "dream phase"), etc.


Makes sense. Interesting that summaries sometimes affect quality.

For synthetic-data scenarios, are you also thinking about synthetic queries over the data? (Trying to predict which chunks might be used more than others.)


Yes, queries and also planning.

For instance, given the user "ask" (which could be any generic message in a copilot), decide how to query one or multiple storages. Ultimately, companies and users have different storages, and only a few can be indexed with vectors (and additional fine-tuned models). There's also a lot of "legacy" structured data accessible only with SQL and similar languages, so a "planner" (in the SK sense of planners) could be useful to query vector indexes, text indexes, and knowledge graphs, combining the results.
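The routing idea can be sketched as a toy in plain Python. This is unrelated to SK's actual planner implementation: the three storages are fake in-memory stand-ins, and a real planner would use an LLM to choose storages and build queries rather than keyword matching:

```python
# Toy "planner" that routes a user ask to one or more storages and
# merges the results. All storages are fake in-memory stand-ins.

def vector_search(ask):
    return ["vector-hit"] if "policy" in ask else []

def sql_search(ask):
    return ["sql-row"] if "orders" in ask else []

def graph_search(ask):
    return ["graph-node"] if "related" in ask else []

def plan_and_query(ask):
    # A real planner would use an LLM to pick storages and build the
    # queries; simple keyword routing keeps this sketch runnable.
    results = []
    for search in (vector_search, sql_search, graph_search):
        results.extend(search(ask))
    return results

print(plan_and_query("show orders related to the refund policy"))
# → ['vector-hit', 'sql-row', 'graph-node']
```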


Really interesting library.

Is anyone aware of something similar but hooked into Google Cloud infra instead of Azure?


We could easily add that if there's interest, e.g. using Pub/Sub and Cloud Storage. If there are .NET client libraries, it should be straightforward to implement the relevant interfaces. Similar considerations apply to the inference side: embeddings and text generation.



Why .NET apps specifically?


Multiple reasons, some are subjective as usual in these choices. Customers, performance, existing SK community, experience, etc.

However, the recommended use is running it as a web service, so from a consumer perspective the language doesn't really matter.


Good alternative: https://www.linen.dev/


The model doesn't run code; it generates text that happens to be code. It's up to the client calling the model API to use this text, e.g. compiling and executing it (if that's your scenario) and calling the model again to fix the original code if needed.
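That generate → execute → repair loop can be sketched as follows. The "model" here is a hard-coded stub standing in for a real LLM API call, and `eval` stands in for compile-and-execute; every name is hypothetical:

```python
# Sketch of the generate -> execute -> repair loop described above.
# `model` is a stub; a real client would call an LLM API here.

def model(prompt, attempt):
    # Fake model: returns broken code first, then a fixed version.
    return "1 +" if attempt == 0 else "1 + 1"

def try_run(code):
    try:
        return True, eval(code)  # stand-in for compile-and-execute
    except SyntaxError as err:
        return False, str(err)

def generate_and_fix(prompt, max_attempts=3):
    for attempt in range(max_attempts):
        code = model(prompt, attempt)
        ok, result = try_run(code)
        if ok:
            return code, result
        # Feed the error back as text: the model never runs code,
        # it only sees the failure description in the next prompt.
        prompt = f"{prompt}\nPrevious code failed: {result}"
    raise RuntimeError("no working code produced")

print(generate_and_fix("add one and one"))  # → ('1 + 1', 2)
```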


Hello there all, dluc from Semantic Kernel :wave: Looking forward to attending :-)


Nice pun, it's an edge case...


The search filter seems to have some hard-coded logic, preferring languages like R (4 projects [1]) over Scala (5 projects [2]). I wonder if that's based on trends or purely taste (or just cache :-)).

  [1] https://opensource.google.com/projects/search?q=%20&language=r
  [2] https://opensource.google.com/projects/search?q=%20&language=scala


Maybe it has something to do with the actual amount of Scala code. E.g. GitHub reports Bazel as 90%+ Java.


Having worked mostly with Mesos and k8s, I found k8s configuration superior, e.g. how one can import secrets and config files, and more importantly set up the network without address translation or port forwarding. Tooling seems OK with both, and I agree one needs to spend some time getting familiar with the CLI and nomenclature, IMHO because both are quite flexible and powerful.
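For example, injecting a secret into a pod is a few lines of manifest. A minimal, generic Kubernetes example (the Secret name and key are made up, not tied to any tooling discussed here):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
    - name: app
      image: nginx:1.25
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials   # an existing Secret object
              key: password
```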


To the author: the custom-metrics link is broken.

Looking forward to comparing that with an autoscale solution I'm working on.


Perhaps the repo was private until recently.

