novacode007's comments

novacode007 · on Aug 13, 2024

Looks interesting to me, but I had some questions. Could you elaborate on the process of expert review and validation? How do you ensure the quality and accuracy of the datasets created?

yosai · on Aug 13, 2024

We have a team of domain experts who vet the instruction dataset. We do typical RLHF (reinforcement learning from human feedback) and connect it back to our SFT (supervised fine-tuning) loop. That's why we call ourselves hardware and human in the loop. Humans play an important role in ensuring the quality and accuracy of our dataset.

novacode007 · on Aug 13, 2024

Got it, and how well does it work with more complex documents, like those with a lot of images or intricate tables? I'm curious about how accurately it aligns the content with the source code in those cases.

yosai · on Aug 13, 2024

We use multimodal RAG and tools similar to unstructured.io. We generate structured output and then use an LLM again to match it against our AST-parsed source code. The matching part is really complex and needs manual inspection and validation.

yosai · on Aug 13, 2024

Please visit https://h2loop.ai/ to learn more about H2LooP.