Thanks for your interest! We currently support 7 models from OpenAI and Anthropic, including gpt-3.5 and gpt-4. You can definitely configure the agents for the specific needs of your application, including dynamic configuration that changes per user. Fine-tuning of base models is not currently supported, but our early customers haven't really needed it.
Happy to help with your specific use case! Feel free to reach out to me at anar@proficientai.com or on our Discord channel. You should also find much of this information in the documentation.
Thank you! Yes, that's correct. The limit for a batch migrator must be 500 because it uses Firestore write batches internally, and a single Firestore write batch can contain at most 500 operations.
I'm currently writing another migrator that won't use Firestore batches; it'll just use good old Promise.all(). I'm planning to add more capabilities soon, like error-resilient traversers with different traversal strategies, the ability to re-traverse the docs that couldn't be migrated the first time, etc.
Great question! Since traversing the entire collection may take a while, it's definitely possible that a new doc is added in the meantime. Whether that new doc is traversed depends on its order/index within the collection, but it definitely won't be traversed twice.
If it's positioned before all the docs in the current batch, it won't be traversed. If it's positioned after the current batch, it will be. So this also depends on whether you're traversing a plain collection or a Query.
Catching all the newly added docs would require a different strategy, like adding a temporary field to each traversed doc and then querying for the docs that don't have that field. It's definitely something we can implement soon!
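Roughly what that strategy could look like, as an in-memory sketch (the array stands in for the collection, and the `_migrated` field name and `migrateAll` helper are made up for illustration — note that real Firestore can't query for a *missing* field directly, so an actual implementation would likely initialize the field to `false` first or use a `!=` filter):

```typescript
type MDoc = { id: string; _migrated?: boolean };

// Keep "querying" for unmarked docs until none remain, marking each one
// as it is processed. Docs added mid-traversal are still unmarked, so
// they get picked up by a later batch.
function migrateAll(
  docs: MDoc[],
  batchSize: number,
  onBatch?: (docs: MDoc[], batchIndex: number) => void
): string[] {
  const processed: string[] = [];
  let batchIndex = 0;
  while (true) {
    // Query the docs that don't have the temporary field yet.
    const batch = docs.filter((d) => !d._migrated).slice(0, batchSize);
    if (batch.length === 0) break;
    for (const d of batch) {
      processed.push(d.id);
      d._migrated = true; // mark so it is never picked up again
    }
    onBatch?.(docs, batchIndex++); // simulate concurrent writes
  }
  // Clean up the temporary field afterwards.
  for (const d of docs) delete d._migrated;
  return processed;
}
```

The trade-off is an extra write per document (set the field, then remove it), but the traversal becomes insensitive to ordering and to when a doc was added.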
It looks like adding and removing documents before the end of the current batch may cause /existing/ documents to be skipped or processed twice.
If you add a new document before the end of the current batch, the offset used for the beginning of the next batch will be too low, causing documents at the boundary to be processed twice. If you delete a document, the index will be too high, skipping some documents.
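To make the hazard concrete, here's a small in-memory sketch of *offset-based* pagination (a hypothetical model — the names are made up, and as noted below the actual migrator uses cursors rather than offsets):

```typescript
type Doc = { id: string };

// Traverse `docs` in batches using numeric offsets, while `mutate` may
// insert or delete documents between batches (simulating concurrent writes).
function traverseWithOffsets(
  docs: Doc[],
  batchSize: number,
  mutate: (docs: Doc[], batchIndex: number) => void
): string[] {
  const seen: string[] = [];
  let offset = 0;
  let batchIndex = 0;
  while (true) {
    const batch = docs.slice(offset, offset + batchSize);
    if (batch.length === 0) break;
    for (const doc of batch) seen.push(doc.id);
    offset += batchSize;
    mutate(docs, batchIndex++);
  }
  return seen;
}
```

Inserting a doc before the next offset shifts an already-seen doc back into range (processed twice); deleting one shifts an unseen doc out of range (skipped).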
I think the temporary-field solution might work, but you need stable indexing on the set being traversed. So I think you need to add the temporary field to new documents and exclude them in the query, and you need to only soft-delete while traversing and exclude soft-deleted docs post-query. Then you can clean up by removing the temporary fields and soft-deleted documents afterwards.
Are you sure about this? I'm not sure what you mean by "offset," but we pass the last document of the current batch to the .startAfter() method, which ensures that the next batch only contains the docs that come after it. So no doc should be processed twice. But as I said earlier, the new docs won't be traversed, which is expected. I'm working on a different type of traverser that will fix this.
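A minimal in-memory sketch of the cursor approach (doc ids stand in for the ordering key, mimicking orderBy-by-id with .startAfter(lastDoc); the helper names are made up):

```typescript
type CDoc = { id: string };

// Traverse `docs` in batches using a cursor (the last seen id) instead of
// a numeric offset, while `mutate` may add docs between batches.
function traverseWithCursor(
  docs: CDoc[],
  batchSize: number,
  mutate: (docs: CDoc[], batchIndex: number) => void
): string[] {
  const seen: string[] = [];
  let cursor: string | null = null;
  let batchIndex = 0;
  while (true) {
    // Equivalent of orderBy(id).startAfter(cursor).limit(batchSize).
    const batch = [...docs]
      .sort((a, b) => a.id.localeCompare(b.id))
      .filter((d) => cursor === null || d.id > cursor)
      .slice(0, batchSize);
    if (batch.length === 0) break;
    for (const d of batch) seen.push(d.id);
    cursor = batch[batch.length - 1].id;
    mutate(docs, batchIndex++);
  }
  return seen;
}
```

Because each batch starts strictly after the last seen document rather than at a numeric position, insertions before the cursor never push already-seen docs back into range: nothing is processed twice, and docs inserted before the cursor are simply missed (as described above), while docs inserted after it are traversed normally.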
I'll actually write up some tests to confirm that we don't process any docs twice!