_endif_'s comments

_endif_ · on Sept 23, 2024

Agreed, but I have to say data cleaning is actually one of the hardest steps, and LLMs are simply not there yet.

It's almost impossible for an LLM to identify all the invalid rows at once, since the data does not fit into the context window. If we prompt the model to do thorough data cleaning, there will be many trial-and-error steps. This happens to me as a human too: I clean some rows and expect my program to run with the data, only to find more malformed data. LLMs cannot get it right for now; in fact, I have seen many cases where an LLM fails because it tries to convert types (e.g. string to date).
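To make the trial-and-error loop concrete, here is a rough sketch of scanning a whole column in one pass so that every unparseable date surfaces together instead of one failure per run. This is only an illustration: the function name `find_invalid_dates`, the `signup` column, and the `%Y-%m-%d` format are made up for the example and are not taken from any particular tool.

```python
from datetime import datetime


def find_invalid_dates(rows, column, fmt="%Y-%m-%d"):
    """Return (row_index, raw_value) for every row whose `column` fails to parse.

    Hypothetical helper: the name, signature, and default format are
    assumptions made for this example.
    """
    invalid = []
    for index, row in enumerate(rows):
        value = row.get(column)
        try:
            # strptime raises ValueError for malformed strings and
            # TypeError for non-string values such as None.
            datetime.strptime(value, fmt)
        except (TypeError, ValueError):
            invalid.append((index, value))
    return invalid


# Made-up sample data: one valid row followed by four kinds of bad rows.
rows = [
    {"signup": "2024-09-23"},   # valid
    {"signup": "23/09/2024"},   # wrong format
    {"signup": ""},             # empty string
    {"signup": None},           # missing value
    {"signup": "2024-02-30"},   # right shape, impossible date
]

bad_rows = find_invalid_dates(rows, "signup")
for index, value in bad_rows:
    print(f"row {index}: cannot parse {value!r} as a date")
```

Collecting failures instead of stopping at the first exception is the point here: the program sees the full list of malformed rows up front, which is exactly what neither I nor the LLM gets when fixing rows one failed run at a time.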

In my experience, the best approach is simply to skip the data cleaning step in the planning stage (you can provide feedback asking the tool to skip any step).

_endif_ · on Sept 23, 2024

I feel you; this is definitely a Google-wide issue across products (e.g. https://x.com/levelsio/status/1831840497629065656). The products themselves are worth a try, IMHO.

postalcoder · on Sept 23, 2024

This and the underlying tweet make me feel seen.

Why does developing with Google have to be so hard?

nerdponx · on Sept 23, 2024

How is this different from what literally any other hosted services company does? If anything, Google is being comparatively honest and transparent.

_endif_ · on Sept 23, 2024

100%. The good part is, if you follow the tweet thread, you can see they are trying to improve things. Hopefully the effort lands and the rest of the products follow.