Hacker News

I’ve seen devs just run select * from table then filter it and sort it in their own code. Then they complain “the database is slow” when it’s spending all its time shipping gigabytes of data they don’t need to them!
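To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table and its rows are made up for illustration; the point is only that the second query ships two rows instead of the whole table.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 15.5), ("carol", 300.0), ("dave", 42.0)],
)

# Anti-pattern: fetch every row and column, then filter and sort client-side.
all_rows = conn.execute("SELECT * FROM orders").fetchall()
big_in_python = sorted(
    ((customer, amount) for _id, customer, amount in all_rows if amount > 100),
    key=lambda row: row[1],
    reverse=True,
)

# Better: let the database filter, project and sort, so only the
# rows and columns that are actually needed cross the wire.
big_in_sql = conn.execute(
    "SELECT customer, amount FROM orders WHERE amount > ? ORDER BY amount DESC",
    (100,),
).fetchall()
```

Both variables end up holding the same rows; on a real table with an index on the filtered column, the second form also lets the database avoid scanning everything.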


From what I've encountered, this is generally the case when someone is in "analysis/reports" mode. Rather than getting summary statistics on each column, finding the number of nulls, etc. by writing a SQL query, they pull the data into a Python/R session and use general-purpose functions and utilities. The "Programmers are expensive" argument probably applies here as well. I'm not trying to be defensive, just saying that this might be one reason.
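For what it's worth, the per-column summary is usually a single query. A rough sketch with sqlite3, assuming a made-up readings table with a nullable value column:

```python
import sqlite3

# Hypothetical table with some NULLs in it, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", None), ("b", 3.0), ("b", 5.0), ("c", None)],
)

# One round trip returns one row of statistics instead of the whole table.
total, nulls, lo, hi, mean = conn.execute(
    """
    SELECT COUNT(*),
           COUNT(*) - COUNT(value),  -- COUNT(col) skips NULLs
           MIN(value),
           MAX(value),
           AVG(value)                -- aggregates ignore NULLs too
    FROM readings
    """
).fetchone()
```

Only the five numbers come back to the client, regardless of how many rows the table has.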


If you believe "Programmers are expensive", then you should do as much as you can with a declarative data manipulation language (usually SQL, though you can also consider sequences of text manipulation tools chained with pipes) and leave the last 5-15% of high-value work to a more powerful but also more verbose imperative language (usually Python, but any will do).

Asking for what you want is considerably faster than saying how you want it done.


> From what I encountered, this is generally the case when someone is in the "analysis/reports" mode

I understand this use case, but this is in actual application code!


Is anyone working on a translator for pandas dataframe syntax to SQL?


In the R world, dbplyr[1] does this amazingly well.

[1] https://dbplyr.tidyverse.org/


tidyverse is just a mind-bogglingly amazing ecosystem of packages

Forget Da Vinci, the first man to be cloned should be Hadley Wickham



Dang, Ibis doesn't support Redshift or SQL Server. I'm also having trouble understanding what it really is: it seems to be an entire framework for big data and not just a translator. What I'd really like is just that, something that turns pandas dataframe operations into ANSI SQL. So input pandas2sql('tablename["col"]') -> "select col from tablename". Something really simple to use.
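Just to pin down the interface being asked for, here is a toy sketch. pandas2sql is a hypothetical function, not an existing library, and this version only understands the single table["column"] form from the example above; anything else raises an error.

```python
import re

# Matches exactly one form: name["column"] or name['column'].
_SELECT_COLUMN = re.compile(r"""^\s*(\w+)\[\s*(["'])(\w+)\2\s*\]\s*$""")


def pandas2sql(expr: str) -> str:
    """Translate a single-column selection into a SQL string (toy example)."""
    match = _SELECT_COLUMN.match(expr)
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    table, _quote, column = match.groups()
    return f"select {column} from {table}"


query = pandas2sql('tablename["col"]')
```

A real translator would have to cover filters, joins, group-bys and dialect differences, which is a much bigger job than this string matching suggests.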


Pandas has read_sql and to_sql, which are compatible with SQLAlchemy if you insist on using an ORM; that gets you most of the way there...


SQLAlchemy is more than just an ORM. It also has a SQL expression language, for writing queries in Python without using any ORM features.




I'm surprised people don't use ORM libs for this instead.


ORMs are just as capable as any developer of generating bad SQL queries that crush databases and plug network connections.


Yes, but in the example used, all the ORM libs I know would fetch only a column or two (often for a single row), while "select *" would fetch everything.



