I think the point about not knowing what SQL an ORM-generated query will produce is the one that resonates with me, as does the dig at the Django ORM.
I love Django, including aspects of the ORM (it's very easy to write schemas, migrations are pretty easy to use, and even queries are very concise and easy to write), but I have no idea what the SQL will look like except in the simplest cases.
I often do not particularly care, and it is easy enough to see them when I do, but I feel a bit icky about the extra layer of stuff I do not really understand.
> Django gives you two ways of performing raw SQL queries: you can use `Manager.raw()` to perform raw queries and return model instances, or you can avoid the model layer entirely and execute custom SQL directly.
from django.db import connection
from psycopg2.extras import DictCursor  # needed for the cursor_factory below
from myapp.models import SomeModel

queryset = SomeModel.objects.filter(foo='bar')
sql_query, params = queryset.query.as_sql(None, connection)

with connection.connection.cursor(cursor_factory=DictCursor) as cursor:
    cursor.execute(sql_query, params)
    data = cursor.fetchall()
But that's still not backend-specific SQL?
There should be an interface method for this. Why does psycopg call it mogrify?
> Silk is a live profiling and inspection tool for the Django framework. Silk intercepts and stores HTTP requests and database queries before presenting them in a user interface for further inspection
`str(queryset.query)` does not give you executable SQL. Query parameters are not escaped or encoded in that representation of the query. That escaping/interpolation of query parameters is performed by mogrify. I agree the name is odd, and I don't know why they use it other than "transmogrify" being an obscure word for "magically transform".
debug_toolbar and silk are both tools that show you what queries were executed by your running application. They're both good tools, but neither quite solves the problem of giving you the executable SQL for a specific queryset (e.g., in a repl).
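For the REPL case, one option is something like this (a rough sketch, assuming the PostgreSQL backend with psycopg2, since mogrify is a psycopg cursor method; the model name is made up):

```
from django.db import connection
from myapp.models import SomeModel  # hypothetical model

qs = SomeModel.objects.filter(foo="bar")
# Compile the queryset to backend SQL plus its parameter list.
sql, params = qs.query.get_compiler(using="default").as_sql()

with connection.cursor() as cursor:
    # mogrify escapes and interpolates the parameters into the statement,
    # giving copy-pasteable, backend-specific SQL.
    print(cursor.mogrify(sql, params).decode())
```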
I really wish Django would let you write raw queries, but wrap them in an object that still allows `.annotate()` and `.filter()`, etc. I know you can do Model.objects.raw(), but I'm talking about a more complex query with nested joins, etc.
I understand why it doesn't allow that, and probably never can, but it would be nice when you need to author OLAP queries but want to continue to use as much of the ORM as you can.
Django's ORM has a Model class that wraps an underlying SQL table so it can run queries in an object-oriented way. When you use utility methods like .filter() and .annotate(), you're executing them against the Model (well, a ModelManager to be more specific), so they inherently understand the model (and thus the underlying table) that you're querying against. This lets Django translate the arguments you pass into those methods into SQL, while also handling SQL parameterization, etc.
If the ORM were to allow you to build a raw query, using multiple tables, views, nested queries, etc., it would be difficult for it to then allow those utility methods. The ORM wouldn't have the Model class and its associated fields to use while translating the method's args into SQL.
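The closest thing I've found is embedding raw fragments inside an otherwise normal queryset with RawSQL, which keeps `.filter()`/`.annotate()` available, though it doesn't get you arbitrary nested joins. A sketch (model and table names are made up):

```
from django.db.models.expressions import RawSQL
from myapp.models import Author  # hypothetical model

qs = (
    Author.objects
    .annotate(book_count=RawSQL(
        # Hand-written correlated subquery; the rest of the query stays ORM-built.
        "(SELECT COUNT(*) FROM myapp_book b WHERE b.author_id = myapp_author.id)",
        [],
    ))
    .filter(name__startswith="A")
    .order_by("-book_count")
)
```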
> In truth the best way to do the data layer is to use stored procs with a generated binding at the application layer. This is absolutely safe from injection, and is wicked fast as well.
Who must keep stored procedures in sync with migrations?
I've used the stored procedures approach sometimes. The usual criticisms are:
1. Almost nobody in the team knows SQL, stored procedures more so.
2. It's not possible to deploy and test stored procedures with the usual tooling. Actually, I don't know what the usual way to do it is in projects that go with that approach.
I don't really understand the confusion here: you write a stored procedure, you document it, test it, peer review it, then stuff it into the database. You then write code that uses it. No sarcasm, but what's the problem?
Yes, what is the normal way to unit test stored procedures? Dev teams used to Node, Python, Ruby or whatever language have tools that work with ORMs to set up a test database, run tests and zero it. I did write some stored procedures in a Rails migration many years ago, but that's not the place where one would normally write code. Furthermore, there will be many migrations like that, almost one for every change to the stored procedure. The problem is in the frameworks, which are not designed for that. So, what's the recommended way to handle them?
Right, I'm starting to get the picture. I've never worked with an ORM as my SQL is pretty solid. I have to put my hands up here and say I don't know, sorry.
I think I can tell you the answer: you have a very slow system test that spins up a whole real database, sets up all the tables and stored procedures, and fires a load of requests through the system and inspects the responses and the final state of the system.
That works; it's just maybe thousands of times slower than unit tests, and much more work to create and maintain.
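A narrower version of the same idea is an integration test against a throwaway database that installs the procedure and exercises it directly. A sketch, assuming PostgreSQL, psycopg2 and pytest; the schema and function are made up:

```
import psycopg2
import pytest

@pytest.fixture
def db():
    # Throwaway test database; connection details are placeholders.
    conn = psycopg2.connect("dbname=test_db user=test")
    with conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE accounts (id serial PRIMARY KEY, balance int NOT NULL)")
        cur.execute("""
            CREATE FUNCTION add_funds(account_id int, amount int) RETURNS void AS $$
                UPDATE accounts SET balance = balance + amount WHERE id = account_id;
            $$ LANGUAGE sql;
        """)
    yield conn
    conn.close()

def test_add_funds(db):
    with db, db.cursor() as cur:
        cur.execute("INSERT INTO accounts (balance) VALUES (0) RETURNING id")
        account_id = cur.fetchone()[0]
        cur.execute("SELECT add_funds(%s, %s)", (account_id, 50))
        cur.execute("SELECT balance FROM accounts WHERE id = %s", (account_id,))
        assert cur.fetchone()[0] == 50
```

It's still much slower than an in-process unit test, but it at least scopes the setup to the procedure under test rather than the whole application.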
I have a 90% allergy to their doc style. The only thing worse is old javadocs. I like the Python stdlib docs model, so I assumed it would be as nice... but here it's a blend of newbie info (oh look, Forms!) and partial technical description. To the point that every time you ask something online, people tell you straight away to try ccbv or "read the django source", which should be the first things listed on their website.
Isn't using f-strings to build SQL queries a large footgun? I can imagine a junior developer introducing a SQL injection by not completely grasping the sqlbind API.
Using the quickstart example, visually both lines are very similar and both are functional, but one of them may introduce a SQL injection:
sql = f'SELECT * FROM table WHERE field1 = {q/value1} AND field2 > {q/value2}'
sql = f'SELECT * FROM table WHERE field1 = {value1} AND field2 > {value2}'
QParams = sqlbind.Dialect.some_dialect

@QParams
def make_my_query(value1: str, value2: int):
    return f'SELECT * FROM table WHERE field1 = {value1} AND field2 > {value2}'

data = connection.execute(*make_my_query(foo, bar))
Obviously this wouldn't work as-is, QParams would need to be modified to support this. The decorator would wrap the method, sanitising the args before they're even passed to the wrapped function.
Edit: actually, I might've misunderstood what sqlbind is doing internally, maybe this approach doesn't quite make sense
Where's the injection? (you might want to take a look at the example in the issue I opened, it's more complete - I'm not bypassing the DBAPI, I'm working with it)
It's a little difficult to parse through the README, but I believe placeholders and placeholder structure are being rendered into the SQL, not placeholder values.
ps = f'SELECT * FROM table WHERE field1 = {q/value1} AND field2 > {q/value2}'
# SELECT * FROM table WHERE field1 = ? AND field2 > ?
However, I would imagine that if any external input is passed through to this framework, then there might still be the possibility of SQL injection attacks passing through this framework and ending up in the prepared statement SQL.
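A quick way to check that in a REPL (a sketch based on my reading of the quickstart; the exact placeholder the dialect renders may differ):

```
import sqlbind

q = sqlbind.Dialect.default()
email = "'; DROP TABLE users; --"  # hostile external input
sql = f"SELECT * FROM users WHERE email = {q/email}"

print(sql)  # the SQL string only contains a placeholder, e.g. ... WHERE email = ?
print(q)    # the hostile value sits in q, which is passed separately to execute()
```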
It's not obvious how to write SQL with nested queries that have their own WHERE in Norm. One of the design goals of sqlbind was the ability to write the complete high-level structure of a big query, and to eliminate `if` statements as much as possible.
Norm:
def get_users(cursor, user_ids, only_girls=False, minimum_age=0):
    s = (SELECT('first_name', 'age')
         .FROM('people')
         .WHERE('user_ids IN :user_ids')  # doesn't work in sqlite
         .bind(user_ids=user_ids))
    if only_girls:
        s = s.WHERE(gender='f')
    if minimum_age:
        s = (s.WHERE('age >= :minimum_age')
             .bind(minimum_age=minimum_age))
    return cursor.run_query(s)
VS sqlbind:
def get_users(cursor, user_ids, only_girls=False, minimum_age=0):
    q = QParams()
    filters = [
        q.user_ids.IN(user_ids),  # renders into FALSE if user_ids are empty and supports SQLite.
        q.cond(only_girls, "gender = 'f'"),
        q.age >= truthy/minimum_age,
    ]
    sql = f'''\
        SELECT first_name, age
        FROM people
        {WHERE(*filters)}
    '''
    return cursor.execute(sql, q)
I find it really difficult to write parameterized SQL. The problems are manifold:
I want to write a SQL query I can deploy against test and prod, so I need to be able to parameterize table names to some extent. Then there are values, as shown here, with all the footguns that entails. But in the end I also want to have IDE niceties while developing: autocomplete on column and table names, and being able to see types inline, and those kinds of things.
And I have never seen anything that can give you all those things.
But if I remember correctly, when I'm running BigQuery against parquet files in a bucket, they are kind of all in the big database of global bucket names on GCP.
I came across this yesterday for golang: https://sqlc.dev which is somewhat like what you want, maybe.
Not sure it allows you to parameterize table names, but the basic idea is codegen from SQL queries, so you are working with Go code (autocompletion etc).
If you are allowed to create new schema objects then you can create views over the tables. Give the views a consistent name, but create the view to reference the actual table you want. Then update the view.
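Something like this, as a sketch (the DB-API connection and names are placeholders, and CREATE OR REPLACE VIEW syntax is assumed; identifiers can't be bound as parameters, so they have to come from a trusted allow-list):

```
def point_view_at(conn, view_name, table_name):
    # view_name is assumed to be a constant chosen by the application, not user input.
    allowed_tables = {"events_prod", "events_test"}  # trusted identifiers only
    if table_name not in allowed_tables:
        raise ValueError(f"unexpected table: {table_name}")
    with conn.cursor() as cur:
        cur.execute(f"CREATE OR REPLACE VIEW {view_name} AS SELECT * FROM {table_name}")

# Application queries then always reference the stable view name:
#   SELECT user_id, created_at FROM events WHERE created_at > ?
```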
Interesting approach. But for me the main disadvantage of SQL code embedded in another language is the lack of syntax highlighting/checking.
I've had the best experience with embrace-sql which translates sql-files with magic comment-delimited parametrized queries to python-callable modules: https://pypi.org/project/embrace/
I wish there was a way to get around the syntax highlighting, checking, and completion issue. Pandas has a nice `query` method, which is hard to use for the same reason.
The proposed API is really strange and not intuitive; there is a lot of black magic going on. `q/email`? It just returns a question mark, and the value storage happens in q. Usually when designing a library, explicitness and the Principle of Least Surprise are very good principles. Don't write smart code, write straightforward code!
Unfortunately, straightforward code leads to a spaghetti mess in my domain: big reporting queries. The API solves the issue of binding a value in the exact place it appears in a raw SQL query.
I had my own take on this concept[1], though with considerably less language magic involved. I imagine there are a lot of these kinds of things running around. My criteria were:
a) let me write actual SQL, not a python DSL that generates SQL
b) be placeholder-safe
c) be composable
Though it was somewhat intentionally limited to what I needed to support for my own needs at the time.
I went through the first example in the README more than a few times, and found it quite difficult to make sense of...
Are the query params stored in `q`, which is also updated when it's inserted into the query itself? And why does a `sqlbind.Dialect.default()` turn into a `str`?
And... what's the difference between that and normal parameterized queries? It just seems weirder and with a lot of side effect magic going on.
It's not a lot of magic, but it's pretty opaque magic. The q object overloads the division operator and returns a ? and stores the "passed" value inside itself. The q object extends either a list or a dictionary to act as the set of values to pass to the parametrized query.
So in effect, the slash behavior is just a spicy way to append or set (depending on which dialect option you choose).
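Roughly this mechanism, as a toy sketch (not sqlbind's actual code):

```
class QmarkParams(list):
    """A list of bound values whose __truediv__ records a value and returns a placeholder."""
    def __truediv__(self, value):
        self.append(value)  # store the bound value
        return "?"          # what ends up in the SQL string

q = QmarkParams()
sql = f"SELECT * FROM t WHERE a = {q/1} AND b = {q/'x'}"
# sql == "SELECT * FROM t WHERE a = ? AND b = ?"
# q   == [1, 'x']  -- ready to pass as the params argument to execute()
```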
No, I can see that. It's the same as the logical OR operator | computing unions of sets. I just fail to see how this solution is better than a standard parameterized query, it feels much more opaque without any obvious upsides.
Yes, but getting the exact query with interpolated parameters that can be copy-pasted into e.g. pgadmin for a nice explain analyze visualization is a bit more involved. As I could never remember it, I always copy-pasted that function into every project. But other than that minor complaint, I have always been thoroughly impressed by both the SQLAlchemy ORM and the Core query builder, and how they can be combined to have a best-of-both-worlds situation. I have written some very complex queries that remained maintainable thanks to composition with CTEs, subqueries and computed properties. Doing that in pure SQL would also have been possible, but using SQLAlchemy allowed me to name and reuse smaller components, avoiding wall-of-text SQL statements.
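For reference, a sketch of that kind of function (assuming the PostgreSQL dialect and a 1.x-style ORM Query; literal_binds only handles simple literal parameter types):

```
from sqlalchemy.dialects import postgresql

def to_literal_sql(query):
    # Compile the query with its parameters interpolated, ready to paste into pgadmin.
    return str(
        query.statement.compile(
            dialect=postgresql.dialect(),
            compile_kwargs={"literal_binds": True},
        )
    )
```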
You don't have to use the modeling part of an ORM. Just the built-in security, session, and connection pool handling is already valuable. And you can already do:
```
params = [123]
session.query("SELECT * FROM my_table WHERE userID = ?", params)
```
If you have a lot of values to add to a query, they're physically distant from the place you're using them. With the unnamed version like you used, you're relying on your ability to count how many values are used so you get the list correct. With named parameters, you're adding another layer of indirection. Use a value more than once? If you change it or want to remove it, you'd better be sure you do it for each usage of the name in your query!