I think the point about not knowing what SQL an ORM-generated query will produce is the one that resonates with me, as does the dig at the Django ORM.
I love Django, including aspects of the ORM (it's very easy to write schemas, migrations are pretty easy to use, and even queries are very concise and easy to write), but I have no idea what the SQL will look like except in the simplest cases.
I often do not particularly care, and it is easy enough to see them when I do, but I feel a bit icky about the extra layer of stuff I do not really understand.
> Django gives you two ways of performing raw SQL queries: you can use `Manager.raw()` to perform raw queries and return model instances, or you can avoid the model layer entirely and execute custom SQL directly.
from django.db import connection
from psycopg2.extras import DictCursor  # needed for the cursor_factory below
from myapp.models import SomeModel

queryset = SomeModel.objects.filter(foo='bar')
sql_query, params = queryset.query.as_sql(None, connection)

with connection.connection.cursor(cursor_factory=DictCursor) as cursor:
    cursor.execute(sql_query, params)
    data = cursor.fetchall()
But that's still not backend-specific SQL?
There should be an interface method for this. Why does psycopg call it mogrify?
> Silk is a live profiling and inspection tool for the Django framework. Silk intercepts and stores HTTP requests and database queries before presenting them in a user interface for further inspection
`str(queryset.query)` does not give you executable SQL. Query parameters are not escaped or encoded in that representation of the query. That escaping/interpolation of query parameters is performed by mogrify. I agree the name is odd, and I don't know why they use it other than "transmogrify" being an obscure word for "magically transform".
debug_toolbar and silk are both tools that show you what queries were executed by your running application. They're both good tools, but neither quite solves the problem of giving you the executable SQL for a specific queryset (e.g., in a repl).
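For the REPL case, one option is something like this (a rough sketch, assuming the PostgreSQL backend with psycopg2, since mogrify is a psycopg cursor method; the model name is made up):

```
from django.db import connection
from myapp.models import SomeModel  # hypothetical model

qs = SomeModel.objects.filter(foo="bar")
# Compile the queryset to backend SQL plus its parameter list.
sql, params = qs.query.get_compiler(using="default").as_sql()

with connection.cursor() as cursor:
    # mogrify escapes and interpolates the parameters into the statement,
    # giving copy-pasteable, backend-specific SQL.
    print(cursor.mogrify(sql, params).decode())
```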
I really wish Django would let you write raw queries, but wrap them in an object that still allows `.annotate()` and `.filter()`, etc. I know you can do Model.objects.raw(), but I'm talking about a more complex query with nested joins, etc.
I understand why it doesn't allow that, and probably never can, but it would be nice when you need to author OLAP queries but want to continue to use as much of the ORM as you can.
Django's ORM has a Model class that wraps an underlying SQL table so it can run queries in an object-oriented way. When you use utility methods like .filter() and .annotate(), you're executing them against the Model (well, a ModelManager to be more specific), so they inherently understand the model (and thus the underlying table) that you're querying against. This lets Django translate the arguments you pass into those methods into SQL, while also handling SQL parameterization, etc.
If the ORM were to allow you to build a raw query, using multiple tables, views, nested queries, etc., it would be difficult for it to then allow those utility methods. The ORM wouldn't have the Model class and its associated fields to use while translating the method's args into SQL.
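The closest thing I've found is embedding raw fragments inside an otherwise normal queryset with RawSQL, which keeps `.filter()`/`.annotate()` available, though it doesn't get you arbitrary nested joins. A sketch (model and table names are made up):

```
from django.db.models.expressions import RawSQL
from myapp.models import Author  # hypothetical model

qs = (
    Author.objects
    .annotate(book_count=RawSQL(
        # Hand-written correlated subquery; the rest of the query stays ORM-built.
        "(SELECT COUNT(*) FROM myapp_book b WHERE b.author_id = myapp_author.id)",
        [],
    ))
    .filter(name__startswith="A")
    .order_by("-book_count")
)
```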
> In truth the best way to do the data layer is to use stored procs with a generated binding at the application layer. This is absolutely safe from injection, and is wicked fast as well.
Who must keep stored procedures in sync with migrations?
I've used the stored procedures approach sometimes. The usual criticisms are:
1. Almost nobody in the team knows SQL, stored procedures more so.
2. It's not possible to deploy and test stored procedures with the usual tooling. Actually, I don't know what the usual way to do it is in projects that go with that approach.
I don't really understand the confusion here: you write a stored procedure, you document it, test it, peer review it, then stuff it into the database. You then write code that uses it. No sarcasm, but what's the problem?
Yes, what is the normal way to unit test stored procedures? Dev teams used to Node, Python, Ruby or whatever language have tools that work with ORMs to set up a test database, run tests and zero it. I did write some stored procedures in a Rails migration many years ago, but that's not the place where one would normally write code. Furthermore, there will be many migrations like that, almost one for every change to the stored procedure. The problem is in the frameworks, which are not designed for that. So, what's the recommended way to handle them?
Right, I'm starting to get the picture. I've never worked with an ORM as my SQL is pretty solid. I have to put my hands up here and say I don't know, sorry.
I think I can tell you the answer: you have a very slow system test that spins up a whole real database, sets up all the tables and stored procedures, and fires a load of requests through the system and inspects the responses and the final state of the system.
That works; it's just maybe thousands of times slower than unit tests, and much more work to create and maintain.
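A narrower version of the same idea is an integration test against a throwaway database that installs the procedure and exercises it directly. A sketch, assuming PostgreSQL, psycopg2 and pytest; the schema and function are made up:

```
import psycopg2
import pytest

@pytest.fixture
def db():
    # Throwaway test database; connection details are placeholders.
    conn = psycopg2.connect("dbname=test_db user=test")
    with conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE accounts (id serial PRIMARY KEY, balance int NOT NULL)")
        cur.execute("""
            CREATE FUNCTION add_funds(account_id int, amount int) RETURNS void AS $$
                UPDATE accounts SET balance = balance + amount WHERE id = account_id;
            $$ LANGUAGE sql;
        """)
    yield conn
    conn.close()

def test_add_funds(db):
    with db, db.cursor() as cur:
        cur.execute("INSERT INTO accounts (balance) VALUES (0) RETURNING id")
        account_id = cur.fetchone()[0]
        cur.execute("SELECT add_funds(%s, %s)", (account_id, 50))
        cur.execute("SELECT balance FROM accounts WHERE id = %s", (account_id,))
        assert cur.fetchone()[0] == 50
```

It's still much slower than an in-process unit test, but it at least scopes the setup to the procedure under test rather than the whole application.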
I have a 90% allergy to their doc style. The only thing worse is old javadocs. I like the Python stdlib docs model, so I assumed it would be as nice... but here it's a blend of newbie info (oh look, Forms!) and partial technical description. To the point that every time you ask something online, people tell you straight away to try ccbv or "read the django source", which should be the first things listed on their website.
Isn't using f-strings to build SQL queries a large footgun? I can imagine a junior developer introducing a SQL injection by not completely grasping the sqlbind API.
Using the quickstart example, visually both lines are very similar and both are functional, but one of them may introduce a SQL injection:
sql = f'SELECT * FROM table WHERE field1 = {q/value1} AND field2 > {q/value2}'
sql = f'SELECT * FROM table WHERE field1 = {value1} AND field2 > {value2}'
QParams = sqlbind.Dialect.some_dialect

@QParams
def make_my_query(value1: str, value2: int):
    return f'SELECT * FROM table WHERE field1 = {value1} AND field2 > {value2}'

data = connection.execute(*make_my_query(foo, bar))
Obviously this wouldn't work as-is, QParams would need to be modified to support this. The decorator would wrap the method, sanitising the args before they're even passed to the wrapped function.
Edit: actually, I might've misunderstood what sqlbind is doing internally, maybe this approach doesn't quite make sense
Where's the injection? (you might want to take a look at the example in the issue I opened, it's more complete - I'm not bypassing the DBAPI, I'm working with it)
It's a little difficult to parse through the README, but I believe placeholders and placeholder structure are being rendered into the SQL, not placeholder values.
ps = f'SELECT * FROM table WHERE field1 = {q/value1} AND field2 > {q/value2}'
# SELECT * FROM table WHERE field1 = ? AND field2 > ?
However, I would imagine that if any external input is passed through to this framework, then there might still be the possibility of SQL injection attacks passing through this framework and ending up in the prepared statement SQL.
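A quick way to check that in a REPL (a sketch based on my reading of the quickstart; the exact placeholder the dialect renders may differ):

```
import sqlbind

q = sqlbind.Dialect.default()
email = "'; DROP TABLE users; --"  # hostile external input
sql = f"SELECT * FROM users WHERE email = {q/email}"

print(sql)  # the SQL string only contains a placeholder, e.g. ... WHERE email = ?
print(q)    # the hostile value sits in q, which is passed separately to execute()
```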
It's not obvious how to write SQL with nested queries that have their own WHERE in Norm. One of the design goals of sqlbind was the ability to write the complete high-level structure of a big query, and to eliminate `if` statements as much as possible.
Norm:
def get_users(cursor, user_ids, only_girls=False, minimum_age=0):
    s = (SELECT('first_name', 'age')
         .FROM('people')
         .WHERE('user_ids IN :user_ids')  # doesn't work in sqlite
         .bind(user_ids=user_ids))
    if only_girls:
        s = s.WHERE(gender='f')
    if minimum_age:
        s = (s.WHERE('age >= :minimum_age')
             .bind(minimum_age=minimum_age))
    return cursor.run_query(s)
VS sqlbind:
def get_users(cursor, user_ids, only_girls=False, minimum_age=0):
    q = QParams()
    filters = [
        q.user_ids.IN(user_ids),  # renders into FALSE if user_ids are empty and supports SQLite.
        q.cond(only_girls, "gender = 'f'"),
        q.age >= truthy/minimum_age,
    ]
    sql = f'''\
        SELECT first_name, age
        FROM people
        {WHERE(*filters)}
    '''
    return cursor.execute(sql, q)
I find it really difficult to write parameterized SQL. The problems are manifold:
I want to write a SQL query I can deploy against test and prod, so I need to be able to parameterize table names to some extent. Then there are values, as shown here, with all the footguns that entails. But in the end I also want to have IDE niceties while developing: autocomplete on column and table names, and being able to see types inline, and those kinds of things.
And I have never seen anything that can give you all those things.
But if I remember correctly, when I'm running BigQuery against parquet files in a bucket, they are kind of all in the big database of global bucket names on GCP.
I came across this yesterday for golang: https://sqlc.dev which is somewhat like what you want, maybe.
Not sure it allows you to parameterize table names, but the basic idea is codegen from SQL queries, so you are working with Go code (autocompletion etc).
If you are allowed to create new schema objects then you can create views over the tables. Give the views a consistent name, but create the view to reference the actual table you want. Then update the view.
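Something like this, as a sketch (the DB-API connection and names are placeholders, and CREATE OR REPLACE VIEW syntax is assumed; identifiers can't be bound as parameters, so they have to come from a trusted allow-list):

```
def point_view_at(conn, view_name, table_name):
    # view_name is assumed to be a constant chosen by the application, not user input.
    allowed_tables = {"events_prod", "events_test"}  # trusted identifiers only
    if table_name not in allowed_tables:
        raise ValueError(f"unexpected table: {table_name}")
    with conn.cursor() as cur:
        cur.execute(f"CREATE OR REPLACE VIEW {view_name} AS SELECT * FROM {table_name}")

# Application queries then always reference the stable view name:
#   SELECT user_id, created_at FROM events WHERE created_at > ?
```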
Interesting approach. But for me the main disadvantage of SQL code embedded in another language is the lack of syntax highlighting/checking.
I've had the best experience with embrace-sql which translates sql-files with magic comment-delimited parametrized queries to python-callable modules: https://pypi.org/project/embrace/
I wish there was a way to get around the syntax highlighting, checking, and completion issue. Pandas has a nice `query` method, which is hard to use for the same reason.
The proposed API is really strange and not intuitive; there is a lot of black magic going on. `q/email`? It just returns a question mark, and the value storage happens in q. Usually when designing a library, explicitness and the Principle of Least Surprise are very good principles. Don't write smart code, write straightforward code!
Unfortunately, straightforward code leads to a spaghetti mess in my domain: big reporting queries. The API solves the issue of binding a value in the exact place it appears in a raw SQL query.
I had my own take on this concept[1], though with considerably less language magic involved. I imagine there are a lot of these kinds of things running around. My criteria were:
a) let me write actual SQL, not a python DSL that generates SQL
b) be placeholder-safe
c) be composable
Though it was somewhat intentionally limited to what I needed to support for my own needs at the time.
I went through the first example in the README more than a few times, and found it quite difficult to make sense of...
Are the query params stored in `q`, which is also updated when it's inserted into the query itself? And why does a `sqlbind.Dialect.default()` turn into a `str`?
And... what's the difference between that and normal parameterized queries? It just seems weirder and with a lot of side effect magic going on.
It's not a lot of magic, but it's pretty opaque magic. The q object overloads the division operator and returns a ? and stores the "passed" value inside itself. The q object extends either a list or a dictionary to act as the set of values to pass to the parametrized query.
So in effect, the slash behavior is just a spicy way to append or set (depending on which dialect option you choose).
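Roughly this mechanism, as a toy sketch (not sqlbind's actual code):

```
class QmarkParams(list):
    """A list of bound values whose __truediv__ records a value and returns a placeholder."""
    def __truediv__(self, value):
        self.append(value)  # store the bound value
        return "?"          # what ends up in the SQL string

q = QmarkParams()
sql = f"SELECT * FROM t WHERE a = {q/1} AND b = {q/'x'}"
# sql == "SELECT * FROM t WHERE a = ? AND b = ?"
# q   == [1, 'x']  -- ready to pass as the params argument to execute()
```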
No, I can see that. It's the same as the logical OR operator | computing unions of sets. I just fail to see how this solution is better than a standard parameterized query, it feels much more opaque without any obvious upsides.
Yes, but getting the exact query with interpolated parameters that can be copy-pasted into e.g. pgadmin for a nice explain analyze visualization is a bit more involved. As I could never remember it, I always copy-pasted that function into every project. But other than that minor complaint, I have always been thoroughly impressed by both the SQLAlchemy ORM and the Core query builder, and how they can be combined to have a best-of-both-worlds situation. I have written some very complex queries that remained maintainable thanks to composition with CTEs, subqueries and computed properties. Doing that in pure SQL would also have been possible, but using SQLAlchemy allowed me to name and reuse smaller components, avoiding wall-of-text SQL statements.
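For reference, a sketch of that kind of function (assuming the PostgreSQL dialect and a 1.x-style ORM Query; literal_binds only handles simple literal parameter types):

```
from sqlalchemy.dialects import postgresql

def to_literal_sql(query):
    # Compile the query with its parameters interpolated, ready to paste into pgadmin.
    return str(
        query.statement.compile(
            dialect=postgresql.dialect(),
            compile_kwargs={"literal_binds": True},
        )
    )
```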
You don't have to use the modeling part of an ORM. Just the built-in security, session, and connection pool handling is already valuable. And you can already do:
```
params = [123]
session.query("SELECT * FROM my_table WHERE userID = ?", params)
```
If you have a lot of values to add to a query, they're physically distant from the place you're using them. With the unnamed version like you used, you're relying on your ability to count how many values are used so you get the list correct. With named parameters, you're adding another layer of indirection. Use a value more than once? If you change it or want to remove it, you'd better be sure you do it for each usage of the name in your query!