Nice idea, but it breaks quite quickly (still). Tried to see if Cosette can prove equivalence of these two queries:
/* define schema s1,
here s1 can contain any number of attributes,
but it has to at least contain integer attributes
x and y */
schema s1(x:int, ya:int, ??);
schema s2(yb:int, ??); -- define schema s2
table a(s1); -- define table a using schema s1
table b(s2); -- define table b using schema s1
query q1 -- define query q1 on tables a and b
`select distinct x.x as ax from a x, b y
where x.ya = y.yb`;
query q2 -- define query q2 likewise
`select x.x as ax from a x, a y, b z
where x.x = y.x and x.ya = z.yb
union select 1 as ax from a x where 1 = 0`;
verify q1 q2; -- does q1 equal to q2?
Thanks for trying out. The parser error report is not great now. The real error actually, it should be "union all" rather than "union".
I changed your query:
/* define schema s1,
here s1 can contain any number of attributes,
but it has to at least contain integer attributes
x and y */
schema s1(x:int, ya:int, ??);
schema s2(yb:int, ??); -- define schema s2
table a(s1); -- define table a using schema s1
table b(s2); -- define table b using schema s1
query q1 -- define query q1 on tables a and b
`select distinct x.x as ax
from a x, b y
where x.ya = y.yb`;
query q2 -- define query q2 likewise
`(select x.x as ax
from a x, a y, b z
where x.x = y.x and x.ya = z.yb)
union all (select 1 as ax from a x where 1 = 0)`;
verify q1 q2; -- does q1 equal to q2?
The Rosette execution indeed returns a counterexample (since they are not equal). There is an error in Coq code generation part, we are fixing that now.
Yes, I understood later on that you're not supporting union yet. Would have been nice, though, there are quite some caveats when trying to transform distinct / union into one another... :)
Btw, another parser error happened when I indented the code, e.g. pasting the indented code from Hackernews here into your tool. The parser then complains about \r or something like that.
Me again - I really like the idea. I wish I had something like that at a previous job, where queries ran against some 10-level nested views, involving tons of unnecessary joins, projections, unions, sorts, distincts, and what not.
I'm pretty sure that Oracle figured it out all correctly, so the excessive SQL wasn't causing performance issues, but from a maintainability perspective, it would have been nice to have a tool that was able to reduce arbitrary SQL to a more minimal, semantically equivalent alternative.
One problem is that bag semantic chase is very complicated to implement. We are actually working on a new chase algorithm on our SQL formalization right now.
I'm sure bags cause some trouble, but I had something much "simpler" in mind. For instance, JOIN elimination is a powerful tool implemented by most modern RDBMS' optimisers.
Take this query:
SELECT c.*
FROM customer c
JOIN address a
ON c.address_id = a.address_id
With a foreign key on customer(address_id), the above JOIN can easily be proven useless, and thus eliminated. Furthermore, take this query:
SELECT c.*
FROM customer c
LEFT JOIN address a
ON c.address_id = a.address_id
In this case, we don't even need the foreign key as long as there is a unique constraint on address(address_id).
CURRENT_TIMESTAMP is truly random. One thing we can do is to over approximate. For example, let CURRENT_TIMESTAMP be a symbolic variable, which essentially can be any value. If we still can prove the equivalence, then the equivalence is sound. But there might be some false negative.
ROWNUM is interesting. You can actually declare ROWNUM as an extra attribute of the table and then assert it is a key (We will add support of key constraints very soon). Then many cases will be covered.
ROWNUM is actually more tricky than an extra attribute of the table, as it is influenced by ORDER BY. I guess I wouldn't prioritise support for ROWNUM, though. There are more interesting, and more standard things. E.g. comparing a correlated subquery to a window function.