> often you have to inject data into new tables or columns
No tool can help you with that, simply because this kind of data migration depends on your particular business logic that the tool has no way of knowing about.
While SQL Server Data Tools has its warts, it has been immensely useful for us in making sure every little detail gets handled during migration. That doesn't usually mean that it can do the entire migration itself - we do the manual adjustments to the base tables that SSDT cannot do on its own, and then let it handle the rest, which in our case is mostly about indexes, views, functions and stored procedures.
After all that, SSDT can compare the resulting database with the "desired" database, and reliably flag any differences, preventing schema drift.
Why use a string as the status, instead of a boolean? That just wastes space for no discernible benefit, especially since the status is indexed. Also, consider turning event_type into an integer if possible, for similar reasons.
Furthermore, why have two indexes with the same leading field (status)?
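To illustrate the redundancy (with made-up names, not the article's actual schema): if the table has something like

CREATE INDEX idx_tasks_status ON tasks (status);
CREATE INDEX idx_tasks_status_created ON tasks (status, created_at);

then the first index is usually redundant - any seek on status can be served by the leading column of the second one - so it mostly just adds write overhead.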
Boolean is rarely enough for real production workloads. You need a 'processing' state to handle visibility timeouts and prevent double-execution, especially if tasks take more than a few milliseconds. I also find it crucial to distinguish between 'retrying' for transient errors and 'failed' for dead letters. Saving a few bytes on the index isn't worth losing that observability.
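To make that concrete, here is a rough sketch of the lifecycle I mean, written against a hypothetical Postgres tasks table (every name here is invented for illustration, not taken from the article):

-- Hypothetical schema, for illustration only.
CREATE TABLE tasks (
    id         bigserial PRIMARY KEY,
    payload    jsonb NOT NULL,
    status     text NOT NULL DEFAULT 'pending', -- pending | processing | retrying | failed | done
    attempts   int NOT NULL DEFAULT 0,
    claimed_at timestamptz
);

-- A worker claims one task and marks it 'processing' so concurrent workers skip it.
UPDATE tasks
SET status = 'processing', claimed_at = now(), attempts = attempts + 1
WHERE id = (
    SELECT id FROM tasks
    WHERE status IN ('pending', 'retrying')
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;

-- "Visibility timeout": reclaim tasks whose worker died mid-flight.
UPDATE tasks
SET status = 'retrying'
WHERE status = 'processing'
  AND claimed_at < now() - interval '5 minutes';

A plain boolean can't express the "claimed but not finished" state that the reclaim query depends on.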
> Boolean is rarely enough for real production workloads. You need a 'processing' ... 'retrying'... 'failed' ...
If you have more than 2 states, then just use an integer instead of a boolean.
> Saving a few bytes on the index isn't worth losing that observability.
Not sure why having a few well-known string values is more "observable" than having a few well-known integer values.
Also, it might be worth having better write performance. When PostgreSQL updates a row, it actually creates a new physical row version (for MVCC), so the less it has to copy the better.
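You can actually watch that copy happen; a throwaway sketch (the table name and the exact ctid values are illustrative only):

-- Disposable table, just to observe MVCC row versions.
CREATE TABLE demo (id int PRIMARY KEY, status text NOT NULL, note text NOT NULL);
INSERT INTO demo VALUES (1, 'pending', 'a reasonably wide payload ...');

SELECT ctid FROM demo WHERE id = 1; -- e.g. (0,1): current physical location
UPDATE demo SET status = 'done' WHERE id = 1;
SELECT ctid FROM demo WHERE id = 1; -- e.g. (0,2): a brand-new row version was written

Even though only status changed, the whole row was copied into a new version, which is why keeping rows narrow pays off on update-heavy tables.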
Postgres supports enum types, which would fit this use case well. You get the readability of text and the storage efficiency of an integer. Adding new values used to require a bit of work, but version 9.1 introduced support for doing it directly.
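A minimal sketch of that approach (the type and table names here are invented):

-- Readable labels, integer-sized storage (enum values are 4 bytes on disk).
CREATE TYPE task_status AS ENUM ('pending', 'processing', 'retrying', 'failed', 'done');

CREATE TABLE jobs (
    id     bigserial PRIMARY KEY,
    status task_status NOT NULL DEFAULT 'pending'
);

-- Since 9.1, new values can be appended without recreating the type:
ALTER TYPE task_status ADD VALUE 'cancelled';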
That's true for seeks into the clustered (primary) index because that index includes all fields, so you don't need to "jump" to the heap to get them.
However, seeking into a secondary index, and then reading a column not included in that index incurs an additional index seek (into the clustered index), which may be somewhat slower than what would happen in a heap-based table.
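The usual way around that extra lookup is a covering index; a quick T-SQL sketch with made-up names:

-- Hypothetical table; OrderId becomes the clustered primary key by default.
CREATE TABLE Orders (
    OrderId    int IDENTITY PRIMARY KEY,
    CustomerId int NOT NULL,
    Status     nvarchar(20) NOT NULL,
    Total      money NOT NULL
);

-- INCLUDE stores CustomerId and Total in the secondary index's leaf pages,
-- so the query below is answered without a key lookup into the clustered index.
CREATE INDEX IX_Orders_Status ON Orders (Status) INCLUDE (CustomerId, Total);

SELECT CustomerId, Total FROM Orders WHERE Status = N'shipped';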
In practice I have found the penalty on secondary index reads to be so minimal that it has never made a difference.
Remember that some databases (SQLite, MySQL) always use a clustered index internally, so even if you have no primary key they will create a hidden one to use as the clustering key.
It is nice to have the choice of which way to go, and it would be nice if PG implemented this. On a narrow table with a single primary index it can bring significant space savings as well as performance advantages.
For inserts, you cannot escape writing into the base table and all indexes. However, my understanding is that for updates PostgreSQL has a write amplification problem due to the fact that each time a row is updated this creates a new row (to implement MVCC), and a new physical location in the heap, so all indexes need to be updated to point to the new location, even those not containing the updated columns.
OTOH, with a heap-less (aka. clustered, aka. index organized) table, you would only have to update the indexes containing the columns that are actually being updated. You don't need to touch any other index. Furthermore, only if you are updating a key column would you physically "move" the entry into a different part of the B-tree. If you update an included column (PK columns are automatically "included" in all secondary indexes, even if not explicitly mentioned in the index definition), you can do that in-place, without moving the entry.
Here is how this works in SQL Server - consider the following example:
CREATE TABLE T (
    ID int,
    NAME nvarchar(255) NOT NULL,
    AMOUNT int NOT NULL,
    CONSTRAINT T_PK PRIMARY KEY (ID)
);
GO
CREATE INDEX T_I1 ON T (NAME);
GO
CREATE INDEX T_I2 ON T (AMOUNT);
Now, doing this...
UPDATE T SET AMOUNT = 42 WHERE ID = 100;
...will only write to T_PK and T_I2, but not T_I1. Furthermore T_PK's entry will not need to be moved to a different place in the B-tree. SQL Server uses row versioning similar to PostgreSQL, so it's conceivable that PostgreSQL could behave similarly to SQL Server if it supported clustered (index-organized) tables.
On the other hand, if the compiler can prove at compile-time what type the object must have at run-time, it can eliminate the dynamic dispatch and effectively re-enable inlining.
Which is why runtime polymorphism in Rust is very hard to do. Its focus on zero-cost abstractions means that the natural way to write polymorphic code is compiled (and must be compiled) to static dispatch.
Is there anything especially hard about decompiling (to) Java?
.NET/C# decompilers are widespread and generally work well (there is one built into Visual Studio nowadays, JetBrains have their own, and there were a bunch of stand-alone tools back in the day).
< disclaimer - I wrote CFR, which is one of the original set of 'modern' java decompilers >
Generic erasure is a giant pain in the rear. C# doesn't do this. The bytecode itself doesn't keep any information about generics, though some of it is still present in the metadata. BUT IT COULD BE FULL OF LIES.
There's also a huge amount of syntactic sugar in later java versions - take for example switch expressions.
Personally, I don't get the sentiment. Yeah, decompiling might not reproduce the original source code, which is fair. It's possible to generate code using invokeDynamic and whatnot - it's still valid code if a compiler opts to do so.
When decompiling bytecode there has to be a reason for it, and a good one. There has to be a goal.
If the code is somewhat humanly understandable, that's OK. If it's more readable than raw bytecode, that's already an improvement.
Reading bytecode alone is not hard when it comes to reverse engineering.
Java already exposes methods and fields by design. Preserving local variable names and line numbers is also very common, since exception stack traces are an excellent debugging tool, so the debugging info tends to be kept.
try/finally shares the same issues, albeit less pronounced.
C# doesn't erase all generics; but there's also some type erasure happening: nullable reference types, tuple element names, and the object/dynamic distinction are all not present in .NET bytecode; these are only stored in attributes for public signatures, but are erased for local variable types.
C# also has huge amounts of syntactic sugar: `yield return` and `await` compile into huge state machines; `fixed` statements come with similar problems as "finally" in java (including the possibility of exponential code growth during decompilation).
You're awesome! I had really good experiences with CFR in the mid 2010s.
I used it for game modding and documentation (and caught/reported a few game bugs + vulnerabilities along the way). I'd pull game files from Steam depots with steamkit, decompile with CFR, and run the resulting java through doxygen.
My personal experience with both is that decompilers work great for easy code. I still have both Java and C# projects that I wish I could decompile even to the worst possible, barely compilable code. Instead I get just decompiler errors, or code where all the variables end up with the same letter/name but, of course, different types...
I think I've tried all the available free tools, and some paid ones, in the Java case. In the end I just deduced the logic and reverse engineered the most important path.