morgo · on Feb 16, 2022

I work remotely and I've decided that ON is the right choice for me. I made this choice when I started joining a lot of calls where they had to be in English because of me. I figured camera on was a good way to show I was paying attention.

However, one of the perks of being male is that society only ever expects me to take a shower to be presentable. So I'm totally cool if a colleague wants to leave the camera off, even if it's a 1-on-1.

morgo · on Jan 5, 2022

You are correct, it is a disk-format issue, and MySQL officially supports in-place upgrade between versions.

It's actually quite hard to fix bugs in charsets/collations, because any change to the sort order could implicitly affect the on-disk format (indexes are stored in sorted order).
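
A toy sketch of why a sort-order change touches stored indexes, in Python. The two collation functions below are made-up stand-ins (case-insensitive vs. code point order), not MySQL's real collations, and the "index" is just a sorted list searched with binary search.

```python
import bisect

def collate_old(s: str) -> str:
    # Hypothetical "old" collation: case-insensitive ordering.
    return s.lower()

def collate_new(s: str) -> str:
    # Hypothetical "fixed" collation: plain code point ordering.
    return s

# The index was written to disk sorted under the old collation.
index_on_disk = sorted(["cherry", "Banana", "apple", "Date"], key=collate_old)

def lookup(index, value, collate) -> bool:
    # Binary search is only valid if the index is sorted under `collate`.
    keys = [collate(entry) for entry in index]
    i = bisect.bisect_left(keys, collate(value))
    return i < len(keys) and keys[i] == collate(value)

found_with_old = lookup(index_on_disk, "Banana", collate_old)
found_with_new = lookup(index_on_disk, "Banana", collate_new)
print(found_with_old, found_with_new)
```

Under the old ordering the lookup finds the row; under the new ordering the same stored index is no longer sorted, so the binary search misses a value that is actually present. That is roughly why a collation fix would imply rebuilding indexes rather than just upgrading in place.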

morgo · on Aug 3, 2020

I really like the workflow GitHub uses for this: https://github.blog/2020-02-14-automating-mysql-schema-migra...

(It makes use of skeema, which allows you to track your schema in a declarative way vs. ALTER TABLE statements.)

awinter-py · on Aug 3, 2020

skeema is similar -- if I understand it correctly, it keeps 'truth' as a SQL file and diffs it against a live mysql DB

automig diffs the SQL files directly without looking at a live DB

pros & cons to both approaches

evanelias · on Aug 4, 2020

Skeema author here -- you're correct that a live DB is involved, but the notion of "truth" in Skeema actually depends on what operation you're running :)

In Skeema, you have directories of *.sql files, where each directory defines the desired state of a single "logical schema". A config file in each dir allows you to map that logical schema to one or more live databases, possibly in different environments (prod, stage, etc) and/or possibly different shards in the same environment.

Most commonly, users will want to "push" the filesystem state to the databases, e.g. generate and run DDL that brings the database up to the desired state expressed by the filesystem. But users can also "pull" from a live database, which does the reverse: modifies the filesystem to look like the current state of a live database, essentially doing a schema dump. So the source of truth depends on the direction of the operation.

Skeema also uses the live database as a SQL parser. Instead of needing to parse every type of CREATE statement accurately across many different versions/dialects of MySQL, Percona Server, MariaDB, (and maybe someday Aurora etc), Skeema runs the statements in a temporary location and then introspects the result from information_schema. This avoids an entire possible class of bugs around parser inaccuracies.
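
A rough sketch of the "let the database parse it, then introspect" idea, in Python. This is not Skeema's code: it uses an in-memory SQLite database as a stand-in for the temporary location, `PRAGMA table_info` as a stand-in for information_schema, and the helper names `introspect` and `diff_columns` are hypothetical. It only handles added columns.

```python
import sqlite3

def introspect(create_sql: str, table: str) -> dict:
    """Run a CREATE TABLE in a scratch DB and read the columns back."""
    scratch = sqlite3.connect(":memory:")
    try:
        scratch.execute(create_sql)
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        rows = scratch.execute(f'PRAGMA table_info("{table}")').fetchall()
        return {name: col_type for _, name, col_type, *_ in rows}
    finally:
        scratch.close()

def diff_columns(live_sql: str, desired_sql: str, table: str) -> list:
    """Generate DDL for columns present in the desired state but not live."""
    live = introspect(live_sql, table)
    desired = introspect(desired_sql, table)
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {col_type}"
        for name, col_type in desired.items()
        if name not in live
    ]

live_sql = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"
desired_sql = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
ddl = diff_columns(live_sql, desired_sql, "users")
print(ddl)
```

The point of the design is that no CREATE statement is ever parsed by the tool itself: the database does the parsing, and the tool only compares the introspected results.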

That all said, I do wholeheartedly agree that generating DDL from git alone is both useful and really cool! I actually built a "database CI" product around this concept as a GitHub app last June: https://www.skeema.io/ci/

awinter-py · on Aug 4, 2020

whoa neat

and agree re parsing -- turned out to be a huge pain. I think someone bundled the postgres parser for python; I wish this were true for every dialect but even so, there are versions to consider.

morgo · on Nov 27, 2019

For MySQL there are better ways to do this: https://mysql.wisborg.dk/2018/08/10/innodb-progress-informat...

The article is MariaDB specific.

dethi · on Nov 27, 2019

Thanks, this is exactly what I was looking for today!

wolf550e · on Nov 27, 2019

How is this MariaDB specific? It's MyISAM specific, I think.

morgo · on Aug 23, 2019

An interesting example to bring up mysql_embedded :-)

I was the product manager, and we did have complaints about the size. So it was removed:

https://mysqlserverteam.com/mysql-8-0-retiring-support-for-l...

I don't believe there have been any regrets.

morgo · on July 4, 2019

Hi! Former product manager for MySQL here. The defaults have changed a lot across major releases:

https://mysqlserverteam.com/new-defaults-in-mysql-8-0/

https://dev.mysql.com/doc/refman/5.7/en/added-deprecated-rem...

https://dev.mysql.com/doc/refman/5.6/en/server-default-chang...

One detail that is not always obvious is how much work goes into limiting regressions. The work to switch to utf8mb4 really started in MySQL 5.6 by not allocating the sort buffer in full (and then further improved in 5.7). 8.0 then added a new temptable storage engine for variable length temp tables.

These are not small cases either: compared to latin1, the _profile_ of queries could change from all in-memory to on-disk, so we could be talking about 10x regressions. In MySQL 8.0 it is more like 11%: https://www.percona.com/blog/2019/02/27/charset-and-collatio...

Edit: Also forgot to mention, switching the default character set broke over 600 tests. It's not as easy as it sounds!

tracker1 · on July 5, 2019

While I appreciate that it's the default now (utf8mb4)... If someone specified (by error) "utf8" as the collation, is that real utf8 or some other implementation currently?

morgo · on July 14, 2019

If someone uses `utf8` in MySQL 8.0, they will get a warning suggesting they should use `utf8mb4`, because `utf8` will be deprecated.

Redefining `utf8` to mean 4-byte would break the upgrade since existing tables would not be able to join against newly created tables.

This is discussed here: https://mysqlserverteam.com/sushi-beer-an-introduction-of-ut...
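
For anyone unsure what the 3-byte vs. 4-byte distinction means in practice, here is a small Python sketch. It only looks at UTF-8 encoded lengths; the helper name `fits_utf8mb3` is made up for illustration and is not a MySQL function.

```python
def utf8_len(ch: str) -> int:
    # Number of bytes this single character needs in UTF-8.
    return len(ch.encode("utf-8"))

def fits_utf8mb3(text: str) -> bool:
    # MySQL's legacy `utf8` (utf8mb3) stores at most 3 bytes per character,
    # so anything needing 4 bytes (emoji, for example) does not fit.
    return all(utf8_len(ch) <= 3 for ch in text)

lengths = {ch: utf8_len(ch) for ch in ["a", "é", "寿", "🍣"]}
print(lengths)
print(fits_utf8mb3("sushi 寿司"), fits_utf8mb3("🍣🍺"))
```

Characters in the Basic Multilingual Plane need at most 3 bytes, which is why `utf8` worked for a lot of text; the 4-byte characters are what `utf8mb4` adds.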

morgo · on Jan 22, 2019

Morgan from the TiDB team here.

I agree that column and row store have very different characteristics, but what I think is worth mentioning is that some hybrid solutions actually store as both row and columnar and have a query optimizer that can pick between them. For example: Oracle DB In-Memory, SQL Server Columnstore index.

At the same event as this announcement, we also announced that we are working on TiFlash, which will do something similar. Stay tuned for a blog post with more details :-)

georgewfraser · on Jan 22, 2019

As described in that paper, it’s not sufficient to simply store a second, columnar projection of the data to get good performance. You also need a block-oriented execution engine, which means you effectively have two separate databases operating side-by-side. This is a huge challenge and it’s not clear if it’s really worth it, since for logistical reasons you will nearly always operate a separate data warehouse doing mostly OLAP and production database doing mostly OLTP.

morgo · on Jan 22, 2019

Update: The documentation has now been updated https://github.com/pingcap/tidb/pull/9144

morgo · on Jan 22, 2019

Morgan from the TiDB team here. We are working on encryption at rest now - stay tuned.

w.r.t. nested transactions, this is not something that MySQL currently offers (TiDB is MySQL 5.7 compatible). Sometimes this is emulatable via savepoints, which is a feature we plan to add in the future.
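
A minimal sketch of emulating a nested transaction with a savepoint, in Python. It runs against an in-memory SQLite database purely so it is self-contained; the `SAVEPOINT` / `ROLLBACK TO SAVEPOINT` / `RELEASE SAVEPOINT` statements are spelled the same way in MySQL, but this says nothing about TiDB's behaviour. The table and savepoint names are made up.

```python
import sqlite3

# isolation_level=None puts the driver in autocommit mode,
# so the BEGIN/COMMIT below are exactly the ones we issue.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (v TEXT)")

conn.execute("BEGIN")                           # outer transaction
conn.execute("INSERT INTO t VALUES ('outer')")

conn.execute("SAVEPOINT inner_tx")              # start of the "nested" part
conn.execute("INSERT INTO t VALUES ('inner')")
conn.execute("ROLLBACK TO SAVEPOINT inner_tx")  # undo only the nested work
conn.execute("RELEASE SAVEPOINT inner_tx")

conn.execute("COMMIT")                          # outer work is still committed

rows = [v for (v,) in conn.execute("SELECT v FROM t")]
print(rows)
conn.close()
```

The inner insert is rolled back while the outer insert survives the commit, which is the behaviour people usually want from a nested transaction.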

gigatexal · on Jan 22, 2019

How good or bad is the SqlAlchemy support?

morgo · on Jan 22, 2019

It should work fine, with one item to note:

I believe SqlAlchemy defaults to READ-COMMITTED, whereas TiDB defaults to (and recommends) MySQL's default of REPEATABLE-READ.

gigatexal · on Jan 27, 2019

Any chance you guys run it against the SQLalchemy test suite?

gigatexal · on Jan 23, 2019

I’ll check it out today. Thanks.

morgo · on Jan 22, 2019

Morgan from the TiDB team here. Thank you for the feedback, and I agree with you. We actually took this line out from the same copy in the docs: https://pingcap.com/docs/

(We must have missed a spot, and I will follow up and make sure it is addressed).

We try to be transparent about the differences from MySQL. On the compatibility page, there are a few cases described such as large transactions, small transactions and single threaded workloads:

https://www.pingcap.com/docs/sql/mysql-compatibility/