Similar research is being done for humpback whales by HappyWhale: https://happywhale.com/
They're making a great effort, with already over 600K user/tourist-submitted photos of whale encounters, to track individual whales across the globe.
Great approach and good write-up! I’ve implemented a similar technique before on PostgreSQL, but with the materialisation in the application backend. Still works like a charm.
>So instead, we created a special kind of trigger that tracks the minimum and maximum times modified across all the rows in a statement and writes out the range of times that were modified to a log table. We call that an invalidation log.
Does this invalidation log also take into account cases where the view has an aggregate based on data from a bucket other than its own? For example, a lag() or lead() might be used to calculate a delta compared to the previous bucket. Then, if a data point is added to the realtime table inside bucket 1, and bucket 1 is invalidated and re-materialised, doesn't bucket 2 also need to be re-materialised for integrity reasons?
For removing repeated (intermediate) calculations, I generally find it easier in some cases to use a lateral join (in Postgres), like:
select
    title,
    country,
    avg(salary) as average_salary,
    sum(salary) as sum_salary,
    avg(gross_salary) as average_gross_salary,
    sum(gross_salary) as sum_gross_salary,
    avg(gross_cost) as average_gross_cost,
    sum(gross_cost) as sum_gross_cost,
    count(*) as emp_count
from
    employees,
    lateral (
        select
            (salary + payroll_tax) as gross_salary,
            (salary + payroll_tax + healthcare_cost) as gross_cost
    ) employee_ext
where
    country = 'usa'
    and gross_cost > 0
group by title, country
having count(*) > 200
order by sum_gross_cost
limit 3;
So now we have easily come up with three different ways of rewriting the query to avoid that duplication (which obviously was not a problem at all to begin with): subquery, CTE and lateral join. And there are several more well-known ways (views, custom functions, computed columns etc), so is the whole premise for even inventing a "better" language than SQL false? Or what am I missing?
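For comparison, the CTE rewrite mentioned above could look something like this (just a sketch, assuming the same employees table and columns as the lateral-join example upthread):

```sql
with employee_ext as (
    select
        title,
        country,
        salary,
        (salary + payroll_tax) as gross_salary,
        (salary + payroll_tax + healthcare_cost) as gross_cost
    from employees
    where country = 'usa'
)
select
    title,
    country,
    avg(salary) as average_salary,
    sum(salary) as sum_salary,
    avg(gross_salary) as average_gross_salary,
    sum(gross_salary) as sum_gross_salary,
    avg(gross_cost) as average_gross_cost,
    sum(gross_cost) as sum_gross_cost,
    count(*) as emp_count
from employee_ext
where gross_cost > 0
group by title, country
having count(*) > 200
order by sum_gross_cost
limit 3;
```

One caveat: before PostgreSQL 12, CTEs were optimisation fences, so the lateral-join form could plan differently; on recent versions it's mostly a readability choice.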
It's also weird how people always argue for immutability and eliminating local state when using procedural languages, but as soon as they switch to SQL, which actually works like this, they immediately want to introduce mutability and local state.
> so the whole premise now for even inventing a “better” language than SQL is then false?
I don't think anyone is using the above examples to try to invalidate PRSQL, just suggesting that the baseline for comparisons should account for all constructs available in the SQL standards and common implementations thereof.
> Or what am I missing.
The statement “I can do X better than <SQL example> with <something else>” does not properly show the benefit of <something else> if “I can do X better than <SQL example> with <another SQL example>” is also true (assuming <another SQL example> is actually agreed to be better, not for instance convoluted/confusing/long-winded/other so just replacing some problems with others).
If there are multiple ways to do the same thing, that's usually a BAD thing in terms of language design. Especially if some approaches are just newbie traps that experts learn to avoid, or if deciding on the best method is a really subtle, context-dependent decision. The ideal design is one where the language encourages the one obviously "good" way to do it.
I'm not aware of any general-purpose programming language that doesn't have multiple ways to achieve a specific goal. Can you give an example of a language with good design?
Nope, that's - as far as I can see - just a sequence of increasing integers. Both the K and J examples give an array of random integers in the 0-1000000 range.
For reference, in J such a sequence of integers can be generated with:
1+i.1000000
1 2 3 4 5 6 7 8 9 10 ....
or:
1+i.1000000 1
1
2
3
4
5
6
7
8
9
10
....
To explain the previous examples (and let's use a smaller integer for less typing...), the `$` verb is called "shape"/"reshape"; it takes a (list of) values on the right side and a list of dimensions on the left:
5 2 $ 10
10 10
10 10
10 10
10 10
10 10
if there are not enough values on the right, they are cycled:
5 2 $ 10 11 12
10 11
12 10
11 12
10 11
12 10
which degenerates to repetition if there's only one value on the right. The `~` adverb (called "reflex") modifies the verb to its left in the following way:
V~ y NB. same as y V y
so `$~10` is the same as `10$10`, which is a list of ten tens. That list is passed to `?`, a verb called "roll", which gives a random integer in the 0-(y-1) range when written as `? y`. `y` here can be a scalar or a list, in which case the roll is performed for each element of the list:
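For example (sticking with small numbers; the output varies per run, of course):

```j
   ? 10 10 10
```

which gives three random integers, each below 10. So `?$~10` produces ten random integers below 10, and `?$~1000000` a million random integers in the 0-1000000 range, as in the earlier examples.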
The dyadic case, ie. `x ? y` is called "deal", which selects `x` elements from `i. y` list at random, without repetitions. `?~ y`, then, effectively shuffles the `i. y` list:
?~10
6 9 7 3 5 1 8 0 4 2
"deal" can be used to shuffle any list, not only the `i. y` sequence, by using the shuffled list as indexes of another list (using `{` verb, called "from"):
...I know, I know, it is strange. But it's so interestingly mind-bending that I'd be really happy if I had a valid excuse to pour hundreds of hours into learning J properly. Sadly, I don't have anything like that, so I only spread the strangeness from time to time in comments, like I do right now :)
This can generally fall under the category of "Generate Test Data in SQL", with both generic techniques like recursive CTEs, as in the OP, and SQL-dialect- or tool-specific options. Search is your friend. The OP also provides a table-lookup technique for readable names but doesn't address other data types, such as timestamps, or other functions, such as MD5(). Data distribution techniques other than random may also be needed. This post scratches the surface of testing with generated data.
You can use a with-expression. It's true that you can't use `limit` to limit it, but you can use a `where` condition. This is equivalent to cribwi's example:
WITH RECURSIVE
t AS (
SELECT 0 id
UNION ALL
SELECT id + 1
FROM t
WHERE id < 1000000
)
SELECT id, random() FROM t;
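For completeness: on PostgreSQL specifically, the usual shortcut is generate_series rather than a recursive CTE (a sketch; random() in Postgres returns a double in [0, 1)):

```sql
-- One row per integer 0..1000000, each with a random value
SELECT id, random()
FROM generate_series(0, 1000000) AS id;
```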
You can assign a custom shortcut to it via Settings > Keymap and then the action "Main Menu > File > Local History".
I myself have assigned the shortcut control+shift+=
I've been using Phabricator for almost a year now in our company. At first, I also had my questions about the workflow with arcanist (the local PHP CLI software for interacting with Phabricator). However, right now I really love it and miss it in every repository that doesn't use Phabricator. Commands like 'arc feature', 'arc tasks' and 'arc inlines' provide a solid abstraction over the repository and bring the workflow to the CLI.
About the unit tests: indeed, out of the box Phabricator only supports running them locally (you can configure the unit engine to run every time a piece of code is sent into review). Luckily, it's not that hard to hook into the event engine (https://secure.phabricator.com/book/phabricator/article/even...) to, for example, trigger your CI server to run the test suite for a specific revision.
Also, Phabricator is still being actively developed. There are a bunch of alpha and beta applications which will only make it more powerful (for example DryDock and Harbormaster, which will ease the work of automatic builds: http://www.guywarner.com/2014/05/integrating-jenkins-and-pha...)
In my opinion, Phabricator is one of the most powerful tools around for doing code reviews and effective collaboration management. Even with the large amount of beta applications (which are promising in many cases) the decision for us was easy to make.
Since it broke some time ago: no. But when it was still working: yes, I found it rather useful, especially for very small commits/bugfixes and such.
I hope they'll fix it soon.