I thought I really enjoyed startups, because I spend most of my spare time thinking of ideas. And I’m really good at turning ideas into something and convincing others what I’m building is good. But there too is minutiae.
I think working on my own programming language would be awesome. I think it’s the web I take issue with, tedious frontend designs, shitty spaghetti code mixed with view logic.
I really enjoy nature and the outdoors, but have arthritic hips (I'm not even 30 lol) so that limits me a bit! Otherwise I probably would be a forest ranger :)
I like helping people as well; my favourite part of my current role has been helping the junior engineer and pair-programming with them.
Trino is a nice query engine if you want to not only run SQL on Elasticsearch but also join data from Elasticsearch with data in the other systems Trino supports. It also supports raw Elasticsearch queries, whose results are serialized back into Trino data types.
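For a concrete picture, here is a rough sketch of what such a cross-system join might look like. It only builds the SQL text in Python and does not talk to a cluster. Trino addresses tables as catalog.schema.table; the catalog names, the `orders` index, the `customers` table, and all column names below are made up for illustration.

```python
def federated_join_sql(es_catalog: str, other_catalog: str) -> str:
    """Build a Trino query that joins an Elasticsearch index with a table
    living in a different catalog.

    Every schema, table, and column name here is hypothetical.
    """
    return (
        "SELECT o.order_id, o.total, c.customer_name\n"
        # Elasticsearch side: an index exposed as a table in the given catalog.
        f"FROM {es_catalog}.default.orders AS o\n"
        # Other system: any catalog Trino is configured with.
        f"JOIN {other_catalog}.public.customers AS c\n"
        "  ON o.customer_id = c.customer_id\n"
        "WHERE o.total > 100"
    )


if __name__ == "__main__":
    print(federated_join_sql("elasticsearch", "postgresql"))
```

The point is just that both sides of the join are ordinary table references; which system backs each one is decided by the catalog prefix.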
Check out Trino Summit 2022 to learn more about the federated query engine and "Federate 'em all". Talks include Lyft, Shopify, Astronomer, Starburst, and much more to be announced in the coming weeks!
Trino (formerly PrestoSQL) is a query engine that originally aimed to replace the Hive runtime and grew into a powerful federated query engine. It can connect to Snowflake, BigQuery, Hive, Oracle, Mongo, Iceberg, Elasticsearch, and many more. This enables powerful joins across your platform from one location.
In this episode of Trino Community Broadcast, we interview Ryan Blue to discuss the latest innovations in Apache Iceberg, why to choose Apache Iceberg, the latest Trino innovations, and much more.
All use the Lyft "Presto but really Trino"-Gateway project to run different clusters to handle various workloads. They go into various details for how this is achieved.
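As a loose illustration of the general idea (not the gateway project's actual code or API), routing by workload can be pictured as a lookup from a workload label to a pool of clusters, with round-robin inside each pool. The class name, workload labels, and URLs below are all hypothetical.

```python
from itertools import cycle
from typing import Dict, Iterator, List


class WorkloadRouter:
    """Toy router: picks a cluster URL for a query based on its workload label."""

    def __init__(self, clusters_by_workload: Dict[str, List[str]]) -> None:
        for workload, urls in clusters_by_workload.items():
            if not urls:
                raise ValueError(f"no clusters configured for workload {workload!r}")
        # One endless round-robin iterator per workload pool.
        self._pools: Dict[str, Iterator[str]] = {
            workload: cycle(urls) for workload, urls in clusters_by_workload.items()
        }

    def route(self, workload: str) -> str:
        """Return the next cluster URL for the given workload."""
        if workload not in self._pools:
            raise KeyError(f"unknown workload {workload!r}")
        return next(self._pools[workload])


if __name__ == "__main__":
    router = WorkloadRouter(
        {
            "adhoc": ["http://adhoc-1:8080", "http://adhoc-2:8080"],
            "etl": ["http://etl-1:8080"],
        }
    )
    for label in ["adhoc", "adhoc", "etl", "adhoc"]:
        print(label, "->", router.route(label))
```

A real gateway would also need health checks and some way to classify incoming queries; this sketch leaves both out.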
Regarding the Trino/Presto split, I recommend looking at this blog to better understand why these two communities aren't merging. TL;DR: Presto is a Facebook-driven project that mainly considers running on the Facebook infrastructure. Trino is a community-driven project that works on running well with all clouds and the common infrastructure in the Trino community, which is why you see a higher velocity there.
Ali here, with a perspective about the split. Disclosure - I work at Ahana and am an active member of the Presto Foundation. When I see things like this, it appears that Trino/Starburst wants to continue to push the narrative that Presto is a Facebook-driven project to keep the communities fractured which is pretty unfortunate. In reality, Presto is a community-based open source project housed under The Linux Foundation and has dozens of companies actively contributing to it and using it - Uber, Bytedance, Intel, Twitter, Tencent, and many more. There's no reason why the 2 communities can't coexist peacefully.
For all intents and purposes, both projects are active and lively. It seems that Trino is more focused on federation and building out connectors. Presto is more focused on being the engine for the data lake/lakehouse. Both projects are doing well and solving different problems. There have been a lot of innovative features in the Presto project over the last year that are only in Presto, like Presto-on-Spark, disaggregated coordinator, Project Aria, etc. In fact, we just hosted a fantastic user conference a few weeks ago that showcased a lot of that innovation and how companies are using Presto at massive scale today (if interested, check out the sessions: https://www.youtube.com/watch?v=Gi8i7eHqwyw&list=PLJVeO1NMmy...)
Long story short, Presto is alive and well, is not solely backed by 1 company (quite the opposite of Trino/Starburst), and has a lot of tech innovation on the roadmap. We're excited about the future of Presto.
Yes, it may definitely help to go with multiple clusters. However, there are also many scenarios where we don't want to maintain multiple clusters. For example, on a SaaS platform, multi-tenancy is pretty typical: different tenants may have different workloads, and workload management would be needed for different users, or even within the same tenant. So "built-in" workload management (besides other multi-tenant features) would be a big plus.
ClickHouse is a realtime system, whereas Trino is a batch-oriented system. There are tradeoffs for doing realtime vs batch.
Realtime is generally more expensive to run, as you process every individual row as it comes. Batch is for when you can live with latency on the order of minutes and want to handle a lot of data in chunks.
Trino also happens to connect to ClickHouse, and it's very common for people to use Trino to query ClickHouse realtime data and join it with data in BigQuery, an object store data lake, or Snowflake: https://trino.io/docs/current/connector/clickhouse.html
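To make the realtime vs batch tradeoff above a bit more tangible, here is a toy sketch that sums the same rows two ways: one processing step per row, and one step per chunk. It says nothing about how ClickHouse or Trino work internally; the function names and the notion of a "step" are assumptions made purely for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def process_per_row(rows: Iterable[int]) -> Tuple[int, int]:
    """Handle each row as it arrives. Returns (total, processing steps)."""
    total = 0
    steps = 0
    for value in rows:
        total += value
        steps += 1  # one unit of fixed overhead paid per row
    return total, steps


def chunked(rows: Iterable[int], size: int) -> Iterator[List[int]]:
    """Group rows into lists of at most `size` items."""
    chunk: List[int] = []
    for value in rows:
        chunk.append(value)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly smaller chunk


def process_in_batches(rows: Iterable[int], batch_size: int) -> Tuple[int, int]:
    """Wait for a chunk, then handle it at once. Returns (total, processing steps)."""
    total = 0
    steps = 0
    for chunk in chunked(rows, batch_size):
        total += sum(chunk)
        steps += 1  # fixed overhead paid once per chunk
    return total, steps


if __name__ == "__main__":
    print(process_per_row(range(10)))        # (45, 10)
    print(process_in_batches(range(10), 4))  # (45, 3)
```

Both give the same answer; the batch version pays its per-step overhead far fewer times, at the cost of waiting for a chunk to fill before anything is processed.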
I mean, the Hive migration path is one thing. Now that Iceberg is taking over the old Hive model, data lakes are all the rage again.
The other thing I would say is that Trino and Presto are not one-trick ponies or just Hive replacements. There's also the ability to query across multiple systems, which is, to me, the feature that future-proofs a lot of architectures. It inherently frees you up to fiddle with your data in different systems but keep the access to those systems in one location.
Yeah I think that is the key question: will data lakes become the dominant paradigm? There is certainly a lot of talk around them, though I see a ton of companies are still just going all in on a conventional data warehouse, but they tend not to talk about it because it’s not a new or interesting thing to do.
Yeah, though a lot of Fivetran customers are likely the type that would go all in on paying for a conventional data warehouse, whereas people using open source stacks may be the ones using open ingestion alternatives.
We see a pretty even mix from the Trino/Starburst lens. Bigger companies like to mix and match.