
bitsondatadev · on Aug 19, 2024

Is there any market or industry that you would like to work in? Like the domain itself is interesting to work in?

purple-leafy · on Aug 19, 2024

I thought I really enjoyed startups, because I spend most of my spare time thinking of ideas. And I’m really good at turning ideas into something and convincing others what I’m building is good. But there too is minutiae.

I think working on my own programming language would be awesome. I think it’s the web I take issue with, tedious frontend designs, shitty spaghetti code mixed with view logic.

I really enjoy nature and the outdoors, but have arthritic hips (I'm not even 30 lol) so that limits me a bit! Otherwise I probably would be a forest ranger :)

I like helping people as well, my favourite part of my current role has been helping the junior engineer and pair-programming with them

bitsondatadev · on Oct 24, 2022

Trino is a nice query engine solution if you want to not just run SQL on Elasticsearch but also want to be able to join data from Elasticsearch with data in other systems Trino supports. It also supports raw elasticsearch queries that are serialized back into Trino data types.

https://trino.io/docs/current/connector/elasticsearch.html
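As a rough sketch of what such a cross-system join could look like, the Python snippet below just assembles the SQL text. The catalog, schema, table, and column names (`elasticsearch.default.orders`, `postgresql.public.customers`, `customer_id`) are made up for illustration and depend entirely on how your catalogs are configured; the only thing it relies on is Trino's `catalog.schema.table` naming.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified, double-quoted Trino name: catalog.schema.table."""
    return ".".join(f'"{part}"' for part in (catalog, schema, table))


def federated_join_sql(left: str, right: str, key: str, limit: int = 10) -> str:
    """Build a join between two tables that may live in different catalogs."""
    return (
        f"SELECT l.*, r.*\n"
        f"FROM {left} AS l\n"
        f"JOIN {right} AS r ON l.{key} = r.{key}\n"
        f"LIMIT {limit}"
    )


# Hypothetical catalogs: these names are assumptions, not defaults.
es_orders = qualified("elasticsearch", "default", "orders")
pg_customers = qualified("postgresql", "public", "customers")
sql = federated_join_sql(es_orders, pg_customers, key="customer_id")
print(sql)
```

You would then submit the resulting statement through whichever Trino client you already use.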

bitsondatadev · on Sept 23, 2022

Check out Trino Summit 2022 to learn more about the federated query engine and "Federate 'em all". Talks include Lyft, Shopify, Astronomer, Starburst, and much more to be announced in the coming weeks!

Trino (formerly PrestoSQL) is a query engine that originally aimed to replace the Hive runtime and grew into a powerful federated query engine. It can connect to Snowflake, BigQuery, Hive, Oracle, Mongo, Iceberg, Elasticsearch, and many more. This enables powerful joins across your platform from one location.

bitsondatadev · on Sept 12, 2022

In this episode of Trino Community Broadcast, we interview Ryan Blue to discuss the latest innovations in Apache Iceberg, why choose Apache Iceberg, latest Trino innovations and much more.

Disclaimer: I am a Trino contributor.

bitsondatadev · on Aug 4, 2022

Also, you can keep track of all the BQ progress here: https://github.com/trinodb/trino/issues/6867

dmead · on Aug 5, 2022

Thanks. If we can eliminate the costs and upkeep of Hive in GCP, it would make my life easier for sure.

bitsondatadev · on Aug 4, 2022

For managing different workloads, check out these blogs and videos from Salesforce, Shopify, Goldman Sachs, and Electronic Arts, respectively:

- https://engineering.salesforce.com/how-to-etl-at-petabyte-sc...
- https://shopify.engineering/faster-trino-query-execution-inf...
- https://trino.io/episodes/33.html
- https://www.youtube.com/watch?v=-5mlZGjt6H4

All use the Lyft "Presto but really Trino"-Gateway project to run different clusters to handle various workloads. They go into various details for how this is achieved.

https://github.com/lyft/presto-gateway
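To give a feel for the idea (this is a conceptual sketch, not the gateway's actual code or configuration), here is a tiny Python router that sends each query to a cluster based on a routing group and round-robins within that group. The group names, URLs, and the `Router` class are all hypothetical.

```python
from itertools import cycle
from typing import Dict, Iterator, List


class Router:
    """Round-robin over the clusters registered for each routing group."""

    def __init__(self, groups: Dict[str, List[str]], default_group: str) -> None:
        if default_group not in groups:
            raise ValueError(f"unknown default group: {default_group}")
        self._default = default_group
        # One independent round-robin iterator per group.
        self._cycles: Dict[str, Iterator[str]] = {
            name: cycle(urls) for name, urls in groups.items()
        }

    def pick(self, group: str = "") -> str:
        """Return the next cluster URL for the group, or for the default group."""
        chosen = group if group in self._cycles else self._default
        return next(self._cycles[chosen])


# Hypothetical clusters, one pool per workload type.
router = Router(
    {
        "adhoc": ["http://adhoc-1:8080", "http://adhoc-2:8080"],
        "etl": ["http://etl-1:8080"],
    },
    default_group="adhoc",
)

print(router.pick("etl"))    # http://etl-1:8080
print(router.pick("adhoc"))  # http://adhoc-1:8080
print(router.pick("adhoc"))  # http://adhoc-2:8080
print(router.pick("other"))  # unknown group falls back to adhoc: http://adhoc-1:8080
```

The point is only that heavy ETL jobs and interactive queries land on separate clusters, so one workload can't starve the other.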

Regarding the Trino/Presto split, I recommend looking at these blog posts to better understand why the two communities aren't merging. TL;DR: Presto is a Facebook-driven project that mainly considers running on Facebook's infrastructure. Trino is a community-driven project that focuses on running well on all clouds and the common infrastructure used across the Trino community, which is why you see a higher velocity there.

https://trino.io/blog/2022/08/02/leaving-facebook-meta-best-...
https://trino.io/blog/2020/12/27/announcing-trino.html

We anticipate that Trino will soon become the common name in the community space, but we'll always love that the Trino project's origins are in Presto.

ambigali · on Aug 4, 2022

Ali here, with a perspective about the split. Disclosure - I work at Ahana and am an active member of the Presto Foundation. When I see things like this, it appears that Trino/Starburst wants to continue to push the narrative that Presto is a Facebook-driven project to keep the communities fractured which is pretty unfortunate. In reality, Presto is a community-based open source project housed under The Linux Foundation and has dozens of companies actively contributing to it and using it - Uber, Bytedance, Intel, Twitter, Tencent, and many more. There's no reason why the 2 communities can't coexist peacefully.

For all intents and purposes, both projects are active and lively. It seems that Trino is more focused on federation and building out connectors. Presto is more focused on being the engine for the data lake/lakehouse. Both projects are doing well and solving different problems. There's been a lot of innovative features in the Presto project over the last year that are only in Presto, like Presto-on-Spark, disaggregated coordinator, Project Aria, etc. In fact we just hosted a fantastic user conference a few weeks ago that showcased a lot of that innovation and how companies are using Presto at massive scale today (if interested, check out the sessions: https://www.youtube.com/watch?v=Gi8i7eHqwyw&list=PLJVeO1NMmy...)

Long story short, Presto is alive and well, is not solely backed by 1 company (quite the opposite of Trino/Starburst), and has a lot of tech innovation on the roadmap. We're excited about the future of Presto.

jerryjerryjerry · on Aug 4, 2022

Yes, going with multiple clusters may definitely help. However, there are also many scenarios where we don't want to maintain multiple clusters. For example, on a SaaS platform, multi-tenancy is pretty typical: different tenants may have different workloads, and workload management is needed across different users, or even within the same tenant. So "built-in" workload management (besides other multi-tenant features) would be a big plus.

bitsondatadev · on Aug 4, 2022

ClickHouse is a realtime system, whereas Trino is a batch-oriented system. There are tradeoffs between realtime and batch.

Realtime is generally more expensive to run because you process every individual row as it arrives. Batch fits when you can tolerate latency on the order of minutes and want to handle a lot of data in chunks.
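A toy way to picture the tradeoff, assuming a fixed overhead per processing call (the cost numbers and the `total_cost` model are invented for illustration, not measurements of either system): handling rows one at a time pays that overhead per row, while batching pays it once per chunk at the price of waiting for the chunk to fill.

```python
from typing import Iterable, Iterator, List


def chunks(rows: Iterable[int], size: int) -> Iterator[List[int]]:
    """Group incoming rows into lists of at most `size` rows."""
    batch: List[int] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def total_cost(num_calls: int, num_rows: int,
               per_call: float = 1.0, per_row: float = 0.01) -> float:
    """Made-up cost model: fixed overhead per call plus a small cost per row."""
    return num_calls * per_call + num_rows * per_row


rows = range(1000)
realtime_calls = sum(1 for _ in rows)            # one call per row
batch_calls = sum(1 for _ in chunks(rows, 100))  # one call per 100 rows

print(total_cost(realtime_calls, 1000))  # per-row processing
print(total_cost(batch_calls, 1000))     # chunked processing
```

With these invented numbers the per-row path makes 1000 calls and the batched path makes 10, so nearly all of the difference comes from the per-call overhead.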

Trino is also a query engine rather than a database and it connects to many different systems: https://trino.io/docs/current/connector.html

It also happens to connect to ClickHouse, and it's very common for people to use Trino to query ClickHouse realtime data and join it with data in BigQuery, an object store data lake, or Snowflake: https://trino.io/docs/current/connector/clickhouse.html

bitsondatadev · on Aug 4, 2022

Thanks for the shoutout! :)

If you want to get started with Trino, here's a repo I created to do so: https://github.com/bitsondatadev/trino-getting-started

bitsondatadev · on Aug 4, 2022

Check out this PR. I believe we may have tackled this one but you'd need to try it out on Trino: https://github.com/trinodb/trino/pull/1415

gavinray · on Aug 4, 2022

Hooray! Yet another data point for Trino > Presto as far as I'm concerned ;^)

bitsondatadev · on Aug 4, 2022

If you want to try a SaaS Athena alternative that's backed by Trino, you can check out Starburst Galaxy: https://www.starburst.io/platform/starburst-galaxy/

Full disclosure: I work at Starburst.

gavinray · on Aug 4, 2022

Oh nice, I have high opinions of you folks!

Guy who goes by the name of "Randgalt" online builds some great Java libraries and works there too I believe.

simpligility · on Aug 4, 2022

Yep .. Jordan works on Trino and Starburst Galaxy. We got lots of other great engineers helping as well btw.

https://github.com/randgalt

ofrzeta · on Aug 7, 2022

On the homepage there's two occurrences of "ELT" (instead of "ETL" I would guess). Is that correct?

bitsondatadev · on Aug 4, 2022

I mean, the Hive migration path is one thing. Now that Iceberg is taking over the old Hive model, data lakes are all the rage again.

The other thing I would say is that Trino and Presto are not one-trick ponies or just Hive replacements. There's also the ability to query across multiple systems, which is, to me, the feature that future-proofs a lot of architectures. It inherently frees you up to fiddle with your data in different systems while keeping access to those systems in one location.

georgewfraser · on Aug 4, 2022

Yeah I think that is the key question: will data lakes become the dominant paradigm? There is certainly a lot of talk around them, though I see a ton of companies are still just going all in on a conventional data warehouse, but they tend not to talk about it because it’s not a new or interesting thing to do.

bitsondatadev · on Aug 4, 2022

Yeah, though a lot of Fivetran customers are likely the type that would go all in on paying for a conventional data warehouse, whereas people using open source stacks may be the ones using open ingestion alternatives.

We see a pretty even mix from the Trino/Starburst lens. Bigger companies like to mix and match.