I really appreciate that the linked article uses the phrasing "source-available", the lower case "free", and doesn't use the phrase "open source". Terminology matters a lot.
For me, a lot of the value in Free software comes from being able to make modifications to the software (either yourself, or by hiring others), and generally being in control of your own "software destiny".
With that in mind, I think it's important to call attention to this license's prohibition of running modified versions in production. This prohibition applies regardless of your modifications being distributed (and in fact, later in the license, distribution of modifications is expressly prohibited as well):
Clause 2.1 (d): "A license to prepare, compile, and test Derivative Works of the TSL Licensed Software Source Code solely in a Non-Production Environment ...
I've often pined for visibility into the source code of proprietary software that I use. I suppose this is a "win" for TimescaleDB in my mind over source-unavailable proprietary software. In the end, however, this license means it's still just proprietary software.
The original intent of that clause was to avoid us needing to support modified versions that were deployed to production. (Note: We provide a lot of free support in our 4000+ member Slack channel [0].)
But that clause was written 1.5 years ago, and a lot has changed since then. There’s actually an internal debate right now on whether we need to keep it. So thank you and HN for spurring this discussion!
If you intend to change it to allow running open-sourced changes, you might consider allowing changes submitted to you privately too, for vulnerability reports.
One possible approach compatible with true software freedom and the usual definition of open source is not to restrict use of modified versions of the code, but instead to use naming to distinguish between the two, and only support the unmodified version.
For example, the code build system could have variables for the name and maybe the logo and other trademark/brand-ish things, and the public codebase could be configured by default to call itself Timescale Community DB or Timescale Custom DB or some other name instead of TimescaleDB. Your private build would simply substitute the json file with those data values and maybe point to logos that aren't in the repo instead of generic ones that are, or something similar to that.
You'd also have the option to use any mixture of trademark law or copyright conditions to restrict the commercial version's name and branding assets.
All of the options I described above are used in reality by various projects out there. For example, the git repository for VS Code OSS has a product.json file with most of the customization points (not all) that MS changes in building their supported VS Code release, TeX and Red Hat apply naming restrictions, and Red Hat also has rules in their support contract.
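To make the build-time branding idea above concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the key names (`product_name`, `logo_path`, `supported`), the default values, and the helper functions are invented for illustration and are not taken from VS Code's product.json or from any Timescale build system.

```python
import json

# Hypothetical defaults that would ship in the public repository.
DEFAULT_BRANDING = {
    "product_name": "Timescale Community DB",
    "logo_path": "assets/generic-logo.svg",
    "supported": False,
}


def load_branding(override_json=None):
    """Return branding values; an optional JSON document overrides the defaults."""
    branding = dict(DEFAULT_BRANDING)
    if override_json is not None:
        overrides = json.loads(override_json)
        unknown = set(overrides) - set(DEFAULT_BRANDING)
        if unknown:
            # Reject typos rather than silently ignoring them.
            raise KeyError(f"unknown branding keys: {sorted(unknown)}")
        branding.update(overrides)
    return branding


def version_banner(branding):
    """Build the name the binary reports, marking unsupported builds."""
    suffix = "" if branding["supported"] else " (community build, unsupported)"
    return branding["product_name"] + suffix


# Public build: uses the defaults checked into the repo.
community = load_branding()

# Vendor build: substitutes a private JSON file (inlined here as a string).
official = load_branding(
    '{"product_name": "TimescaleDB", "logo_path": "private/logo.svg", "supported": true}'
)

print(version_banner(community))
print(version_banner(official))
```

The point of the design is that the code is identical in both builds; only the data file differs, so support policy can key off the reported name rather than off a license restriction.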
I second this. I'm quite puzzled by the apparent dislike for open source ideals expressed elsewhere in the comments here.
Licensing is a complex problem and open source isn't some magical solution. It isn't the right model for every usecase out there and that's OK! Pay to play is sometimes the only fair and workable approach from a business perspective, but that doesn't make it open source. There's nothing wrong with that though!
Using terminology correctly is important. Timescale gets the terminology right and I appreciate that. (I also think it's awesome that they're releasing this product for free.)
This is an interesting window into their business model. This could be a purely altruistic decision, which businesses sometimes make, contrary to popular belief. More likely it's a bet that wider adoption from making the clustered version free will drive more revenue through their managed database as a service offering. Which shows that their non-OSI open-source license is actually leading to more code and features being available free and (mostly) open-source. As opposed to gating features for paying customers.
I think we're too hung up on OSI open source licenses. The additional restriction in the timescaledb license that you can't run a paid database as a service offering affects hardly anyone negatively (AWS). It affects us all positively by providing a sustainable business model to support additional development and support of an open-source product we use. Win-win if ever there was one. I'd like to see more open-source and closed-source companies consider this model.
You are spot on. Before the Timescale License, we were left with a tough decision: do we open-source a feature so that everyone can have it for free OR do we close a feature so that the mega-clouds don't have access to it?
We didn't like either of those options, which is why we created the Timescale License, which allows us to offer capabilities for free (and make the source code available) to everyone except the cloud providers (i.e., free for 99.9999% of all users).
We find that this has resulted in a mutually beneficial outcome for ourselves and our users.
"I think we're too hung up on OSI open source licenses. The additional restriction in the timescaledb license
that you can't run a paid database as a service offering affects hardly anyone negatively (AWS).
It affects us all positively by providing a sustainable business model to support additional development
and support of an open-source product we use. Win-win if ever there was one. I'd like to see more open-source
and closed-source companies consider this model."
It's available to _see_, but not to "prepare, compile, and test Derivative Works of the TSL Licensed Software Source Code" in a production environment, as per your license's clause 2.1 (d). That's a pretty big departure from open-source; and a bit discouraging for use by non-mega-cloud business interests too. One of the important reasons I personally use and support open-source is the freedom to not only inspect (which the TSL provides) but to also not have to ask someone else and wait on them to make any changes I need to the software I use. Any chance the TSL can be modified to include this freedom too?
This. I don't care if I can see the source code if I can't actually _do_ anything with it. If I can't run my modifications in production, it doesn't guard me against vendor lock-in and it doesn't give me the right-to-repair. So what's the benefit?
Note that I am not arguing for OSS licences, but something like the Commons Clause (use freely, even for commercial use, repair as you wish, just don't sell) seems much more suitable for such cases imho. It protects the business from cloud providers, while still offering some basic protections to the users. This... doesn't.
I don't think Java would have gotten anywhere if not for the fact that most of the source code was available. Trying to divine information about Windows from the header files was excruciatingly painful.
Being able to step into the code and figure out why it doesn't like 0 in the third argument was a massive boost to my efficacy as a coder. I could add a guard and then file a very precise bug report to get the issue fixed.
We did consider the Commons Clause when investigating our own licensing approach, but ended up concluding that its definition of "Sell" was actually much vaguer than we felt comfortable with:
“Sell” ...a product or service whose value derives, entirely or
substantially, from the functionality of the Software.
But then you have put the limitation on use in production instead of on selling? Commons Clause might be (too?) vague, but the concept is imho more fair to end users.
Still, I for one applaud you for this step. We need better non-OSS licenses and it's good to be having this discussion.
I appreciate the discussion here, and as mentioned elsewhere, this has been good motivation for us to revisit this clause in the Timescale License, as a bit has changed since we released it in 2018.
That said, I think a lot of this discussion misattributes the frequency of behaviors. For example, I bet the percentage of organizations that modify the source code of open-source databases like MySQL or Postgres before using them in production is some tiny tiny fraction of 1%. While on the other hand, a huge fraction of TimescaleDB users (and MySQL/PG users) use it to build external-facing services and products that they in turn sell to their customers. So it was especially important to us that folks felt confident in their ability to use TimescaleDB in commercial settings.
I agree with the thoughts on discussion and appreciate that you are taking the time to answer this thread!
Just to clarify: while the percentage of those who exercise this particular freedom (modifying the source and then using it in production) is probably very very small, it is an important freedom to have for a much bigger part of customers. It is an insurance policy that gives users confidence that Timescale Corp will not become user hostile - or if it does, it is still possible to make critical security fixes until a more permanent solution is found.
But I do understand it is difficult to find a solution that would please everyone. I am very curious if you will manage to find a more generic solution to this problem. :) Good luck!
This is a completely valid point. I also appreciate the "right-to-repair" comment made elsewhere.
The original intent of this clause was so that we wouldn’t have to support modified versions that were deployed to production. (As pointed out elsewhere, we provide a lot of free support in Slack [0].)
But that clause was written 1.5 years ago, and a lot has changed since then. There’s an internal debate right now on whether we should remove this restriction.
Is this really an issue in practice? For libraries (react, jquery) that can’t be used on their own as a product, a lot are adopting MIT. For a “service” - mongodb, redis, rabbitmq, Kafka, Postgres, etc. I have never run into an issue where I would be comfortable modifying something, rebuilding source and deploying into production.
_You_ may not have, but plenty of us have. Although, it's not as important when or how many times one has _needed_ to exercise one's freedoms, as it is to have them. But yes, plenty of us open-source users and supporters have exercised this very freedom. In fact, quite a lot of open-source contributions happen _because_ of this freedom: someone has an itch, they scratch it, and _then_ they upstream it.
Is that not possible under this license, to upstream a change? It sounds like you can’t put the change into production without it first being accepted, but not that you couldn’t contribute in other ways. I get the spirit of your argument. However, the issue is that companies are not able to make open-source compatible, permissive licenses that allow commercial use due to the new reality that creating a service and supporting a product are the main moneymakers. The code is not itself valuable to them but it is valuable as a holistic system because it’s an already built and adopted and production ready standardization of an idea.
1. I have to wait until upstream accepts it before I can use my own change in my own production.
2. I have to hope upstream accepts my change at all.
3. I have to give up the rights to my change if upstream accepts it at all.
High risk of vendor lock-in, and no right-to-repair.
----
> However, the issue is that companies are not able to ...
That's fine. I'm not asking any company to do something. I'm stating my reasons why I can't use what the company is offering. It sounds like what the company wants is to not license the code, but to sign a contract with me for a service. Would you sign a contract you don't like?
I run a bunch of forks in production that the maintainers didn't want to merge in. Then they made breaking changes... Software doesn't necessarily need to be updated all the time. Adding more features often makes the program slower and breaks stuff. The hardest part of dev is saying no to new features; it's much easier to implement new features than to solve tough problems.
It's also a hedge against the company/project shutting down or pivoting in a radically different direction than you want.
All the things you listed are foundational pieces of technology that are incredibly risky+costly to swap out, so if you needed to, there's an option to continue with a fork. If the thing is popular enough, there's a good chance a community-driven effort will pop up, you can find consultants to work on code for you, or even a new company will form around it, letting you continue working on your core product (mostly) uninterrupted.
A slightly different perspective of our licensing/copyright approach is there isn't confusion about ownership, so if we were ever to decide on a more permissive license, we have the clean ability to do so.
Zero plans, but this also applies if a company like ours were to ever pivot/shut down: we as copyright holders can just relicense _all_ the code to be more permissive / dual licensed / etc, which is not the case for projects where individuals hold copyright over merged contributions. (Note the opposite is not true: we can't "unrelease" versions of the code already released under a more permissive license, such as how most of our code-base is Apache 2.)
In short, I understand your point, but I think there are actually multiple sides to this issue as well.
This cuts both ways. MongoDB Inc., back when MongoDB was AGPL-only, wouldn't accept contributions without copyright assignment, which allowed them to go proprietary when they did.
What you're saying here is that you have the ability to "give away" things that you own. Sure, but everybody has that. Even if you did not have all the copyrights to yourselves, you could relicense your portions and others could relicense their portions and others still could just replace the remaining portions with new code. In fact, this is precisely how OpenSSL recently relicensed itself completely.
Sure, what you claim you're adding here is a bit of convenience, but it's just that. When compared to the possibility that your company could "pivot/shut down" _without_ doing anything to the code, it's convenience-for-you vs. risk-for-your-customers.
Disclosure: I work on Google Cloud (but am glad to see you protecting your rights to your software).
The conversation down thread though raises an interesting point: why does the license say you can’t run modifications in production (under any circumstances) versus some sort of “for commercial purposes” clause? It seems to me like it’s infeasible to have actual contributions if someone isn’t allowed to have a patch, carry it forward, and attempt to upstream it over time.
I assume the intent / goal of your license is to prevent people (AWS, Azure, GCP) from taking your code and offering it as a service. I don’t disagree with that. I think it’s also fine to prevent even small companies from saying “and now we wish to be the TimescaleDB company!”. But it seems strange to also prevent non-commercial users from running patched versions.
Lawyering is hard, but is there a clear reason against patched non-commercial?
It's a good question. How do we classify non-commercial? Is a telecom company using timescaledb for internal time series storage non-commercial although it is directly supporting a commercial offering (maybe mobile traffic platforms)?
I do get the direct commercial inference. What about the indirect ones? Just about anything in production is directed towards supporting some sort of commercial offering.
We took great effort to try to draw a clear line within the actual Timescale License language [0].
Usage is permitted, as long as:
the [end-]customer is prohibited, either contractually or technically, from
defining, redefining, or modifying the database schema or other
structural aspects of database objects, such as through use of the
Timescale Data Definition Interfaces, in a Timescale Database utilized by
such Value Added Products or Services.
In other words, if your service just provides DML access (read/write/modify), then that is permitted, while DDL access (modifying/creating schemas) is not permitted.
And in fact we already have thousands of companies building commercial applications on top of Timescale Licensed software (while adhering to the license).
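As a rough illustration of the DML/DDL line drawn above, here is a minimal Python sketch of a "technical" gate that a service might place in front of customer-supplied SQL. It is an assumption-laden toy, not a reading of the license: the keyword set, the function names, and the `execute` callback are all hypothetical, and the check looks only at the first word of a single statement (it ignores leading comments, multi-statement strings, and similar cases). A real deployment would more likely rely on database roles and privileges.

```python
# Assumed set of DDL keywords; illustrative only, not the license's definition.
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP", "TRUNCATE", "COMMENT", "RENAME"}


def leading_keyword(sql):
    """Return the first word of a statement, upper-cased, or '' if it is empty."""
    parts = sql.strip().split(None, 1)
    return parts[0].upper() if parts else ""


def is_ddl(sql):
    """True if the statement starts with one of the assumed DDL keywords."""
    return leading_keyword(sql) in DDL_KEYWORDS


def run_for_customer(sql, execute):
    """Forward a customer's statement to `execute` unless it looks like DDL."""
    if is_ddl(sql):
        raise PermissionError("schema changes are not available to end customers")
    return execute(sql)


# Usage: reads and writes pass through, schema changes are rejected.
print(is_ddl("SELECT * FROM metrics"))
print(is_ddl("alter table metrics add column host text"))
```

Under this sketch the service operator can still run schema changes through its own path; only statements arriving from end customers go through `run_for_customer`.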
If I'm reading https://github.com/timescale/timescaledb/blob/master/tsl/LIC... right, then for a SaaS company -- not necessarily a database-as-a-service company -- section 3.11 states that SaaS company can't run Data Definition Language (DDL) commands like CREATE, DROP, ALTER, TRUNCATE, COMMENT, and RENAME.
So if I need to adjust my schema using ALTER TABLE, how would I do that and stay license compliant?
Or if I'm running out of disk and need to run DROP TABLE, is my only choice to simply get more disk space rather than dropping tables?
Some of our customers will need their own unique schema, and will need their own tables. So, how would we even run CREATE TABLE and stay compliant?
We use and love Timescale, so we've been paying attention to this feature. We currently use the open source version and it's very nice.
Would you mind clarifying a bit, because the blog post doesn't really explain: how much of TS do you expect to remain Open Source versus proprietary? Is the idea here that you will switch the entire project to this new license (i.e., this means you're killing the open source project), or is the idea that you'll continue to work on the open source version (but with the enterprise functionality now available under the new license)?
The omission of this really important information in the blog post makes me suspect that this is, in effect, the end of OSS Timescale. I'd love to be reading that wrong! If the core will remain open source, you should consider mentioning that in the post.
Hi JeremyNT: We've never "changed" any Apache-2 licensed code to TSL-licensed code. And in fact, we've recently basically eliminated most of our enterprise features (read: paid only) and converted them to community features (read: free under the TSL).
So I'm curious: Why do you use only the Apache-2 version of TimescaleDB rather than the Community version?
(I realize that I'm saying "TSL-licensed" versus proprietary, because I'm not sure what it means to think of the code as proprietary when it's all source available on github, people can contribute, and 99% of companies just use it for free.)
First, thanks for replying. But, I'll note you didn't answer my question, which is about the future of the current open source codebase. Knowing that you haven't changed the license on such code yet is great, but that doesn't speak to the future direction for that codebase.
EDIT: I just saw this reply to another child, which addresses this concern for the core timescale, I think! [0]
To be clear, I don't think the license change as described is a blocker for my org, given their use case. Indeed, it may be a near-term win, as they will likely be able to take advantage of the new features that you are placing under this license.
That said, I always prefer open source code that I can modify myself. Open source licenses guarantee that the code can't be "taken away" from me, that I can integrate a technology without concerns for the sands shifting under me. If a company goes away, I and others can keep on working on it, and it can live on even if the original authors decide to no longer maintain it.
So, as an example: say Timescale changes the license of all its code under this new license tomorrow, then happily adds features and changes some fundamental things over the next few years, and then is bought by Oracle, who decides to take it fully proprietary under a new license that is more restrictive than the TSL license. This would make a fork unlikely or at least very difficult to get going!
Just to clarify, the Timescale License was originally announced in December 2018. At that time, we didn't "relicense" any existing Apache-2 code, we just said some future features will be licensed under the TSL rather than Apache 2.
Many people over the past year+ knew that we were working on a distributed version of TimescaleDB; a common question was whether the distributed TimescaleDB would be paid-only (like some other time-series database alternatives) or whether it would also be free.
This announcement was meant to say: Yes, multi-node TimescaleDB is free, not paid.
So there wasn't any _new_ license announced today; just that multi-node TimescaleDB would be released under the TSL rather than as a paid-only option, which many of our users had assumed.
Not JeremyNT, but from my perspective, avoiding vendor lock-in is my number one reason to favor open source software. Avoiding vendor lock-in is very important to me when I am evaluating options.
Just because your business model and my business model currently align does not guarantee that they will always align. I don't want you to be stuck being my vendor if I am not a good customer for you, and I don't want to be stuck using you as my vendor if you are no longer able or willing to provide the product I need.
I admire the `except the cloud providers`. What GCP, Azure and AWS have done with paid Redis offerings makes me curious what @antirez (Salvatore) thinks about it. They're making billions off Redis while the core contributors get nothing. I guess they agreed to it by licensing their work under BSD.
I do think there is a place for royalty based software. Free for personal and development use. For production use, you pay a small royalty to have it on the cloud. It's a win win on both sides. User gets managed offering + support + ability to look at source code, db/service authors get sustainable revenue, cloud providers get their usual PaaS cut.
> make me curious what @antirez (Salvator) thinks about it. They're making billions off Redis while the core contributors get nothing. I guess they agreed to it by having their work as BSD licence.
"About myself, I’ll keep writing BSD code for Redis. For Redis modules I’ll develop, such as Disque, I’ll pick AGPL instead, for similar reasons: we live in a “Cloud-poly”, so it’s a good idea to go forward with licenses that will force other SaaS companies to redistribute back their improvements. However this does not apply to Redis itself. Redis at this point is a 10 years collective effort, the base for many other things that we can do together, and this base must be as available as possible, that is, BSD licensed."
I'm a huge fan of Splunk but always want to keep my eye open for alternatives. My use case is mostly security analytics against event content and patterns, and for that the Splunk Processing Language is very well suited.
That said, I find it's fairly tedious to do a lot of time-series analysis and pattern discovery/anomaly detection across rich event models (think AWS CloudTrail events).
Anything TimescaleDB can help with here? Are there case studies you can point us to? It feels like there is probably a home for both in my domain, and quite obviously in the broader context of large enterprise ops/security.
Here is a doc on using TimescaleDB as a horizontally-scalable, easy-to-deploy, operationally-mature data store for Prometheus data (i.e., metrics), put together by another of our engineering teams:
Building an open-source analytical platform for Prometheus
I use TimescaleDB for mass storage and query of security events (up to 100s of millions) - the speed of queries and aggregate queries even on a single node is very impressive.
I haven't done anything with regards to anomaly/trend detection yet, but it's planned. Not really sure where you see a database (TimescaleDB) fitting into that though?
We're in that scale domain where everything is a pain in the ass but not obviously outside the scope of commercial solutions. I just checked and we're averaging ~500k events per second in the five areas I'm interested in.
I feel that we could probably use a time-series database to reflect our streams as 'last observed state' type collections as well as do the aggregations that we need to feed back into anomaly detection.
I'd like to also use something like that to create a 'heat map service' where you can feed a property/window/range and get back scalar for color coding and possibly a slice of values for sparkline type UI.
Without getting hands on, though, it's hard to say for sure.
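For what it's worth, the 'heat map service' idea above can be sketched in a few lines of plain Python. This is only a toy under stated assumptions: events are `(timestamp_seconds, property)` tuples held in memory, the function name `heat` and its parameters are invented, and the "scalar" is simply the most recent bucket's count divided by the busiest bucket's count in the window. A real version would push the bucketing into the database query instead.

```python
def heat(events, prop, start, end, width):
    """Return (scalar, sparkline) for one property over the window [start, end).

    sparkline: event counts per fixed-width time bucket, oldest first.
    scalar: latest bucket's count relative to the peak bucket, in [0.0, 1.0].
    """
    n_buckets = -(-(end - start) // width)  # ceiling division
    counts = [0] * n_buckets
    for ts, p in events:
        if p == prop and start <= ts < end:
            counts[(ts - start) // width] += 1
    peak = max(counts, default=0)
    scalar = counts[-1] / peak if peak else 0.0
    return scalar, counts


# Hypothetical sample data: seconds since the start of the window.
events = [
    (0, "login"), (5, "login"), (7, "login"),
    (12, "login"), (14, "logout"),
    (25, "login"), (27, "login"),
]

scalar, sparkline = heat(events, "login", start=0, end=30, width=10)
print(scalar, sparkline)
```

The scalar would drive the color coding and the list would feed a sparkline; any other normalization (z-score against history, percentile, etc.) could be swapped in without changing the shape of the interface.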
It wouldn't be me reaching out but I'll put a bug in the right person's ear. This has been something I've been thinking about for a bit, the HN post is just a bit serendipitous.
Did you discuss the license with any free/open source software organization to get their input? Are other products adopting similar licenses, and if so, what are the differences?
I'm not fundamentally opposed to any deviation in licensing. But I am much less inclined to use software if the license hasn't been reviewed/endorsed/used by others widely.
Also, I'm curious if you considered just addressing the license to anyone excluding $COMPANY_YOU_ARE_WORRIED_ABOUT. Not sure what the implications of that would be, but it would be interesting. Realistically, a startup has to prove itself in a relatively short period of time, and in that short period of time there are generally very few competitors (typically zero or one) that threaten your business model itself. If you have 10 competitors that actually threaten your business model, your business has already proven itself ;-)
We launched the Timescale License about 1.5 years ago.
At that point we did engage with multiple folks in different organizations, but realized that we had a business to build and didn't have the bandwidth for all the politicking usually required to establish a standard.
However, if someone wants to propose a standard around the principles of the Timescale License, then I would completely support that discussion.
Also, so that we are 100% clear: most of our code base remains licensed under Apache 2. The main thing that this post is trying to convey is that multi-node will be free under the Timescale License.
(There is discussion elsewhere in this thread about whether you give DDL access to your users, i.e., they themselves define schemas, tables, indexes, etc. Otherwise, Datadog is primarily a Value Added Service over just the database; huge numbers of companies utilize Timescale for building their SaaS services.)
I imagine the model will be effective because it will help you to increase adoption, provide an easy path to transition to your cloud managed version and likely a lot of support/training opportunities.
Plus, it makes it easier to just start with Timescale even if you don’t need it because we all like to preoptimize.
> I think we're too hung up on OSI open source licenses.
I disagree. The free software criteria were defined as they are for a reason. AWS and other cloud vendors are taking advantage, but that is not a good reason to give up on the ideals of the movement. I would be much more comfortable contributing to timescaledb if the license had a date at which it expired to AGPL or some other OSI/DFSG/fourfreedoms license.
Ideals for the sake of ideals doesn't resonate with me, in software or outside of it. Give me a practical reason.
The expiry clause is interesting, but I'm not sure it matters in practice. Not many people want to use code several years out of date instead of the current version for practically no additional freedoms. Except maybe a potential competitor. I'd be happier to have a reversion to an OSI license if the product stops being maintained or gets acquired and shut down. That's always a risk with young companies.
> Ideals for the sake of ideals doesn't resonate with me, in software or outside of it. Give me a practical reason.
Fair enough.
I am mainly looking to prevent the timescale corp. from coasting off of their long past work. In such a scenario, the several years out of date code would not be that much different from the current version, because income for timescale would not have been spent on meaningful improvements.
Another benefit is as a check to the timescale corp. in case they start acting up. In such a case, a large contributor or user might pick up maintenance of an old version and start porting it to newer postgresql versions as leverage. Users of TimescaleDB could be reassured that timescale will not abuse them through the licensing situation because there is some backup plan.
Your legal ability to apply patches to the software you run to better suit your needs, and in extreme cases to fork and continue development if the maintainers can't or won't accommodate your usecase. It's kind of the entire point of the open source movement.
In principle, yes I agree, but in practice: no! AGPL is a minefield, it's a shame it is, but it is. The spirit of GPL (and AGPL) was that you could do whatever you want but you have to release the source, but they have continually been handicapped by the evolution of software. Existing licenses are not appropriate for SaaS businesses, they are either too liberal or too restrictive. What you're seeing here is the right move: I can use this for free, for personal use or business. I can't sell it directly -- why should I be able to? It's sleazy and encourages closed ecosystems (AWS, GCP, etc.).
The spirit of free software is very much alive in this decision, methinks.
> The spirit of free software is very much alive in this decision, methinks.
From the text of the Timescale License, clause 2.1 (d): "A license to prepare, compile, and test Derivative Works of the TSL Licensed Software Source Code solely in a Non-Production Environment". Further along, in section 2.2, the following prohibition is laid out: "You agree not to, except as expressly permitted in Section 2.1(d), prepare Derivative Works of any TSL Licensed Software"
That removes the freedom to run your own modifications in production. Pretty incompatible with the spirit of free software.
> ... it encourages users to upstream their changes.
Not really, and not as much. Not really because one cannot begin using their own changes unless and until the upstreaming process concludes successfully. Not as much because, unlike with open-source licenses, one does not get to keep their copyright.
One of the important reasons I personally use and support open-source is the freedom to not only inspect (which the TSL provides) but to also not have to ask someone else and wait on them to make any changes I need to the software I use. The restriction against production use prevents that.
One of the important reasons I personally don't mind contributing to open-source, is the fact that I get to retain my rights.
> This is only for the parts of the code that are licensed under the Timescale license (most code is not).
This is a moot point because without the parts that are TSL licensed, we'd not be having this discussion.
> The spirit of GPL (and AGPL) was that you could do whatever you want but you have to release the source, but they have continually been handicapped by the evolution of software.
This was largely my assessment, as well. And the motive behind the work that led to Parity (https://paritylicense.com).
The argument of go 100% FOSS or not reminds me of the argument of free market vs regulation. Yes a free market sounds ideal, but in practice, a few major players take advantage and everyone who isn't them is left out in the cold.
The same applies to FOSS vs Proprietary software. If everyone did FOSS to the fullest extent, and anyone building products on FOSS made their product 100% FOSS, then it wouldn't be nearly as big a deal if Amazon took TimescaleDB and sold it, because anything they did would be available to everyone else.
But that isn't how it works. They just wait until a market establishes itself and swoop in and do something someone else is already doing successfully, but better by using their massive resources behind it. Honestly it's a lot like embrace, extend, extinguish. I'm not saying this to shit on Amazon, I don't necessarily think it's intentionally malicious, it just often ends up hurting a lot of organizations building FOSS.
I think of it this way: A lot of people agree that tech giants are increasingly becoming more powerful as they expand their reach, often due to their ability to simply buy or kill the competition through undercutting with their massive cash stockpiles accumulated through their primary business.
For FOSS, it is the same strategy except the competition isn't being "bought", it's simply being taken (FOSS) and made into something "better" with their nearly unlimited resources, and then undercutting the original proprietors.
> The additional restriction in the timescaledb license that you can't run a paid database as a service offering affects hardly anyone negatively (AWS).
That's not the only additional restriction. The Timescale License does not give you the freedom to run modified versions of the code in production. Pretty big difference compared to Open Source.
I think I agree with this sentiment with one caveat: (I didn't read their license in detail, but really for licenses like this which seems to be the broader topic)
Today Timescale offers Timescale-as-a-service, so this allows them a kind of soft-monopoly on being a paid provider for this, but do these licenses generally contain a provision such that if they no longer provide that service themselves, whether from going out of business or a pivot to another product, then someone else could step in and offer it in the future? Closed source products have often had a kind of source-code escrow arrangement so that if they go out of business, you're not stuck unable to fix your own bugs, but similarly, if part of the value in adopting it is that the paid service IS available, knowing that someone else can offer a compatible service if they disappear might be a nice reassurance for the license to offer.
I don't think (most) people take issue with source available licenses per se, but rather attempts to misapply the term open source to them. The Timescale license is great if it allows them to provide source code and a free product while still operating a successful business! That doesn't make it open source though. (And to their credit, Timescale doesn't make that claim.)
I think you greatly misunderstand the implications of the Timescale License. Others have pointed out additional restrictions, but as to the restriction you mention, if it were the only additional restriction:
If I build my product on such software, host it on the official service, and later their business model changes to no longer be a good fit for my business model (or if my model changes), I am stuck hosting the fork myself - I can't pay someone else to host it anymore. Now let's say that I am not alone, and there are several other customers in this situation - we can't band together to create a fork with a thriving, healthy community, including hosting options.
Avoiding vendor lock-in is my number one reason for choosing Open Source software! If your non-open source "Diet" Open license leads to vendor lock-in, it is not much better to me than any other proprietary license.
I think you are being a bit generous on the causes and effects. The #1 concern of a product like this is adoption. Not in some unsustainable subsidized taxi or food delivery type thing, just that data stores are naturally sticky and come with long term opportunity as usage grows. If they dominate the time series use case, and there is good reason to believe they will, earning revenues will fall out of that in a multitude of ways. I will believe the license proves prophetic if a major cloud or minor cloud with deep pockets does a SaaS license. Until then, this is a pretty standard FOSS+support business model that became popular in the past decade.
I don't think it's FOSS+support, more like a cloud database service offering. Time will tell I suppose as to which business model dominates. And one could make the argument the service is a form of support.
My own experience is the majority of people are using it on their own cloud instances, on prem, or embedded. It's not obvious the first-party aaS will catch on right now, just like the novel license. I don't mean any of this negatively, it's clearly a well run business by smart people that are experimenting with revenue models and trying to achieve the fair outcomes for customers and the business.
What this doesn't touch on is the reality of selling enterprise software as an early startup - being open source is a hard requirement for many buyers.
> It affects us all positively
Except for users who want the highest quality hosted Timescale possible and see this license as an attempt to prevent others from creating better offerings. The open source companies that compete with cloud offerings are not exactly struggling.
Don't get me wrong, I don't have a problem with them using their position to prevent competition, but when people sell it as 'best for all the users', it feels disingenuous.
I work with a lot of time series tables in Postgres, albeit not at the scale that this targets. (some millions of rows, distributed sparsely over time, on which the median insert/update size is <10, but with some tail-end inserts/updates touching >200k rows).
I like the concepts behind TimescaleDB, and understand the value it's adding to vanilla Postgres. We have our own implementation at my company and it's quite good for our purposes, but it would certainly struggle at TDB's targeted scale.
As I understand it (correct me if I'm wrong, this is my impression from the marketing page), TimescaleDB is "more than an Extension" to Postgres, because it rewrites some of the Postgres internals (query parser, etc)?
If this is true, I'm curious, was it not possible to package the same results into an extension? What was the decision process like? Could the concept not be upstreamed into Postgres? I'm relatively ignorant of this side of the community, so please forgive me if this question is naive.
Finally, if it is "more than an extension", does this imply that TimescaleDB is a fork of Postgres, with all the risks to adoption that entails?
TimescaleDB is packaged as a Postgres extension. The "more than an extension" is meant to highlight that TimescaleDB makes changes and adds capabilities far beyond what the typical extension does.
Am I the only one who thinks it is really cool that this will be free but is hesitant to use it?
I have seen so many distributed data stores fail in a multitude of ways that I just don't trust anyone anymore. After 2-3 years they may have ironed out most bugs and I can evaluate again whether I trust their implementation to store my data safely.
This is why we built this on top of Postgres. It allows us to inherit Postgres reliability.
While I can't guarantee there won't be bugs ;-), we have found that building on Postgres has enabled a much higher level of reliability than other time-series databases.
One thing to recognize is that the lowest-level storage guts of TimescaleDB are Postgres, which really provides a super-stable, reliable foundation. This obviously doesn't avoid all distributed bugs, but it's a huge benefit.
It's also the case that TimescaleDB provides real benefit and scale even in "single-node" form, which allows for traditional primary/replica replication (for fault tolerance / HA / read replicas and scaling), especially when coupled with our native compression.
So we have users storing 100s of billions of rows in hypertables in the non-distributed version of TimescaleDB as well, including in our fully managed cloud service.
>All of these capabilities are being released under the Timescale License, our source-available license that permits broad usage, except for where organizations are providing TimescaleDB-as-a-service.
So it's not open-source because AWS hasn't been nice with ElasticSearch and they don't want to be in the same situation?
That sounds like open source to me, I bet they're just being really conservative about saying "open source" because there's been so much backlash at MongoDB/Cockroach/etc for similar restrictions.
It's restricted OSS because AWS takes things, runs them, and eats up all the potential revenue.
> That sounds like open source to me, I bet they're just being really conservative about saying "open source" because there's been so much backlash at MongoDB/Cockroach/etc for similar restrictions.
Open Source has a very defined meaning. Please read up on the history of open source and source available licenses before saying it is all the same.
We've been defending it against a number of attacks and we will probably do it again, so please don't get on the wrong side of history ;-)
Note: this is not a criticism of Timescale. I can see what they did, and they respectfully did not pretend it was Open Source. Compared to a proprietary license their license opens a lot of possibilities.
The term "open source" was marketing to take advantage of Netscape releasing their source code. Since then, everyone seems keen on trying to usurp its definition for whatever personal perspective they have that week.
To the rest of the community outside of the OSI and FSF (which is 99%+ of the software community), this is a perfectly acceptable example of "open source" that we're all that much richer for having.
The Timescale license checks almost all the boxes of the OSI definition (and I'm not certain how denying cloud providers specifically violates any of the language):
> To the rest of the community outside of the OSI and FSF (which is 99%+ of the software community), this is a perfectly acceptable example of "open source"
Please review clause 2.1 (d) and section 2.2. The freedom to run your own modifications in production is not granted. This is a big deal, and rightly a deal-breaking omission for something to be acceptable as either open source or free (as in freedom).
> The Timescale license checks almost all the boxes ...
_Almost_ all, but not all. Some things work only when all of them work, like freedoms.
Of course, this discussion is not about code that _is_ open-source. It is about the code that isn't but some people would like us to believe is — and I'm quoting my parent comment here — "a perfectly acceptable example of "open source"".
> The Timescale license checks almost all the boxes of the OSI definition
By my reading, it fails most of the interesting ones, particularly points 1, 3, 4, 6, and 9, due to the field-of-use restrictions and the prohibition on distributing modified versions.
> this is a perfectly acceptable example of "open source"
No! You cannot modify and give away the code or even run your own modifications in production. That is pretty far from both the letter and the spirit of open source..!
> that we're all that much richer for having.
Agree, thanks Timescale members for sharing it! Also I'm happy that you on the team have decided not to pretend it is Open Source.
My beef is only with people who want to pretend that it is OK to say that software that cannot be modified and used/distributed is open source.
Frankly the pedantry around the definition of Open Source, which I understand, is incredibly nauseating. Sure, this isn't Open Source by "the definition", but it's close enough if you squint. The difference doesn't impact almost anyone. Are you or someone you love impacted by this licensing decision?
Throw in an expiry date, dual licensing (pay to play seems more than fair) and I'm content. History be damned.
I'm so sick of it. Just because one group defines it a certain way doesn't make it gospel. The OSI has no power over me.
> Frankly the pedantry around the definition of Open Source, which I understand, is incredibly nauseating. Sure, this isn't Open Source by "the definition", but it's close enough if you squint. The difference doesn't impact almost anyone. Are you or someone you love impacted by this licensing decision?
If we had accepted this line of reasoning, Open Source would have been a synonym for source available by now.
For those who weren't there when it happened, you just have to believe us old timers that some companies tried to pass off all kinds of almost-open-source-but-you-are-still-trapped deals almost since the term was coined.
Now even Microsoft have learned but it seems the war against misinformation isn't over yet.
> Throw in an expiry date, dual licensing (pay to play seems more than fair) and I'm content. History be damned.
Fine. I'm not against everything except open source, and I'll happily use it, but why why why do you have to call it something that means something else?
Yep. I've been putting code out for free (MIT) for over a decade. I'm an open source developer. But I'll decide what gatekeeping I participate in for myself thank you
Maybe we just need to have "little O" open source. Unless someone's saying "Open Source" don't get in on splitting these hairs
> pedantry around the definition of Open Source, which I understand ... it's close enough if you squint
Then I think that while you might have read and understood the definition, you seem to have missed the broader idea behind it.
> pay to play seems more than fair
That's irrelevant. Sure it's fair, but it's fundamentally _not open source_.
It's not about gospel or having power over you. It's about communication and well established meanings. Calling a fish a bird doesn't make it a bird, and resisting attempts by others to redefine the language I use is a Good Thing as far as I'm concerned.
(To their credit, Timescale uses the terminology correctly and I greatly appreciate that. I also think they picked the right licensing model given how things seem to work these days.)
> you seem to have missed the broader idea behind it.
Funny, I feel that you might have missed the point as well.
> It's about communication and well established meanings.
Yes, but it's fundamentally impossible to bucket various licenses into what they do and do not do, and what obligations or burdens they place upon the end user. What is Open Source? The OSI includes GPLv3, which is certainly not "free" for a ton of commercial uses.
Let's peek at the OSI's FAQ:
> This history has led to occasional confusion about the relationship between the two terms. Sometimes people mistakenly assume that users of the term "open source" do not intend to communicate a philosophical point of view via that term, even though many actually do use it that way. Another mistake, which has occasionally been seen since about 2008, is to assume that "free software" refers only to software licensed under copyleft licenses, since that is how the FSF typically releases software, while "open source" refers to software released under so-called permissive (i.e., non-copyleft) licenses. In fact, both terms refer to software released under both kinds of license.
> Neither term binds exclusively to one set of associations or another, however; it is always question of context and intended audience. When you sense a potential misunderstanding, you may wish to reassure your audience that the terms are essentially interchangeable, except when being used specifically to discuss the history or connotations of the terminological difference itself. Some people also prefer to use the term "free and open source software" (or FOSS, FLOSS [free, libre and open source software]) for this reason.
Okay so, let's recap:
A) it's confusing.
B) the terms are often interchangeable but context matters.
C) not everyone agrees.
At this point, the value of any "Open Source Definition" is severely diluted for any considerable purpose. Just because a license meets OSI's definition doesn't mean I should make any assumptions about what I can or cannot do with it, so what value does this provide beyond adding confusion?
> Can I call my program "Open Source" even if I don't use an approved license?
> Please don't do that. If you call it "Open Source" without using an approved license, you will confuse people. This is not merely a theoretical concern — we have seen this confusion happen in the past, and it's part of the reason we have a formal license approval process. See also our page on license proliferation for why this is a problem.
I'd argue the confusion is already present. A license is a license is a license. OSI is an organization that says "please don't call your thing Open Source if it doesn't meet our standards" -- I don't care. Whether or not I comply with this polite request changes nothing, offers me no direct benefits.
To be clear: I see value in OSI, and everything they provide. They have definitely provided a net benefit to the world. I do not see value in pedantry around the term "open source", those words are so plain and ordinary that gluing any tertiary meaning to them is foolish. It's as subjective as "good code".
You are either severely misunderstanding or misrepresenting the FAQ that you quoted. It's simply clarifying that copy-left and permissive licenses are both compatible with the meaning of the term open source (ie they're disjoint subsets).
Regarding the "please only use approved licenses" bit - you've again missed the point. There is an effectively infinite set of possible licenses which satisfy the meaning of the term "open source" as it is currently used. The OSI is merely pointing out that it will make everyone's lives easier if developers try their best to use one of the licenses that already exists.
Licensing is a complex topic and not everyone is a lawyer or can afford to consult one (particularly for hobby and volunteer projects). If every project out there used a unique license it would be a complete nightmare. For that reason, it's better for everyone if at least some minimal attempt is made to use a well established license whenever possible.
> The OSI includes GPLv3, which is certainly not "free" for a ton of commercial uses.
Incorrect. Commercial customers are free to use, modify, and redistribute just like everyone else. They can even sell a product based on it - they just can't keep any changes they might make closed source if they do so.
> the value of any "Open Source Definition" is severely diluted for any considerable purpose
Not at all. The meaning is quite clear - I'm either free to use, modify, and redistribute it or I'm not. That's it.
Take a look at some of the approved licenses - for example AGPL, GPL, MPL, and MIT. In _all_ cases I'm free to modify and redistribute. In some cases I might be required to make my changes available, but never am I barred from making use as I see fit. Source available is simply not the same thing.
> I do not see value in pedantry around the term "open source", those words are so plain and ordinary that gluing any tertiary meaning to them is foolish. It's as subjective as "good code".
The fact that you see it as subjective is the misunderstanding that I refer to. It is _not_ subjective, but convincing others that it is can sometimes confer monetary benefits. This is precisely why such pedantry exists in great quantity surrounding the topic.
I don't really want to debate with you here, and if anything I feel like the points you are hammering on just fortify my stance that this topic is a waste of breath.
> The OSI is merely pointing out that it will make everyone's lives easier if developers try their best to use one of the licenses that already exists
Maybe, maybe not! Perhaps existing licenses are not sufficient. I doubt we'll be using the same licenses in 100 years. I bet there are better licenses waiting to be authored -- maybe TFA's license is the future?
> > The OSI includes GPLv3, which is certainly not "free" for a ton of commercial uses.
>
> Incorrect. Commercial customers are free to use, modify, and redistribute just like everyone else. They can even sell a product based on it - they just can't keep any changes they might make closed source if they do so.
Look, I live in the real world. If I link to a GPLv3 library in my product, I have to release all of my source code. This is potentially a pretty big burden on a lot of folks. Sure, there's a lot of legal FUD, but unfortunately that FUD has a very real impact. Lawyers won't sign off on a lot of this stuff.
> Take a look at some of the approved licenses - for example AGPL, GPL, MPL, and MIT. In _all_ cases I'm free to modify and redistribute. In some cases I might be required to make my changes available, but never am I barred from making use as I see fit. Source available is simply not the same thing.
AGPL is de facto banned at many companies, such as Google, which has this to say[0]:
> The license places restrictions on software used over a network which are extremely difficult for Google to comply with. Using AGPL software requires that anything it links to must also be licensed under the AGPL. Even if you think you aren’t linking to anything important, it still presents a huge risk to Google because of how integrated much of our code is. The risks heavily outweigh the benefits.
Gee, that sure sounds "open" to me.
Grouping AGPL and MIT in the same bucket is borderline harmful -- they're wildly different! This is what I mean when I say the term "open source" is a fuzzy descriptor. You can't have a fuzzy descriptor and then complain about things which don't fit your worldview. That's what the OSI basically does in a nutshell with their "Open Source Definition".
> It is _not_ subjective, but convincing others that it is can sometimes confer monetary benefits. This is precisely why such pedantry exists in great quantity surrounding the topic.
I agree we shouldn't accept anyone abusing the term for profit. At the same time, I don't think it's appropriate to conflate themes of "encumbered" or "burdensome" or "infectious" with the word "open" -- that is just as misleading and confers a different set of benefits that are not universally appreciated.
And one final thing: Just because I disagree with OSI's terminology doesn't make me incorrect. Statements like that come off as abrasive and trend towards a hostile, gatekeeping tone. The term you're looking for is "I disagree". It's easy to interpret your worldview as very small. If I polled a group of random software developers about what open source meant to them, I would be surprised if any of them referenced the OSI definition. Most developers would, sadly, conclude "stuff on github".
You say you don't want to debate here, but from my perspective you're actively spreading misinformation and FUD.
I agree that there might be better possible licenses out there - you might notice that I described the set of possible open source licenses as being infinitely large! The point is that you should go with an existing license for the good of the community unless you run into a limitation for your particular usecase that isn't adequately addressed.
Note that this has happened before! It's how the MPL (non-viral) and AGPL (anti proprietary SaaS) came about for example.
The one thing they all have in common is that they protect the user's right to modify and redistribute the software they receive. Yes, that necessarily places some limits on things in order to disallow abridging such rights for downstream users.
Moreover, there is indeed a balance between the degree to which such rights are preserved versus the number of restrictions the license must impose in order to accomplish its purpose. This is why a range from copyleft to permissive exists, with the MPL squarely in the middle. The presence of such nuance doesn't make the definition fuzzy or unclear though - there is a consistent protection of user freedom throughout, with restrictions existing only to further this goal. (Compare this to source available licenses, which carry additional restrictions unrelated to preserving user freedom.)
> Grouping AGPL and MIT in the same bucket is borderline harmful -- they're wildly different! This is what I mean when I say the term "open source" is a fuzzy descriptor. You can't have a fuzzy descriptor and then complain about things which don't fit your worldview. That's what the OSI basically does in a nutshell with their "Open Source Definition".
Again, this is a factually incorrect statement. You are verifiably and demonstrably wrong here. The definition of open source is consistent, and all of those licenses fit it. Source available licenses, on the other hand, do not.
You mention a bunch of objections you (and others) have to AGPL, GPL, etc. That's fine, and those licenses may not be right for you, but that doesn't somehow make them "not open source". Trying to shoehorn in some other definition by claiming that having issues for you or someone else makes them "not open" isn't a valid line of argument. The meaning of the term is very well established at this point and you are misusing it.
I realize you have (apparently) an ideological axe to grind against viral licenses. I don't particularly like them either, but that doesn't magically change the definition of an established term.
> The term you're looking for is "I disagree".
No, I used precisely the term I was looking for when I said that you were incorrect. It is true that I disagree with all of your following statements as a result though! You might legitimately hold that the term "open source" has a different definition than the one I use, but (as you might have gathered from what I wrote) I'm not even remotely convinced. In fact I made it clear that I hold such views to be ignorant, and that I believe you have fundamentally misunderstood the entire point of the open source movement. I can see how such a view might come off as abrasive, but that doesn't change it.
I am glad to hear there are more and more people who accept this new kind of open source, not the official definition of the OSI but more like the global idea behind it
> not the official definition of the OSI but more like the global idea behind it
The trouble is that it _isn't_ the "global idea behind it" - the idea behind open source is the unrestricted freedom to modify and reuse. Closed source, source available, and open source are quite distinct from one another. Terminology sometimes has well established meaning and that can be very important for effective communication.
I actually like proprietary source available software, but it isn't the same thing as open source and anyone claiming otherwise is simply ignorant of the very well established meaning of that term. Pedantry can be called for, particularly when a monetary incentive exists to confuse and deceive. Consider for example that the definitions of many food products are defined in law and regulated to protect the consumer from deceptive vendors.
(To their credit, Timescale gets the terminology right and I appreciate that. It's people in the HN comments section that are incorrectly throwing the term open source about and completely missing the point.)
I've heard several points for not choosing ClickHouse and going to TimescaleDB as an extension of PostgreSQL:
1. As already mentioned, if metadata (data about timeseries) is already in PostgreSQL, then it is nice to stay in the same database engine and query with joins across both metadata and timeseries data, so there is no need to integrate the two sources in the application layer.
2. Also related to the first item: the advantage of already knowing the PostgreSQL API. ClickHouse has a different management API, which you would have to learn. If you know PostgreSQL, you don't need to learn a new management API, only the timeseries-specific API of TimescaleDB.
3. ClickHouse doesn't support updating and deleting existing data in the same way as relational databases.
Then the final decision still depends on your need.
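To make point 1 concrete, here is a rough sketch of what "metadata and timeseries in one engine" buys you: a single join instead of stitching two sources together in the application. SQLite is used only as a stand-in so the snippet runs anywhere (it is not PostgreSQL or TimescaleDB), and the `devices` / `readings` tables and their columns are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL so the sketch is self-contained.
# The table and column names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE devices (id INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE readings (device_id INTEGER, ts TEXT, temperature REAL);
    INSERT INTO devices VALUES (1, 'roof'), (2, 'basement');
    INSERT INTO readings VALUES
        (1, '2020-10-01T00:00:00', 11.0),
        (1, '2020-10-01T01:00:00', 13.0),
        (2, '2020-10-01T00:00:00', 18.0);
""")


def avg_temperature_by_location(connection):
    """Join metadata (devices) with timeseries rows (readings) in one query."""
    rows = connection.execute("""
        SELECT d.location, AVG(r.temperature)
        FROM readings AS r
        JOIN devices AS d ON d.id = r.device_id
        GROUP BY d.location
        ORDER BY d.location
    """).fetchall()
    return dict(rows)


result = avg_temperature_by_location(conn)
print(result)
```

With two separate engines, the same result would need two queries plus a merge step in application code.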
The biggest reason is if you're using Postgres already as an operational database and want some timeseries/analytical capabilities.
Originally Timescale wasn't much more than automatic partitioning but with the new compression and scale out features, along with the automatic aggregations and other utilities, it can actually be pretty good overall performance. It still won't get you the raw speed of Clickhouse but instead you get all the functionality of Postgres (extensions, full SQL support, JSON, etc) and can avoid big ETL jobs.
Another PG extension is Citus which does scale-out automatic sharding with distributed nodes but is more generalized than Timescale for handing non-timeseries use-cases. Microsoft offers Citus on Azure.
Yes, along with Aiven and a few others. The free community license is great, but unfortunately it requires either using Timescale Cloud or running it yourself.
Another thing to mention is that TimescaleDB has much stronger ACID guarantees than ClickHouse, which means you get clearer consistency semantics.
If you use PostgreSQL, then it feels natural to add TimescaleDB extension and start storing time series or analytical data there alongside other relational data.
If you need to effectively store trillions of rows and perform real-time OLAP queries over billions of rows, then it is better to use ClickHouse [1], since it requires 10x-100x less compute resources (mostly CPU, disk IO and storage space) than PostgreSQL for such workloads.
If you need to effectively store and query large amounts of time series data, then take a look at VictoriaMetrics [2]. It is built on ideas from ClickHouse, but it is optimized solely for time series workloads. It has comparable performance to ClickHouse, while it is easier to set up and manage compared to ClickHouse. And it supports MetricsQL [3], a query language which is much easier to use than SQL when dealing with time series data. MetricsQL is based on PromQL [4] from Prometheus.
We spent about 6 months looking at pretty much every database tech on the market; Cockroach, ClickHouse, Influx, VoltDB, MemSQL etc. were top contenders. There was an outdated article on medium.com (by VictoriaMetrics) which slammed TimescaleDB for its disk usage. We did not realise it was biased, so TSDB dropped off the list. But then we saw an email about their compression (segment by device_id) and gave it a shot. We implemented it, and 5 months after our production release we now have outstanding performance and compression (95x).
We are planning to move the rest of our databases to TSDB now, as it ticks our boxes: our use case is HTAP, not solely OLAP or OLTP.
I'm super excited about this news, but TSDB please work on allowing us to put data over 1 year old on slow disk separate servers, so we can keep the hot stuff on the NVME servers, once you get this sorted it will be the perfect fit for us.
> TSDB please work on allowing us to put data over 1 year old on slow disk separate servers, so we can keep the hot stuff on the NVME servers, once you get this sorted it will be the perfect fit for us.
ClickHouse recently added multi-volume storage for exactly the use case you describe. [1] It's a great feature.
Glad to hear it is working out for you! I'll relay the request re: old data. But please also feel free to email me directly at ajay (at) timescale.com (or email support (at) timescale.com) if you have any follow up questions / requests.
I understand that the Timescale license can't be utilized by cloud providers, but what about others who need a timeseries database for their SaaS offering? Is this permitted as long as you aren't marketing a hosted TimescaleDB solution?
That would be permitted, as long as the service isn't just a "TimescaleDB-as-a-service." [0]
For example, if the service allows users to make only DML changes (access / modify data) then it is ok, but DDL changes (creating / modifying database schemas) are not permitted.
In fact, we already have 100s of SaaS companies using TimescaleDB as part of their offering.
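For readers not used to the DML / DDL split mentioned above, a rough sketch of the distinction follows. It only illustrates the two categories of SQL statement; it is not a reading of the license text. The keyword sets and the `classify_statement` helper are hypothetical, and a real service would more likely enforce this with database roles and privileges than by inspecting strings.

```python
import re

# Hypothetical keyword sets, for illustration only.
DML_KEYWORDS = {"SELECT", "INSERT", "UPDATE", "DELETE"}  # access / modify data
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP"}               # create / modify schemas


def classify_statement(sql: str) -> str:
    """Return 'dml', 'ddl', or 'unknown' based on the statement's first keyword."""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    if match is None:
        return "unknown"
    keyword = match.group(1).upper()
    if keyword in DML_KEYWORDS:
        return "dml"
    if keyword in DDL_KEYWORDS:
        return "ddl"
    return "unknown"


for statement in (
    "INSERT INTO readings VALUES (1, now(), 20.5)",
    "CREATE TABLE readings (device_id int, ts timestamptz, temperature float)",
):
    print(classify_statement(statement), "<-", statement)
```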
More specifically, the text of the license says you can't offer any service that is "primarily a database storage or operations product", even one that doesn't allow schema modifications.
If that wasn't what you intended to prohibit, you should probably fix the wording of section 3.21(i).
TimescaleDB core (if there is such a thing) is still available under Apache 2.0, so nothing has changed there. You can use it just like any other open source project, with no restrictions.
TimescaleDB multi-node, originally not free and only available in Timescale Cloud, is now freely available under the Timescale License, a source-available license.
The TimescaleDB multi-node license only forbids you from providing TimescaleDB multi-node itself as a service, and it does not allow running it with any changes that are not upstreamed. You can still resell any software or services built on top of TimescaleDB multi-node.
TimescaleDB multi-node - was never before released, is now released for free under the Timescale License, a source-available license
There are other capabilities (e.g., gap-filling) that are also under the Timescale License, in addition to multi-node.
The Timescale License prevents "TimescaleDB-as-a-service" usage.
You can still run software / services on top of Timescale Licensed software, as long as you are not offering "TimescaleDB-as-a-service".
The Timescale License currently prevents running any modifications in production, but we are actively debating removing that restriction (as I mention elsewhere).
Does TimescaleDB support automated downsampling using various functions (min/max/mean/avg) and then during querying automatically picking the correct downsampled data? This is the biggest issue that I and others have with InfluxDB, that it doesn't do that, so the only convenient way to use it is just to expire all data outside the retention policy. Ticket here: https://github.com/influxdata/influxdb/issues/7198
It allows you to define aggregations that are automatically used when querying the raw table if the query matches, and it also allows you to drop the raw data with a retention policy but keep the aggregated form (https://docs.timescale.com/latest/using-timescaledb/continuo...)
OK, but it looks like I still have to define these aggregates manually. I was really more talking about the standard use-case that folks used to use Graphite / rrdtool for: Keep track of real-time high-fidelity metrics while still being able to query aggressively-downsampled historical data for comparison, and doing so without having to configure anything.
Hi @heipei -- one thing to observe is that Graphite & rrdtool are designed for a specific monitoring use case, while TimescaleDB is a more general-purpose time-series database.
So what that means is that TimescaleDB has mechanisms to make it really easy to define downsampling (continuous aggregates, data retention policies), and even to have queries that transparently query across the historical aggregates and new raw data (real-time aggregates, which parent pointed to, and which aren't supported by InfluxDB).
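To make the real-time aggregate idea concrete, here is a minimal Python sketch. It is not TimescaleDB's implementation. It assumes a hypothetical `watermark` timestamp, aligned to a bucket boundary, before which all buckets have already been materialized; `BUCKET`, `bucket_start`, and `real_time_sum` are illustrative names only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

BUCKET = 60  # bucket width in seconds (illustrative value)


def bucket_start(ts: int) -> int:
    """Round a timestamp down to the start of its bucket."""
    return ts - ts % BUCKET


def real_time_sum(
    materialized: Dict[int, float],
    watermark: int,
    raw_rows: Iterable[Tuple[int, float]],
) -> Dict[int, float]:
    """Combine pre-computed bucket sums with raw rows not yet materialized.

    Assumes `materialized` only holds buckets that start before `watermark`
    and that `watermark` sits on a bucket boundary.
    """
    result = dict(materialized)  # copy so the caller's dict is not mutated
    fresh = defaultdict(float)
    for ts, value in raw_rows:
        if ts >= watermark:  # only rows the materialization has not covered
            fresh[bucket_start(ts)] += value
    result.update(fresh)
    return result


materialized = {0: 6.0, 60: 4.0}  # sums already stored for the first two buckets
raw_rows = [(10, 2.0), (30, 4.0), (70, 4.0), (125, 2.0), (130, 3.0), (185, 5.0)]
combined = real_time_sum(materialized, watermark=120, raw_rows=raw_rows)
```

Only the rows at or after the watermark are scanned, which is where the speedup over re-aggregating the whole raw table comes from.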
What the database _by itself_ doesn't do is automatically create certain continuous aggregates on metrics immediately, because frankly, users' needs vary so much.
That said, we have built stacks/solutions that leverage TimescaleDB and do precisely that. For example, we just released a design doc and beta around our refreshed native integration with Prometheus, that addresses an extremely similar use case to Graphite / rrdtool. Because now this is automated, it defines many of these things out-of-the-box, so you don't need to configure anything. Check it out and input welcome!
Thanks for the pointer. I truly understand that TimescaleDB is a general-purpose time-series DB and I understand that most use-cases are unique in that it makes sense to make these decisions about what and how to downsample consciously. However, I feel that there is a large audience of people who "just" want a database that they can point their system-metrics collector at (Telegraf), point their dashboard at (Grafana) and just hit "go", much like they would with something like Datadog, and have the confidence that they can still scale the database if it's ever necessary. Much like ElasticSearch provides default mappings (text/keyword/date/number), this would be a great 80-20 solution for the default use-case of "I want to collect system metrics from my hundreds of servers and have a few sensible defaults about granularity, downsampling and data-retention, and only then will I start to worry about whether that data will eventually exceed my one-server deployment."
Yep, that's exactly what the "Timescale Observability" stack is about. Type "helm install", and a full stack is spun-up and auto-configures to scrape information. You have graphs up in Grafana within 2 minutes, zero configuration.
To be clear, it was developed in a branch so all the individual commits have been reviewed beforehand when landing on that branch. And the branch was rebased throughout the development cycle. This is just the final PR to merge that branch back into master :)
In the current version, you can execute `compress_chunks` on each of the data nodes and enjoy those same savings (and will work transparently with queries, as before).
In subsequent releases, we'll add full support of compression, e.g., just create a compression policy on the access node and you are off and running.
Sounds great. So I just manually execute this `compress_chunks` command once on each data node and then I have compression enabled forever on those nodes?
> All of these capabilities are being released under the Timescale License, our source-available license that permits broad usage, except for where organizations are providing TimescaleDB-as-a-service.
Maybe someone can give clarification on this, but the line between using TimescaleDB to build a product and providing TimescaleDB-as-a-service seems incredibly blurry. If I have a product that in some way lets you query time series data, and that product is powered by TimescaleDB, would that count as providing TimescaleDB-as-a-service?
I used to work for Heap which is an analytics tool. In a way you can view Heap as just a wrapper around Postgres. We stored event data in Postgres and provided a UI that allowed you to express queries (e.g. count the number of logins over the past month). We would take the query in the UI, compile it into a SQL query, and run the SQL against Postgres. If Heap was powered by TimescaleDB, would that violate the Timescale License? In fact, you could technically view any dashboarding product that queries TimescaleDB as providing "TimescaleDB-as-a-service".
I looked at the actual license[0] to see what it says, and it seems really unclear. The license gives you permission to use TimescaleDB to develop "Value Added Products or Services" which it defines as a product that uses TimescaleDB as part of a larger offering. One of the requirements for a product or service to be considered "Value Added" is:
> (ii) such value-added products or services add substantial value of a different nature to the time-series database storage and operations afforded by the Timescale Software and are the key functions upon which such products or services are offered and marketed
This seems incredibly vague. What exactly does "substantial value of a different nature" mean? In the end, tons of products are just wrappers around DBs. If products like Heap or Datadog were to be backed by TimescaleDB, would they add "substantial value of a different nature" on top of it? In the end, Heap and Datadog are products designed for querying time series data. I could definitely make a case that they don't provide value of a different nature from TimescaleDB. This vagueness seems like a huge risk and without further clarification, makes me want to stay far away from TimescaleDB.
Hi @malisper, we totally appreciate concerns around potential uncertainty about what a "Value Added Service" means.
In fact, when we were looking at Timescale licensing, we took a careful look at what a lot of similar companies' licenses did here (Confluent, Redis, etc), and at what later became the Polyform License. Most of them left this definition pretty vague -- because frankly, legal language is never as precise (and perhaps shouldn't be) as what an engineer may like.
We went a step further, and tried to define this more precisely about what it means to "offer" TimescaleDB:
(iii) users of such Value Added Products or Services are prohibited,
either contractually or technically, from defining, redefining, or
modifying the database schema or other structural aspects of database
objects, such as through use of the Timescale Data Definition Interfaces,
in a Timescale Database utilized by such Value Added Products or
Services.
What that means is that if you've defined the Heap schema, you have built the indexes and tables, and then are offering a SaaS product on this, you're fine:
- You are offering a product/marketing SaaS service around usage/product analytics, not a time-series-database-as-a-service
- You are not approaching the market and saying, "Here's how to get TimescaleDB-as-a-service" (unlike, say, Managed TimescaleDB running on Rackspace or Digital Ocean), you are saying "Here's a full Product/Marketing Analytics Solution".
- You are not giving your users direct/psql access to the raw database to define their tables/schemas/indexes and otherwise just treat that service as a hosted TimescaleDB instance.
> We went a step further, and tried to define this more precisely about what it means to "offer" TimescaleDB
I don't understand how the bit you posted helped make things more concrete. Section 3.21, the section you referenced, lists three conditions, all of which have to be true for your product to be considered "Value Added". I agree the third condition, the one you quoted, is pretty clear. But the second condition, the one I quoted, seems really vague, so the definition of "Value Added" as a whole becomes really vague.
> What that means is that if you've defined the Heap schema, you have built the indexes and tables, and then are offering a SaaS product on this, you're fine.
FWIW, Heap would automatically create new tables for customers as they sign up and would also automatically create new indexes for customers as needed. For that reason alone, I'm pretty sure Heap would violate the Timescale license.
I agree that it's pretty difficult to be specific about what "value added" means. I'm not sure what the right solution is. I would still want to go over the Timescale License with an IP lawyer pretty thoroughly before I were to use TimescaleDB.
> FWIW, Heap would automatically create new tables for customers as they sign up and would also automatically create new indexes for customers as needed. For that reason alone, I'm pretty sure Heap would violate the Timescale license.
Nope! The user doesn't define or control that those tables and indexes are created. I.e., the user, through the Heap UI, doesn't say: I want a table with this schema and I want to create an index on (event_id, timestamp).
Given how much interest we had using TimescaleDB for this, we recently built and released (in beta) a new "full-stack" of Prometheus + TimescaleDB + Grafana that comes fully configured and "just works" out-of-the-box:
It is so nice that the design doc is open for commenting - so many good thoughts there. Thank you for sharing!
After reading it I have two questions:
1. While the integration with Prometheus sounds great, it still requires running a pretty complicated system behind it. Distributed TimescaleDB could require a lot of knowledge to operate, plus a connector that could become one more point of failure. Have you considered merging the connector into Timescale to make the setup simpler and more robust?
2. A significant part of my everyday work involves writing PromQL queries, and I often check week/month ranges while plotting timeseries. I have heard many complaints that remote-read can be very expensive when it touches a lot of data. Are you considering supporting PromQL in TimescaleDB to avoid the remote-read bottleneck?
Personally, I have had a good experience working with Thanos and VictoriaMetrics because of the seamless usage experience - same queries, same Grafana dashboards, same alert rules. Would love to see more products that support the same standards for timeseries data.
1. Even though it is newer, distributed TimescaleDB is probably more robust and easier to operate (and already more operationally mature) than other local storage options for Prometheus metrics, in part thanks to the underlying maturity of Postgres.
2. Yes, supporting PromQL directly (ie not via remote_read) is already in internal testing. Coming very soon.
Would really appreciate feedback if/when you get to try it out yourself. Please feel free to ping me directly: ajay (at) timescale.com
I've had luck with https://thanos.io/ for a big (~1 billion timeseries across all our DCs) Prometheus scale out project. Horizontally sharded Prometheus that can be queried and alerted on in a unified view with object store backend.
You may also want to check out https://eng.uber.com/m3 which is a highly available RF=3 multi-node TSDB metrics backend that is used with heavy Prometheus workloads to ingest tens of millions of timeseries per second.
The GPL puts no restrictions whatsoever on how you can use software that falls under it. Timescale's license, on the other hand, gives you very limited usage rights. You can use unmodified versions of the software, but you can't allow clients to make schema changes, nor can you use it to provide any service that is "primarily [a] database storage or operations product or service".
In addition, Timescale's license is much more restrictive about allowing derivative works. The GPL lets you create modified versions and/or reuse code in other products, no matter how extensive your changes, as long as the results are also GPL-licensed. Timescale's license lets you create modified versions, but you're not allowed to:
* make any changes that bypass "usage restrictions"
* use your changes in production
* distribute your changes in any way, except for assigning all the rights back to Timescale
Changes you make to GPL software only have to be provided under the GPL if you redistribute the work to others; if you keep it to yourself, run it yourself, etc, you are not required to release it as GPL.
"It has one added requirement: if you run a modified program on a server and let other users communicate with it there, your server must also allow them to download the source code corresponding to the modified version running there."
As the term you cited says, the AGPL only requires distribution if you are providing it to someone else, including as a service over a network. If you use it yourself, and don't distribute it (including over a network service) to anyone else, you still don't have to share your source.
> In addition, Timescale's license is much more restrictive about allowing derivative works. The GPL lets you create modified versions and/or reuse code in other products, no matter how extensive your changes, as long as the results are also GPL-licensed
I mean, to be fair and add some balance here, a lot of people find that part of the GPL to be very restrictive.
There are many organisations who have banned the use of GPL code altogether because of this, and also because of ambiguity in the license (e.g. the never ending debate about static and dynamic linking etc).
> I mean, to be fair and add some balance here, a lot of people find that part of the GPL to be very restrictive.
I often see comments like this, but they make no sense. If you don't agree with the GPL, then don't use software licensed under it. The same as if there's proprietary software you don't want to license, don't use it. There's nothing to debate.
> There are many organisations who have banned the use of GPL code altogether because of this
Good, they read the license and don't want to follow it, so they don't use it. Exactly as intended.
You're confusing the GPL viral nature, which is a central feature, with something different that you wished for, but isn't real.
And by the way, since Linux is GPL, those same companies almost always make an exception, don't they now?
I think this is a good move. But from an enforcement perspective, how realistic is it to prevent someone like Amazon from offering a clone service (at least for backend components) and claiming they wrote it from scratch? Is there any way to force them to reveal the source for a particular service?
Are there any tools to migrate from elasticsearch to timescale? We are considering a switch from our es and are evaluating options. Timeseries is also one of the contenders. We are not looking for text search, just some nested queries on timeseries data.
Disclosure: I work for Timescale, previously worked for Elastic
Pretty much any ETL tool you like could do this, as long as it speaks to elasticsearch and postgres.
Logstash (if you're using the ELK stack) can write to CSV or other formats as well as do any processing, but it doesn't have a JDBC output plugin, so you'd have to ingest with something else. Conversely, fluentd for example can output to Postgres, but doesn't have an elasticsearch input (at least that I could find), so you'd have to export from es with something else.
So it might be a couple of steps, though there are rich clients for most major programming languages for both elasticsearch and postgres. If your schema is fairly simple, rolling your own might not be too bad.
That said, the hardest part is likely massaging your data, if your elasticsearch schema is complex. Because you have to totally denormalize things for es (generally), you might have to unravel some of that going back into a relational database.
Can someone please summarize what it does, because I couldn't figure it out from the website? It says it's "on Postgres"; is it a flavor of PG, or does it sit on top of multiple PG instances?
TimescaleDB is a distributed time-series database that is packaged as a Postgres extension (a "mega-extension" to quote someone else on this thread).
TimescaleDB:
* Scales to over 10 million metrics per second [0]
* Supports native compression, using delta-delta, Gorilla, Simple-8B RLE, and other best-in-class compression algorithms (achieving a median 94% compression based on user data) [1]
* Offers native time-series capabilities, such as data retention policies, continuous aggregate views, real-time aggregates, downsampling, data gap-filling, and interpolation
* Handles high cardinality [2]
* Outperforms non-relational databases including InfluxDB [3], Mongo [4], Cassandra [5] for time-series data
With TimescaleDB you also get all of the goodness that is built into Postgres: full SQL, a variety of data types (numerics, text, arrays, JSON, booleans), ACID semantics, and operationally mature capabilities including high-availability, streaming backups, upgrades over time, roles and permissions, and security.
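As a rough illustration of why delta-of-delta encoding (one of the algorithms named in the list above) suits time-series data, here is a minimal Python sketch. It is not TimescaleDB's code; it only shows the transform itself, and the function names are made up for this example. Regularly spaced timestamps turn into runs of zeros and small integers, which a later stage such as Simple-8B RLE can pack tightly.

```python
from typing import List


def delta_of_delta_encode(values: List[int]) -> List[int]:
    """Return [first value, first delta, then second-order deltas]."""
    if not values:
        return []
    out = [values[0]]
    prev_delta = None  # no delta exists until the second value is seen
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        # The first delta is stored as-is; later ones as change in delta.
        out.append(delta if prev_delta is None else delta - prev_delta)
        prev_delta = delta
    return out


def delta_of_delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_of_delta_encode."""
    if not encoded:
        return []
    values = [encoded[0]]
    delta = 0
    for i, x in enumerate(encoded[1:]):
        delta = x if i == 0 else delta + x  # rebuild the running delta
        values.append(values[-1] + delta)
    return values


timestamps = [1000, 1010, 1020, 1030, 1041, 1051]
encoded = delta_of_delta_encode(timestamps)
print(encoded)  # [1000, 10, 0, 0, 1, -1]
```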
How does TimescaleDB work as a traditional OLTP db? Can I run general analytical queries on it and leverage its distributed nature? Or is it better for single table append only workloads?
Hypertables and Distributed Hypertables can be used to store any kind of data, but they work best when the data has a monotonically increasing partitioning key (e.g. time), a high ingest load, and few data modifications (preferably in bulk).
The beauty of TimescaleDB being built on Postgres is you can have your regular Postgres tables (OLTP schema) and time-series data (Hypertables) live side by side. Use 1 language (1 mindset) to query them, join them, work with them as you see fit. With Distributed Hypertables (what the post is about) you can now partition your data to live across multiple servers, and still use your 1 mindset to query all that data.
edit:
With the preferred workload you get the most out of TimescaleDB's advanced features like compression, continuous aggregates and data retention policies. You can use the aggregates to build complex auto-updating materialized views that are also used automatically when you query the raw tables (https://docs.timescale.com/latest/using-timescaledb/continuo...)
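For readers wondering why a time-like partitioning key matters, the following toy Python sketch shows the general idea of routing rows to time-bounded chunks and, in the distributed case, to a server. Everything here is an assumption for illustration: `CHUNK_INTERVAL`, `DATA_NODES`, the use of `device_id` as the second dimension, and the CRC32 hash are hypothetical choices, and TimescaleDB's real placement logic differs.

```python
import zlib
from typing import Tuple

CHUNK_INTERVAL = 7 * 24 * 3600      # hypothetical chunk width: one week, in seconds
DATA_NODES = ["dn1", "dn2", "dn3"]  # hypothetical data node names


def chunk_for(ts: int) -> Tuple[int, int]:
    """Return the [start, end) time range of the chunk a timestamp falls into."""
    start = ts - ts % CHUNK_INTERVAL
    return start, start + CHUNK_INTERVAL


def node_for(device_id: str) -> str:
    """Pick a data node by hashing a second (non-time) partitioning key."""
    return DATA_NODES[zlib.crc32(device_id.encode()) % len(DATA_NODES)]


# With an increasing time key, new rows keep landing in the newest chunk,
# so older chunks stay untouched and can be compressed or dropped as a unit.
row = {"ts": 1_600_000_000, "device_id": "sensor-42"}
placement = (chunk_for(row["ts"]), node_for(row["device_id"]))
```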
This sounds like the perfect fit to a write only event log table we stored in postgres at a previous employer. I pushed to move it to BigQuery but this sounds like it would have been fine.
"With real-time aggregation, when you query a continuous aggregate view, rather than just getting the pre-computed aggregate from the materialized table, the query will transparently combine this pre-computed aggregate with raw data from the hypertable that’s yet to be materialized. And, by combining raw and materialized data in this way, you get accurate and up-to-date results, while still enjoying the speedups that come from pre-computing a large portion of the result."
TimescaleDB makes heavy use of the PostgreSQL API and hooks, which expose many data structures, macros and functions. My understanding is that using Rust or even C++ would require writing a large FFI layer and maintaining it across PG major versions, which are released every year. Also, an FFI layer alone is unlikely to be enough; wrappers would have to be written on top of it to get the best out of Rust rather than just another syntax on top of C.
Short answer is: The 2.0 release won't natively support automated failover, although you can build around using PG tools like physical replication + Patroni. But these capabilities are certainly things we are working on.
Per the PR notes:
The current implementation has many more limitations
that will be addressed over time:
- HA and replication has to be managed node-by-node.
This will be improved with native replication.
You can utilise multinode data replication for high availability of data, however it is still necessary to use an external tool for HA of the access node, which distributes data and queries to data nodes.
An important clarification is that Azure, Digital Ocean, Rackspace (Object Rocket), Alibaba Cloud -- which all support managed TimescaleDB today -- only offer the Apache-2 version of TimescaleDB.
Many of the more advanced features of TimescaleDB, including these distributed options, are released under the Timescale License.
All code under the Timescale License is also source available and people are free to use, incorporate into their commercial SaaS services, distribute, etc. with the primary limitation being if you are offering TimescaleDB as a hosted DBaaS (like RDS, Azure Postgres, etc.)
Instead, Timescale Cloud is the place to get TimescaleDB advanced features as a fully managed DBaaS.
I was hoping that too but I think Amazon is still working on their time series database[1]. We registered for their preview in 2018 and it's still in preview with no access.
I would recommend looking at Aiven if you want to deploy Timescale on AWS (we use it to deploy on GCP, which is also missing the extension in their CloudSQL offering).
Very happy to see Timescale making more features available in the community edition.
When we first started evaluating time series databases a month or two ago, some features like continuous aggregation (rollups) were enterprise only. Perhaps their strategy is to drive adoption by letting people try their features out, hoping that some of these adopters will end up using their managed solution. I checked their pricing, and the delta between their pricing and the underlying AWS instance seems quite reasonable.
We ended up testing Influx first, because it seems to be a safe choice with wider adoption and extensive documentation.
With Influx, it was very easy to put together a prototype quickly. But once we started throwing some real workload at it, it would lose writes under load. It makes sense that it failed, because according to Influx's documentation (https://docs.influxdata.com/influxdb/v1.8/guides/hardware_si...), we would need a cluster to make it work. Influx is very transparent in their documentation that, without a cluster, writes and queries will fail immediately when a server is unavailable.
This isn't to say that Influx wouldn't work for other use cases. But at least in our use case, their open source offering isn't suitable for us, and it's unclear how much better the cluster version is.
Timescale, on the other hand, was able to handle the same workload under stress. As we are unable to backfill some of the ingressing data, it's quite vital that the system can degrade more gracefully.
For my use case, one feature that still needs some work in Timescale is their real time aggregation. It is currently impossible to define a rollup on top of another rollup, which means that if you are ingesting a lot of data into the raw table and you downsample into a wide time bucket (e.g. a day or a week), queries against these wider buckets can end up having to scan a lot of data points, slowing the system down considerably. Granted, it is a new feature that just got released about a month ago. Hopefully, with multi-node nearing completion, continuous aggregation will get a bit more love.
I spoke with their engineers about this over Slack, and their suggestion was to manually modify the rollup materialized view to aggregate over a combination of the materialized buckets (currently handled by the continuous aggregation) + real time aggregation from a higher resolution bucket.
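To illustrate what a rollup on top of a rollup has to carry, here is a small Python sketch, unrelated to TimescaleDB's internals. The assumption is that each fine-grained bucket stores mergeable partial state (sum, count, min, max) rather than a finished mean, because averaging hourly means does not give the daily mean. `Partial`, `rollup`, and `rollup_of_rollup` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

HOUR, DAY = 3600, 86400  # bucket widths in seconds


@dataclass
class Partial:
    """Mergeable aggregate state for one bucket."""
    total: float = 0.0
    count: int = 0
    lo: float = float("inf")
    hi: float = float("-inf")

    def add(self, value: float) -> None:
        self.total += value
        self.count += 1
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)

    def merge(self, other: "Partial") -> None:
        self.total += other.total
        self.count += other.count
        self.lo = min(self.lo, other.lo)
        self.hi = max(self.hi, other.hi)

    @property
    def mean(self) -> float:
        return self.total / self.count


def rollup(rows: Iterable[Tuple[int, float]], width: int) -> Dict[int, Partial]:
    """Aggregate raw (timestamp, value) rows into buckets of `width` seconds."""
    out: Dict[int, Partial] = {}
    for ts, value in rows:
        out.setdefault(ts - ts % width, Partial()).add(value)
    return out


def rollup_of_rollup(fine: Dict[int, Partial], width: int) -> Dict[int, Partial]:
    """Build wider buckets from finer ones without rescanning the raw rows."""
    out: Dict[int, Partial] = {}
    for start, partial in fine.items():
        out.setdefault(start - start % width, Partial()).merge(partial)
    return out


rows = [(0, 1.0), (10, 3.0), (3600, 10.0)]
hourly = rollup(rows, HOUR)
daily = rollup_of_rollup(hourly, DAY)
```

In this example the daily mean computed from merged state is 14/3, whereas the mean of the two hourly means (2.0 and 10.0) would be 6.0.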
We are still testing out Timescale, of course. But so far, it's been holding up its end of the bargain. The fact that Timescale is "just an extension" built for Postgres also makes it a less risky choice and offers a lot of flexibility; if Timescale doesn't work out, we could still work with Postgres, and that IMHO is a very nice thing.
Thanks for the feedback. Really glad to hear that your experience is going well with TimescaleDB! Feel free to ping me directly ajay (at) timescale.com if there's anything I can do to help.
Multi-node TimescaleDB is a great contribution to the open source world!
BTW, it would be great to compare multi-node TimescaleDB to VictoriaMetrics cluster [1], which is licensed under the vanilla Apache2 open source license [2].