djk447 · on Sept 22, 2022

NB: I work at Timescale.

TimescaleDB is a PostgreSQL extension, just to be clear.

djk447 · on Sept 22, 2022

Disclosure, I work at Timescale.

We are kindred spirits I think! I did this too [0] a while back at a previous company and it actually served as part of the inspiration for our compression work! It's fun, but a bit difficult to query at times. Our compressed columns do also get TOASTed and stored out of line.

I'm not sure that it's going to be much more efficient than the Timescale format once it's compressed. We have some pretty good compression algos and can generally achieve close to 10x compression, though I might be missing something about your case. Right now you can't write directly compressed data, though, so you would save on the write side I suppose.

It is true that you need to put the uncompressed version into memory at some point, but we do try to limit that, and in many cases you end up IO limited more so than memory limited. We're also thinking about doing some work to push processing down towards the compressed data. That's still in the "glint in our eye" stage, but I think it has a lot of promise.

(As a side note, TOAST is still the best acronym around ;) ).

[0]: https://www.youtube.com/watch?v=sPoz1OPuRUU

miohtama · on Sept 22, 2022

If I am already running a compressed filesystem like ZFS with zstd, can I disable TOAST and compressed columns altogether somehow?

https://dba.stackexchange.com/questions/315063/disable-toast...

djk447 · on Sept 22, 2022

Disclosure, I work at Timescale.

Though I didn't write this post, I'd imagine at least part of it is that it's already nearly 4000 words and a 15 minute read and we just didn't want to add another set of things to it, to be perfectly honest.

`pg_partman` is cool! I haven't used it in a while, and because it uses declarative partitioning, it has some locking issues that we address with our partitioning scheme, but implying that it is OSS and we're not in terms of things like data retention features is a bit misleading as well. The `drop_chunks` command used for data retention is in the Apache 2 licensed portion of Timescale.

ensignavenger · on Sept 22, 2022

But almost all of your posts and benchmarks are based on the closed source version of Timescale. Everywhere I have seen, it is always recommended to use the closed source version to get decent performance out of it.

akulkarni · on Sept 22, 2022

(Timescale co-founder)

Just to clarify: Nothing on Timescale is closed-source. It is all source available, all on GitHub. Some of it is Apache 2 licensed, some of it is Timescale Licensed. And it is all free.

ensignavenger · on Sept 22, 2022

Since the Timescale License is not an open source license, it is a closed source license. You are right, it is source available, but source available is also closed source. It is closed because it is not open. And it might be free as in beer, but it is not free as in freedom.

doctor_eval · on Sept 22, 2022

My recollection is that the TS license simply has protection against using the TS code to compete with TS, à la Amazon RDS.

While some people on HN feel that this is an impurity they can’t live with, I personally think it’s a small price to pay to enable development of TS to continue. In my opinion, claiming that it’s closed source is somewhat dogmatic. Many open source licenses have some kind of restrictions on use; the GPL comes to mind.

ensignavenger · on Sept 23, 2022

I have always been more on the pragmatic side of the FOSS movement (open source versus the philosophical/moral stance of the FSF, though I still have tremendous respect for the FSF). For pragmatic reasons I reject the Timescale License. It doesn't just prevent Amazon from hosting the software (that alone would be misguided enough) but also prevents anyone other than Timescale from hosting it for me. That means even if Timescale's current offerings lined up perfectly with my business, I would be locked into whatever decisions they make in the future, which may not align. The chances for a successful community fork are greatly reduced under the restrictions of the Timescale License. It makes it impossible for the community to make contributions on equal footing with Timescale, so anyone making contributions is just doing free work for a corporation, rather than contributing to a product the community benefits from just as much as the corporation.

tempnow987 · on Sept 23, 2022

Ah, I suppose you'd prefer they switched to Affero GPL v3 (which BTW IS open source and written by FSF itself) - check it out here:

https://www.gnu.org/licenses/agpl-3.0.en.html

This license turns out to be very difficult to use for almost any developer.

ensignavenger · on Sept 23, 2022

I prefer the Apache or Mozilla Public Licenses myself. But I accept the AGPL and the GPL family of licenses, and will consider using software licensed under them in limited ways (it isn't clear how the AGPL interacts with infrastructure software).

skeletal88 · on Sept 22, 2022

The source code is available. When someone says something is closed source, it usually means that the source code is not publicly available.

Do you want amazing things? Everything can't be "free as in beer". WTF does that even mean? I don't get free beer from anywhere.

teraflop · on Sept 22, 2022

"Open source" and "closed source" are not the only options. There are plenty of products out there where you're technically allowed to look at the source code, but very restricted in how you can legally use it. The "open" in "open source" is generally understood to mean that users have permission to use, modify and redistribute the software. (Without that permission, calling it "freeware", "shared source" or "source available" would be more accurate.)

That's what "free as in beer" means -- it's a well-established phrase meaning "zero monetary cost": https://en.wikipedia.org/wiki/Gratis_versus_libre

In the case of the non-Apache-licensed version of TimescaleDB, you're allowed to use the software without payment, and you can distribute unmodified copies. But you're essentially forbidden from letting users define their own schemas, or from modifying it or reusing components unless your modified version imposes that same restriction. (The exception is if you agree to transfer ownership of your changes back to Timescale.)

Nobody's saying that Timescale can't build a non-open-source database, only that they should be clear about which parts are actually open. In my opinion, describing it on the homepage as an "open-source relational database" and then promoting it by benchmarking the proprietary version is at least a little bit misleading.

ensignavenger · on Sept 22, 2022

It is a phrase that Richard Stallman created, and it is well known in free software communities: https://www.wired.com/2006/09/free-as-in-beer/, https://www.gnu.org/philosophy/free-sw.html

djk447 · on July 14, 2022

NB: Post author

So we thought about doing something like that with multinode, where each of the nodes would maintain their own materialization, but abandoned it for that very reason: it’s very, very difficult to maintain any sort of consistency guarantees in that case, or even to reason about it.

Instead we use the access nodes as coordinators to do the materialization. Right now the materialization only exists on the access node, but there’s no reason we couldn’t send it back out to the data nodes; you just need a coordination point to start a distributed transaction to have some semblance of a guarantee.

djk447 · on July 14, 2022

And glad you liked the article!

hodgesrm · on July 14, 2022

It was excellent. I work on ClickHouse and wrote something similar a while back for ClickHouse mat views. [0] It was not nearly as good, hence the appreciation.

Don't know if you do talks but I have a couple opportunities coming up at open source events. We like hearing about what people are doing outside of ClickHouse.

[0] https://altinity.com/blog/clickhouse-materialized-views-illu...

djk447 · on July 14, 2022

NB: Post author

Yes, this is generally handled automatically. There may be times, though, where you want to essentially pause refreshing the view for a while as you do some backfilling and then eventually let it catch up, especially if you're overwriting the same time period multiple times in a row. If you can insert in time order then it just breaks up re-calculation into smaller segments, which can be quite useful rather than having to process the whole data set again.

This can be a little bit different if you're doing compression, but with continuous aggregates I think it should work fine. I'm not 100% sure that was what you were looking for, let me know if it's not.
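
A minimal Python sketch of the behaviour described above, not Timescale's actual implementation: a toy hourly aggregate that remembers which buckets were touched by inserts, recomputes only those on refresh, and can be paused during a backfill. The class name `HourlySum`, the `paused` flag, and the one-hour bucket width are all assumptions made for illustration.

```python
BUCKET = 3600  # hypothetical bucket width in seconds (one hour)


def bucket_start(ts):
    """Return the start of the bucket that timestamp `ts` falls into."""
    return ts - ts % BUCKET


class HourlySum:
    """Toy materialized aggregate: sum of values per hourly bucket."""

    def __init__(self):
        self.raw = []            # (timestamp, value) rows
        self.materialized = {}   # bucket start -> sum
        self.invalid = set()     # buckets touched since the last refresh
        self.paused = False      # set True while backfilling

    def insert(self, ts, value):
        self.raw.append((ts, value))
        # Record which bucket is now stale instead of recomputing right away.
        self.invalid.add(bucket_start(ts))

    def refresh(self):
        """Recompute only the stale buckets; return the buckets recomputed."""
        if self.paused:
            return []
        dirty = sorted(self.invalid)
        for b in dirty:
            self.materialized[b] = sum(
                v for ts, v in self.raw if bucket_start(ts) == b
            )
        self.invalid.clear()
        return dirty
```

While `paused` is set, repeated overwrites of the same period only add to the stale set, so the bucket is recomputed once when refreshing resumes rather than once per overwrite.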

djk447 · on July 14, 2022

NB: Post author

I'm not 100% sure I understand what you're asking, but essentially something that would look for data modifications and at query time run the query over the older regions as well?

If that's what you're asking the answer is yes, we did consider it, but basically decided that it was something that relatively few people needed and the complexity and performance tradeoffs were unlikely to be worth it for most folks.

Essentially, we could do something like this now by looking at our invalidation log and running a join against it to get to a more strongly consistent state (I haven't thought through the full implications and whether it's truly strong consistency, I think it might be, but it'd require a proof / some thinking through of all of our locking logic to really get there). It's interesting to consider though.
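
A rough Python sketch of what "joining against the invalidation log at query time" could look like, under simplifying assumptions: the materialization is a dict of bucket sums, the invalidation log is a set of stale bucket starts, and there is no concurrency or locking at all. The function `consistent_read` and its parameters are hypothetical names, not part of TimescaleDB.

```python
def bucket_start(ts, width=3600):
    """Return the start of the bucket that timestamp `ts` falls into."""
    return ts - ts % width


def consistent_read(materialized, invalidated, raw_rows, width=3600):
    """Serve fresh buckets from the materialization and stale ones from raw data.

    materialized: dict of bucket start -> precomputed sum
    invalidated:  set of bucket starts modified since the last refresh
    raw_rows:     iterable of (timestamp, value) rows
    """
    # Keep only the materialized buckets that the log does not mark as stale.
    result = {b: s for b, s in materialized.items() if b not in invalidated}
    # Re-aggregate the stale buckets directly from the raw rows.
    for ts, value in raw_rows:
        b = bucket_start(ts, width)
        if b in invalidated:
            result[b] = result.get(b, 0) + value
    return result
```

The extra pass over raw rows for every stale bucket is where the query-time cost mentioned above comes from.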

djk447 · on July 14, 2022

NB: post author here!

Thanks, yes! Totally true. I was thinking about including some of that, but it felt like it opened a can of worms about join types and why certain things would be included and others not (i.e., an inner join needs to see that it's there on both sides whereas the left join doesn't), and the post was already kinda long.
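
To illustrate just the join point, here is a small Python sketch using plain dicts keyed by a join key. The helper names and the sample data are made up for illustration.

```python
def inner_join(left, right):
    """Keep a row only when the key is present on both sides."""
    return [(k, lv, right[k]) for k, lv in left.items() if k in right]


def left_join(left, right):
    """Keep every left row; the right side is None when there is no match."""
    return [(k, lv, right.get(k)) for k, lv in left.items()]


# Hypothetical sample data: readings per device, and a device-name lookup.
readings = {"dev1": 20.5, "dev2": 19.0}
names = {"dev1": "boiler"}
```

With this data, `inner_join(readings, names)` drops `dev2` because it has no entry in `names`, while `left_join(readings, names)` keeps it with `None` on the right side.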

djk447 · on June 21, 2022

NB - Timescale person here. Totally true! It's also a much harder problem :) One of the things that we try to focus on at Timescale is figuring out how we can simplify problems based on the specific needs of time-series data. Postgres has to solve things for very general cases, and sometimes that is just much harder. And then those general solutions often won't work all that well for time-series, because they're not all that optimized for it.

djk447 · on Feb 22, 2022

Totally fair and something that I'm actually forming a team to work on! We're starting with some very foundational material [1] that may well be review, and it's not as formal / professional as Mongo University or the like, but I am going to be continuing this course and then we'll be iterating more from there. I'd really love some feedback and also your questions, i.e., what you want to cover or what you find confusing. You can leave comments on the video or in our community Slack channel [2] or forum [3]. Thanks for the feedback and I hope we'll be able to do some of that for you over the coming months!

[1]: https://www.youtube.com/watch?v=tLJm2oStD9w
[2]: timescaledb.slack.com
[3]: https://www.timescale.com/forum/

akulkarni · on Feb 23, 2022

+1

We agree that the world needs more PostgreSQL and TimescaleDB educational content, and welcome any help!

miohtama · on Feb 23, 2022

TimescaleDB has some of the best documentation available for open source and super helpful developer advocates. They are also happy to reach out to dev teams using TimescaleDB (even open source version). We recently did a guest blog post with them:

https://www.timescale.com/blog/how-trading-strategy-built-a-...

pcthrowaway · on Feb 23, 2022

(original commenter suggesting course-style educational content here)

I agree completely with this. The documentation is great and the community on Slack has been incredibly helpful when I've posted there. Even so, I personally think a lot of people would benefit from course content like MongoDB provides.

akulkarni · on Feb 23, 2022

Thank you for the wonderful guest post (and the kind words!)

djk447 · on Oct 19, 2021

(NB: Post author here)

Glad you liked it! Please do give us feedback, especially about how it is to use with your library. I'll be intrigued to see how they interact. Also, cool to hear about the library more generally. Any particularly good syntax you think we should try to learn from?