Hacker News — vinext + Cloudflare Workers

new
past
show
ask
show
jobs
submit

▲Designing DB partitions you don't have to babysit (explainanalyze.com)

62 points by rtolkachev 4 days ago | 14 comments

Insimwytim 12 hours ago [-]

  The application probably still treats id as unique, but nothing in the schema guarantees it. And you can’t recover the guarantee with a separate UNIQUE (id) constraint: both MySQL and PostgreSQL require every unique constraint on a partitioned table to include the partition key columns. The uniqueness property has effectively been traded away.

Not really?

MySQL has AUTO_INCREMENT [1]

PostgreSQL has SERIAL [2] and CREATE SEQUENCE [3]

What am I missing?

[1] https://dev.mysql.com/doc/refman/8.4/en/example-auto-increme...

[2] https://www.postgresql.org/docs/18/datatype-numeric.html#DAT...

[3] https://www.postgresql.org/docs/18/sql-createsequence.html

toledocavani 11 hours ago [-]

And the post did mention it, argueing it may be better to partition by auto-incremented ID with the support of catch-all range and partution monitoring.

> The primary key already exists. For tables using BIGINT AUTO_INCREMENT, it’s monotonically increasing: newer rows have larger IDs. That’s the property range partitioning needs. The primary key is the partition key.

yasaheblasa 19 hours ago [-]

This is a topic that interests me a lot but there's a lot I find surprising since I finally started working with postgres dependent apps. Why for example is the id a good primary key? Joins are not uncommon, but I don't have anyone searching on id in my application and it is not even supposed to be user visible. I would think every possible user search would look at all partitions indexes if I did this instead of creation date.

jagged-chisel 17 hours ago [-]

Allow me to clarify a bit more.

You “search” for a record (perhaps based on username [and then you verify the password hash, but I digress]) and now you know the ID. Carry the ID in a session (the app doesn’t display this) and you can modify this specific user’s record. This is especially useful if you allow editing fields on which searches happen. Want to change a username? Great, you can.

If the username is your primary key, and you allow users to change them, there are edge cases and nuances and headaches … just use something immutable to identify records - autoincremented int, UUID, etc

yasaheblasa 14 hours ago [-]

I'll certainly mull this over. I think the point of including the creation time in the primary key was to eliminate lookups in old partitions, I.e. if an active order can only be 90 days old it can only span 2 year based partitions. A cache of what id is 91 days old would seem similar but is even more partition details affecting app queries than saying 90 days directly. Similarly, I don't cache ids in any sort of session, an API call reusing cache is another kind of layer violation I don't want to get into.

jagged-chisel 1 hours ago [-]

At least one UUID format has a time part that can be used for partitioning.

jagged-chisel 18 hours ago [-]

Your primary key is there to uniquely identify a record. You need additional indexes on fields you will search on.

djfobbz 20 hours ago [-]

Or you could just warehouse the daily data into something like ClickHouse and start fresh every day. It's built for this kind of workload and has demonstrated some absolutely insane analytical performance at massive scale. We're currently running it on an $170/month VPS, querying over 500+ billion rows daily without any issues. At that point, partitioning an ever-growing OLTP table starts looking like the harder problem.

mamcx 18 hours ago [-]

With the other comment about kdb+ and this show a misread of the article: The article AND ANY use of a columnar DB show the same issue: Point lookups are badly pessimized with both partitions on other columns and/or columnar.

That is why columnar DBs, not matter how impressive, are not used by OLTP workloads (and here this article point you can undo the advantages without pay attention at the consequences of partitions).

hilariously 14 hours ago [-]

Well that's not fair, there's been a bunch of tries at HTAP systems (hybridizing both an OLTP and OLAP system with some columnar bolt ons) but there's no free lunch converting between data formats - columnar formats usually have very useful things like run length encoding which is insanely fast to unroll but annoying AF to change.

djfobbz 17 hours ago [-]

[flagged]

rtolkachev 12 hours ago [-]

[flagged]

piterrro 20 hours ago [-]

For anyone interested in the topic I suggest reading about snowflake id [https://en.wikipedia.org/wiki/Snowflake_ID] or uuid7 the patterns from the article translate cleanly. The bigint is 64 bytes where uuid is 128. There are other caveats but its all about tradeoffs.

khurs 8 hours ago [-]

Technically - as Postgresql stores the UUID data type as binary, it's 16 bytes.

piterrro 2 hours ago [-]

Oh, I meant bits not bytes. You are right, the uuid in psql is 16 bytes, the bigint is 8 bytes.

onchainbuilder 15 hours ago [-]

[flagged]

leprechaun1066 21 hours ago [-]

Or just use kdb+ and 1bn rows a day is par for the course.

Rendered at 14:12:36 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.