Snowflake id generation for serverless runtime

chientrm
Feb 3, 2024

Let’s first compare the common approaches to generating ids: timestamp, hashing, UUID, and the Snowflake method.

Timestamp

Prepend the timestamp at which the id is generated, so that it forms the prefix of the id.

Pros

  • Inserted data is ordered by creation time.

Cons

  • Multiple ids generated in the same millisecond can clash.
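
To make the clash concrete, here is a minimal sketch. The `timestampId` helper and the `user` suffix are hypothetical, and the timestamp is passed in as a parameter so the behavior is deterministic.

```typescript
// Hypothetical helper: prefix a suffix with the millisecond timestamp.
// The timestamp is zero-padded so that string order matches numeric order.
const timestampId = (millis: number, suffix: string): string =>
  `${millis.toString().padStart(13, '0')}-${suffix}`;

const createdAt = Date.UTC(2024, 1, 3);
const firstId = timestampId(createdAt, 'user');
const secondId = timestampId(createdAt, 'user');
const laterId = timestampId(createdAt + 1, 'user');

console.log(firstId === secondId); // true: same millisecond, same id
console.log(firstId < laterId); // true: ids sort by creation time
```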

Hashing

Hash the content of the record and use the hash as part of the id.

Pros

  • Meaningful id.

Cons

  • Inserted data is not ordered by creation time.
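
As a rough illustration of a content-derived id, the sketch below hashes a record with FNV-1a, a small non-cryptographic hash chosen here only to keep the example dependency-free. The `fnv1a` and `contentId` names and the record shape are assumptions, not part of any library.

```typescript
// FNV-1a (32-bit) over UTF-16 code units, for illustration only.
const fnv1a = (text: string): string => {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
};

// Hypothetical record shape: the id depends on the content alone,
// so it carries no information about when the record was created.
const contentId = (record: { title: string; body: string }): string =>
  fnv1a(JSON.stringify([record.title, record.body]));

const draft = { title: 'Hello', body: 'First post' };
console.log(contentId(draft) === contentId({ ...draft })); // true
```

Identical content always maps to the same id, which is what makes the id meaningful, but two records created one after another get unrelated ids.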

UUID

Generate the id randomly.

Pros

  • Globally unique for practical purposes (collisions are extremely unlikely).
  • Suitable for distributed systems.

Cons

  • Meaningless id.
  • Long value (128 bits).

Snowflake id

Timestamp + worker id + sequence id.

Pros

  • Ordered by creation time.
  • Meaningful id (worker id).
  • Suitable for distributed systems.

Cons

  • Harder to implement.

Method

timestamp is the current timestamp at the moment the id is generated.

worker_id is a predetermined value for each machine or database (in the serverless case, this is the database id).

sequence_id is an `autoincrement` counter that resets to 0 when it hits the maximum value.

However, in a serverless environment we don’t have a store that keeps sequence_id between invocations, so the value has to be generated randomly.
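
For comparison, the counter-based scheme described above can be sketched as follows for a long-lived process. The `createGenerator` name is hypothetical, the 10-bit/12-bit split is an assumption that mirrors the encoding used in the Implementation section, and the timestamp is passed in to keep the example deterministic.

```typescript
// Assumed bit widths: 10 bits for worker_id, 12 bits for sequence_id.
const WORKER_BITS = 10n;
const SEQUENCE_BITS = 12n;
const MAX_SEQUENCE = (1n << SEQUENCE_BITS) - 1n; // 4095

// Hypothetical in-memory generator: the counter lives in the closure,
// which is exactly the state a serverless worker cannot rely on.
const createGenerator = (workerId: bigint) => {
  let lastTimestamp = -1n;
  let sequence = 0n;
  return (timestamp: bigint): bigint => {
    if (timestamp === lastTimestamp) {
      // Same millisecond: increment, wrapping to 0 at the maximum.
      // A production generator would also wait for the next millisecond
      // on wrap-around to avoid reusing a sequence value.
      sequence = (sequence + 1n) & MAX_SEQUENCE;
    } else {
      sequence = 0n;
      lastTimestamp = timestamp;
    }
    return (
      (timestamp << (WORKER_BITS + SEQUENCE_BITS)) |
      (workerId << SEQUENCE_BITS) |
      sequence
    );
  };
};

const nextId = createGenerator(5n);
const idA = nextId(1000n);
const idB = nextId(1000n);
const idC = nextId(1001n);
console.log(idA < idB && idB < idC); // true
```

With a shared counter, ids created within the same millisecond still sort in creation order, which is the property the random variant gives up.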

Implementation

We use a Snowflake table to record each generated (time, sequence) pair and to check that the combination is unique.

CREATE TABLE Snowflake(
  now uint64 NOT NULL,
  align uint16 NOT NULL,
  PRIMARY KEY (now, align)
);

The snowflake generator:

import { unique } from './unique';

// Math.floor keeps the result in [0, range), so align always fits in 12 bits.
const random_integer = (range: number) => Math.floor(Math.random() * range);
const max_align = 4096;
const epoch = Date.UTC(2023, 0, 1);

export const generateId = async (d1: D1Database, databaseId: number) => {
  while (true) {
    try {
      const now = Date.now(),
        align = random_integer(max_align);
      // The primary key (now, align) rejects a pair that was already issued.
      await d1
        .prepare('INSERT INTO Snowflake(now, align) VALUES(?1, ?2)')
        .bind(now, align)
        .run();
      const id = encodeId({ now, databaseId, align });
      return id;
    } catch (e) {
      if (unique(e)) {
        // delay 1 millisecond and retry
        await new Promise((resolve) => setTimeout(resolve, 1));
      } else {
        throw e;
      }
    }
  }
};

export const encodeId = ({
  now,
  databaseId,
  align
}: {
  now: number;
  databaseId: number;
  align: number;
}) => {
  // Layout: milliseconds since epoch | 10-bit databaseId | 12-bit align.
  const idBigInt =
      (BigInt(now - epoch) << BigInt(22)) |
      (BigInt(databaseId) << BigInt(12)) |
      BigInt(align),
    id = idBigInt.toString();
  return id;
};

export const decodeId = (id: string) => {
  // Reverse the shifts and mask out each field (0x3ff = 10 bits, 0xfff = 12 bits).
  const idBigInt = BigInt(id),
    now = Number(idBigInt >> BigInt(22)) + epoch,
    databaseId = Number((idBigInt >> BigInt(12)) & 0x3ffn),
    align = Number(idBigInt & 0xfffn),
    date = new Date(now);
  return { now, date, databaseId, align };
};

now and align are generated by the worker to offload computation from the database engine.

If a clash occurs, the function waits 1 millisecond and retries, repeating until the insert succeeds.
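
The `unique` helper imported from './unique' is not shown in this article. One possible sketch, assuming the database driver surfaces SQLite's "UNIQUE constraint failed" message text in the thrown error, could look like this:

```typescript
// Hypothetical sketch of the unique helper; the real one is not shown.
// Assumption: a primary-key violation is reported as an Error whose
// message contains SQLite's "UNIQUE constraint failed" text.
const unique = (e: unknown): boolean =>
  e instanceof Error && e.message.includes('UNIQUE constraint failed');

const clash = new Error(
  'UNIQUE constraint failed: Snowflake.now, Snowflake.align'
);
console.log(unique(clash)); // true
console.log(unique(new Error('network error'))); // false
```

Matching on message text is fragile, so it is worth checking how your runtime actually reports constraint violations before relying on it.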

Discussion

Instead of an autoincrementing sequence_id, we use a random align, which leads to a higher chance of clashing. We resolve clashes by retrying after 1 ms.

Although this implementation makes it possible to use Snowflake ids in a serverless environment, align becomes meaningless, and records inserted within the same millisecond are not ordered by creation time.
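
To get a feel for how often a clash happens, the birthday-problem estimate below computes the chance that at least two ids generated in the same millisecond draw the same align. It is a back-of-the-envelope sketch: `clashProbability` is a hypothetical name, and it assumes 4096 equally likely values and ignores retries.

```typescript
// Probability that at least two of `count` ids generated in the same
// millisecond pick the same value out of `slots` equally likely values.
const clashProbability = (count: number, slots = 4096): number => {
  let allDistinct = 1;
  for (let i = 0; i < count; i++) {
    // The i-th id must avoid the i values already taken.
    allDistinct *= Math.max(0, slots - i) / slots;
  }
  return 1 - allDistinct;
};

console.log(clashProbability(2)); // 0.000244140625, i.e. 1/4096
console.log(clashProbability(10)); // about 0.011
```

At a handful of inserts per millisecond a retry is rare, but the probability grows roughly with the square of the number of concurrent inserts.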
