Snowflake id generation for serverless runtime

Feb 3, 2024

Let’s first compare the common approaches to generating ids: timestamp, hashing, UUID, and Snowflake.

Timestamp

Prepend the timestamp at which the id is generated as a prefix of the id.

Pros

Records are inserted in order of creation time.

Cons

Ids generated within the same millisecond can clash.
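A minimal sketch of the timestamp-prefix approach and its clash. The helper name `timestampId`, the zero padding, and the `-` separator are assumptions made for illustration, not part of the original scheme.

```typescript
// Hypothetical helper: prefix an id with a zero-padded millisecond timestamp
// so that string ordering follows creation time.
const timestampId = (now: number, suffix: string): string =>
  `${now.toString().padStart(13, '0')}-${suffix}`;

// Two ids created in the same millisecond with the same suffix are identical.
const a = timestampId(1706918400000, 'user');
const b = timestampId(1706918400000, 'user');
console.log(a === b); // true: the two ids clash
```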

Hashing

Hash the content of the record and use it as part of the id.

Pros

Meaningful id

Cons

Inserted records are not ordered by creation time.
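As a rough illustration of a content-derived id, the sketch below uses a 32-bit FNV-1a hash as a stand-in for whatever hash function a real system would choose. The names `fnv1a` and `contentId` are hypothetical, and the function hashes UTF-16 code units, which matches byte-wise FNV-1a only for ASCII input.

```typescript
// Simplified 32-bit FNV-1a over the string's character codes (assumption:
// ASCII content; a production system would likely use a cryptographic hash).
const fnv1a = (content: string): number => {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < content.length; i++) {
    hash ^= content.charCodeAt(i) & 0xff;
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep 32 bits
  }
  return hash;
};

// Hypothetical id: the hash rendered as 8 hex digits.
const contentId = (content: string): string =>
  fnv1a(content).toString(16).padStart(8, '0');

console.log(contentId('a')); // "e40c292c"
```

The same content always yields the same id, but the id carries no information about when the record was created.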

UUID

Generate randomly.

Pros

Globally unique.
Suitable for distributed system.

Cons

Meaningless id.
Long value (128 bits).

Snowflake id

Timestamp + worker id + sequence id.

Pros

Ordered by creation time.
Meaningful id (worker id).
Suitable for distributed system.

Cons

Harder to implement.

Method

timestamp is the current timestamp.

worker_id is a predetermined value for each machine or database (in a serverless setup, this is the database id).

sequence_id is an `autoincrement` counter that resets to 0 when it hits its maximum value.

However, in a serverless environment we don’t have a store for sequence_id, so the value has to be generated randomly.
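For contrast, the stateful sequence_id described above could be sketched as follows. The name `makeSequence` and the 12-bit maximum are assumptions for illustration; the point is that the counter lives in process memory, which is exactly the state a serverless worker cannot rely on between invocations.

```typescript
// Assumed 12-bit sequence: valid values are 0..4095.
const MAX_SEQUENCE = 4095;

// Hypothetical in-memory counter: increments on each call and wraps back to 0
// after reaching MAX_SEQUENCE.
const makeSequence = () => {
  let sequence = -1;
  return (): number => {
    sequence = sequence === MAX_SEQUENCE ? 0 : sequence + 1;
    return sequence;
  };
};

const nextSequence = makeSequence();
console.log(nextSequence(), nextSequence()); // 0 1
```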

Implementation

We use a Snowflake table to record each generated (time, sequence) pair and to check that the combination is unique.

CREATE TABLE Snowflake(
    now uint64 NOT NULL,
    align uint16 NOT NULL,
    PRIMARY KEY (now, align)
);

The snowflake generator:

import { unique } from './unique';

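// Uniform integer in [0, range). Math.floor keeps the result strictly below
// range, so align always fits in its 12 bits.
const random_integer = (range: number) => Math.floor(Math.random() * range);
const max_align = 4096; // number of align values: 0..4095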
const epoch = Date.UTC(2023, 0, 1);

export const generateId = async (d1: D1Database, databaseId: number) => {
  while (true) {
    try {
      const now = Date.now(),
        align = random_integer(max_align);
      await d1
        .prepare('INSERT INTO Snowflake(now, align) VALUES(?1, ?2)')
        .bind(now, align)
        .run();
      const id = encodeId({ now, databaseId, align });
      return id;
    } catch (e) {
      if (unique(e)) {
        // (now, align) already taken: wait 1 millisecond, then retry
        await new Promise((resolve) => setTimeout(resolve, 1));
      } else {
        throw e;
      }
    }
  }
};

export const encodeId = ({
  now,
  databaseId,
  align
}: {
  now: number;
  databaseId: number;
  align: number;
}) => {
  const idBigInt =
      (BigInt(now - epoch) << BigInt(22)) |
      (BigInt(databaseId) << BigInt(12)) |
      BigInt(align),
    id = idBigInt.toString();
  return id;
};

export const decodeId = (id: string) => {
  const idBigInt = BigInt(id),
    now = Number(idBigInt >> BigInt(22)) + epoch,
    databaseId = Number((idBigInt >> BigInt(12)) & 0x3ffn),
    align = Number(idBigInt & 0xfffn),
    date = new Date(now);
  return { now, date, databaseId, align };
};
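To make the bit layout concrete, here is a small self-contained round trip using the same layout (timestamp offset shifted by 22 bits, 10-bit database id shifted by 12 bits, 12-bit align). The names `pack` and `unpack` are hypothetical stand-ins so the snippet runs on its own.

```typescript
const EPOCH = Date.UTC(2023, 0, 1);

// Same layout as the article: (now - epoch) << 22 | databaseId << 12 | align.
const pack = (now: number, databaseId: number, align: number): bigint =>
  (BigInt(now - EPOCH) << 22n) | (BigInt(databaseId) << 12n) | BigInt(align);

const unpack = (id: bigint) => ({
  now: Number(id >> 22n) + EPOCH,
  databaseId: Number((id >> 12n) & 0x3ffn),
  align: Number(id & 0xfffn),
});

// One millisecond after the epoch, database 1, align 1:
// 2^22 + 2^12 + 1 = 4194304 + 4096 + 1
console.log(pack(EPOCH + 1, 1, 1).toString()); // "4198401"

// align must stay below 4096: a value of 4096 spills into the databaseId bits.
console.log(pack(EPOCH, 0, 4096) === pack(EPOCH, 1, 0)); // true
```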

now and align are generated by the worker to offload computation from the database engine.

When a clash occurs, the function waits 1 millisecond and retries.
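The `unique` helper imported from './unique' is not shown in this post. One possible sketch, assuming the database reports a primary-key clash as an Error whose message contains SQLite's "UNIQUE constraint failed" text (an assumption about the error format, not a documented contract):

```typescript
// Hypothetical version of ./unique: returns true when the error looks like a
// uniqueness violation. The matched message text is an assumption.
const unique = (e: unknown): boolean =>
  e instanceof Error && e.message.includes('UNIQUE constraint failed');

console.log(unique(new Error('UNIQUE constraint failed: Snowflake.now, Snowflake.align'))); // true
console.log(unique(new Error('no such table: Snowflake'))); // false
```

Any other error is rethrown by the generator, so only genuine clashes trigger a retry.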

Discussion

Instead of an autoincrement sequence_id, we use a random align, which raises the chance of clashing. We resolve clashes by retrying after 1 ms.

Although this implementation makes Snowflake ids usable in a serverless environment, align becomes meaningless, and records inserted within the same millisecond are not ordered by creation time.
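To get a feel for how often a retry would be needed, the sketch below estimates the chance that at least two of k ids requested in the same millisecond for the same database draw the same align, assuming align is uniform over 4096 values. The function name `clashProbability` is hypothetical.

```typescript
const SLOTS = 4096; // number of possible align values

// Probability that k uniformly random aligns are not all distinct
// (birthday-problem estimate, before any retry).
const clashProbability = (k: number): number => {
  let allDistinct = 1;
  for (let i = 0; i < k; i++) allDistinct *= (SLOTS - i) / SLOTS;
  return 1 - allDistinct;
};

console.log(clashProbability(2)); // 1/4096, about 0.00024
console.log(clashProbability(10)); // roughly 0.011
```

Under this assumption clashes stay rare at low concurrency, which is why a short retry is usually enough.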

Written by chientrm