Base64 UID

Generate UID like YouTube.

Introduction

The library generates a unique identifier built from a 64-character alphabet. It is 10 characters long by default, and you can change the length. This gives 64^10 = 2^60 = 1 152 921 504 606 846 976 combinations.

To put this number in perspective: generating one identifier every microsecond, it would take about 36 559 years to go through all possible identifiers with a length of 10 characters.
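The arithmetic above can be checked with a few lines of plain PHP. This is only a sketch: it assumes a 64-bit PHP build (so that 64^10 fits in a native integer) and a 365-day year, and the variable names are illustrative.

```php
<?php
// Number of identifiers for a 64-character alphabet and a length of 10.
$combinations = 64 ** 10; // equals 2 ** 60 and fits in a 64-bit integer

// One identifier per microsecond means 1 000 000 identifiers per second.
$seconds = intdiv($combinations, 1000000);

// Assumption: a year is counted as 365 days.
$years = $seconds / (365 * 24 * 3600);

echo $combinations, " combinations\n";
echo round($years), " years\n";
```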

UUID works on the same principle, but its main drawback is that it is too long to be a convenient public identifier, for example in a URL. A UUID has 2^128 possible values. With a 64-character alphabet, 21 characters give 64^21 = 2^126 combinations and 22 characters give 64^22 = 2^132, so a 22-character identifier already covers the UUID range while being much shorter than the 36-character text form of a UUID. And if we take an identifier of the same length as the UUID, we get 64^36 = 2^216 combinations against 2^128 for the UUID.
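The comparison with UUID comes down to counting bits: each character of a 64-character alphabet carries 6 bits. The sketch below prints the bit count for a few lengths. The function name `identifierBits` is hypothetical and not part of the library.

```php
<?php
// Bits of information carried by an identifier of the given length.
// Hypothetical helper for illustration only.
function identifierBits(int $length, int $charsetSize = 64): float
{
    return $length * log($charsetSize, 2);
}

// 21 characters fall just short of 128 bits, 22 characters exceed them,
// and 36 characters match the length of a UUID in text form.
foreach ([21, 22, 36] as $length) {
    printf("%d characters: about %.0f bits\n", $length, identifierBits($length));
}
```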

The most important advantage of this approach is that you control the number of combinations yourself by changing the length of the string and the character set. This lets you optimize the length of the identifier for your business requirements.

Collision

The probability of collision of identifiers can be calculated by the formula:

p(n) ≈ 1 - exp(N * (ln(N - 1) - ln(N - n)) + n * (ln(N - n) - ln(N) - 1) - (ln(N - 1) - ln(N) - 1))

Where

  • N - number of possible options;
  • n - number of generated keys.

Take an identifier with a length of 11 characters, like YouTube, which gives us N = 64^11 = 2^66, and we get:

  • p(2^25) ≈ 7.62 × 10^-6
  • p(2^30) ≈ 0.0077
  • p(2^36) ≈ 0.9999

That is, by generating 2^36 = 68 719 476 736 identifiers you are almost guaranteed to get a collision.
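The figures above can be roughly reproduced in plain PHP. The sketch below does not evaluate the formula given earlier, because its logarithm differences cancel out in double precision. It uses the standard birthday-bound approximation p(n) ≈ 1 - exp(-n(n - 1) / (2N)) instead. The function name `collisionProbability` is hypothetical.

```php
<?php
// Birthday-bound approximation: p(n) = 1 - exp(-n * (n - 1) / (2 * N)).
// Hypothetical helper; $keys is n and $options is N from the text above.
function collisionProbability(float $keys, float $options): float
{
    // expm1() keeps precision when the probability is very small.
    return -expm1(-$keys * ($keys - 1) / (2 * $options));
}

$options = 2.0 ** 66; // 64 ** 11, an 11-character identifier

foreach ([25, 30, 36] as $power) {
    printf("p(2^%d) = %.2e\n", $power, collisionProbability(2.0 ** $power, $options));
}
```

The approximation agrees with the listed values to the precision shown there, but it is not a substitute for an exact big-number calculation.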

For calculations with large numbers, I recommend using an online big-number calculator.

Installation

Pretty simple with Composer, run:

composer require gpslab/base64uid

Usage

use GpsLab\Component\Base64UID\Base64UID;

$uid = Base64UID::generate(); // iKtwBpOH2E

With a length of 6 characters (64^6 = 68 719 476 736 combinations):

$uid = Base64UID::generate(6); // nWzfgA

A floating-length identifier gives more unique identifiers (64^8 + 64^9 + 64^10 = 1 171 217 378 093 039 616 combinations):

$uid = Base64UID::generate(random_int(8, 10));
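The combination count for a floating length is the sum over every allowed length. A minimal sketch of that arithmetic follows. The helper name `floatingLengthCombinations` is hypothetical, and the result is assumed to fit in a 64-bit integer.

```php
<?php
// Total number of identifiers when the length varies between $min and $max.
// Hypothetical helper; the sum must fit in a 64-bit integer.
function floatingLengthCombinations(int $min, int $max, int $charsetSize = 64): int
{
    $total = 0;
    for ($length = $min; $length <= $max; $length++) {
        $total += $charsetSize ** $length;
    }

    return $total;
}

echo floatingLengthCombinations(8, 10), "\n";
```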

You can customize the charset.

$charset = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/';
$uid = Base64UID::generate(11, $charset);

$charset = '0123456789abcdef';
$uid = Base64UID::generate(11, $charset);

Other algorithms for generating a UID

Random char

Generate a UID of fixed length by picking random characters from a charset.

$generator = new RandomCharGenerator();
$uid = $generator->generate(); // iKtwBpOH2E

Limit the length of the UID and the charset.

$charset = '0123456789abcdef';
$generator = new RandomCharGenerator(6, $charset);
$uid = $generator->generate(); // fa6c7d
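The idea behind this generator can be sketched in plain PHP. The function below is a hypothetical illustration, not the code of `RandomCharGenerator`. It assumes a non-empty charset made of single-byte characters.

```php
<?php
// Hypothetical helper, not the library's RandomCharGenerator.
// Assumption: $charset is non-empty and contains single-byte characters only.
function randomCharUid(int $length, string $charset): string
{
    $last = strlen($charset) - 1;
    $uid = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() draws from a cryptographically secure source.
        $uid .= $charset[random_int(0, $last)];
    }

    return $uid;
}

echo randomCharUid(6, '0123456789abcdef'), "\n";
```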

Random bytes

Generate random bytes and encode them in Base64.

$generator = new RandomBytesGenerator();
$uid = $generator->generate(); // YCfGKBxd9k4
$generator = new RandomBytesGenerator(5);
$uid = $generator->generate(); // Mm7dpkM
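A rough equivalent in plain PHP might look as follows. This is a hypothetical sketch, not the code of `RandomBytesGenerator`. Dropping the `=` padding and switching to the URL-safe alphabet are assumptions made here because the identifier is meant for URLs.

```php
<?php
// Hypothetical helper, not the library's RandomBytesGenerator.
function randomBytesUid(int $bytes): string
{
    $encoded = base64_encode(random_bytes($bytes));

    // Assumption: drop the "=" padding and use the URL-safe alphabet.
    return rtrim(strtr($encoded, '+/', '-_'), '=');
}

echo randomBytesUid(8), "\n"; // 8 bytes encode to 11 characters
echo randomBytesUid(5), "\n"; // 5 bytes encode to 7 characters
```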

Encoded random bits

Generate a bitmap of random bits and encode it in Base64. The bitmap is 64 bits long, so it requires a 64-bit processor architecture.

$binary_generator = new RandomBinaryGenerator(32);
$encoder = new HexToBase64BitmapEncoder();
$generator = new EncodeBitmapGenerator($binary_generator, $encoder);
$uid = $generator->generate(); // 7MWx2BuWJUw

Encoded bitmap of time

Generate a bitmap containing the current time in milliseconds and encode it in Base64. The bitmap is 64 bits long, so it requires a 64-bit processor architecture.

$binary_generator = new TimeBinaryGenerator();
$encoder = new HexToBase64BitmapEncoder();
$generator = new EncodeBitmapGenerator($binary_generator, $encoder);
$uid = $generator->generate(); // koLfRhzAoI0
$uid = $generator->generate(); // zALfRhzAovg
$uid = $generator->generate(); // 18LfRhzAoQw

The generated bitmap has the following structure:

{first bit}{random prefix}{current time}{random suffix}
  • first bit - a marker bit that fixes the size of the bitmap;
  • prefix - random bits placed before the time. Their number is configured with $prefix_length;
  • time - bits of the current time in milliseconds. Their number is configured with $time_length;
  • suffix - random bits placed after the time. Their number is calculated as 64 - 1 - $prefix_length - $time_length.
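The layout above can be sketched with integer shifts. The function below is a hypothetical illustration, not the code of `TimeBinaryGenerator`. The default lengths (9 and 41 bits) are arbitrary choices for the example, and the time is passed in milliseconds because that is the unit the timestamp tables in this section correspond to.

```php
<?php
// Hypothetical sketch of the layout above; not the library's TimeBinaryGenerator.
function timeBitmap(int $time, int $prefixLength = 9, int $timeLength = 41): int
{
    $suffixLength = 64 - 1 - $prefixLength - $timeLength;
    if ($prefixLength < 0 || $timeLength < 1 || $timeLength > 62 || $suffixLength < 0) {
        throw new InvalidArgumentException('Prefix and time must fit in 63 bits.');
    }
    if ($time < 0 || $time >= (1 << $timeLength)) {
        throw new OutOfRangeException('Time does not fit in the allotted bits.');
    }

    $bitmap = 1; // first bit: fixes the bitmap size at 64 bits
    $bitmap = ($bitmap << $prefixLength) | random_int(0, (1 << $prefixLength) - 1);
    $bitmap = ($bitmap << $timeLength) | $time;
    $bitmap = ($bitmap << $suffixLength) | random_int(0, (1 << $suffixLength) - 1);

    return $bitmap; // negative in PHP because bit 63 is set
}

// Assumption: the time is passed in milliseconds since the Unix epoch.
$bitmap = timeBitmap((int) floor(microtime(true) * 1000));

// Big-endian bytes, then Base64 without padding: 11 characters.
echo rtrim(base64_encode(pack('J', $bitmap)), '='), "\n";
```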

Choose the number of bits allocated to the current time carefully. The $time_length defines the latest date that can be stored:

| Bits limit | Maximum available bitmap | Unix timestamp (ms) | Date |
|---|---|---|---|
| 40 bits | 1111111111111111111111111111111111111111 | 1099511627775 | 2004-11-03 19:53:48 (UTC) |
| 41 bits | 11111111111111111111111111111111111111111 | 2199023255551 | 2039-09-07 15:47:36 (UTC) |
| 42 bits | 111111111111111111111111111111111111111111 | 4398046511103 | 2109-05-15 07:35:11 (UTC) |
| 43 bits | 1111111111111111111111111111111111111111111 | 8796093022207 | 2248-09-26 15:10:22 (UTC) |
| 44 bits | 11111111111111111111111111111111111111111111 | 17592186044415 | 2527-06-23 06:20:44 (UTC) |
| 45 bits | 111111111111111111111111111111111111111111111 | 35184372088831 | 3084-12-12 12:41:29 (UTC) |

To reduce the number of bits needed to store the time, you can use a $time_offset, which moves the starting point of time:

| Offset (milliseconds) | Offset date | Maximum available date for 41 bits |
|---|---|---|
| 0 | 1970-01-01 00:00:00 (UTC) | 2039-09-07 15:47:36 (UTC) |
| 1577836800000 | 2020-01-01 00:00:00 (UTC) | 2089-09-06 15:47:36 (UTC) |
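The dates in both tables follow from simple arithmetic on the timestamps, which are counted in milliseconds. A minimal sketch, with a hypothetical helper name `maxDate`, that rounds to the nearest second as the tables do:

```php
<?php
// Last moment that fits into $timeLength bits, given an offset in milliseconds.
// Hypothetical helper for illustration only.
function maxDate(int $timeLength, int $offsetMs = 0): string
{
    $maxMs = (1 << $timeLength) - 1 + $offsetMs;

    // Round to the nearest second, as the tables above do.
    return gmdate('Y-m-d H:i:s', (int) round($maxMs / 1000));
}

echo maxDate(41), "\n";                // 2039-09-07 15:47:36
echo maxDate(41, 1577836800000), "\n"; // 2089-09-06 15:47:36
```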

Encoded bitmap of floating time

It is similar to the previous generator, TimeBinaryGenerator, but the position of the time bits floats: the lengths of the prefix and suffix are chosen randomly on each call. Identifiers generated at the same moment look less alike, but the likelihood of a collision increases.

$binary_generator = new FloatingTimeGenerator();
$encoder = new HexToBase64BitmapEncoder();
$generator = new EncodeBitmapGenerator($binary_generator, $encoder);
$uid = $generator->generate(); // 5mqhb6MPH7g
$uid = $generator->generate(); // kFvow8joJys
$uid = $generator->generate(); // 8QRC30YeP3E
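The floating position can be sketched by drawing the prefix length at random on every call. This is a hypothetical illustration, not the code of `FloatingTimeGenerator`. It assumes the same first-bit marker as the fixed layout and a time length between 1 and 62 bits.

```php
<?php
// Hypothetical sketch; not the library's FloatingTimeGenerator.
// Returns [bitmap, prefix length] so the caller can see where the time landed.
function floatingTimeBitmap(int $time, int $timeLength = 41): array
{
    $free = 64 - 1 - $timeLength;         // bits shared by prefix and suffix
    $prefixLength = random_int(0, $free); // the split changes on every call
    $suffixLength = $free - $prefixLength;

    // Marker bit followed by the random prefix.
    $bitmap = (1 << $prefixLength) | random_int(0, (1 << $prefixLength) - 1);
    // Time bits; values that are too large are truncated to $timeLength bits.
    $bitmap = ($bitmap << $timeLength) | ($time & ((1 << $timeLength) - 1));
    // Random suffix fills the remaining bits.
    $bitmap = ($bitmap << $suffixLength) | random_int(0, (1 << $suffixLength) - 1);

    return [$bitmap, $prefixLength];
}

[$bitmap, $prefixLength] = floatingTimeBitmap((int) floor(microtime(true) * 1000));
echo rtrim(base64_encode(pack('J', $bitmap)), '='), " (prefix: $prefixLength bits)\n";
```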

Snowflake-id

Snowflake-id combines the current time in milliseconds with a generator ID. This lets you assign a distinct ID to each generator in your environment and reduces the likelihood of a collision, but the identifiers are very similar to each other and reveal the layout of your internal infrastructure. Snowflake-style IDs are used by Twitter, Instagram, and others.

$generator_id = 0; // value 0-1023
$binary_generator = new SnowflakeGenerator($generator_id);
$encoder = new HexToBase64BitmapEncoder();
$generator = new EncodeBitmapGenerator($binary_generator, $encoder);
$uid = $generator->generate(); // gBFKQeuAAAA
$uid = $generator->generate(); // gBFKQeuAAAE
$uid = $generator->generate(); // gBFKQevAAAA
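The structure of such an identifier can be sketched as follows. This is a hypothetical illustration, not the code of `SnowflakeGenerator`. The layout is an assumption: 1 marker bit, 41 time bits, 10 generator bits (matching the 0-1023 range above) and 12 sequence bits as in Twitter's Snowflake. The time value and the plain Base64 encoding are chosen so that, under these assumptions, the output has the same shape as the sample identifiers above.

```php
<?php
// Hypothetical sketch; not the library's SnowflakeGenerator.
// Assumed layout: 1 marker bit, 41 time bits, 10 generator bits, 12 sequence bits.
function snowflakeBitmap(int $time, int $generatorId, int $sequence): int
{
    if ($time < 0 || $time >= (1 << 41)) {
        throw new OutOfRangeException('Time needs more than 41 bits.');
    }
    if ($generatorId < 0 || $generatorId > 1023) {
        throw new OutOfRangeException('Generator ID must be in 0-1023.');
    }
    if ($sequence < 0 || $sequence > 4095) {
        throw new OutOfRangeException('Sequence must be in 0-4095.');
    }

    return (1 << 63) | ($time << 22) | ($generatorId << 12) | $sequence;
}

function encodeBitmap(int $bitmap): string
{
    // pack('J') writes the 64 bits as big-endian bytes.
    return rtrim(base64_encode(pack('J', $bitmap)), '=');
}

// Same time unit, next sequence number: only the tail changes.
echo encodeBitmap(snowflakeBitmap(1160316846, 0, 0)), "\n"; // gBFKQeuAAAA
echo encodeBitmap(snowflakeBitmap(1160316846, 0, 1)), "\n"; // gBFKQeuAAAE
// Next time unit: the sequence starts again from zero.
echo encodeBitmap(snowflakeBitmap(1160316847, 0, 0)), "\n"; // gBFKQevAAAA
```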

Domain-driven design (DDD)

How to use it in your domain.

For example, create an ArticleId value object:

class ArticleId
{
    private $id;

    public function __construct(string $id)
    {
        $this->id = $id;
    }

    public function id()
    {
        return $this->id;
    }
}

Repository interface for Article:

interface ArticleRepository
{
    public function nextId();

    // more methods ...
}

Concrete repository for Article:

use GpsLab\Component\Base64UID\Base64UID;

class ConcreteArticleRepository implements ArticleRepository
{
    public function nextId()
    {
        return new ArticleId(Base64UID::generate());
    }

    // more methods ...
}

Now we can create a new entity with ArticleId:

$article = new Article(
    $repository->nextId(),
    // more article parameters ...
);

License

This library is released under the MIT license. See the complete license in the LICENSE file.