A ‘short’ or ‘hash’ ID is a seemingly random sets of characters, popularised by services which need unique but readable (and typeable) URLs. For example, YouTube use short IDs for their video URLs:
I found myself needing to generate short IDs for URLs in a recent project. These were the requirements:
- Be short enough to type
- Be easy enough to speak (e.g. over the phone in support situations)
- Be unique (or close to unique as feasible)
- Not contain any rude words
I wasn’t looking at generating millions of IDs per day, all I needed was something simple.
Since I was using node.js to generate these IDs, the first port of call was npm, to see if there were any projects which already existed. My search yielded two main results:
These libraries didn’t really fit the requirements, however:
- shortid looks like a good fit, but it is over-engineered for what I need. You need an alphabet size of 64 characters, and there is no control over the minimum number of chars generated.
- node.hashids generates IDs from a sequence, and is intended to be a reverse-lookup for a number. I din’t need this, and so again it felt over-engineered.
Rolling my own
Under the guise of the KISS principle, I set about creating my own generator. Turns out, for my use case, it was rather straightforward.
Math.random() to generate an 8-character ‘random’ ID, using a pre-defined alphabet.
1 2 3 4 5 6 7 8 9 10 11
Math.randomcreates a number between
Math.random() * alphabet.lengthcreates a number anywhere between
0and just less than
Math.floor()takes the floating point number and returns the nearest integer, thus ending up with an integer between
alphabet.length - 1(in this case,
alphabet.charAt(index)returns the character in the string
alphabetat the given index.
This was a good start, and satisfied requirement 1) – it creates an 8 character ID which is easy to type.
Modifying alphabet size
Requirement 2) specifies it should be easy to speak over the phone. This will be useful in support situations – e.g. if a user needs assistance, they should be able to speak the ID to an operator without too many difficulties.
The first thing to trim is the lowercase/uppercase, which could cause confusion. IDs look nicer with lowercase, so the uppercase was removed.
Next up, numbers that look like letters. This is another source of confusion, and so the numbers
1 and the letters
l were removed.
Lastly, rude words, aka requirement 3). It could be quite embarrassing to generate an ID with a rude word. Sticking to English profanities, it turns out that most rude words are formed from the majority of the following characters:
cfhistu. These characters were also removed.
Our final alphabet contains 25 characters.
Uniqueness is a difficult goal to achieve. I was quite relaxed about requirement 4) – my ID generator would not need to generate millions of IDs per day.
Since we are generating an 8-character ID, the theoretical probability of a clash using the alphabet above of 25 chars is 258 = 152,587,890,625 – over 152 billion to 1.
Math.random() is a pseudorandom generator, which means it is not guaranteed to always produce random numbers, the probability will drop slightly. This being said, it is good enough for my needs.
As a failsafe, a function can be added to check that the generated ID is indeed unique:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
For this to work, you should pass in an array of previously generated IDs.
If the list of IDs was stored in a database, you could generate the ID and then check for its existence in the DB before continuing.
Generating your own ‘short’ ID needn’t be complicated. Depending on your requirements, a simple solution can often suffice.
The full code for this generator can be found here.