Creating Unique Random Numbers
-
Posted on April 6, 2012 by Derek Dieter
-
1
This is typically a hot topic and I’m going to try and tackle it with my rudimentary math. Ultimately given time and the frequency of generation, there is no such thing as a completely unique random number. There will always be some chance that a random number can be regenerated even though the chances do go way down when given a larger set of bytes.
Using Numbers
Let’s first look at probably the most random way to generate a random number. Using a function introduced in SQL 2008 called CRYPT_GEN_RANDOM(). This function takes a parameter of byte length, and returns a random hex based number within that range of bytes. We can easily convert this varbinary data to an integer based value. This function is based upon a low level windows API and is cryptographically secure.
[cc lang=”sql”]
SELECT CAST(CRYPT_GEN_RANDOM(8) AS bigint)
[/cc]
Using the method above returning an 8 byte random number and casting it as a bigint, your chances of repeating a duplicate are (roughly) between 1 and 15 billion. To break that out in terms of time, if you generated a random number every second you would without a doubt hit a duplicate random number between 31 and 47 years. However there’s no guarantee that it might not happen way before or way after however the chances go down the farther you move from the mean. Even though this may sound like a long time, the randomness that a bigint provides may not be enough for your application. In which case you may choose to possibly generate 2 bigints (if you wanted to go the number route) or you would have to settle for either varbinary or a GUID. Also note that these statistics do assume that you are taking advantage of the negative range of numbers bigint provides.
Using Numbers – Method 2
Let’s assume you want to provide customers with a random number. Another method you could use is to populate a table of integers and hand them out randomly while marking the ones that have been handed out. You could populate the table by looping through the number of integers you want to provide. When selecting an integer specify the TOP 1 clause along with calling NEWID(). This will provide a random ordering of the table.
[cc lang=”sql”]
SELECT TOP 1 rand_cust_id
FROM cust_numbers
WHERE is_given = 0
ORDER BY NEWID()
[/cc]
This query would return a single customer number that is randomly selected. It also guarantees the same number will not be selected which is something the other methods cannot provide. You would need to update the is_given flag in this method or you could alternatively place the given number in a separate table and perform a NOT EXISTS against that table when selecting numbers.
Using Binary
Aside from the previous method using binary will provide the highest chances for a unique id. The CRYPT_GEN_RANDOM function can return a maximum of an 8000 byte number. I can’t imagine ever needing something to that precision, and I won’t even begin to try to calculate it. I will say I can’t imagine needing anything over 16.
[cc lang=”sql”]
SELECT CRYPT_GEN_RANDOM(8000)
[/cc]
Using a GUID
We got a partial taste of the GUID method above when we used the NEWID() function. This method returns a Globally Unique IDentifier which is based off a 16 byte number. Our bigint above was based off an 8 byte number however don’t think they are double the randomness. According to my calculations (don’t crucify me I may be off) but if you generate a guid every second, you will not repeat it until 1 billion millenia.
To create a GUID simply call NEWID()
[cc lang=”sql”]
DECLARE @guid uniqueidentifier;
SET @guid = SELECT NEWID();
[/cc]
That does it. No I didn’t talk about RAND() because I guess I don’t see a need for it with CRYPT_GEN_RANDOM. RAND() I hear can also be considered deterministic, especially when given a seed value.
Oh, how did I come up with the numbers? I actually did it the hard way. I generated random numbers for tinyint, smallint, and int. When I got enough samples, I realized the mean of the numbers generated before a collision was roughly (n ^ 2) * .85 for twice the bytes. It may not be exact science, but I believe it’s somewhere in the ballpark.
- Comments (RSS)
- Trackback
- Permalink
Thank you for this article, it’s very reassuring to know that the probability of getting a duplicate random number using GUID in SQL is extremely small.