In a recent ODC thread someone had a piece of SQL that was calling dbms_random.string(‘U’,20) to generate random values for a table of 100,000,000 rows. The thread was about how to handle the ORA-30009 error (not enough memory for operation) that is almost inevitable when you use the “select from dual connect by level <= n” strategy for generating very large numbers of rows, but this example of calling dbms_random.string() so frequently prompted me to point out an important CPU saving , and then publicise through this blog a little known fact (or deduction) about the dbms_random.string() function.
If you generate a random string of length 6 using only upper-case letters there are 308,915,766 different combinations (266); so if you’re after “nearly unique” values for 100 million rows then a six character string is probably good enough – it might give you a small percentage of values which appear in a handful rows but most of the values are likely to be unique or have two rows. If you want to get closer to uniqueness then 7 characters will do it, and 8 will make it almost certain that you will get a unique value in every row.
So if you want “nearly unique” and “random 20 character strings” it’s probably sufficient to generate random strings of 6 to 8 characters and then rpad() them up to 20 characters with spaced – the saving in CPU will be significant; roughly a factor of 3 (which is going to matter when you’re trying to generate 100 million rows. As a little demo I supplied the OP with a script to create a table of just one million random strings – first of 20 random characters, then of 6 random characters with 14 spaces appended. The run time (mostly CPU) dropped from 1 minute 55 seconds to 41 seconds.
Why is there such a difference ? Because to generate a random string of 6 characters Oracle generates a random string of one character six times in a row and concatenates them. The difference between 6 calls and 20 calls per row gives you that factor of around 3. For a quick demo, try running the following anonymous PL/SQL block:
rem rem Script: random_speed.sql rem Author: Jonathan Lewis rem Dated: May 2010 rem begin dbms_random.seed(0); dbms_output.put_line(dbms_random.string('U',6)); dbms_output.new_line; dbms_random.seed(0); dbms_output.put_line(dbms_random.string('U',1)); dbms_output.put_line(dbms_random.string('U',1)); dbms_output.put_line(dbms_random.string('U',1)); dbms_output.put_line(dbms_random.string('U',1)); dbms_output.put_line(dbms_random.string('U',1)); dbms_output.put_line(dbms_random.string('U',1)); end; /
Jere are the results I got from instances of 12.1.0.2, 12.2.0.1, and 18.1.0.0 (from LiveSQL):
BVGFJB B V G F J B
I haven’t shown the tests for all the possible dbms_random.string() options but, unsurprisingly, changing the test to use the ‘L’ (lower case alpha) option produces the same effect (and the same 6 letters changed to lower case). The same effect, with different characters, also appeared using the ‘A’ (mixed case alpha), ‘X’ (uppercase alphanumeric) and ‘P’ (all printable characters) options.
I haven’t considered the effect of using a multi-byte character set – maybe Oracle calls its random number generator once per byte rather than once per character. The investigation is left as an exercise to the interested reader.
When generating a very large number of random strings – keep the “operational” part of the string as short as you can and leave the rest to be rpad()‘ed.
Recent comments
1 year 4 weeks ago
1 year 16 weeks ago
1 year 20 weeks ago
1 year 21 weeks ago
1 year 25 weeks ago
1 year 47 weeks ago
2 years 15 weeks ago
2 years 44 weeks ago
3 years 29 weeks ago
3 years 29 weeks ago