PostgreSQL - 大规模随机数据生成方法
背景
简单多行整数类
随机值及有限范围值的生成
日期类
综合示例
背景
大规模随机生成数据,对于经常发生的验证性的工作非常有用。适用于数据库架构调研、原型(POC)测试、性能测试等等,还有一些结论的现场验证。
建议常加练习和掌握。借助于generate_series()函数和random()函数,可以生成很多近似随机的大批量数据。而对于其它DBMS来说,基本原理和思路也一样, 都是相通的。
简单多行整数类
mydb=# select generate_series(1, 10000, 5) limit 5;generate_series-----------------16111621(5 rows)
随机值及有限范围值的生成
随机整数:select random() * (2*10^9)::int
mydb=# select random() * (2*10^9);?column?-------------------777211312.0870543(1 row)随机bigint: (random() * (9*10^18))::bigint
mydb=# select (random() * (9*10^18))::bigint;int8---------------------2220773689139246336(1 row)随机numberic , 采用: (random() * 100.)::numeric(4,2)
mydb=# select (random() * 100.)::numeric(4,2);numeric---------74.03(1 row)随机长度的重复字符串
mydb=# select repeat('ab', (random()*20)::integer);repeat----------------ababababababab(1 row)-- 来个binary的随机有限长的mydb=# select repeat('a', (random()*40)::integer)::bytea;repeat------------------\x61616161616161(1 row)随机长度的子串
mydb=# select substring('abcdefghijiklmnopqrstuvwxyz', 1, (random()*25)::integer);substring--------------------------abcdefghijiklmnopqrstuvw(1 row)真正随机长度的字符串
CREATE OR REPLACE FUNCTION random_string( int ) RETURNS TEXT as $$SELECT string_agg(substring('abcdefghijklmnopqrstuvwxyz', round(random() * 25 + 0.5)::integer, 1), '') FROM generate_series(1, $1); $$ language sql;postgres=# select random_string(100);random_string-----------------------------------------------------------------------------------------------------npdfilfpbvigpbhbetnjtkeeelcylrpahankustqtwllqtefdkqbeuqvcbjxfsnxhjsiqcnnfomcqvcplgigklrjfafpivftxxa(1 row)从几个固定串或值中选取某个值 (利用ARRAY)
-- 整数值mydb=# select (array[1,2,3,4,5])[0.5 + random() * 5], generate_series(1,5) as key;array | key-------+-----3 | 12 | 22 | 33 | 42 | 5(5 rows)-- 字符串值mydb=# select (array['Beijing', 'Nanjing', 'Shanghai', 'Wuhan'])[0.5 + random()*4];array---------Nanjing(1 row)mydb=# select (array['Beijing', 'Nanjing', 'Shanghai', 'Wuhan'])[0.5 + random()*4];array---------Beijing(1 row)
日期类
-- 从今年1月1号起的180天内的随机日期mydb=# select '2022-01-01'::timestamptz + random() * '180d'::interval, generate_series(1, 5) as key;?column? | key-------------------------------+-----2022-04-12 15:05:00.547293+08 | 12022-01-17 08:13:13.222297+08 | 22022-02-07 19:37:10.470187+08 | 32022-02-11 19:53:54.713278+08 | 42022-04-06 14:19:22.065327+08 | 5(5 rows)-- 现在起一个月的每一天mydb=# select date(generate_series(now(), now() + '1 month', '1 day')) limit 5;date------------2022-07-252022-07-262022-07-272022-07-282022-07-29(5 rows)-- 每10秒为一条记录, 生成一段时间(几个月)以内的数据mydb=# select extract(epoch from seq)::int, seq from generate_series('2022-01-01'::timestamptz,'2022-05-31 23:59:59'::timestamptz, interval '10 seconds') as seq limit 5;extract | seq------------+------------------------1640966400 | 2022-01-01 00:00:00+081640966410 | 2022-01-01 00:00:10+081640966420 | 2022-01-01 00:00:20+081640966430 | 2022-01-01 00:00:30+081640966440 | 2022-01-01 00:00:40+08(5 rows)-- https://www.postgresql.org/docs/14/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT-- 每秒来一条mydb=# SELECT EXTRACT(EPOCH FROM '2022-07-25 05:00:00'::timestamptz);extract-------------------1658696400.000000mydb=# select extract(epoch from n)*1000 as id, n from generate_series('2022-07-26 05:00:00', '2022-07-26 05:01:00', interval '1 millisecond') as n limit 5;id | n----------------------+----------------------------1658782800000.000000 | 2022-07-26 05:00:00+081658782800001.000000 | 2022-07-26 05:00:00.001+081658782800002.000000 | 2022-07-26 05:00:00.002+081658782800003.000000 | 2022-07-26 05:00:00.003+081658782800004.000000 | 2022-07-26 05:00:00.004+08(5 rows)-- 试着造一批数据到一个csv文件里头mydb=# \copy (select extract(epoch from n)*1000 as id, n from generate_series('2022-07-26 05:00:00', '2022-07-26 05:01:00', interval '1 millisecond') as n) to '/pgccc/tmp/t.dat' CSV header;COPY 60001-- 看一下文件内容的前10行mydb=# \! head -10 pgccc/tmp/t.datid,n1658782800000.000000,2022-07-26 05:00:00+081658782800001.000000,2022-07-26 05:00:00.001+081658782800002.000000,2022-07-26 05:00:00.002+081658782800003.000000,2022-07-26 05:00:00.003+081658782800004.000000,2022-07-26 05:00:00.004+081658782800005.000000,2022-07-26 05:00:00.005+081658782800006.000000,2022-07-26 05:00:00.006+081658782800007.000000,2022-07-26 05:00:00.007+081658782800008.000000,2022-07-26 05:00:00.008+08
综合示例
将几种随机值组合起来.
mydb=# select generate_series(1,10) as key, (random()*100.)::numeric(4,2), repeat('1',(random()*25)::integer);key | numeric | repeat-----+---------+-------------------------1 | 38.24 | 11111111111111111111112 | 18.16 | 1111111113 | 53.70 | 11111111111111114 | 57.86 | 1111111111111111115 | 42.24 | 1116 | 54.62 | 111111111111111111111117 | 85.26 | 111111111118 | 98.51 | 11111111111119 | 1.98 | 111111111110 | 0.47 | 111111(10 rows)
利用上述查询,可以非常快速生成随机数据文件。
文章转载自数据库杂记,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




