null 虽作为数据库里一个非常常见的数值对象,但却常常让人感到很迷惑。一段时间可能理清楚了,过段时间可能又混了。这里把null值再做个简单总结,方便后面搞不清时再来看看。
定义
在OCP的教程里,Null值被这样定义:
Null is a value that is unavailable, unassigned, unknown, or inapplicable.
Null is not the same as zero or a blank space.
Null是一个不可用、未分配、未知或不适用的值,Null既不等于0 也不等于空格。
测试表
create table person(id int,married_status varchar2(30),region varchar2(10)); insert into person values(1,'single','east'); insert into person values(2,'married','west'); insert into person values(3,'single','south'); insert into person values(4,'married','north'); insert into person values(5,'single',''); insert into person values(6,null,'west'); commit; HR@orcl>select * from person; ID MARRIED_STATUS REGION ---------- ------------------------------ ---------- 1 single east 2 married west 3 single south 4 married north 5 single 6 west 6 rows selected.
复制
Null值基本查询
1) 查询region字段为null的行,使用 is null ,使用=null是错误的。 select * from person where region is null; select * from person where region=null; HR@orcl>select * from person where region is null; ID MARRIED_STATUS REGION ---------- ------------------------------ ---------- 5 single 如下=null查询结果为空 HR@orcl>select * from person where region=null; no rows selected 2) 查询region字段不为null的行,使用 is not null ,使用!=null是错误的 select * from person where region is not null; select * from person where region !=null; HR@orcl>select * from person where region is not null; ID MARRIED_STATUS REGION ---------- ------------------------------ ---------- 1 single east 2 married west 3 single south 4 married north 6 west 如下!=null查询结果为空 HR@orcl>select * from person where region !=null; no rows selected
复制
Null与反向查询
查询region不为"south"的行 第一反应会是想到如下SQL语句 select * from person where region !='south'; 如下查询结果是4行,是不包括region 为空的行 HR@orcl>select * from person where region !='south'; ID MARRIED_STATUS REGION ---------- ------------------------------ ---------- 1 single east 2 married west 4 married north 6 west 如果认为region 为空那肯定不是south的话,那么SQL语句应该加上 or region is null 条件才是正确的。 select * from person where region !='south' or region is null; HR@orcl>select * from person where region !='south' or region is null; ID MARRIED_STATUS REGION ---------- ------------------------------ ---------- 1 single east 2 married west 4 married north 5 single 6 west 另外关于反向查询大家会想到 not in或者 not exists,那这两者会有何区别呢 先使用 not in,region结果不包含NULL值等同于 !='south' 查询方法 select * from person where region not in (select t.region from person t where t.region='south'); HR@orcl>select * from person where region not in (select t.region from person t where t.region='south'); ID MARRIED_STATUS REGION ---------- ------------------------------ ---------- 4 married north 6 west 2 married west 1 single east 再使用 not exists, region结果包含了NULL值等同于 !='south' 加上 or region is null 条件查询方法 select * from person a where not exists (select 1 from person t where t.region='south' and t.id=a.id); HR@orcl>select * from person a where not exists (select 1 from person t where t.region='south' and t.id=a.id); ID MARRIED_STATUS REGION ---------- ------------------------------ ---------- 6 west 5 single 4 married north 1 single east 2 married west
复制
Null与分组查询
当使用分组查询时,任何的分组函数都是不包含对NULL的统计的
如下在person表添加 sal字段,id=3 的人不给予赋值 alter table person add sal number(8,2); update person set sal=8000 where id=1; update person set sal=8900 where id=2; update person set sal=9000 where id=4; update person set sal=9100 where id=5; update person set sal=10000 where id=6; commit; select * from person; HR@orcl>select * from person; ID MARRIED_STATUS REGION SAL ---------- ------------------------------ ---------- ---------- 1 single east 8000 2 married west 8900 3 single south 4 married north 9000 5 single 9100 6 west 10000 6 rows selected. 查询所有人员平均工资,总工资,总人数 select avg(sal) "平均工资",sum(sal) "总工资",count(*) "总人数" from person; 如下SQL 可以看到平均工资9000并不等于 总工资 45000 除以 总人数 6 HR@orcl>select avg(sal) "平均工资",sum(sal) "总工资",count(*) "总人数" from person; 平均工资 总工资 总人数 ---------- ---------- ---------- 9000 45000 6 而通过如下查询 平均工资9000 才是等于总工资 45000 除以总人数 5的 select avg(sal) "平均工资",sum(sal) "总工资",count(sal) "总人数" from person; HR@orcl>select avg(sal) "平均工资",sum(sal) "总工资",count(sal) "总人数" from person; 平均工资 总工资 总人数 ---------- ---------- ---------- 9000 45000 5 分组查询,查询单身人士和已结婚人士的工资状况 查询所有人员平均工资,总工资,总人数 select married_status,avg(sal) "平均工资",sum(sal) "总工资",count(*) "总人数" from person group by married_status; HR@orcl>select married_status,avg(sal) "平均工资",sum(sal) "总工资",count(*) "总人数" from person group by married_status; MARRIED_STATUS 平均工资 总工资 总人数 ------------------------------ ---------- ---------- ---------- 10000 10000 1 single 8550 17100 3 married 8950 17900 2 可以看到single的平均工资并不等于 总工资 17100除以 总人数 3的,这就是因为 3号人员sal行为Null,sum、avg都没有就算他(count(*)是统计该组总行数,不算分组函数)。 如果需要将他算进来就要预定义他这个SAL为Null时是啥含义,如不算工资,那就将其转换为0 ,这需要Null相关函数来加以处理,比如NVL (expr1, expr2)函数,改写成如下SQL: select married_status,avg(nvl(sal,0)) "平均工资",sum(nvl(sal,0)) "总工资",count(*) "总人数" from person group by married_status; HR@orcl>select married_status,avg(nvl(sal,0)) "平均工资",sum(nvl(sal,0)) "总工资",count(*) "总人数" from person group by married_status; MARRIED_STATUS 平均工资 总工资 总人数 ------------------------------ ---------- ---------- ---------- 10000 10000 1 single 5700 17100 3 married 8950 17900 2 这里就可以看到 single的平均工资 5700等于 总工资 17100 除以总人数 3 了。
复制
Null相关函数
下面来整理下null相关的主要函数
NVL(expr1,expr2) 如果 expr1为空,则输出expr2
如下 id 为3 的工资就输出为 0 了 select id,nvl(sal,0) "工资" from person; HR@orcl>select id,nvl(sal,0) "工资" from person; ID 工资 ---------- ---------- 1 8000 2 8900 3 0 4 9000 5 9100 6 10000 6 rows selected.
复制
NVL2(expr1,expr2,expr3) 如果 expr1 不为空, 则输出 expr2;如果expr1 为空, 则输出 expr3 。注意这里expr2,expr3数据类型是可以和expr1 不一样的。
如下 那个不领工资的员工就是老板没得说了 select id,nvl2(sal,'员工,有工资','老板,无工资') "员工情况" from person; HR@orcl>select id,nvl2(sal,'员工,有工资','老板,无工资') "员工情况" from person; ID 员工情况 ---------- ------------------ 1 员工,有工资 2 员工,有工资 3 老板,无工资 4 员工,有工资 5 员工,有工资 6 员工,有工资 6 rows selected.
复制
NULLIF(expr1,expr2),比较expr1,expr2的值,相等返回Null,不等则返回第一个值expr1
select id,nullif(MARRIED_STATUS,'single') "婚姻状态" from person; HR@orcl>select id,nullif(MARRIED_STATUS,'single') "婚姻状态" from person; ID 婚姻状态 ---------- ------------------------------ 1 2 married 3 4 married 5 6 6 rows selected.
复制
COALESCE(expr1, expr2, …, exprn) 返回第一个非空的值,如果所有值都为空那返回结果还是为空,这个就要求各个表达式数据类型一致了
select id,nullif(MARRIED_STATUS,SAL,ID) from person; HR@orcl>select id,coalesce(MARRIED_STATUS,SAL,ID) from person; select id,coalesce(MARRIED_STATUS,SAL,ID) from person * ERROR at line 1: ORA-00932: inconsistent datatypes: expected CHAR got NUMBER 来对测试表再加两列,同时重新赋值如下: alter table person add sal1 number(8,2); alter table person add sal2 number(8,2); update person set sal=8000 where id=1; update person set sal=null, sal1=8900, sal2=8910 where id=2; update person set sal=null,sal2=8900 where id=4; update person set sal=9100,sal1=9200,sal2=9300 where id=5; update person set sal=10000 where id=6; commit; HR@orcl>select * from person; ID MARRIED_STATUS REGION SAL SAL1 SAL2 ---------- ------------------------------ ---------- ---------- ---------- ---------- 1 single east 8000 2 married west 8900 8910 3 single south 4 married north 8900 5 single 9100 9200 9300 6 west 10000 6 rows selected. select id,coalesce(sal,sal1,sal2) from person; HR@orcl>select id,coalesce(sal,sal1,sal2) from person; ID COALESCE(SAL,SAL1,SAL2) ---------- ----------------------- 1 8000 2 8900 3 4 8900 5 9100 6 10000 6 rows selected.
复制
Null 对优化的影响
对测试表person增加点数据并收集下统计信息,同时创建另一张表person1,数据同person
declare i number; begin for i in 7..100000 loop if MOD(i,8)=0 then insert into person (id,married_status,region,sal) values (i,'married','east',trunc(dbms_random.value(8000,20000))); elsif MOD(i,8)=1 then insert into person (id,married_status,region,sal) values (i,'married','west',trunc(dbms_random.value(8000,20000))); elsif MOD(i,8)=2 then insert into person (id,married_status,region,sal) values (i,'married','south',trunc(dbms_random.value(8000,20000))); elsif MOD(i,8)=3 then insert into person (id,married_status,region,sal) values (i,'married','north',trunc(dbms_random.value(8000,20000))); elsif MOD(i,8)=4 then insert into person (id,married_status,region,sal) values (i,'single','east',trunc(dbms_random.value(8000,20000))); elsif MOD(i,8)=5 then insert into person (id,married_status,region,sal) values (i,'single','west',trunc(dbms_random.value(8000,20000))); elsif MOD(i,8)=6 then insert into person (id,married_status,region,sal) values (i,'single','south',trunc(dbms_random.value(8000,20000))); else insert into person (id,married_status,region,sal) values (i,'single','north',trunc(dbms_random.value(8000,20000))); end if; end loop; commit; end; / begin dbms_stats.gather_table_stats(ownname => 'HR', tabname => 'PERSON', estimate_percent => dbms_stats.auto_sample_size, method_opt => 'FOR ALL COLUMNS SIZE REPEAT', no_invalidate => false, cascade => true, degree => 6); end; / create table person1 as select * from person;
复制
1)Null值导致无法走索引而全表扫描
对person和person1的ID列创建单列索引,两张表都收集了统计信息 HR@orcl>create index ind_person_id on person (id); Index created. HR@orcl>create index ind_person1_id on person1 (id); Index created. HR@orcl>alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss'; Session altered. HR@orcl>select owner,table_name,num_rows,blocks, LAST_ANALYZED from dba_tables where table_name like 'PERSON%'; OWNER TABLE_NAME NUM_ROWS BLOCKS LAST_ANALYZED ------------------------------ ------------------------------ ---------- ---------- ------------------- HR PERSON 100000 496 2021-12-22 21:08:16 HR PERSON1 100000 400 2021-12-22 21:08:22 都查询语句:select count(*) from person/person1; HR@orcl>set autotrac on; HR@orcl>select count(*) from person; COUNT(*) ---------- 100000 Execution Plan ---------------------------------------------------------- Plan hash value: 1154882994 --------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | --------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 137 (1)| 00:00:02 | | 1 | SORT AGGREGATE | | 1 | | | | 2 | TABLE ACCESS FULL| PERSON | 100K| 137 (1)| 00:00:02 | --------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 442 consistent gets 0 physical reads 0 redo size 526 bytes sent via SQL*Net to client 519 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed HR@orcl>select /*+ index(person ind_person_id) */ count(*) from person; COUNT(*) ---------- 100000 Elapsed: 00:00:00.00 Execution Plan ---------------------------------------------------------- Plan hash value: 1154882994 --------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | --------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 137 (1)| 00:00:02 | | 1 | SORT AGGREGATE | | 1 | | | | 2 | TABLE ACCESS FULL| PERSON | 100K| 137 (1)| 00:00:02 | --------------------------------------------------------------------- Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 442 consistent gets 0 physical reads 0 redo size 526 bytes sent via SQL*Net to client 519 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed HR@orcl>select count(*) from person1; COUNT(*) ---------- 100000 Execution Plan ---------------------------------------------------------- Plan hash value: 2634609307 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 62 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | | | | 2 | INDEX FAST FULL SCAN| IND_PERSON1_ID | 100K| 62 (0)| 00:00:01 | -------------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 229 consistent gets 222 physical reads 0 redo size 526 bytes sent via SQL*Net to client 519 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
复制
可以看到select count() from person语句走的是全表扫描 逻辑读消耗 442,select count() from person1 语句走的是索引快速全扫描 逻辑读消耗229,两者相差一倍,即使对select count(*) from person语句加hint 来促使其走索引也不会走。
这是因为对于B+树索引,Oracle是不存放Null值的,而person1对id 列加了 not null 约束,优化器就可以明确判断person1的ID不可能为空了,也就肯定所有ID值都会在索引上了,所以可以来对索引进行扫描间接知道全表数据而不用进行全表扫描了。
2)关于is null的优化
https://www.modb.pro/db/222582 文章有讲到使用联合索引的方式来促使 is null也可以走索引来避免全表扫描的办法,但是想想如果我们一开始就将Null值用个默认值替换,也就不用这么折腾的为了提升个is null的性能还要去创建个复合索引了。
3)使用位图索引
HR@orcl>create bitmap index ind_mar_person on person(married_status); Index created. HR@orcl>create bitmap index ind_mar_person1 on person1(married_status); Index created. HR@orcl>create bitmap index ind_reg_person on person(region); Index created. HR@orcl>create bitmap index ind_reg_person1 on person1(region); Index created. HR@orcl>select INDEX_NAME,INDEX_TYPE,TABLE_NAME,TABLE_OWNER from user_indexes where table_name like 'PERSON%'; INDEX_NAME INDEX_TYPE TABLE_NAME TABLE_OWNER ------------------------------ --------------------------- ------------------------------ ------------------------------ IND_REG_PERSON1 BITMAP PERSON1 HR IND_MAR_PERSON1 BITMAP PERSON1 HR IND_PERSON1_ID NORMAL PERSON1 HR IND_REG_PERSON BITMAP PERSON HR IND_MAR_PERSON BITMAP PERSON HR IND_PERSON_ID NORMAL PERSON HR 6 rows selected. HR@orcl>select * from person where region is null; ID MARRIED_STATUS REGION SAL SAL1 SAL2 ---------- ------------------------------ ---------- ---------- ---------- ---------- 5 single 9100 9200 9300 HR@orcl>update person set region='south' where region is null; 1 row updated. HR@orcl>update person1 set region='south' where region is null; 1 row updated. HR@orcl>alter table person1 modify region not null; Table altered. HR@orcl>select INDEX_NAME,INDEX_TYPE,TABLE_NAME,TABLE_OWNER from user_indexes where table_name like 'PERSON%'; INDEX_NAME INDEX_TYPE TABLE_NAME TABLE_OWNER ------------------------------ --------------------------- ------------------------------ ------------------------------ IND_REG_PERSON1 BITMAP PERSON1 HR IND_MAR_PERSON1 BITMAP PERSON1 HR IND_PERSON1_ID NORMAL PERSON1 HR IND_REG_PERSON BITMAP PERSON HR IND_MAR_PERSON BITMAP PERSON HR IND_PERSON_ID NORMAL PERSON HR 6 rows selected.
复制
可以看到如下两个执行计划第一个比第二个多了如下部分:而这部分也就是 region is null的判断部分 BITMAP MINUS BITMAP INDEX SINGLE VALUE| IND_REG_PERSON HR@orcl>select /*+ index(person ind_mar_person),index(person ind_reg_person) */* from person where married_status='single' and region !='south'; 37499 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3668889336 ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 37499 | 878K| 362 (1)| 00:00:05 | | 1 | TABLE ACCESS BY INDEX ROWID | PERSON | 37499 | 878K| 362 (1)| 00:00:05 | | 2 | BITMAP CONVERSION TO ROWIDS | | | | | | | 3 | BITMAP MINUS | | | | | | | 4 | BITMAP MINUS | | | | | | |* 5 | BITMAP INDEX SINGLE VALUE| IND_MAR_PERSON | | | | | |* 6 | BITMAP INDEX SINGLE VALUE| IND_REG_PERSON | | | | | |* 7 | BITMAP INDEX SINGLE VALUE | IND_REG_PERSON | | | | | ------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 5 - access("MARRIED_STATUS"='single') 6 - access("REGION" IS NULL) 7 - access("REGION"='south') Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 2870 consistent gets 5 physical reads 0 redo size 1479681 bytes sent via SQL*Net to client 28008 bytes received via SQL*Net from client 2501 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 37499 rows processed HR@orcl>select /*+ index(person1 ind_mar_person1),index(person1 ind_reg_person1) */* from person1 where married_status='single' and region !='south'; 37499 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 887691403 ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 37499 | 878K| 296 (1)| 00:00:04 | | 1 | TABLE ACCESS BY INDEX ROWID | PERSON1 | 37499 | 878K| 296 (1)| 00:00:04 | | 2 | BITMAP CONVERSION TO ROWIDS| | | | | | | 3 | BITMAP MINUS | | | | | | |* 4 | BITMAP INDEX SINGLE VALUE| IND_MAR_PERSON1 | | | | | |* 5 | BITMAP INDEX SINGLE VALUE| IND_REG_PERSON1 | | | | | ------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("MARRIED_STATUS"='single') 5 - access("REGION"='south') Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 2864 consistent gets 5 physical reads 0 redo size 1479681 bytes sent via SQL*Net to client 28008 bytes received via SQL*Net from client 2501 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 37499 rows processed
复制
Null 加默认值
总结
Null 可能会给程序计算、判断带来很多的麻烦,毕竟相信没谁会喜欢个啥也不是的玩意。
那给开发个建议就是设计表时能设置 not null就设置not null,不知存放什么就放个默认值好了。
评论
