Repeating groups: the role of the repeating group in CODASYL

盖国强 2024-11-29

The term “repeating group” originally referred to a construct in CODASYL and COBOL-based languages in which a single field could contain an array of repeating values. That is what E. F. Codd meant by a repeating group when he described his First Normal Form (1NF). The concept does not exist in any modern relational or SQL-based DBMS.
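To make the original meaning concrete, here is a minimal Python sketch of a record whose single field holds an array of values, together with the flattening step that yields single-valued rows. It is only an analogy: the `FamilyRecord` class, its field names, and the `to_first_normal_form` helper are hypothetical and do not come from any CODASYL or COBOL system.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FamilyRecord:
    """Hypothetical record: `children` is one field holding many values."""
    family_id: int
    children: List[str] = field(default_factory=list)


def to_first_normal_form(record: FamilyRecord) -> List[Tuple[int, str]]:
    """Flatten the repeating group into one single-valued row per child."""
    return [(record.family_id, child) for child in record.children]


record = FamilyRecord(family_id=1, children=["Cy", "Di", "Ed"])
rows = to_first_normal_form(record)
print(rows)
```

Each resulting row carries exactly one child value, which is the single-valued shape that 1NF asks for.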

The term “repeating group” has also come to be used informally and imprecisely by database designers to mean a repeating set of columns, that is, a collection of columns in a table that contain similar kinds of values. This is different from its original meaning in relation to 1NF. For instance, in a table called Families with columns named Parent1, Parent2, Child1, Child2, Child3, and so on, the collection of Child N columns is sometimes referred to as a repeating group and assumed to violate 1NF, even though it is not a repeating group in the sense that Codd intended.

This latter kind of so-called repeating group is not technically a violation of 1NF as long as each attribute is single-valued. The attributes themselves do not contain repeating values, so there is no violation of 1NF on that account. Such a design is often considered an anti-pattern, however, because it constrains the table to a predetermined, fixed number of values (at most N children in a family) and because it forces queries and other business logic to be repeated for each of the columns. In other words, it violates the “DRY” (don't repeat yourself) principle of design. Because it is generally considered poor design, it suits database designers, and sometimes even teachers, to refer to repeated columns of this kind as a “repeating group” and a violation of the spirit of the First Normal Form.
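The repetition this imposes on queries can be sketched with Python's built-in `sqlite3` module. The table names `FamiliesWide` and `FamilyChildren`, the column `child_name`, and the sample data are assumptions made for illustration; only the Child1..Child3 column naming follows the Families example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Repeated-column design: capped at three children per family.
    CREATE TABLE FamiliesWide (
        family_id INTEGER PRIMARY KEY,
        Child1 TEXT, Child2 TEXT, Child3 TEXT
    );
    INSERT INTO FamiliesWide VALUES
        (1, 'Cy', 'Di', 'Ed'),
        (2, 'Flo', 'Gus', 'Di'),
        (3, 'Hal', 'Ida', 'Jo');

    -- One row per child: no fixed upper limit on children.
    CREATE TABLE FamilyChildren (
        family_id INTEGER NOT NULL,
        child_name TEXT NOT NULL,
        PRIMARY KEY (family_id, child_name)
    );
    INSERT INTO FamilyChildren VALUES
        (1, 'Cy'), (1, 'Di'), (1, 'Ed'),
        (2, 'Flo'), (2, 'Gus'), (2, 'Di'),
        (3, 'Hal'), (3, 'Ida'), (3, 'Jo');
""")

# The predicate must be repeated once per ChildN column.
wide_ids = [row[0] for row in conn.execute(
    "SELECT family_id FROM FamiliesWide "
    "WHERE Child1 = ? OR Child2 = ? OR Child3 = ? ORDER BY family_id",
    ("Di", "Di", "Di"),
)]

# The same question against the one-row-per-child table needs one predicate.
narrow_ids = [row[0] for row in conn.execute(
    "SELECT family_id FROM FamilyChildren "
    "WHERE child_name = ? ORDER BY family_id",
    ("Di",),
)]

print(wide_ids, narrow_ids)
```

Both queries find the same families, but adding a Child4 column would mean editing every query of the first kind, while the second design would only need additional rows.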

This informal usage is slightly unfortunate because it can be a little arbitrary and confusing (when does a set of columns actually constitute a repetition?) and because it distracts from a more fundamental issue, namely the null problem. All of the normal forms are concerned with relations that do not permit nulls. If a table permits a null in any column, then it does not meet the requirements of a relation schema satisfying 1NF. In the case of our Families table, if the Child columns permit nulls (to represent families that have fewer than N children), then the Families table does not satisfy 1NF. The possibility of nulls is often forgotten or ignored in normalization exercises, but avoiding unnecessary nullable columns is one very good reason to avoid repeating sets of columns, whether or not you call them “repeating groups”.
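The null problem can be illustrated with a small `sqlite3` sketch. As before, the table names `FamiliesWide` and `FamilyChildren` and the sample rows are hypothetical; the point is only that the repeated-column design needs nullable Child columns for smaller families, while a one-row-per-child table can declare every column `NOT NULL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Child columns must be nullable to hold families with fewer children.
    CREATE TABLE FamiliesWide (
        family_id INTEGER PRIMARY KEY,
        Child1 TEXT, Child2 TEXT, Child3 TEXT
    );
    INSERT INTO FamiliesWide VALUES
        (1, 'Cy', 'Di', 'Ed'),
        (2, 'Flo', NULL, NULL);

    -- One row per child: a smaller family simply has fewer rows.
    CREATE TABLE FamilyChildren (
        family_id INTEGER NOT NULL,
        child_name TEXT NOT NULL,
        PRIMARY KEY (family_id, child_name)
    );
    INSERT INTO FamilyChildren VALUES
        (1, 'Cy'), (1, 'Di'), (1, 'Ed'), (2, 'Flo');
""")

# Count the null cells that the repeated-column design had to store.
null_cells = conn.execute(
    "SELECT SUM((Child1 IS NULL) + (Child2 IS NULL) + (Child3 IS NULL)) "
    "FROM FamiliesWide"
).fetchone()[0]

# The NOT NULL constraint rejects any attempt to store a null child.
try:
    conn.execute("INSERT INTO FamilyChildren VALUES (2, NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(null_cells, rejected)
```

In this sketch the second design represents a one-child family without any null at all, which is the property the normalization argument above relies on.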

