no test to determine which structure is in effect—then P
will fail on at least three of the remaining structures. More
specifically, if P succeeds with structure 5, it will fail with
all the others; if P succeeds with structure 3 or 4, it will fail
with at least 1,2, and 5; if P succeeds with 1 or 2, it will
fail with at least 3, 4, and 5. The reason is simple in each
case. In the absence of a test to determine which structure
is in effect, P fails because an attempt is made to exceute a
reference to a nonexistent file (available systems treat this
as an error) or no attempt is made to execute a reference to
a file containing needed information. The reader who is not
convinced should develop sample programs for this simple
problem.
Since, in general, it is not practical to develop application
programs which test for all tree structurings permitted by
the system, these programs fail when a change in structure
becomes necessary.
Systems which provide users with a network model of
the data run into similar difficulties. In both the tree and
network cases, the user (or his program) is required to ex-
ploit a collection of user access paths to the data. It does
not matter whether these paths are in close correspondence
with pointer-defined paths in the stored representation—in
IDS the correspondence is extremely simple, in TDMS it is
just the opposite. The consequence, regardless of the stored
representation, is that terminal activities and programs be-
come dependent on the continued existence of the user access
paths.
One solution to this is to adopt the policy that once a user
access path is defined it will not be made obsolete until all
application programs using that path have become obsolete.
Such a policy is not practical, because the number of access
paths in the total model for the community of users of a
data bank would eventually become excessively large.
1.3 A Relational View of Data
The term relation is used here in its accepted mathemat-
ical sense. Given sets S
1
,S
2
,...,S
n
(not necessarily dis-
tinct), R is a relation on these n sets if it is a set of n-tuples
each of which has its first element from S
1
, its second ele-
ment from S
2
, and so on.
1
We shall refer to S
j
as the jth
domain of R. As defined above, R is said to have degree n.
Relations of degree 1 are often called unary, degree 2 binary,
degree 3 ternary, and degree n n-ary.
For expository reasons, we shall frequently make use of an
array representation of relations, but it must be remembered
that this particular representation is not an essential part
of the relational view being expounded. An array which
represents an n-ary relation R has the following properties:
1. Each row represents an n-tuple of R.
2. The ordering of rows is immaterial.
3. All rows are distinct.
4. The ordering of columns is significant—it corresponds
to the ordering S
1
,S
2
,...,S
n
of the domains on which
R is defined (see, however, remarks below on domain-
ordered and domain-unordered relations).
5. The significance of each column is partially conveyed
by labeling it with the name of the corresponding do-
main.
1
More concisely, R is a subset of the Cartesian product S
1
×
S
2
× ...× S
n
.
supply (supplier part project quantity)
1 2 5 17
1 3 5 23
2 37 9
2 75 4
4 1 1 12
Fig. 1. A relation of degree 4
component (part part quantity)
15 9
25 7
35 2
2 6 12
36 3
47 1
67 1
Fig. 2. A relation with two identical domains
The example in Figure 1 illustrates a relation of degree
4, called supply, which reflects the shipments-in-progress of
parts from specified suppliers to specified projects in speci-
fied quantities.
One might ask: If the columns are labeled by the name of
corresponding domains, why should the ordering of columns
matter? As the example in Figure 2 shows, two columns
may have identical headings (indicating identical domains)
but possess distinct meanings with respect to the relation.
The relation depicted is called component. It is a ternary
relation, whose first two domains are called part and third
domain is called quantity. The meaning of component (x, y,
z) is that part x is an immediate component (or subassem-
bly) of part y, and z units of part x are needed to assemble
one unit of part y. It is a relation which plays a critical role
in the parts explosion problem.
It is a remarkable fact that several existing information
systems (chiefly those based on tree-structured files) fail to
provide data representations for relations which have two or
more identical domains. The present version of IMS/360 [5]
is an example of such a system.
The totality of data in a data bank may be viewed as a
collection of time-varying relations. These relations are of
assorted degrees. As time progresses, each n-ary relation
may be subject to insertion of additional n-tuples, deletion
of existing ones, and alteration of components of any of its
existing n-tuples.
In many commercial, governmental, and scientific data
banks, however, some of the relations are of quite high de-
gree (a degree of 30 is not at all uncommon). Users should
not normally be burdened with remembering the domain
ordering of any relation (for example, the ordering supplier,
then part, then project, then quantity in the relation sup-
ply). Accordingly, we propose that users deal, not with
relations which are domain-ordered, but with relationships
which are their domain-unordered counterparts.
2
To accom-
plish this, domains must be uniquely identifiable at least
within any given relation, without using position. Thus,
where there are two or more identical domains, we require
in each case that the domain name be qualified by a distinc-
tive role name, which serves to identify the role played by
that domain in the given relation. For example, in the rela-
tion component of Figure 2, the first domain part might be
2
In mathematical terms, a relationship is an equivalence
class of those relations that are equivalent under permu-
tation of domains (see Section 2.1.1).
文档被以下合辑收录
相关文档
评论