数据库基本概念与关系数据模型(Database Fundamentals and the Relational Data Model)
数据库的四个基本概念(Four Basic Concepts of Database Systems)
数据(Data)
- 数据(Data)是数据库中存储的基本对象。
- Data is the basic object stored in a database.
- 数据的定义:描述事物的符号记录。
- Data can be defined as symbolic records that describe things.
- 数据的形式可以是数字、文本、图形、音频、视频等。
- Data may take the form of numbers, text, graphics, audio, video, and so on.
- 数据的含义称为数据的语义(Semantics),数据与其语义不可分离。
- The meaning of data is called semantics, and data cannot be separated from its semantics.
示例:
(2018002, 刘晨, 女, 1999-09-01, 计算机科学与技术)对应语义:学号、姓名、性别、出生日期、主修专业。
数据库(Database, DB)
- 数据库是长期存储在计算机内、有组织、可共享的大量数据的集合。
- A database is an organized collection of shared data stored in a computer for a long period of time.
- 建立数据库的目的之一,是将应用所需的大量数据集中保存,以便进一步加工处理、提取信息并形成知识。
- One important purpose of a database is to store large amounts of application data for further processing, information extraction, and knowledge discovery.
- 数据库的基本特征包括:
- 按一定的数据模型(Data Model)组织、描述和存储数据。
- Data is organized, described, and stored according to a specific data model.
- 冗余度较低。
- Data redundancy is relatively low.
- 数据独立性较高。
- Data independence is relatively high.
- 可扩展性较强。
- The database is relatively easy to extend.
数据库管理系统(Database Management System, DBMS)
- DBMS 是计算机系统中的基础软件,也是一个大型、复杂的数据管理软件系统。
- A DBMS is a fundamental software system for managing data, and it is typically large and complex.
- 它位于用户与操作系统之间,负责对数据进行定义、组织、存储、维护和操纵。
- It sits between users and the operating system, and is responsible for defining, organizing, storing, maintaining, and manipulating data.
- 主要功能包括:
- 数据定义功能(Data Definition)
- 提供数据定义语言(DDL, Data Definition Language)。
- It provides a Data Definition Language (DDL).
- 定义数据库中的数据对象及其结构。
- It defines data objects and their structures in the database.
- 数据组织、存储和管理(Data Organization, Storage, and Management)
- 分类组织、存储和管理用户数据与元数据(Metadata)。
- It organizes, stores, and manages both user data and metadata.
- 提供多种存取方法以提高存取效率。
- It offers multiple access methods to improve efficiency.
- 数据操纵功能(Data Manipulation)
- 提供数据操纵语言(DML, Data Manipulation Language)。
- It provides a Data Manipulation Language (DML).
- 实现查询、插入、删除和修改等基本操作。
- It supports basic operations such as querying, inserting, deleting, and updating data.
- 数据定义功能(Data Definition)
数据库系统(Database System, DBS)
- 数据库系统是指在计算机系统中引入数据库技术后形成的整体系统。
- A database system is the complete system formed after database technology is introduced into a computer system.
- 它通常包括以下几个部分:
- A database system usually consists of the following parts:
- 数据库(Database)
- 数据库管理系统(DBMS)及其外围开发工具
- 应用系统(Application System)
- 数据库管理员(Database Administrator, DBA)
本节要点(Key Takeaways)
- 数据(Data)是带有语义的符号记录。
- Data is a symbolic record with meaning attached to it.
- 数据库(DB)是有组织、可共享、长期存储的数据集合。
- A database is an organized and shared collection of long-term stored data.
- DBMS 是管理数据库的软件系统,负责定义、存储和操纵数据。
- A DBMS is the software system that defines, stores, and manipulates data in a database.
- DBS 是由数据库、DBMS、应用系统和管理员共同组成的整体环境。
- A DBS is the overall environment consisting of the database, the DBMS, application systems, and administrators.
数据模型与关系模型(Data Models and the Relational Model)
什么是数据模型(What Is a Data Model?)
- 数据模型(Data Model)是用于描述数据或信息的一种记号体系。
- A data model is a notation for describing data or information.
- 一个数据模型通常由三个核心部分组成:
- A data model usually consists of three core parts:
- 结构(Structure):数据如何组织。
- Structure: how data is organized.
- 操作(Operations):可以对数据执行哪些操作。
- Operations: what actions can be performed on the data.
- 约束(Constraints):数据必须满足哪些规则。
- Constraints: what rules the data must satisfy.
- 常见数据模型包括:
- 关系模型(Relational Model)
- 以及其扩展形式,如对象-关系模型(Object-Relational Model)
- 图数据模型(Graph Data Model)
- 如 RDF、Property Graph
- 关系模型(Relational Model)
- 其他数据模型还包括:
- 层次模型(Hierarchical Data Model)
- 网状模型(Network Data Model)
- XML 数据模型(XML Data Model)
- 面向对象模型(Object-Oriented Model)
关系模型简介(Relational Model in Brief)
| title | year | length | genre |
|---|---|---|---|
| Gone With the Wind | 1939 | 231 | drama |
| Star Wars | 1977 | 124 | sciFi |
| Wayne’s World | 1992 | 95 | comedy |
- 结构(Structure):可以直观地把关系看成一张二维表。
- Structure: a relation can be viewed intuitively as a two-dimensional table.
- 列名(Column Headers)对应属性名(Attributes)。
- Column headers correspond to attributes.
- 每一行(Row)对应一个元组(Tuple)。
- Each row corresponds to a tuple.
- 操作(Operations):关系模型强调对整个表进行操作,例如关系代数(Relational Algebra)中的选择、投影、连接等。
- Operations: the relational model emphasizes operations on whole tables, such as selection, projection, and join in relational algebra.
- 例如:找出所有
genre = 'comedy'的电影。 - For example, we may ask for all movies whose
genre = 'comedy'.
- 例如:找出所有
- 约束(Constraints):关系中的数据必须满足一定规则。
- Constraints: data in a relation must satisfy certain rules.
- 例如:
genre必须取自一个固定集合(a fixed list of genres)。 - For example,
genremust take its value from a fixed list of genres. - 例如:不允许两行数据具有完全相同的键值。
- For example, no two rows may share the same key value.
- 例如:
类比说明:关系可以帮助我们联想到“结构体数组”,但这只是便于理解的类比。严格来说,关系模型还强调集合语义、约束规则以及基于代数的操作方式。
关系(Relation)
- 关系(Relation)可以直观理解为一张二维表。
- A relation can be viewed intuitively as a two-dimensional table.
- 但从理论上说,关系更接近“满足一定条件的元组集合(Set of Tuples)”。
- In theory, however, a relation is closer to a set of tuples satisfying certain conditions.
- 因此,元组的先后顺序通常不是关系的核心语义。
- Therefore, the order of tuples is usually not part of the essential meaning of a relation.
属性(Attribute)
- 属性(Attribute)是关系中的列。
- An attribute is a column in a relation.
- 每个属性都描述了该列数据的含义。
- Each attribute describes the meaning of the data in that column.
模式(Schema)
- 模式(Schema)描述关系的结构设计。
- A schema describes the structural design of a relation.
- 常见写法为:
关系名(属性1, 属性2, ...)。 - It is commonly written in the form
RelationName(attribute1, attribute2, ...). - 例如:
Movies(title, year, length)。 - For example:
Movies(title, year, length).
元组(Tuple)
- 元组(Tuple)是关系表中的一行数据,表示一条具体记录。
- A tuple is a row in a relation and represents a specific record.
域(Domain)
- 域(Domain)是属性允许取值的集合。
- A domain is the set of values allowed for an attribute.
- 在实际 SQL 实现中,域常通过数据类型(Data Type)和约束(Constraint)来近似表达。
- In SQL implementations, a domain is often approximated through a data type together with constraints.
- 例如:年龄这一属性的域可以是“所有合理的非负整数”。
- For example, the domain of an age attribute may be all reasonable non-negative integers.
关系的键(Keys of Relations)
- 键约束(Key Constraint)是关系模型中最基本、最重要的约束之一。
- Key constraints are among the most fundamental constraints in the relational model.
- 键(Key)是能够唯一标识一行数据的属性集合。
- A key is a set of attributes that can uniquely identify a tuple.
- 例如,在
Movies(title, year, length, genre)中,(title, year)可以作为一个键。 - For example, in
Movies(title, year, length, genre),(title, year)can serve as a key.
常见键的相关概念如下:
- 超键(Super Key)
- 任何能够唯一标识元组的属性集合。
- A super key is any set of attributes that uniquely identifies a tuple.
- 候选键(Candidate Key)
- 最小超键,即去掉其中任何一个属性后就不能再唯一标识元组。
- A candidate key is a minimal super key: removing any attribute from it destroys uniqueness.
- 主键(Primary Key)
- 从候选键中选出的、作为主要标识方式的那个键。
- A primary key is the candidate key chosen as the main identifier.
Tuple(元组):关系表中的一行数据,表示一个具体记录。
Key(键):能够唯一标识某个 tuple 的属性或属性组合。
Primary Key(主键):被选中用于主要标识元组的候选键。
外键(Foreign Key)
- 外键(Foreign Key)是一个关系中的属性(或属性组),其取值引用另一个关系中的主键或候选键。
- A foreign key is an attribute or set of attributes in one relation whose values reference a primary key or candidate key in another relation.
- 外键用于表达关系之间的联系(Relationship)。
- A foreign key is used to represent relationships between relations.
- 例如:
Movies.studioName可以引用Studio.name,表示一部电影由哪个制片厂发行或制作。 - For example,
Movies.studioNamemay referenceStudio.name, indicating which studio produced or distributed a movie.
电影数据库模式示例(Example Database Schema for Movies)
Movies(电影)
Movies(
title: string,
year: integer,
length: integer,
genre: string,
studioName: string,
producerCert: integer
)
MovieStar(演员)
MovieStar(
name: string,
address: string,
gender: char,
birthdate: date
)
StarsIn(出演关系)
StarsIn(
movieTitle: string,
movieYear: integer,
starName: string
)
MovieExec(电影高管 / 制片人)
MovieExec(
name: string,
address: string,
cert: integer,
netWorth: integer
)
Studio(制片厂)
Studio(
name: string,
address: string,
presCert: integer
)
模式关系总览(Schema Overview)
-
Movies.producerCert -> MovieExec.cert -
Movies.studioName -> Studio.name -
Studio.presCert -> MovieExec.cert -
StarsIn.(movieTitle, movieYear) -> Movies.(title, year) -
StarsIn.starName -> MovieStar.name -
StarsInconnects movies and actors. -
Moviesreferences bothStudioandMovieExecthrough foreign keys. -
The schema shows how multiple relations are linked together in a relational database.
概念层与实现层(Conceptual Layer vs. Implementation Layer)
| 概念层(Conceptual Layer) | 实现层(Implementation Layer in SQL) |
|---|---|
| 关系(Relation) | 表(Table) |
| 属性(Attribute) | 列(Column) |
| 元组(Tuple) | 行(Row) |
| 域(Domain) | 数据类型 + 约束(Data Type + Constraint) |
| 键(Key) | PRIMARY KEY / UNIQUE |
| 外键(Foreign Key) | FOREIGN KEY |
- 概念层关注“数据是什么、彼此如何关联”。
- The conceptual layer focuses on what the data means and how different pieces of data are related.
- 实现层关注“如何在具体 DBMS 中把这些概念落实为表结构和 SQL 语句”。
- The implementation layer focuses on how these concepts are realized as tables and SQL statements in a concrete DBMS.
本节要点(Key Takeaways)
- 数据模型由结构、操作和约束三部分组成。
- A data model consists of structure, operations, and constraints.
- 关系模型中的核心概念包括关系、属性、模式、元组、域、键和外键。
- The core concepts of the relational model include relation, attribute, schema, tuple, domain, key, and foreign key.
- 主键用于唯一标识元组,外键用于表达关系之间的引用联系。
- A primary key uniquely identifies a tuple, while a foreign key expresses references between relations.
- 理论上的“关系”与实现中的“表”密切相关,但并不完全等同。
- A theoretical relation is closely related to, but not exactly the same as, a table in implementation.
SQL 语言基础(SQL Basics: Definition, Query, and Modification)
说明:以下 SQL 示例尽量采用接近标准 SQL 的写法,不同数据库系统(如 MySQL、PostgreSQL、SQLite)在细节上可能略有差异。
Note: The following SQL examples are written in a style close to standard SQL, although specific details may vary across systems such as MySQL, PostgreSQL, and SQLite.
创建数据库(Creating a Database)
- 常用关键字:
CREATE DATABASE - Common keyword:
CREATE DATABASE - 示例:
- Example:
CREATE DATABASE moviedb;
SQL 的两个方面(Two Aspects of SQL)
- 数据定义语言(DDL, Data Definition Language)
- 用于声明数据库模式(Schema),例如创建表、修改表、删除表。
- DDL is used to declare database schemas, for example by creating, altering, or dropping tables.
- 数据操纵语言(DML, Data Manipulation Language)
- 用于查询和修改数据,例如
SELECT、INSERT、UPDATE、DELETE。 - DML is used to query and modify data, for example with
SELECT,INSERT,UPDATE, andDELETE.
- 用于查询和修改数据,例如
表声明(Table Declarations)
CREATE TABLE MovieExec (
name CHAR(30),
address VARCHAR(255),
cert INT PRIMARY KEY,
netWorth INT
);
CREATE TABLE Studio (
name CHAR(50) PRIMARY KEY,
address VARCHAR(255),
presCert INT,
FOREIGN KEY (presCert) REFERENCES MovieExec(cert)
);
CREATE TABLE Movies (
title CHAR(100),
year INT,
length INT,
genre CHAR(10),
studioName CHAR(30),
producerCert INT,
PRIMARY KEY (title, year),
FOREIGN KEY (studioName) REFERENCES Studio(name),
FOREIGN KEY (producerCert) REFERENCES MovieExec(cert)
);
CREATE TABLE MovieStar (
name CHAR(30),
address VARCHAR(255),
gender CHAR(1),
birthdate DATE,
PRIMARY KEY (name)
);
CREATE TABLE StarsIn (
movieTitle CHAR(100),
movieYear INT,
starName CHAR(30),
PRIMARY KEY (movieTitle, movieYear, starName),
FOREIGN KEY (movieTitle, movieYear) REFERENCES Movies(title, year),
FOREIGN KEY (starName) REFERENCES MovieStar(name)
);
-- 以下数据仅用于教学示例,不对应现实世界中的真实人物信息。
INSERT INTO MovieExec VALUES ('George Lucas', 'Oak Rd.', 555, 200000000);
INSERT INTO MovieExec VALUES ('Ted Turner', 'Turner Av.', 333, 125000000);
INSERT INTO MovieExec VALUES ('Stephen Spielberg', '123 ET Road', 222, 100000000);
INSERT INTO MovieExec VALUES ('Merv Griffin', 'Riot Rd.', 199, 112000000);
INSERT INTO MovieExec VALUES ('Calvin Coolidge', 'Fast Lane', 123, 20000000);
INSERT INTO MovieExec VALUES ('Garry Marshall', 'First Street', 999, 50000000);
INSERT INTO MovieExec VALUES ('J.J. Abrams', 'High Road', 345, 45000000);
INSERT INTO MovieExec VALUES ('Bryan Singer', 'Downtown', 456, 70000000);
INSERT INTO MovieExec VALUES ('George Roy Hill', 'Baldwin Av.', 789, 20000000);
INSERT INTO MovieExec VALUES ('Dino De Laurentiis', 'Beverly Hills', 666, 120000000);
INSERT INTO Studio VALUES ('MGM', 'MGM Boulevard', 123);
INSERT INTO Studio VALUES ('Fox', 'Hollywood', 555);
INSERT INTO Studio VALUES ('Disney', 'Buena Vista', 999);
INSERT INTO Studio VALUES ('Paramount', 'Hollywood', 345);
INSERT INTO Studio VALUES ('Universal', 'Hollywood', 789);
INSERT INTO Movies VALUES ('Logan''s Run', 1976, NULL, 'sciFi', 'MGM', 123);
INSERT INTO Movies VALUES ('Star Wars', 1977, 124, 'sciFi', 'Fox', 555);
INSERT INTO Movies VALUES ('Empire Strikes Back', 1980, 111, 'fantasy', 'Fox', 555);
INSERT INTO Movies VALUES ('Star Trek', 1979, 132, 'sciFi', 'Paramount', 345);
INSERT INTO Movies VALUES ('Star Trek: Nemesis', 2002, 116, 'sciFi', 'Paramount', 345);
INSERT INTO Movies VALUES ('Terms of Endearment', 1983, 132, 'romance', 'MGM', 123);
INSERT INTO Movies VALUES ('The Usual Suspects', 1995, 106, 'crime', 'MGM', 456);
INSERT INTO Movies VALUES ('Gone With the Wind', 1939, 238, 'drama', 'MGM', 123);
INSERT INTO Movies VALUES ('Wayne''s World', 1992, 95, 'comedy', 'Paramount', 123);
INSERT INTO Movies VALUES ('King Kong', 2005, 187, 'drama', 'Universal', 789);
INSERT INTO Movies VALUES ('King Kong', 1976, 134, 'drama', 'Paramount', 666);
INSERT INTO Movies VALUES ('King Kong', 1933, 100, 'drama', 'Universal', 345);
INSERT INTO Movies VALUES ('Pretty Woman', 1990, 119, 'comedy', 'Disney', 999);
INSERT INTO MovieStar VALUES ('Jane Fonda', 'Turner Av.', 'F', DATE '1977-07-07');
INSERT INTO MovieStar VALUES ('Alec Baldwin', 'Baldwin Av.', 'M', DATE '1977-06-07');
INSERT INTO MovieStar VALUES ('Kim Basinger', 'Baldwin Av.', 'F', DATE '1979-05-07');
INSERT INTO MovieStar VALUES ('Harrison Ford', 'Beverly Hills', 'M', DATE '1977-07-07');
INSERT INTO MovieStar VALUES ('Carrie Fisher', '123 Maple St.', 'F', DATE '1999-09-09');
INSERT INTO MovieStar VALUES ('Mark Hamill', '456 Oak Rd.', 'M', DATE '1988-08-08');
INSERT INTO MovieStar VALUES ('Debra Winger', 'A Way', 'F', DATE '1978-05-06');
INSERT INTO MovieStar VALUES ('Jack Nicholson', 'X Path', 'M', DATE '1949-05-05');
INSERT INTO MovieStar VALUES ('Kevin Spacey', 'New York Av.', 'M', DATE '1937-12-21');
INSERT INTO StarsIn VALUES ('Star Wars', 1977, 'Carrie Fisher');
INSERT INTO StarsIn VALUES ('Star Wars', 1977, 'Mark Hamill');
INSERT INTO StarsIn VALUES ('Star Wars', 1977, 'Harrison Ford');
INSERT INTO StarsIn VALUES ('Empire Strikes Back', 1980, 'Harrison Ford');
INSERT INTO StarsIn VALUES ('The Usual Suspects', 1995, 'Kevin Spacey');
INSERT INTO StarsIn VALUES ('Terms of Endearment', 1983, 'Debra Winger');
INSERT INTO StarsIn VALUES ('Terms of Endearment', 1983, 'Jack Nicholson');
查询示例(Query Example)
SELECT *
FROM Movies
WHERE studioName = 'Disney' AND year = 1990
ORDER BY length, title;
含义说明:
- 从
Movies表中查询数据。 - This query retrieves data from the
Moviestable. - 条件为:
studioName = 'Disney'且year = 1990。 - The condition requires
studioName = 'Disney'andyear = 1990. - 查询结果按照
length和title升序排序。 - The result is sorted in ascending order by
lengthandtitle.
本节要点(Key Takeaways)
- SQL 不仅用于查询,还用于定义和修改数据库结构。
- SQL is used not only for querying data, but also for defining and modifying database structures.
CREATE DATABASE用于创建数据库。CREATE DATABASEis used to create a database.CREATE TABLE、PRIMARY KEY、FOREIGN KEY分别用于定义表、主键和外键。CREATE TABLE,PRIMARY KEY, andFOREIGN KEYare used to define tables, primary keys, and foreign keys respectively.- 复合主键(Composite Primary Key)说明单个属性不足以唯一标识记录,需要多个属性共同确定一条元组。
- A composite primary key shows that a single attribute is not enough to identify a tuple uniquely, so multiple attributes must be combined.
SQL 数据类型与模式演化(SQL Data Types and Schema Evolution)
数据类型(Data Types)
所有属性都必须指定数据类型(Data Type)。
Every attribute must be assigned a data type.
字符串类型(String Types)
CHAR(n):固定长度字符串。CHAR(n): fixed-length string.VARCHAR(n):可变长度字符串。VARCHAR(n): variable-length string.
位串类型(Bit-String Types)
BIT(n)BIT VARYING(n)
逻辑类型(Boolean Type)
BOOLEAN- Boolean values are typically represented by the
BOOLEANtype. TRUEFALSEUNKNOWN
- Boolean values are typically represented by the
整数类型(Integer Types)
INT/INTEGERSMALLINT- 某些教材中也会出现
SHORTINT这一写法
浮点与数值类型(Floating-Point and Numeric Types)
- 浮点数类型(Floating-Point Types)
FLOATREALDOUBLE PRECISION
- 定点数类型(Exact Numeric Types)
DECIMAL(n, d)NUMERIC(n, d)
其中:
n表示总位数(precision)ndenotes the total number of digits (precision).d表示小数位数(scale)ddenotes the number of digits after the decimal point (scale).
示例:
DECIMAL(6,2)
→ 0123.45
日期与时间类型(Date and Time Types)
DATETIME
示例:
DATE '1948-05-14'
TIME '15:00:02.5'
说明:DATE 和 TIME 虽然在字面量写法上看起来像字符串,但在 DBMS 中它们属于专门的数据类型,具有独立的校验、比较和运算规则。
Although DATE and TIME literals may look like strings, they are special data types in a DBMS with their own validation, comparison, and arithmetic rules.
修改关系模式(Modifying Relation Schemas)
删除表(Dropping a Table)
DROP TABLE R;
添加属性(Adding a Column)
ALTER TABLE MovieStar
ADD phone CHAR(16);
删除属性(Dropping a Column)
ALTER TABLE MovieStar
DROP COLUMN birthdate;
默认值(Default Values)
可以为属性设置默认值(Default Value)。
Default values can be assigned to attributes.
示例:
gender CHAR(1) DEFAULT '?'
note VARCHAR(100) DEFAULT 'N/A'
添加带默认值的新属性:
ALTER TABLE MovieStar
ADD phone CHAR(16) DEFAULT 'unlisted';
本节要点(Key Takeaways)
- 域在 SQL 中通常通过数据类型和约束共同体现。
- In SQL, domains are usually represented through data types together with constraints.
DATE、TIME是专门的数据类型,而不是普通字符串。DATEandTIMEare special data types rather than ordinary strings.ALTER TABLE可以用于向现有表中添加或删除列。ALTER TABLEcan be used to add or remove columns from an existing table.- 默认值可以减少插入数据时的重复输入,并为缺省场景提供统一行为。
- Default values reduce repetitive input and provide consistent behavior for missing values.
术语表(Glossary)
| 中文 | English | 简要说明 |
|---|---|---|
| 数据 | Data | 数据库中存储的基本对象。 |
| 数据库 | Database | 有组织、可共享、长期存储的数据集合。 |
| 数据库管理系统 | Database Management System (DBMS) | 管理数据库的软件系统。 |
| 数据库系统 | Database System (DBS) | 由数据库、DBMS、应用和管理员构成的整体环境。 |
| 数据模型 | Data Model | 描述数据结构、操作和约束的记号体系。 |
| 关系 | Relation | 可视为二维表,本质上更接近元组集合。 |
| 属性 | Attribute | 关系中的列。 |
| 元组 | Tuple | 关系中的一行记录。 |
| 模式 | Schema | 关系的结构定义。 |
| 域 | Domain | 属性允许取值的集合。 |
| 键 | Key | 能唯一标识元组的属性集合。 |
| 主键 | Primary Key | 被选中作为主要标识方式的候选键。 |
| 外键 | Foreign Key | 引用其他关系中键值的属性或属性组。 |
| 数据定义语言 | Data Definition Language (DDL) | 用于定义和修改数据库结构的语言。 |
| 数据操纵语言 | Data Manipulation Language (DML) | 用于查询和修改数据的语言。 |