升级到Django 4.1后，将PostgreSQL ID从串行迁移到身份

原创 X丶 2022-10-27

433

Django 4.1的发布说明中有这样一个简短的说明:

在PostgreSQL上，AutoField、BigAutoField和SmallAutoField现在被创建为标识列，而不是带有序列的串行列。

在这篇文章中，我们将详细介绍这意味着什么，为什么您可能希望将现有列从串行类型更新为标识列，以及更新它们的管理命令。

串行和身份

什么是“身份”和“序列”列?基本上，它们是PostgreSQL创建自动递增ID列的两种不同方法。

最初PostgreSQL只有串行类型，像这样使用:

CREATE TABLE example (
    id serial NOT NULL PRIMARY KEY
);

这些串行类型不是真正的数据类型。相反，它是为其默认值创建列和序列的缩写。
PostgreSQL 10(2017年10月)增加了对sql标准标识列的支持，使用方式如下:

CREATE TABLE example (
    id integer NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
);

(文档在CREATE TABLE下，在页面上搜索“identity”。)

正如语法所示，列有一个数据类型，整数，然后在此之上有“标识”行为。有一个相关的序列来生成值，但PostgreSQL内部管理这个。

就常规使用而言，这两种类型没有太大区别。插入的行获得递增的整数主键，保证唯一。

但是标识列确实有一些优点:

它们是在SQL标准中定义的(因此冗长……)。同样的语法也适用于其他数据库，例如Oracle。

向用户授予权限更容易。使用串行类型，如果用户需要插入到表中，则需要授予他们对表和自动创建的序列的访问权限。通过标识列，可以为您处理这个问题。

管理标识列的下一个值更容易，这可以通过ALTER TABLE实现:

ALTER TABLE core_book ALTER COLUMN id RESTART WITH 1000;

对于串行类型，您必须找到自动创建的序列，然后更新它。

如果你用CREATE table…(LIKE…)，它的标识列序列被复制。串行类型的序列最终与副本共享，导致ID源的混淆，以及后来的权限问题。

标识列有一个“严格”模式，防止不小心使用值而不推进序列。如果你用GENERATED ALWAYS而不是GENERATED BY DEFAULT创建它们，PostgreSQL将阻止插入带有显式值的行——它们必须来自序列。您可以通过向INSERT语句添加子句override SYSTEM VALUE来覆盖该块，但这是显式的而不是隐式的。(不过目前Django还没有办法使用这种模式。)

有关标识列的更多信息，请参见:

PostgreSQL 10身份列由Peter Eisentraut解释，身份列特性的主要作者。(他的名字不再出现在文章中，因为它转移了域名，但在存档的原始文章中出现了。)

等待PostgreSQL 10 -身份列由depesz。

Django 4.1有什么变化

Ticket #30511更新了Django的各种大小的AutoField类，在PostgreSQL上使用标识列。最初，该票据讨论添加替代字段，但最终决定更改现有的字段类。此更改在Django 4.1中发布，与前面提到的发布说明一起发布。

例如，以这个模型类为例:

from django.db import models


class Book(models.Model):
    title = models.TextField()

它有一个典型的隐式id字段，使用BigAutoField(自Django 3.2以来的默认值)。
在4.0版本中，Django将id列设置为bigserial伪类型:

$ ./manage.py sqlmigrate core 0001
BEGIN;
--
-- Create model Book
--
CREATE TABLE "core_book" ("id" bigserial NOT NULL PRIMARY KEY, "title" text NOT NULL);
COMMIT;

But from version 4.1, Django uses the identity column syntax:

$ ./manage.py sqlmigrate core 0001
BEGIN;
--
-- Create model Book
--
CREATE TABLE "core_book" ("id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, "title" text NOT NULL);
COMMIT;

是否应该升级以前创建的列?

Django中的更改只影响新创建的列。Django不会迁移之前创建的任何串行列。

因此，现有数据库将为新表使用标识列，同时为已存在的表保留串行列。因此，您的分期和生产数据库最终将以新旧模型之间的类型混合而告终。与此同时，如果您偶尔重新构建本地开发数据库，那么最终只会在那里使用标识列。

这两种类型之间的区别在典型的Django代码中是不可见的。您可以继续执行常规的建模操作，如model .objects.create()，新ID将以相同的方式分配给这两种类型。

但是在将来的某个时候，您可能会遇到上面列出的一些差异。例如，您可能希望重置ID值(在测试中很有用)，因为不同类型之间的语法不同。因此，我认为最好在所有环境的数据库中更新旧的串行类型以标识列。

Simon Charette对这张票评论道:

我想我们还可以在发布说明中包含一个更新脚本，以供希望避免混合使用串行主键和IDENTITY主键的用户使用。

下面是我尝试以管理命令的形式编写的更新脚本。

从串行迁移到标识的管理命令

下面是一个管理命令的源代码，它可以将列从串行更新到标识。我只是把它包含在这篇文章中，因为我认为它不值得作为一个软件包发布。将其复制到您的项目中，例如

example/core/management/commands/migrate_postgres_serial_columns.py.

此代码是在MIT许可下发布的，没有任何形式的保证，等等。如果它坏了，你可以留着闪亮的部分。

该命令循环遍历当前数据库中所有串行查找列，检查它们的相关序列，如果使用——write禁用了“干运行模式”，则更新它们。迁移过程改编自Erwin Brandstetter对Stack Overflow问题的回答“如何将一个表ID从串行更改为标识?”

由于使用了nextval()，即使是演练模式也会提前序列值，留下一个“间隙”。你可以从其他操作中看到间隔，例如当提升一个副本时，你可以看到32个ID的间隔。(如果你需要无间隙ID，可以使用django-sequences。)

为了确保并发插入不会推进串行序列，更新过程在表上使用一个ACCESS EXCLUSIVE锁。这可以防止表上的所有并发活动，但我认为即使对于高并发环境也可以，因为更新很简短。作为预防措施，修改设置了一个短语句超时。

我已经在我的客户silverr的生产数据库上运行了这个命令。我们成功迁移了68个ID列，其中最大的表包含900万行。整个命令不到一秒钟就完成了。

这是代码:

from __future__ import annotations

import argparse
from collections.abc import Callable
from typing import Any

from django.core.management.base import BaseCommand
from django.db import DEFAULT_DB_ALIAS, connections
from django.db.backends.utils import CursorWrapper
from django.db.transaction import atomic


class Command(BaseCommand):
    help = "Migrate all tables using 'serial' columns to use 'identity' instead."

    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
        parser.add_argument(
            "--database",
            default=DEFAULT_DB_ALIAS,
            help='Which database to update. Defaults to the "default" database.',
        )
        parser.add_argument(
            "--write",
            action="store_true",
            default=False,
            help="Actually edit the database.",
        )
        parser.add_argument(
            "--like",
            default="%",
            help="Filter affected tables with a SQL LIKE clause on name.",
        )

    def handle(
        self, *args: Any, database: str, write: bool, like: str, **kwargs: Any
    ) -> None:
        def output(text: str) -> None:
            self.stdout.write(text)
            self.stdout.flush()

        if not write:
            output("In dry run mode (--write not passed)")

        with connections[database].cursor() as cursor:
            cursor.execute(find_serial_columns, (like,))
            column_specs = cursor.fetchall()
            output(f"Found {len(column_specs)} columns to update")

            cursor.execute("SET statement_timeout='3s'")

            for table_name, column_name in column_specs:
                migrate_serial_to_identity(
                    database, write, output, cursor, table_name, column_name
                )


# Adapted from: https://dba.stackexchange.com/a/90567
find_serial_columns = """\
SELECT
     a.attrelid::regclass::text AS table_name,
     a.attname AS column_name
FROM pg_attribute a
WHERE
    a.attrelid::regclass::text LIKE %s
    AND a.attnum > 0
    AND NOT a.attisdropped
    AND a.atttypid = ANY ('{int,int8,int2}'::regtype[])
    AND EXISTS (
        SELECT FROM pg_attrdef ad
        WHERE
            ad.adrelid = a.attrelid
            AND ad.adnum = a.attnum
            AND (
                pg_get_expr(ad.adbin, ad.adrelid)
                =
                'nextval('''
                || (
                    pg_get_serial_sequence(a.attrelid::regclass::text, a.attname)
                )::regclass
                || '''::regclass)'
            )
    )
ORDER BY a.attnum
"""


def migrate_serial_to_identity(
    database: str,
    write: bool,
    output: Callable[[str], None],
    cursor: CursorWrapper,
    table_name: str,
    column_name: str,
) -> None:
    # Adapted from “How to change a table ID from serial to identity?”
    # answer on Stack Overflow:
    # https://stackoverflow.com/a/59233169

    qn = cursor.db.ops.quote_name
    print(f"{qn(table_name)}.{qn(column_name)}", flush=True)

    # Get sequence name
    cursor.execute(
        "SELECT pg_get_serial_sequence(%s, %s)",
        (table_name, column_name),
    )
    sequence_name = cursor.fetchone()[0]
    print(f"    Sequence: {sequence_name}", flush=True)

    with atomic(using=database):
        # Prevent concurrent inserts so we know the sequence is fixed
        if write:
            cursor.execute(
                f"LOCK TABLE {qn(table_name)} IN ACCESS EXCLUSIVE MODE",
            )

        # Get next sequence value
        cursor.execute("SELECT nextval(%s)", (sequence_name,))
        next_value = cursor.fetchone()[0]

        print(f"    Next value: {next_value}", flush=True)

        if write:
            # Drop default, sequence
            cursor.execute(
                f"""\
                ALTER TABLE {qn(table_name)}
                    ALTER COLUMN {qn(column_name)} DROP DEFAULT
                """
            )
            cursor.execute(f"DROP SEQUENCE {sequence_name}")

            # Change column to identity
            cursor.execute(
                f"""\
                ALTER TABLE {qn(table_name)}
                    ALTER {qn(column_name)}
                        ADD GENERATED BY DEFAULT AS IDENTITY (RESTART %s)
                """,
                (next_value,),
            )

            print("    Updated.", flush=True)

示例使用

下面是该命令在一个只有上述Book模型的示例项目上的一些运行。

首先，运行该命令识别可迁移列:

$ ./manage.py migrate_postgres_serial_columns
In dry run mode (--write not passed)
Found 2 columns to update
"django_migrations"."id"
    Sequence: public.django_migrations_id_seq
    Next value: 2
"core_book"."id"
    Sequence: public.core_book_id_seq
    Next value: 1

非托管的django_migrations表也会出现，很狡猾。

如果你想一次只关注一个或几个表，你可以使用——like用SQL like子句过滤表名:

$ ./manage.py migrate_postgres_serial_columns --like 'core_%s'
In dry run mode (--write not passed)
Found 1 columns to update
"core_book"."id"
    Sequence: public.core_book_id_seq
    Next value: 2

使用–write运行实际上会迁移列：

$ ./manage.py migrate_postgres_serial_columns --write
Found 2 columns to update
"django_migrations"."id"
    Sequence: public.django_migrations_id_seq
    Next value: 3
    Updated.
"core_book"."id"
    Sequence: public.core_book_id_seq
    Next value: 2
    Updated.

在此之后重新运行，找不到更多要更新的列：

$ ./manage.py migrate_postgres_serial_columns
In dry run mode (--write not passed)
Found 0 columns to update

因为该命令只作用于使用serial的列，所以在必要时重新运行是安全的。

你可以用psql验证迁移的效果，最简单的方法是通过Django的dbshell命令。在更新之前，你可以在一个自动创建的序列上看到"Default"是nextval():

$ ./manage.py dbshell
psql (14.5 (Homebrew))
Type "help" for help.

example=# \d core_book
                            Table "public.core_book"
 Column |  Type  | Collation | Nullable |                Default
--------+--------+-----------+----------+---------------------------------------
 id     | bigint |           | not null | nextval('core_book_id_seq'::regclass)
 title  | text   |           | not null |
Indexes:
    "core_book_pkey" PRIMARY KEY, btree (id)

更新后，此更改将默认生成为标识：

$ ./manage.py dbshell
psql (14.5 (Homebrew))
Type "help" for help.

example=# \d core_book
                         Table "public.core_book"
 Column |  Type  | Collation | Nullable |             Default
--------+--------+-----------+----------+----------------------------------
 id     | bigint |           | not null | generated by default as identity
 title  | text   |           | not null |
Indexes:
    "core_book_pkey" PRIMARY KEY, btree (id)

原文标题：Migrate PostgreSQL ID’s from serial to identity after upgrading to Django 4.1
原文作者：Adam Johnson
原文地址：https://adamj.eu/tech/2022/10/21/migrate-postgresql-ids-serial-identity-django-4.1/

postgresql 墨力翻译

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者