暂无图片
暂无图片
6
暂无图片
暂无图片
暂无图片

OpenGauss 并发重建索引代码实现

原创 吴松 云和恩墨 2021-09-16
1677

本文主要讲解并发创建索引过程中,索引数据追加部分的原理和代码实现。
先看一下代码中关于这部分功能实现的注释。
/*

  • validate_index - support code for concurrent index builds
  • We do a concurrent index build by first inserting the catalog entry for the
  • index via index_create(), marking it not indisready and not indisvalid.
  • Then we commit our transaction and start a new one, then we wait for all
  • transactions that could have been modifying the table to terminate. Now
  • we know that any subsequently-started transactions will see the index and
  • honor its constraints on HOT updates; so while existing HOT-chains might
  • be broken with respect to the index, no currently live tuple will have an
  • incompatible HOT update done to it. We now build the index normally via
  • index_build(), while holding a weak lock that allows concurrent
  • insert/update/delete. Also, we index only tuples that are valid
  • as of the start of the scan (see IndexBuildHeapScan), whereas a normal
  • build takes care to include recently-dead tuples. This is OK because
  • we won’t mark the index valid until all transactions that might be able
  • to see those tuples are gone. The reason for doing that is to avoid
  • bogus unique-index failures due to concurrent UPDATEs (we might see
  • different versions of the same row as being valid when we pass over them,
  • if we used HeapTupleSatisfiesVacuum). This leaves us with an index that
  • does not contain any tuples added to the table while we built the index.
  • Next, we mark the index “indisready” (but still not “indisvalid”) and
  • commit the second transaction and start a third. Again we wait for all
  • transactions that could have been modifying the table to terminate. Now
  • we know that any subsequently-started transactions will see the index and
  • insert their new tuples into it. We then take a new reference snapshot
  • which is passed to validate_index(). Any tuples that are valid according
  • to this snap, but are not in the index, must be added to the index.
  • (Any tuples committed live after the snap will be inserted into the
  • index by their originating transaction. Any tuples committed dead before
  • the snap need not be indexed, because we will wait out all transactions
  • that might care about them before we mark the index valid.)
  • validate_index() works by first gathering all the TIDs currently in the
  • index, using a bulkdelete callback that just stores the TIDs and doesn’t
  • ever say “delete it”. (This should be faster than a plain indexscan;
  • also, not all index AMs support full-index indexscan.) Then we sort the
  • TIDs, and finally scan the table doing a “merge join” against the TID list
  • to see which tuples are missing from the index. Thus we will ensure that
  • all tuples valid according to the reference snapshot are in the index.
  • Building a unique index this way is tricky: we might try to insert a
  • tuple that is already dead or is in process of being deleted, and we
  • mustn’t have a uniqueness failure against an updated version of the same
  • row. We could try to check the tuple to see if it’s already dead and tell
  • index_insert() not to do the uniqueness check, but that still leaves us
  • with a race condition against an in-progress update. To handle that,
  • we expect the index AM to recheck liveness of the to-be-inserted tuple
  • before it declares a uniqueness error.
  • After completing validate_index(), we wait until all transactions that
  • were alive at the time of the reference snapshot are gone; this is
  • necessary to be sure there are none left with a transaction snapshot
  • older than the reference (and hence possibly able to see tuples we did
  • not index). Then we mark the index “indisvalid” and commit. Subsequent
  • transactions will be able to use it for queries.
  • Doing two full table scans is a brute-force strategy. We could try to be
  • cleverer, eg storing new tuples in a special area of the table (perhaps
  • making the table append-only by setting use_fsm). However that would
  • add yet more locking issues.
    */

以上是代码中的官方注释,可以看出整个并发建索引过程中需要两次 table scan:
第一次获取 snapshot1,然后 scan table 中 snapshot1 可见的 heap tuple,据此构建索引,然后将索引标记为可写。这部分代码相对比较容易理解,主要是 scan table 基于 snapshot 判断 heap tuple 的可见性,然后基于 scan 出的 heap tuple,根据索引类型创建索引。代码实现主要在 index_build 中。

以 B-tree 索引为例,核心代码如下:

bt_build
{
    // table scan 
    // 表扫描,基于 snapshot 判断 heap tuple 可见性
    if (RelationIsGlobalIndex(index)) {
        allPartTuples = GlobalIndexBuildHeapScan(heap, index, indexInfo, btbuildCallback, (void*)&buildstate);
    } else {
        reltuples = tableam_index_build_scan(heap, index, indexInfo, true, btbuildCallback, (void*)&buildstate);
    }
    // 按照索引 key 对 tuple 进行排序
    // 基于排完序的 tuple 构建 btree
    _bt_leafbuild(buildstate.spool, buildstate.spool2);
    ...
}

第二次获取 snapshot2,在索引数据中追加 snapshot1 及 snpashot2 之间插入且不在索引中的数据。做法是首先获取当前索引中索引到的所有tids (用的 bulkdelete callback 而不是 index scan,因为前者速度更快,且不是所有的索引都支持 full-index indexscan),然后 scan table 中 snapshot2 可见的所有 heap tuple,获得 tids’,最后 tids’ 和 tids 的差集就是需要在索引中追加的 heap tuple 的 tids。

唯一索引处理起来要更麻烦一些,在一条数据的多个版本时,不应该误报违反唯一原则,这可能需要在发现违反唯一原则的时候重新做一次检查。

这部分代码的实现是 validate_index,这里列出其中的关键代码

validate_index 
{
    ...
    // scan index and gather all the tids into a tuplesort object
    // 这段代码收集索引中的 tids 走的是 vacuum 流程中扫描索引的流程,是按照 physical order 扫描 index pages,
    // 但在 callback 中只是收集 tids 并不会真正删除任何内容
    state.tuplesort = tuplesort_begin_datum(
        TIDOID, TIDLessOperator, InvalidOid, false, u_sess->attr.attr_memory.maintenance_work_mem, false);
    state.htups = state.itups = state.tups_inserted = 0;
    (void)index_bulk_delete(&ivinfo, NULL, validate_index_callback, (void*)&state);
    /* Execute the sort */
    // 按照 tid 大小排序
    tuplesort_performsort(state.tuplesort);
    /*
     * Now scan the heap and "merge" it with the index
     */
    // 第二次 table scan ,每个 scan 出的 tuple, 如果是在 hot-chain 上则是
    // hot-chain 的 root tuple ,在 索引 scan 出的 tuple 中(已经按照 tid 排序)查找,找不到则说明不在索引中,应该追加到索引中。
    // 调用 index_insert 将这个 heap tuple 的索引数据插入索引
    tableam_index_validate_scan(heapRelation, indexRelation, indexInfo, snapshot, &state);
    ...
}

validate_index_heapscan 的主要代码逻辑如下:

validate_index_heapscan
{
    ...
    // 遍历 heap tuple
    while ((heapTuple = heap_getnext(scan, ForwardScanDirection)) != NULL) 
    {
        ...
        // 如果在 hot-chain,用 hot-chain 的 root tuple 的 tid 在索引中查找 
        if (HeapTupleIsHeapOnly(heapTuple)) {
            root_offnum = root_offsets[root_offnum - 1];
            Assert(OffsetNumberIsValid(root_offnum));
            ItemPointerSetOffsetNumber(&rootTuple, root_offnum);
        }
        ...
        // 在 索引的 tids 中查找,由于索引的 tids 是有序的,
        // 当 heap tuple 的 tid 小于索引的 tid 继续查找,否则
        // 1. 在索引中找到(tid相等),不需要再插入索引
        // 2. 不在索引中,需要插入
        while (!tuplesort_empty && (!indexcursor || ItemPointerCompare(indexcursor, &rootTuple) < 0)) {
            ...
        }
        // 没有找到对应的 tid,需要插入索引
        if ((tuplesort_empty || ItemPointerCompare(indexcursor, &rootTuple) > 0) && !in_index[root_offnum - 1]) {
            ...
            // 追加索引
            (void)index_insert(indexRelation,
                values,
                isnull,
                &rootTuple,
                heapRelation,
                indexInfo->ii_Unique ? UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
        }
    }
}

本文主要内容是结合代码详解 并发创建索引 过程中第二次 table scan 追加索引部分的实现,希望能对理解这部分的代码有所帮助。

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

文章被以下合辑收录

评论