正如我们在索引激增1中所见,我们会在索引的叶子块中失去大量的空间,因为即使在相当低的并发级别上,Oracle也可以引入大量(冗余的?)的ITL条目到叶块中,并且一旦一个叶块有大量的ITL条目,那么每次叶块分裂时,整个ITL都会复制到两个目标块中(译者注:因叶块分裂而产生了两个新叶子块),因此一次短暂的事故可能会产生重大的长期影响。
但是,发生了什么,我们要如何做?
第一件事件是要注意三个条件同时出现时的影响:
CPU是重载的,几个事务同时命中到了一个索引的叶子块上,而索引叶子块必须分裂以容纳这些事务。
我在索引激增1中所创建的样例,使得这种影响非常明显–我运行的并发进程超过了我拥有的CPU数,我使用pl/sql循环持续的插入数据,并且在“高值”端以基于序列的索引为目标。如果系统负载不重,或者没有任何基于时间或序列的索引,则可能永远不会看到此问题。
实际上,与Oracle的许多其他特性一样,您可能会在没有注意到的情况下稍微受到这个问题的困扰。当我开始研究这个问题并看到明显“坏了”的叶块转储时,让我想起来,我以前见过这个问题——大约15年前,在一个可能是Oracle7.3或更早版本的系统上。从那以后,我就再也没有看到过因这类问题索引,而导致的性能问题。
那么,你是如何处理这类问题的呢?以前,我常常在没有时间去弄清楚为什么会坏掉的情况下修复一些东西,并且记录下为什么修复是合适的,答案很明显:有些东西使得ITL在不需要增长的情况下变得非常大——所以需要在索引上设置maxtrans为一个表示最坏情况并发的合理值。
如果这样做,就必须coalesce或rebuild索引一次,以消除极端的ITL,而且设置maxtrans可能会在Oracle本应使用ITL时引入一些意外的争用,但maxtrans解决了眼前的问题。
自应用这个修改后,已经过了很长一段时间了,但是我不记得有人在我做了这个修正后抱怨过,而且它似乎对那个版本的Oracle应用程序没有任何负面影响,但是我们将在以后的贴子中讨论它的副作用以及它是否是最好(或正确)的。首先,让我们看看将maxtrans设置为9对我的测试用例的影响——再次在有两题CPU,运行在XP pro的笔记本上,在10.2.0.3上运行8个并发进程的测试。
这里有几个结果的比较–我在设置maxtrans和不设置maxtrans的情况下运行了三次测试,并记录了各种统计数据、时间和大小(出于各种原因,我在这次测试中有110008行):
With maxtrans set: ------------------ Average DB Time for a process (seconds) 31.59 Average redo size for a process (bytes) 14,088,953 Average leaf blocks 409 Average leaf block len (bytes) 7,832 Index height (blevel + 1) 2 Without setting maxtrans ------------------------ Average DB Time for a process (seconds) 43.23 Average redo size for a process (bytes) 15,211,463 (15.3MB / 14.9MB) Average leaf blocks 743 (671 / 780) Average leaf block len (bytes) 4,176 (3992 / 4544) Index height (blevel + 1) 3
复制
你会注意到我已经为“no maxtrans”测试提供了一对数字。a其中一次运行比另两次运行更不容易造成灾难,它向您提供了一些提示,说明如何通过时间上的微小变化,甚至是连接到实例时碰巧使用的进程ID的变化,来获得显著的结果变化。
如您所见:在我的特定案例,使用我的特定数据集等情况下。当我将maxtrans设置为9时,索引表现得非常好。理想情况下,我希望看到一个基于序列的索引以100%打包(在低并发级别时发生)的方式运行,但是这个索引以67%打包的方式运行,这比典型的B树索引的平均值略低。不过,如果没有maxtrans的限制值,该指数就会退化到35%的块平均利用率左右——索引会比其需要的(空间)在物理上要大得多。
当然,时间可能是最重要的考虑因素,在这个测试中,大多数时间差异是由于log file parallel writes造成的,而一些时间则是由于buffer busy waits等待造成的。redo大小增加了8%,这主要是由于在每个叶块拆分期间,因拆分次数的增加而生成的额外redo。
在这种情况下,特别是在一段时间的预热之后,最好的解决方案是简单地将索引重新创建为一个反向键索引,并将并发性分布到更多的叶块中。这是一个唯一的基于序列的索引,索引的唯一用途是回答查询“select single_row where id=constant”,因此键反转不会改变数据检索速度(前提是允许 buffer cache中有额外的空间来保持索引缓冲)
如果你的索引是基于时间的,那么您可能不希望采用这种方法,因为你更可能运行覆盖时间范围的查询,而且Oracle不会在查询“select many_rows where reversed_column between constant1 and constant2”中使用反转键索引,因此设置maxtrans可能是您的最佳选择。
话虽如此,我可以肯定地说,我偶尔会遇到这类有并发插入的基于时间的索引,所以您可能会想,为什么我没有经常看到这种浪费空间的现象。我会在以后的文章中告诉你为什么。同时,如果你不得不忍受这样的索引,我将讨论最小化潜在风险的策略,我还将告诉你我如何在10g的索引上设置maxtrans…它会忽略maxtrans!
原文地址链接及内容如下:
https://jonathanlewis.wordpress.com/2009/08/11/index-explosion-2/
Index Explosion 2
Filed under: Index Explosion,Indexing,Infrastructure,Performance,Troubleshooting — Jonathan Lewis @ 7:27 pm BST Aug 11,2009
As we saw in part 1 of Index Explosion, we can lose a lot of space in index leaf blocks because
at even fairly low levels of concurrency Oracle can manage to introduce a large number of (redundant ?) ITL entries into a leaf block and …
once a single leaf block has a large number of ITL entries the entire ITL is copied into both target blocks every time the leaf block splits – so one brief accident can have a significant long term effect.
But what’s happening, and what should we do about it ?
The first thing to notice is that the effect appears when three conditions appear simultaneously:
the CPU is heavily loaded,
several transactions hit a single leaf block in the index at the same time,
the index leaf block has to split to handle those transactions
The example I created for part 1 made the effect highly visible – I ran far more concurrent processes than I had CPUs, I was using pl/sql loops to insert data continuously, and I was targetting a sequence-based index at the “high values” end. If your system isn’t heavily loaded, or if you don’t have any time-based or sequence-based indexes, you may never see this problem.
In fact, as with many other features of Oracle, you may be suffering from this problem to a slight degree without even noticing it. When I started investigating the issue and saw the leaf block dump that had obviously “gone bad” it reminded me that I had seen the problem before – about 15 years ago on a system that was probably Oracle 7.3 or earlier. Since then I’ve never seen a performance issue that led me back to an index with this problem.
So how do you deal with the problem ? Back in the days when I used to fix things without having time to work out why they had got broken and documenting why the fix was appropriate the answer was obvious: something was making the ITL grow very large when it should not need to grow – so set maxtrans on that index to a reasonable value that represented the worst case concurrency.
If you do this, you have to coalesce or rebuild the index just once to get rid of the extreme ITLs, and it’s just possible that setting maxtrans will introduce some unexpected contention at the moments when Oracle would otherwise have run away with the ITL – but maxtrans fixes the immediate problem.
It’s a long time since I applied that fix, but I don’t recall anyone complaining after I did it and it didn’t seem to have any negative side effects for that application on that version of Oracle, but we’ll talk about side effects and whether it’s the best (or right) thing to do in a future note. First let’s look at the impact of setting maxtrans to nine on my test case – again running the test on 10.2.0.3 using eight concurrent processes on a laptop with two CPUs running XP Pro.
Here are a few results to compare – I ran the test three times with and without setting maxtrans, and noted various statistics, times, and sizes (for various reasons I had 110,008 rows in this test run):
With maxtrans set: ------------------ Average DB Time for a process (seconds) 31.59 Average redo size for a process (bytes) 14,088,953 Average leaf blocks 409 Average leaf block len (bytes) 7,832 Index height (blevel + 1) 2 Without setting maxtrans ------------------------ Average DB Time for a process (seconds) 43.23 Average redo size for a process (bytes) 15,211,463 (15.3MB / 14.9MB) Average leaf blocks 743 (671 / 780) Average leaf block len (bytes) 4,176 (3992 / 4544) Index height (blevel + 1) 3
复制
You’ll notice that I’ve included pairs of figures for the “no maxtrans“ test. One run was less of a disaster than the other two, giving you some indication of how you can get significant variation in results from a tiny variation in timing, or even a variation in the process ID that you happen to use when you connect to the instance.
As you can see: in my specific case, with my specific data set etc. the index behaved quite nicely when I set maxtrans to nine. Ideally I’d like to see a sequence-based index running at 100% packing – which happens at low levels of concurrency – but this one ran at about 67% packing, which is a little below average for a typical B-tree index. Without a limiting value for maxtrans, though, the index degenerated to something like 35% average utilisation in a block – the index was physically much larger than it needed to be.
Time, of course, is likely to be the most important consideration and in this test it happened that most of the time difference was due to log file parallel writes, with some time lost due to buffer busy waits. The 8% increase in the redo size was basically due to the extra redo generated during each leaf block split combined with an increase in the number of splits.
In this case, especially after a little warm-up time, the best solution to the problem is simply to recreate the index as a reverse-key index and spread the concurrency across more leaf blocks. It’s a unique sequence-based index and the only use for the index is to answer the query “select single_row where id = constant” so key reversal isn’t going to change the data retrieval speed (provided you allow extra space in the buffer cache to keep the index buffered.)
If your index is time-based you might not want to take this approach because you’re more likely to want to run queries that cover a time range – and Oracle won’t use a reverse-key index on the query “select many_rows where reversed_column between constant1 and constant2” – so setting maxtrans may be your best bet.
Having said that, I can safely say that I have occasionally come across indexes that are exactly this type of time-based index with concurrent inserts – so you might wonder why I haven’t seen the space-wasting phenomenon more often. I’ll tell you why in a future article. At the same time, I’ll talk about strategies for minimising the potential for damage if you have to put up with an index like this, and I’ll also tell you how I set maxtrans on an index in 10g … which ignores maxtrans!