JL Computer Consultancy
Howard Rogers says I’m wrong !!!!! ** |
Oct 2004 |
I’ve just come across an article by Howard Rogers discussing methods of identifying useful, or useless, indexes. At one point he quotes the following paragraph from my book (Practical Oracle 8i) pointing out that the theory it proposes is totally sound – but follows up with two test cases showing that the theory doesn’t work in practice:
Rebuild
suspect indexes in their own tablespaces and leave them there for a few weeks,
checking v$filestat regularly. If
the tablespaces appear to experience a high proportion of writes (more than
approximately 30%) then there is a chance that the index is not used for data
access.
At this point I could become incoherent with rage and threaten to sue, of course. Or perhaps I could publish an article on the Internet about some wet behind the ears, fresh out of college, student with no practical experience complaining about a piece of advice I wrote 20 years ago when everything was different – it would be a slight distortion of the truth of course, but who’s going to know ? (See the second paragraph of this article, for example, which appeared just a few short weeks after this exchange, you can’t help wondering if that “brand new Oracle DBA” was Howard Rogers.).
On the other hand, Howard hasn’t just suggested that I’m wrong – he’s provided a couple of test cases to show exactly how the theory fails to deliver – so the only rational response is to examine the proof. After all, I have been known to make mistakes, and I’d prefer to find out about my errors when I make them, and not just go on repeating them mindlessly. (And if Howard thinks that there’s an issue that needs to be addressed, it’s probably worth checking).
Test Case 1:
After building a table with a few thousand rows, and adding two indexes (one high-precision one, and one low-precision one), and demonstrating that Oracle uses one but not the either in a simple query involving equality on the indexed columns, the test case gives us a procedure to run, with sample results that show remarkably little difference in the I/O patterns on the two tablespaces holding the indexes. My quoted suggestion does, indeed, seem to be irrelevant.
In fact, Howard goes on to complete the extract from my book, quoting the comment:
It is
possible that the index is used so frequently that it is read and permanently
buffered
And then he points out that his test case targets this issue fairly precisely and that he is proving the correctness of the caveat, rather than an inherent error in the theory. This is close to true, but as I read the article two thoughts immediately came to my mind:
First – the second sentence he has quoted was sufficiently sloppy as to be nearly meaningless – or at best incomprehensibly ambiguous. It should have said something like: “It is possible that a good index is used so frequently that virtually every block is physically read once and permanently buffered thereafter. Since updated blocks are always eventually written, a good index could therefore appear to be a bad index if this suggestion is applied thoughtlessly”.
Secondly - there is a defect with the test case: it didn’t run for the few weeks required by my original suggestion; it also happens to be a very small test case with no concurrent activity. This, of course, highlights another defect in the original statement – it was a casual throwaway with no explanation offered and no indications of assumptions made about the underlying system. So with his open criticism and test case, Howard Rogers has brought into sharp relief the need to produce a relevant test case, or at least an explanation of why time, scale, and concurrency may make a difference – and also indicated how much else you have to know about the system before you apply this test.
(One very common problem of short tests, of course, is that dbwr only wakes up occasionally – for example, there is a timeout every three seconds – to see if it should write to disc. And even on the three-second timeout there may be no requirement to write dirty blocks to disk, so a short test is unlikely to produce a realistic picture of how disk-writes would appear in a steady-state production system).
Test Case 2:
Again, the test case is based on a mechanically generated test set and comes with a procedure that can be run to emulate some work. Again the sample results show that my original suggestion is inappropriate for this test case. Obviously this test has the same defect as the previous test – it doesn’t last a few weeks; but this time we can be fairly confident that we could keep a test like this running for a few weeks and see no material difference in the results.
So what’s wrong with this test? Nothing! It’s just picked another boundary condition where my suggestion breaks down. In this case, it has broken down “the other way round”. The index which is the “useful” index is showing the characteristic high percentage of writes that I’ve given as a possible indicator of “bad” indexes – and it’s not due to the buffering that I mentioned as a possibility in the book.
Of course, if you look carefully at the test, you realise that something a little unusual is going on. For every row the code selects, it changes every single index entry in the corresponding index twice (once on the update, once on the rollback). The hypothesis underlying my original suggestion is that the I/O characteristics of good and bad indexes will be different, and in this test case, the two indexes show exactly the same I/O characteristic. But there is a reason for that – it’s because both the indexes are probably bad indexes.
Again, there is an underlying assumption that I did not state in the book. When you create indexes, you are always making a trade-off between the benefit you get from the indexed query path and the cost of maintaining that index. Even if an index appears to be a perfect way to resolve a query, you might decide that it should not exist because the maintenance cost is too high.
In this test case we have an index which is perfect for the query, but the cost/benefit analysis suggests that we definitely do not want it to exist. Ideally, we should have worked that one out before we ever got to fiddling with tablespaces – even so, the tablespace test helps to confirm our assumption: the I/O figures do indeed suggest that the cost of maintaining the index is large compared to the use we make of the index. The index really is “useless”, even though (to paraphrase my original statement) the index is used for data access.
Conclusion
By referencing this paragraph from my book, Howard has helped other people who may have applied the suggestion in the wrong circumstances and taken some inappropriate action as a consequence – the criticism was therefore a good thing.
By supplying test cases to demonstrate the point he has also given me the chance to review what I said and how I said it and made me realise that the statement needed further explanation, better wording, and some justification – the criticism was therefore well done.
By supplying reproducible test cases, he has given me the opportunity to examine the quality and validity of the tests, and explain why they are examples where the trick of moving indexes into their own tablespaces wouldn’t really help you come to any conclusion about whether the index is useful or not – the criticism has therefore led to further enlightenment (pardon the pretentiousness, I couldn’t think of a better word).
The ultimate effect of Howard Rogers’ article is that some more people will have a better understanding of how Oracle works. The fact that he has highlighted some inadequacy in my work doesn’t upset me – I’m not perfect and I know I make mistakes, but my blood-pressure is good, I can cope with criticism, and I’d rather be corrected than carry on repeating misleading statements.
And, in passing, I don’t think Howard let me know that he had written this article – but that didn’t bother me. Think about it this way: Howard can only pose a threat to my reputation if he is good at what he does – but if he is good at what he does, then what he says is worth hearing even (especially) if he says I’ve made a mistake. On the other hand if I thought he wasn’t any good at what he does then I wouldn’t care about what he said anyway.
So keep on reviewing, Howard. When you’re right I’ll applaud you and correct my mistakes, when you’re wrong I’ll explain why.
** Footnote: See Maskerade by Terry Pratchett for commentary on people who use too many exclamation marks.