Clustering a Table on an Index

5 Responses

  1. Norman Wong says:

    Benefits of clustering depend a lot on how data is accessed. There are some natural places this occurs:
    – If there is referential integrity (RI), the child tables benefit from being clustered on the parent key.
    – Many applications do not declare RI, but the relationships in the data still exist.
    – Batch processes that perform merge-type functions. For example, take a sorted input file and update the master table. If both are sorted on the same field, enormous amounts of I/O can be eliminated. Dynamic prefetch kicks in, and all your data pages are ready before you need them.

    In the old days, clustering was also used to avoid hot spots on physical disk. You would pick a very random column for the clustering index. This caused inserts to be spread all over the table. This is not a problem today with SANs.

    There is a penalty to declaring a clustering index. Inserting data takes more work, as DB2 tries to find the best place for each row. If inserts do not go at the end of the table, then additional reorgs are required to re-cluster the table. It’s a great tool to use, but declaring clustering indexes for no reason hides the cases where clustering is important.

    Norm
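
The batch-merge point in the comment above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the page size, the key range, and the one-page buffer are hypothetical, and the model ignores prefetch and buffer pools entirely. It counts how often a sorted input file forces a move to a different data page when the master table is stored in key order versus arbitrary order.

```python
import random

ROWS_PER_PAGE = 10  # hypothetical page capacity, chosen for illustration


def build_layout(keys_in_physical_order):
    """Map each key to the page it lands on, given the physical row order."""
    return {key: pos // ROWS_PER_PAGE
            for pos, key in enumerate(keys_in_physical_order)}


def count_page_reads(page_of_key, keys_in_order):
    """Count page fetches with a one-page buffer: a fetch happens
    whenever the next key lives on a different page than the last one."""
    reads = 0
    current_page = None
    for key in keys_in_order:
        page = page_of_key[key]
        if page != current_page:
            reads += 1
            current_page = page
    return reads


keys = list(range(1000))

# Clustered master table: rows are stored in key order.
clustered = build_layout(keys)

# Unclustered master table: rows are stored in arbitrary order.
shuffled = keys[:]
random.Random(42).shuffle(shuffled)
unclustered = build_layout(shuffled)

# The batch input file is sorted on the same key.
sorted_input = keys

clustered_reads = count_page_reads(clustered, sorted_input)
unclustered_reads = count_page_reads(unclustered, sorted_input)

print("page reads, clustered master:  ", clustered_reads)
print("page reads, unclustered master:", unclustered_reads)
```

With the clustered layout, each page is visited once and in sequence, which is the access pattern that lets prefetch stay ahead of the process. With the shuffled layout, nearly every row lands on a different page from the previous one.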

  2. Ravi says:

    Ember:

    Brilliant post. I read your article on IBM developerWorks on a similar topic (you have given the link to it above as well). Also, I was at IDUG listening to your presentation on this topic 🙂. Now, I have two questions:

    1) When DB2 does not have the distribution statistics, does it assume equal or normal distribution?

    2) Clustering indexes need to be maintained with reorgs, which can be expensive on big tables. Do you think an MDC table (if appropriate) is better, provided we can spare the extra storage? Especially if we have multiple medium-cardinality indexes.

    Thanks for the informative blogs as always.

    Ravi

    • Ember Crooks says:

      1) It assumes equal distribution. “Normal” would be a bell curve, and DB2 does not do that.
      2) I love the idea of MDC tables for this kind of thing, should your application allow that. WebSphere Commerce does not allow MDC. Obviously you don’t want to choose a dimension with too high a cardinality for MDC, for fear of wasting space.
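
To make the "equal distribution" answer above concrete, here is a minimal sketch of what a uniform assumption implies for an equality predicate: every distinct value is estimated to match the same number of rows. The function name, the table size, and the per-value counts are all invented for illustration, and this is a simplified model rather than DB2's actual optimizer logic.

```python
def estimate_rows_uniform(table_card, col_card):
    """Estimated rows for `col = value` when every distinct value
    is assumed to occur equally often (simplified model)."""
    return table_card / col_card


# Hypothetical table: 1,000,000 rows, 4 distinct status values.
table_card = 1_000_000
col_card = 4

# Invented, heavily skewed actual counts per value.
actual_counts = {
    "SHIPPED": 970_000,
    "PENDING": 20_000,
    "HELD": 9_000,
    "CANCELLED": 1_000,
}

estimate = estimate_rows_uniform(table_card, col_card)

for value, actual in actual_counts.items():
    print(f"{value:<10} estimated={estimate:>10.0f} actual={actual:>10}")
```

In this made-up case the estimate is the same for every value, so it is far too low for the common value and far too high for the rare ones. That gap is what distribution statistics are meant to close.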

  3. Koen Vannisseroy says:

    Hello,

    If I create a table with two indexes on it, and neither of them has the CLUSTER keyword, in which order does DB2 try to maintain the data?
    Is it in the order of the first index created?

    • Ember Crooks says:

      DB2 does not try to maintain the data in any order. It is likely to be roughly in the order in which it was inserted, but that’s not guaranteed. While SQL Server may cluster on the PK by default, DB2 definitely does NOT.
