Using TSA/db2haicu to automate failover – Part 2: How it looks if it goes smoothly

This is the second post in my TSA/db2haicu series. The first post, Using TSA/db2haicu to automate failover – Part 1: The Preparation, is a must-read before trying anything outlined in this post.

What You Should Have Ready After Part 1 of the Series

If you’ve done the preparation properly, you have the following already defined/done:

  1. HADR is set up and running, using hostnames (either fully qualified or short names)
  2. Properly configured Hosts file
  3. preprpnode run on both servers
  4. Public IP addresses for both servers
  5. Fully qualified host names for both servers
  6. IP Address of the Quorum Device
  7. (Optional) Virtual IP address and Subnet mask of that VIP
  8. (Optional) Private IP addresses if they’re being used

In our example, I’ll be using:

  • Fully qualified host names for both servers:
174.13.101.192  spp05db01r 4032312-Prod-db1.adomainl.com
174.13.101.193  spp05db02r 4032313-Prod-db2.adomainl.com
  • IP Address of the Quorum Device
174.13.101.2
  • Virtual IP
 174.13.101.231
  • We are not using a Private Network
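
Before answering db2haicu's prompts, it can help to confirm that the hosts file on each server really maps every name to the address you plan to type in. The following is a minimal sketch, not part of db2haicu or TSA: the function names are made up for illustration, and the hosts text is pasted in as a string rather than read from /etc/hosts.

```python
def parse_hosts(text):
    """Map every hostname or alias in hosts-file text to its IP address."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name.lower()] = ip
    return mapping


def missing_entries(hosts_text, expected):
    """Return the expected (name, ip) pairs that the hosts text does not satisfy."""
    mapping = parse_hosts(hosts_text)
    return [(name, ip) for name, ip in expected if mapping.get(name.lower()) != ip]


# Example values from this post; substitute your own.
HOSTS_TEXT = """
174.13.101.192  spp05db01r 4032312-Prod-db1.adomainl.com
174.13.101.193  spp05db02r 4032313-Prod-db2.adomainl.com
"""

EXPECTED = [
    ("spp05db01r", "174.13.101.192"),
    ("4032312-Prod-db1.adomainl.com", "174.13.101.192"),
    ("spp05db02r", "174.13.101.193"),
    ("4032313-Prod-db2.adomainl.com", "174.13.101.193"),
]

# An empty list means every name maps to the expected address.
print(missing_entries(HOSTS_TEXT, EXPECTED))
```

Run the same check against the hosts file from both servers; a mismatch on either one is worth fixing before starting db2haicu.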

Actually Running db2haicu on the Standby

The idea here is that db2haicu asks you questions, and you answer them using the information you’ve prepared ahead of time. This seems like a simple approach once you’ve done it a time or two, but can be kind of intimidating the first time.

Start by running db2haicu on the standby database server. The session is shown below, with notes. My inputs are the short lines that directly follow each prompt; they can be hard to pick out. I’ve taken this output from a real setup that I did, but may have changed IP addresses and host names to protect the innocent:

[db2inst1@4032313-Prod-db2 ~]$ db2haicu
Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).

You can find detailed diagnostic information in the DB2 server diagnostic log file called db2diag.log. Also, you can use the utility called db2pd to query the status of the cluster domains you create.

For more information about configuring your clustered environment using db2haicu, see the topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in the DB2 Information Center.

db2haicu determined the current DB2 database manager instance is db2inst1. The cluster configuration that follows will apply to this instance.

db2haicu is collecting information on your current setup. This step may take some time as db2haicu will need to activate all databases for the instance to discover all paths ...
When you use db2haicu to configure your clustered environment, you create cluster domains. For more information, see the topic 'Creating a cluster domain with db2haicu' in the DB2 Information Center. db2haicu is searching the current machine for an existing active cluster domain ...
db2haicu did not find a cluster domain on this machine. db2haicu will now query the system for information about cluster nodes to create a new cluster domain ...

db2haicu did not find a cluster domain on this machine. To continue configuring your clustered environment for high availability, you must create a cluster domain; otherwise, db2haicu will exit.

Create a domain and continue? [1]
1. Yes
2. No
1
Create a unique name for the new domain:
prod_db2ha
Nodes must now be added to the new domain.
How many cluster nodes will the domain prod_db2ha contain?
2
Enter the host name of a machine to add to the domain:
4032312-Prod-db1.adomainl.com
Enter the host name of a machine to add to the domain:
4032313-Prod-db2.adomainl.com
db2haicu can now create a new domain containing the 2 machines that you specified. If you choose not to create a domain now, db2haicu will exit.

Create the domain now? [1]
1. Yes
2. No
1
Creating domain prod_db2ha in the cluster ...
Creating domain prod_db2ha in the cluster was successful.
You can now configure a quorum device for the domain. For more information, see the topic "Quorum devices" in the DB2 Information Center. If you do not configure a quorum device for the domain, then a human operator will have to manually intervene if subsets of machines in the cluster lose connectivity.

Configure a quorum device for the domain called prod_db2ha? [1]
1. Yes
2. No
1
The following is a list of supported quorum device types:
  1. Network Quorum
Enter the number corresponding to the quorum device type to be used: [1]
1
Specify the network address of the quorum device:
174.13.101.2
Configuring quorum device for domain prod_db2ha ...
Configuring quorum device for domain prod_db2ha was successful.

The one thing that’s out of the ordinary with this setup shows up in the next section. Each of these servers has two network cards. We will use only the ones called ‘bond0’, not the ones called ‘eth2’. To do this through db2haicu, we first have to say “yes” to adding the network card to a network (by choosing a network for it), and then say “no” at the confirmation. Notice that the first prompt offers no “no” option. This is a bit counter-intuitive, but that’s how it works.

The cluster manager found 4 network interface cards on the machines in the domain. You can use db2haicu to create networks for these network interface cards. For more information, see the topic 'Creating networks with db2haicu' in the DB2 Information Center.

Create networks for these network interface cards? [1]
1. Yes
2. No
1
Enter the name of the network for the network interface card: eth2 on cluster node: spp05db01r
1. Create a new public network for this network interface card.
2. Create a new private network for this network interface card.
Enter selection:
1
Are you sure you want to add the network interface card eth2 on cluster node spp05db01r to the network db2_public_network_0? [1]
1. Yes
2. No
2
Enter the name of the network for the network interface card: bond0 on cluster node: spp05db02r
1. Create a new public network for this network interface card.
2. Create a new private network for this network interface card.
Enter selection:
1
Are you sure you want to add the network interface card bond0 on cluster node spp05db02r to the network db2_public_network_0? [1]
1. Yes
2. No
1
Adding network interface card bond0 on cluster node spp05db02r to the network db2_public_network_0 ...
Adding network interface card bond0 on cluster node spp05db02r to the network db2_public_network_0 was successful.
Enter the name of the network for the network interface card: eth2 on cluster node: spp05db02r
1. db2_public_network_0
2. Create a new public network for this network interface card.
3. Create a new private network for this network interface card.
Enter selection:
2
Are you sure you want to add the network interface card eth2 on cluster node spp05db02r to the network db2_public_network_1? [1]
1. Yes
2. No
2
Enter the name of the network for the network interface card: bond0 on cluster node: spp05db01r
1. db2_public_network_0
2. Create a new public network for this network interface card.
3. Create a new private network for this network interface card.
Enter selection:
1
Are you sure you want to add the network interface card bond0 on cluster node spp05db01r to the network db2_public_network_0? [1]
1. Yes
2. No
1
Adding network interface card bond0 on cluster node spp05db01r to the network db2_public_network_0 ...
Adding network interface card bond0 on cluster node spp05db01r to the network db2_public_network_0 was successful.
Retrieving high availability configuration parameter for instance db2inst1 ...
The cluster manager name configuration parameter (high availability configuration parameter) is not set. For more information, see the topic "cluster_mgr - Cluster manager name configuration parameter" in the DB2 Information Center. Do you want to set the high availability configuration parameter?
The following are valid settings for the high availability configuration parameter:
  1.TSA
  2.Vendor
Enter a value for the high availability configuration parameter: [1]
1
Setting a high availability configuration parameter for instance db2inst1 to TSA.
Adding DB2 database partition 0 to the cluster ...
Adding DB2 database partition 0 to the cluster was successful.
Do you want to validate and automate HADR failover for the HADR database WCSP01? [1]
1. Yes
2. No
1
Adding HADR database WCSP01 to the domain ...
The HADR database WCSP01 has been determined to be valid for high availability. However, the database cannot be added to the cluster from this node because db2haicu detected this node is the standby for the HADR database WCSP01. Run db2haicu on the primary for the HADR database WCSP01 to configure the database for automated failover.
All cluster configurations have been completed successfully. db2haicu exiting ...

At least once, I’ve seen this final message and thought that there was a failure. This is the message we expect to see.
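
If you save the session text to your build document, a small script can confirm that the run ended the way the standby run is supposed to end. This is only a sketch under my own assumptions: the function names and return labels are invented here, and the two message fragments are copied from the output above rather than from any documented list of db2haicu messages.

```python
SUCCESS = "All cluster configurations have been completed successfully."
STANDBY_NOTE = "detected this node is the standby"


def squash(text):
    """Remove all whitespace so mid-word terminal wraps do not break matching."""
    return "".join(text.split())


def classify_run(output):
    """Return 'incomplete', 'standby-done', or 'done' for captured db2haicu output."""
    flat = squash(output)
    if squash(SUCCESS) not in flat:
        return "incomplete"
    if squash(STANDBY_NOTE) in flat:
        # Expected on the standby: the next step is db2haicu on the primary.
        return "standby-done"
    return "done"


# Tail of the standby session above, including the terminal's mid-word wraps.
STANDBY_TAIL = """\
Adding HADR database WCSP01 to the domain ...
The HADR database WCSP01 has been determined to be valid for high availability. However, the database cannot be added to the cl
uster from this node because db2haicu detected this node is the standby for the HADR database WCSP01. Run db2haicu on the prima
ry for the HADR database WCSP01 to configure the database for automated failover.
All cluster configurations have been completed successfully. db2haicu exiting ...
"""

print(classify_run(STANDBY_TAIL))  # standby-done
```

Stripping whitespace before matching is a deliberate choice: pasted terminal output is often wrapped in the middle of a word, so a plain substring search on the raw text can miss a message that is actually there.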

Actually Running db2haicu on the Primary

Once you’ve gotten db2haicu to run successfully on the Standby, you also need to run it on the primary. Here’s what that looks like:

[db2inst1@4032312-Prod-db1 ~]$ db2haicu
Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).

You can find detailed diagnostic information in the DB2 server diagnostic log file called db2diag.log. Also, you can use the utility called db2pd to query the status of the cluster domains you create.

For more information about configuring your clustered environment using db2haicu, see the topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in the DB2 Information Center.

db2haicu determined the current DB2 database manager instance is db2inst1. The cluster configuration that follows will apply to this instance.

db2haicu is collecting information on your current setup. This step may take some time as db2haicu will need to activate all databases for the instance to discover all paths ...
When you use db2haicu to configure your clustered environment, you create cluster domains. For more information, see the topic 'Creating a cluster domain with db2haicu' in the DB2 Information Center. db2haicu is searching the current machine for an existing active cluster domain ...
db2haicu found a cluster domain called prod_db2ha on this machine. The cluster configuration that follows will apply to this domain.

Retrieving high availability configuration parameter for instance db2inst1 ...
The cluster manager name configuration parameter (high availability configuration parameter) is not set. For more information, see the topic "cluster_mgr - Cluster manager name configuration parameter" in the DB2 Information Center. Do you want to set the high availability configuration parameter?
The following are valid settings for the high availability configuration parameter:
  1.TSA
  2.Vendor
Enter a value for the high availability configuration parameter: [1]
1
Setting a high availability configuration parameter for instance db2inst1 to TSA.
Adding DB2 database partition 0 to the cluster ...
Adding DB2 database partition 0 to the cluster was successful.
Do you want to validate and automate HADR failover for the HADR database WCSP01? [1]
1. Yes
2. No
1
Adding HADR database WCSP01 to the domain ...
Adding HADR database WCSP01 to the domain was successful.
Do you want to configure a virtual IP address for the HADR database WCSP01? [1]
1. Yes
2. No
1
Enter the virtual IP address:
174.13.101.231
Enter the subnet mask for the virtual IP address 174.13.101.231: [255.255.255.0]
255.255.255.0
Select the network for the virtual IP 174.13.101.231:
1. db2_public_network_0
Enter selection:
1
Adding virtual IP address 174.13.101.231 to the domain ...
Adding virtual IP address 174.13.101.231 to the domain was successful.
All cluster configurations have been completed successfully. db2haicu exiting ...
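
One thing worth checking before you type in the virtual IP and subnet mask is that the VIP sits in the same subnet as the public addresses of both servers, since it is added to the same network as the bond0 adapters. The sketch below uses only Python's standard ipaddress module; the function name is my own, and the same-subnet expectation is an assumption about a typical setup rather than something db2haicu states in this output.

```python
import ipaddress


def vip_fits(vip, netmask, node_ips):
    """Return True if every node address lies in the VIP's subnet."""
    # strict=False lets us pass a host address plus mask and get its network.
    network = ipaddress.ip_network(f"{vip}/{netmask}", strict=False)
    return all(ipaddress.ip_address(ip) in network for ip in node_ips)


# Values from this post: the VIP, its mask, and the two public addresses.
print(vip_fits("174.13.101.231", "255.255.255.0",
               ["174.13.101.192", "174.13.101.193"]))  # True
```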

Verification After Running db2haicu

There are several ways to see how TSA is functioning. You’ll get intimately familiar with them if you don’t follow the proper procedures for stopping and starting a db2haicu/TSA cluster (as defined in section 7.4 of this white paper: http://download.boulder.ibm.com/ibmdl/pub/software/dw/data/dm-0908hadrdb2haicu/HADR_db2haicu.pdf). When I start TSA/db2haicu for the first time, I always copy the output of these commands into my build document, so that if there are problems later, I can go back and see whether they were always there or were introduced later. First is my favorite method of looking at things, though it does require root:

[root@4032312-Prod-db1 ~]# lssam
Online IBM.ResourceGroup:db2_db2inst1_4032312-Prod-db1.adomainl.com_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_4032312-Prod-db1.adomainl.com_0-rs
                '- Online IBM.Application:db2_db2inst1_4032312-Prod-db1.adomainl.com_0-rs:spp05db01r
Online IBM.ResourceGroup:db2_db2inst1_4032313-Prod-db2.adomainl.com_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_4032313-Prod-db2.adomainl.com_0-rs
                '- Online IBM.Application:db2_db2inst1_4032313-Prod-db2.adomainl.com_0-rs:spp05db02r
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_WCSP01-rg Nominal=Online
        |- Online IBM.Application:db2_db2inst1_db2inst1_WCSP01-rs
                |- Online IBM.Application:db2_db2inst1_db2inst1_WCSP01-rs:spp05db01r
                '- Offline IBM.Application:db2_db2inst1_db2inst1_WCSP01-rs:spp05db02r
        '- Online IBM.ServiceIP:db2ip_174_13_101_231-rs
                |- Online IBM.ServiceIP:db2ip_174_13_101_231-rs:spp05db01r
                '- Offline IBM.ServiceIP:db2ip_174_13_101_231-rs:spp05db02r

One of the nice things about this method is that, assuming you’re on Linux, the output is color-coded on your screen, with problems showing up in yellow or red. That makes it easy to read at a glance, especially while you’re getting to know this stuff. If you have something showing “Pending-Online” status, you may be in trouble. I plan to write a complete post on that end of things in this series, so be on the lookout for it. The above is how it should look if things are OK.
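
Colors do not survive a paste into a build document, so here is a rough sketch of picking out the same problems from saved lssam text. It is not a TSA tool. The regular expression is based only on the line layout shown above, and treating “Online” and “Offline” as the only quiet states is my assumption (the standby’s rows are Offline in a healthy HADR pair, as in the sample).

```python
import re

# Optional tree-drawing prefix, then the state, then "IBM.<class>:<name>".
ROW = re.compile(
    r"^[\s|'`-]*(?P<state>[A-Za-z][A-Za-z -]*?)\s+IBM\.(?P<cls>\w+):(?P<name>\S+)"
)
NOMINAL = re.compile(r"Nominal=(\w+)")
QUIET_STATES = {"Online", "Offline"}  # assumption: anything else deserves a look


def unexpected_states(lssam_text):
    """Return (state, resource) pairs that look wrong in saved lssam output."""
    problems = []
    for line in lssam_text.splitlines():
        row = ROW.match(line)
        if not row:
            continue
        state, name = row.group("state"), row.group("name")
        nominal = NOMINAL.search(line)
        if state not in QUIET_STATES:
            problems.append((state, name))
        elif nominal and nominal.group(1) != state:
            # A resource group that is not in its nominal state.
            problems.append((state, name))
    return problems


# Part of the healthy output shown above.
SAMPLE = """\
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_WCSP01-rg Nominal=Online
        |- Online IBM.Application:db2_db2inst1_db2inst1_WCSP01-rs
                |- Online IBM.Application:db2_db2inst1_db2inst1_WCSP01-rs:spp05db01r
                '- Offline IBM.Application:db2_db2inst1_db2inst1_WCSP01-rs:spp05db02r
        '- Online IBM.ServiceIP:db2ip_174_13_101_231-rs
"""

print(unexpected_states(SAMPLE))  # an empty list for the healthy sample
```

The state is captured as whatever text precedes “IBM.”, so a multi-word or hyphenated status is reported as-is instead of being matched against a fixed list.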

Another way to check, one that I have yet to get used to reading and interpreting, runs from the DB2 instance owner’s command line:

[db2inst1@4032312-Prod-db1 ~]$ db2pd -ha
           DB2 HA Status
Instance Information:
Instance Name                  = db2inst1
Number Of Domains              = 1
Number Of RGs for instance     = 2

Domain Information:
Domain Name                    = prod_db2ha
Cluster Version                = 2.5.1.4
Cluster State                  = Online
Number of nodes                = 2

Node Information:
Node Name                     State
---------------------         -------------------
4032313-Prod-db2.adomainl.com Online
4032312-Prod-db1.adomainl.com Online

Resource Group Information:
Resource Group Name            = db2_db2inst1_db2inst1_WCSP01-rg
Resource Group LockState       = Unlocked
Resource Group OpState         = Online
Resource Group Nominal OpState = Online
Number of Group Resources      = 2
Number of Allowed Nodes        = 2
   Allowed Nodes
   -------------
   4032312-Prod-db1.adomainl.com
   4032313-Prod-db2.adomainl.com
Member Resource Information:
   Resource Name                  = db2_db2inst1_db2inst1_WCSP01-rs
   Resource State                 = Online
   Resource Type                  = HADR
   HADR Primary Instance          = db2inst1
   HADR Secondary Instance        = db2inst1
   HADR DB Name                   = WCSP01
   HADR Primary Node              = 4032312-Prod-db1.adomainl.com
   HADR Secondary Node            = 4032313-Prod-db2.adomainl.com

   Resource Name                  = db2ip_174_13_101_231-rs
   Resource State                 = Online
   Resource Type                  = IP

Resource Group Name            = db2_db2inst1_4032312-Prod-db1.adomainl.com_0-rg
Resource Group LockState       = Unlocked
Resource Group OpState         = Online
Resource Group Nominal OpState = Online
Number of Group Resources      = 1
Number of Allowed Nodes        = 1
   Allowed Nodes
   -------------
   4032312-Prod-db1.adomainl.com
Member Resource Information:
   Resource Name                  = db2_db2inst1_4032312-Prod-db1.adomainl.com_0-rs
   Resource State                 = Online
   Resource Type                  = DB2 Partition
   DB2 Partition Number           = 0
   Number of Allowed Nodes        = 1
      Allowed Nodes
      -------------
      4032312-Prod-db1.adomainl.com

Network Information:
Network Name                  Number of Adapters
-----------------------       ------------------
db2_public_network_0          2

   Node Name                     Adapter Name
   -----------------------       ------------------
   spp05db02r                    bond0
   spp05db01r                    bond0

Quorum Information:
Quorum Name                                  Quorum State
------------------------------------         --------------------
Fail                                         Offline
db2_Quorum_Network_174_13_101_2:16_28_43     Online
Operator                                     Offline


My next post in this series will cover some of the errors and some different things to try.

Other Posts In This Series

This series consists of four posts:
Using TSA/db2haicu to automate failover – Part 1: The Preparation
Using TSA/db2haicu to automate failover – Part 2: How it looks if it goes smoothly
Using TSA/db2haicu to Automate Failover Part 3: Testing, Ways Setup can go Wrong and What to do.
Using TSA/db2haicu to automate failover Part 4: Dealing with Problems After Setup

Search this blog on “TSA” for other posts on TSA issues and tips.

10 Responses

  1. Leopoldo says:

    Hi
    Excellent article, congratulations!!

    Best regards from mexico

  2. Frank Ning says:

    Expecting your part 3. (I wrote the db2cptsa:-))

    • Ember Crooks says:

      Funny, I was just working on it yesterday after months of neglect. It is coming. It seems that about every 3-4 months I end up doing or helping with about 3 HADR/TSA setups in less than a month, and then nothing for another 3-4 months. Strange pattern.

  3. Dieter says:

    Hi Ember,
    when will a TSA troubleshooting part be available?
    Regards
    Dieter

    • Ember Crooks says:

      I hope to finish writing it the last week of August. I’m not quite keeping up due to a vacation that I start tomorrow.

  4. Gerry says:

    I too look forward to part 3. I’m doing a HADR/TSA POC (first time trying TSA) and it’s fighting me every step! Latest issue is that the db2haicu just stops here:
    Create the domain now? [1]
    1. Yes
    2. No
    1
    Creating domain hadr_melaitludbpp01_domain in the cluster …
    Creating domain hadr_melaitludbpp01_domain in the cluster was successful.
    Without progressing to the Quorum.
    If I try to rerun it errors and if I try db2haicu -delete it errors a la…
    FUNCTION: DB2 UDB, oper system services, sqloInvokeVendorFunction, probe:50
    MESSAGE : ZRC=0x870F0009=-2029060087=SQLO_EOF “the data does not exist”
    DIA8506C Unexpected end of file was reached.
    DATA #1 : String, 46 bytes
    Pipe read from vendor process was interrupted.

    • Ember Crooks says:

      I hope to finish writing it the last week of August. I’m a bit behind at the moment. Double check in excruciating detail the prep work that I outlined in part 1. There are scripts that come with each fixpack and the base code for uninstalling and reinstalling sam – I’ve had good luck with them.

  5. kapik says:

    Hi,

    I have set up HADR and TSA in the same way as in your article. Everything seems fine, but I am unable to perform a takeover. I tried using db2 takeover and rgreq, but both failed. Actually, db2 takeover at least tries to switch roles, but after switching it shows an error and the primary comes back to the previous node.
    Takeover was working fine before configuring TSA. During configuration of TSA takeover was successful as well, but it was the last time….
    Do you have any suggestions

    • Ember Crooks says:

      If you’re on 10.5, make sure you’re on Fixpack 5. There were some issues in earlier fixpacks.

      Also, call IBM support – they should be able to help you. I have seen this issue before at my old employer, but left before it was resolved. I think it was attributed to Fixpack 3 or 4 on 10.5 in that scenario.
