HADR Tools: the HADR Simulator

In the past, I have not made extensive use of the HADR tools that IBM offers. Most of my HADR setups to date have either been same-data-center using NEARSYNC or have used ASYNC to copy data between data centers. I haven’t had much cause to tweak my network settings or change my SYNCMODE settings based on hardware or networking.

However, I now have a chance to make use of these tools in several scenarios, so I thought I would share what I’m finding. I do not claim to be the foremost expert on these tools, and there is an incredible amount of detail on them available from IBM. For the full technical specifications and details on using the HADR tools, see:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR/
http://www.ibm.com/developerworks/data/library/techarticle/dm-1310db2luwhadr/index.html?ca=dat-

I thought I would share my own journey with these tools to help others. Comments, corrections, and additions are all welcome in the comments form below.

What are the HADR tools?

IBM provides three major HADR tools on a developerWorks wiki site.

The HADR Simulator is used to look both at disk speed and network details around HADR. It can be used in several different ways, including helping you to troubleshoot the way HADR does name resolution.

The DB2 Log Scanner is used to look at log files and report details about your DB2 workload. The output is a bit cryptic, and this tool is best used in conjunction with the HADR Calculator. It requires real log files from a real workload, so if you’re setting up a new system, you will need to have actual work on the system before you can use it. Also, IBM will not provide the tool they use internally to uncompress automatically compressed log files, so if you want to use the Log Scanner, you’ll have to turn automatic log compression off. I tried to get the tool, but they would not give it to me.

The HADR Calculator takes input from the DB2 Log Scanner, and values that you can compute using the HADR Simulator, and tells you which HADR SYNCMODEs make the most sense for you.

These three tools do NOT require that you have DB2 on a server to run; they are fully standalone. There are versions of the first two for each operating system. The third requires that you have Perl, but can be run anywhere, including on a laptop or personal computer. This gives you the flexibility to consider the details of a network or server you are thinking of using before actually using it, and to analyze log files without adding workload to a server.

Using the HADR Simulator

In this post, I’m going to focus on the HADR Simulator.

First of all, the download and details can be found at: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR/page/HADR%20simulator. Note that there are some child pages and links there with good detail.

The HADR Simulator is a stand-alone tool. This means that you do not need DB2 on the servers in question. It is a binary executable. To use it, you simply download it from the link above to one or more servers. You can simulate primary-standby network interaction by running it on two servers at the same time. You can also run it on one server alone to look at things like disk performance.

Simulating HADR with the HADR Simulator

To use it in the main intended way, you download the right version for your OS, place it on each of the servers in question, make sure you have execute permission on it, and execute it like this:
Primary:

 simhadr_aix -lhost host1.domain -lport 18821 -rhost host2.domain -rport 18822 -role primary -syncmode NEARSYNC -t 60

Standby:

simhadr_aix -lhost host2.domain -lport 18822 -rhost host1.domain -rport 18821 -role standby

The ports in the above should be the ports you plan to use for HADR. However, you cannot use the same ports that HADR is currently running on if you happen to already be running HADR on the servers. If you try that, you will get output like this:

+ simhadr -lhost host1 -lport 18819 -rhost host2 -rport 18820 -role primary -syncmode NEARSYNC -t 60

Measured sleep overhead: 0.000004 second, using spin time 0.000004 second.
flushSize = 16 pages

Resolving local host host1 via gethostbyname()
hostname=host1
alias: host1.domain
address_type=2 address_length=4
address: 000.000.000.000

Resolving remote host host2 via gethostbyname()
hostname=host2
alias: host2.domain
address_type=2 address_length=4
address: 000.000.000.000

Socket property upon creation
BlockingIO=true
NAGLE=true
TCP_WINDOW_SCALING=32
SO_SNDBUF=262144
SO_RCVBUF=262144
SO_LINGER: onoff=0, length=0

Binding socket to local address.
bind() failed on local address. errno=67, Address already in use

You should pass in the host names exactly as you would use them with HADR. This allows the tool to show you how the names are resolving. The HADR Simulator can be used for that purpose alone if you’re having name resolution issues. The ports that you pass in must be numbers; even if you normally use port names, /etc/services or its equivalent is not consulted.
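A quick pre-flight check along these lines can catch both problems before a run: names that do not resolve the way you expect, and a port that is already taken. This is only a sketch in Python and is not part of the IBM tools. The function names `show_resolution` and `port_is_free` are made up for illustration, and the standard-library calls only approximate what the simulator does internally.

```python
import socket


def show_resolution(host):
    """Return (canonical name, aliases, IPv4 addresses) for a host name.

    Uses the system resolver, similar in spirit to the gethostbyname()
    lookup that appears in the simulator output.
    """
    return socket.gethostbyname_ex(host)


def port_is_free(host, port):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Covers "Address already in use" as well as other bind failures,
            # such as an address that does not belong to this machine.
            return False
        return True
```

Run on the primary, `port_is_free("host1.domain", 18821)` returning False would suggest that the simulator's bind() is going to fail in the same way as in the output above.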

The output from the HADR Simulator, invoked using the syntax above, looks something like this:
Primary:

+ simhadr -lhost host1.domain -lport 18821 -rhost host2.domain -rport 18822 -role primary -syncmode NEARSYNC -t 60

Measured sleep overhead: 0.000004 second, using spin time 0.000004 second.
flushSize = 16 pages

Resolving local host host1.domain via gethostbyname()
hostname=host1.domain
alias: host1.domain.local
address_type=2 address_length=4
address: 000.000.000.000

Resolving remote host host2.domain via gethostbyname()
hostname=host2.domain
alias: host2.domain.local
address_type=2 address_length=4
address: 000.000.000.000

Socket property upon creation
BlockingIO=true
NAGLE=true
TCP_WINDOW_SCALING=32
SO_SNDBUF=262144
SO_RCVBUF=262144
SO_LINGER: onoff=0, length=0

Binding socket to local address.
Listening on local host TCP port 18821

Connected.

Calling fcntl(O_NONBLOCK)
Calling setsockopt(TCP_NODELAY)
Socket property upon connection
BlockingIO=false
NAGLE=false
TCP_WINDOW_SCALING=32
SO_SNDBUF=262088
SO_RCVBUF=262088
SO_LINGER: onoff=0, length=0

Sending handshake message:
syncMode=NEARSYNC
flushSize=16
connTime=2014-06-15_18:24:42_UTC

Sending log flushes. Press Ctrl-C to stop.

NEARSYNC: Total 18163171328 bytes in 60.000131 seconds, 302.718861 MBytes/sec
Total 277148 flushes, 0.000216 sec/flush, 16 pages (65536 bytes)/flush

Total 18163171328 bytes sent in 60.000131 seconds. 302.718861 MBytes/sec
Total 277148 send calls, 65.536 KBytes/send,
Total 0 congestions, 0.000000 seconds, 0.000000 second/congestion

Total 4434368 bytes recv in 60.000131 seconds. 0.073906 MBytes/sec
Total 277148 recv calls, 0.016 KBytes/recv

Distribution of log write size (unit is byte):
Total 277148 numbers, Sum 18163171328, Min 65536, Max 65536, Avg 65536
Exactly      65536      277148 numbers

Distribution of log shipping time (unit is microsecond):
Total 277148 numbers, Sum 59711258, Min 175, Max 3184, Avg 215
From 128 to 255               263774 numbers
From 256 to 511                13335 numbers
From 512 to 1023                  23 numbers
From 1024 to 2047                 15 numbers
From 2048 to 4095                  1 numbers

Distribution of send size (unit is byte):
Total 277148 numbers, Sum 18163171328, Min 65536, Max 65536, Avg 65536
Exactly      65536      277148 numbers

Distribution of recv size (unit is byte):
Total 277148 numbers, Sum 4434368, Min 16, Max 16, Avg 16
Exactly         16      277148 numbers

Standby:

+ simhadr -lhost host2.domain -lport 18822 -rhost host1.domain -rport 18821 -role standby

Measured sleep overhead: 0.000004 second, using spin time 0.000004 second.

Resolving local host host2.domain via gethostbyname()
hostname=host2.domain
alias: host2.domain.local
address_type=2 address_length=4
address: 000.000.000.000

Resolving remote host host1.domain via gethostbyname()
hostname=host1.domain
alias: host1.domain.local
address_type=2 address_length=4
address: 000.000.000.000

Socket property upon creation
BlockingIO=true
NAGLE=true
TCP_WINDOW_SCALING=32
SO_SNDBUF=262144
SO_RCVBUF=262144
SO_LINGER: onoff=0, length=0

Connecting to remote host TCP port 18821
connect() failed. errno=79, Connection refused
Retrying.

Connected.

Calling fcntl(O_NONBLOCK)
Calling setsockopt(TCP_NODELAY)
Socket property upon connection
BlockingIO=false
NAGLE=false
TCP_WINDOW_SCALING=32
SO_SNDBUF=262088
SO_RCVBUF=262088
SO_LINGER: onoff=0, length=0

Received handshake message:
syncMode=NEARSYNC
flushSize=16
connTime=2014-06-15_18:24:42_UTC

Standby receive buffer size 64 pages (262144 bytes)
Receiving log flushes. Press Ctrl-C on primary to stop.
Zero byte received. Remote end closed connection.

NEARSYNC: Total 18163171328 bytes in 59.998903 seconds, 302.725057 MBytes/sec
Total 277148 flushes, 0.000216 sec/flush, 16 pages (65536 bytes)/flush

Total 4434368 bytes sent in 59.998903 seconds. 0.073907 MBytes/sec
Total 277148 send calls, 0.016 KBytes/send,
Total 0 congestions, 0.000000 seconds, 0.000000 second/congestion

Total 18163171328 bytes recv in 59.998903 seconds. 302.725057 MBytes/sec
Total 613860 recv calls, 29.588 KBytes/recv

Distribution of log write size (unit is byte):
Total 277148 numbers, Sum 18163171328, Min 65536, Max 65536, Avg 65536
Exactly      65536      277148 numbers

Distribution of send size (unit is byte):
Total 277148 numbers, Sum 4434368, Min 16, Max 16, Avg 16
Exactly         16      277148 numbers

Distribution of recv size (unit is byte):
Total 613860 numbers, Sum 18163171328, Min 376, Max 65536, Avg 29588
From 256 to 511                  166 numbers
From 1024 to 2047              55614 numbers
From 2048 to 4095               8845 numbers
From 4096 to 8191              18028 numbers
From 8192 to 16383             34458 numbers
From 16384 to 32767           227758 numbers
From 32768 to 65535           264416 numbers
From 65536 to 131071            4575 numbers

Ok, that’s great, right, but what do I do with that?

Well, here’s one thing: you can tune your send and receive buffers using this information. Run this process several times using different values for them, like this:

./simhadr_aix -lhost host1.domain -lport 18821 -rhost host2.domain -rport 18822 -sockSndBuf 65536 -sockRcvBuf 65536 -role primary -syncmode NEARSYNC -t 60
./simhadr_aix -lhost host2.domain -lport 18822 -rhost host1.domain -rport 18821 -sockSndBuf 65536 -sockRcvBuf 65536 -role standby
./simhadr_aix -lhost host1.domain -lport 18821 -rhost host2.domain -rport 18822 -sockSndBuf 131072 -sockRcvBuf 131072 -role primary -syncmode NEARSYNC -t 60
./simhadr_aix -lhost host2.domain -lport 18822 -rhost host1.domain -rport 18821 -sockSndBuf 131072 -sockRcvBuf 131072 -role standby
./simhadr_aix -lhost host1.domain -lport 18821 -rhost host2.domain -rport 18822 -sockSndBuf 262144 -sockRcvBuf 262144 -role primary -syncmode NEARSYNC -t 60
./simhadr_aix -lhost host2.domain -lport 18822 -rhost host1.domain -rport 18821 -sockSndBuf 262144 -sockRcvBuf 262144 -role standby
./simhadr_aix -lhost host1.domain -lport 18821 -rhost host2.domain -rport 18822 -sockSndBuf 524288 -sockRcvBuf 524288 -role primary -syncmode NEARSYNC -t 60
./simhadr_aix -lhost host2.domain -lport 18822 -rhost host1.domain -rport 18821 -sockSndBuf 524288 -sockRcvBuf 524288 -role standby
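Typing out each pair by hand invites copy-and-paste mistakes, so the command lines can be generated instead. The following is a minimal sketch that only builds the strings and does not run anything. The function name `sweep_commands` and its parameters are invented for illustration, and it uses only the simulator options that appear in the commands above.

```python
def sweep_commands(primary, standby, primary_port, standby_port, sizes,
                   binary="./simhadr_aix", syncmode="NEARSYNC", seconds=60):
    """Yield (buffer size, primary command, standby command) for each size."""
    for size in sizes:
        buffers = f"-sockSndBuf {size} -sockRcvBuf {size}"
        primary_cmd = (
            f"{binary} -lhost {primary} -lport {primary_port} "
            f"-rhost {standby} -rport {standby_port} {buffers} "
            f"-role primary -syncmode {syncmode} -t {seconds}"
        )
        standby_cmd = (
            f"{binary} -lhost {standby} -lport {standby_port} "
            f"-rhost {primary} -rport {primary_port} {buffers} -role standby"
        )
        yield size, primary_cmd, standby_cmd


# Same hosts, ports, and buffer sizes as the commands listed above.
for size, primary_cmd, standby_cmd in sweep_commands(
        "host1.domain", "host2.domain", 18821, 18822,
        [65536, 131072, 262144, 524288]):
    print(primary_cmd)
    print(standby_cmd)
```

Each pair still has to be started on its own server, with the primary and standby commands running at the same time.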

In the line of output that looks like this:

NEARSYNC: Total 14220328960 bytes in 60.000083 seconds, 237.005155 MBytes/sec

Pull out the MBytes per second from each run, and graph it against the buffer size:
[Figure: hadr_throughput]
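If the output of each run is saved to a file, pulling out the throughput figure can be scripted. This is a sketch that assumes the summary line always has the shape shown above (sync mode, total bytes, seconds, MBytes/sec). The names `parse_summary` and `throughput_from_output` are made up for illustration.

```python
import re

# Matches e.g. "NEARSYNC: Total 14220328960 bytes in 60.000083 seconds, 237.005155 MBytes/sec"
SUMMARY = re.compile(
    r"^(?P<mode>[A-Z]+): Total (?P<bytes>\d+) bytes in "
    r"(?P<seconds>[\d.]+) seconds, (?P<mbps>[\d.]+) MBytes/sec"
)


def parse_summary(line):
    """Return (sync mode, bytes, seconds, MBytes/sec), or None if no match."""
    match = SUMMARY.match(line.strip())
    if match is None:
        return None
    return (match["mode"], int(match["bytes"]),
            float(match["seconds"]), float(match["mbps"]))


def throughput_from_output(text):
    """Return the MBytes/sec of the first summary line found in a run's output."""
    for line in text.splitlines():
        parsed = parse_summary(line)
        if parsed is not None:
            return parsed[3]
    return None
```

Collecting one (buffer size, MBytes/sec) pair per run gives the points for the graph.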

In this example, it is clear that the throughput levels off at a buffer size of 128 KB. Your results are likely to vary. To allow some headroom, in this example we would choose values of 256 KB, and set them using this syntax:

db2set DB2_HADR_SOSNDBUF=262144
db2set DB2_HADR_SORCVBUF=262144
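The same choice can be made without a graph. The sketch below rests on assumptions of my own: "levels off" is taken to mean the smallest buffer size that reaches within 5% of the best measured throughput, and the headroom is taken to be a doubling of that size. The sample throughput numbers are invented for illustration and are not the measurements behind the graph.

```python
def pick_buffer_size(results, tolerance=0.05):
    """Return (knee, recommended) from a list of (buffer bytes, MBytes/sec).

    knee: smallest size whose throughput is within `tolerance` of the best.
    recommended: knee doubled, as headroom (an assumption, not an IBM rule).
    """
    ordered = sorted(results)
    best = max(rate for _, rate in ordered)
    for size, rate in ordered:
        if rate >= best * (1 - tolerance):
            return size, size * 2
    raise ValueError("no results supplied")


def db2set_lines(size):
    """Format the two registry settings shown above for a given size."""
    return [f"db2set DB2_HADR_SOSNDBUF={size}",
            f"db2set DB2_HADR_SORCVBUF={size}"]


# Made-up sample measurements: (buffer size in bytes, MBytes/sec).
sample = [(65536, 150.0), (131072, 236.0), (262144, 237.0), (524288, 237.5)]
knee, recommended = pick_buffer_size(sample)
print(knee, recommended)
print("\n".join(db2set_lines(recommended)))
```

With these sample numbers the knee lands at 128 KB and the recommendation at 256 KB, which matches the reasoning above. A different tolerance or a noisier curve could easily give a different answer.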

This is the kind of thing I might never have gone into detail on if I didn’t blog. And yet it led to me changing parameters used and improving what I’m doing at work.

I am also interested in what I might do with some of the disk information supplied here. I sometimes have trouble getting disk information from hosting providers and, depending on the situation, there might be numbers here that I could use.

I’m really disappointed that IBM won’t share their internal log uncompression tool for use with the log scanner. I’m not sure I can justify running with manually compressed logs just to run the log scanner. Automatic log compression is one of my favorite recent features. If I get the opportunity, I’ll play with that tool and blog about it too.
