Failover

EPM-UL can be configured to provide backup user request processing and activity logging. This capability is referred to as failover.

Configure Backup Request Processing

To configure backup request processing, specify the backup policy server hosts by using the submitmasters keyword in /etc/pb.settings on the submit host. Hosts are tried in the order given in the submitmasters keyword list, in the order specified by DNS, or possibly in random order if the randomizesubmitmasters keyword is set.

Failover for submitmasters means that if a host does not respond within masterdelay milliseconds, a second connection is attempted to the next host in the (possibly randomized) list. Once a connection is made, the EPM-UL protocol is negotiated. Those negotiations include SSL/Kerberos/networkencryption and other protocol information.

  • If those negotiations succeed, that connection is used for request approval and further failover does not happen.
  • If the connection is not answered, or the negotiations do not succeed, more hosts are tried in succession until one is successful or the list is exhausted.

Connections are attempted in order. However, depending on the masterdelay value and on how fast or busy the network and hosts are, the connection that finally succeeds may be to a server specified later in the list.
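The connect-then-negotiate sequence described above can be modeled in a few lines of Python. This is only an illustrative sketch, not the product's implementation. The function name pick_policy_server, the callables answers and negotiates, and the host names are all hypothetical stand-ins, and the model ignores timing (masterdelay) entirely.

```python
from typing import Callable, Iterable, Optional


def pick_policy_server(
    hosts: Iterable[str],
    answers: Callable[[str], bool],
    negotiates: Callable[[str], bool],
) -> Optional[str]:
    """Return the first host that answers and passes negotiation, else None."""
    for host in hosts:
        if not answers(host):
            continue  # connection not answered: fail over to the next host
        if negotiates(host):
            return host  # negotiation succeeded: no further failover
        # negotiation failed: fall through to the next host
    return None  # list exhausted


# Hypothetical hosts: pb1 is down, pb2 answers but fails negotiation.
up = {"pb2", "pb3"}
good = {"pb3"}
chosen = pick_policy_server(["pb1", "pb2", "pb3"], up.__contains__, good.__contains__)
```

In this example the request ends up on pb3, the third host in the list, because the first host never answers and the second fails negotiation.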

Configure Backup Logging

To configure backup logging, specify the backup log hosts by using the logservers keyword in /etc/pb.settings on the policy server host. Previous EPM-UL versions behave similarly to the policy server connection, with the relevant keywords being logservers, logserverdelay, and randomizelogservers. Connection attempts are made until the EPM-UL protocol negotiations succeed.

EPM-UL v22.3 introduces a second level of failover for logservers. It takes place when the EPM-UL protocol negotiations succeed but an error then occurs (for example, disk full or insecure log file/directory). This mechanism tries the listed logservers in succession until a logserver reports that it has successfully logged the event. When randomizelogservers is used, only the first logserver attempt is randomized. Diagnostic messages indicating the failed servers are logged. The transparentfailover keyword controls whether the end user sees the diagnostic messages.
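As a rough illustration of this second level of failover, the Python sketch below tries log servers in succession until one reports success and collects a diagnostic for each failure. It is a model under stated assumptions, not the product's code. The names attempt_order, log_with_failover, and write_event are hypothetical, and the sketch assumes that "only the first attempt is randomized" means one randomly chosen server is tried first and the rest follow in listed order.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def attempt_order(logservers: Sequence[str], randomize: bool, rng: random.Random) -> List[str]:
    """Listed order, optionally with one randomly chosen server moved to the front."""
    order = list(logservers)
    if randomize and order:
        # Assumption: only the first attempt is random; the rest keep listed order.
        first = order.pop(rng.randrange(len(order)))
        order.insert(0, first)
    return order


def log_with_failover(
    logservers: Sequence[str],
    write_event: Callable[[str], Optional[str]],
    randomize: bool = False,
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[str], List[str]]:
    """Try each log server until one reports that the event was logged.

    write_event returns None on success or an error string on failure.
    Returns (server that logged the event or None, diagnostic messages).
    """
    diagnostics: List[str] = []
    for server in attempt_order(logservers, randomize, rng or random.Random()):
        error = write_event(server)
        if error is None:
            return server, diagnostics
        diagnostics.append(f"{server}: {error}")
    return None, diagnostics


# Hypothetical servers: log1 negotiates successfully, but its disk is full.
errors = {"log1": "disk full"}
server, diagnostics = log_with_failover(["log1", "log2"], errors.get)
```

Here the event is logged on log2, and the diagnostics list records that log1 failed with "disk full".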

For more information on failover, see the following:

Fine Tuning Policy Server and Failover Connection Timing

masterdelay

  • Version 4.0.0 and later: masterdelay setting available.

When a request is submitted, the policy server hosts that are listed in the submitmasters line are tried in the order they appear, from left to right. The masterdelay setting enables the administrator to adjust the amount of time between failover attempts.

Without a specified time-out, the client tries the first policy server host on the submitmasters line. If it does not receive a response within 500 milliseconds, then the client adds the second policy server host. If neither responds in the next 500 milliseconds, then the client adds the third policy server host, and so on. By specifying a masterdelay, you can change the 500 millisecond waiting period before the client goes on to the next policy server host.

With a masterdelay of 0 milliseconds, you get the fastest possible connection, but the policy server you connect to may not be predictable. You might also increase network traffic, depending on the number of connections that are opened.

With a larger masterdelay, you can increase the predictability, but you might also increase the time needed to form a failover connection. The longer the delay, the more predictable the sequence is.
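The trade-off between speed and predictability can be seen in a small timing model. The Python sketch below is an illustration only: it assumes that each host has a fixed response latency, that attempt number i starts i × masterdelay milliseconds after the first, and that no new attempt is started once an earlier one has responded. The function name first_responder and the latency figures are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def first_responder(
    latencies_ms: Sequence[Optional[int]],
    delay_ms: int = 500,
) -> Optional[Tuple[int, int]]:
    """Return (response time in ms, host index) of the winning host, or None.

    latencies_ms[i] is how long host i takes to respond once tried;
    None means the host never responds.
    """
    best: Optional[Tuple[int, int]] = None
    for i, latency in enumerate(latencies_ms):
        start = i * delay_ms
        if best is not None and best[0] <= start:
            break  # an earlier host already responded: no further attempts
        if latency is None:
            continue
        finish = start + latency
        if best is None or finish < best[0]:
            best = (finish, i)
    return best


# The first host is slow (700 ms); the later hosts are fast.
fastest = first_responder([700, 100, 50], delay_ms=0)       # (50, 2)
default = first_responder([700, 100, 50], delay_ms=500)     # (600, 1)
patient = first_responder([700, 100, 50], delay_ms=1000)    # (700, 0)
```

With a delay of 0 the third host wins after 50 ms, with the default of 500 the second host wins after 600 ms, and with a delay of 1000 the first host in the list wins after 700 ms. A longer delay makes the first listed host more likely to be used, at the cost of a slower connection when that host is slow or down.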

Example
masterdelay 200
Default
masterdelay 500
Used on
  • Submit hosts

masterprotocoltimeout

  • Version 4.0.0 and later: masterprotocoltimeout setting available.

After a connection is established, the programs perform some protocol checks to verify a proper and working connection. Some types of protocol failures could take a long time to determine (for example, wrong service running on the policy server port, or mismatched encryption types/keys).

The masterprotocoltimeout setting enables the administrator to control the maximum time to wait for protocol completion. If a protocol step does not complete within the specified number of milliseconds, then the client continues to try the next policy server host in sequence. A value of -1 indicates no protocol timeout.
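The effect of the protocol timeout, including the special value -1, can be sketched as follows. This is a simplified Python model, not the product's logic: the function name first_host_passing_protocol, the host names, and the per-step durations are hypothetical, and every protocol step is assumed to finish eventually.

```python
from typing import Dict, Optional, Sequence


def first_host_passing_protocol(
    step_times_ms: Dict[str, Sequence[int]],
    timeout_ms: int = 500,
) -> Optional[str]:
    """Return the first host whose protocol steps all finish within the timeout.

    step_times_ms maps each host, in the order tried, to the duration of
    each of its protocol steps. A timeout of -1 disables the check.
    """
    for host, steps in step_times_ms.items():
        if timeout_ms == -1 or all(t <= timeout_ms for t in steps):
            return host
        # A step exceeded the timeout: move on to the next host.
    return None


# Hypothetical hosts: the second protocol step on pb1 takes 3 seconds.
hosts = {"pb1": [100, 3000], "pb2": [100, 200]}
with_default = first_host_passing_protocol(hosts, timeout_ms=500)   # "pb2"
no_timeout = first_host_passing_protocol(hosts, timeout_ms=-1)      # "pb1"
```

With the default of 500 milliseconds the client gives up on pb1 and uses pb2. With -1 it waits for pb1 for as long as the protocol takes.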

Example
masterprotocoltimeout 2000
Default
masterprotocoltimeout 500
Used on
  • Policy server hosts
  • Run hosts
  • Submit hosts

Fine Tuning Log Servers and Failover Connection Timing

logserverdelay

  • Version 4.0.0 and later: logserverdelay setting available.

When a log request is processed, the log servers that are listed in the logservers line are tried in the order they appear, from left to right. The logserverdelay setting enables the administrator to adjust the amount of time between failover attempts.

Without a specified time-out, the logging program (for example, pbrun, pbmasterd, or pblocald) tries the first log server on the logservers line. If it does not receive a response within 500 milliseconds, then it adds the second log host. If neither responds in the next 500 milliseconds, then it adds the third log host, and so on. By specifying a logserverdelay, you can change the 500 millisecond waiting period before the logging program goes on to the next log server.

With a logserverdelay of 0 milliseconds, you get the fastest possible connection, but the log server that you connect to may not be predictable. You might also increase network traffic, depending on the number of connections that are opened.

With a larger logserverdelay you can increase the predictability, but you might also increase the time needed to form a failover connection. The longer the delay, the more predictable the sequence is.

Example
logserverdelay 2500
Default
logserverdelay 500
Used on
  • Policy server hosts
  • Run hosts
  • Submit hosts by pbksh and pbsh when a policy server is not available

logserverprotocoltimeout

  • Version 4.0.0 and later: logserverprotocoltimeout setting available.

After a connection is established, the programs perform some protocol checks to verify a proper and working connection. Some types of protocol failures can take a very long time to determine (for example, the wrong service running on the log server port, or mismatched encryption types/keys).

The logserverprotocoltimeout setting enables the administrator to control the maximum time to wait for protocol completion. If a protocol step does not complete within the specified number of milliseconds, then the logging program continues to try the next log server in sequence. A value of -1 indicates no protocol timeout.

If the iologack setting is used, then the logserverprotocoltimeout setting also controls how long a submit host should wait for an acknowledgment from the log host.

Example
logserverprotocoltimeout 2000
Default
logserverprotocoltimeout 500
Used on
  • Log hosts
  • Policy server hosts
  • Run hosts
  • Submit hosts by pbksh and pbsh when a policy server host is not available

For more information, see iologack.

randomizelogservers

  • Version 9.2.0 and earlier: randomizelogservers setting not available.
  • Version 9.3.0 and later: randomizelogservers setting available.

The randomizelogservers setting forces the policy server/submit host/run host to choose a log server host at random, rather than choosing the first available log server host that is specified in the logservers setting. This feature balances the load among multiple log server hosts.

The use of randomizelogservers can cause accept and finish events to be located on different log servers if the log servers are configured with eventdestinations set to a flat file (authevt=<file>) or an SQLite Database (authevt=db). However, if eventdestinations is set to authevt=<DSN> (same ODBC Oracle or MySQL database on all the log servers), then the accept and finish events are stored on the same Oracle or MySQL server. The default randomizelogservers setting is no.

Do not use the randomizelogservers keyword together with DNS SRV lookups. The randomizelogservers keyword can result in accept and finish events being logged on different log servers, which makes it necessary to merge iologs.

Example
randomizelogservers yes
Default
randomizelogservers no
Used on
  • Submit hosts
  • Run hosts
  • Policy servers

Acknowledge Failovers

transparentfailover

  • Version 5.1.1 and earlier: transparentfailover setting not available.
  • Version 5.1.2 and later: transparentfailover setting available.

A failover occurs when an initial connection to a policy server or log server host fails and the program connects to another available policy server or log server host in the list. To let the user know that a failover has occurred, error messages from the failed connection can be displayed to the user.

The transparentfailover setting enables you to suppress the following failover error messages:

  • Any Kerberos initialization error
  • 3084 initMangle failure during startup
  • 3089 Could not send initial protocol header to Policy Server
  • 3090 Did not receive initial protocol header from Policy Server
  • 8534 Policy Server on %s is not SSL enabled
  • 1913 Invalid Policy Server daemon on Policy Server host %s

When transparentfailover is set to yes, the failover error messages listed above are suppressed. To display failover error messages, set transparentfailover to no in the pb.settings file.
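The suppression rule can be pictured as a simple filter over the messages produced by a failed connection. The Python sketch below is a model only. The function name visible_messages, the KERBEROS_INIT marker used for Kerberos initialization errors, and the unnumbered "other" message are hypothetical. The numeric codes are the ones listed above.

```python
from typing import List, Tuple, Union

Message = Tuple[Union[int, str], str]

# Message numbers taken from the list of suppressible failover errors.
SUPPRESSIBLE_CODES = {3084, 3089, 3090, 8534, 1913}
# Hypothetical marker for "any Kerberos initialization error".
KERBEROS_INIT = "kerberos-init"


def visible_messages(messages: List[Message], transparentfailover: bool = True) -> List[Message]:
    """Return the messages the user would see for the given setting."""
    if not transparentfailover:
        return list(messages)  # setting is no: show everything
    return [
        m for m in messages
        if m[0] not in SUPPRESSIBLE_CODES and m[0] != KERBEROS_INIT
    ]


failed_connection = [
    (3089, "Could not send initial protocol header to Policy Server"),
    (KERBEROS_INIT, "Kerberos initialization error"),
    ("other", "a message that is not a failover error"),  # made up for the example
]
shown_when_yes = visible_messages(failed_connection, transparentfailover=True)
shown_when_no = visible_messages(failed_connection, transparentfailover=False)
```

With the setting at yes only the unrelated message remains visible. With the setting at no all three messages are shown.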

Example
transparentfailover yes
Default
transparentfailover yes
Used on
  • Submit hosts