VERIFIED SOLUTION

Vault Load Balancing, Fault Tolerance, and High Availability

UPDATED: October 18, 2017



In environments with large amounts of request traffic, you may want to deploy multiple rendering engines to handle the load. Deploying multiple rendering engines also makes the system more fault tolerant. In such deployments, request traffic must be directed to one of the installed rendering engines based on availability. Several options exist to meet this requirement. A hardware-based network load balancer such as an F5 Networks BIG-IP can expose a virtual IP address and/or virtual hostname on the network and forward traffic to the downstream rendering engines as appropriate. If such hardware is already present, it may be preferable to leverage that capability rather than use the Vault Router. Alternatively, the Vault Router can be used as a software-based solution to meet this requirement. The Vault Router has the benefit of adding an additional layer of request/response caching, which may reduce response times and overall system load.

The Vault Router (e2routerd) is a software application that manages connections to multiple rendering engines and spreads the search and transformation load intelligently across the set of rendering engines it knows about. In a simple Vault configuration, a front end application sends Vault-specific search or rendering commands to a single rendering engine for execution, and a Vault Router is not required. When the Vault Router is used, the front end application sends Vault commands to the Router instead, which distributes incoming requests across a set of connected rendering engines based on current activity. If a particular connection fails, traffic is rerouted to the other connections automatically.
 
The Vault Router distributes requests using an algorithm based on four weights, listed here in order of priority:
 
1. failed requests since last “heartbeat” (~30 sec).
2. hold ticks (holds come from previous “heartbeats” with failures).
3. running commands.
4. completed commands.
 
Priorities 1 and 2 manage connection failures. Connections that have failed during the current heartbeat are avoided. Connections with failures during the last heartbeat accumulate hold ticks, which act as a delay before the connection is retried. A connection that does not fail outright can still be deemed failed if a transaction on that connection exceeds its wait timeout and retry thresholds.
 
Priority 3 is the main rule: commands are distributed to the rendering engine running the fewest commands. The router does not know the cost of each command, only the number of commands that have been sent and have not yet completed.
 
Priority 4 distributes the load across connections under light load, which helps keep the liveness information about each connection fresh. This produces a round-robin effect under light load: for example, if two idle connections have completed 10 and 12 commands respectively, new requests go to the first until its completed count catches up, after which requests alternate between the two.
 
Failure modes aside, the Vault Router distributes requests based on the number of requests each rendering engine is currently working on.
 
Note: The router has no direct knowledge of load from other sources that might be sent directly to the attached rendering engines. Such load will still influence how long requests take, and thus how many requests are in flight at any given time.
 
Planning a Vault Router
 
At its simplest, the Vault Router is a listener and a series of connection endpoints (renderers) to which traffic is distributed. Each connection must have an endpoint, and there must be enough threads configured to keep the message traffic flowing. A typical router configuration consists of a [router1] component and a series of [connectionN] components. A [pool1] component can be added to tune thread count and throughput behavior where there are many connections or where the traffic load is expected to be high, as sketched below.
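As an orientation, the sketch below shows that minimal shape; the hostname vault.render.pvt and the port numbers are placeholders, and the [pool1] section is shown with its default values (each setting is described under Configuration settings by component):

[server1]
# listener: where front end applications send Vault commands
service=*:8003

[router1]
# number of [connectionN] sections that follow (mandatory)
count=2
debug=0

[connection1]
service=vault.render.pvt:6003

[connection2]
service=vault.render.pvt:7003

[pool1]
# optional: tune thread counts; the default values are shown
minthreads=4
maxthreads=16
startthreads=8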
 
Configuration settings by component
 
[server1]
 
service=IP address or hostname:port     The endpoint on which the router listens for incoming requests. In the sample configurations below, service=*:8003 causes the router to listen on port 8003.
 
[router1]
 
count=n                                 The number of connections to be defined (mandatory).
           
debug=n                                 Log debug level. The default is 0 (off). It is generally recommended to set debug=1 initially to validate the expected load-distribution behavior, then reset it to 0 once the router is in normal production to reduce the size of the router log file.
 
[pool1]
minthreads=                             Minimum number of threads allowed. The default value is 4. This number is not likely to be sufficient in a heavily utilized environment; see the Note below.
maxthreads=                             Maximum number of threads allowed. The default value is 16.
startthreads=                           Number of threads created when the router starts. Set this based on the traffic you expect to see when the router starts up; the actual thread count will then be adjusted between the minthreads and maxthreads values based on load over time. The default value is 8.
Note: Set minthreads and maxthreads based on the number of connections and the anticipated traffic. There should be one thread for each connection, plus one thread for the listening connection, plus a number of worker threads to handle the messages. The worker thread count should be determined from the maximum number of concurrent messages (searches and rendering requests) you expect to be active, as illustrated below. Note that thread count capacity can be limited by the amount of system and process memory available.
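As a worked example of that sizing (the traffic figures are illustrative assumptions, not recommendations): a router with 6 connections that is expected to run about 20 concurrent messages, peaking at 40, would need 20 + 6 + 1 = 27 threads under normal load and 40 + 6 + 1 = 47 at peak:

[pool1]
# expected concurrent messages + connections + listener = 20 + 6 + 1
minthreads=27
# peak concurrent messages + connections + listener = 40 + 6 + 1
maxthreads=47
# load is expected from startup, so begin at the normal-load level
startthreads=27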
 
[connection1]
service=IP address or hostname:port     The connection point for the first renderer.
 
maximumcapacity=size in bytes           The maximum size of a message received from the renderer connection. The value should be large enough to accommodate the largest rendered document you expect to receive. The default value is 64 MB.
 
waittimeout=time in seconds             The maximum time (in seconds) to wait for a request reply before the request is deemed to have timed out. This applies to an individual request, not to the router-to-renderer connection as a whole. The default value is 600.
 
idletimeout=time in seconds             The number of seconds without activity after which the router will close a connection to a renderer. The default value is 300.
 
retrycount=                             The number of times to retry a failed or timed-out request on this connection before declaring that the connection has failed. The default value is 3.
 
 
[connection2]
service=IP address or hostname:port     The connection point for the second renderer. The same optional settings (maximumcapacity, waittimeout, idletimeout, retrycount) apply to each [connectionN] section; a complete example follows.
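For illustration, here is a connection section with all of the optional settings specified; the hostname and the tuning values are assumptions chosen to show the syntax, not recommendations:

[connection1]
service=vault.render.pvt:6003
# accept rendered documents up to 128 MB
maximumcapacity=134217728
# time out an individual request after 2 minutes
waittimeout=120
# close the connection after 5 minutes of inactivity (the default)
idletimeout=300
# retry a failed or timed-out request twice before declaring the connection failed
retrycount=2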
 

 
 
 
Sample Configurations:
Scenario 1:
A Vault Router connected to two render instances on the same Vault platform. One renderer is listening on port 6003, the other on 7003. The router will listen on port 8003.

[server1]
service=*:8003

[router1]
count=2
debug=0

[connection1]
service=vault.render.pvt:6003

[connection2]
service=vault.render.pvt:7003
 
Scenario 2:
A Vault Router connected to two render instances on the same Vault platform. One renderer is listening on port 6003, the other on 7003. The router will listen on port 8003. The expected load is 40 concurrent searches per second with an average response time of 60 msec. The peak load is 80 concurrent searches per second. There can be load as soon as the router is started. The longest render time expected is 1 minute.

[common1]
# allow for the longest render, doubled
waittimeout=120

[pool1]
# min concurrent searches + connection count + input port
minthreads=43
# max concurrent searches + connection count + input port
maxthreads=86
# allow for 57 concurrent searches in the first time window
startthreads=60

[server1]
service=*:8003

[router1]
count=2
debug=0

[connection1]
service=vault.render.pvt:6003
inherit=common1

[connection2]
service=vault.render.pvt:7003
inherit=common1
 
Scenario 3:
A larger Vault environment with:
  • 2 e2serverd (S1,S2) and 1 e2loaderd instance (L1)
    • all connected to the SAN containing the data store
  • 6 e2renderd instances
    • R1,R2,R3 connect to S1
    • R4,R5,R6 connect to S2
  • 3 front end application servers (e.g. Tomcat or JBoss), A1, A2, A3
    • some sort of external load balancing in front of the app servers
  • 3 e2routerd instances, X1,X2,X3
    • one local to each app server (A1 uses X1, A2 uses X2,…)
    • each connecting to renderers R1 to R6
Each e2routerd.ini would list all 6 rendering engine hosts:
 
[server1]
service=*:8003

[router1]
count=6
debug=1

[connection1]
service=r1.company.pvt:6003

[connection2]
service=r2.company.pvt:6003

# [connection3] through [connection5] follow the same pattern

[connection6]
service=r6.company.pvt:6003

[pool1]
minthreads=30
maxthreads=90
startthreads=30
debug=1
 
