Failover Capability
In any mission-critical communications system, built-in redundancy is imperative. Telex PTToC provides it through a Failover capability. Failover is the ability to switch automatically from a primary working server to a secondary backup server if the primary server or its associated network suffers a catastrophic failure.
Implementation
The IP address of the secondary server is configured in the primary server. When the primary server starts, it immediately establishes a connection to the secondary server. Through this connection the primary server shares its licensing information with the secondary server. The primary server also conveys any operational changes to the system configuration to the secondary server in real time, so that the two system configurations remain synchronized. This connection remains active as long as both servers are running.
If this connection is lost, the secondary server immediately assumes the role of active server and allows Telex PTToC clients to connect. In some cases, even if the connection is not lost, the complexities of a network failure may still warrant the secondary server becoming the active server.
When any Telex PTToC client logs into the primary server, the secondary server IP address is automatically provided to it. If communications with the primary server are lost, the client automatically attempts to connect to the secondary server. If the secondary server is available and active, the Telex PTToC client logs into it.
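The client behavior described above can be sketched as follows. This is a minimal illustration, not the actual Telex PTToC client interface: the class `FailoverClient`, the `login` hook, and the addresses used in the usage note are all hypothetical, and the hook is assumed to raise `ConnectionError` when a server is unreachable or not active.

```python
from typing import Callable, Optional


class FailoverClient:
    """Hypothetical model of a client that learns the secondary address at login."""

    def __init__(self, primary_addr: str, login: Callable[[str], Optional[str]]):
        # login(addr) is an assumed transport hook: it raises ConnectionError if
        # the server is unreachable or not active, and otherwise returns the
        # secondary server address advertised by that server (or None).
        self.primary_addr = primary_addr
        self.secondary_addr: Optional[str] = None
        self.connected_to: Optional[str] = None
        self._login = login

    def connect(self) -> str:
        """Log into the primary and remember the secondary address it provides."""
        advertised = self._login(self.primary_addr)
        if advertised:
            self.secondary_addr = advertised
        self.connected_to = self.primary_addr
        return self.connected_to

    def on_connection_lost(self) -> str:
        """Attempt to log into the secondary after losing the current server."""
        self.connected_to = None
        if self.secondary_addr is None:
            raise ConnectionError("no secondary server address was provided")
        # Raises ConnectionError if the secondary is unavailable or not active.
        self._login(self.secondary_addr)
        self.connected_to = self.secondary_addr
        return self.connected_to
```

In use, a caller would supply a real `login` function, call `connect()` once, and call `on_connection_lost()` when communications with the current server drop. Retry timing and backoff are deliberately left out because the source does not describe them.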
Once the secondary server becomes the active server, switching back to the primary server generally requires manual authorization. The condition that caused the failover must first be properly evaluated to ensure that it cannot recur and unnecessarily disrupt active communications. The manual switchover is controlled through the System Administration application.
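The switch-back rule can be modeled as a small state holder. This is only a sketch under stated assumptions: `FailoverPair` and its methods are hypothetical names, and the real authorization step happens in the System Administration application, whose interface is not described here.

```python
class FailoverPair:
    """Hypothetical model of which server in the pair is currently active."""

    def __init__(self) -> None:
        # In normal operation the primary server is the active server.
        self.active = "primary"

    def fail_over(self) -> None:
        """Record that the secondary server has become the active server."""
        self.active = "secondary"

    def primary_recovered(self, admin_authorized: bool = False) -> str:
        """Return the active server after the primary comes back.

        Recovery alone does not switch back; an administrator must
        authorize the switchover explicitly (assumed flag).
        """
        if self.active == "secondary" and admin_authorized:
            self.active = "primary"
        return self.active
```

The point of the design is that a recovered primary never reclaims the active role on its own, so a recurring fault cannot bounce clients back and forth between servers.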
Failover Criteria
In normal operations, the primary server is always the active server. In general, as long as the communications link between the primary and secondary servers is connected, the primary server remains the active server. A server that is not the active server does not allow logins. Many different scenarios can result in a failover event. The most common are as follows:
Communication link between primary and secondary servers lost due to primary server failure:
In this simplest scenario, the secondary server would recognize the loss of the primary server and immediately become the active server. All clients would also recognize the loss of the primary server and would immediately reconnect to the secondary server.
Communication link between primary and secondary servers lost due to failure of network infrastructure:
In this scenario, the secondary server would recognize the loss of the primary server and immediately become the active server. Since the primary server is still running, it would also continue to consider itself the active server. However, if the network failure also resulted in the simultaneous loss of the majority of connected Telex PTToC clients, the primary server will deactivate itself, forcing all remaining clients to connect to the secondary server.
Communication link between primary and secondary servers not lost, but network infrastructure partially fails:
In this scenario, the partial network failure may result in the loss of a large portion of the Telex PTToC clients. In this case the primary server will instruct the secondary server to activate, allowing connections to be made. If the secondary server reports that client connections were established, the primary server will deactivate itself, forcing all remaining clients to connect to the secondary server.
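The three scenarios above can be condensed into a pair of decision functions. This is an illustrative sketch, not the product's actual logic: the function names, the `Action` enum, and the `loss_threshold` parameter are hypothetical. In particular, the source says only "majority" and "a large portion" of clients, so the 50% default is an assumption.

```python
from enum import Enum


class Action(Enum):
    STAY_ACTIVE = "stay_active"
    DEACTIVATE = "deactivate"
    REQUEST_SECONDARY_ACTIVATION = "request_secondary_activation"


def primary_decision(link_up: bool, clients_before: int, clients_now: int,
                     secondary_reports_clients: bool = False,
                     loss_threshold: float = 0.5) -> Action:
    """Decide what a still-running primary server should do."""
    lost_fraction = 0.0
    if clients_before > 0:
        lost_fraction = (clients_before - clients_now) / clients_before
    heavy_loss = lost_fraction > loss_threshold  # assumed threshold

    if not heavy_loss:
        # Without a heavy client loss the primary keeps considering itself active.
        return Action.STAY_ACTIVE
    if not link_up:
        # Link to secondary lost and most clients gone at once: step aside.
        return Action.DEACTIVATE
    if secondary_reports_clients:
        # Link intact and the secondary confirms clients reached it.
        return Action.DEACTIVATE
    # Link intact but many clients lost: ask the secondary to accept logins.
    return Action.REQUEST_SECONDARY_ACTIVATION


def secondary_should_activate(link_up: bool, activation_requested: bool) -> bool:
    """The secondary activates on link loss or on request from the primary."""
    return (not link_up) or activation_requested
```

For example, `primary_decision(link_up=True, clients_before=100, clients_now=20)` returns `Action.REQUEST_SECONDARY_ACTIVATION`, and the same call with `secondary_reports_clients=True` returns `Action.DEACTIVATE`.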
Summary
Telex PTToC Failover support is an integral part of any mission-critical communications solution. While its operation will never be visible to the end user, its availability ensures that communications capability is quickly restored after an unforeseen catastrophic primary server failure.