For the complexities surrounding Shibboleth clustering, including the difficulties of using DNS round robin, we highly recommend the following documentation:

https://wiki.shibboleth.net/confluence/display/SHIB2/IdPClusterIntro

It gives clear and detailed information on the pros and cons of each clustering approach, which is well worth understanding before going any further.

Previously, IdPs could be configured for high availability using Terracotta. However, support for the open source Terracotta DSO was discontinued as of March 2013 (see http://www.terracotta.org/confluence/pages/viewpage.action?pageId=37129634). The Shibboleth community has been quiet on this issue. The only other discussion of alternative clustering options we've found leads from this thread:

https://groups.google.com/forum/?fromgroups#!searchin/shibboleth-users/hazelcast/shibboleth-users/UyP0v1NJJGA/QwfYsYvZS3kJ

to the following page:

https://wiki.shibboleth.net/confluence/display/SHIB2/Memcached+StorageService

At this time, the AAF has not used this plugin. It is contributed by a third party, so you may have some difficulties, or need to contact the authors directly, if you decide to go this way.

There are plans for version 3 of the Shibboleth IdP to address the problem of locally cached session information, but this is not available yet, and the Shibboleth developers don't yet have specific plans in their road map for how this will be solved.

Given this, the AAF's current recommendation for high availability for IdP is:

1. Optimise the IdP configuration as mentioned previously, for example:

* resolver caching
* hourly metadata and attribute filter release updates
* optimal delivery of login pages by minimising CSS/JS and ordering it correctly for delivery to the browser.

2. Size the hardware for the server running your IdP to ensure it meets the load you anticipate. No one in the federation is currently coming anywhere close to fully utilising the resources of a single server for authentication. After undertaking step 1, you will probably be surprised at how few resources you actually need to handle a lot of logins. You could set up Apache Bench or a similar tool to run some load tests until you're comfortable.

3. Run a warm spare with duplicate configuration. This way you can fail over for scheduled/unscheduled outages and use it to test upgrades.

4. Make sure the other service components your IdP relies upon (e.g. your LDAP directory and/or database) are architected to be highly available.
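
As an illustration of the load testing suggested in step 2, here is a minimal Python sketch that issues concurrent GET requests and summarises response times. It is not an AAF or Shibboleth tool. The names `fetch` and `load_test`, the request counts, and the URL `https://idp.example.edu/idp/status` are all assumptions made for this example; substitute a page on your own IdP. It only measures simple page delivery and does not exercise a full login flow.

```python
import http.client
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fetch(url, timeout=10.0):
    """Fetch url once and return (ok, seconds_elapsed)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            ok = 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # Covers URLError/HTTPError, timeouts and connection failures.
        ok = False
    return ok, time.perf_counter() - start


def load_test(url, total=200, concurrency=10, fetch_fn=fetch):
    """Send `total` requests using `concurrency` worker threads.

    fetch_fn is injectable so the summary logic can be checked
    without touching the network.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(fetch_fn, [url] * total))
    times = sorted(elapsed for _, elapsed in results)
    failures = sum(1 for ok, _ in results if not ok)
    return {
        "requests": total,
        "failures": failures,
        "mean_s": mean(times),
        # Approximate 95th percentile of the sorted response times.
        "p95_s": times[int(0.95 * (len(times) - 1))],
        "max_s": times[-1],
    }


if __name__ == "__main__":
    # Hypothetical URL: replace with a page served by your own IdP.
    print(load_test("https://idp.example.edu/idp/status"))
```

Run it against a test or warm spare instance first, then raise `total` and `concurrency` gradually while watching CPU and memory on the IdP server. A dedicated tool such as Apache Bench will give more detailed figures.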