With large deployments, it's important to make good architectural choices about the nodes in your environment, and during deployment none is more critical than the provisioning node itself.
This is because in TripleO the undercloud node doesn’t just push out images to nodes; it also orchestrates the configuration of the entire cluster. To do this, it needs to run a bunch of OpenStack services such as nova, neutron, ironic, heat, keystone and glance. It also runs two databases, a messaging bus and a web server.
All of this means that the provisioning node needs to be quite a powerful piece of kit, as the provisioning process involves lots of disk and network I/O, amongst other things. So it's important to specify a fast disk, plenty of memory, a good number of cores and a fast NIC.
But sometimes, even with all of the above, you need to tweak things, because with the best will in the world, the undercloud installation and configuration apply “best guess” values for things like threads, processes, timeouts and retries.
Each service has a tonne of configurable options, and to get the best performance out of the node it's important to understand what each one does and the impact of changing it. It's also important to understand which changes will help relieve a particular bottleneck.
Specifically, we found that tuning the process and thread counts for the WSGI processes and increasing the haproxy maxconn values helped the node handle load more efficiently. A patch has been merged to address this. Red Hat produce a guide on tuning the undercloud (called Director in their commercial product).
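To make the two knobs concrete, here is a minimal Python sketch, not the patch mentioned above. It assumes the services run under Apache mod_wsgi in daemon mode, where a daemon group can serve at most processes × threads requests at once, and it renders the corresponding Apache `WSGIDaemonProcess` directive and an haproxy `maxconn` setting. The class, the helper names and the example values (8 processes, 2 threads, maxconn 8192) are hypothetical and for illustration only; on a real undercloud these files are generated by the installer, so you would apply changes through its own configuration mechanism rather than by hand.

```python
from dataclasses import dataclass


@dataclass
class WsgiTuning:
    """Hypothetical holder for one service's mod_wsgi daemon settings."""
    service: str
    processes: int
    threads: int

    def capacity(self) -> int:
        # In mod_wsgi daemon mode each process handles up to `threads`
        # requests concurrently, so the group's ceiling is processes * threads.
        return self.processes * self.threads

    def apache_directive(self) -> str:
        # Render an Apache WSGIDaemonProcess line with the chosen counts.
        return (f"WSGIDaemonProcess {self.service} "
                f"processes={self.processes} threads={self.threads}")


def haproxy_defaults(maxconn: int) -> str:
    # Render an haproxy 'defaults' section that raises the connection limit.
    # The value passed in is an assumption; size it for your own load.
    return f"defaults\n    maxconn {maxconn}"


if __name__ == "__main__":
    keystone = WsgiTuning("keystone", processes=8, threads=2)  # example values
    print(keystone.apache_directive())
    print("max concurrent requests:", keystone.capacity())
    print(haproxy_defaults(8192))
```

The point of computing the capacity is that raising the process or thread count only helps if haproxy is allowed to pass enough connections through, which is why the two settings are worth tuning together.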