This week I’ve been getting to grips with Red Hat’s Director cloud deployment tool. It leverages Ironic to provision baremetal machines and introduces the concept of an undercloud and an overcloud. There are essentially two clouds: the undercloud is a basic OpenStack environment with just the tools needed to get the main job done, and the overcloud is the cloud your users interact with and run their workloads on.
It’s clear some serious engineering time has gone into making this as easy as possible, and the good news is that so far it seems to be working well. Installation of the undercloud was as simple as defining a few variables such as the network range and interface. I had tried RDO Manager (the upstream project Director is based on) and had a fairly torrid time. They were going through some major infrastructure changes at the time, however, so perhaps that was part of the problem. Meh.
Anyway, pretty picture time.
So now there are various roles ready to be deployed. There are the usual compute and control roles as well as Ceph (no surprise as this is a Red Hat product), Cinder and Swift.
Initial Overcloud deployments haven’t completed yet – this is more due to me being a tool and not following the instructions rather than any particular bug in the software.
The biggest issue so far has been hardware related. I’d been using an ancient Nortel (yes, remember them!) switch that was taking an eon to bring up network links, I think due to a buggy STP (Spanning Tree Protocol) implementation. Director uses iPXE rather than PXELINUX for some reason (UEFI maybe?) and although the machine downloaded the NBP file fine, when it came to getting a DHCP lease it completely timed out. It was only when I attached the second interface to try booting from that instead that it became apparent the link was taking a long time to come up. So I guess STP was at fault, but I never debugged it; I just replaced the switch with a good old HP ProCurve and it worked fine.
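If you want to put a number on "an eon", something like the following would do it. This is only a rough sketch, not anything Director ships: `wait_until` and `carrier_up` are names I made up, and the only real thing it relies on is the Linux sysfs file `/sys/class/net/<iface>/carrier`, which reads `1` when the NIC sees a link. One caveat: a port held back by spanning tree can show carrier while still dropping traffic, so for an STP problem you may need to swap in a probe that checks real connectivity instead.

```python
import time
from pathlib import Path


def carrier_up(carrier_file):
    """Return True if a sysfs-style carrier file currently reads "1"."""
    try:
        return Path(carrier_file).read_text().strip() == "1"
    except OSError:
        # Reading the file fails while the interface is administratively
        # down (or if it does not exist); treat that as "no link yet".
        return False


def wait_until(probe, timeout=120.0, interval=0.5):
    """Call probe() repeatedly until it returns True.

    Returns the number of seconds waited, or None if the timeout
    expired first.
    """
    start = time.monotonic()
    while True:
        if probe():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)


# Example use ("eth0" is a placeholder interface name):
#   waited = wait_until(lambda: carrier_up("/sys/class/net/eth0/carrier"))
#   print("no link" if waited is None else f"link up after {waited:.1f}s")
```

Keeping the probe separate from the timing loop means the same loop can time a ping or a DHCP request instead of the carrier flag.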
In the course of the above issue I’ve learnt plenty of things about PXE booting, like how when the logs say “error 8 User aborted the transfer” you can actually ignore it because it’s normal, even if journalctl flags it in big red letters. Apparently this is an initial check to see what protocols the client supports before initiating the download proper.
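For anyone staring at a packet capture rather than the logs, here is a small sketch of how that error could be picked out. It assumes the standard TFTP ERROR packet layout from RFC 1350 (opcode 5, a two-byte error code, then a NUL-terminated message) and the fact that RFC 2347 reserves code 8 for a client ending a transfer after option negotiation. The function names are my own, and treating every code 8 as harmless is an assumption based on what I saw, not a rule.

```python
import struct

TFTP_OPCODE_ERROR = 5
# Error code 8 comes from RFC 2347 (TFTP Option Extension): the client
# terminates the transfer after option negotiation.
OPTION_NEGOTIATION_ABORT = 8


def parse_tftp_error(packet):
    """Return (error_code, message) for a TFTP ERROR packet, else None."""
    # Smallest valid ERROR packet: opcode (2) + code (2) + NUL terminator (1).
    if len(packet) < 5:
        return None
    opcode, code = struct.unpack("!HH", packet[:4])
    if opcode != TFTP_OPCODE_ERROR:
        return None
    message = packet[4:].split(b"\x00", 1)[0].decode("ascii", errors="replace")
    return code, message


def is_benign_abort(packet):
    """True if the packet is an ERROR with code 8 (assumed harmless here)."""
    parsed = parse_tftp_error(packet)
    return parsed is not None and parsed[0] == OPTION_NEGOTIATION_ABORT
```

Anything with a different error code (1 for "file not found", say) is still worth a look.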
Other problems included discovering that there needs to be a default flavor called “baremetal” so unassigned nodes know where to live, and having to enable SELinux, since RDO Manager/Director won’t actually install if it is disabled.
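The SELinux one is easy to check before kicking off an install. Below is a minimal sketch that reads the `SELINUX=` setting out of text in the format of `/etc/selinux/config`. The function names are hypothetical, and accepting both `enforcing` and `permissive` is my assumption; all I actually know is that `disabled` makes the installer refuse. Note that the file describes the boot-time setting, which may differ from what the running system is doing.

```python
def selinux_mode(config_text):
    """Return the SELINUX= value from /etc/selinux/config-style text, or None."""
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (the stock file has a commented
        # explanation that also mentions SELINUX=).
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        # Match the key exactly so SELINUXTYPE= is not picked up by mistake.
        if sep and key.strip() == "SELINUX":
            return value.strip().lower()
    return None


def selinux_ok_for_install(config_text):
    """Assumption: anything other than 'disabled' (or missing) is acceptable."""
    return selinux_mode(config_text) in {"enforcing", "permissive"}


# Example use:
#   with open("/etc/selinux/config") as f:
#       print(selinux_ok_for_install(f.read()))
```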