So you need a management network quick-smart?

TripleO clouds can be deployed with an optional Management VLAN. You can use this to run Ansible playbooks, host monitoring systems and manage your cloud, hence the name.

However, this requires configuration at deployment time. So what happens if you have a cloud that doesn’t have a management VLAN? You can use the provisioning network instead. The catch is that its addresses are dynamic rather than fixed. They rarely change, though, so for a quick playbook run or a cluster-wide config change with pdsh, for example, you can use OpenStack’s CLI to create a hosts file as follows:

openstack server list -f value --column Networks --column Name | sed 's/ ctlplane=/ /g' | awk '{ print $2 " " $1}'

This converts the server list for your Ironic-provisioned nodes into "address hostname" lines that you can append to a hosts file.
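To see what the sed and awk stages do without touching a live undercloud, here is a minimal sketch that runs the same pipeline over sample input. The helper name to_hosts, the node names and the addresses are all made up for illustration, and it assumes each line arrives as "name ctlplane=address", which is what the command above expects.

```shell
# Hypothetical helper: turn "name ctlplane=address" lines (read from stdin)
# into "address name" lines suitable for a hosts file.
to_hosts() {
    sed 's/ ctlplane=/ /g' | awk '{ print $2 " " $1 }'
}

# Sample input standing in for the `openstack server list` output;
# these node names and addresses are invented.
printf '%s\n' \
    'overcloud-controller-0 ctlplane=192.0.2.10' \
    'overcloud-compute-0 ctlplane=192.0.2.11' | to_hosts
```

On a real undercloud you would pipe the openstack command into the same two stages and, once the result looks right, append it to your hosts file (for example with sudo tee -a /etc/hosts).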

This avoids having to add your management node to another network (e.g. storage) and lets you use an existing network instead.

It’s not big, it’s not clever, but it does work.

Manually re-setting failed deployments with Ironic

OpenStack commands have some odd naming conventions sometimes – just take a look at the whole evacuate/host-evacuate debacle in nova for example – and ironic is no exception.

I’m currently using TripleO to deploy various environments, which sometimes results in failed deployments. Given all the vagaries of the various IPMI implementations, I think it does a pretty good job. Sometimes though, when a stack gets deleted, I’m left with something like the following:

[stack@undercloud ~]$ nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

[stack@undercloud ~]$ ironic node-list
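+--------------------------------------+----------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name     | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------+--------------------------------------+-------------+--------------------+-------------+
| 447ffea5-ae3f-4796-bfba-ce44dd8a84b7 | compute4 | 26843ce8-e562-4945-ad32-b60504a5bca3 | power on    | deploy failed      | False       |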

So an instance is still associated with the baremetal node, even though nova no longer lists any servers.

What to do here isn’t obvious, but after some digging:

ironic node-set-provision-state compute4 deleted

should result in the node being set back to available. I’m still not clear whether this re-runs the clean steps, but it gets the node into the state I need to re-run the deployment.
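If several nodes end up stuck at once, picking them out of the table by hand gets tedious. The following is a rough sketch, not a tested procedure: failed_nodes is a hypothetical helper that reads ironic node-list style output on stdin and prints the UUID of every row whose Provisioning State column says "deploy failed". It assumes the six-column layout shown above; the sample row is the one from this post.

```shell
# Hypothetical helper: print the UUID of each node whose Provisioning State
# (6th pipe-delimited field) is "deploy failed". Assumes the column layout
# of the `ironic node-list` table shown above.
failed_nodes() {
    awk -F'|' '$6 ~ /deploy failed/ { gsub(/ /, "", $2); print $2 }'
}

# Sample input: the header and the failed row from the table above.
failed_nodes <<'EOF'
| UUID                                 | Name     | Instance UUID                        | Power State | Provisioning State | Maintenance |
| 447ffea5-ae3f-4796-bfba-ce44dd8a84b7 | compute4 | 26843ce8-e562-4945-ad32-b60504a5bca3 | power on    | deploy failed      | False       |
EOF
```

On the undercloud you could feed it the real ironic node-list output and loop over the result, running ironic node-set-provision-state with the deleted target on each UUID. Check the list before you do, since this will happily reset every node it matches.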