OpenStack Release Notes with Reno

I’m currently trying to get a patch submitted to the Puppet Keystone project which implements the ability to turn “chase referrals” on or off for deployments that use Active Directory.

One comment came back from the initial patch:

please add release note

OK, so of course, this being OpenStack, it turns out to be complicated. You need to use “Reno”, a tool that has been used since Liberty (I think) to document changes to OpenStack. The HUGE irony is that the documentation for OpenStack’s documentation tool is sparse and pretty hopeless. It recommends running:

tox -e venv -- reno new slug-goes-here

which gives the error: ERROR: unknown environment 'venv'

Of course. Thankfully some kind soul in the Manila documentation project has added the missing clue for the clueless:

If reno is not installed globally on your system, you can use it from venv of your manila’s tox. Run:

source .tox/py27/bin/activate

In the puppet-keystone directory “py27” needed replacing with “releasenotes” for some obscure reason, but then it worked and I could finally run:

reno new implement-chase-referrals

and the release note was created.
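
For reference, reno drops the new note into releasenotes/notes/ as a YAML stub with a random suffix appended to the slug. The wording below is only illustrative (yours should describe the actual change), but the structure is standard reno:

features:
  - |
    Adds the ability to turn LDAP chase referrals on or off for
    deployments that use Active Directory.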

OpenStack Tempest on RDO Mitaka

There are two main tools for testing a deployed cloud: Rally and Tempest.

I have been looking into verifying functionality in a private cloud once it has been created (using TripleO), and the documentation is, as usual, abysmal. It’s the usual rabbit warren of developer docs, stuff relating to releases from three years back, blueprints that mention “the upcoming Havana release”, etc.

So for reference (mine mostly), here are the steps to get OpenStack Tempest working on the RDO Mitaka stable release:

  1. Ensure you have a neutron network called “nova”
    $ neutron net-create nova --router:external --provider:network_type flat --provider:physical_network datacentre
    $ neutron subnet-create --name nova --enable_dhcp=False --allocation-pool=start=10.1.1.51,end=10.1.1.250 --gateway=10.1.1.1 nova 10.1.1.0/24
  2. Check that you have a role called “heat_stack_owner”. If not, create one:
    $ openstack role create heat_stack_owner
  3. Create your tempest directory and change into it
    $ mkdir ~/tempest && cd ~/tempest
  4. Initialize the directory by running
    $ /usr/share/openstack-tempest-10.0.0/tools/configure-tempest-directory
  5. Configure tempest
    $ tools/config_tempest.py --deployer-input ~/tempest-deployer-input.conf \
    --create identity.uri $OS_AUTH_URL identity.admin_password $OS_PASSWORD
  6. Run tempest (NOT with tools/run-tests.sh)
    $ ./run_tempest.sh
  7. Answer yes to the prompt to initialise your virtual environment. This will download required libraries etc.

Depending on your environment the tests will take about an hour to run. So go make a brew and get ready to debug the failures. 🙂
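
When failures do appear it is usually quicker to re-run just the offending test rather than the whole suite. run_tempest.sh accepts a test filter after its options (it hands it through to testr); the test name here is only an example, so substitute whichever one failed for you:

$ ./run_tempest.sh -- tempest.api.identity.admin.v2.test_roles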

Source: https://access.redhat.com/documentation/en/red-hat-openstack-platform/8/director-installation-and-usage/85-validating-the-overcloud

Shellinabox and serial consoles

TripleO is in fairly dire need of something similar to conserver/wcons/rcons in xCAT, just so you can see what the heck the node’s console is doing without having to fire up your out-of-band web interface, log in and launch the web console, and that is *if* you have the license for it.

CLI console access in Ironic is currently under development after I filed an RFE:

https://bugs.launchpad.net/ironic/+bug/1536572

but in the meantime I decided to try and get serial console access through shellinabox working.

It’s not too hard and the following is a good start:

http://docs.openstack.org/developer/ironic/deploy/install-guide.html#configure-node-web-console

The key thing to understand is the terminal_port value, which varies according to the IPMI driver.
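
The node-side configuration boils down to setting that port in the node’s driver_info and then enabling the console. This is just a sketch using the old ironic CLI; the node UUID is a placeholder and 8023 is only an example port, so check the guide above for what your driver expects:

$ ironic node-update <node-uuid> add driver_info/ipmi_terminal_port=8023
$ ironic node-set-console-mode <node-uuid> true
$ ironic node-get-console <node-uuid>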

Once configured this gives a nice view with a decent amount of scroll-back.

It’s a pity all this is manual. I guess it would be fairly easy to script as part of an undercloud install to enable serial consoles, but it’s enough of a security risk that perhaps it shouldn’t be made quite so easy!

Removing orphaned instances when all else fails…

Working on OpenStack is complex and working on older versions of OpenStack is even more complex. If your instance is spawning but the shared storage hosting the ephemeral disk or block storage oopses/offlines/panics then you can be left with orphaned instances that exist in the database but nowhere else. You try to delete them using nova delete but this doesn’t work because OpenStack can’t locate the files it wants to delete and you get into a real mess.

Some articles indicate that all you need to do is run some variation on:

mysql -D nova -e "delete from instances where instances.uuid = '$uuid'"

but this is bad because it leaves all sorts of information relating to the VM in existence. It appears to have been fixed in later versions of OpenStack – Kilo hasn’t exhibited this problem yet – so what follows is Icehouse-specific, for those people still running this release.

Most of the database info I have stolen from the URL in the comments; I have just added input so you don’t need to drop to a mysql prompt. Feed it your VM UUID and you’re done. If you have reached this page then you’re probably not in such a great place, so the usual warnings about random bits of bash script on the internet apply. And remember that reset-state is your friend and you should have tried lots of other stuff first.
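
For completeness, the sort of thing to try before touching the database is a state reset followed by a normal (or forced) delete; these are standard nova CLI commands:

$ nova reset-state --active $uuid
$ nova delete $uuid
$ nova force-delete $uuid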

#! /bin/bash
# IMPORTANT - READ ME
# This is an Icehouse-specific script
# to remove an instance that is not consuming ANY resources
# ie. It only exists in the database. You need to be VERY
# sure of this fact before using so as not to leave disks
# orphaned or instances running. Use as a last resort after
# deletion and reset-state nova options have failed. Use nova show to
# inspect libvirt xml prior to using.
# Source for db schema: https://raymii.org/s/articles/

read -p "Please enter the UUID of the vm you need to clear from the database:" uuid
mysql -D nova -e "select display_name from instances where instances.uuid = '$uuid'"
read -p "Are you sure this is the instance you are looking for? y/n: " response
if [ $response == y ]; then
mysql -D nova -e "delete from instance_faults where instance_faults.instance_uuid = '$uuid'"
mysql -D nova -e "delete from instance_id_mappings where instance_id_mappings.uuid = '$uuid'"
mysql -D nova -e "delete from instance_info_caches where instance_info_caches.instance_uuid = '$uuid'"
mysql -D nova -e "delete from instance_system_metadata where instance_system_metadata.instance_uuid = '$uuid'"
mysql -D nova -e "delete from security_group_instance_association where security_group_instance_association.instance_uuid = '$uuid'"
mysql -D nova -e "delete from block_device_mapping where block_device_mapping.instance_uuid = '$uuid'"
mysql -D nova -e "delete from fixed_ips where fixed_ips.instance_uuid = '$uuid'"
mysql -D nova -e "delete from instance_actions_events where instance_actions_events.action_id in (select id from instance_actions where instance_actions.instance_uuid = '$uuid')"
mysql -D nova -e "delete from instance_actions where instance_actions.instance_uuid = '$uuid'"
mysql -D nova -e "delete from virtual_interfaces where virtual_interfaces.instance_uuid = '$uuid'"
mysql -D nova -e "delete from instances where instances.uuid = '$uuid'"
echo "Ok, done"
else
echo "Quitting, no changes made"
fi
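
Save it as something like purge-orphaned-instance.sh (the name is arbitrary), make it executable and run it on a host that can reach the nova database, typically a controller. Note that the mysql calls rely on whatever default credentials are available, e.g. /root/.my.cnf:

# chmod +x purge-orphaned-instance.sh
# ./purge-orphaned-instance.sh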

Understanding salt errors

For a project I’m currently working on we use salt to manage configuration across the cluster. This is something I’ve had to learn quickly but thankfully it is reliable and robust … until something goes wrong.

Last week I hit the following error when trying to replicate a client’s setup in-house and running a manual salt-call on one of the nodes.

Rendering SLS 'base:service.keepalived.cluster' failed: Jinja variable list object has no element 0

I’m not a programmer. I once made a very average pass at Java but that was about a decade ago. So the stuff about list objects not having an element 0 wasn’t helpful and isn’t really very good error output. This is a fairly old version of salt (2014.7) so perhaps this has been addressed since.

You can turn on debugging when running salt calls with -l followed by a log level, e.g. debug, info, all, etc.
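
For example, re-running the offending state locally with debug logging looks something like this (state.highstate is just what I was running; substitute your own call):

$ salt-call -l debug state.highstate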

The debug output showed that, just before the failure, salt had run a DNS lookup:

dig +short db-cluster-1.test.cluster A

The difference between the working setup and my failing salt run was that on mine the lookup returned only the hostname, whereas on the working config it returned the host IP as well.

Once this record was added into the BIND db and the daemon restarted, the errors stopped.
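
For anyone hitting the same symptom, the fix amounts to adding an A record along these lines to the zone file and reloading named (the IP address here is made up):

db-cluster-1.test.cluster.    IN    A    10.0.0.10

# rndc reload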

Configuring OpenStack to use jumbo frames (MTU 9000)

Controller nodes

Disable puppet:

# systemctl stop puppet
# systemctl disable puppet

Place a given controller into standby mode:

# pcs cluster standby $(hostname)

Update the MTU for all physical NICs being used by either provider or tenant networks:

# echo MTU=9000 >> /etc/sysconfig/network-scripts/ifcfg-eth0
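
After bouncing the interface (or after the reboot at the end of this section), it is worth checking that the new MTU has actually taken effect; eth0 here is just the example NIC used above:

# ip link show eth0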

Update the various Neutron-related configuration files:

Note that if tenant networks are being used then we need to allow for the overhead of VXLAN and GRE encapsulation, which is why the guest-facing MTU below is set to 8900 rather than 9000.

# echo "dhcp-option-force=26,8900" > /etc/neutron/dnsmasq-neutron.conf
# openstack-config --set /etc/neutron/dhcp_agent.ini DEFAULT dnsmasq_config_file /etc/neutron/dnsmasq-neutron.conf
# openstack-config --set /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini agent veth_mtu 8900
# openstack-config --set /etc/neutron/l3_agent.ini DEFAULT network_device_mtu 9000
# openstack-config --set /etc/nova/nova.conf DEFAULT network_device_mtu 9000
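
If you want a quick sanity check before rebooting, openstack-config supports --get as well as --set, so the values can be read back, for example:

# openstack-config --get /etc/neutron/l3_agent.ini DEFAULT network_device_mtu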

Reboot to ensure everything persists.

# reboot

Unstandby the node and repeat on the remaining controllers:

# pcs cluster unstandby $(hostname)

Compute nodes

Disable puppet:

# systemctl stop puppet
# systemctl disable puppet

Update the MTU for all physical NICs being used by either provider or tenant networks:

# echo MTU=9000 >> /etc/sysconfig/network-scripts/ifcfg-eth0

Update the OVS plugin and Nova configuration files:

# openstack-config --set /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini agent veth_mtu 8900
# openstack-config --set /etc/nova/nova.conf DEFAULT network_device_mtu 9000

Reboot to ensure everything persists.

# reboot

Source: https://access.redhat.com/solutions/1417133