Understanding salt errors

For a project I’m currently working on, we use salt to manage configuration across the cluster. I’ve had to learn it quickly, but thankfully it is reliable and robust … until something goes wrong.

Last week I hit the following error while trying to replicate a client’s setup in-house and running a manual salt-call on one of the nodes:

Rendering SLS 'base:service.keepalived.cluster' failed: Jinja variable list object has no element 0

I’m not a programmer. I once made a very average pass at Java but that was about a decade ago. So the stuff about list objects not having an element 0 wasn’t helpful and isn’t really very good error output. This is a fairly old version of salt (2014.7) so perhaps this has been addressed since.

You can turn on debugging when running salt calls with -l followed by a log level, e.g. debug, info or all.

The debug output showed that the command run just before the error was a DNS lookup:

dig +short db-cluster-1.test.cluster A

The difference between the working setup and my failing salt run was in the output of this lookup: mine returned only the hostname, whereas the working config returned the host IP as well.

Once the missing address was added to the bind db and the daemon restarted, the errors stopped.
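
In hindsight the Jinja message is easier to read: "element 0" is the first item of a list, and an empty list has no first item. Below is a minimal Python sketch of how a lookup that returns no IP could end up there. It assumes the state keeps only the IP addresses from the dig output and then takes the first one; that assumption, the helper name addresses_from_dig and the sample address are illustrative and not taken from the actual SLS file.

```python
import ipaddress


def addresses_from_dig(output):
    """Return only the lines of `dig +short` output that are IP addresses."""
    found = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            ipaddress.ip_address(line)
        except ValueError:
            # Not an address, e.g. a bare hostname.
            continue
        found.append(line)
    return found


# Hypothetical outputs: the address below is made up for illustration.
working = "db-cluster-1.test.cluster.\n192.0.2.10\n"
failing = "db-cluster-1.test.cluster.\n"

print(addresses_from_dig(working))  # one address found
print(addresses_from_dig(failing))  # empty list, so there is no element 0

try:
    first = addresses_from_dig(failing)[0]
except IndexError as err:
    # Plain Python reports the same problem as an IndexError.
    print("lookup gave no address:", err)
```

If the real template works along these lines, checking that the list is non-empty before indexing it would let the state fail with a message that names the hostname rather than the list.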