Allotment and apiary monitoring system

2020 has allowed me a small amount of “hobby” time. I’ve been contemplating a remote IoT device for sending pictures from our allotment, specifically of my apiary. This is part learning exercise, part “what is going on on the plot/apiary”.

I defined the following requirements:

  • Solar-powered – there is no power on the site so this is the only reliable source
  • Battery-backed – Needs to be able to smooth out issues supplying power to the device
  • Very-low-power SoC device – Sheffield does not get much sun, especially in winter
  • Camera
  • 3G GSM modem compatible – needs to be able to connect to the internet!
  • Operating System with low disk usage, preferably booting to RAM

I ultimately arrived at the following:

I chose Tiny Core largely because it boots to RAM. The documentation is patchy, and getting it working involved a lot of searching the forums. However, the project does seem to be active, with frequent recent releases.

Hardware

I had to solder a 40-pin header onto the Pi Zero. I got some breadboards for practice and, when I was happy enough, I went for it. I’m fairly happy with the result.

The ZeroCam is really fiddly to insert and not something you want to repeat on a regular basis. Also, the screws that connect the PiJuice Zero HAT to the Pi Zero are not great, and one thread stripped with ease so it’s missing – see below.

Here’s how things look at the moment.

Current state of play

Things to note:

  • The LiPo battery incorrectly reported 246 degrees C (!) so hit a temperature violation and therefore doesn’t charge. Disabling Temperature Sense in the software resolved this.
  • The USB splitter is temporary for console access. I am using a micro USB to Ethernet adapter for testing at my desk whilst I await the SIM (and so as to not consume data over 3G unnecessarily).
  • The HDMI stuff will obviously go once in the field.
  • There is no case. I’m probably going to use a clear plastic food container with the click down sides, drill a small hole for the camera and then stick some silica gel sachets inside.

Software

There are three elements to this:

  1. PiCore – The OS running on the Pi Zero
  2. The PiJuice control software
  3. Website backend software to receive and display the uploaded images.

Installing PiCore requires some manual steps:

  1. Download and extract the image from http://tinycorelinux.net/12.x/armv6/releases/RPi/
  2. Write the img file to the sdcard – I have a USB-C card reader and used Image Writer
  3. Extend the second partition, doing something like:
1) Start fdisk partitioning tool as root:

   sudo fdisk -u /dev/mmcblk0

   Now list partitions with 'p' command and write down the starting and
   ending sectors of the second partition.

2) Delete the second partition with 'd', then recreate it with 'n'.
   Use the same starting sector as the deleted partition and provide an
   end sector or size larger than the deleted one, leaving enough free
   space for Mounted Mode. When finished, exit fdisk with 'w'. The
   partition is now larger, but the file system size has not yet changed.

3) Reboot piCore. It is necessary to make Kernel aware of changes.

4) After reboot expand file system to the new partition boundaries with 
   typing the following command as root:

   resize2fs /dev/mmcblk0p2

Now you are ready to use the bigger partition.

I needed to install additional software packages, which Tiny Core distributes as squashfs images (TCZ files). However, out of the box there is no support for USB Ethernet or Wi-Fi. I therefore remounted the card and copied the network modules onto the SD card manually, before rebooting the Pi and loading them:

tce-load -i /mnt/mmcblk0p2/tce/optional/net-usb-5.4.51-piCore.tcz

I was then able to load the remaining modules I need going forward:

tc@box:~$ cat /mnt/mmcblk0p2/tce/onboot.lst 
openssh.tcz
my-modules.tcz
ffmpeg.tcz
net-usb-5.4.51-piCore.tcz
kmaps.tcz
ppp.tcz
ppp-modules-5.4.51-piCore.tcz
usb-serial-5.4.51-piCore.tcz
curl.tcz
ntp.tcz

I also needed to create a custom squashfs kernel module package – listed above as my-modules.tcz – this is for ffmpeg support to enable the camera. In the end I took the lazy option and just grabbed staging/ and media/ from the complete modules tarball.

tce-load -wi squashfs-tools
mkdir my-modules
mkdir -p my-modules/lib/modules/5.4.51-piCore/kernel/drivers/
cp -r drivers/staging/ my-modules/lib/modules/5.4.51-piCore/kernel/drivers/
cp -r drivers/media/ my-modules/lib/modules/5.4.51-piCore/kernel/drivers/
mksquashfs my-modules/ my-modules.tcz
sudo cp my-modules.tcz /mnt/mmcblk0p2/tce/optional/

You also need to mount the first partition and edit config.txt to ensure the camera loads on boot:

tc@box:~$ mount /mnt/mmcblk0p1/
tc@box:~$ tail -n 6 /mnt/mmcblk0p1/config.txt 
[all]
#dtoverlay=vc4-fkms-v3d
#Enable camera
start_x=1
gpu_mem=128
disable_camera_led=1

My startup script looks something like this:

date # debugging for RTC, will drop once figured this out
sleep 5
ifconfig eth0 up # manually bring up interface
udhcpc -i eth0 # get an address
sleep 5 # wait for a bit before trying to get the time
ntpdate -s time.nist.gov
date # this should be the correct date now

loadkmap < /usr/share/kmap/qwerty/uk.kmap # load the UK keyboard map

TIMENOW=$(date +%Y-%m-%dT%H:%M:%S) # format the date to something WordPress will accept

# ffmpeg is a bit noisy so add some flags to quieten it down
ffmpeg -i /dev/video0 -frames:v 1 /home/tc/$TIMENOW.jpg -hide_banner -loglevel panic -y

METADATA="A picture taken at $TIMENOW" # A string for adding some basic text

# WordPress accepts images then you can apply metadata to that image so do some nasty awk to get the ID returned. Yes there are better ways to do this.
IMG_ID=`curl -sk --request POST \
                 --url https://grumpybeeman.com/wp-json/wp/v2/media \
                 --header "cache-control: no-cache" \
                 --header "content-disposition: attachment; filename=$TIMENOW.jpg" \
                 --user username:redacted_application_password \
                 --header "content-type: image/jpeg" \
                 --data-binary "@/home/tc/$TIMENOW.jpg" \
                 --location | awk -F : '{ print $2 }' | cut -f1 -d","`

# Now we grab the IP address. The idea is that the images will have this as the description so I can access the pi remotely if I need to. How this will actually work in practice is anyone's guess.
IP_ADDR=`ifconfig eth0 | grep inet | awk -F : '{ print $2 }' | cut -f1 -d " "`
curl -sk --request POST \
         --url https://grumpybeeman.com/wp-json/wp/v2/media/$IMG_ID \
         --user username:redacted_application_password \
         --header "content-type: application/json" \
         -d '{"title":"'"$METADATA"'", "caption":"'"$METADATA"'", "description":"'"$IP_ADDR"'", "alt_text":"'"$METADATA"'", "date":"'"$TIMENOW"'"}'

# I can toggle this file between 1 and 0 to control whether the Pi turns off immediately after taking the picture or not. If it's set to 1 then I can shell in and kill the poweroff process before it completes.
POWER=$(curl -s https://grumpybeeman.com/control)
if [ "$POWER" = 1 ]; then
    poweroff -d 300
elif [ "$POWER" = 0 ]; then
    poweroff
else
    poweroff -d 60 # handle errors, including an empty or unexpected response
fi
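The ID extraction can be made a little less fragile without pulling in extra packages. The following is a minimal sketch, assuming the response body starts with {"id":<number>,... which is what the awk/cut pipeline above already relies on. extract_media_id is a hypothetical helper name, not part of the original script.

```shell
# Print the numeric "id" at the start of a WordPress media response read from stdin.
# Prints nothing if the body does not start with {"id":<number> (e.g. an error response).
extract_media_id() {
    sed -n 's/^[[:space:]]*{[[:space:]]*"id"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

It would slot in as IMG_ID=$(curl ... | extract_media_id). An empty IMG_ID then signals a failed upload, so the metadata call can be skipped rather than posting to a broken URL.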

At each change, PiCore requires you to run:

filetool.sh -b

This persists changes to the disk.
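As far as I understand it, filetool.sh -b only saves the paths listed in /opt/.filetool.lst (written relative to /, without a leading slash), so files created elsewhere need adding to that list first. A small sketch under that assumption follows; add_to_backup_list and the example path are hypothetical.

```shell
# Add a path to the filetool backup list unless it is already there.
# $1: path to persist (a leading slash is stripped)
# $2: list file, defaulting to /opt/.filetool.lst (assumed location)
add_to_backup_list() {
    path=${1#/}
    list=${2:-/opt/.filetool.lst}
    grep -qxF "$path" "$list" 2>/dev/null || echo "$path" >> "$list"
}
```

For example, add_to_backup_list /etc/ppp/peers/provider followed by filetool.sh -b should keep a PPP peer file across reboots.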

The second part of this blog will be wiring in the solar panel, moving to GSM connectivity and initial field trials!

Making a custom RHEL ISO with a kickstart and EFI

An interesting problem came up today whereby we (and by we I mean someone else so we all gathered round and googled the problem) needed to:

  1. Create a RHEL 7.6 ISO
  2. Make it bootable on a UEFI system
  3. Include an unattended kickstart file to automate deployment
  4. Actually make it work

Discovering how to do this was tougher than it sounds, but a colleague at Red Hat (shout out to Pushpendra Madhukar Chavan) provided the following example:

# mkisofs -o /tmp/test.iso -b isolinux/isolinux.bin -J -R -l -c isolinux/boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -eltorito-alt-boot -e images/efiboot.img -no-emul-boot -graft-points -V "RHEL-7.6 Server.x86_64" .
# isohybrid --uefi /tmp/test.iso

This worked like a charm so I leave this here so that others might benefit.

This is derived from this Red Hat kb article, for the most part.
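The commands above do not show how the kickstart gets picked up. One common approach, sketched here as an assumption rather than what we actually did, is to copy the kickstart file into the ISO tree and append an inst.ks= argument to the linuxefi lines of the EFI grub.cfg before running mkisofs. add_kickstart_arg is a hypothetical helper, and the kickstart location you pass in depends on your own layout.

```shell
# Append an inst.ks=<location> argument to every linuxefi line of a grub.cfg.
# Reads the config on stdin and writes the modified config to stdout.
# $1: kickstart location (must not contain '|' or '&', which sed would interpret)
add_kickstart_arg() {
    sed "/^[[:space:]]*linuxefi[[:space:]]/ s|\$| inst.ks=$1|"
}
```

Usage would be along the lines of add_kickstart_arg cdrom:/ks.cfg < EFI/BOOT/grub.cfg > grub.cfg.new, then replacing the original file once you have checked the result.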

OpenStack, Heat and HAProxy

I’ve had an interesting experience debugging (or failing to debug) an issue in RHEL OSP 10.

Stacks were failing to complete:

status: CREATE_FAILED
status_reason: |
Error: resources.repo_definition_repovol_attach: Failed to attach volume x to server y - Unknown Error (HTTP 504)

From the logs it was obvious that nova was booting the instance fine and cinder was attaching the volume ok, even after the stack create failed.

I’ve mostly done OpenStack deployments so debugging operations is new to me. Thankfully Red Hat has some really good people who are used to chasing errors through the system.

One of the support chaps chased this through the system and determined it was due to a HAProxy timeout (I had no idea HAProxy affected Heat in this way) so we bumped the settings:

timeout http-request 20s
timeout queue 2m
timeout connect 20s
timeout client 10m
timeout server 10m
timeout check 20s

After applying this and restarting haproxy, the stack create completed. It turned out some instances were taking ~7 minutes to complete. My gut instinct is that there are storage or instance issues at play here. 7 minutes to wait for a single instance to boot and attach its volumes is too long. But then I’m used to GPFS and Ceph…

UPDATE: Yeah, so both Ceph and GPFS/Spectrum Scale support Copy-On-Write hence stuff happens VERY quickly. I just wasn’t used to traditional storage.

Cloudforms provider creation via API

In my new job, I’ve been working with other cloud technologies apart from OpenStack. Ansible is used heavily and now some version of this technology runs through a large percentage of Red Hat’s products. Cloudforms positions itself as a single pane of glass through which to control not just traditional infrastructure providers like RHEV and VMware but also OpenStack, AWS, Satellite 6, Ansible Tower and a multitude of other tools.

I have only a small amount of experience with the above, OpenStack aside. Documentation is generally pretty good but I have spent some time reading the API runes to determine how to automatically create providers within Cloudforms (note that this should work fine for ManageIQ as well). Ansible does have a manageiq provider module but it’s far from complete.

NB: The following is appropriate for MY usage on one environment, you WILL need to set and adjust parameters to suit. This should just be used to understand what parameters you need, not how to set them. This was using the Ansible uri module in 2.4 against Cloudforms 4.5.

RHEV providers are pretty simple:

- name: Create RHEV Provider
  uri:
    url: "https://{{ inventory_hostname }}/api/providers"
    method: POST
    user: "{{ vault_cfme_user }}"
    password: "{{ vault_cfme_password }}"
    body:
      type: "ManageIQ::Providers::Redhat::InfraManager"
      name: "{{ cloudforms.rhev_name }}"
      hostname: "{{ inventory_hostname }}"
      credentials:
        userid: "{{ vault_rhev_user }}"
        password: "{{ vault_rhev_password }}"
    status_code: 200
    body_format: json
    validate_certs: no

Satellite 6 too (but notice URL is different):

- name: Create Satellite Provider
  uri:
    url: "https://{{ inventory_hostname }}/api/providers?provider_class=provider"
    method: POST
    user: "{{ vault_cfme_user }}"
    password: "{{ vault_cfme_password }}"
    body:
      type: "ManageIQ::Providers::Foreman::Provider"
      name: "{{ cloudforms.satellite_name }}"
      url: "{{ inventory_hostname }}"
      credentials:
        userid: "{{ vault_satellite_user }}"
        password: "{{ vault_satellite_password }}"
    status_code: 200
    body_format: json
    validate_certs: no

OpenStack – note that you have to set BOTH security_protocol and verify_ssl here, at least if you are needing to set those. This would not be appropriate outside of dev/PoC yada-yada-yada:

- name: Create OpenStack Provider
  uri:
    url: "https://{{ inventory_hostname }}/api/providers"
    method: POST
    user: "{{ vault_cfme_user }}"
    password: "{{ vault_cfme_password }}"
    body:
      type: "ManageIQ::Providers::Openstack::CloudManager"
      verify_ssl: "false"
      security_protocol: "Non-SSL"
      name: "{{ cloudforms.openstack_name }}"
      hostname: "{{ inventory_hostname }}"
      credentials:
        userid: "{{ vault_openstack_user }}"
        password: "{{ vault_openstack_password }}"
    status_code: 200
    body_format: json
    validate_certs: no

Ansible Tower – pretty simple but again, note the specific “provider_class” URL:

- name: Create Ansible Tower Provider
  uri:
    url: "https://{{ inventory_hostname }}/api/providers?provider_class=provider"
    method: POST
    user: "{{ vault_cfme_user }}"
    password: "{{ vault_cfme_password }}"
    body:
      type: "ManageIQ::Providers::AnsibleTower::Provider"
      name: "Ansible Tower"
      url: "{{ inventory_hostname }}"
      credentials:
        userid: "{{ vault_tower_user }}"
        password: "{{ vault_tower_password }}"
    status_code: 200
    body_format: json
    validate_certs: no

Finally OpenShift, the most complex but not that much to it. You just need to note that here we pass an array to connection_configurations containing both the OpenShift and Hawkular endpoints. Plus we are using a token here. And again, be sure to set both SSL options, otherwise the provider is created but doesn’t work.

- name: Create OCP Provider
  uri:
    url: "https://{{ inventory_hostname }}/api/providers"
    method: POST
    user: "{{ vault_cfme_user }}"
    password: "{{ vault_cfme_password }}"
    body:
      type: "ManageIQ::Providers::Openshift::ContainerManager"
      name: "OpenShift"
      port: "8443"
      connection_configurations:
      - endpoint:
          role: "default"
          hostname: "{{ inventory_hostname }}"
          port: "8443"
          verify_ssl: "false"
          security_protocol: "ssl-without-validation"
        authentication:
          authtype: "bearer"
          auth_key: "{{ vault_ocp_token }}"
      - endpoint:
          role: "hawkular"
          hostname: "{{ cloudforms.hawkular_hostname }}"
          port: "443"
          verify_ssl: "false"
          security_protocol: "ssl-without-validation"
        authentication:
          authtype: "hawkular"
          auth_key: "{{ vault_ocp_token }}"
    status_code: 200
    body_format: json
    validate_certs: no

RDO and Clustering-as-a-Service

In a very old life, I maintained some packages for the Fedora project – an audio editor, a map rendering system, a lazy logger, that kind of thing.

Senlin is a clustering service for OpenStack. My employer has a vested interest in HPC clusters and I enjoy side projects which contribute upstream and allow users to consume tooling. Mostly my approach is:

  1. Produce ham-fisted hack which, at best, partially fixes the problem
  2. This annoys the developer enough to complete the patch
  3. Winning/PROFIT

Evidence of this is the support in Elasticluster for Keystone v3.

Anyway, I digress. RDO didn’t have Senlin packaged for easy consumption so I decided to apply my “skills” in order to fix that particular problem.

  1. I started by following the new package documentation
  2. It’s the start of a long journey….
  3. The Senlin service review alone took 40 reviews. Yes, 40.
  4. The client was a bit better.

Naming is hard but some of the terminology and naming in RDO is a bit odd. DLRN, Weirdo. There is a steep on-ramp for a new contributor and it feels a bit like a walled garden at times. But the folks in IRC are helpful and, most of all, patient. Special thanks go to Alfredo Moralej, Haïkel Guémar, Chandan Kumar and Javier Peña for their limitless patience in the face of blundering git commits.

The good news is, both service and client are now available in RDO in time for the Pike release. I’m hoping this will increase usage of Senlin in OpenStack as I think lack of packaging for what is often very good code is a barrier to overall adoption. There are plenty of options for spinning up clusters in OpenStack, not limited to Senlin or Elasticluster, but it’s good to be able to nudge these along where time allows.

What’s next? Knowing very little about writing puppet code, I’m next intending to write the puppet module for Senlin. Apparently it involves something called a cookie cutter.

We have some users who elected to deploy RDO and others using Red Hat’s OpenStack Platform for commercial support. I’m hopeful that by understanding the RDO process better, we can help support both sets of users equally. Most deployments we do necessitate the odd patch or two going upstream and hopefully this will be easier from here on in.

If you are intending to use Senlin please get in touch.

Scaling issues with the TripleO undercloud node

With large deployments, it’s important to make good architectural choices about the nodes in your environment, and during deployment none is more critical than the provisioning node itself.

This is because in TripleO the undercloud node doesn’t just push out images to nodes but also orchestrates configuration of the entire cluster. In order to do this, it needs to run a bunch of OpenStack services like nova, neutron, ironic, heat, keystone, glance etc. It also runs two databases, a messaging bus and web server.

All of this means that the provisioning node needs to be quite a powerful piece of kit as the provisioning process involves lots of disk and network I/O amongst other things. So it’s important to specify a fast disk, plenty of memory, a goodly number of cores and a quick NIC.

But sometimes, even with all of the above, you need to tweak things because with the best will in the world, the undercloud installation and configuration applies “best guess” values when it comes to things like threads, processes, timeouts and retries.

Each service has a tonne of configurable options and it’s important to understand the implications of each one in order to get the best performance out of the node. It’s also important to understand which changes will help in response to any particular bottleneck.

Specifically, we found that tuning the process and thread count for WSGI processes and increasing haproxy maxconn values caused the node to handle load with greater efficiency. A patch has been merged to address this[1]. Red Hat produce a guide on tuning the undercloud (Director in their commercial parlance)[2].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1330980

[2] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/director_installation_and_usage/chap-troubleshooting_director_issues#sect-Tuning_the_Undercloud

So you need a management network quick-smart?

TripleO deployments can be deployed with an optional Management VLAN. You can use this to run ansible playbooks, monitoring systems and manage your cloud, hence the name.

However this requires configuration during deployment. So what happens if you have a cloud that doesn’t have a management vlan? You can use the provisioning network. But the problem is that this doesn’t have fixed addresses, only dynamic. However these rarely change so to perform a quick playbook run or a cluster-wide config with pdsh for example, you can use OpenStack’s cli to create a hosts file as follows:

openstack server list -f value --column Networks --column Name | sed 's/ ctlplane=/ /g' | awk '{ print $2 " " $1}'

This converts the server list for your deployed nodes into a format you can append to a hosts file.

This avoids having to add your management node to another network (e.g. storage) and lets you use an existing network instead.

It’s not big, it’s not clever but it does work.
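To make the transformation concrete, here is the same sed/awk pipeline wrapped in a function so it can be tried on sample text. This is only a sketch: to_hosts is a hypothetical name, and it assumes each input line looks like "name ctlplane=address", which is the shape the one-liner above expects.

```shell
# Convert "name ctlplane=address" lines on stdin into "address name" hosts entries.
to_hosts() {
    sed 's/ ctlplane=/ /g' | awk '{ print $2 " " $1 }'
}
```

For instance, a line reading "overcloud-controller-0 ctlplane=192.0.2.10" comes out as "192.0.2.10 overcloud-controller-0", so the full command becomes openstack server list -f value --column Networks --column Name | to_hosts.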

Exporting Amazon EC2 instances into OpenStack

I had a requirement to get some workloads running on EC2 (which I’m a huge fan of, I just hate the vendor lock-in) imported into OpenStack.

Tools to help you get anything out of AWS are almost non-existent. I did try ec2-create-instance-export-task from AWS API tools but this has so many hurdles to jump through that it became slightly farcical. In the end it wouldn’t let me export the image because it wasn’t an imported image in the first place. Hmmm.

Despite the general consensus online, this turns out to be fairly straightforward. The problem appears to come if you’ve used Amazon Linux AMIs with their custom kernel. Thankfully, these were Ubuntu 16.04 images.

Step 1. Boot an instance from your AMI. Use SSD and a decent instance size if you’re feeling flush and in a hurry.

Step 2. Snapshot the instance’s volume, create a new volume from that snapshot and attach it to the running instance.

Step 3. On your OpenStack environment, dd the attached disk, gzip and pipe over an ssh tunnel because, y’know Amazon egress charges. E.g.:

ssh -i chris.pem ubuntu@my.amazon.v4.ip "sudo dd if=/dev/xvdf | gzip -1 -" | dd of=image.gz

Step 4. Unzip the image, upload it to OpenStack and boot it.

Step 5 (For those with Amazon kernels). Fudge around replacing the Amazon kernel with something close to the same version. YMMV.

OpenStack Release Notes with Reno

I’m currently trying to get a patch submitted to the Puppet Keystone project which implements the ability to turn “chase referrals” on or off for deployments that use Active Directory.

One comment came back from the initial patch:

please add release note

Ok. So of course this being OpenStack it turns out to be complicated. You need to use “Reno”, a tool that has been used since Liberty (I think) to document changes to OpenStack. The HUGE irony is that the documentation for OpenStack’s documentation tool is sparse and pretty hopeless. It recommends running:

tox -e venv -- reno new slug-goes-here

which gives the error: ERROR: unknown environment 'venv'

Of course. Thankfully some kind soul in the Manila documentation project has added the missing clue for the clueless:

If reno is not installed globally on your system, you can use it from venv of your manila’s tox. Run:

source .tox/py27/bin/activate

py27 needed replacing with “releasenotes” for some obscure reason in the puppet-keystone directory but then it worked and I could finally run:

reno new implement-chase-referrals

and the release note was created.

Manually re-setting failed deployments with Ironic

OpenStack commands have some odd naming conventions sometimes – just take a look at the whole evacuate/host-evacuate debacle in nova for example – and ironic is no exception.

I’m currently using tripleo to deploy various environments which sometimes results in failed deployments. If you take into account all the vagaries of various ipmi implementations I think it does a pretty good job. Sometimes though, when a stack gets deleted, I’m left with something like the following:

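[stack@undercloud ~]$ nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

[stack@undercloud ~]$ ironic node-list
+--------------------------------------+----------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name     | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------+--------------------------------------+-------------+--------------------+-------------+
| 447ffea5-ae3f-4796-bfba-ce44dd8a84b7 | compute4 | 26843ce8-e562-4945-ad32-b60504a5bca3 | power on    | deploy failed      | False       |
+--------------------------------------+----------+--------------------------------------+-------------+--------------------+-------------+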

So an instance is still associated with the baremetal node.

In this case, it isn’t obvious but after some digging:

ironic node-set-provision-state compute4 deleted

should result in the node being set back to available. I’m still not clear if this re-runs the clean steps but it gives me what I want to re-run deployment.
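When several nodes end up in this state, picking them out of the table by hand gets tedious. The sketch below pulls the UUIDs of nodes whose provisioning state is "deploy failed" out of ironic node-list style output. It assumes the six-column table layout shown above, and failed_nodes is a hypothetical helper name.

```shell
# Print the UUID of every node whose Provisioning State column reads "deploy failed".
# Expects the table printed by "ironic node-list" on stdin.
failed_nodes() {
    awk -F'|' '$6 ~ /deploy failed/ { gsub(/^ +| +$/, "", $2); print $2 }'
}
```

Something like ironic node-list | failed_nodes | while read -r uuid; do ironic node-set-provision-state "$uuid" deleted; done would then reset them all in one go.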