Dynamic Inventories in Ansible

introduction

If you've been following along at all, you've probably noticed that TigerIQ is a big fan of Ansible and Ansible Tower. Recently I encountered a little problem. We were dealing with an environment with both Windows and Linux machines. With the CloudForms integration with Tower, once you configure the provider, you get a nice little dynamic inventory set up for you. The issue we were facing is that the default method of accessing those machines is SSH, and we needed to set the Windows machines to use WinRM.

what we're after

We want Windows boxes to use WinRM and Linux boxes to use the default SSH, and we have one dynamic inventory. We could, of course, set that up at the host level, but we want a way to automate the full process of deploying either a Linux or Windows machine and including Tower actions as a part of the process, without changing any variables manually on the Tower side.

the code

So it turns out, the way Tower handles this is pretty awesome. Not a big surprise if you've played with Ansible in the past. As I mentioned above, the CloudForms integration with Ansible gets you a dynamic inventory for free. You can see this by looking under the provider in the configuration menu. It should look like this:
inventory

This matches what we would find in Tower:

ansible_tower_inventories

But if you look closely at one of the hosts, you'll see something else. Here is an example of what you might find under "variables" for any given host. This is what CloudForms passes to Tower, and this is what Tower uses to add the host to the dynamic inventory:

ansible_ssh_host: XX.XXX.XX.XX  
cloudforms:  
  ansible_ssh_host: XX.XXX.XX.XX
  boot_time: "2017-02-01T11:42:23Z"
  cloud: false
  connection_state: connected
  cpu_limit: -1
  cpu_reserve: 0
  cpu_reserve_expand: false
  cpu_shares: 2000
  cpu_shares_level: normal
  created_on: "2017-02-01T11:42:12Z"
  ems_cluster_id: 1000000000002
  ems_id: 1000000000001
  ems_ref: vm-124
  ems_ref_obj: vm-124
  evm_owner_id: 1000000000001
  fault_tolerance: false
  guid: 78644a08-e873-11e6-a817-0050569bf95e
  host_id: 1000000000005
  href: "https://my-awesome-cfme/api/vms/1000000000025"
  id: 1000000000025
  ipaddresses:
    - XX.XXX.XX.XX
  last_perf_capture_on: "2017-02-02T08:19:20Z"
  last_scan_attempt_on: "2017-02-01T11:42:22Z"
  last_scan_on: "2017-02-01T11:42:51Z"
  last_sync_on: "2017-02-01T11:42:51Z"
  linked_clone: true
  location: ansible-test/ansible-test.vmx
  memory_limit: -1
  memory_reserve: 0
  memory_reserve_expand: false
  memory_shares: 40960
  memory_shares_level: normal
  miq_group_id: 1000000000002
  name: ansible-test
  power_state: "on"
  previous_state: poweredOff
  raw_power_state: poweredOn
  standby_action: checkpoint
  state_changed_on: "2017-02-01T11:42:23Z"
  storage_id: 1000000000003
  tags:
    - href: "https://my-awesome-cfme/api/vms/1000000000025/tags/1000000000132"
      id: 1000000000132
      name: "/managed/folder_path_blue/datacenters:dc01:vm"
    - href: "https://my-awesome-cfme/api/vms/1000000000025/tags/1000000000131"
      id: 1000000000131
      name: /managed/folder_path_yellow/datacenters
  template: false
  tenant_id: 1000000000001
  tools_status: toolsOk
  type: "ManageIQ::Providers::Vmware::InfraManager::Vm"
  uid_ems: 421bb126-13b6-0369-13c9-fc1c5ba51110
  updated_on: "2017-02-02T08:19:36Z"
  vendor: vmware

Pay close attention to this bit:

tags:  
    - href: "https://my-awesome-cfme/api/vms/1000000000025/tags/1000000000132"
      id: 1000000000132
      name: "/managed/folder_path_blue/datacenters:dc01:vm"
    - href: "https://my-awesome-cfme/api/vms/1000000000025/tags/1000000000131"
      id: 1000000000131
      name: /managed/folder_path_yellow/datacenters

So now we know that CloudForms passes tags to Tower. But here's the really cool part. Tower automatically compartmentalizes the dynamic inventory into groups based on these tags. So any tag that Ansible picks up in the variables will create a new group in the dynamic inventory. You can view and create groups in the Tower UI:
groups

This is great news because, in case you didn't know, we can override variables at the group level: variables like ansible_connection, which determines whether Ansible uses SSH or WinRM to connect to the target host.
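To make the tag-to-group mapping a bit more concrete, here is a minimal sketch in Ruby. It assumes the inventory sync builds a group name by replacing every character that is not a letter, digit, or underscore with an underscore, which is consistent with the group name that shows up later in this post. The helper name group_name_for_tag is made up for illustration and is not part of Tower or CloudForms.

```ruby
# Hypothetical helper: mimics how a tag name might become a group name.
# Assumption: every character that is not a letter, digit, or underscore
# is replaced with an underscore.
def group_name_for_tag(tag_name)
  tag_name.gsub(/[^A-Za-z0-9_]/, '_')
end

# Tag names taken from the host variables shown above.
tags = [
  '/managed/folder_path_blue/datacenters:dc01:vm',
  '/managed/folder_path_yellow/datacenters'
]

tags.each { |tag| puts "#{tag} -> #{group_name_for_tag(tag)}" }
```

Under that assumption, a tag named /managed/operating_system/windows would end up as a group called _managed_operating_system_windows.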

So now we have a game plan. We need to:

  • apply a tag to every Windows host that gets created
  • make sure that tag creates a new group in the dynamic inventory
  • override the ansible_connection parameter at that group's level

First let's set up the tag. You can name this anything you want of course, but in this example we're going to create a tag category called operating_system and a tag called windows. To do this, navigate to the Configuration menu at the top right and then select the Region at the top of the accordion on the left. Then select [My Company] Categories. Click on the top row to create a category, name it operating_system, give it a description of Operating System, and click Add. You should see something like this:

category

Now move one tab over to [My Company] Tags. Select your fancy new tag category from the drop down, click New Entry, and name the tag windows with a description of Windows Service. Click Add. It should look like this:

fancy new tag

Now we need to make sure that any Windows VM that gets provisioned by CloudForms gets this tag applied to it during the process, before Ansible Tower picks it up in the dynamic inventory. We want this to apply to VMs that are provisioned via the Lifecycle menu as well as anything provisioned through a Service Catalog, so first we need to copy the "Provision VM from Template" instance and the Methods class from the ManageIQ domain to our own custom domain. The instance is found here:
Infrastructure/VM/Provisioning/StateMachines/VMProvision_VM/Provision VM from Template (template)

and the class is found here:
Infrastructure/VM/Provisioning/StateMachines/Methods

Highlight these, one at a time, and select Configuration/Copy this [Instance|Class]. Make sure you leave the "Copy to same path" box checked.

Under the Methods class we need an instance and a method. Highlight it and select "Add a New Instance". Name it "AddTags". We inherited the schema from the Methods class we copied, so now we need to edit the common_meth1 entry. Enter "add_tags" for the value. It should look like this:

AddTags

Click "Save".

Now we need a method. Highlight the Methods class again and on the right, click the second tab, Methods, and select Configuration/Add a New Method. Name the new method "add_tags". In the Data field, we need to put the actual code. It should look like this:

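#
# Add a tag to create a new group in Ansible that enables winrm as the default ansible_connection
#

# Tag Windows VMs so they land in their own group in the dynamic inventory.
def add_tags_for_ansible_groups(vm)
  # Nothing to tag if the provisioning task has no VM attached yet.
  return if vm.nil?

  vm.tag_assign("operating_system/windows") if vm.platform == "windows"
end

begin
  # The provisioning task is available in $evm.root during the state machine.
  vm = $evm.root['miq_provision'].vm
  add_tags_for_ansible_groups(vm)
end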

The whole thing should look like this:
add_tags

Click "Validate" and then click "Save".

It's probably worth mentioning that if you don't fully understand the line:
vm = $evm.root['miq_provision'].vm
it would be worth your time to read about the lifecycle of the Request and Task objects during the automate process. These concepts are explained in detail in Peter McGowan's indispensable book.
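If you'd like to sanity-check the tagging logic before pasting it into Automate, here is a rough sketch that runs in plain Ruby. FakeVm is a made-up stand-in, not a real ManageIQ class. It only implements the two calls the method relies on (platform and tag_assign), so it says nothing about how the real VM object behaves beyond that.

```ruby
# Hypothetical stand-in for the VM object that Automate hands to the method.
# It records tags instead of talking to CloudForms.
FakeVm = Struct.new(:platform) do
  def assigned_tags
    @assigned_tags ||= []
  end

  def tag_assign(tag)
    assigned_tags << tag
  end
end

# Same logic as the add_tags method, minus the $evm lookup.
def add_tags_for_ansible_groups(vm)
  return if vm.nil?

  vm.tag_assign('operating_system/windows') if vm.platform == 'windows'
end

windows_vm = FakeVm.new('windows')
linux_vm = FakeVm.new('linux')

add_tags_for_ansible_groups(windows_vm)
add_tags_for_ansible_groups(linux_vm)
add_tags_for_ansible_groups(nil) # no VM attached: does nothing

puts windows_vm.assigned_tags.inspect # the Windows VM gets the tag
puts linux_vm.assigned_tags.inspect   # the Linux VM is left alone
```

Only the Windows stand-in should end up with the operating_system/windows tag; the Linux one and the nil case are untouched.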

Now we need to point to our new instance in our state machine. Edit the schema of the "Provision VM from Template" instance and add a new state, "AddTags". We can set the value either as a default in the schema or by editing the instance itself. We're going to do the former, so in the value, enter:
/Infrastructure/VM/Provisioning/StateMachines/Methods/AddTags

Save this, and then edit the Schema sequence and put "AddTags" right between the "CheckProvisioned" and "PostProvision" states. The end result should look like this:
state_machine_schema

Now we should make sure it works. We'll do that by provisioning a Windows VM. Once that completes, we can navigate to the VM information page, and if all went as planned we should see our new tag referenced in the Smart Management section in the bottom right of the page. Something like this:
we_did_it

Huzzah.

The CloudForms/ManageIQ portion of our journey is complete. On to Ansible Tower.

If we go back to our groups inside the Dynamic Inventory, we see a new one, created automatically based on our new tag, _managed_operating_system_windows:
new_group

If we edit that new group, we see that we can set variables at the group level:
group_variables

There are a few things going on here but the important line is line 2:
ansible_connection: winrm

Now the default connection for everything in this group will be WinRM instead of SSH. So if we have a Job Template in Tower that runs a playbook that installs an Internet Information Services (IIS) server, and we associate that job with a service bundle in CloudForms that provisions a Windows server, everything should work without any issue, right?
IIS
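For reference, here is a sketch of what the group variables for the Windows group might contain, written as a few lines of Ruby that print YAML you could paste into the group's variables box. Only ansible_connection: winrm comes from this post. The port and certificate validation settings are assumptions for a lab setup using WinRM over HTTPS with self-signed certificates, so adjust or drop them for your environment.

```ruby
require 'yaml'

# Group-level variables for the Windows group.
# ansible_connection is the setting discussed above; the other two are
# assumptions (HTTPS listener on 5986, self-signed certificate).
group_vars = {
  'ansible_port' => 5986,
  'ansible_connection' => 'winrm',
  'ansible_winrm_server_cert_validation' => 'ignore'
}

# Print the variables as YAML, ready to paste into Tower.
puts group_vars.to_yaml
```

Keep in mind that variables set on an individual host still take precedence over group variables, so anything set at the host level will win.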

some other things

WinRM can be a little tricky, so you might need to do some troubleshooting. If you get timeouts or other errors while testing this, the win_ping module is really useful. WinRM is also notoriously terse in its error messages, so it's a good idea to bump up the verbosity. If you edit the Job Template in Tower, you can change the verbosity to 5 (WinRM Debug). This should give you some better clues. Keep in mind that the issue is very likely a problem with the target host, like not having "basic auth" enabled or something similar. Troubleshooting Windows boxes is beyond the scope of this post, but there are tons of WinRM troubleshooting tips online. This blog, however, was particularly useful.
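Before digging into WinRM itself, it can help to rule out basic network problems. The sketch below is a plain TCP reachability check in Ruby. It is not a WinRM client and proves nothing about authentication. The helper name port_open? is made up for this example. The ports are the WinRM defaults: 5985 for HTTP and 5986 for HTTPS.

```ruby
require 'socket'

# Default WinRM listener ports.
WINRM_PORTS = { http: 5985, https: 5986 }.freeze

# Returns true if a TCP connection to host:port succeeds within the timeout,
# false if it is refused, times out, or the host name cannot be resolved.
def port_open?(host, port, timeout: 3)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end
```

You would call it with something like port_open?('win-host.example.com', WINRM_PORTS[:https]), where the host name is a placeholder. If that returns false, the problem is likely a firewall or a missing listener rather than anything in Tower.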