Update to the latest version. Ansible 2.0 was slower than Ansible 1.9 because it introduced a major change to the execution engine that lets users choose the execution algorithm (strategy). The releases that followed, 2.1 in particular, brought large optimizations to execution speed, so make sure you are running the latest version you can.
The best way I’ve found to time the execution of Ansible playbooks is by enabling the profile_tasks callback. This callback is included with Ansible, and all you need to do to enable it is add the following line to the [defaults] section of your ansible.cfg (newer releases name this option callbacks_enabled):
callback_whitelist = profile_tasks
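Enable SSH pipelining, which reduces the number of network operations needed to run a module on the remote host. Set it in the [ssh_connection] section of ansible.cfg:
pipelining = True
You’ll also need to make sure that requiretty is disabled in /etc/sudoers on the remote host, or become won’t work with pipelining enabled.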
Enable Mitogen for Ansible
Enabling Mitogen for Ansible is as simple as downloading and extracting the plugin, then adding two lines to the [defaults] section of your ansible.cfg:
strategy_plugins = /path/to/mitogen-0.2.5/ansible_mitogen/plugins/strategy
strategy = mitogen_linear
The first thing to check is whether SSH multiplexing is enabled and used. This gives a tremendous speed boost because Ansible can reuse open SSH sessions instead of negotiating a new one (actually more than one) for every task. Ansible has this setting turned on by default. It can be set in the configuration file as follows:
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
Be careful when overriding ssh_args: if you don’t include ControlMaster and ControlPersist in the overridden value, Ansible will “forget” to use them.
To check whether SSH multiplexing is used, start Ansible with the -vvvv option:
ansible test -vvvv -m ping
UseDNS is an SSH-server setting (in the /etc/ssh/sshd_config file) which forces the server to check a client’s PTR record upon connection. It may cause connection delays, especially when the DNS servers used by the server are slow. In modern Linux distributions this setting is turned off by default, which is what we want.
PreferredAuthentications is an SSH-client setting which informs the server about the preferred authentication methods. By default Ansible uses:
-o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey
So if GSSAPI authentication is enabled on the server (at the time of writing it is turned on in the RHEL EC2 AMI), it will be tried as the first option, forcing the client and server to make PTR-record lookups. But in most cases we want to use only public key auth. We can force Ansible to do so by changing ansible.cfg:
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o PreferredAuthentications=publickey
At the start of playbook execution, Ansible collects facts about the remote system (this is the default behaviour for ansible-playbook but does not apply to ansible ad-hoc commands). This is similar to calling the “setup” module and therefore requires another SSH communication step. If you don’t need any facts in your playbook (e.g. our test playbook), you can disable fact gathering by setting gather_facts to false at the play level.
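A minimal sketch of a play with fact gathering turned off. The host pattern and the ping task are placeholders chosen for illustration and are not taken from the original test playbook:

```yaml
---
# Hypothetical play: host pattern and task are placeholders.
- hosts: all
  gather_facts: false   # skip the implicit "setup" run at the start of the play
  tasks:
    - name: check connectivity without collecting facts
      ping:
```

Any task that references a fact variable will fail in such a play, so this only suits playbooks that really do not use facts.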
Up to this point we have discussed how to speed up playbook execution on a given remote host. But if you run a playbook against tens or hundreds of hosts, Ansible’s internal performance becomes the bottleneck. For example, there is a preconfigured number of forks, the number of hosts that Ansible can interact with simultaneously. You can change this value with the forks setting in the [defaults] section of the ansible.cfg file.
The default value is 5, which is quite conservative. You can experiment with this setting depending on your local CPU and network bandwidth resources.
Another thing about forks is that if you have a lot of servers to work with and a low number of available forks, your master SSH sessions may expire between tasks. Ansible uses the linear strategy by default, which executes one task on every host and then proceeds to the next task. So if the time between executing a task on the first server and on the last one is greater than ControlPersist, the master socket will have expired by the time Ansible starts the following task on the first server, and a new SSH connection will be required.
When a module is executed on a remote host, Ansible starts polling for its result. The shorter the interval between poll attempts, the higher the CPU load on the Ansible control host. But we want to keep CPU available for a higher number of forks (see above). You can tweak the poll interval in ansible.cfg:
internal_poll_interval = 0.001
If you run “slow” jobs (like backups) on multiple hosts, you may want to increase the interval to 0.05 to use less CPU.
I hope this helps you speed up your setup. There seem to be no more items on the environment checklist, so further speed gains are only possible by optimizing your playbook code.
Asynchronous Actions and Polling
By default tasks in playbooks block, meaning the connections stay open until the task is done on each node. This may not always be desirable, or you may be running operations that take longer than the SSH timeout.
To avoid blocking or timeout issues, you can use asynchronous mode to run all of your tasks at once and then poll until they are done.
The behaviour of asynchronous mode depends on the value of poll.
Avoid connection timeouts: poll > 0
When poll is a positive value, the playbook will still block on the task until it either completes, fails or times out.
In this case, however, async explicitly sets the timeout you wish to apply to this task rather than being limited by the connection method timeout.
To launch a task asynchronously, specify its maximum runtime and how frequently you would like to poll for status. The default poll value is 15 seconds if you do not specify a value for poll:
- hosts: all
  tasks:
    - name: simulate long running op (15 sec), wait for up to 45 sec, poll every 5 sec
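The play above is cut off after the task name. A complete version might look like the following sketch, with the values taken from the task name; the sleep command is an assumed stand-in for a real long-running job:

```yaml
---
- hosts: all
  tasks:
    - name: simulate long running op (15 sec), wait for up to 45 sec, poll every 5 sec
      command: /bin/sleep 15   # assumed stand-in for a long-running job
      async: 45                # maximum runtime in seconds before the task is timed out
      poll: 5                  # check the job status every 5 seconds
```

Here the play still waits for the task, but the limit that applies is the async value rather than the connection timeout.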
Concurrent tasks: poll = 0
When poll is 0, Ansible will start the task and immediately move on to the next one without waiting for a result.
From the point of view of sequencing this is asynchronous programming: tasks may now run concurrently.
The playbook run will end without checking back on async tasks.
The async tasks will run until they either complete, fail or time out according to their async value.
If you need a synchronization point with a task, register it to obtain its job ID and use the async_status module to observe it.
You may run a task asynchronously by specifying a poll value of 0:
- hosts: all
  tasks:
    - name: simulate long running op, allow to run for 45 sec, fire and forget
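This play is also cut off after the task name. The sketch below completes it and adds a synchronization point with async_status. The sleep command, the variable names sleeper and job_result, and the retry values are assumptions made for illustration:

```yaml
---
- hosts: all
  tasks:
    - name: simulate long running op, allow to run for 45 sec, fire and forget
      command: /bin/sleep 15   # assumed stand-in for a long-running job
      async: 45                # let the job run for up to 45 seconds
      poll: 0                  # do not wait; move on to the next task immediately
      register: sleeper        # hypothetical name; holds ansible_job_id

    - name: wait for the background job to finish
      async_status:
        jid: "{{ sleeper.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 10              # 10 checks x 5 sec covers the 45 sec allowance
      delay: 5
```

Leave out the second task if you really do want fire and forget. Keep it when later tasks depend on the job having finished.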
Fact gathering is what happens when Ansible says “Gathering facts” about your target hosts, and fact caching stores the result so it doesn’t have to be collected again on every run. If we don’t change our targets’ hardware (or virtual hardware) very often, this can be very helpful. Enable it by setting fact_caching in your ansible.cfg file. With a file-based backend we’re telling Ansible to keep the facts it gathers in a local file. You can also set this to a Redis cache. See the documentation for details.
Enable facts caching mechanism
If you still need some of the fact groups but the gathering process is still too slow for you, you could try using fact caching.
Caching enables Ansible to store the facts for a given host in some kind of backend.
The caching plugin supports several cache backends, including jsonfile and redis.
More information on the caching plugin can be found in the Ansible documentation.
This is an example configuration of fact caching in JSON files:
gathering = smart
fact_caching_connection = /tmp/facts_cache
fact_caching = jsonfile
# The timeout is defined in seconds
# This is 2 hours
fact_caching_timeout = 7200