
Parallel command execution – Linux Cluster

The pdsh parallel shell tool lets you run a shell command across multiple nodes in a cluster.

pdsh (Chaos pdsh) is a high-performance, multithreaded remote shell client for administrators that executes commands on multiple remote hosts in parallel. A parallel shell lets you run the same command on many designated hosts or nodes of a Linux cluster (Ubuntu, Red Hat, a Hadoop cluster and so on), so you do not have to log in to each node individually.

With the dshgroup module, pdsh can read dsh-style (Dancer's shell) group files from the /etc/dsh/group directory. Parallel Distributed Shell is available free of charge.

What is pdsh?

pdsh is a variant of the rsh(1) command. Unlike rsh(1), which runs commands on a single remote host, pdsh can run multiple remote commands in parallel. pdsh uses a “sliding window” (or fanout) of threads to conserve resources on the initiating host while allowing some connections to time out.
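
As a rough illustration of the fanout, the sketch below caps the number of simultaneous connections with the -f option (pdsh defaults to 32). The host range node[1-64] and the function name uptime_all are placeholders chosen for this example, not names taken from a real cluster. The command is wrapped in a function so that pasting the snippet does not contact any host until the function is called.

```shell
# Hypothetical host range; replace with your own node names.
HOSTS='node[1-64]'

# Run uptime on every host, with at most 8 connections open at a time.
# -w gives the target host list, -f sets the fanout (pdsh default: 32).
uptime_all() {
    pdsh -w "$HOSTS" -f 8 uptime
}
```

Calling uptime_all then works through the 64 hosts in a sliding window of 8. A smaller fanout is gentler on the initiating host, a larger one finishes sooner.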

When pdsh receives SIGINT (ctrl-C), it lists the status of current threads. A second SIGINT within one second terminates the program. Pending threads may be canceled by issuing ctrl-Z within one second of ctrl-C. Pending threads are those that have not yet been initiated, or are still in the process of connecting to the remote host.

If a remote command is not specified on the command line, pdsh runs interactively, prompting for commands and executing them when terminated with a carriage return. In interactive mode, target nodes that time out on the first command are not contacted for subsequent commands, and commands prefixed with an exclamation point will be executed on the local system.

The core functionality of pdsh may be supplemented by dynamically loadable modules. The modules may provide a new connection protocol (replacing the standard rcmd(3) protocol used by rsh(1)), filtering options (e.g. removing hosts that are “down” from the target list), and/or host selection options (e.g. -a selects all hosts from a configuration file). By default, pdsh must have at least one “rcmd” module loaded. See the RCMD MODULES section of the pdsh(1) man page for more information.
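
A minimal sketch of working with modules, assuming the ssh rcmd module is part of your pdsh build: -L lists the loaded modules, -R selects the rcmd module for a single run, and the PDSH_RCMD_TYPE environment variable sets the default. The function names and the host range passed to them are hypothetical.

```shell
# Show which modules this pdsh build has loaded.
list_modules() {
    pdsh -L
}

# Pick the ssh rcmd module for one invocation with -R.
# $1 = host list (e.g. 'node[1-4]'), $2 = remote command.
run_over_ssh() {
    pdsh -R ssh -w "$1" "$2"
}

# Or make ssh the default rcmd module for every later pdsh call.
export PDSH_RCMD_TYPE=ssh
```

For example, run_over_ssh 'node[1-4]' 'uname -r' prints the kernel version of each node, one line per host.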

Installing pdsh

Debian based:

apt install pdsh

RHEL based:

yum install pdsh

Running

The following command installs telegraf on all 4 nodes in cluster02
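
The original command is not reproduced here, so what follows is only a sketch of what such a command might look like. It assumes that the dshgroup module is installed, that a group file /etc/dsh/group/cluster02 lists the four node names one per line, that the nodes are Debian based with a package source for telegraf already configured, and that sudo works without a password (pdsh cannot answer prompts). The function name install_telegraf is made up for the example.

```shell
# Assumed dsh group name; expects /etc/dsh/group/cluster02 to exist.
GROUP='cluster02'

# Install telegraf on every node of the group over ssh.
# -g selects the hosts from the dsh group file, -R ssh picks the transport.
install_telegraf() {
    pdsh -R ssh -g "$GROUP" 'sudo apt-get install -y telegraf'
}
```

On RHEL based nodes the remote command would be 'sudo yum install -y telegraf' instead.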

Running multiple commands
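
One way to run several commands in a single pdsh call is to quote the whole command line, so that the separators are interpreted by the remote shell rather than the local one. The sketch below assumes a hypothetical host range node[1-4]; the function name is also made up.

```shell
# Hypothetical host range; replace with your own node names.
HOSTS='node[1-4]'

# The single quotes keep ';' away from the local shell, so all three
# commands run on each remote node, one after the other.
node_report() {
    pdsh -w "$HOSTS" 'hostname; uptime; df -h /'
}
```

Without the quotes, the local shell would end the pdsh command at the first semicolon and run uptime and df on the initiating host only.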

Pipe redirection
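
Whether a pipe runs on the nodes or on the initiating host depends on quoting. The following sketch shows both cases, plus the dshbak helper that ships with pdsh. The host range node[1-4] and the function names are assumptions for illustration.

```shell
# Hypothetical host range; replace with your own node names.
HOSTS='node[1-4]'

# Remote pipe: the pipeline is quoted, so "ps aux | wc -l" runs on every
# node and only the resulting count is sent back.
count_procs_remote() {
    pdsh -w "$HOSTS" 'ps aux | wc -l'
}

# Local pipe: the pipe is outside the quotes, so the combined output of
# all nodes ("host: text" lines) is sorted on the initiating host.
kernel_versions_sorted() {
    pdsh -w "$HOSTS" uname -r | sort
}

# dshbak -c groups nodes that produced identical output under one header.
kernel_versions_grouped() {
    pdsh -w "$HOSTS" uname -r | dshbak -c
}
```

The same rule applies to > and >>: quoted, the file is written on each node; unquoted, it is written once on the local machine.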

 

Example

 

When using ssh for remote execution, expect the stderr of ssh to be folded in with that of the remote command. When invoked by pdsh, ssh cannot prompt for passwords, for example if RSA/DSA keys are not configured properly. For ssh implementations that support a connect timeout option, pdsh attempts to use that option to enforce the timeout (e.g. -oConnectTimeout=T for OpenSSH); otherwise connect timeouts are not supported when using ssh. Finally, there is no reliable way for pdsh to ensure that remote commands are actually terminated when using a command timeout. Thus, if -u is used with ssh, commands may be left running on remote hosts even after the timeout has killed the local ssh processes.
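
The two timeouts mentioned above can be set per invocation. This is a small sketch, assuming the ssh rcmd module and a hypothetical host range node[1-4]; the function name is invented for the example.

```shell
# Hypothetical host range; replace with your own node names.
HOSTS='node[1-4]'

# -t: connect timeout in seconds (pdsh default: 10).
# -u: command timeout in seconds (no limit by default).
check_disk_with_timeouts() {
    pdsh -R ssh -t 5 -u 30 -w "$HOSTS" 'df -h /'
}
```

Keep in mind that with ssh the -u limit only stops the local ssh process, so a slow remote command may keep running on the node after the 30 seconds are up.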

Output from multiple processes per node may be interspersed when using qshell or mqshell rcmd modules.

The number of nodes that pdsh can simultaneously execute remote jobs on is limited by the maximum number of threads that can be created concurrently, as well as the availability of reserved ports in the rsh and qshell rcmd modules. On systems that implement POSIX threads, the limit is typically defined by the constant PTHREAD_THREADS_MAX.
