Process Management

Read the Policy First!

Users who plan to run background processes should read the Acceptable Use Policy of the CLAS Linux Group. In particular, please note these specific policies:

Run background jobs on the machines in the bigjobs alias.
Do not run more than two concurrent background jobs on the entire managed network.
Do not run multiple jobs on the same machine. Instead, start the two jobs on two different machines.
Run background jobs at a low priority - a nice level of 10 or higher. Jobs that are not run at a low priority will be killed.
Do not run any server programs without first getting approval from the CLAS Linux Group staff.
Do not run non-class-related, resource-intensive programs during a term. Permission to run these programs during breaks may be granted on a case-by-case basis. Open a ticket in the Help Request Trouble Ticket System to request permission.

Avoid Common Mistakes

Before going any further, review the job you want to run, and make sure it adheres to these guidelines:

Do NOT use your home directory for files that the job accesses frequently.
Use /var/tmp for files that the job frequently reads or writes.
Be aware that /var/tmp is not backed up, so periodically copy any output you want to keep from /var/tmp into your home directory. In addition, data in /var/tmp is deleted if it goes unused for more than 30 days or if the machine is reinstalled.
Limit memory usage. Configure the job so that it uses the least amount of memory possible.
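
As a rough illustration of the guidelines above, the sketch below keeps a job's working files in /var/tmp and copies the results back to the home directory. The names myjob, result.txt, and job-results are hypothetical placeholders, and the echo line merely stands in for a real job.

```shell
# Hypothetical locations; adjust the names to suit your own job.
SCRATCH_BASE="${SCRATCH_BASE:-/var/tmp}"
RESULTS_DIR="${RESULTS_DIR:-$HOME/job-results}"

# Create a private scratch directory under /var/tmp for this job.
scratch=$(mktemp -d "$SCRATCH_BASE/myjob.XXXXXX")

# Stand-in for the real job, which would read and write its files here.
echo "example output" > "$scratch/result.txt"

# /var/tmp is not backed up, so copy anything worth keeping
# into your home directory.
mkdir -p "$RESULTS_DIR"
cp "$scratch/result.txt" "$RESULTS_DIR/"
```

Copying only the finished results, rather than working in the home directory throughout, keeps the frequent reads and writes on the local disk.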

Using bigjobs Machines

For information about finding, connecting to and using the bigjobs machines, see bigjobs.

Launching Background Processes

To launch a background process, append an ampersand (the & character) to your command line. You will probably want to redirect the process's output to a file so that it does not clutter your command window. Your command line should look something like this:

In the POSIX, Bourne or Bash shell:

./mycommand > myoutput 2>&1 &

In this example, the > redirects the output of mycommand to the file myoutput. The 2>&1 redirects STDERR into STDOUT, so that all output, including error messages, is redirected into myoutput. The ampersand, &, makes the command run in the background.

In the C or tcsh shell, the command will be somewhat different:

./mycommand >& myoutput &

You should become familiar with shell redirection for STDIN, STDOUT, and STDERR (0, 1, 2) and the shell file descriptors. See the REDIRECTION section of the sh man page (man sh). Shell redirection is more flexible under the sh or Bash shell than the C or tcsh shell.
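
To see the redirection at work in the POSIX, Bourne or Bash shell, you could try something like the following sketch. The mycommand function here is only a stand-in for a real program, and the output file is a temporary file rather than myoutput.

```shell
# Stand-in for ./mycommand: writes one line to STDOUT and one to STDERR.
mycommand() {
    echo "normal output"
    echo "error output" >&2
}

outfile=$(mktemp)

# Send STDOUT to the file, point STDERR at the same place,
# and run the whole thing in the background.
mycommand > "$outfile" 2>&1 &

# $! holds the process ID of the most recent background job.
job_pid=$!

# Wait for the background job to finish, then show what it wrote.
wait "$job_pid"
cat "$outfile"
```

The order of the redirections matters: 2>&1 must come after the > redirection. Written the other way round, STDERR is pointed at the command window before STDOUT is redirected, so error messages still appear on screen.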

Using nice to Run Politely

The nice value of a process determines the priority at which the process runs. The higher the nice value, the lower the priority of the process. If a process has a low nice value, and thus a high priority, it can monopolize the resources of the workstation on which it runs. Users who are logged into the workstation may notice that the machine is responding very slowly to their commands. Thus, any background process you start should be run at a high nice value.

NOTE: All long-running background jobs must be run at a nice value of 10 or higher or they will be killed by the CLAS Linux Group staff!!!

When a process starts, it inherits the nice value of the process that spawned it. When you start a process from the command line, it inherits the nice value of your shell. Most likely, this is the default nice value of 0 on Linux workstations.

If you are running a background job, you must start it with a higher nice value than your shell's. On Linux machines, use the /bin/nice command to do so. For example, to run the command mycommand in the background from the POSIX, Bourne or Bash shell, you would type:

/bin/nice -n 10 ./mycommand > myoutput 2>&1 &

To use the C or tcsh shell, you would type:

/bin/nice -n 10 ./mycommand >& myoutput &

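This starts a background process with a nice value of 10 on Linux machines, assuming your shell runs at the default nice value of 0. The number you give to /bin/nice is added to the nice value the process would otherwise inherit.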

IMPORTANT NOTE: Use the full path for the nice command, which is /bin/nice on the Linux machines. If you don't use the full path, you may be executing your shell's built-in nice command (the C and tcsh shells have one), which takes different arguments and may be buggy.

Do man nice to learn more about the nice command.

Launching Persistent Processes

Background processes normally die when the shell that launched them terminates, for example when you log out. To keep the process running after you log out, you need to use the nohup command. In the POSIX, Bourne or Bash shell, type:

/usr/bin/nohup /bin/nice ./mycommand > myoutput 2>&1 &

In the C or tcsh shell, type:

/usr/bin/nohup /bin/nice ./mycommand >& myoutput &

For more information about the nohup command, do man nohup.
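
When /bin/nice is given no -n option, as in the commands above, it raises the nice value by its default of 10. The sketch below, for the POSIX, Bourne or Bash shell, spells that value out and also records the job's process ID so that you can find the job again after logging back in. The sleep command stands in for ./mycommand, and the temporary files are placeholders for whatever file names you choose.

```shell
# Placeholder files for the job's output and its process ID.
outfile=$(mktemp)
pidfile=$(mktemp)

# sleep 30 stands in for ./mycommand.
/usr/bin/nohup /bin/nice -n 10 sleep 30 > "$outfile" 2>&1 &

# nohup and nice each replace themselves with the command they run,
# so $! is the process ID of the job itself.
echo "$!" > "$pidfile"

echo "Started job $(cat "$pidfile")"
```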

Finding Processes

To view all the processes running on the workstation you are currently logged into, use the ps command with the arguments -ef:

ps -ef

The output from ps -ef is displayed in columns. On Red Hat Linux workstations, the columns have these headings:

UID PID PPID C STIME TTY TIME CMD

The interesting columns are the ones labeled UID, PID, and CMD or COMMAND. The UID entry contains the username of the account that owns the process. The PID field is the process ID of the process. The COMMAND or CMD entry is the name of the command that spawned the process.

The output of ps -ef can be huge. You can pipe the output through more so that you can page through it:

ps -ef | more

To view only the processes that you own, you can pipe the output through grep:

ps -ef | grep username

where username is the username of your account.

You can combine more and grep so that you can page through your processes:

ps -ef | grep username | more

(You can use the man command to learn more about ps, more, and grep. Do man ps, man more, and man grep.)
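
One caveat with grep is that it matches its pattern anywhere on the line, so the result can include unrelated lines, including the grep command itself. The sketch below shows two possible alternatives: the -u option of ps, which selects processes by owner, and a bracketed grep pattern for searching by command name. The sleep 45 job is only a stand-in for a real job.

```shell
# ps -u accepts a username or a numeric user ID; the numeric ID is used here.
me=$(id -u)

# List only the processes owned by this account, in the same format as ps -ef.
my_procs=$(ps -f -u "$me")
printf '%s\n' "$my_procs"

# Stand-in for a real background job.
sleep 45 &
job_pid=$!
sleep 1   # give the job a moment to start

# The pattern [s]leep matches the text "sleep", but grep's own command line
# contains "[s]leep", which the pattern does not match.
matches=$(ps -ef | grep '[s]leep 45')
printf '%s\n' "$matches"
```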

Examining Processes

The output of ps -ef contains other information that can tell you how your processes are behaving. The STIME field displays the time the process started; if the process started more than 24 hours ago, the field displays the start date instead. Using the STIME field, you can tell how long the process has been running.

The TIME field shows the cumulative execution time for the process. This shows how much time the process has actually spent working. It indicates how much the process is using the workstation's processor.
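
If you would rather not work out the running time from STIME, the -o option of ps can print the elapsed time directly next to the cumulative processor time. This is a minimal sketch; sleep 30 stands in for a real job.

```shell
# Stand-in for a real job; sleep uses almost no processor time.
sleep 30 &
job_pid=$!

# etime is the elapsed time since the process started;
# time is the cumulative processor time it has used.
report=$(ps -o pid,etime,time,comm -p "$job_pid")
printf '%s\n' "$report"
```

A job whose elapsed time is long but whose processor time is small has spent most of its life waiting rather than computing.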

To learn more about your processes, you will need to use the -efl command-line argument with ps:

ps -efl

As with ps -ef, you may pipe the output from ps -efl through grep and/or more.

The output of ps -efl is displayed in columns. On Linux workstations, the columns are labeled:

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

The output of ps -efl is more than 80 columns wide, so you may want to widen your window to view it comfortably.

The output of ps -efl has all the columns of ps -ef, plus a few others. The other columns of interest are NI and SZ. The NI field shows the nice value of the process. The nice value determines the priority of the process. The higher the value, the lower the priority - the "nicer" the process is to other processes. The default nice value is 0 on Linux workstations.

The SZ field displays the size of the process in memory. The value of the field is the number of pages the process is occupying. On Linux machines, a page is 4,096 bytes.
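
To turn an SZ value into a more familiar unit, multiply it by the page size. The sketch below asks the system for its page size instead of assuming 4,096 bytes. The pages_to_kib function name and the value 2500 are made up for illustration.

```shell
# Ask the system for its page size in bytes.
page_size=$(getconf PAGESIZE)

# Convert a page count, as shown in the SZ column, to kibibytes.
pages_to_kib() {
    echo $(( $1 * page_size / 1024 ))
}

# A hypothetical SZ value of 2500 pages.
# With 4,096-byte pages this works out to 10000 KiB.
pages_to_kib 2500
```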

Resetting the nice Value

You may want to reset the nice value of a process that is already running. You can do that with the renice command. First, you need to find the process ID of the process. Do a ps -ef or ps -efl command and find the number in the process's PID field. That is the process ID for the process. Then run renice. On the Linux machines, you must supply the nice level. For example, this command will set the nice value of the process with the process ID of process_id to 19:

renice 19 -p process_id

(Do man renice to learn more about renice.)

Stopping a Process

To stop a process, use the kill command. The kill command sends a signal to a process; by default, it sends the TERM signal, which asks the process to terminate. To use kill, you first have to discover the process ID of the process. Do a ps -ef or ps -efl command and find the number in the process's PID field. That is the process ID for the process. Then run kill:

kill process_id

where process_id is the process ID of the process.

Use ps -ef or ps -efl to check that the process really has stopped. If it hasn't, then use the -9 option with kill. This sends the KILL signal, which the process cannot catch or ignore:

kill -9 process_id

(Do man kill to learn more about kill.)
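
The two steps above can be combined into a short script that tries a plain kill first and resorts to kill -9 only if the process is still there a few seconds later. This is a minimal sketch for the POSIX, Bourne or Bash shell; sleep 300 stands in for a real job, and the five-second grace period is an arbitrary choice.

```shell
# Stand-in for a job that needs to be stopped.
sleep 300 &
job_pid=$!

# Ask politely first: kill sends the TERM signal by default.
kill "$job_pid"

# Check for up to five seconds. kill -0 sends no signal;
# it only reports whether the process ID still exists.
i=0
while [ "$i" -lt 5 ] && kill -0 "$job_pid" 2>/dev/null; do
    sleep 1
    i=$((i + 1))
done

# If the process ID is still there, force the process to stop.
if kill -0 "$job_pid" 2>/dev/null; then
    kill -9 "$job_pid"
fi

# Collect the exit status. A status above 128 means the job was
# ended by a signal: 128 plus the signal number.
status=0
wait "$job_pid" 2>/dev/null || status=$?
echo "exit status: $status"
```

The wait command works only for jobs started from the current shell. For a process started in an earlier login session, check with ps instead.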