Queuing Network Models

Queuing network models are used to analyze performance of computer systems. Here are some examples to introduce queuing (or queueing) networks.

EXAMPLE 1

OTP, Inc. is producing a simple online transaction processing system with two disk files. Fixed-length account records are read and updated using a computed offset into the account file, and a transaction log is written by appending to the log file. The marketing and sales departments want a system that supports 80 user terminals and allows operators to complete transactions in average time of 5 seconds or less.

Research shows an average transaction takes about 100 milliseconds (ms.) of MYTHICAL CPU time (Although MYTHICAL can multitask copies of the same program, the first example uses one program). Three disk I/O's average 50 ms. each. Operators are able to key and send transactions in 5 seconds.

Factors such as I/O buffering, data transfer rates, and datacomm line speed are ignored, to keep this simple.

                     CURRENT OPERATION      THINK TIME   CPU TIME   I/O TIME

                    Operator think and send   5,000          0         0
                    CPU process message           0         35         0
                    Read account                  0          0        50
                    Process transaction           0         30         0
                    Write log to disk             0          0        50
                    Complete processing           0         10         0
                    Rewrite account to disk       0          0        50
                    Send response to operator     0         25         0
                                              _____       ____      ____
                    TOTALS:                   5,000 ms.    100 ms.   150 ms.

Some questions about the "one operator" scenario:

How long to process one transaction?
_ Operator think time 5,000 ms. + CPU 100 ms. + disk I/O 150 ms. = 5.25 seconds.
How many transactions per minute?
_ 60 seconds / 5.25 = 11.4 transactions / minute
What do the CPU and disk do while the operator thinks and keys?
_ They are idle, available for other processes.

EXAMPLE 2

A queuing network is a system of one or more service queues, so let's look at a queue.

After observing a service queue for an amount of time (T), a number of jobs arrive (A), the servicing device is busy for some time (B), and a number of jobs are completed (C), and leave.

                                        *********
                              ---------*         *
                               Waiting *   Job   *
                   Job ---->   queue   * service *  ----> Job Completions
                Arrivals               *  device *              (C)
                   (A)        ---------*         *
                                        *********

                      T = Observation Time      B = Device Busy time

Formulas for Arrival rate, Completion rate, Utilization, and Mean Service time, Table assumes Arrivals equal Completions: Basic formulas:

                Arrival rate:       a = A/T, Arrivals / Observation Time
                Completion rate:    X = C/T, Completions / Observation Time
                Utilization:        U = B/T, Busy Time / Observation Time
                Mean Service time:  S = B/C, Busy Time / Completions

Apply formulas to Example 1, as a system:

                T = 1 minute, B = 1 minute, A = C = 11.4 per minute

                Arrival rate:       a = 11.4/1, 11.4 per minute
                Completion rate:    X = 11.4/1, 11.4 per minute
                Utilization:        U = 1/1, 100 percent
                Mean Service time:  S = 1/11.4 = .0877 minute = 5.263 seconds

The last three formulas are equivalent to the following useful formula for "peak rate":

                System U = X * S
                (Because U = B/T = C/T * B/C)

For peak theoretical transaction rate, set Utilization to 1 (100%), and plug in the mean service time:

                1 = X * 11.4
                X = 1 / 11.4 = peak rate .0877 per minute (1 every 5.263 seconds)
                So, U = 11.4 * .0877, or ~ 100 percent

                Note: The Operator, CPU and Disk are also queues. So, Utilization
                for CPU, Disk and Operator fit to U = Busy Time / Observation Time:

                U of CPU = .0100 / 1 = 1 percent
                U of Disk = .0150 / 1 = 1.5 percent
                U Operator = .95 / 1 = 95 percent

EXAMPLE 3

Suppose we add more terminal operators, to capacity. Given enough operators, think time is less relevant.

How long to process one transaction?
_ CPU 100 ms. + disk I/O 150 ms. = 250 ms. to process one transaction.
How many transactions per minute?
_ 60 seconds / 0.250 seconds = 240 transactions / minute
What does the CPU do while the disk is busy, and vice versa?
_ There is some overlap, but CPU and disk spend time waiting for queued activity to clear.
What amounts of CPU and disk are utilized?
_ CPU 100 ms. out of 250 ms. = 0.4 or 40 percent use. Disk I/O 150 ms. of 250 ms. = 0.6 or 60 percent.

Since the system is using only 40 percent of the CPU and 60 percent of the disk, we should be able to process more transactions per minute by running two or more copies of the program. Running multiple program copies may require extra effort recovering from abort conditions, which many database systems handle.

Running multiple program copies should boost the disk usage closer to 100%, increasing throughput. Then, adding another disk can spread the I/O and help with the disk bottleneck, again, increasing throughput. With the data already gathered, queuing network models can analyze the results of adding another program copy, and adding another disk.

               SUMMARY OF DATA FOR MODELING ONE DISK WITH MULTIPLE PROGRAM COPIES

                         NUMBER OF         ROUTING PROBABILITY          MEAN SERVICE
            DEVICE        VISITS     TO CPU, TO DISK, LEAVE SYSTEM     TIME,  SECONDS
            _________________________________________________________________________
         
             CPU            4          0       0.75      .25               0.025
             DISK           3          1          0        0               0.050

The following model takes into account routing probabilities.

                               ONE DISK WITH MULTIPLE PROGRAM COPIES

      Transactions        *********                                     Transactions
           in       -----*         *       25 percent of CPU vists          out
      ------------>      *         * ----------------->---------------------------->
                         *   CPU   *
              ---->      *         * ----. 75 percent of   Service time
             /      -----*         *      \  CPU visits     .050 Sec.
             |            *********        \               *********
             ^           Mean service       \        -----*         *
             |          time .025 sec.       \            *         *
             |                                `----->     *   DISK  * ---->.
             |                                            *         *      |
             ^                                       -----*         *      |
             |                                             *********       |
             |               100 percent of disk visits                    V
              `<-------------------------<------------------------<--------

A transaction requires 4 CPU visits and 3 disk visits. Number of visits times mean service shows the disk is used more, and is a bottleneck:

                  CPU transaction time: 4 jobs * 0.025 = 0.1
                  Disk transaction time: 3 jobs * 0.050 = 0.150

Use of a device (U) = Completion rate (X) * Mean service time (S). Try setting disk use (U) to 100%:

                      1 (U) = I/O completion rate (X) * 0.050 seconds (S)
                  I/O completion rate - 1 / 0.050 seconds = 20 per second

Twenty per second is the peak disk job completion rate (not the system transaction rate). With the disk use (U) set to 100%, let's work back to the CPU utilization. Noting visits to disk must equal 75% of visits to CPU:

                        20 disk jobs/seconds = 0.75 * CPU jobs/second
             CPU jobs/seconds = 20 jobs/second divided by 0.75 = 26.6 jobs /second

                    CPU use = CPU completion rate * CPU mean service time
                                       (U = X * S)

                   CPU use = 26.6/second * 0.025 seconds = 0.66 = 66 percent

Jobs leaving the system must equal 25 percent of jobs leaving the CPU:

                       System jobs/second = 0.025 * CPU jobs/second
             System transaction rate = 0.25 * 26.6/second = 6.6 transactions/second

EXAMPLE 4

Earlier, a model was shown where one disk, with one program copy, used 40% of CPU and 60% of disk, supporting up to 21 operators. With one disk and multiple program copies, disk use gets close to 100% and CPU use increases to around 66 percent. How many terminals will the multiple program scenario support? A closed-loop terminal capacity model is shown next:

                                    TERMINAL CAPACITY MODEL

                              Terminal one *******
                             .<------------*     *<------------. 
                            /              *******              \
                           /                                     \
                          /                *******                \
                         /<----------------*     *<----------------\
                 .<-----<                  *******                  |<----. 
                 |       \                    .                    /      |
                 |        \            THE    .  OPERATORS        /       |
                 |         \                  .                  /        |
                 |          \  Terminal N  *******              /         |
                 |           .<------------*     *<------------'          |
                 |                         *******                        |
                 |                      N = number of terminals,          |
                 |                          to be determined              |
                 |                                                        |
                 |                        *********                       |
                 |                  -----*         *                      |
                 `---------------->      * System  * -------------------->'
                                    -----*         *
                                          *********

N = Number of terminals to be determined
R = Mean system response time, estimated five seconds
Z = Mean think time
X = System throughput rate, 6.6 transactions/second, one disk, multi-copy

Assuming job arrivals equal job completions, response time R = (N / X) - Z. Also, number of terminals N = (R + Z) * X, and X = N / (R + Z).

               Terminal capacity, N = (R + Z) * X
                                    = (5 + 5) * 6.6, or 10 * 6.6 = 66 terminals

EXAMPLE 5

The 66 terminal limit does not meet goals, and the system is limited by maximum use of the single disk drive. Adding a disk drive should increase system throughput. The account file is placed on Disk A, the log file on Disk B. Visit ratios are recomputed for two disks. From CPU, two of four visits go to disk A, one of four to Disk B, and one leaves the system.

                                          TWO DISK MODEL

                                                                     Transactions
      Transactions        *********     25 percent of CPU vists          out
           in       -----*         * ----------------->----------------------------->
      ------------>      *         *
                         *   CPU   * ------.
              .--->      *         *        \  50 percent of
             /      -----*         * --- .   \   CPU visits
             |            *********        \  \                  *********
             ^           Mean service       \  \           -----*         *
             |          time .025 sec.       \  \               *  DISK   *
             |                                \  `------->      *    A    * ---->.
             |                                 \                *  accts. *      |
             ^                                  \          -----*         *      |
             |                                   \               *********       |
             |                                    \                              |
             |                    25 percent of    \             *********       |
             |                      CPU visits      \      -----*         *      |
             |                                       \          *  DISK   *      |
             |                                        `--->     *    B    *      |
             |                                                  *   log   *      |
             |                                             -----*         *      |
             |                                                   *********       |
             |                                                                   |
             |               100 percent of disk visits                          V
              `<-------------------------<------------------------<--------------'

Disk A would appear to be the bottleneck here. Working back from disk A try setting its' utilization to 100 percent and apply formulas, taking into account mean service time at disk A:

                              1 = X * 0.050, X = 20 per second at A
                                           (U = X * S)

Arrivals equaling completions, disk A throughput must equal 50 percent of CPU throughput:

                               20/second = 0.50 * CPU throughput
                 CPU throughput = (20/second) / 0.050 - 40 per second at CPU
                 Disk A use = (40/second * 0.025) - 1, max 100%, theoretically

System throughput must equal 25 percent of CPU throughput:

                  System throughput = (0.25 * 40) / second = 10 transactions/second

Next, determine the terminal capacity of the two-disk system:
N = Number of terminals to be determined
R = Mean system response time, estimated five seconds
Z = Mean think time
X = System throughput rate, 6.6 transactions/second, one disk, multi-copy

                      N = (R + Z) * X
                      N = (5 + 5) * 10, = 10 * 10 = 100 terminals


                                  SUMMARY OF RESULTS

                                            TRANSACTION      UTILIZATION    TERMINALS
      CONFIGURATION (All single CPU)         RATE LIMIT      CPU    DISK    SUPPORTED

  One program, one operator. one disk       0.19/second      0.01  0.015         1
                                            11.4/minute

  One program, multi-operator, one disk       4/second =     0.4     0.6        40
                                            240/minute

  Multi-program, multi-operator, one disk   6.6/second =     0.66    1.0        66
                                            396/minute

  Multi-program, multi-operator, two disks   10/second =     1.0   A 1.0       100
                                            600/minute             B 0.5

EXERCISE

The new LEGEND CPU is twice as fast as the MYTHICAL. Disks for the LEGEND are improved to provide 40 ms. mean access time. Three disks are used as follows: Log file assigned to one disk, even numbered accounts are on a second disk, odd numbered accounts are on a third disk. Develop models to predict transaction rate, utilizations, and terminals supported.

CAUTIONS AND NOTES

The material presented is simple and theoretical. Datacomm line times, varieties of file organization, disk transfer rates, and other factors were ignored. The models can help predict theoretical limits, that can't be met, but which can't be exceeded. For example, the models given show hypothetical 100% use of devices, which does not happen in real life.

The formulas require determining "correct comparable units of measurement"; Numbers must be converted to comparable units. The models generally require "job flow balance": job arrivals equal job completions for all devices and the system. Where feasible, use measurements on real systems to check model assumptions.

Two criticisms of computer network queuing models: 1) I/O overlap is not adequately addressed, and 2) using mean service times ignores the fact that service times are not homogeneous. Nevertheless, the models can provide useful information. These ideas are addressed in the articles mentioned next. Finally, queuing network modeling software is available from schools and commercial sources, to simplify the modeling task.

BIBLIOGRAPHY

BUZEN, J.P., and DENNING, P.J. "The Operational Analysis of Queuing Network Models" ACM Computing Surveys, Vol. 10, No. 3 (Sep. 1978, pp. 225-261)

ANDERSON, G.E. "The Coordinated use of Five Performance Evaluation Methodologies" Communications of the ACM, Vol 27, No. 2 (Feb. 1984, pp. 119-125)

queueing theory	Raincomplex's State Transition Graph	data transfer rate	disc vs. disk
millisecond	Myth	queue	online
multitasking	CPU	performance	She always was all tea and oranges that came all the way from China
Sleeping barber problem	multitask	backstage	Offset
Log	Legend	mythical	disk
I/O	Network

Category: