OpenMP is a parallel programming API based on a shared memory model. Basically, it is a commonly used and agreed-upon method for inserting compiler directives into C, C++, and Fortran code to enable a supporting compiler to run serial code on multiple threads, all of which share a single address space on one machine.

Pioneered by SGI, yet developed with cooperation from other parallel programming vendors and researchers, OpenMP was adopted as an industry standard for shared memory parallel programming in 1997. Earlier languages attempting to provide parallelism did not have the flexibility or power OpenMP provides. Such efforts include Intel's iPSC, CM Fortran, C*, High Performance Fortran (HPF), and X3H5. A committee was formed to create a fully functional, industry standard, powerful and flexible language. That group became the OpenMP Architectural Review Board (ARB), and they developed OpenMP from X3H5. The OpenMP specification, as well as more information, can be found on the official OpenMP website.

OpenMP consists of compiler directives and runtime library routines that offer a full assortment of parallel options to a programmer. The entire idea is to be able to take serial code and fairly easily explain to the compiler how it should be run in parallel.

Execution usually proceeds as follows: serial code to initialize the program and set up the calculation, then parallelization of a main calculation or loop, and more serial code to collect and distribute the results.

 | <-master thread executes serial setup code.
 |   Serial
---------  <-parallel directive is encountered,
| | | | |     creates slave threads.
| | | | | 
| | | | | Parallel
---------  <- Implicit Barrier waits for all to finish
 |   Serial
 V  <- master thread executes remaining serial code.
Common Compiler Directives
These compiler directives begin with a "!$omp" sentinel (Fortran) or a "#pragma omp" (C, C++) at the start of a line. They then express what type of parallelism is allowed within the region following the directive. Common compiler directives include:
  • !$omp parallel
    structured block to be run in parallel
    !$omp end parallel
  • !$omp parallel do
    do loop to be run in parallel
    !$omp end parallel do
  • !$omp critical
    block to be run by one thread at a time
    !$omp end critical
  • !$omp single
    block to be run by one thread only
    !$omp end single
  • !$omp master
    block to be run by the master thread only
    !$omp end master
  • !$omp flush (list)
    specifies a required consistent view of memory at this point

Example Code
Here is a brief C OpenMP code example of a common way to parallelize matrix multiplication: break each matrix up into smaller submatrices, multiply pairs of submatrices (one from the appropriate row of blocks, one from the appropriate column), and accumulate the sums of those results.

/* Written by Oeq1st1 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>
#define SUB_N 8 /* submatrix (block) size; the matrix size must be a multiple of this */

/* Prints an array. Call it with the size of the array N, and what
 * array you want printed. */
void print_array(int n, double **X);

/* n is the matrix dimension; multiply A*B and accumulate into student */
void do_submatrix_multiplication(int n, double **A, double **B, double **student)
{
	int i, j, k, x, y, z, n_prime = n/SUB_N;

	if (n % SUB_N != 0) {
		printf("ERROR: Matrix must be an even multiple of %d\n", SUB_N);
		return;
	}

	/* example: assume a 64x64 matrix broken into 8x8 matrices */
	/* iterate along the outer blocks; each thread gets whole rows
	 * of blocks, so no two threads write the same element of student */
#pragma omp parallel for private(i,j,k,x,y,z)
	for (x = 0; x < n_prime; x++) {		/* 0 to 7 */
	  for (y = 0; y < n_prime; y++) {
	    for (z = 0; z < n_prime; z++) {

	      /* row by column matrix multiplication */
	      for (i = x*SUB_N; i < (x+1)*SUB_N; i++) {	/* 0-7, then 8-15, etc. */
	        for (j = y*SUB_N; j < (y+1)*SUB_N; j++) {
	          for (k = z*SUB_N; k < (z+1)*SUB_N; k++) {
	            /* accumulate results into student */
	            student[i][j] += A[i][k] * B[k][j];
	          }
	        }
	      }
	    }
	  }
	}
}
The parallel for compiler directive issues iterations of the loop, statically or dynamically, to different threads to be run in parallel. The private clause ensures that each thread has its own copy of the variables listed. It is extremely important to analyze data dependencies in order to locate any race conditions that may exist in a parallel for statement. Not every for loop can be correctly parallelized as above.

There are two functions in omp.h for retrieving the number of threads running in parallel and each thread's assigned number. They are:

  • omp_get_num_threads()
  • omp_get_thread_num()

These return the number of threads in the current team and the calling thread's assigned number, respectively, and are often very useful.

This is a brief explanation of how OpenMP works. For more thorough information, see the OpenMP specification or try "Parallel Programming in OpenMP" by Rohit Chandra, Leonardo Dagum, Dave Kohr, Dror Maydan, Jeff McDonald, and Ramesh Menon.
