OpenMP is an API for parallel programming based on a shared memory model. Basically it is a commonly used and agreed-upon method for inserting compiler directives into C, C++, and Fortran code so that an OpenMP-aware compiler can run serial code on multiple threads of a single shared-memory machine.

History
Pioneered by SGI, yet developed with cooperation from other parallel programming vendors and researchers, OpenMP was adopted as an industry standard for shared memory parallel programming in 1997. Earlier languages and interfaces attempting to provide parallelism, such as Intel's iPSC, CM Fortran, C*, High Performance Fortran (HPF), and X3H5, did not have the flexibility or power OpenMP provides. A committee was formed to create a fully functional, industry-standard, powerful and flexible language. That group became the OpenMP Architecture Review Board (ARB), and it developed OpenMP building on the earlier X3H5 work. The OpenMP specification, as well as more information, can be found at http://www.OpenMP.org.

Format
OpenMP consists of compiler directives and a runtime library that offer a full assortment of parallel options to the programmer. The idea is to take existing serial code and, with relatively little effort, tell the compiler how it should be run in parallel.

Execution usually proceeds as follows: serial code initializes the program and sets up the calculation, a main calculation or loop is parallelized, and more serial code collects and distributes the results.

 | <-master thread executes serial setup code.
 |   
 |   Serial
 V
---------  <-parallel directive is encountered,
| | | | |     creates slave threads.
| | | | | 
| | | | | Parallel
V V V V V
---------  <- Implicit Barrier waits for all to finish
 |
 |   Serial
 |
 V  <- master thread executes remaining serial code.
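
A minimal C sketch of this fork-join pattern (an illustrative example, assuming an OpenMP-aware compiler such as gcc with the -fopenmp flag) might look like:

/* fork-join sketch: serial setup, a parallel region, serial wrap-up */
#include <stdio.h>
#include <omp.h>

int main(void) {
	printf("serial setup: master thread only\n");

#pragma omp parallel
	{
		/* every thread in the team (master + slaves) runs this block */
		printf("hello from thread %d\n", omp_get_thread_num());
	} /* implicit barrier: all threads finish before execution continues */

	printf("serial wrap-up: master thread only again\n");
	return 0;
}
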
Common Compiler Directives
These compiler directives are preceded by "!$omp" (Fortran) or "#pragma omp" (C, C++) at the beginning of a line. They then express what type of parallelism is allowed within the region following the directive. Common compiler directives include (Fortran form shown; a short C sketch follows the list):
  • !$omp parallel
    structured block to be run in parallel
    !$omp end parallel
  • !$omp parallel do
    do loop to be run in parallel
    !$omp end parallel do
  • !$omp critical
    block to be run by one thread at a time
    !$omp end critical
  • !$omp single
    block to be run by one thread only
    !$omp end single
  • !$omp master
    block to be run by the master thread only
    !$omp end master
  • !$omp flush (list)
    specifies a required consistent view of memory at this point
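
The directives above are shown in their Fortran form; in C and C++ the directive applies to the statement or { ... } block that follows it, so no matching "end" directive is needed. A rough C sketch using a few of these directives (illustrative only) might be:

/* single, master, and critical inside one parallel region */
#include <stdio.h>
#include <omp.h>

int main(void) {
	int total = 0;

#pragma omp parallel
	{
#pragma omp single
		printf("single: executed by exactly one thread\n");

#pragma omp master
		printf("master: executed by the master thread only\n");

#pragma omp critical
		total += 1; /* one thread at a time updates the shared counter */
	}

	printf("total = %d\n", total);
	return 0;
}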

Example Code
Here is a brief C OpenMP code example of a common way to parallelize matrix multiplication: break each matrix up into smaller submatrices, multiply the appropriate pairs of submatrices (one from the corresponding block row, one from the corresponding block column), and sum those partial results.

/* Written by Oeq1st1 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>
#define SUB_N 8 /*submatrix (block) size; the matrix dimension must be a multiple of this*/

/* Prints an array. Call it with the size of the array N, and what
 * array you want printed.  */
void print_array(int n, double **X);

/* n is the matrix dimension (n x n); multiply A*B -> student */
void do_submatrix_multiplication(int n, double **A, double **B, double **student) {
	int i, j, k, x, y, z, n_prime = n/SUB_N;

	if(n % SUB_N != 0){
		printf("ERROR: Matrix must be an even multiple of %d\n",SUB_N);
		return;
	}

/*example: assume a 64x64 matrix broken into 8x8 submatrices*/
/*iterate over the outer blocks*/
#pragma omp parallel for private(i,j,k,x,y,z)
	for (x=0 ; x<n_prime ; x++ ){ /*0 to 7*/
	  for (y=0 ; y<n_prime ; y++){
	    for (z=0 ; z<n_prime ; z++ ){

		  /*row by column matrix multiplication*/
		  for (i=x*SUB_N ; i < (x+1)*SUB_N ; i++ ){ /*0-7, then 8-15..etc */
		    for (j=y*SUB_N ; j < (y+1)*SUB_N ; j++){
		      for (k=z*SUB_N ; k < (z+1)*SUB_N ; k++ ){

			student[i][j] += A[i][k] * B[k][j];
			/*accumulate results into student*/

		      }
		    }
		  }    


	    }
	  }
	}

}
The parallel for compiler directive issues iterations of the loop to different threads, statically or dynamically depending on the schedule, to be run in parallel. The private clause ensures that each thread has its own copy of the variables listed. It is extremely important to analyze data dependencies in order to locate any race conditions that may exist in a parallel for loop; not every for loop can be correctly parallelized as above.
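
For example, summing into a single shared variable inside a parallel for is a classic race condition. A minimal sketch of the problem and one standard fix, the reduction clause (illustrative code, not part of the matrix example above):

/* every thread adds into sum; without the reduction clause updates
 * from different threads could be lost */
#include <stdio.h>
#include <omp.h>

int main(void) {
	double sum = 0.0;
	int i;

	/* reduction(+:sum) gives each thread a private partial sum and
	 * combines the partial sums safely after the loop */
#pragma omp parallel for private(i) reduction(+:sum)
	for (i = 1; i <= 1000; i++)
		sum += 1.0 / i;

	printf("harmonic sum = %f\n", sum);
	return 0;
}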

There are two functions in omp.h to retrieve the number of threads running in parallel and each thread's assigned number. They are:

omp_get_num_threads();
omp_get_thread_num();
These return the total number of threads and this particular thread's assigned number, respectively, and are often very useful.
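
A short sketch of how these two calls are typically used (note that omp_get_num_threads() only reports the full team size when called from inside a parallel region):

#include <stdio.h>
#include <omp.h>

int main(void) {
#pragma omp parallel
	{
		int id = omp_get_thread_num();        /* this thread's number, 0..N-1 */
		int nthreads = omp_get_num_threads(); /* N, the size of the team */
		printf("thread %d of %d\n", id, nthreads);
	}
	return 0;
}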


This is a brief explanation of how OpenMP works. For more thorough information visit www.openmp.org or try "Parallel Programming in OpenMP" by Rohit Chandra, Leonardo Dagum, Dave Kohr, Dror Maydan, Jeff McDonald, and Ramesh Menon.
