Here is a simpler solution that is also one-pass and needs at most n swaps. Start at index 0 and swap in the element that belongs there (the one at i+m moves to i). Repeat for each following index until i+m would run past the end. You've now moved all of the string into place except the last m elements. These are almost in place, but have been rotated by n % m (the remainder) positions. So if that remainder is nonzero, rotate them back recursively using the same algorithm.
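A quick way to see that intermediate state is to run only the first pass. This is just a small sketch; n = 10 and m = 3 are example values picked for illustration.

```python
# First pass only, for n = 10 and m = 3 (illustrative values).
a = list(range(10))
n, m = len(a), 3
for i in range(n - m):
    a[i], a[i + m] = a[i + m], a[i]   # a[i] now holds its final value
print(a)        # [3, 4, 5, 6, 7, 8, 9, 1, 2, 0]
print(a[-m:])   # [1, 2, 0], i.e. [0, 1, 2] rotated left by n % m == 1
```

The first n - m slots are final; only the tail still needs a rotation by m - n % m.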

I guess that if m << n this version should have fairly nice cache properties, be easy to unroll, and work well enough with tail-recursion optimization, since the recursive call is the last thing the function does.

In Python:

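def swap(a, i, j):
    a[i], a[j] = a[j], a[i]

def rotate(a, m, start=0):
    # Rotate a[start:] left by m positions, in place.
    n = len(a) - start
    if n == 0:
        return  # nothing to rotate; also avoids m % 0
    m = m % n
    if m != 0:
        # Pull each element into its final slot from m places to the right.
        for i in range(start, start + n - m):
            swap(a, i, i + m)
        # The last m elements are off by n % m; fix them with the same algorithm.
        rotate(a, m - n % m, start + n - m)

a = list(range(10))
rotate(a, 3)
print(a)  # [3, 4, 5, 6, 7, 8, 9, 0, 1, 2]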
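Because the recursive call is the last statement, the recursion can also be written as a loop, which matters in Python since it does not optimize tail calls. Below is a minimal sketch of that idea; the name `rotate_iter` and the returned swap count are my own additions for illustration, not part of the code above.

```python
def rotate_iter(a, m):
    """Rotate list a left by m positions in place; return the number of swaps."""
    start, swaps = 0, 0
    n = len(a)
    while n > 0:
        m %= n
        if m == 0:
            break
        # Pull each element into its final slot from m places to the right.
        for i in range(start, start + n - m):
            a[i], a[i + m] = a[i + m], a[i]
            swaps += 1
        # Only the last m elements remain; they are off by n % m positions.
        start, n, m = start + n - m, m, m - n % m
    return swaps

a = list(range(10))
print(rotate_iter(a, 3), a)  # 9 [3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
```

Every swap puts one element into its final position, so the count never exceeds n.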