Assembly is much more fun with RISC processors. For example, here is some MIPS assembly to do multiplication. Yes, I realise that MIPS assemblers provide a multiply pseudo-instruction, which uses the hardware multiplier and is thus much faster than this code. It was for a class, okay?

# multiply:  Multiply two signed 16-bit integers using Booth's
#            algorithm.  The multiplicand and multiplier should
#            be in $a0 and $a1; they should be no larger than
#            16 bits.  The 32-bit product is placed in $v0.
#

multiply:
        ## $v0 holds the product.  $a0 and $a1 hold a and b, the
        ## multiplicand and multiplier.  $t0 holds i, $t1 holds
        ## the bit most recently shifted off result (called
        ## `shifted').  $t2 holds a in the upper half, 0 in the
        ## lower half; for efficiency reasons, we compute this
        ## value once and save it, rather than computing it each
        ## time through the loop.  Finally, $t3 and $t4 are used
        ## as temporary registers.
        
        ## First set the low 16 bits of result to the multiplier
        move $v0, $a1              # $v0 (result)
        andi $v0, $v0, 0x0000ffff  # clear top 16 bits

        move $t1, $0            # $t1 = shifted
        sll $t2, $a0, 16        # shift a into $t2's high 16 bits

        move $t0, $0            # initialise i
loop:   slti $t3, $t0, 16       # $t3 = (i < 16)
        beq $t3, $0, end_l      # if (i >= 16) goto end_l

        andi $t3, $v0, 1        # $t3 = LSB of result

        ## Do arithmetic based on the shifted-off bit and the
        ## LSB of result.  We perform the operations:
        ##
        ## LSB shifted   action
        ## -----------------------------
        ##  0     0      none
        ##  0     1      add a to result's high 16 bits
        ##  1     0      subtract a from result's high 16 bits
        ##  1     1      none
        ##
        ## Note that we add a to the high 16 bits of result by
        ## adding $t2 (a << 16) to result.  Because $t2's low 16
        ## bits are zero, the low 16 bits of result are not
        ## affected.
        
        ## LSB > shifted:  bit pattern 10: subtract
        slt $t4, $t1, $t3       # $t4 = (shifted < lsb)
        beq $t4, $0, skip1      # if (shifted >= lsb) goto skip1
        sub $v0, $v0, $t2       # subtract a from high half of result
        j skip2                 # next test will always fail
skip1:
        ## LSB < shifted:  bit pattern 01: add
        slt $t4, $t3, $t1       # $t4 = (lsb < shifted)
        beq $t4, $0, skip2      # if (lsb >= shifted) goto skip2
        add $v0, $v0, $t2       # add a to high half of result
skip2:
        ## LSB = shifted:  bit pattern 00 or 11: do nothing

        # We now perform an arithmetic right shift of result by
        # one bit.  The lost bit is stored in shifted ($t1).

        andi $t1, $v0, 1        # shifted = low bit of result
        sra $v0, $v0, 1         # shift result by 1 bit

        # End of loop
        addi $t0, $t0, 1        # increment i
        j loop                  # repeat
end_l:
        
        jr $ra

The names `i', `a', `b', etc. in the comments come from the C version, which I do not include here.
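
For anyone who wants something in C to follow along with, here is a rough sketch reconstructed from the assembly above. It is not the original C version from the class. The names `booth_multiply` and `to_signed` are made up for illustration, and the variables are chosen to line up with the registers ($v0 = result, $t1 = shifted, $t2 = a_high, $t0 = i). It works on unsigned 32-bit values so that wraparound is well defined, and it does the arithmetic right shift by hand because C leaves `>>` on negative values up to the implementation.

```c
#include <stdint.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a signed value
   without relying on implementation-defined conversions. */
static int32_t to_signed(uint32_t u)
{
    return (u & 0x80000000u) ? -(int32_t)(~u) - 1 : (int32_t)u;
}

/* Sketch of Booth's algorithm for two signed 16-bit operands,
   mirroring the loop structure of the assembly listing. */
static int32_t booth_multiply(int16_t a, int16_t b)
{
    uint32_t result  = (uint16_t)b;                  /* low 16 bits = multiplier */
    uint32_t a_high  = (uint32_t)(uint16_t)a << 16;  /* a in the upper half      */
    uint32_t shifted = 0;                            /* bit most recently lost   */

    for (int i = 0; i < 16; i++) {
        uint32_t lsb = result & 1u;

        if (lsb > shifted)          /* bit pattern 10: subtract */
            result -= a_high;
        else if (lsb < shifted)     /* bit pattern 01: add      */
            result += a_high;
                                    /* 00 or 11: do nothing     */

        shifted = result & 1u;      /* remember the bit we are about to lose */

        /* arithmetic right shift by one: keep a copy of the sign bit */
        result = (result >> 1) | (result & 0x80000000u);
    }

    return to_signed(result);
}
```

Calling `booth_multiply(3, 5)` gives 15 and `booth_multiply(-3, 5)` gives -15. One caveat, as far as I can tell: a multiplicand of -32768 is not handled correctly, because negating it does not fit in the 16-bit upper half. The sketch returns the wrong sign in that case, and the `sub` in the assembly would presumably raise an overflow exception instead.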

The excessive documentation is due to the fascist instructor. He informed us that we had to comment every single line. This is why EE professors should not teach programming.