The 16-bit version of the venerable 6502. Laid out entirely by hand by William Mensch (and with a few hardware bugs as a result - MVN takes 3 cycles longer per byte than it should), the 65816 has 3 registers, a 24 bit address space, and no waiting :-)
Used in the Apple IIGS and the Super Nintendo. Many SNES games had slowdown because only former C64 and Apple II coders knew how to make it sing.

A 6502 Programmer's Introduction to the 65816

by Brett Tabke, in Commodore World magazine Issue #16

After programming in 6502 language for over a decade, I was getting a bit BORED. One can only code the same routines with the same opcodes so many times before the nausea of repetition becomes overpowering. When I heard the news that CMD was building a cartridge based on a 20 MHz 65816 I was overjoyed. For years I've heard those with 65816 bases systems brag about its capabilities. To us old 6502 programmers, the opportunity to program the fabled 65816 is a new lease on life.

The 65816 is an 8-/16-bit register selectable upgrade to the 6502 series processor. With 24 bit addressing of up to 16 Megabytes of RAM, the powerful 65816 is a logical upgrade that leaves 6502 programmers feeling right at home. It is amazing how fast one can adapt to the new processor

Getting a Feel for the Modes
The 65816 may be operated in Native mode or 6502 Emulation mode. Emulation mode is a 100% 6502 compatible mode where the whole processor looks and feels like a vintage 6502. Native mode offers 8- or 16-bit user registers and full access to 24-bit addressing.

While in emulation mode, not only are all the 6502 opcodes present in their virgin form, but the new 65816 instructions are also available for usage. In fact, the first lesson to learn about programming the 65816 is that emulation mode is much more powerful than a stock 6502. The only true difference between emulation mode and our venerable C64's 6510 processor is that unimplemented opcodes will not produce the results expected on the former. Since all 256 of the potential opcodes are now implemented on the 65816, older C64 software that uses previously unimplemented opcodes will produce erratic results.

To select between emulation and native modes, a new phantom hidden emulation bit (E) was added to the status register. Shown in programming models hanging on top of the Carry bit, the emulation bit is only accessible by one instruction. The new instruction (XCE) exchanges the status of the Carry bit and Emulation bit. To move to emulation mode, set the carry and issue an XCE instruction. To move to native mode, clear the carry and issue the XCE instruction.

My, How Your Index Registers Have Grown!
While in native mode there are two new directly accessible bits present in the status register. The 65816 implements new hardware interrupt vectors which include a new hardware BRK vector in ROM; therefore, the old BRK bit of the status register is no longer needed. The BRK bit is replaced with the X bit to select either 8- or 16-bit index registers. The former empty bit 5 is now filled with the M bit to specify the accumulator and memory access as 8- or 16-bit. Two new instructions are used to clear or set bits within the status register. The SEP instruction sets bits, and REP clears bits. SEP and REP use a one byte immediate addressing mode operand to specify which bits are to be set or cleared. For example, to set the X bit for 8 bit user registers:
SEP #%00010000  ; set bit 4 for 8-bit index registers.  
Or to clear bit 4:
REP #%00010000  ; clear bit 4 for 16-bit index registers.  

When in 8 bit mode, the index registers perform their function in standard 6502 form. When status bit X is set to 0, both the X and Y index registers become 16 bits wide. With a 16-bit index register you can now reach out to a full 64K with the various indexed addressing modes. An absolute load to an index register in 16-bit mode will retrieve 2 bytes of memory-the one at the effective address and the one at the effective address plus one. Simple things like INX or DEY work on a full 16 bit, which means you no longer have to specify a memory location for various counters, and loops based on index counters can now be coded in a more efficient manner.

The formerly empty status register bit 5 is now referred to as bit M. M is used to specify an 8- or 16-bit wide acculmulator and memory accesses. When in 8 bit mode, (M=1), the high order 8 bits are still accessible by exchanging the low and high bytes with a XBA instruction-it is like having two acculmulators! However; when set for a full 16-bit wide accumulator, all math and accumulator oriented logical intructions operate on all 16 bits! If you add up the clock cycles and bytes required to perform a standard two byte addition, you can start to see the true power of 16-bit registers.

More Register Improvements
Zero Page has now been renamed to Direct Page. A new processor register D was added to allow Direct Page to be moved anywhere within the first 64K of memory. The direct page register is 16 bits wide, so you can now specify the start of direct page at any byte. Several old instructions now include direct page addressing as well. To move direct page, just push the new value onto the stack (16 bits) and then PLD to pull it into the direct page register. You may also transfer the value from the 16-bit accumulator to the direct page register with the TCD instruction. Direct page may also be moved while in emulation mode.

While in native mode, the stack pointer is a full 16 bits wide, which means the stack is no longer limited to just 256 bytes. It can be moved anywhere within the first 64K of memory (although while in emulation mode, the stack is located at page one). There are several new addressing modes that can use the stack pointer as a quasi-index register to access memory. Numerous new push and pull instructions allow you to manipulate the stack. A few of the more useful stack intructions useful to programmers, are the new instructions to push & pull index registers with PHX/PHY and PLX/PLY.

Two other new processors registers are the Program Bank Register (PBR) and the Data Bank Register (DBR). The Program Bank Register can be thought of as extending the program counter out to 24 bits. Although you can JSR and JMP to routines located in other RAM banks, individual routines on the 65816 still must run within a single bank of 64K-there's no automatic rollover from one bank of RAM to the next when executing successive instructions. In this sense, it may help to think of the 65816 processor as a marriage of Commodore's C128 Memory Management Unit (MMU) and an 'enhanced' 6502-a very similar concept.

The Data Bank Register is used to reach out to any address within the 16 megabyte address space of the 65816. When any of the addressing modes that specify a 16-bit address are used, the Data Bank byte is appended to the instruction address. This allows access to all 16 megabytes without having to resort to 24-bit addressing instruction, and helps enable code that can operate from any bank.

New Addressing Modes
There are new addressing modes on the 65816. Several new instructions are designed to help relocatable code that can execute at any address. The use of relocatable code on the 6502 was extremely limited. With 16 megabytes of address space, writing relocatable code increases the overall utility of the program. To write relocatable code, several new instructions use Program Counter Relative Long addressing. This allows relative branching within a 64K bank of RAM. There's also Stack Relative addressing, and a push instruction to place the program counter onto the stack, so that a code fragment can pull it back off and can instantly know its execution address.

Another new feature are two Block Move instructions, one for forward MVP and one for backward MVN. Simply load the 16-bit X register with the starting address, the Y index register with the ending address, the accumulator with the number of bytes to move, and issue the MVP or MVN instructions. MVN is for move negative, and MVP is for move positive, so that your moves don't overwrite themselves. Block Moves use two operand bytes: one for the source bank of 64K and one for the destination bank. Memory is moved at the rate of seven clock cycles per byte.

Several new addressing modes are used to access the full address space. A 65816 assembler would decode "long" addressing given this input:

LDA $0445F2   ; load byte from $45F2 of RAM bank 4  
LDA $03412F,x ; load byte from $412F of bank 3 plus x.  
Quite a few instructions have been given new addressing modes. How many times have you wanted to do this:
LDA ($12) ; load indirect without an offset.  
Or how about a table of routine addresses:
JSR ($1234,x)  ; jump to a subroutine via  indexed indirect addressing!  
Other fun new instructions:
TXY,TYX      Transfer directly between index registers  
BRA          Branch always regardless of status bits  
TSB          Test and set any bit of a byte  
TRB          Test and reset (clear) any bit of a byte  
INC A/DEC A  Increment or decrement the accumulator directly  
STZ          Store a zero to any byte  

Log in or register to write something here or to contact authors.