Disclaimer: while the majority of the 6502 writeups on E2 are biased toward the NES, I have no experience with the NES. My experience with the 6502 comes from the Apple family of computers, primarily the Apple ][+.
It is difficult to talk about the addressing modes of the 6502 processor without mentioning one of its strongest features: the zero page and the way it is addressed.
The memory in an Apple computer was broken up into pages of 256 bytes each. Particular pages had particular uses. For example, page 1 (memory from $0100 - $01FF) was the stack, page 2 (memory from $0200 - $02FF) was the keyboard buffer, and page 3 was used by the monitor (some may recall that the way to get out of the monitor (the '*' prompt) without rebooting the system was 3D0 G, which ran the routine located at $03D0). Hi-res video memory was located from page $20 to page $3F, and again from page $40 to page $5F. ROMs were located from page $C8 to page $FF (the last page) (some may recall the mini-assembler in ROM, which started at $F666). This is all well and good; however, likely the most important page was page 0.
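To make the page arithmetic concrete, here is a minimal Python sketch. The function names (page_of, offset_in_page, page_range) are my own, invented for illustration. The only thing it relies on is that a page number is the high byte of a 16-bit address.

```python
def page_of(address):
    """Return the page number (high byte) of a 16-bit address."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("6502 addresses are 16 bits wide")
    return address >> 8


def offset_in_page(address):
    """Return the position of the address within its 256-byte page (low byte)."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("6502 addresses are 16 bits wide")
    return address & 0xFF


def page_range(page):
    """Return the (first, last) address covered by a page number."""
    if not 0 <= page <= 0xFF:
        raise ValueError("there are only 256 pages")
    first = page << 8
    return first, first | 0xFF


if __name__ == "__main__":
    # $03D0 lives in page 3; the stack (page 1) spans $0100 - $01FF.
    print(hex(page_of(0x03D0)), [hex(a) for a in page_range(1)])
```

Any address whose page_of is 0 fits in a single byte, which is the whole trick behind the zero page.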
The Zero Page (called the "direct page" in newer members of the 6502 family, and abbreviated as 'dp' in many opcode nodes such as 6502 addressing modes), like all the other pages of memory, held only 256 bytes. However, memory in this page could be addressed using only one byte. "Whoop de do" you say... but this was a very big deal. Consider the LDA instruction, which loads a byte into the Accumulator.
$AD $44 $00 -- LDA Absolute: $AD
$A5 $44 -- LDA Zero Page: $A5
These two commands are equivalent in that they do the same thing: each loads the byte from $0044 into the accumulator. However, the zero page form is one byte shorter and takes one clock cycle less (3 cycles rather than 4), which is 25% less time. Granted, this speedup isn't quite as impressive with things such as INC (INCrement memory), where you go from 6 cycles to 5 cycles.
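As a rough illustration of the size and timing difference, here is a small Python sketch that picks between the two encodings the way an assembler might. The OPCODES table and the encode function are my own names, not part of any real assembler. The opcode bytes and cycle counts are the standard 6502 values for the zero page and absolute forms of these four instructions.

```python
# mnemonic -> (zero page opcode, zero page cycles, absolute opcode, absolute cycles)
OPCODES = {
    "LDA": (0xA5, 3, 0xAD, 4),
    "LDX": (0xA6, 3, 0xAE, 4),
    "LDY": (0xA4, 3, 0xAC, 4),
    "INC": (0xE6, 5, 0xEE, 6),
}


def encode(mnemonic, address, force_absolute=False):
    """Return (machine code bytes, cycle count) for a memory operand.

    The zero page form is used whenever the address fits in one byte,
    unless force_absolute asks for the longer encoding.
    """
    if not 0 <= address <= 0xFFFF:
        raise ValueError("6502 addresses are 16 bits wide")
    zp_op, zp_cycles, abs_op, abs_cycles = OPCODES[mnemonic]
    if address <= 0xFF and not force_absolute:
        return bytes([zp_op, address]), zp_cycles
    # Absolute operands are stored low byte first.
    return bytes([abs_op, address & 0xFF, address >> 8]), abs_cycles


if __name__ == "__main__":
    for forced in (True, False):
        code, cycles = encode("LDA", 0x0044, force_absolute=forced)
        print(" ".join("$%02X" % b for b in code), "-", cycles, "cycles")
```

Run as a script, it prints the two LDA encodings from above, along with their cycle counts.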
So, what do you do with the Zero Page? The 6502 had only a few registers:
- Accumulator - 1 byte
- X Register - 1 byte
- Y Register - 1 byte
- Stack pointer - 1 byte (the high byte of the stack address is fixed at $01)
- Program Counter - 2 bytes
- Flags - 1 byte
Of these, the only ones that a programmer could really use are the Accumulator and the X and Y registers. The X and Y registers were mostly used as offsets into memory and couldn't be operated on the way the Accumulator could: you couldn't AND the Accumulator with the X register, or do a shift left (ASL) on the Y register (you could do that to the Accumulator). Compare that with the number of programmable registers in a modern processor (I recall 32 on the MIPS system). So, the Zero Page was used to store important values that you would use again and again, such as the index into an array: just do an LDX or LDY off of the Zero Page. So now you've got 256 bytes of memory that are quick(er) to access than the rest of memory. Yes, it's slower than the CPU registers, but then, everything is.
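To get a feel for how the savings add up when a value is used again and again, here is a back-of-the-envelope Python sketch. The names (CYCLES, cycles_for, cycles_saved) and the loop body are hypothetical, chosen only for illustration. It counts just the memory-access instructions listed and ignores everything else in the loop.

```python
# mnemonic -> (zero page cycles, absolute cycles)
CYCLES = {"LDA": (3, 4), "LDX": (3, 4), "LDY": (3, 4), "INC": (5, 6)}


def cycles_for(instructions, zero_page):
    """Total cycles for the listed instructions in one addressing mode."""
    column = 0 if zero_page else 1
    return sum(CYCLES[name][column] for name in instructions)


def cycles_saved(instructions, iterations):
    """Cycles saved by keeping the operands in the zero page."""
    per_pass = cycles_for(instructions, False) - cycles_for(instructions, True)
    return per_pass * iterations


if __name__ == "__main__":
    # Hypothetical loop body: load an array index with LDX, then INC it in memory.
    body = ["LDX", "INC"]
    print(cycles_for(body, True), cycles_for(body, False), cycles_saved(body, 1000))
```

For this made-up loop body the cost is 8 cycles per pass in the zero page against 10 with absolute addressing, so 1000 passes save 2000 cycles.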
http://www.callapple.org/apple2/magazines/aar/assembler.html
http://sbprojects.fol.nl/sbasm/6502.htm
http://apple2history.org/history/ah02.html
http://www.6502.org/tutorials/6502opcodes.htm