So there I was, sitting in the library, and I thought to myself, why not just change the program counter register instead of using a JMP instruction?

The program counter holds the memory address of the instruction the CPU is currently working on. The CPU uses it to fetch the next instruction, and it gets incremented after each fetch (this is a simplified view of how it works - pipelines, superscalar designs, etc. complicate things). Programs don't always progress in a linear fashion like this, though. In order to implement a real computer, you need conditional branching instructions, which will either jump to another section of code or fall through to the next instruction, depending on a certain condition.

Well, given a CPU that can do basic arithmetic, it's easy enough to replace the functionality of "jump" with equivalent code that changes the program counter (EIP/IP - instruction pointer - in Intel-ese). A simplified CPU cycle might go like this:

  • fetch instruction from memory pointed to by PC
  • increment PC
  • execute the instruction; a "jump" is just a write to PC from a register
  • repeat from the top
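
The cycle above can be sketched as a toy simulator. This is only an illustration under loud assumptions: the four-register machine, the tuple instruction format, and the op names `li`, `mov`, `add`, and `halt` are all made up here, and PC is assumed to be plain register 0. It says nothing about how real hardware would implement this.

```python
PC = 0  # assumption: the program counter is simply register 0

def run(program, max_steps=100):
    """Run a toy program; each instruction is a tuple (op, dst, src)."""
    regs = [0, 0, 0, 0]                    # regs[PC] is the program counter
    for _ in range(max_steps):             # bounded so a bad program cannot spin forever
        op, dst, src = program[regs[PC]]   # fetch from the address in PC
        regs[PC] += 1                      # increment PC
        if op == "halt":
            break
        elif op == "li":                   # load immediate: dst <- constant
            regs[dst] = src
        elif op == "mov":                  # register copy; with dst == PC this is a jump
            regs[dst] = regs[src]
        elif op == "add":                  # dst <- dst + src; with dst == PC this is a relative branch
            regs[dst] += regs[src]
        else:
            raise ValueError(f"unknown op: {op}")
    return regs

program = [
    ("li", 1, 3),      # 0: r1 <- 3 (the jump target)
    ("mov", PC, 1),    # 1: PC <- r1, i.e. "jump to 3" with no JMP opcode
    ("li", 2, 99),     # 2: skipped by the jump
    ("li", 3, 7),      # 3: r3 <- 7
    ("halt", 0, 0),    # 4: stop
]
regs = run(program)
```

After the run, r2 is still 0 because instruction 2 was never fetched, while r3 holds 7. Note that the increment happens before the write to PC, so the value moved into PC is the address of the next instruction to fetch.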

So why would you want to do this? Well, it seems that this cuts down on instruction set size (allowing very RISCy instruction sets with tiny instruction lengths, perhaps 20 bits as in Chuck Moore's F21 CPU). I have a feeling it might make pipelining simpler, but on the other hand, it's not all that much different from JMP, so maybe not. This would probably only work for very RISCy designs. Modifying the program counter directly might mess up a lot of things, and bad stuff could happen if an interrupt comes in while PC is being set. Or maybe not; it's just one instruction, after all.

Anyway, in the interest of approaching ultimate RISC, this is one more instruction it might be possible to do without.


Magenta: my excuse is that it's an academic argument ;-)

I hadn't thought that it would slow things down to make PC a regular register. Okay, on a conventional CPU, but how about a stack machine? It probably shows that I don't know what I'm talking about, but on the other hand, stack machines seem to have been pretty much abandoned for the CISCy RISCs that we have today.

Everything I've read about stack machines seems to say something along the lines of "this is a promising design that's fast and simple, but it's been abandoned because it makes life a little difficult for programmers."

(To elaborate - stack machines get acceptable performance without intricate pipelining tricks. They get to have very short instructions that can be fetched as a group, which gets a similar effect. That's why I brought them up.)

Uhm.

Sorry to burst your bubble, but on probably 99% of the instruction sets out there... most RISCs included...

PC/IP isn't a normal register.

And, in fact, JMP xx is the "load $PC xx" instruction. And BRA xx is the "add $PC xx" instruction. Any assembler which allows you to do these things would probably just emit JMP/BRA instructions (or their equivalents) in the instruction stream.

Also, your little event loop completely disregards all of the things which make CPUs fast these days: prefetch caches, pipelines, instruction scheduling/reordering, dynamic recompilation, and so forth. (Believe it or not, most CPUs these days, RISC and CISC alike, internally recompile stuff into a completely different representation; the Athlon and Crusoe, for example, recompile x86 CISC into native VLIW formats, though that's a generalization in the case of the Athlon.)

And there's a very good reason the PC isn't a normal register. Even single-cycle RISC and VLIW setups have a state machine inside, with some very precarious timing with bus lines and clock signals and so forth.

To make PC a normal register would be a huge performance hit, and for what gain? So you can multiply it or something? When would that even be a conceivable operation except in the most esoteric of cases?

By your own admission, your view was extremely oversimplified. It was also extremely naive, considering that jump/branch instructions are simply load/add instructions specific to the PC which allow the CPU (as a whole) to work out the whole juggling act of pipelines and scheduling and the like.


Sageran: stack machines suck. There have never been any clear examples of a stack machine being even remotely as efficient as the most basic RISC, simply because either you have a very small stack in the CPU (see the 8087 for why that sucks) or you keep your stack in some external memory, which will be slow, just like if you had - get this - a normal braindead accumulator architecture. And we all know how fast the 6502 is.

The argument that "stack machines are better than pipelined machines because you don't have to pipeline them" is total bull. Stack machines can't be efficiently pipelined (since everything inherently causes read/write hazards), but that doesn't make them inherently faster! That's like saying "A bicycle is faster than a Ferrari because you can't put in a faster engine." There are many real-world examples of stack-based instruction sets being pipelined, yes (see JIT Java compilers as executed on the host CPU, or the Athlon's implementation of the 8087), but internally the CPU is certainly not stack-based. As for "fetching a whole block of instructions at once," that's what's known as a "prefetch cache," and it's not exactly unique to stack-based machines. :)


pokey: Yes, I know the PC is a register. But it's not worth the overhead to make it addressable as a normal register - that is, it's a special register. Yes, the circuitry for storing the value is the same, but that doesn't mean the path leading to the circuitry is. Also, I was rambling about JMP being load (MOV) and BRA (by which I meant - even if I didn't say - all the conditional branch operations, not just BRanch Always) being add (ADD), only in generic RISCy terms, so one would think I already said what you did. ;) Or were you replying to Sageran?
pokey: Oops, sorry. :)

It's my understanding that most architectures use offsets in branch instructions, as opposed to jumps. In the Motorola HC11 ISA the BRA and B?? (conditional branch) instructions add an 8-bit signed offset to the program counter as opposed to JMP, which puts a 16-bit word into PC. This is effectively what you're talking about, I think.
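
To make the branch-versus-jump distinction concrete, here is a small sketch of the two address calculations. The function names are made up for illustration, and it assumes the usual convention that the 8-bit offset is added to the PC after the two-byte branch instruction has been fetched, with results wrapping at 16 bits.

```python
def sign_extend8(byte):
    """Interpret the low 8 bits of a value as a signed offset in -128..127."""
    byte &= 0xFF
    return byte - 0x100 if byte & 0x80 else byte

def branch_target(pc_next, offset_byte):
    """Relative branch: PC <- PC + signed 8-bit offset (16-bit wraparound).

    pc_next is assumed to be the address of the instruction following the branch.
    """
    return (pc_next + sign_extend8(offset_byte)) & 0xFFFF

def jump_target(addr16):
    """Absolute jump: PC <- 16-bit address taken straight from the instruction."""
    return addr16 & 0xFFFF

# A branch at 0x1000 occupies two bytes, so PC is 0x1002 once it is fetched.
back_to_self = branch_target(0x1002, 0xFE)   # offset -2
forward = branch_target(0x1002, 0x10)        # offset +16
```

So a branch is an add to PC and a jump is a load of PC, which is the same split as "add $PC xx" versus "load $PC xx" discussed earlier in the thread.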

As a side note, PC is always a register. You're just required to use special instructions to write it. Also, even if you could do what you describe (result of an ALU op -> PC) you shouldn't have to worry about interrupts. Show me someone who designed a CPU with non-atomic instructions and I'll gladly kick their teeth in!


Magenta: This was a reply to Sageran. It's not the first time that my slow nodeforging skills have confused others!
