Planning to Write an Emulator


Emulator programming can be quite a fulfilling computer geek experience. The concept of creating a piece of software to represent accurately a tangible mess of chips and circuits can instill feelings of computer Goddery in the mind of a lowly code monkey. Knowing that your program will provide a platform for gaming, work, research – you name it – is a very satisfying feeling. Yes sir, emulator programming is worthwhile.

It’s also as frustrating as hell, and will take up weeks, months or years of your time, depending on when you decide to stop plowing away at a thankless task. If you have hair, be prepared to lose it. If you expect words of encouragement and thanks from users, forget it. Your mailbox will be filled with “where can i download a bunch of games 4 free?” and “why the hell doesn’t Big Beans Fighter Shankalankadank 3 work on ur emulator?”. Demands will be made of you by selfish members of the ‘net generation who believe that you take pleasure in creating free software for them and spending hours of your own time giving them tech support because they didn’t bother reading the short instruction blurb supplied with your program. Still…

This node will provide a starting point for the uninitiated so that they can try their hand at making an emulator of their own, or for the casual reader to read through and learn how emulators are made. A good knowledge of programming languages and computers would help, although it’s not necessary to grasp the basic concepts.

Types of Emulator

It’s important to know exactly what you’re creating before you start. Emulators come in various shapes and sizes. A recompilation emulator attempts to convert the machine code instructions from the emulated platform into an executable file for the host system ahead of time. This is good for speed, as no conversion is done while the program runs. However, it isn’t very practical, particularly for programs which modify their own code. An interpreting emulator decodes and executes instructions one at a time as it goes. This is slower than the recompilation emulator, but far more practical. A dynamic recompilation emulator combines the two methods: it translates blocks of code into host instructions while the program runs, and reuses the translated blocks when they are executed again.

The most common type of emulator is the interpreting emulator, and so this node will focus on that. The other two methods would be considered advanced techniques, which you may perhaps progress to yourself after you’ve got a bit of emulation experience under your belt.


You will need extensive documentation about your target platform. It is important to know exactly how each component in the system works, so that you can do an adequate job of implementing it in software. Usually this documentation isn’t hard to find – a programming manual or system handbook would do. These usually come with the computer itself in the case of older machines, and can be found archived on the Internet. Games consoles are a little harder as the user wasn’t intended to interface at a low level with the hardware, although most of the time enthusiasts have provided written accounts of the hardware, which can also be found ‘netwise. Newer systems are harder still, and may be made up of proprietary components with no documentation at all. In this case, an emulator programmer will have to reverse engineer and experiment with the hardware to figure out how it works. This is a good reason for not choosing a recent system for your first emulation project!

Programming Language

I can’t emphasize enough how important a strongly-typed language is in emulator programming. Don’t choose BASIC. Despite popular opinion, it is not an easier language to program in, particularly with a project like this. You really do need to be using a systems programming language (C/C++/Pascal/ASM), or a strongly typed interpreted language (Java/C#). You also need to be able to write fast code – this is why I would personally not use an interpreted language, and especially not BASIC.

“But BASIC is dead easy! Pre-schoolers use it!”. Yes, the keywords and syntax are easy to learn, but that doesn’t make it an easier language. Imagine, for example, you want to increment an unsigned 8-bit value and have it wrap around after reaching its maximum capacity. In BASIC:

value = value + 1
REM Check for overflow
if value > 255 then value = 0

In C:
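A minimal sketch of the C equivalent, assuming the value is held in an unsigned 8-bit type (`uint8_t` from `<stdint.h>`). The whole operation is the single statement `value++`; the wrapper function `bump` is a made-up name that only exists to make the sketch compilable.

```c
#include <stdint.h>

/* value is an unsigned 8-bit variable, so it can only hold 0..255 */
uint8_t bump(uint8_t value)
{
    value++;   /* 255 + 1 wraps to 0 automatically: no overflow check */
    return value;
}
```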


It may not seem like much more code, but it adds up.


You do of course want to write fast code, as you are intending to emulate at full speed (at least!). Little optimizations will play a big part in the speed of your entire program.


Bit Shifting

An integer value can be multiplied by 2 simply by shifting it 1 bit to the left. This tends to be faster than straightforward multiplication, although most modern compilers will make this substitution for you.

y = x<<1; //y = x multiplied by 2
y = x<<4; //y = x multiplied by 16
y = x>>2; //y = x divided by 4
y = (x<<6) + (x<<4); //y = x multiplied by 80 (brackets needed: + binds tighter than <<)

Inline Functions

An intensive part of the emulator, e.g. the CPU core, will probably call a function millions of times per second. A function is usually called by PUSHing the parameters to the stack, CALLing the function, POPping the parameters back off, and putting the return value in a register. Ouch, that’s a lot of overhead, and over time it’s overhead multiplied by a few billion. So, instead of “calling” the function, we can have the compiler insert the code of the function where it would normally be called. This has the disadvantage of increasing the size of the executable, but is certainly worth doing for small functions that are called very often. Knowing when to do this is the key – it’s a waste to inline a function which is only called once per frame. Also, be aware that compilers are picky about inlining, and have different ways of specifying an inline function. You’ll have to refer to the relevant documentation.
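As a rough illustration, C99 lets you request inlining with the `inline` keyword, and `static inline` is a common idiom for small helpers. The compiler treats this as a hint and may ignore it. The names below (`ram`, `read_byte`, `sum_bytes`) are invented for this sketch.

```c
#include <stdint.h>

static uint8_t ram[0x10000];   /* hypothetical 64KB memory area */

/* Small and called constantly: a good candidate for inlining. */
static inline uint8_t read_byte(uint16_t address)
{
    return ram[address];
}

/* If the call is inlined, the loop body becomes a plain array read
   instead of a function call on every iteration. */
unsigned sum_bytes(uint16_t start, uint16_t count)
{
    unsigned total = 0;
    for (uint16_t i = 0; i < count; i++)
        total += read_byte((uint16_t)(start + i));
    return total;
}
```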

There are lots of ways you can make things faster, and these are just suggestions. Remember to get things working before you make them optimal. Never optimize as you go – you will end up writing error-prone code.

von Neumann Architecture

Emulation is made possible by the fact that the common computer is based on the same model, devised by a group of scientists in the 1940s and documented by the famous mathematician John von Neumann, who controversially took most of the credit. This model dictates that data and CPU instructions share the same memory – at the most basic level the CPU fetches instructions from RAM and executes them, writing different values to elsewhere in RAM or performing output. Other components in a computer are essentially extensions to this idea, all tied together to interface with one another using basic methods. You can look at many components in a system as mini computers inside the main computer, driven by the system clock, coordinated by the CPU, and interfacing via I/O ports and RAM.

Writing the Emulator

Got everything ready? Got a fair knowledge of computers and how they work? Chosen a programming language, environment and platform? Ready to cancel all social engagements over the next month? Then let’s begin.

Program Body and Structure

You’ll want to formulate your program structure to meet your own needs, but a quick outline of the main loop of the program will help you set out more quickly. The basic structure is as follows:

//process CPU instructions until the next screen redraw
    //etc. for other components

Although this is very much simplified from what you are likely to actually write, it gives the general idea. The basic concept is to run the CPU until the entire frame is rendered, and then present it to the screen. After each CPU instruction, the individual system components are passed the number of clock cycles the instruction took, so that they can convert this into the amount of time passed and update themselves accordingly.
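A minimal sketch of that loop in C, under stated assumptions: the component functions (`cpu_step`, `video_update`, `sound_update`) are hypothetical stubs, and `CYCLES_PER_FRAME` is an invented figure rather than one taken from a real machine.

```c
#define CYCLES_PER_FRAME 70000   /* assumed value, not from any real system */

static unsigned long video_cycles, sound_cycles; /* time seen by each component */

/* Stub: execute one instruction and report how many clock cycles it took. */
int cpu_step(void) { return 4; }

/* Stubs: each component advances its own state by the cycles passed in. */
void video_update(int cycles) { video_cycles += (unsigned long)cycles; }
void sound_update(int cycles) { sound_cycles += (unsigned long)cycles; }

/* Run the machine for one frame; returns the cycles actually executed. */
int run_frame(void)
{
    int elapsed = 0;

    /* process CPU instructions until the next screen redraw */
    while (elapsed < CYCLES_PER_FRAME) {
        int cycles = cpu_step();
        video_update(cycles);
        sound_update(cycles);   /* etc. for other components */
        elapsed += cycles;
    }
    /* a real emulator would present the finished frame here */
    return elapsed;
}
```

An outer loop would call `run_frame` repeatedly, reading input and throttling to the original frame rate between calls.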


Memory

In general, memory refers to a form of storage in a computer, but is more specifically used to talk about RAM and ROM.


RAM

This is the fast, temporary area of memory (data is lost when power is lost) directly linked to the CPU, used to store instructions and data. In your program, you will probably store this as an array of bytes, for quick and easy access. If the system contains 64KB of RAM, the memory area could be declared as “unsigned char RAM[0x10000];”, and accessed through normal array methods.


ROM

This is an area of memory which can only be read from. This is often used to hold the program which runs when the computer is switched on, or represents the data on a cartridge for a games console. In particular, systems which don’t load programs as files into RAM will fetch their CPU instructions straight from ROM and use RAM as a scratch pad, storing temporary values and results. Since it’s important not to let the emulated program write to ROM (especially as in many systems ROM and RAM share the same address space), you may want to write a set of functions for memory accesses that check for illegal writes. Remember, these will be called very often and so should be optimized.
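One way to sketch such access functions, assuming a hypothetical 64KB address space in which the lower 32KB is ROM and the rest is RAM. The split point `ROM_END` and the names `mem_read` and `mem_write` are illustrative, not taken from any particular system.

```c
#include <stdint.h>

#define ROM_END 0x8000u          /* assumed: addresses below this are ROM */

static uint8_t memory[0x10000];  /* whole 64KB address space */

uint8_t mem_read(uint16_t address)
{
    return memory[address];
}

/* Writes that land in ROM are silently ignored. */
void mem_write(uint16_t address, uint8_t value)
{
    if (address < ROM_END)
        return;
    memory[address] = value;
}
```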


The Stack

The stack is a portion of RAM where data can be “PUSHed” to and “POPped” from. This is provided so that programs may use memory dynamically. A stack pointer points to the last value PUSHed to the stack. When the program wants to PUSH a new value, the stack pointer decreases and the value is written. When a value is POPped, the value is read and the stack pointer increased. For example, when a routine is CALLed, the next instruction address is PUSHed to the stack and the CPU is set to run the opcodes at the address of the routine. When the routine returns, the address is POPped and the CPU continues running from that point.
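The PUSH and POP behaviour described above might look like this in C, assuming a downward-growing stack of 16-bit values stored low byte first, as on the Z80. The names (`push16`, `pop16`, `sp`) and the initial stack pointer are illustrative.

```c
#include <stdint.h>

static uint8_t  memory[0x10000];
static uint16_t sp = 0xFFFE;     /* assumed initial stack pointer */

/* PUSH: move the stack pointer down, then write the value. */
void push16(uint16_t value)
{
    sp -= 2;
    memory[sp] = (uint8_t)(value & 0xFF);               /* low byte */
    memory[(uint16_t)(sp + 1)] = (uint8_t)(value >> 8); /* high byte */
}

/* POP: read the value, then move the stack pointer back up. */
uint16_t pop16(void)
{
    uint16_t value = (uint16_t)(memory[sp] | (memory[(uint16_t)(sp + 1)] << 8));
    sp += 2;
    return value;
}
```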

IO Ports

In order to communicate with the CPU, components may use a specific address in RAM to provide information about their state, or to be given an area of RAM to access data from. These are referred to as IO ports. It may be the case that the specific area of RAM is made read-only (despite being in RAM), which is another good reason for writing functions to handle memory access.
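A sketch of how a memory write function might route I/O port addresses to the components that own them. The port addresses and the names (`PORT_JOYPAD`, `PORT_SOUND`, `sound_write`) are made up for illustration; a real system’s documentation lists its own.

```c
#include <stdint.h>

#define PORT_JOYPAD 0xFF00u   /* hypothetical read-only input port */
#define PORT_SOUND  0xFF10u   /* hypothetical sound control port */

static uint8_t memory[0x10000];
static uint8_t sound_volume;

/* Hypothetical sound component reacting to a port write. */
void sound_write(uint8_t value) { sound_volume = value & 0x0F; }

void mem_write(uint16_t address, uint8_t value)
{
    switch (address) {
    case PORT_JOYPAD:            /* read-only: the program cannot change it */
        return;
    case PORT_SOUND:             /* hand the value to the sound component */
        sound_write(value);
        break;
    default:
        break;
    }
    memory[address] = value;
}
```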


The CPU

The CPU constantly reads instructions from memory and acts on them. It dictates what happens in the machine, does all of the calculations and runs the programs. Generally, your emulator will be written around the CPU, so it is important to get all of its functions right.


You will commonly hear of computers referred to as 8-bit, 16-bit, 32-bit etc., as a marketing point. The number of “bits” in a computer system is sometimes hard to gauge, but generally it means the native data size that the CPU deals with. So, for example, an 8-bit CPU will deal primarily with 8-bit values (256 possible values, 0 to 255). This doesn’t mean it can’t deal with numbers higher than 255 – but it is designed to work most easily and most quickly with 8 bits.


Registers

Registers are places in a CPU for holding data. This means values read from memory, results of an instruction, or special numbers required for the CPU to function. As a simple example, I’ll detail the registers in the Z80-like CPU found in the Nintendo Gameboy.

General purpose registers:
A – an 8-bit register, commonly used to hold the result of operations. Opcodes such as ADD and SUB will store the result in A.
B, C, D, E, H, L – 8-bit registers. They have no special purpose; these are simply extra registers for convenience.

Functional registers:
F – the 8-bit “flags” register. See heading below. In this particular case, only the 4 upper bits are used.
SP – the “stack pointer”. This points to the top of the stack (see above). Generally it isn’t manipulated directly by the programmer.
PC – the “program counter”. This points to the current instruction for the CPU to execute. After the instruction is dealt with, it is automatically incremented.

16-bit paired registers:
AF, BC, DE, HL – these are paired registers, made up from the combinations of the general purpose 8-bit registers. This allows the CPU to do 16-bit operations, although bear in mind that an operation on a 16-bit register will affect its two 8-bit equivalents.

You should have a pretty good idea of the overall makeup of a CPU now. In fact, you now know the Gameboy’s processor quite intimately – there really is nothing more to it than this. The registers can be stored as structures inside a union – this way operations on register pairs will be automatically reflected in individual registers and vice-versa, as they share the same memory. An adequate structure for the CPU detailed above would be:

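union _REGS
{
struct {byte F, A, C, B, E, D, L, H, PCl, PCh, SPl, SPh;}b;
struct {word AF, BC, DE, HL, PC, SP;}w;
}REGS;

(note: byte and word are assumed to be typedefs for unsigned 8-bit and 16-bit types. This layout is endian-specific: the order shown, with the low byte of each pair first, suits a little endian host such as x86, and on a big endian host the two names in each pair would be swapped. See below)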
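A compilable sketch of the same idea, assuming a GCC- or Clang-style compiler that predefines `__BYTE_ORDER__`. The byte order inside each pair is chosen from the host’s endianness so that, for example, `B` is always the high byte of `BC`. Fixed-width types from `<stdint.h>` stand in for `byte` and `word`, and the names `Registers` and `regs` are invented here.

```c
#include <stdint.h>

typedef union {
    struct {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
        uint8_t F, A, C, B, E, D, L, H;   /* low byte of each pair first */
#else
        uint8_t A, F, B, C, D, E, H, L;   /* high byte of each pair first */
#endif
    } b;
    struct {
        uint16_t AF, BC, DE, HL, PC, SP;  /* 16-bit views of the same bytes */
    } w;
} Registers;

static Registers regs;
```

Writing `regs.w.BC = 0x1234` then leaves 0x12 in `regs.b.B` and 0x34 in `regs.b.C`. Reading a union member other than the one last written is permitted in C but not in C++, so a C++ emulator may prefer explicit shifts and masks.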


Flags

The flags register records logical aspects of the last operation performed by the CPU. Some operations don’t affect flags at all, while others may only affect some. Flags differ between CPUs, but common flags include the zero flag (did the last operation result in a 0?) and the carry flag (did an addition carry out of the top bit, or a subtraction need to borrow?).
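As a sketch, an 8-bit ADD might compute the zero and carry flags as below. The bit positions follow the Gameboy layout (zero in bit 7, carry in bit 4 of F); the other flags are left out for brevity, and the names are illustrative.

```c
#include <stdint.h>

#define FLAG_ZERO  0x80u   /* bit 7 */
#define FLAG_CARRY 0x10u   /* bit 4 */

static uint8_t reg_a, reg_f;

/* A = A + value, updating the zero and carry flags only. */
void add_a(uint8_t value)
{
    unsigned result = (unsigned)reg_a + value;  /* wider than 8 bits on purpose */

    reg_f = 0;
    if ((result & 0xFF) == 0)
        reg_f |= FLAG_ZERO;    /* the 8-bit result is 0 */
    if (result > 0xFF)
        reg_f |= FLAG_CARRY;   /* the sum did not fit in 8 bits */

    reg_a = (uint8_t)result;
}
```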

Little Indians and Big Indians

CPUs are classed “little endian” and “big endian” depending on how they store and read multiple byte values in memory. A big endian system stores the most significant byte first, whereas a little endian system stores the least significant byte first. This is very important to remember. A little endian system reading the 16-bit number 32768 (bytes 80 00 in hexadecimal) stored in big endian format will interpret it as the number 128! You have to be aware of whether the endian-ness (or bytesexuality) of your host system and target system match, and keep it in mind when writing your data structures too. This causes some massive problems for cross-platform emulators, where you have to cater for each bytesex.

Little endian CPUs:
  Intel x86 (common PC)
  Zilog Z80 (Sinclair Spectrum, Sega Master System, Sega Megadrive/Genesis sound chip; the Nintendo Gameboy uses a Z80-like CPU)
  Intel 8080 (Altair 8800 and other early microcomputers)
  MOS 6502 (Nintendo NES, in the form of a 6502 derivative)

Big endian CPUs:
  Motorola 68000 (Amiga, early Macs)
  PowerPC (Macs)
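One way to sidestep host endianness is to assemble multi-byte values from individual bytes rather than reading them through a wider pointer. A small sketch (the function names are illustrative):

```c
#include <stdint.h>

/* Read a 16-bit value stored least significant byte first. */
uint16_t read16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 16-bit value stored most significant byte first. */
uint16_t read16_be(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

Given the bytes 80 00, `read16_be` returns 32768 and `read16_le` returns 128, and both give the same answer on any host.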


Opcodes

An opcode is the numerical representation of an instruction. A CPU will read the number, and know what to do with it. Some opcodes have parameters, which take the form of extra numbers after the opcode byte/s. Some example 8-bit Z80 opcodes (numbers in hexadecimal):

00 (NOP)
C3 50 01 (JP 0150 – notice the little endian storage of 0150)

Another term for numerical opcodes is “machine code”. Assembly language hides opcodes by replacing them with mnemonics which are easily remembered by humans, but are essentially converted directly into their numerical equivalent by the assembler.


Instructions

Instructions are CPU operations, the actual processes carried out when an opcode is read. A program is made up of instructions. You will most likely write individual functions for each instruction, and call them when the CPU core encounters a particular opcode. This is perhaps the most vital part of your emulator. The main CPU loop could be written as a big switch statement:

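byte opcode = memory[REGS.w.PC]; //read next instruction from memory
switch (opcode)
{
	case 0x00: /* NOP: call its instruction function here */ break;
	case 0xC3: /* JP: call its instruction function here */ break;
	default:
		printf("Unimplemented opcode %02X\n", opcode);
}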
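A fuller, compilable sketch of such a core, handling just the two example opcodes NOP (00) and JP (C3). It returns the cycle count for each instruction; the figures used (4 for NOP, 16 for JP) are the Gameboy’s timings, and the names `cpu_step`, `fetch`, `pc` and `memory` are illustrative.

```c
#include <stdint.h>
#include <stdio.h>

static uint8_t  memory[0x10000];
static uint16_t pc;

/* Fetch the byte at PC and advance PC past it. */
static uint8_t fetch(void) { return memory[pc++]; }

/* Execute one instruction; returns the clock cycles it took. */
int cpu_step(void)
{
    uint8_t opcode = fetch();

    switch (opcode) {
    case 0x00:                       /* NOP */
        return 4;
    case 0xC3: {                     /* JP nn: operand is stored low byte first */
        uint8_t lo = fetch();
        uint8_t hi = fetch();
        pc = (uint16_t)(lo | (hi << 8));
        return 16;
    }
    default:
        printf("Unimplemented opcode %02X\n", opcode);
        return 0;
    }
}
```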

Common Instructions Overview

	MOV/LD – take a value from memory and put it in a register, or vice-versa
	ADD – add one register to another
	SUB – subtract a register from another
	CALL – push the address of the next instruction to the stack,
                 and jump to the address specified
	RET – pop an address from the stack and jump there
	JP – jump straight to a specified address
	INC – add 1 to a register
	DEC – subtract 1 from a register
	CP – compare a register with a value

Clock Cycles

Each instruction will take a certain amount of time for the CPU to execute, depending on its complexity. This time is always an integer multiple of the main ticking clock (a 4MHz CPU has a clock ticking at 4 million times per second). So, a CPU clocked at 4MHz could perform a 4 clock cycle instruction 1 million times per second. It’s useful to store the number of cycles each instruction takes so that you can pass it to other modules of your emulator, which can then use it as a guide to how much time has passed and update themselves accordingly. In essence, clock cycles are the units of time in your emulator, rather than seconds.
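The arithmetic can be sketched as follows, assuming an illustrative 4MHz clock and a 60Hz display (real systems have their own figures, and the function names are made up):

```c
#define CLOCK_HZ   4000000L   /* assumed 4MHz CPU clock */
#define FRAME_RATE 60L        /* assumed 60 screen redraws per second */

/* How many times per second an instruction of the given length can run. */
long instructions_per_second(long cycles_per_instruction)
{
    return CLOCK_HZ / cycles_per_instruction;
}

/* Clock cycles available between two screen redraws (rounded down). */
long cycles_per_frame(void)
{
    return CLOCK_HZ / FRAME_RATE;
}
```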


Interrupts

An interrupt is signaled by hardware in the computer system at specific moments, so that the CPU can call a procedure at a certain time. A common, almost universal interrupt is the vertical blank interrupt. When the video chip has redrawn the entire screen, an “interrupt request” is flagged. The CPU’s program counter will then jump to a set place in memory, and run the routine set out there. This is useful for timed tasks, as in this case where the program might set out the screen ready for the next frame. Your emulator will need to handle this – a common way is for the module of your emulator (video? sound? timer?) to store the number of ticks passed, and signal the interrupt appropriately. Timing should be worked out accurately so that your emulator functions according to the original system. Using clock cycles as a time unit will make this a lot easier.
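A sketch of a video module raising a vertical blank interrupt from accumulated cycles. The frame length, the vector address `VBLANK_VECTOR` and all names are assumptions made for illustration; the real values come from the target system’s documentation, and a real CPU would also check whether interrupts are currently enabled.

```c
#include <stdint.h>

#define CYCLES_PER_FRAME 70000   /* assumed */
#define VBLANK_VECTOR    0x0040u /* assumed address of the interrupt routine */

static uint8_t  memory[0x10000];
static uint16_t pc, sp = 0xFFFE;
static int      video_cycles;
static int      vblank_requested;

/* Video module: count cycles and flag the interrupt once per frame. */
void video_update(int cycles)
{
    video_cycles += cycles;
    if (video_cycles >= CYCLES_PER_FRAME) {
        video_cycles -= CYCLES_PER_FRAME;
        vblank_requested = 1;
    }
}

/* CPU side: if a request is pending, save PC on the stack and jump. */
void cpu_check_interrupts(void)
{
    if (!vblank_requested)
        return;
    vblank_requested = 0;
    sp -= 2;
    memory[sp] = (uint8_t)(pc & 0xFF);
    memory[(uint16_t)(sp + 1)] = (uint8_t)(pc >> 8);
    pc = VBLANK_VECTOR;
}
```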

Tying it all Together

Well, those are the basics of emulator programming. It’s not that much now, is it? All you have to do now is implement system specifics, video, sound, timers, cartridges, hardware bugs, an entire CPU core, interface, input… the list is nearly endless. This can’t all be covered here because computers are by their very nature incompatible. This is why research is so very important. Hopefully the information here is enough to get you off the ground, and figure out how emulators work.

Have fun!

This was originally a 15,000 word monster, but I've cut it down to around a 5th of its original size. It was Blabbicus Maximus.