Engineering an NES Emulator
The Making of granola

CPU Basics

Our CPU consists of the following registers:

A Accumulator Register 8 bits wide (unsigned char)
X Index Register 8 bits wide (unsigned char)
Y Index Register 8 bits wide (unsigned char)
P Status Register 8 bits wide (unsigned char)
S Stack Pointer Register 16 bits wide (unsigned short; the hardware register is 8 bits, but we keep the full 0x01xx stack address)
PC Program Counter Register 16 bits wide (unsigned short)

Now that we have grouped these registers together, we can use a struct to represent our registers in a reg.c file:

typedef struct regs {
    unsigned char A;
    unsigned char X;
    unsigned char Y;
    unsigned char P;
    unsigned short S;
    unsigned short PC;    
} Registers;

Now, we can conveniently access our registers in one nice struct. Let's test it out by making a quick test program in our main function:

#include <stdio.h>
#include "reg.c"
Registers reg;
int main() {
    unsigned char result;
    reg.X = 2;
    reg.Y = 3;
    result = reg.X + reg.Y;
    reg.A = result;
    printf("X contains: %d\n", reg.X);
    printf("Y contains: %d\n", reg.Y);
    printf("A contains: %d\n", reg.A);
    getchar();
    
    return 0;
}

Our test output:

X contains: 2
Y contains: 3
A contains: 5

Cool! We have CPU registers! But now let's think about what a CPU does... A CPU needs to be able to run the whole fetch-decode-execute cycle in order to manipulate the registers. We'll need to actually create a CPU module in this case. Let's create a simple cpu.h file and define the API. For now, we just want a CPU to run and have access to the registers. That's all!

cpu.h

#ifndef GR_CPU_H
#define GR_CPU_H
void cpu_run();
#endif

cpu.c

#include <stdio.h>
#include "reg.c"
#include "cpu.h"
void cpu_run() {
	puts("I'm running!");
}

main.c

#include <stdio.h>
#include "cpu.h"
int main() {
    cpu_run();
    getchar();
    return 0;
}

Output:

I'm running!

Now, let's implement the run loop that will simulate a fetch-decode-execute cycle.

#include <stdio.h>
#include "reg.c"
#include "cpu.h"
Registers reg;
unsigned char mem[65536];
void cpu_run() {
	reg.X = 0x0;
	reg.Y = 0x0;
	reg.A = 0x0;
	reg.P = 0x0;
	
	reg.S = 0x01FF;
	reg.PC = 0x8000;
	
	for(;;) {
		unsigned char op = mem[reg.PC];
		
		switch(op) {
			case 0x00:
				++reg.PC;
				printf("Run opcode 0x00\n");
				break;
			case 0x01:
				++reg.PC;
				printf("Run opcode 0x01\n");
				break;
			case 0x02:
				++reg.PC;
				printf("Run opcode 0x02\n");
				break;
			default:
				/* Unknown opcode: skip it so the loop keeps advancing. */
				++reg.PC;
				break;
		}
	}
}

Here is what is going on:

In this example, I've only implemented up to opcode 0x02. The rest should just be typing.. or scripting... whatever you prefer.

But just having this skeleton code tells us a few things:

  1. We need to implement a memory controller. A plain array of bytes is not going to cut it: the 6502 uses memory-mapped I/O, so peripherals respond at certain addresses, and the NES mirrors some address ranges. We should abstract this away behind an API to make it easier to program.
  2. We need to know how many bytes to advance the PC for each opcode being executed.
  3. We should take the opportunity early on to also write a disassembler. This will help us become familiar with 6502 assembly language.

Okay, let's get started.

Memory Controller

Refer to the Memory Map of the NES for the address ranges we should handle in our memory manager. For now let's create two new files, mem_man.h and mem_man.c.

Let's keep our API simple by defining two functions: mm_read, which receives an address and returns the byte at that memory location, and mm_write, which writes a byte to a particular address.

mem_man.h

#ifndef GR_MEM_MAN_H
#define GR_MEM_MAN_H
unsigned char mm_read(unsigned short address);
void mm_write(unsigned short address, unsigned char value);
#endif

mem_man.c

#include "mem_man.h"
#include "memory.c"
extern unsigned char memory[MEMORY_SIZE];
unsigned char mm_read(unsigned short address) {
	return memory[address];
}
void mm_write(unsigned short address, unsigned char value) {
	memory[address] = value;
}

memory.c

#define MEMORY_SIZE	65536
unsigned char memory[MEMORY_SIZE];

memory.c will be nifty in that it will serve as a global memory variable for all our source files to access. As you can see, the implementation of our API is super simple for now. We just read and write from and to the appropriate addresses in the memory array.
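As a hedged sketch of where this controller could go next: on the NES, the 2 KB of work RAM at 0x0000-0x07FF is mirrored up through 0x1FFF, and the eight PPU registers at 0x2000-0x2007 repeat through 0x3FFF. The helper below folds a mirrored address onto its base address before the array is touched. The names mm_fold, mm_read_mirrored, mm_write_mirrored and sketch_memory are hypothetical and only exist for this illustration.

```c
#define MEMORY_SIZE 65536

/* Illustrative backing store; stands in for the array from memory.c. */
static unsigned char sketch_memory[MEMORY_SIZE];

/* Hypothetical helper: fold a mirrored address onto its base address. */
unsigned short mm_fold(unsigned short address) {
	if (address < 0x2000) {
		/* 2 KB of work RAM repeats four times in 0x0000-0x1FFF. */
		return (unsigned short)(address & 0x07FF);
	}
	if (address < 0x4000) {
		/* Eight PPU registers repeat throughout 0x2000-0x3FFF. */
		return (unsigned short)(0x2000 | (address & 0x0007));
	}
	/* Everything else is left untouched in this sketch. */
	return address;
}

unsigned char mm_read_mirrored(unsigned short address) {
	return sketch_memory[mm_fold(address)];
}

void mm_write_mirrored(unsigned short address, unsigned char value) {
	sketch_memory[mm_fold(address)] = value;
}
```

With this in place, a write to 0x0001 would be visible when reading 0x0801, 0x1001 or 0x1801, because all four fold to the same byte.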

Now, we can modify the cpu_run function to use this new memory controller.

		...
		unsigned char op = mm_read(reg.PC);
		...

I think now is the time to construct our big switch statement to write a basic STDOUT disassembler of the ROM we will (eventually) be reading into our machine.

Let's define and initialize a table that maps each opcode number to its mnemonic (unused opcodes are left as empty strings):

const char *ops[256] = {
	"brk", "ora", "", "", "", "ora", "asl", "", 
	"php", "ora", "asl", "", "", "ora", "asl", "", 			/* 0x0F */
	"bpl", "ora", "", "" , "" , "ora", "asl", "", 
	"clc", "ora", "", "", "", "ora", "asl", "", 			/* 0x1F */
	"jsr", "and", "", "", "bit", "and", "rol", "", 
	"plp", "and", "rol", "", "bit", "and", "rol", "",		/* 0x2F */
	"bmi", "and", "", "", "", "and", "rol", "", 
	"sec", "and", "", "", "", "and", "rol", "", 			/* 0x3F */
	
	"rti", "eor", "", "", "", "eor", "lsr", "", 
	"pha", "eor", "lsr", "", "jmp", "eor", "lsr", "",		/* 0x4F */
	"bvc", "eor", "", "", "", "eor", "lsr", "", 
	"cli", "eor", "", "", "", "eor", "lsr", "",			/* 0x5F */
	"rts", "adc", "", "", "", "adc", "ror", "", 
	"pla", "adc", "ror", "", "jmp", "adc", "ror", "", 		/* 0x6F */
	"bvs", "adc", "", "", "", "adc", "ror", "", 
	"sei", "adc", "", "", "", "adc", "ror", "",			/* 0x7F */
	
	"", "sta", "", "", "sty", "sta", "stx", "", 
	"dey", "", "txa", "", "sty", "sta", "stx", "",			/* 0x8F */
	"bcc", "sta", "", "", "sty", "sta", "stx", "", 
	"tya", "sta", "txs", "", "", "sta", "", "",			/* 0x9F */
	"ldy", "lda", "ldx", "", "ldy", "lda", "ldx", "", 
	"tay", "lda", "tax", "", "ldy", "lda", "ldx", "",		/* 0xAF */
	"bcs", "lda", "", "", "ldy", "lda", "ldx", "", 
	"clv", "lda", "tsx", "", "ldy", "lda", "ldx", "",		/* 0xBF */
	
	"cpy", "cmp", "", "", "cpy", "cmp", "dec", "", 
	"iny", "cmp", "dex", "", "cpy", "cmp", "dec", "",		/* 0xCF */
	"bne", "cmp", "", "", "", "cmp", "dec", "", 
	"cld", "cmp", "", "", "", "cmp", "dec", "", 			/* 0xDF */
	"cpx", "sbc", "", "", "cpx", "sbc", "inc", "", 
	"inx", "sbc", "nop", "", "cpx", "sbc", "inc", "", 		/* 0xEF */
	"beq", "sbc", "", "", "", "sbc", "inc", "", 
	"sed", "sbc", "", "", "", "sbc", "inc", ""			/* 0xFF */
};

From here, we can modify our CPU loop to output the specific opcode that is being read:

		...
		unsigned char op = mm_read(reg.PC);
		
		printf("%s\n", ops[op]);
		
		...

From here, what we need to do is go through every case statement within our switch block and increment the program counter by the number of bytes needed to reach the next instruction.
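One hedged way to organise this is by addressing mode rather than by opcode: on the 6502, the instruction length follows from the mode (one opcode byte plus zero, one, or two operand bytes). The enum and the instruction_length function below are hypothetical names introduced only for this sketch, and special cases such as BRK are left out.

```c
/* Hypothetical addressing-mode tags, named for this sketch only. */
enum addr_mode {
	MODE_IMPLIED,
	MODE_ACCUMULATOR,
	MODE_IMMEDIATE,
	MODE_ZERO_PAGE,
	MODE_ZERO_PAGE_X,
	MODE_ZERO_PAGE_Y,
	MODE_RELATIVE,
	MODE_INDIRECT_X,
	MODE_INDIRECT_Y,
	MODE_ABSOLUTE,
	MODE_ABSOLUTE_X,
	MODE_ABSOLUTE_Y,
	MODE_INDIRECT
};

/* Total bytes occupied by an instruction: the opcode plus its operand. */
unsigned char instruction_length(enum addr_mode mode) {
	switch (mode) {
		case MODE_IMPLIED:
		case MODE_ACCUMULATOR:
			return 1;	/* opcode only */
		case MODE_ABSOLUTE:
		case MODE_ABSOLUTE_X:
		case MODE_ABSOLUTE_Y:
		case MODE_INDIRECT:
			return 3;	/* opcode + 16-bit operand */
		default:
			return 2;	/* opcode + 8-bit operand */
	}
}
```

In the switch block, a case such as 0xA9 (lda immediate) could then advance with reg.PC += instruction_length(MODE_IMMEDIATE), while 0x8D (sta absolute) would use MODE_ABSOLUTE.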

We've hit another thing that we need: a way to read a ROM file into our memory. For that, we'll have to write a ROM loader that loads a 32K ROM into memory. We'll call these files rom_loader.h and rom_loader.c. This way, we can begin executing our program.

The API for now will be simple. It will just consist of a single function load_rom that will use the mm_write function to write bytes to our RAM from the file.

rom_loader.h

#ifndef GR_ROM_LOADER_H
#define GR_ROM_LOADER_H
void load_rom(char *filename);
#endif

rom_loader.c

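#include <stdio.h>
#include "mem_man.h"
#include "rom_loader.h"
void load_rom(char *filename) {
	FILE *fp;
	int data;
	unsigned short address;
	address = 0x8000;
	fp = fopen(filename, "rb");
	if (fp == NULL) {
		/* Nothing to load if the file cannot be opened. */
		perror(filename);
		return;
	}
	while((data = fgetc(fp)) != EOF) {
		mm_write(address, data);
		address++;
	}
	fclose(fp);
}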

I decided to hard-code the address to 0x8000 for now, since that is where the cartridge ROM begins in the NES memory map.
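A 32K image loaded at 0x8000 fills the address space exactly up to 0xFFFF (0x8000 + 0x8000 = 0x10000), so anything larger would wrap a 16-bit address back around to 0x0000. The sketch below shows that arithmetic with an in-memory image instead of a file. rom_copy and sketch_mem are hypothetical names, and capping the copy at 32K is an assumption about how oversized input should be handled.

```c
#include <stddef.h>

#define ROM_BASE 0x8000u
#define ROM_MAX  0x8000u	/* 32K: 0x8000 through 0xFFFF */

/* Illustrative 64K address space for this sketch only. */
static unsigned char sketch_mem[65536];

/* Copy at most 32K of image data to 0x8000; return the bytes copied. */
size_t rom_copy(const unsigned char *image, size_t size) {
	size_t i;
	size_t count = size < ROM_MAX ? size : ROM_MAX;
	for (i = 0; i < count; i++) {
		sketch_mem[ROM_BASE + i] = image[i];
	}
	return count;
}
```

The return value lets the caller notice a truncated load: if it is smaller than the size passed in, the image did not fit in the 32K window.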

Let's define an enum type in a new file called mmap.c and use that to consistently apply the proper memory addresses across our emulator:

mmap.c

enum MEMORY_MAP {
	WORK_RAM 		= 0x0000,
	IO_PORTS 		= 0x2000,
	EXPANSION_PORTS	 	= 0x5000,
	CARTRIDGE_RAM		= 0x6000,
	CARTRIDGE_ROM_LOW 	= 0x8000,
	CARTRIDGE_ROM_HIGH 	= 0xC000
};

Now we can go back to the load_rom function within rom_loader and edit the hardcoded address line to be the address of where our cartridge ROM should start:

	...
	address = CARTRIDGE_ROM_LOW;
	...
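Because each enum value is the first address of its region, the same constants can also classify an arbitrary address. The following is a minimal sketch of that idea; region_of is a hypothetical helper, and the enum is repeated from mmap.c only so the example compiles on its own.

```c
/* Repeated from mmap.c so this sketch is self-contained. */
enum MEMORY_MAP {
	WORK_RAM		= 0x0000,
	IO_PORTS		= 0x2000,
	EXPANSION_PORTS		= 0x5000,
	CARTRIDGE_RAM		= 0x6000,
	CARTRIDGE_ROM_LOW	= 0x8000,
	CARTRIDGE_ROM_HIGH	= 0xC000
};

/* Hypothetical helper: return the region whose range contains address. */
enum MEMORY_MAP region_of(unsigned short address) {
	/* Check from the highest base downwards; the first match wins. */
	if (address >= CARTRIDGE_ROM_HIGH) return CARTRIDGE_ROM_HIGH;
	if (address >= CARTRIDGE_ROM_LOW) return CARTRIDGE_ROM_LOW;
	if (address >= CARTRIDGE_RAM) return CARTRIDGE_RAM;
	if (address >= EXPANSION_PORTS) return EXPANSION_PORTS;
	if (address >= IO_PORTS) return IO_PORTS;
	return WORK_RAM;
}
```

A memory controller could switch on region_of(address) to decide whether a read goes to RAM, a peripheral, or the cartridge.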

The 6502 has several addressing modes that determine how many operand bytes follow the opcode. The valid modes are listed on the 6502 Addressing Modes page.

We will create two files called addresser.h and addresser.c that will help us compute, from the operand, the effective address an instruction operates on. The API will just contain one function per memory-addressing mode available in the 6502.

addresser.h

#ifndef GR_ADDRESSER_H
#define GR_ADDRESSER_H
unsigned short addr_absolute(unsigned short address);
unsigned short addr_pc_relative(unsigned short pc, signed char operand);
unsigned short addr_stack(unsigned short s);
unsigned short addr_zero_page(unsigned char operand);
unsigned short addr_absolute_x(unsigned char x, unsigned short operand);
unsigned short addr_absolute_y(unsigned char y, unsigned short operand);
unsigned short addr_zero_page_x(unsigned char x, unsigned char operand);
unsigned short addr_zero_page_y(unsigned char y, unsigned char operand);
unsigned short addr_absolute_indirect(unsigned short operand);
unsigned short addr_zero_page_indirect_y(unsigned char y, unsigned char operand);
unsigned short addr_zero_page_indirect_x(unsigned char x, unsigned char operand);
#endif

addresser.c

#include "addresser.h"
#include "mem_man.h"
unsigned short addr_absolute(unsigned short address) {
	unsigned short effective_address;
	effective_address = address & 0xFFFF;
	return effective_address;
}
/* Only the branch offset is signed; X and Y are unsigned index values. */
unsigned short addr_pc_relative(unsigned short pc, signed char operand) {
	unsigned short effective_address;
	effective_address = pc + operand;
	return effective_address;
}
unsigned short addr_stack(unsigned short s) {
	unsigned short effective_address;
	effective_address = s & 0xFFFF;
	return effective_address;
}
unsigned short addr_zero_page(unsigned char operand) {
	unsigned short effective_address;
	effective_address = operand & 0x00FF;
	return effective_address;
}
unsigned short addr_absolute_x(unsigned char x, unsigned short operand) {
	unsigned short effective_address;
	effective_address = operand + x;
	return effective_address;
}
unsigned short addr_absolute_y(unsigned char y, unsigned short operand) {
	unsigned short effective_address;
	effective_address = operand + y;
	return effective_address;
}
unsigned short addr_zero_page_x(unsigned char x, unsigned char operand) {
	unsigned short effective_address;
	/* The sum wraps around within the zero page. */
	effective_address = (operand + x) & 0x00FF;
	return effective_address;
}
unsigned short addr_zero_page_y(unsigned char y, unsigned char operand) {
	unsigned short effective_address;
	effective_address = (operand + y) & 0x00FF;
	return effective_address;
}
unsigned short addr_absolute_indirect(unsigned short operand) {
	unsigned short effective_address;
	unsigned char effective_low;
	unsigned char effective_high;
	/* The 6502 is little-endian: the low byte is stored first. */
	effective_low = mm_read(operand);
	effective_high = mm_read(operand + 1);
	effective_address = ((effective_high << 8) | (effective_low));
	return effective_address;
}
unsigned short addr_zero_page_indirect_y(unsigned char y, unsigned char operand) {
	unsigned short effective_address;
	unsigned char effective_low;
	unsigned char effective_high;
	unsigned char indirect_address;
	indirect_address = operand;
	effective_low = mm_read(indirect_address);
	/* The pointer lives in the zero page, so its second byte wraps at 0xFF. */
	effective_high = mm_read((unsigned char)(indirect_address + 1));
	/* Y is added to the full 16-bit pointer, so the result is not masked. */
	effective_address = ((effective_high << 8) | effective_low) + y;
	return effective_address;
}
unsigned short addr_zero_page_indirect_x(unsigned char x, unsigned char operand) {
	unsigned short effective_address;
	unsigned char effective_low;
	unsigned char effective_high;
	unsigned char indirect_address;
	/* X is added to the operand first, wrapping within the zero page. */
	indirect_address = operand + x;
	effective_low = mm_read(indirect_address);
	effective_high = mm_read((unsigned char)(indirect_address + 1));
	effective_address = (effective_high << 8) | effective_low;
	return effective_address;
}

We do not implement the following addressing modes because they do not produce an effective memory address (an immediate operand is part of the instruction itself):

  1. Implied
  2. Accumulator
  3. Immediate

Other than that, everything else should return some sort of memory address. :)
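To make two of these modes concrete, here is a small worked example under the usual 6502 conventions: pointers are stored little-endian (low byte first) and zero-page arithmetic wraps at 0xFF. The sketch_ names and the local memory array are hypothetical stand-ins so the example does not depend on mem_man.

```c
/* Illustrative 64K address space for this sketch only. */
static unsigned char sketch_mem[65536];

/* Zero page,X: the sum wraps around inside 0x00-0xFF. */
unsigned short sketch_zero_page_x(unsigned char x, unsigned char operand) {
	return (unsigned short)((operand + x) & 0x00FF);
}

/* Read a 16-bit little-endian pointer stored in the zero page. */
unsigned short sketch_read_pointer_zp(unsigned char zp) {
	unsigned char low = sketch_mem[zp];
	/* The second byte also wraps: a pointer at 0xFF continues at 0x00. */
	unsigned char high = sketch_mem[(unsigned char)(zp + 1)];
	return (unsigned short)((high << 8) | low);
}

/* (Zero page),Y: fetch the pointer, then add Y to the full address. */
unsigned short sketch_indirect_y(unsigned char y, unsigned char operand) {
	return (unsigned short)(sketch_read_pointer_zp(operand) + y);
}
```

For example, with operand 0xF0 and X = 0x20, zero page,X yields 0x0010 rather than 0x0110. With the bytes 0x34, 0x12 stored at 0x10 and 0x11, the pointer reads as 0x1234, so (0x10),Y with Y = 4 yields 0x1238.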

At this point, it is probably becoming clear that we should test our code. Unit testing will be pretty important to this project. We need to test frequently so that we do not implement something incorrectly and only find out much further along.

For the unit tests, let's use a library called CuTest. All we have to do is add the CuTest.h and CuTest.c files to our project. The README.txt that accompanies the library has examples of how to integrate the unit test framework into a project.

Let's write some unit tests for our addresser module.

void test_addr_pc_relative(CuTest *tc) {
	unsigned short result;
	puts("Testing program counter relative addressing mode.");
	result = addr_pc_relative(0x8096, -100);
	CuAssertIntEquals(tc, 0x8032, result);
	result = addr_pc_relative(0x8000, 0xA);
	CuAssertIntEquals(tc, 0x800A, result);
}

We then write the rest of our tests for addresser and proceed to add them to the test suite accordingly:

CuSuite *AddresserGetSuite() {
	CuSuite *suite = CuSuiteNew();
	SUITE_ADD_TEST(suite, test_addr_absolute);
	SUITE_ADD_TEST(suite, test_addr_pc_relative);
	SUITE_ADD_TEST(suite, test_addr_stack);
	SUITE_ADD_TEST(suite, test_addr_zero_page);
	SUITE_ADD_TEST(suite, test_addr_absolute_x);
	SUITE_ADD_TEST(suite, test_addr_absolute_y);
	SUITE_ADD_TEST(suite, test_addr_zero_page_x);
	SUITE_ADD_TEST(suite, test_addr_zero_page_y);
	SUITE_ADD_TEST(suite, test_addr_absolute_indirect);
	SUITE_ADD_TEST(suite, test_addr_zero_page_indirect_y);
	SUITE_ADD_TEST(suite, test_addr_zero_page_indirect_x);
	return suite;
}

Running our test suite, we will get the output:

Testing absolute addressing mode.
Testing program counter relative addressing mode.
Testing stack relative addressing mode.
Testing zero page addressing mode.
Testing absolute X addressing mode.
Testing absolute Y addressing mode.
Testing zero page X addressing mode.
Testing zero page Y addressing mode.
Testing absolute indirect addressing mode.
Testing zero page indirect Y addressing mode.
Testing zero page indirect X addressing mode.
...........
OK (11 tests)

:)

Now, it is time to implement our instructions. Since each instruction has multiple implementations depending on the opcode, we will prototype every function that will execute the emulated instructions like so:

void adc(unsigned char opcode);

The above will implement the ADC instruction. Depending on the opcode passed, we will retrieve the operands and operate accordingly.

An implementation skeleton would look something like this:

void adc(unsigned char opcode) {
	switch(opcode) {
		case 0x61:
			break;
		case 0x65:
			break;
		case 0x69:
			break;
		case 0x6D:
			break;
		case 0x71:
			break;
		case 0x75:
			break;
		case 0x79:
			break;
		case 0x7D:
			break;
	}
}

Let's do that for all our instructions and modify the main CPU loop to call these new functions.
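As a hedged illustration of what the body of one case might eventually compute, here is a sketch of the ADC arithmetic itself: add the operand and the carry flag to A, then update the carry, zero, overflow and negative flags in P. The flag masks follow the 6502 bit layout. The SketchRegs struct and sketch_adc are hypothetical names, and decimal mode is ignored on the assumption that we only target the NES, whose CPU does not support it.

```c
/* Status flag masks for the P register (6502 bit positions). */
#define FLAG_C 0x01	/* carry */
#define FLAG_Z 0x02	/* zero */
#define FLAG_V 0x40	/* overflow */
#define FLAG_N 0x80	/* negative */

/* Hypothetical cut-down register set for this sketch only. */
typedef struct {
	unsigned char A;
	unsigned char P;
} SketchRegs;

/* A = A + operand + carry, with C, Z, V and N updated to match. */
void sketch_adc(SketchRegs *r, unsigned char operand) {
	unsigned int sum = r->A + operand + (r->P & FLAG_C);
	unsigned char result = (unsigned char)(sum & 0xFF);

	/* Clear the four flags ADC affects, then set them as needed. */
	r->P &= (unsigned char)~(FLAG_C | FLAG_Z | FLAG_V | FLAG_N);
	if (sum > 0xFF) r->P |= FLAG_C;
	if (result == 0) r->P |= FLAG_Z;
	/* Overflow: both inputs share a sign that the result does not. */
	if (~(r->A ^ operand) & (r->A ^ result) & 0x80) r->P |= FLAG_V;
	if (result & 0x80) r->P |= FLAG_N;

	r->A = result;
}
```

Each case in the adc skeleton would then only differ in how it fetches the operand (immediate for 0x69, zero page for 0x65, and so on) before handing it to a shared routine like this one.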