Data mining the F-Zero X overdump
Ever go on a treasure hunt? Think of that, except the entire adventure takes place inside a blob of data no one has ever laid eyes on. We data miners are digital archaeologists, and we struck a goldmine. Today, I will tell the tale of the F-Zero X overdump expedition.
Brief attribution
Many video game preservation enthusiasts were hard at work that day, and countless discoveries were made. I may detail the totality of everyone’s findings at a later date, but for now, I want to focus on recounting my own contributions to this data mining effort.
A very big thank you to Forest of Illusion for making this data available to us.
What is the F-Zero X overdump?
Dumping refers to the process of copying a memory chip’s contents. In the case of this particular dump, the chip in question was from a Nintendo 64 development cartridge known to contain F-Zero X.
The cartridge had a 32 MiB capacity, but F-Zero X is only a 16 MiB game. Therefore, any data stored beyond the initial 16 MiB was considered unrelated to F-Zero X, and was colloquially referred to as “the overdump.”
Initial observations
Since the overdump was from the Nintendo 64 era, there was a good chance it would contain uncompressed image data. Therefore, a quick sweep for potential image data was the first order of business.
Almost immediately, these textures were discovered. They are stored in standard RGBA8888 format (8 bits per color channel). Surprisingly, these are from an earlier stage of The Legend of Zelda: Ocarina of Time’s development.
Hunting for geometry (3D models)
There were surviving textures, but how about geometry? Let’s make some assumptions based on what is known about other games from that era. Here is a snippet from the notes I wrote while brainstorming:
* Addresses on the Nintendo 64 use big-endian byte ordering.
* Values use hexadecimal notation unless otherwise specified.
* Scene files reference room files, which contain texture-mapped geometry.
Scene Files
-> Scene files are always located at 64-bit aligned addresses
-> Scene files point to data within themselves through segment addresses
on segment 02.
-> They always begin with a header
-> Some contain multiple headers scattered throughout the file
-> Scene headers contain some notable commands (not an exhaustive list):
-> headers often begin with 15xxxxxx xxxxxxxx or 18xxxxxx xxxxxxxx
-> 03000000 xxxxxxxx points to collision header at local address X
-> 04nn0000 xxxxxxxx defines N room file addresses at local address X
-> The pattern 14000000 00000000 marks the end of the scene header
Room Files
-> Room files also begin with a header
-> Useful commands to know (not an exhaustive list):
-> 01nn0000 xxxxxxxx defines N game entities at local address X
-> 0Bnn0000 xxxxxxxx defines N game objects at local address X
-> 0A000000 xxxxxxxx points to mesh header at local address X
(the mesh header references the geometry)
-> The pattern 14000000 00000000 marks the end of the room header
Derived from
https://wiki.cloudmodding.com/oot/Scenes_and_Rooms
Based on this information, I was able to locate some data bearing a striking resemblance to a scene file header in the overdump.
Writing a short program to locate all the scene files
Since those notes worked so well in helping me locate a scene file using only a hex editor, I decided to write a short program to quickly search the whole overdump based on the same parameters.
I love automation. Check out the source:
/*
* find-scenes.c <z64.me>
*
* locates scene files in a binary blob
*
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#define die(X) { fprintf(stderr, X"\n"); exit(EXIT_FAILURE); }
/* minimal file loader
* returns 0 on failure
* returns pointer to loaded file on success
*/
void *loadfile(const char *fn, size_t *sz)
{
FILE *fp;
void *dat;
/* rudimentary error checking returns 0 on any error */
if (
!fn
|| !sz
|| !(fp = fopen(fn, "rb"))
|| fseek(fp, 0, SEEK_END)
|| !(*sz = ftell(fp))
|| fseek(fp, 0, SEEK_SET)
|| !(dat = malloc(*sz))
|| fread(dat, 1, *sz, fp) != *sz
|| fclose(fp)
)
return 0;
return dat;
}
/* read big-endian u32 */
uint32_t rbeu32(void *data)
{
uint8_t *b = data;
return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | (b[3]);
}
/* retrieve data pointed to by a scene segment
* returns 0 if out of bounds or if invalid segment pointer
*/
void *scenesegment(uint8_t *datBegin, uint8_t *datEnd, uint8_t *scene, void *ptr)
{
uint32_t addr = rbeu32(ptr);
uint8_t *dat = scene + (addr & 0xffffff);
if (!ptr)
return 0;
/* invalid segment pointer */
if ((addr >> 24) != 0x02)
return 0;
/* out of bounds */
if (dat < datBegin || dat >= datEnd)
return 0;
return dat;
}
/* searches for scene header pattern and prints findings */
void findheaders(uint8_t *datBegin, size_t datSz)
{
uint8_t *dat;
uint8_t *datEnd = datBegin + datSz;
const uint8_t endmarker[] = { 0x14, 0, 0, 0, 0, 0, 0, 0};
#define STRIDE 8 /* length of a header command */
for (dat = datBegin ; dat < datEnd; dat += STRIDE)
{
/* potential scene header end marker match */
if (!memcmp(dat, endmarker, sizeof(endmarker)))
{
uint8_t *bounds = dat - 32 * STRIDE;
uint8_t *roomlist = 0;
uint8_t *scene = 0;
uint8_t *sceneEnd = 0;
uint8_t *walk;
uint8_t roomnum = 0;
uint8_t i;
unsigned datAddr;
/* take care to avoid buffer underflow */
if (bounds < datBegin)
bounds = datBegin;
/* walk backwards for a max of 32 unique header commands
* to confirm whether the structure matches what's typical
*/
for (walk = dat; walk > bounds; walk -= STRIDE)
{
switch (*walk)
{
case 0x04:
roomlist = walk + 4;
roomnum = walk[1];
scene = walk; /* some scenes begin with this command */
break;
/* some scenes actually lack these commands */
case 0x15:
case 0x18:
scene = walk;
break;
}
}
/* missing start command, so not a scene */
if (!scene)
continue;
/* not 64-bit aligned, so not a scene */
datAddr = (unsigned)(scene - datBegin);
if (datAddr & 0xf)
continue;
/* no room list */
if (roomnum == 0)
continue;
/* invalid room list pointer */
if (!(roomlist = scenesegment(datBegin, datEnd, scene, roomlist)))
continue;
/* print file address of potential scene header */
fprintf(stdout, "%08X\n", datAddr);
/* walk room list */
for (i = 0; i < roomnum; ++i)
{
uint8_t *this = roomlist + i * sizeof(uint32_t) * 2;
uint32_t begin = rbeu32(this);
uint32_t end = rbeu32(this + sizeof(uint32_t));
/* invalid room address conditions */
if (begin > end
|| begin >= datSz
|| end >= datSz
)
{
fprintf(stdout, " -> ERROR\n");
break;
}
/* list start address of each room referenced by scene */
fprintf(stdout, " -> %08X\n", begin);
/* zero each room file's contents so its header is ignored */
memset(datBegin + begin, 0, end - begin);
/* files are packed such that the end of the scene
* happens to be the same address as the beginning
* of the first room
*/
if (i == 0)
sceneEnd = datBegin + begin;
}
/* it made it through the entire room list without a problem,
* so that data appears to be correct; now zero the scene
* file's contents in case it contains alternate headers
*/
if (i == roomnum)
memset(scene, 0, sceneEnd - scene);
}
}
#undef STRIDE
}
int main(int argc, char *argv[])
{
uint8_t *dat;
size_t datSz;
const char *fn = argv[1];
if (argc != 2 || !fn)
die("arguments: find-scenes file.bin");
if (!(dat = loadfile(fn, &datSz)))
die("failed to load input file");
findheaders(dat, datSz);
/* cleanup */
free(dat);
return 0;
}
Bringing the geometry back to life
Now with a list of all the surviving scene and room files in the overdump, I wrote a new program to extract them.
The first thing I tried after extracting these files was opening them in various programs capable of interpreting and displaying Nintendo 64 geometry. Unfortunately, none of the programs I tried were able to recognize the assets I recovered, so I was faced with a new challenge: find out why they don’t work, and do something about it.
By carefully inspecting each file’s hex dump using the format spec described on the CloudModding wiki as a guide, I was able to determine the following culprits:
- Game entity/object list indices have been shifted and are thus stale.
- The player spawn point in one scene uses an invalid flag.
- Door lists in some scenes are formatted differently.
- Water boxes are formatted differently.
- Collision cameras are formatted differently.
- Certain collision flags have been rearranged.
- Room behavior flags function differently.
- Optional alternate headers complicate things, so disable them.
- Some mesh headers are formatted differently.
- Some geometry accomplishes scrolling texture animation by referencing external data (which doesn’t exist) in a fashion similar to GLSL’s shader uniforms.
- Triangle commands store their indices in different bytes.
- The geometry was compiled using a different microcode.
These culprits were trivial to circumvent: all but one.
Converting the microcode
All the geometry I found is expressed through a graphics microcode. It operates as a fixed function pipeline, much like OpenGL’s immediate mode. A series of commands are encoded as 64-bit instructions, and then each instruction is executed in order to initialize materials, load vertices into the vertex buffer, render triangles, and so on.
The microcode I was able to ascertain these files were using was incompatible with the game I planned on injecting them into, so the only thing left to do was convert them to a compatible microcode. I devised the following steps:
- Extract 3D model data.
- Disassemble to a common intermediate format.
- Recompile with new microcode.
- Inject recompiled 3D model data.
- Repeat for all geometry.
Incompatible 3D model data:
00000000: e700000000000000 fa00000100a100ff ................
00000010: e700000000000000 b900031dc8103478 ..............4x
00000020: bb00000000000000 e700000000000000 ................
00000030: fc41fffffffff638 b700000000002000 .A.....8...... .
00000040: b700000000020000 b700000000000200 ................
00000050: 040040ff03000f10 bf00000000000204 ..@.............
00000060: bf00000000000406 bf00000000080a0c ................
00000070: bf00000000080c0e bf00000000101214 ................
00000080: bf00000000101416 bf00000000181a1c ................
00000090: bf00000000181c1e 0400207f03001010 .......... .....
000000a0: bf00000000000406 bf00000000000608 ................
000000b0: bf00000000020a0c bf00000000020c0e ................
000000c0: b800000000000000 ........
This binary data is then disassembled into C macros using a program called gfxdis:
gfxdis output:
gsDPPipeSync(),
gsDPSetPrimColor(0, qu08(0.00390625), 0x00, 0xA1, 0x00, 0xFF),
gsDPPipeSync(),
gsDPSetRenderMode(AA_EN | Z_CMP | Z_UPD | IM_RD | CVG_DST_CLAMP | ZMODE_INTER | CVG_X_ALPHA | ALPHA_CVG_SEL | GBL_c1(G_BL_CLR_FOG, G_BL_A_SHADE, G_BL_CLR_IN, G_BL_1MA), AA_EN | Z_CMP | Z_UPD | IM_RD | CVG_DST_CLAMP | ZMODE_INTER | CVG_X_ALPHA | ALPHA_CVG_SEL | GBL_c2(G_BL_CLR_IN, G_BL_A_IN, G_BL_CLR_MEM, G_BL_1MA)),
gsSPTexture(0, 0, 0, G_TX_RENDERTILE, G_OFF),
gsDPPipeSync(),
gsDPSetCombineLERP(SHADE, 0, PRIMITIVE, 0, 0, 0, 0, PRIMITIVE, 0, 0, 0, COMBINED, 0, 0, 0, COMBINED),
gsSPSetGeometryMode(G_CULL_BACK),
gsSPSetGeometryMode(G_LIGHTING),
gsSPSetGeometryMode(G_SHADING_SMOOTH),
gsSPVertex(0x03000F10, 16, 0),
gsSP1Triangle(0, 1, 2, 0),
gsSP1Triangle(0, 2, 3, 0),
gsSP1Triangle(4, 5, 6, 0),
gsSP1Triangle(4, 6, 7, 0),
gsSP1Triangle(8, 9, 10, 0),
gsSP1Triangle(8, 10, 11, 0),
gsSP1Triangle(12, 13, 14, 0),
gsSP1Triangle(12, 14, 15, 0),
gsSPVertex(0x03001010, 8, 0),
gsSP1Triangle(0, 2, 3, 0),
gsSP1Triangle(0, 3, 4, 0),
gsSP1Triangle(1, 5, 6, 0),
gsSP1Triangle(1, 6, 7, 0),
gsSPEndDisplayList(),
Finally, my own program called gfxasm recompiles these C macros into the desired microcode:
gfxasm output:
00000000: e700000000000000 fa00000100a100ff ................
00000010: e700000000000000 e200001cc8103478 ..............4x
00000020: d700000000000000 e700000000000000 ................
00000030: fc41fffffffff638 d9ffffff00000400 .A.....8........
00000040: d9ffffff00020000 d9ffffff00200000 ............. ..
00000050: 0101002003000f10 0500020400000000 ... ............
00000060: 0500040600000000 05080a0c00000000 ................
00000070: 05080c0e00000000 0510121400000000 ................
00000080: 0510141600000000 05181a1c00000000 ................
00000090: 05181c1e00000000 0100801003001010 ................
000000a0: 0500040600000000 0500060800000000 ................
000000b0: 05020a0c00000000 05020c0e00000000 ................
000000c0: df00000000000000 ........
This was automated for every file. Some 3D models contain instructions for branching to other 3D models, so I got to have some fun with recursion when writing the code for this.
Exploring the end product
Once everything was working, I compiled it all into a playable demo for others to download and enjoy. Click here to try it for yourself, or skip around the video below to watch someone else explore them.
It doesn’t end there
People are still doing new and exciting things with these findings:
- Click here to explore these worlds directly in your web browser. Great work, Jasper!
- The Space World 97 Experience is an in-progress project aiming to use these and other recovered assets to accurately recreate an early build of The Legend of Zelda: Ocarina of Time.
Press
My data recovery project was a huge success, and thousands of people were happy with the results! It even made a few headlines:
- Walker, Ian. “Ocarina of Time Demo Fragments Reveal a Much Earlier Version of the Game”, Kotaku, 20 Jan. 2021
- Fratz, Chris. “A Zelda 64 Beta Was Found (well, half of it)”, RetroRGB, 24 Jan. 2021
- Girardin, Geoff. “Some Lost Space World 97 Zelda 64 Beta Files Have Been Found”, KeenGamer, 20 Jan. 2021
Source code
The full source code for this is available on GitHub: z64me/overdump-extract-scenes