GPK
Used in the following game(s):
- Schooldays HQ
Structure
Overall
Size | Content | Description
---|---|---
5120 Bytes | Unknown | Microsoft .NET assembly (*.EXE)
? Bytes | Data | Full archive data
? Bytes | Index | Protected space
12 Bytes | Identifier | "STKFile0PIDX"
4 Bytes | Index Size | Size of the protected space
16 Bytes | Identifier | "STKFile0PACKFILE"
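Going by the table, the regions can be located from the file size plus the size field in the footer. The sketch below assumes the regions follow one another with no gaps, in exactly the order listed. That assumption has not been verified, and the names (`gpk_layout`, `GpkLayout`) are hypothetical.

```python
from dataclasses import dataclass

STUB_SIZE = 5120            # .NET assembly at the start of every archive
FOOTER_SIZE = 12 + 4 + 16   # "STKFile0PIDX" + size field + "STKFile0PACKFILE"

@dataclass
class GpkLayout:
    stub: range    # byte range of the .NET stub
    data: range    # archive data
    index: range   # protected space (index)
    footer: range  # identifiers + size field

def gpk_layout(file_size: int, index_size: int) -> GpkLayout:
    """Split a GPK file into regions, assuming they are contiguous and in
    the order of the Overall table (unverified)."""
    footer_start = file_size - FOOTER_SIZE
    index_start = footer_start - index_size
    if index_start < STUB_SIZE:
        raise ValueError("index size does not fit in this file")
    return GpkLayout(
        stub=range(0, STUB_SIZE),
        data=range(STUB_SIZE, index_start),
        index=range(index_start, footer_start),
        footer=range(footer_start, file_size),
    )
```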
Protected Space
Size | Content | Description
---|---|---
4 Bytes | Unknown | Number (after XOR decoding), possibly the uncompressed size
? Bytes | Unknown | Possibly compressed/encoded index data
Research
Strange format... The first 5120 bytes look like program code (a .NET assembly), but it's the same code in every archive. The data itself might not be protected, but the index data, including offsets, is. There may be some anti-debugging code: the program throws an exception and expects to catch it itself. If the debugger swallows that exception, the program never sees it, concludes it's being debugged, and exits. Passing the exception on to the program appears to get around this.
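If the stub really is identical in every archive, hashing the first 5120 bytes of each one should give the same digest. The sketch below only checks for the "MZ" signature that every Windows PE file (including .NET assemblies) starts with. It does not prove the stub is a .NET assembly. The function names are hypothetical.

```python
import hashlib

STUB_SIZE = 5120

def stub_fingerprint(archive: bytes) -> str:
    """SHA-256 of the first 5120 bytes (the embedded .NET stub)."""
    stub = archive[:STUB_SIZE]
    if len(stub) < STUB_SIZE:
        raise ValueError("file is smaller than the stub")
    if stub[:2] != b"MZ":
        # Every Windows PE image starts with the DOS "MZ" signature.
        raise ValueError("stub does not start with an MZ header")
    return hashlib.sha256(stub).hexdigest()

def stubs_match(archives: list[bytes]) -> bool:
    """True if all archives share byte-identical stubs."""
    return len({stub_fingerprint(a) for a in archives}) == 1
```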
When loading an archive, the game does the following:
- Reads the last 16 bytes of the file (offset fileSize - 16, length 16), i.e. the "STKFile0PACKFILE" identifier.
- Reads the 16 bytes before that (offset fileSize - 32, length 16), i.e. "STKFile0PIDX" plus a 4-byte number.
- Treats the last 4 bytes of that block as the size of the protected space, which sits immediately before the final 32 bytes of the file.
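The loading steps above can be sketched in Python as follows. This is a minimal sketch, not the game's code. It assumes the 4-byte size is little-endian (plausible for a Windows/.NET program, but unverified) and that the protected space ends exactly where the 32-byte footer begins. `read_footer` is a hypothetical name.

```python
import struct

PIDX = b"STKFile0PIDX"          # 12 bytes
PACKFILE = b"STKFile0PACKFILE"  # 16 bytes

def read_footer(archive: bytes) -> tuple[int, bytes]:
    """Return (protected_space_size, protected_space_bytes).
    Assumes a little-endian size field (unverified)."""
    if len(archive) < 32:
        raise ValueError("file too small")
    if archive[-16:] != PACKFILE:              # step 1: last 16 bytes
        raise ValueError("missing STKFile0PACKFILE")
    block = archive[-32:-16]                   # step 2: 16 bytes at fileSize - 32
    if block[:12] != PIDX:
        raise ValueError("missing STKFile0PIDX")
    (size,) = struct.unpack("<I", block[12:16])  # step 3: last 4 bytes of block
    end = len(archive) - 32
    start = end - size
    if start < 0:
        raise ValueError("protected space size out of range")
    return size, archive[start:end]
```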
The program then processes the protected space. It XORs the protected space in memory with a key that might change for each byte. Once that is done, the first 4 bytes are a number the program uses for an unknown purpose. That number always appears to be larger than the protected space itself, so the rest could be compressed/encoded data, with the leading number being the uncompressed size. Needs more research.
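A sketch of that decoding step. The real per-byte key is not known yet, so `key_at` is a hypothetical stand-in that maps a byte position to a key byte. The leading number is read as little-endian, which is an assumption. Because XOR is its own inverse, the same function both encodes and decodes.

```python
import struct
from typing import Callable

def unxor(protected: bytes, key_at: Callable[[int], int]) -> bytes:
    """XOR each byte with a position-dependent key.
    key_at is hypothetical: the game's actual key is unknown."""
    return bytes(b ^ (key_at(i) & 0xFF) for i, b in enumerate(protected))

def leading_size(decoded: bytes) -> tuple[int, bytes]:
    """Split off the first 4 bytes (little-endian, assumed).
    Returns (number, rest); the number might be an uncompressed size."""
    (value,) = struct.unpack_from("<I", decoded, 0)
    return value, decoded[4:]
```

If `number > len(rest)` holds for every archive, that supports the compressed-data theory.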
Managed to find the unprotected file list while debugging, but I'm not sure of its structure yet. There's lots of unknown data, and some of it could be junk. File names look like full-width characters (2 bytes per character). Possibly found the offset, which would be the first number after the file name string.
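To pull candidate names out of a decoded index dump, a scanner like the one below may help. It is a sketch under a hypothetical entry layout: a UTF-16LE name, an optional 0x0000 terminator, then a 4-byte little-endian number that might be the file offset. It only accepts ASCII-range characters (high byte 0x00), similar to `strings -el`. Japanese file names would need a wider check.

```python
import struct
from typing import Iterator, Optional

def find_utf16_names(index: bytes, min_chars: int = 4
                     ) -> Iterator[tuple[int, str, Optional[int]]]:
    """Yield (position, name, number_after_name) for UTF-16LE text runs.
    The name/terminator/uint32 layout is hypothetical."""
    n = len(index)
    i = 0
    while i + 1 < n:
        start = i
        # Extend the run while the next code unit is printable ASCII.
        while i + 1 < n and index[i + 1] == 0 and 0x20 <= index[i] <= 0x7E:
            i += 2
        if (i - start) // 2 >= min_chars:
            name = index[start:i].decode("utf-16-le")
            # Skip an optional 0x0000 terminator before the number.
            j = i + 2 if index[i:i + 2] == b"\x00\x00" else i
            number = struct.unpack_from("<I", index, j)[0] if j + 4 <= n else None
            yield start, name, number
        else:
            i = start + 1  # no run here; slide one byte (names may be unaligned)
```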
Fuck this format... seriously... All the "tools" I found while checking whether anyone else had cracked it turned out to be simple programs that search for common headers. Putting this format aside for now. Might return someday... Narrowed it down to one gigantic subroutine, though. XD Given the anti-debugger tricks, who knows what other bullshit is in this. It could just be a lot of nothing to confuse people disassembling it... or actual compression. O.o
...actually... Found the source for a program that might help by searching Google for the identifiers. XD
ToDo: Run tests on the post-XOR data to check whether it's really just compressed.
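One way to run that test: measure the byte entropy of the post-XOR data and try common decompressors on it. zlib/DEFLATE is only a guess here. Nothing confirms which scheme (if any) is used. Entropy near 8 bits per byte suggests compressed or still-encrypted data. `size_matches` compares the output length with the leading 4-byte number. All function names are hypothetical.

```python
import math
import zlib
from collections import Counter
from typing import Optional

def shannon_entropy(data: bytes) -> float:
    """Bits per byte; close to 8.0 hints at compressed/encrypted data."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in Counter(data).values())

def try_zlib(data: bytes) -> Optional[bytes]:
    """Try zlib-wrapped, then raw DEFLATE; return output or None.
    DEFLATE is an assumed candidate, not a confirmed scheme."""
    for wbits in (15, -15):  # 15 = zlib header, -15 = raw deflate
        try:
            return zlib.decompress(data, wbits)
        except zlib.error:
            pass
    return None

def size_matches(expected_size: int, payload: bytes) -> bool:
    """True if payload inflates to exactly the leading 4-byte number."""
    out = try_zlib(payload)
    return out is not None and len(out) == expected_size
```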
4/12/2020
Gonna revisit this format. Seems interesting. I have a few environments set up for reversing programs; I just need to figure out the index structure. If the data itself is uncompressed and unprotected, there's little point in compressing only the index... Hopefully it's not some weirdly complicated format that doesn't really use file names but instead some compiled process with hashes or something. Looking at BGM.GPK, I can see all the Ogg headers, so the data is probably not protected... at least in this one. Found 63 headers in it, so it might have 63 files, or it could also contain other, non-Ogg files. Not sure. I'll keep looking at it.
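A single .ogg file is made of many pages, and each one starts with "OggS", so the count depends on what is being matched. The sketch below counts only beginning-of-stream pages (stream structure version 0, header type flag 0x02). That gives one hit per logical stream, and the hits are candidate file start offsets. Chained or multiplexed streams would also add extra BOS pages. `bos_offsets` is a hypothetical name.

```python
def bos_offsets(data: bytes) -> list[int]:
    """Offsets of Ogg pages with the beginning-of-stream flag set."""
    offsets = []
    pos = data.find(b"OggS")
    while pos != -1:
        if pos + 27 <= len(data):        # full 27-byte page header present
            version = data[pos + 4]      # stream_structure_version, must be 0
            header_type = data[pos + 5]  # 0x01 continued, 0x02 BOS, 0x04 EOS
            if version == 0 and header_type & 0x02:
                offsets.append(pos)
        pos = data.find(b"OggS", pos + 1)
    return offsets
```

`len(bos_offsets(open("BGM.GPK", "rb").read()))` would then estimate how many Ogg files the archive holds.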