So, several years ago I wrote a post talking about decoding the X-Wing vs. TIE fighter briefing file format. Blog posts are a perfect medium for this sort of thing, since they give me a place to take notes, and they give you a place where you can listen to me talk about file formats. Win-win.
Anyway. I find myself in this position again, in my work decoding World of Warships replays. For some parts, I want to access data from the game files, to supplement what’s in the replay itself. So, this is the tale of how I figured out what I figured out. You can find the code on my GitHub here, in Rust of course.
The World of Warships game files have two parts, a .pkg and a corresponding .idx file:
lane@tellurium:~/.steam/steam/steamapps/common/World of Warships$ ls res_packages/ | head
basecontent_0001.pkg
camouflage_0001.pkg
clanbase_0001.pkg
crates_0001.pkg
gui_0001.pkg
particles_0001.pkg
shaders_0001.pkg
sound_banks_0001.pkg
sound_banks_logic_0001.pkg
sound_languages_0001.pkg
lane@tellurium:~/.steam/steam/steamapps/common/World of Warships$ ls bin/4344082/idx/ | head
basecontent.idx
camouflage.idx
clanbase.idx
crates.idx
gui.idx
particles.idx
shaders.idx
sound_banks.idx
sound_banks_logic.idx
sound_languages.idx
We’re conveniently provided with a utility, wowsunpack.exe, which can extract files from the above. For example, if we run it passing the .idx and .pkg directories, we can see a list of the contained files:
$ wine ~/Downloads/wowsunpack.exe ~/.steam/steam/steamapps/common/World\ of\ Warships/bin/4344082/idx/ -p ~/.steam/steam/steamapps/common/World\ of\ Warships/res_packages/ --list
content/animation/world/port/NWP011_Dock_Fjords_OVP048.animation
content/animation/world/port/NWP025_Hamburg_OSV071.animation
content/animation/world/port/NWP028_Hamburg_OVA187.animation
content/animation/world/port/NWP049_Dock_NY2020.animation
content/animation/world/port/NWP024_Hamburg_OSB3040.animation
content/animation/world/port/NWP029_Hamburg_OVA188.animation
content/animation/world/port/NWP030_Hamburg_LVR046.animation
content/animation/world/port/NWP032_Hamburg_OVP100_Graf.animation
content/animation/world/port/NWP041_Dock_NY2020.animation
...
and so forth.
And no, sadly, the file command didn’t reveal anything about either file type.
I have a hunch that the .idx file is basically an index into the .pkg file, which holds the actual data. (I got this hunch because I ran the strings command on a .idx file and saw a bunch of valid filenames. Also because it’s called the “.idx” file.) So let’s look at that file first. Scrolling through an xxd dump of such a file, I see that it’s divided into roughly three sections. The first, a bunch of numbers:
00000000: 4953 4650 0000 0002 3e26 0c02 4000 0000 ISFP....>&..@...
00000010: 8201 0000 4701 0000 0100 0000 0000 0000 ....G...........
00000020: 2800 0000 0000 0000 5650 0000 0000 0000 (.......VP......
00000030: a68d 0000 0000 0000 1300 0000 0000 0000 ................
00000040: 4030 0000 0000 0000 af69 7888 dc7c 8388 @0.......ix..|..
00000050: ea87 c443 8b78 f8c6 1300 0000 0000 0000 ...C.x..........
00000060: 3330 0000 0000 0000 af69 d822 0dfa d781 30.......i."....
00000070: ea87 c443 8b78 f8c6 1000 0000 0000 0000 ...C.x..........
00000080: 2630 0000 0000 0000 afb3 8d01 660e 3e76 &0..........f.>v
00000090: c72f dd70 728c f430 2300 0000 0000 0000 ./.pr..0#.......
000000a0: 1630 0000 0000 0000 afd1 07a9 75f0 2660 .0..........u.&`
notice that, after about 0x38, the structure repeats every 0x20 bytes. This is probably going to be important later, so I’m going to make a note of it. (and also, notably, that implies that the first 0x38 bytes are probably a header)
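To put a number on that eyeball observation, here is a minimal sketch in std-only Rust. The helper names period_score and best_period are hypothetical and not part of the actual parser. It scores a candidate record size by how often a byte equals the byte one period later; padding zeros that line up record after record push the score up for the right period.

```rust
/// Fraction of positions where a byte equals the byte `period` positions later.
fn period_score(data: &[u8], period: usize) -> f64 {
    if period == 0 || data.len() <= period {
        return 0.0;
    }
    let total = data.len() - period;
    let matches = (0..total).filter(|&i| data[i] == data[i + period]).count();
    matches as f64 / total as f64
}

/// Returns the candidate period with the highest score (the first one wins ties).
fn best_period(data: &[u8], candidates: &[usize]) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for &p in candidates {
        let score = period_score(data, p);
        if best.map_or(true, |(_, best_score)| score > best_score) {
            best = Some((p, score));
        }
    }
    best.map(|(p, _)| p)
}
```

Multiples of the true period score just as well as the period itself, so it only makes sense to compare candidates that are not multiples of each other (say 0x10, 0x18, 0x20 and 0x28), and to feed it only the bytes after the suspected header.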
The middle section is just a bunch of null-terminated strings:
00003080: 696e 675f 616f 2e64 6473 004c 4e43 3533 ing_ao.dds.LNC53
00003090: 345f 5269 6e67 5f6d 672e 6464 3000 4c42 4_Ring_mg.dd0.LB
000030a0: 4330 3133 2e67 656f 6d65 7472 7900 3035 C013.geometry.05
000030b0: 5f52 696e 675f 6d69 7363 5f61 7370 6861 _Ring_misc_aspha
000030c0: 6c74 3039 5f61 6c70 6861 5f61 2e64 6430 lt09_alpha_a.dd0
000030d0: 004c 4c32 3038 5f41 6c69 696f 6c61 6e69 .LL208_Aliiolani
000030e0: 6861 6c65 2e67 656f 6d65 7472 7900 4c42 hale.geometry.LB
000030f0: 4330 3133 004c 4e43 3533 365f 5269 6e67 C013.LNC536_Ring
00003100: 5f61 2e64 6432 006c 6f63 6174 696f 6e00 _a.dd2.location.
00003110: 6369 7479 0030 355f 5269 6e67 5f6d 6973 city.05_Ring_mis
00003120: 635f 6173 665f 3031 5f6e 2e64 6473 0062 c_asf_01_n.dds.b
00003130: 7569 6c64 696e 6700 4c4c 3137 325f 526f uilding.LL172_Ro
Comparing the strings and the list output, it appears that both the directory names and the file names are mixed in this list. So I suspect that the first (or third?) section is a definition of a tree, with pointers into this list.
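For comparing against the --list output, it helps to dump the middle section as a plain list of names. A minimal sketch, assuming you have already sliced out just the string section (split_nul_strings is a hypothetical helper name, not something from the real code):

```rust
/// Splits a blob of back-to-back nul-terminated strings into owned Strings.
/// Bytes that are not valid UTF-8 are replaced rather than causing a failure.
fn split_nul_strings(section: &[u8]) -> Vec<String> {
    section
        .split(|&b| b == 0)
        .filter(|s| !s.is_empty())
        .map(|s| String::from_utf8_lossy(s).into_owned())
        .collect()
}
```

Fed the bytes around 0x3108, this would yield entries like "location" and "city" next to file names, which is the mix of directory and file names described above.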
And then the third section is more numbers:
00005040: 652e 7562 6572 7365 7474 696e 6773 0074 e.ubersettings.t
00005050: 7265 6573 2e62 696e 0077 6561 7468 6572 rees.bin.weather
00005060: 732e 786d 6c00 2fbd 5bae 32a2 031f 57c0 s.xml./.[.2...W.
00005070: 43c5 2a7b 8eca b4bf 0100 0000 0000 0500 C.*{............
00005080: 0000 0100 0000 9ee4 0100 af91 b5de 7c1f ..............|.
00005090: 0700 0000 0000 af69 d822 0dfa d781 57c0 .......i."....W.
000050a0: 43c5 2a7b 8eca ad34 b200 0000 0000 0500 C.*{...4........
000050b0: 0000 0100 0000 32e2 0f00 3a76 1ae3 8000 ......2...:v....
000050c0: 4000 0000 0000 afb3 8d01 660e 3e76 57c0 @.........f.>vW.
000050d0: 43c5 2a7b 8eca 0000 0000 0000 0000 0500 C.*{............
000050e0: 0000 0100 0000 b607 0000 b43a b1ca 6e18 ...........:..n.
000050f0: 0000 0000 0000 afd1 07a9 75f0 2660 57c0 ..........u.&`W.
it seems like there’s a 0x30-byte repeating pattern.
Let’s look more into the first section, because it’s first. If you look carefully, there appears to be a decrementing number right around 0x40, 0x60, 0x80, etc. - and this number, 0x3040, 0x3033, 0x3026, is suspiciously close to the offsets for the strings. (this may seem like a bit of a stretch, but I looked at a different .idx file where the strings started closer to 0x15000, and the numbers changed accordingly)
Focusing right on the end of the first section, we see:
00002fe0: 4a20 0000 0000 0000 b152 98b6 f4be c9e9 J .......R......
00002ff0: f0da 53cb dd59 678f 0f00 0000 0000 0000 ..S..Yg.........
00003000: 3520 0000 0000 0000 8d90 32ef 74e7 74db 5 ........2.t.t.
00003010: f0da 53cb dd59 678f 1300 0000 0000 0000 ..S..Yg.........
00003020: 2420 0000 0000 0000 97a2 275e 5219 c9ef $ ........'^R...
00003030: f0da 53cb dd59 678f 0a00 0000 0000 0000 ..S..Yg.........
00003040: 1720 0000 0000 0000 f17f bb65 b8f0 2a82 . .........e..*.
00003050: f0da 53cb dd59 678f 0d00 0000 0000 0000 ..S..Yg.........
00003060: 0120 0000 0000 0000 340e 2e54 c449 63cc . ......4..T.Ic.
00003070: f0da 53cb dd59 678f 4c4e 4335 3331 5f52 ..S..Yg.LNC531_R
00003080: 696e 675f 616f 2e64 6473 004c 4e43 3533 ing_ao.dds.LNC53
and, on a hunch (that the strings are in roughly the same order as the tree nodes), here’s the end of the second section:
00004fc0: 3031 5f6e 2e64 6473 0030 355f 5269 6e67 01_n.dds.05_Ring
00004fd0: 5f6d 6973 635f 7761 6c6c 3032 5f61 2e64 _misc_wall02_a.d
00004fe0: 6473 0030 355f 5269 6e67 0073 7061 6365 ds.05_Ring.space
00004ff0: 7300 6465 636f 722e 6269 6e00 6c69 6768 s.decor.bin.ligh
00005000: 746d 6170 5f73 6861 646f 772e 6464 7300 tmap_shadow.dds.
00005010: 6d69 6e69 6d61 705f 7761 7465 722e 706e minimap_water.pn
00005020: 6700 7363 7265 656e 2e70 6e67 0073 7061 g.screen.png.spa
00005030: 6365 2e73 6574 7469 6e67 7300 7370 6163 ce.settings.spac
00005040: 652e 7562 6572 7365 7474 696e 6773 0074 e.ubersettings.t
00005050: 7265 6573 2e62 696e 0077 6561 7468 6572 rees.bin.weather
00005060: 732e 786d 6c00 2fbd 5bae 32a2 031f 57c0 s.xml./.[.2...W.
We know that 0x2001 isn’t an address into the strings. But 0x2001 + 0x3060 (the offset of the number “0x2001” itself), 0x5061, is suspiciously close to the end of the string segment. So, I’m going to guess some sort of relative pointer. Let’s prove that hypothesis.
One observation is that 0x2017 - 0x2001 is 0x16, but the 0x2001 pointer’s base address is 0x20 bytes later - so if it is a relative pointer, it actually points to a string that is 0xA bytes later. And indeed, one of the last strings (trees.bin\0) is 10 (0xA) characters long.
Let’s do that exercise again. 0x2024 - 0x2017 is 0xD, but since the base address is 0x20 bytes later, the 0x2017 number is referring to a string 0x20 - 0xD = 0x13 bytes later. And indeed, space.ubersettings\0 is 19 (0x13) characters long. So I think we’re on to something.
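The arithmetic in the last two paragraphs boils down to one formula. This sketch assumes the strings are stored back-to-back in the same order as the entries, and that entries are 0x20 bytes apart; implied_string_len is a hypothetical name used only for illustration.

```rust
/// Distance between two consecutive entries in the first section.
const ENTRY_SIZE: i64 = 0x20;

/// Given the pointer values stored in two consecutive entries, returns the
/// length (including the trailing nul) of the string the earlier entry
/// points at, assuming both pointers are relative to their own entry.
fn implied_string_len(ptr_earlier: i64, ptr_later: i64) -> i64 {
    // The later entry's base is ENTRY_SIZE bytes further along, so the
    // distance between the two targets is the pointer delta plus ENTRY_SIZE.
    ptr_later - ptr_earlier + ENTRY_SIZE
}
```

With the values from the dump, the pair (0x2017, 0x2001) gives 0xA and the pair (0x2024, 0x2017) gives 0x13, matching the lengths of trees.bin\0 and space.ubersettings\0.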
Then there’s the question of, what’s the offset relative to? The entry, or the pointer itself? If we suppose that the 0x2001 number, located at 0x3060, refers to the trees.bin string, located at 0x504F, that gives us a difference of 0x1FEF, or 0x12 less than the offset.
Let’s test the first number we saw, 0x3040 located at 0x40. 0x3040 + 0x40 + 0x12 = 0x3092, and if we look at 0x3092 we see… the middle of a string. Darn, back to the drawing board.
Oh wait, I’m bad at math. It should be a -0x12, maybe. Let’s see: 0x3040 + 0x40 - 0x12 = 0x306E, which is…
00003040: 1720 0000 0000 0000 f17f bb65 b8f0 2a82 . .........e..*.
00003050: f0da 53cb dd59 678f 0d00 0000 0000 0000 ..S..Yg.........
00003060: 0120 0000 0000 0000 340e 2e54 c449 63cc . ......4..T.Ic.
00003070: f0da 53cb dd59 678f 4c4e 4335 3331 5f52 ..S..Yg.LNC531_R
00003080: 696e 675f 616f 2e64 6473 004c 4e43 3533 ing_ao.dds.LNC53
00003090: 345f 5269 6e67 5f6d 672e 6464 3000 4c42 4_Ring_mg.dd0.LB
…also not a string.
Oh, wait, I’m actually really bad at math. Based on the above analysis, the 0x2001 number actually refers to the last string, because the difference between it and the 0x2017 number is the length of trees.bin\0, and since 0x2017 points to an earlier address it must be the one referring to trees.bin. (I figured this out by doing the 0x12 math on 0x2001 and 0x2017)
Okay. So 0x5059 (weathers.xml) - 0x2001 - 0x3060 is -0x8. Okay, so the base address is 8 bytes before the number appears (right around the long segment of zeroes).
For example: 0x3040 + 0x38 is 0x3078, which is exactly the start of LNC531_Ring_ao.dds\0. Perfect. The next one is 0x3033 + 0x58 = 0x308B, which is… exactly the start of LNC534_Ring_mg.dd0\0. Magnifico.
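Written down as code, the rule is: the pointer sits 8 bytes into the entry, and it is relative to the start of that entry. Here is a minimal std-only sketch with bounds checks; resolve_name is a hypothetical helper, and the nom-based parser further down does the same job differently.

```rust
/// Resolves the nul-terminated name an entry points at.
/// `entry_start` is the offset of the 0x20-byte entry within `data`.
fn resolve_name(data: &[u8], entry_start: usize) -> Option<&str> {
    // The relative pointer is a little-endian i64 at entry_start + 8.
    let mut raw = [0u8; 8];
    raw.copy_from_slice(data.get(entry_start + 8..entry_start + 16)?);
    let rel = i64::from_le_bytes(raw);

    // The pointer is relative to the start of the entry, not to itself.
    let target = entry_start as i64 + rel;
    if target < 0 {
        return None;
    }
    let tail = data.get(target as usize..)?;

    // Read up to (not including) the terminating nul byte.
    let len = tail.iter().position(|&b| b == 0)?;
    std::str::from_utf8(&tail[..len]).ok()
}
```

For the first entry in the dump, entry_start is 0x38 and the stored pointer is 0x3040, so the lookup lands on 0x3078.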
We have an algorithm. Let’s start writing some code.
Code
I’ll be using Rust, obviously, but there are two very good reasons for this:
- I want to eventually integrate this parser into a larger project that’s already written in Rust,
- and I like Rust.
use std::io::Read;
fn main() {
    let mut contents = vec![];
    let mut f = std::fs::File::open("World of Warships/bin/4344082/idx/spaces_ring.idx").unwrap();
    f.read_to_end(&mut contents).unwrap();
    println!("Got {} bytes", contents.len());
}
outputs:
lane@tellurium:~/wows-replays$ cargo run --bin idxpkg
Compiling idxpkg v0.1.0 (/home/lane/wows-replays/idxpkg)
Finished dev [unoptimized + debuginfo] target(s) in 6.21s
Running `target/debug/idxpkg`
Got 36323 bytes
which agrees with ls:
lane@tellurium:~/.steam/steam/steamapps/common/World of Warships$ ls -altr bin/4344082/idx/spaces_ring.idx
-rwxrwxr-x 1 lane lane 36323 Aug 14 22:34 bin/4344082/idx/spaces_ring.idx
so we’re in business.
A couple assumptions to make right off the bat - since the base offset of the pointer was 8 bytes before, I’m going to assume that the first 0x38 bytes are a file header. So let’s skip that.
Also, we don’t actually know how many records there are. Since we know it ends at 0x3078, and we’re assuming it starts at 0x38, that’s 0x3040 bytes or 0x182 (386) 0x20-byte segments. So let’s hardcode parsing that many relative pointers.
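A quick way to double-check that record count, and to notice when a guessed boundary is off, is to insist that the section length is an exact multiple of the record size. This is a small sketch with a hypothetical helper name (record_count):

```rust
/// Number of fixed-size records between `start` and `end` (end exclusive).
/// Returns None if the range is inverted or not an exact multiple of
/// `record_size`, which usually means one of the guesses is wrong.
fn record_count(start: usize, end: usize, record_size: usize) -> Option<usize> {
    let len = end.checked_sub(start)?;
    if record_size == 0 || len % record_size != 0 {
        return None;
    }
    Some(len / record_size)
}
```

For the numbers above, the range 0x38 to 0x3078 with 0x20-byte records comes out to exactly 0x182 (386).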
If you want to follow along, all of this code uses nom, which is extremely useful for parsing data.
use nom::bytes::complete::take;
use nom::number::complete::le_i64;
use nom::IResult;
use std::io::Read;
/// This reads an entire 0x20-byte entry, extracting the relative pointer.
fn parse_pointer(i: &[u8]) -> IResult<&[u8], i64> {
    // "take" is a nom combinator which just grabs N bytes into a slice
    let (i, unknown) = take(8usize)(i)?;
    let (i, ptr) = le_i64(i)?;
    let (i, unknown2) = take(16usize)(i)?;
    Ok((i, ptr))
}
fn parse_pointers_with_strings(i: &[u8]) -> IResult<&[u8], ()> {
    let mut i = i;
    for _ in 0..0x182 {
        let (new_i, ptr) = parse_pointer(i)?;
        let ptr = ptr as usize;
        // `i` still points at the start of this entry, which is the base of the relative pointer
        // The string is nul-terminated, find the nul termination
        let mut len = 0;
        loop {
            if i[ptr + len] == 0 {
                break;
            }
            len += 1;
        }
        // Now that we know the length, grab the bytes and turn it into a string
        let string = &i[ptr..ptr + len];
        let string = std::str::from_utf8(string).unwrap();
        println!("0x{:x} {}", ptr, string);
        i = new_i;
    }
    Ok((i, ()))
}
/// Strips the header, parsing the pointers & strings
fn parse_file(i: &[u8]) -> IResult<&[u8], ()> {
    let (i, header) = take(0x38usize)(i)?;
    let (i, ptrs) = parse_pointers_with_strings(i)?;
    Ok((i, ()))
}
fn main() {
    let mut contents = vec![];
    let mut f = std::fs::File::open("World of Warships/bin/4344082/idx/spaces_ring.idx").unwrap();
    f.read_to_end(&mut contents).unwrap();
    println!("Got {} bytes", contents.len());
    parse_file(&contents).unwrap();
}
and, behold, we get the strings we expect:
Got 36323 bytes
0x3040 LNC531_Ring_ao.dds
0x3033 LNC534_Ring_mg.dd0
0x3026 LBC013.geometry
0x3016 05_Ring_misc_asphalt09_alpha_a.dd0
0x3019 LL208_Aliiolanihale.geometry
0x3016 LBC013
0x2ffd LNC536_Ring_a.dd2
0x2fef location
0x2fd8 city
0x2fbd 05_Ring_misc_asf_01_n.dds
...
0x20ab 05_Ring
0x2093 spaces
0x207a decor.bin
0x2064 lightmap_shadow.dds
0x2058 minimap_water.png
0x204a screen.png
0x2035 space.settings
0x2024 space.ubersettings
0x2017 trees.bin
0x2001 weathers.xml
.pkg files
Okay. So now that we’ve made some progress on extracting strings, let’s shift gears and look at the .pkg files. Starting with a quick scroll-through of xxd’s output, no obvious structure pops out at me. For example, the beginning has no clear header:
00000000: ed98 7f4c d475 18c7 1f0c 454f 4021 2ec9 ...L.u....EO@!..
00000010: 4e45 c113 23e5 4e45 90f3 be98 96a9 5027 NE..#.NE......P'
00000020: 9ad6 4c97 a256 6af9 b39c 9af0 41fc 314b ..L..Vj.....A.1K
00000030: ad4d cdca 1f68 c690 e9e0 0bde 1d37 d79d .M...h.......7..
00000040: 60b8 d256 36d7 5470 5a6a 1ab1 a529 b3d5 `..V6.TpZj...)..
00000050: 1a5c 3dcf e7fb 1bce 96cb 3f9a 93ef becf .\=.......?.....
however, towards the end, there are a couple regions where a lot of zeros appear in order:
0bfb7d20: 14ed 1736 5cd8 7d61 df85 3d0a 1736 5ed8 ...6\.}a..=..6^.
0bfb7d30: 77b1 9fba e161 141f 5c92 5f94 4b44 7060 w....a..\._.KDp`
0bfb7d40: 0b0b e203 c212 6f30 0c96 e832 2336 b26c ......o0...2#6.l
0bfb7d50: f451 f22f 0000 0000 0011 c5f5 10ce 0300 .Q./............
0bfb7d60: 0000 0000 00 .....
so that gives me hope that this data is not encrypted.
I unpacked all of the files within this single .pkg, and found a .xml file. Running strings on the .pkg file and grepping for “HDR_Shot”, which appears in the .xml file, returns no results - so it’s clear that the .pkg files are not stored in cleartext. It could be compressed though. To check, I add up the size of all the (327) extracted files; a quick du -cb gives me 372,587,527 bytes. However, the .pkg file is only 201,031,013 bytes - so there’s some sort of compression going on.
Making a wild guess, files are compressed individually, rather than concatenated and then compressed as a whole. For the compression algorithm to use, I think we gotta guess - fortunately, I already know the compression algorithm used for the .wowsreplay files, and I can’t imagine them using a different one, so let’s try it.
…and that algorithm (zlib) doesn’t work, but after some trial and error, flate2’s DeflateDecoder (raw deflate, without the zlib wrapper) appears to work:
use std::io::Read;
fn main() {
    // Open the .pkg file & read it
    let mut contents = vec![];
    let mut f = std::fs::File::open("World of Warships/res_packages/spaces_ring_0001.pkg").unwrap();
    f.read_to_end(&mut contents).unwrap();
    println!("Got {} bytes", contents.len());
    // Grab the first 256 bytes of the file, use them to extract 64 bytes.
    // I only grab the first 256 bytes in case later bytes cause the decoder
    // to decide it's an invalid stream
    let mut deflater = flate2::read::DeflateDecoder::new(&contents[..256]);
    let mut contents = vec![0; 64];
    println!("{:?}", deflater.read_exact(&mut contents));
    println!("{:?}", contents);
}
we’re just grabbing the first 256 bytes of the file, and extracting 64 bytes of decompressed data from that, giving:
Finished dev [unoptimized + debuginfo] target(s) in 0.56s
Running `target/debug/idxpkg`
Got 201031013 bytes
Ok(())
[1, 0, 0, 0, 1, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 72, 0, 0, 0, 0, 0, 0, 0, 136, 0, 0, 0, 0, 0, 0, 0, 200, 0, 0, 0, 0, 0, 0, 0, 220, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
and after some sleuthing, we find that content/location/building/city/LBC013/LBC013.geometry starts with the same 64-byte sequence:
/tmp/wows/unpack/content/location/building/city/LBC013/LBC013.geometry [1, 0, 0, 0, 1, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 72, 0, 0, 0, 0, 0, 0, 0, 136, 0, 0, 0, 0, 0, 0, 0, 200, 0, 0, 0, 0, 0, 0, 0, 220, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
so it seems highly likely that this is decoding to that file. That file is 6254 bytes, and in order to decode that many bytes from the file we need to uncompress the first 1973 bytes from the .pkg file (found by trial and error).
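As an aside on why the zlib decoder was a dead end: a zlib stream (RFC 1950) starts with a two-byte header, while raw deflate has none. The header check is simple enough to sketch in std-only Rust; looks_like_zlib is a hypothetical helper for illustration, not something flate2 provides.

```rust
/// Returns true if the first two bytes look like a zlib (RFC 1950) header.
fn looks_like_zlib(data: &[u8]) -> bool {
    if data.len() < 2 {
        return false;
    }
    let cmf = data[0] as u16;
    let flg = data[1] as u16;
    // The low nibble of CMF is the compression method; 8 means deflate.
    let method_is_deflate = (cmf & 0x0f) == 8;
    // CMF and FLG, read as a big-endian 16-bit value, must be a multiple of 31.
    let check_ok = (cmf * 256 + flg) % 31 == 0;
    method_is_deflate && check_ok
}
```

The .pkg file begins with ed 98, and 0xed has a low nibble of 0xd rather than 8, so the check fails. That is consistent with the data being a raw deflate stream, though passing or failing this check is only a hint, not proof.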
And if we scroll down to 0x7B5 in our .pkg file, we see the first sequence of contiguous zeros:
00000790: 3529 37c0 bcc2 787e 5276 1d79 7e52 765d 5)7...x~Rv.y~Rv]
000007a0: 787e d20a d35a 53fc ae7c 9c7f 93a1 94d5 x~...ZS..|......
000007b0: dd8e a28f fc37 0000 0000 b43a b1ca 6e18 .....7.....:..n.
000007c0: 0000 0000 0000 ecbd 0f7c 93d5 bd3f 7e48 .........|...?~H
000007d0: f913 d25d e5bb 5d7e f6ea dd4c 4b95 2445 ...]..]~...LK.$E
000007e0: ba67 5ea6 0f4c 311d 8c36 b8e9 324f 23c1 .g^..L1..6..2O#.
I’m kind of curious what the interstitial header is, or at least I want to put some boundaries on it, so let’s try and uncompress the second file. We can use some code like this:
for offset in 0..48 {
    let mut deflater = flate2::read::DeflateDecoder::new(&contents[offset + 1973..]);
    let mut contents = vec![0; 64];
    println!("{} {:?}", offset, deflater.read_exact(&mut contents));
}
to try decompressing starting from a range of offsets (failed decompressions will print “corrupt deflate stream”, successful decompression will print Ok(())). And we see that the first successful offset is +17, or 1990 (0x7c6).
Seventeen is such a weird number. What the heck.
But okay. We have some idea of what this file looks like, let’s leverage that to make a little more progress.
…and back to .idx
So now that we have this pseudo-.idx-parser, which parses the 0x20-byte block alongside its associated string, we can print out the block associated with the string “LBC013.geometry”, which should be the node associated with that file:
Finished dev [unoptimized + debuginfo] target(s) in 0.54s
Running `target/debug/idxpkg`
Got 36323 bytes
0x3026 LBC013.geometry
|10000000 00000000 26300000 00000000| ........&0...... 00000000
|afb38d01 660e3e76 c72fdd70 728cf430| ....f.>v./.pr..0 00000010
00000020
0x3026, which is the offset to the string (we already knew that), and a bunch of gobbledygook. What numbers do we know about our “LBC013.geometry”?
- It starts at offset 0 in the file
- The compressed data is 0x7B5 bytes long
- It therefore ends at either 0x7B5 or 0x7C6, depending on whether you consider the interstitial header to be a part of the previous or next file.
- The uncompressed data is 0x186E bytes long.
Well…. none of those numbers show up. Darn.
Okay. Well, while we’re here, we can decode two numbers from the 0x38-byte header:
00000000: 4953 4650 0000 0002 3e26 0c02 4000 0000 ISFP....>&..@...
00000010: 8201 0000 4701 0000 0100 0000 0000 0000 ....G...........
00000020: 2800 0000 0000 0000 5650 0000 0000 0000 (.......VP......
00000030: a68d 0000 0000 0000
Remember how there were 386 strings? Well, that’s 0x182, which appears right there at offset 0x10. Remember how there were 327 unpacked files? That’s 0x147, which is right after it at offset 0x14.
The string segment ends at 0x5066. That doesn’t show up here, but 0x5056 does, so I’m going to hypothesize that that number is related to where the strings end and the third segment starts. So we can get a rudimentary header parser down:
#[derive(Debug)]
struct Header {
    num_nodes: i32,
    num_files: i32,
    third_offset: i64,
    first_block: Vec<u8>,
    unknown1: i64,
    unknown2: i64,
    unknown3: i64,
}
fn parse_header(i: &[u8]) -> IResult<&[u8], Header> {
    let (i, _) = tag([0x49, 0x53, 0x46, 0x50])(i)?; // ISFP
    let (i, first_block) = take(12usize)(i)?;
    let (i, num_nodes) = le_i32(i)?;
    let (i, num_files) = le_i32(i)?;
    let (i, unknown1) = le_i64(i)?;
    let (i, unknown2) = le_i64(i)?;
    let (i, third_offset) = le_i64(i)?;
    let (i, unknown3) = le_i64(i)?;
    Ok((
        i,
        Header {
            num_nodes,
            num_files,
            third_offset,
            first_block: first_block.to_owned(),
            unknown1,
            unknown2,
            unknown3,
        },
    ))
}
...
fn parse_file(i: &[u8]) -> IResult<&[u8], ()> {
    let (i, header) = parse_header(i)?;
    println!("{:#x?}", header);
    let (i, ptrs) = parse_pointers_with_strings(header.num_nodes, i)?;
    Ok((i, ()))
}
prints out:
Finished dev [unoptimized + debuginfo] target(s) in 0.53s
Running `target/debug/idxpkg`
Got 36323 bytes
Header {
num_nodes: 0x182,
num_files: 0x147,
third_offset: 0x5056,
first_block: [
0x0,
0x0,
0x0,
0x2,
0x3e,
0x26,
0xc,
0x2,
0x40,
0x0,
0x0,
0x0,
],
unknown1: 0x1,
unknown2: 0x28,
unknown3: 0x8da6,
}
and that last number, 0x8da6, just so happens to be almost the length of the file. Let’s keep that in mind.
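If you’d rather poke at the header without pulling in nom, the same fields can be read with from_le_bytes. This is a minimal sketch; RawHeader, read_i32, read_i64 and parse_raw_header are hypothetical names, and the field offsets are simply the ones worked out above (0x10, 0x14, 0x28 and 0x30).

```rust
#[derive(Debug, PartialEq)]
struct RawHeader {
    num_nodes: i32,
    num_files: i32,
    third_offset: i64,
    unknown3: i64,
}

/// Reads a little-endian i32 at `at`, or None if out of bounds.
fn read_i32(data: &[u8], at: usize) -> Option<i32> {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(data.get(at..at + 4)?);
    Some(i32::from_le_bytes(buf))
}

/// Reads a little-endian i64 at `at`, or None if out of bounds.
fn read_i64(data: &[u8], at: usize) -> Option<i64> {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(data.get(at..at + 8)?);
    Some(i64::from_le_bytes(buf))
}

/// Parses the fields identified so far from the 0x38-byte header.
fn parse_raw_header(data: &[u8]) -> Option<RawHeader> {
    if !data.starts_with(b"ISFP") {
        return None;
    }
    Some(RawHeader {
        num_nodes: read_i32(data, 0x10)?,
        num_files: read_i32(data, 0x14)?,
        third_offset: read_i64(data, 0x28)?,
        unknown3: read_i64(data, 0x30)?,
    })
}
```

Returning None on a bad magic or a short buffer keeps the experiment from panicking when pointed at the wrong file.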
So if we look at the third section, the one after the strings, we see that it’s a 0x30-byte repetition in structure. It starts at 0x5066, and ends around 0x8db6, which gives approximately 0x3d50 bytes in length - or exactly 0x147 sections, the number of files. Coincidence? I think not. Therefore, I suspect that this third section is what contains the actual pointers into the .pkg file. So, writing up a little parser, I can get a dump of all the file records like so:
12
|ca774409 84ef447a 57c043c5 2a7b8eca| .wD...DzW.C.*{.. 00000000
|7447ed00 00000000 05000000 01000000| tG.............. 00000010
|30030100 df8c42a4 f0550500 00000000| 0.....B..U...... 00000020
00000030
13
|9afcc5e8 55d5923b 57c043c5 2a7b8eca| ....U..;W.C.*{.. 00000000
|c6070000 00000000 05000000 01000000| ................ 00000010
|2c9b0000 c104e9ff e8550100 00000000| ,........U...... 00000020
00000030
14
|5ad77ccb 4d2b48a6 57c043c5 2a7b8eca| Z.|.M+H.W.C.*{.. 00000000
|aa856e09 00000000 05000000 01000000| ..n............. 00000010
|d3620200 808db622 04560500 00000000| .b.....".V...... 00000020
00000030
where the number (12, 13, 14) is the index of the record. If you look carefully, the number at offset 0x10 in the 13th record (the middle one here) is 0x7C6, which is where I theorized that the second file in the .pkg began. If we look around for a record where that number is zero, we find one early in the sequence:
2
|afb38d01 660e3e76 57c043c5 2a7b8eca| ....f.>vW.C.*{.. 00000000
|00000000 00000000 05000000 01000000| ................ 00000010
|b6070000 b43ab1ca 6e180000 00000000| .....:..n....... 00000020
00000030
and, what do you know, the number at offset 0x20 is 0x7B6, only one off from the length! So I’m going to extract these fields, as an i64 offset and an i32 length.
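Pulling those two fields out of a 0x30-byte record can be sketched like this. It is std-only, FileLocation and parse_file_location are hypothetical names, and the rest of the record is left uninterpreted.

```rust
/// Size of one record in the third section.
const FILE_RECORD_SIZE: usize = 0x30;

#[derive(Debug, PartialEq)]
struct FileLocation {
    /// Where the data starts inside the .pkg file.
    offset: i64,
    /// Length of the stored (possibly compressed) data.
    length: i32,
}

/// Extracts the offset (at 0x10) and length (at 0x20) from one record.
fn parse_file_location(record: &[u8]) -> Option<FileLocation> {
    if record.len() < FILE_RECORD_SIZE {
        return None;
    }
    let mut offset = [0u8; 8];
    offset.copy_from_slice(&record[0x10..0x18]);
    let mut length = [0u8; 4];
    length.copy_from_slice(&record[0x20..0x24]);
    Some(FileLocation {
        offset: i64::from_le_bytes(offset),
        length: i32::from_le_bytes(length),
    })
}
```

Run against record 2 from the dump this gives offset 0 and length 0x7B6, and against record 13 it gives offset 0x7C6.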
We still don’t have a way of tying these file records to the nodes, though. But we have more information now:
- The file record for “LBC013.geometry” is the one at index 2, and starts at 0x50C6.
- The node record started at 0x78, so that’s a delta of 0x504E (we know they like relative offsets, so we have to keep that in mind as a possibility)
Comparing the node and the file record side-by-side, though, I actually see a much more interesting equality:
File record:
|afb38d01 660e3e76 57c043c5 2a7b8eca| ....f.>vW.C.*{.. 00000000
|00000000 00000000 05000000 01000000| ................ 00000010
|b6070000 b43ab1ca 6e180000 00000000| .....:..n....... 00000020
00000030
Node:
|10000000 00000000 26300000 00000000| ........&0...... 00000000
|afb38d01 660e3e76 c72fdd70 728cf430| ....f.>v./.pr..0 00000010
00000020
Look at the afb38d01 660e3e76 - exactly equal between the two. My guess is that the two are linked via some sort of globally unique ID (GUID, I guess).
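If that guess holds, matching a node to its file record is just a lookup keyed on those 8 bytes. A minimal sketch, assuming the ID is read as a little-endian u64 (the byte order is my assumption; any consistent interpretation works for matching) and using hypothetical names FileRecord and index_by_id:

```rust
use std::collections::HashMap;

/// The parts of a file record needed to find data in the .pkg file.
struct FileRecord {
    /// The 8 bytes at the start of the record, shared with the node.
    id: u64,
    offset: i64,
    length: i32,
}

/// Builds a lookup table from ID to file record.
fn index_by_id(records: &[FileRecord]) -> HashMap<u64, &FileRecord> {
    records.iter().map(|r| (r.id, r)).collect()
}
```

The node’s copy of the ID is the 8 bytes at offset 0x10 within its 0x20-byte entry, so reading that and calling get on the table should return the record holding the offset and length.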
This sort of makes me wonder whether the connections between nodes (e.g. a node to its parent) are also via GUIDs. The “LBC013.geometry” file is inside the content/location/building/city/LBC013/ path, so I suspect that it’ll have a “parent” marked as the “LBC013” node. And if we go get that node:
0x3016 LBC013
|07000000 00000000 16300000 00000000| .........0...... 00000000
|c72fdd70 728cf430 63010286 cbe141f2| ./.pr..0c.....A. 00000010
00000020
Holy macaroon, the c72fdd70 728cf430 matches exactly the last 8 bytes of the “LBC013.geometry” node.
This is the final (major) piece of the file format - we have the ability to extract the complete tree and then (correctly) extract the file from the .pkg file. .idx files describe a tree, where nodes are identified with a 64-bit number, linking nodes to their parents. Within each file record (matched to its node by that ID) we’re given the location (offset) of the data in the corresponding .pkg file, and its size. Some data is compressed, some is not, but using a similar process we can determine how to decode the actual data into cleartext.
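Rebuilding a full path from those parent links can be sketched as a walk up the tree. This is not the code from the repository; Node, build_table and full_path are hypothetical names, and it assumes the walk stops when a parent ID isn’t found in the table, since how the root is marked hasn’t been established here.

```rust
use std::collections::HashMap;

struct Node {
    name: String,
    id: u64,
    parent_id: u64,
}

/// Indexes nodes by their own ID.
fn build_table(nodes: Vec<Node>) -> HashMap<u64, Node> {
    nodes.into_iter().map(|n| (n.id, n)).collect()
}

/// Walks parent links from `start_id` upward and joins the names with '/'.
/// Returns None if the starting node is unknown or the links form a cycle.
fn full_path(start_id: u64, nodes: &HashMap<u64, Node>) -> Option<String> {
    let mut parts = Vec::new();
    let mut current = nodes.get(&start_id)?;
    loop {
        parts.push(current.name.as_str());
        // A valid path can't be longer than the number of nodes.
        if parts.len() > nodes.len() {
            return None;
        }
        match nodes.get(&current.parent_id) {
            Some(parent) => current = parent,
            None => break, // assumed: an unknown parent means we hit the top
        }
    }
    parts.reverse();
    Some(parts.join("/"))
}
```

With only the two nodes shown above in the table, walking up from “LBC013.geometry” yields LBC013/LBC013.geometry; with the whole node table loaded, the same walk would keep going through the remaining ancestors.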
Once we have all these major parts, combining them together is just a matter of software engineering. Like I mentioned at the beginning of the article, all the code is available at GitHub here. To run it, clone the repo, then run:
$ cargo build --release
$ ./target/release/idxpkg help