Having inline headers that are not referenced in the TOC does not make a zip file schizophrenic. The TOC is the only source of truth in a zip file. It was designed this way on purpose so that you can add new versions of files to your 20-disk zip without having to rewrite all 20 disks. PKZIP would read the TOC from disk 20, append your new file to disk 20 (or 21 if there was not enough space), and then write a new TOC at the end that does not reference the old file still in the zip. That is by design. A reader that reads anything other than the files listed in the TOC is an invalid zip reader.
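A minimal sketch of that layout in Python, using only the standard-library `zipfile` module. The helper names (`make_zip`, `with_orphaned_entry`) and the file contents are made up for illustration, and a single in-memory file stands in for PKZIP's multi-disk append. The old entry's local header and data stay in the file, but only the new entry is listed in the TOC (central directory).

```python
import io
import zipfile

LOCAL_SIG = b"PK\x03\x04"    # local (inline) file header signature
CENTRAL_SIG = b"PK\x01\x02"  # central directory (TOC) entry signature

def make_zip(name, data):
    """Build a one-entry, uncompressed zip in memory (illustrative helper)."""
    buf = io.BytesIO()
    info = zipfile.ZipInfo(name, date_time=(2020, 1, 1, 0, 0, 0))
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(info, data)
    return buf.getvalue()

def with_orphaned_entry(old_zip, new_zip):
    """Keep old_zip's local header and data, but only new_zip's TOC."""
    # Everything before the first central directory entry is header + data.
    orphan = old_zip[:old_zip.index(CENTRAL_SIG)]
    return orphan + new_zip

old = make_zip("report.txt", b"old version")
new = make_zip("report.txt", b"new version")
combined = with_orphaned_entry(old, new)

# A TOC-driven reader only sees what the central directory references.
with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    toc_names = zf.namelist()
    toc_data = zf.read("report.txt")

# Counting raw signatures shows how many local headers are physically present.
local_headers = combined.count(LOCAL_SIG)

print(toc_names, toc_data, local_headers)
```

With CPython's `zipfile`, the TOC should list a single `report.txt` holding the new version, even though two local headers are present in the bytes.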
Since docx files are similar to a zip file with the extension changed, could this trick fake out Microsoft Word?
It's an interesting hole that the test cases don't cover any of Microsoft Office, Windows Explorer, PowerShell's various cmdlets, or the several major .NET ZIP archive libraries. It would seem that the author just does not use Microsoft Windows.
There's a whole extra level of archive file format tooling gotchas that one misses out on when one assumes "UNIX" for everything, and does not account for "FAT", "NTFS", "HPFS", and even "OpenVMS".
Or ZIP64. (-:
* https://github.com/dotnet/runtime/blob/main/src/libraries/Sy...
* https://github.com/mihula/ProDotNetZip/blob/main/src/Zip/Zip...
The trick depends upon different implementations doing different things. Not likely for Word (though I suppose it is -possible- across different versions or different OSes).
To respond to the grandparent comment: modern Office files are really just ZIPs with different extensions. They even have the magic string "PK" at the very beginning of the file.
I do wonder, since a lot of tools outside of the MS ecosystem can read Office files (e.g. LibreOffice and Google Docs, as well as plenty of other online tools), whether the hack described in the article is indeed possible. One would just need to figure out the ZIP stacks used by said tools.
For those curious, you can even rename a docx file to use the zip extension and then unzip it manually. If I remember correctly, the contents are XML files whose structure encodes the formatting around the content.
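As a rough illustration in Python, the same check can be done without renaming anything. The archive below is a tiny docx-shaped stand-in built in memory, not a valid Word document, and the helper names are hypothetical. `[Content_Types].xml` and `word/document.xml` are two of the parts a real docx contains.

```python
import io
import zipfile

def make_docx_like():
    """Build a tiny docx-shaped archive in memory (NOT a valid Word document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document>Hello</document>")
    return buf.getvalue()

def inspect_office_file(data):
    """Return (starts with "PK", list of entry names) for the given bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return data[:2] == b"PK", zf.namelist()

has_magic, names = inspect_office_file(make_docx_like())
print(has_magic, names)
```

For a real file, the bytes would come from something like `open("example.docx", "rb").read()`, where the path is a placeholder.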
The Office365 online and desktop implementations of zip could be different.
Obviously it sucks in the real world but I do always appreciate the cleverness of exploits like these.
The described exploit seems theoretical. To create the schizophrenic ZIP, the attacker would have to figure out which ZIP stacks are being used and ensure they behave differently. If the two departments use the same stack, then the exploit can't work, can it?
None of this stuff is theoretical. It's just old.
There was a time when passing ZIP files around was a very popular method of software distribution, and things like this were gotchas that had to be watched for. It was widely known, at least amongst sysops, that the varied toolsets that handled ZIP archives were functionally different. And there were scanners and sanity checkers, and bugfixes to PKUNZIP, that dealt in this stuff for uploaded files and FREQ responses.
Did people exploit the differences? Yes. Although it was mainly on the level of creating prank ZIP files on non-Microsoft operating systems with 8.3 filenames such as "PRN" or "CLOCK$".
* https://groups.google.com/g/alt.comp.virus/c/zLV-Y2a71gs/m/U...
However, the truly terrible idea of self-extracting archives was popular, which meant that archives with "interesting" arrangements of the archive within the overall file were widespread. ZIP comments were also liberally applied and altered by pretty much every BBS that passed an archive along. And the Unix people wanted to be able to use pipes, something that the MS-DOS original never had to cater for.
Also, there were people who exploited the fact that different tools took different things as gospel. Even within the past decade one can find people still being caught out by the fact that there's a header field that indicates which pathname separator character(s) are used; and that ZIP tools that expect non-seekable streams operate differently to ZIP tools that expect seekable regular files.
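A small Python sketch of inspecting that kind of field, assuming the one meant here is the host-system byte of the "version made by" field, which `zipfile` exposes as `ZipInfo.create_system`. The helper names and entry names are made up for illustration, and how a given tool maps the host system to separator handling is not shown.

```python
import io
import zipfile

# Host-system codes from the ZIP "version made by" field.
HOST_NAMES = {0: "MS-DOS/FAT", 2: "OpenVMS", 3: "Unix", 6: "OS/2 HPFS", 10: "Windows NTFS"}

def make_zip_with_hosts():
    """Build an archive whose entries claim different originating systems."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, host in (("dos/readme.txt", 0), ("unix/readme.txt", 3)):
            info = zipfile.ZipInfo(name, date_time=(2020, 1, 1, 0, 0, 0))
            info.create_system = host  # recorded in the central directory
            zf.writestr(info, b"hello")
    return buf.getvalue()

def describe_hosts(zip_bytes):
    """Return (entry name, claimed host system) for each TOC entry."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [(i.filename, HOST_NAMES.get(i.create_system, str(i.create_system)))
                for i in zf.infolist()]

hosts = describe_hosts(make_zip_with_hosts())
print(hosts)
```

A tool that trusts this field and a tool that ignores it can end up with different ideas of what an entry's path is, which is the kind of disagreement being described.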
A more realistic attack would be something like slipping a malicious payload past a scanner by emailing a zip file that appears innocent when unpacked with the scanner’s zip implementation but produces malware when unpacked with the email client’s implementation. There’s a decent chance they’ll be different, and it wouldn’t be too hard to guess which ones a target might be using.
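One way a scanner could guard against this, sketched in Python under stated assumptions: list the entries the way a TOC-driven reader would, list them again the way a simple streaming reader would, and flag any mismatch. All function names are hypothetical, and the streaming walk assumes sizes are stored in the local headers (no data descriptors, no ZIP64).

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"
CENTRAL_SIG = b"PK\x01\x02"
LOCAL_HEADER = struct.Struct("<4s5H3L2H")  # fixed 30-byte local file header

def make_zip(name, data):
    """Build a one-entry, uncompressed zip in memory (illustrative helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, data)
    return buf.getvalue()

def names_from_toc(zip_bytes):
    """Entry names as seen by a reader that trusts the central directory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.namelist()

def names_from_stream(zip_bytes):
    """Entry names as seen by walking local headers from offset 0."""
    names, pos = [], 0
    while zip_bytes[pos:pos + 4] == LOCAL_SIG:
        fields = LOCAL_HEADER.unpack_from(zip_bytes, pos)
        comp_size, name_len, extra_len = fields[7], fields[9], fields[10]
        start = pos + LOCAL_HEADER.size
        names.append(zip_bytes[start:start + name_len].decode("utf-8", "replace"))
        pos = start + name_len + extra_len + comp_size  # jump to the next header
    return names

def readers_disagree(zip_bytes):
    """True when the two reading strategies list different entries."""
    return sorted(names_from_toc(zip_bytes)) != sorted(names_from_stream(zip_bytes))

hidden = make_zip("payload.bin", b"seen only by a streaming reader")
visible = make_zip("notes.txt", b"seen by a TOC reader")
crafted = hidden[:hidden.index(CENTRAL_SIG)] + visible  # orphaned entry + normal zip

print(names_from_toc(crafted), names_from_stream(crafted), readers_disagree(crafted))
```

This only covers one of the ways two implementations can diverge, so a clean result here says little about the other techniques.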
Often you don't have to guess, just use how the software responds as an oracle.
Like spam, the exploit would still be profitable if only a small fraction worked.
I don't see anything "another" about this; this problem is well known by $((CURRENTYEAR-10)) or so.
The author explains in the article that they previously gave a presentation outlining various techniques to achieve a "schizophrenic" zip file. The blog post discusses an additional technique that was not present in their previous presentation.