Office document malware

Adam Tilmar Jakobsen · April 19, 2023

Microsoft Office supports Visual Basic for Applications (VBA) macros, which let adversaries automate the execution of commands on any system that opens the file and allows macros to run. Macros can interact with the operating system, and that is what makes them so dangerous.
They can also be heavily obfuscated, both to evade detection and to make analysis difficult.

Office files come in two formats, and both bundle multiple files into a single container.
The first is the legacy binary format, OLE2, which mimics a file system: storages act as folders and streams act as files.
The more modern format, Office Open XML, is a ZIP archive of multiple files, most of them XML, that the Office application parses.
When a document contains macros, the VBA source code is also compiled into p-code, and both are stored in a binary OLE2 file inside the archive.
A macro without its source code will only run in the specific Office version for which the p-code was compiled, because p-code is version-specific: any other version discards it and falls back to recompiling the source code, which is no longer there.
The reverse technique, where you remove the p-code and leave only the source code, is called VBA purging.

Another type of macro is the Excel 4.0 macro, which allowed users to add commands to spreadsheet cells that were then executed to perform a task. As of 2022, Excel 4.0 macros are disabled by default.
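
The two container formats can be told apart by their first bytes. The sketch below is a minimal illustration using only the Python standard library; the function names are made up for this example, and a ZIP header only means the file might be an Office Open XML document, not that it is one.

```python
import io
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # header of an OLE2 compound file
ZIP_MAGIC = b"PK\x03\x04"                        # local file header of a ZIP archive


def identify_container(data: bytes) -> str:
    """Return 'ole2', 'zip' (possible Office Open XML) or 'unknown'."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def find_vba_parts(data: bytes) -> list[str]:
    """List archive members that look like an embedded VBA project."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith("vbaproject.bin")
        ]
```

In a macro-enabled document the VBA project is usually a member named vbaProject.bin, which is itself an OLE2 file, so running identify_container on its bytes should return 'ole2'.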

You should not limit yourself to just the macro. You can also gain a lot of insight from the XML files within, such as the language setting of the author, hidden comments, and sources of the images within the document.
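
As a rough sketch of that idea, the snippet below pulls the author fields and language tags out of a Word document using only the Python standard library. It assumes the metadata sits in docProps/core.xml and the language tags in word/styles.xml, which is the usual layout but not guaranteed; read_metadata is a name invented for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CORE_NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

FIELDS = [
    ("creator", "dc:creator"),
    ("last_modified_by", "cp:lastModifiedBy"),
    ("created", "dcterms:created"),
]


def read_metadata(data: bytes) -> dict:
    """Collect author fields and language tags from a .docx/.docm archive."""
    info = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = set(archive.namelist())
        if "docProps/core.xml" in names:
            root = ET.fromstring(archive.read("docProps/core.xml"))
            for key, path in FIELDS:
                node = root.find(path, CORE_NS)
                if node is not None and node.text:
                    info[key] = node.text
        if "word/styles.xml" in names:
            root = ET.fromstring(archive.read("word/styles.xml"))
            # Every <w:lang w:val="..."> element carries a language tag.
            langs = {el.get(f"{{{W_NS}}}val") for el in root.iter(f"{{{W_NS}}}lang")}
            info["languages"] = sorted(lang for lang in langs if lang)
    return info
```

Since the sample is untrusted, run this inside the same isolated environment you use for the rest of the analysis.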

Macro

When analyzing a macro, a simple method is to open the document in a secure environment and debug the code in the built-in VBA editor in Word or Excel. Fortunately, there are tools that do much of the analysis for us. oletools is a must-have when analyzing any Office document: it is a Python package created specifically for analyzing Microsoft OLE2 files.

Print detailed information about the macros:

olevba file.docx 
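
olevba can also be used as a Python library, which is handy once you want to script the analysis. The following is a minimal sketch, assuming oletools is installed; scan and group_indicators are illustrative names and not part of oletools, while VBA_Parser, detect_vba_macros, analyze_macros and close come from oletools.olevba.

```python
from collections import defaultdict


def group_indicators(results):
    """Group (type, keyword, description) tuples by their type."""
    grouped = defaultdict(list)
    for kind, keyword, description in results:
        grouped[kind].append((keyword, description))
    return dict(grouped)


def scan(path):
    """Return olevba's findings for one file, grouped by indicator type."""
    # Imported here so the helper above stays usable without oletools installed.
    from oletools.olevba import VBA_Parser

    parser = VBA_Parser(path)
    try:
        if not parser.detect_vba_macros():
            return {}
        # analyze_macros() yields tuples such as ("AutoExec", "AutoOpen", "...").
        return group_indicators(parser.analyze_macros())
    finally:
        parser.close()
```

Calling scan("file.docx") returns an empty dictionary when no macros are found; otherwise the keys are the indicator types that olevba reports.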

Dump and decompress all the macros with oledump.py and write the content to a file:

oledump.py file.docx -s a -v > script1.vbs

You can also dump only one macro:

oledump.py sample.docx -s A5 -v

Another tool is ViperMonkey (vmonkey), which emulates the VBA code and afterward gives you a summary of the actions taken:

vmonkey sample.doc > sample.vmonkey

VBA stomping

VBA stomping is the process of deleting the source code inside the Office file. The macro will still execute because the embedded compiled code (p-code) is still there. It only requires that the Office version that created the document and the one that opens it are the same. This technique can be detected with olevba and oledump.

oledump.py -i sample.docx

It will tell you the size of both the compiled code and the source code, e.g.

9: M    154 154+0 'Macros/VBA/Module1'

In the 154+0 pair, the first number is the size of the compiled p-code (154) and the second is the size of the compressed source code (0).
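
If you want to script this check, the size pair can be parsed out of each line. The sketch below assumes the output looks like the example line above; parse_info_line and is_stomping_candidate are names made up for this illustration, and a zero-size source is only a hint that still needs manual confirmation.

```python
import re

# index: flag  stream-size  pcode+source  'stream name'
INFO_LINE = re.compile(
    r"^\s*(?P<index>\w+):\s+(?P<flag>[A-Za-z!])\s+(?P<size>\d+)\s+"
    r"(?P<pcode>\d+)\+(?P<source>\d+)\s+'(?P<name>[^']*)'"
)


def parse_info_line(line: str):
    """Return the fields of one macro line, or None if the line has no size pair."""
    match = INFO_LINE.match(line)
    if match is None:
        return None
    row = match.groupdict()
    for key in ("size", "pcode", "source"):
        row[key] = int(row[key])
    return row


def is_stomping_candidate(row) -> bool:
    """Compiled code is present but the compressed source is empty."""
    return row is not None and row["pcode"] > 0 and row["source"] == 0
```

Feeding it the example line gives a row with pcode 154 and source 0, which is_stomping_candidate flags for a closer look.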

If you append s to the stream index, you can see the source code as hex:

oledump.py sample.docx -s A3s

Append c instead to print the compiled code as hex. Notice the difference between s and c in the two commands:

oledump.py sample.docx -s A3c

Because the p-code is compiled, it needs to be decompiled before you can read it. pcode2code can be used for this task.

pcode2code sample.docx

Tools

Tool         Description
olevba.py    Print macro information
oledump.py   Dump and decompress macro streams
vmonkey      Emulate VBA code

Excel 4.0 macro

Excel 4.0 macros were built for Excel in 1992. They allow users to add commands to spreadsheet cells. The most direct way to identify whether an xlsm file contains 4.0 macros is to unzip the Excel file.

  1. First, locate xl/workbook.xml inside the extracted archive.
  2. This file lists all the sheets in the workbook, even the hidden ones.
  3. Look for the definedNames element. Its definedName entries, such as Auto_Open, tell you whether a sheet contains macros.

Example of an Excel 4.0 macro in workbook.xml:

<definedName name="_xlnm.Auto_Open">sheet!$A$256</definedName>
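
The manual steps above can be sketched in a few lines of Python using only the standard library. This is an illustration rather than a complete detector: inspect_workbook is a made-up name, it only handles the XML-based xlsm/xlsx layout, and it matches Auto_Open by name only.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def inspect_workbook(data: bytes) -> dict:
    """List sheets (with visibility) and defined names from xl/workbook.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))

    # A missing state attribute means the sheet is visible.
    sheets = [
        (sheet.get("name"), sheet.get("state", "visible"))
        for sheet in root.findall("m:sheets/m:sheet", MAIN_NS)
    ]
    defined_names = {
        entry.get("name", "").strip(): (entry.text or "")
        for entry in root.findall("m:definedNames/m:definedName", MAIN_NS)
    }
    auto_open = {
        name: ref
        for name, ref in defined_names.items()
        if "auto_open" in name.lower()
    }
    return {"sheets": sheets, "defined_names": defined_names, "auto_open": auto_open}
```

A sheet whose state is hidden or veryHidden and that is also the target of an Auto_Open name is a good place to start reading.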

Another, easier way is to use olevba or xlmdeobfuscator:

olevba sample.xlsm
xlmdeobfuscator -f sample.xlsm

As you can see, there are multiple ways for an attacker to use an Office document as a payload. This is because, over the years, Microsoft has added features to Office documents that allow them to interact with the system. Analyzing every document manually is too time-consuming, which is why it is worth building a system that automatically analyzes any Office document. This can be done either with a sandbox environment or with scripts that use oletools. I will leave this as a task for you.
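
As a possible starting point for that task, here is a small batch skeleton. It is only a sketch: triage and the suffix list are assumptions made for this example, and the analyze argument is whatever per-file check you choose to plug in, for instance a wrapper around olevba.

```python
from pathlib import Path

OFFICE_SUFFIXES = {
    ".doc", ".docm", ".docx",
    ".xls", ".xlsm", ".xlsx",
    ".ppt", ".pptm", ".pptx",
}


def triage(directory, analyze):
    """Run analyze(path) on every Office file under directory."""
    report = {}
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in OFFICE_SUFFIXES:
            continue
        try:
            report[str(path)] = analyze(path)
        except Exception as exc:
            # Malformed samples are expected; record the failure and continue.
            report[str(path)] = f"error: {exc}"
    return report
```

Catching every exception is deliberate here, since one broken sample should not stop the rest of the batch.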
