PDF malware

Adam Tilmar Jakobsen · April 20, 2023

PDF documents are exchanged constantly, and for that reason malware authors also use them to deliver their payloads, which is why the format has earned the nickname "payload delivery file". Opening PDF files in a Chromium-based browser makes it harder for a payload to infect the system, because the browser's built-in PDF viewer runs inside its sandbox.

Some threat actors have started password-protecting their PDF files, often including the password in the body of the email, so that antivirus and other security tools cannot scan the file's contents. With a password-protected PDF you can still view the structure of the file, but you need to decrypt it to examine the objects further. Tools such as qpdf and pdftk can decrypt PDF files.
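The structure stays visible because PDF encryption applies to strings and streams, not to object headers or dictionary keys. The minimal Python sketch below, using a hypothetical `encryption_summary` helper, checks for the /Encrypt entry that marks an encrypted file and lists the object headers that remain readable. It assumes the raw file has already been read into bytes.

```python
import re

def encryption_summary(data: bytes) -> dict:
    """Report whether a PDF is encrypted and list the object headers still visible.

    Encryption covers strings and streams, not "N G obj" headers or dictionary
    keys, so the headers can still be listed. Objects packed inside compressed
    object streams (/ObjStm) stay hidden until the file is decrypted.
    """
    # An encrypted file's trailer carries an /Encrypt entry, either as an
    # indirect reference ("/Encrypt 5 0 R") or as an inline dictionary.
    encrypted = re.search(rb"/Encrypt\s+\d+\s+\d+\s+R|/Encrypt\s*<<", data) is not None
    objects = [
        (int(num), int(gen))
        for num, gen in re.findall(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b", data)
    ]
    return {"encrypted": encrypted, "objects": objects}
```

If `encrypted` is True, decrypt the file with qpdf or pdftk before looking at string or stream contents.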

The structure and contents of a PDF file are defined by objects, which use keywords to tell the PDF reader how to handle the data within each object.

Here are the common object types and the keywords malware uses for them:

  • JavaScript: /JS, /JavaScript, /AcroForm, /XFA
  • Launch external or embedded programs: /Launch, /EmbeddedFiles
  • Actions: /OpenAction, /AA
  • Web calls: /URI, /SubmitForm
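As a rough illustration of what a keyword scanner such as pdfid.py reports, here is a minimal Python sketch (the `count_keywords` name is hypothetical) that counts the keywords above in the raw bytes of a file. It is only an approximation: it does not decode hex-escaped names such as /J#61vaScript, which attackers use to evade literal matching, and it cannot see keywords hidden inside compressed streams.

```python
import re

# The risky keywords listed above, grouped by what they can enable.
RISKY_KEYWORDS = {
    "JavaScript": [b"/JS", b"/JavaScript", b"/AcroForm", b"/XFA"],
    "Launch": [b"/Launch", b"/EmbeddedFiles"],
    "Actions": [b"/OpenAction", b"/AA"],
    "Web calls": [b"/URI", b"/SubmitForm"],
}

# A PDF name ends at whitespace, a delimiter character, or the end of data,
# so /JS is not counted inside a longer name such as /JSON.
NAME_END = rb"(?=[\s/<>\[\]()%{}]|$)"

def count_keywords(data: bytes) -> dict:
    """Count literal occurrences of each risky keyword in raw PDF bytes."""
    counts = {}
    for keywords in RISKY_KEYWORDS.values():
        for kw in keywords:
            pattern = re.escape(kw) + NAME_END
            counts[kw.decode()] = len(re.findall(pattern, data))
    return counts
```

A non-zero count is a lead for further analysis, not proof of malice; /AcroForm, for example, also appears in ordinary fillable forms.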

A PDF file is a collection of elements that describe the file's structure and contents and provide rendering and, possibly, execution instructions. A typical PDF document is physically laid out as follows:

  • The header states the PDF version.
  • The objects specify how to render the document, including text, fonts, graphics, and dynamic components such as JavaScript.
  • After the objects comes the cross-reference (xref) table, which specifies the byte offsets where the file's objects are located.
  • At the end of the file is the trailer. It contains vital details such as the offset of the xref table, the number of objects (/Size), which object is the root (/Root) of the PDF's logical layout, and sometimes metadata.
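The physical layout can be checked directly against the raw bytes. The following is a minimal Python sketch under stated assumptions: the file uses a classic xref table and `trailer` dictionary rather than a PDF 1.5 cross-reference stream, and `describe_layout` is a hypothetical helper, not part of any tool mentioned here.

```python
import re

def describe_layout(data: bytes) -> dict:
    """Locate the header version, trailer entries and xref offset in raw PDF bytes.

    Reads the LAST trailer and startxref, because incremental updates append
    new ones to the end of the file.
    """
    # Many readers tolerate bytes before the header, so search the first
    # 1 KiB rather than requiring the file to start with %PDF.
    header = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    trailers = re.findall(rb"trailer\s*<<(.*?)>>", data, re.S)
    startxrefs = re.findall(rb"startxref\s+(\d+)", data)

    layout = {"version": header.group(1).decode() if header else None}
    if trailers:
        size = re.search(rb"/Size\s+(\d+)", trailers[-1])
        root = re.search(rb"/Root\s+(\d+\s+\d+\s+R)", trailers[-1])
        layout["size"] = int(size.group(1)) if size else None
        layout["root"] = root.group(1).decode() if root else None
    if startxrefs:
        offset = int(startxrefs[-1])
        layout["startxref"] = offset
        # The offset should land exactly on the "xref" keyword.
        layout["xref_at_offset"] = data[offset:offset + 4] == b"xref"
    return layout
```

A mismatch between the startxref offset and the actual xref table is worth noting, since many readers silently rebuild a broken xref table and open the file anyway.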

Example encoded object

44 0 obj                    % object number 44, generation number 0
<<
    /Filter [/FlateDecode]  % the stream is zlib/deflate-compressed
    /Length 463             % length of the stream data in bytes
>>
stream
    encoded contents
endstream
endobj
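To see what the encoded contents hide, a FlateDecode stream can be inflated with zlib. The sketch below is a simplified stand-in for a real parser, using a hypothetical `decode_flate_streams` helper: it trusts a direct /Length, falls back to the endstream keyword otherwise, and does not handle filter chains or /DecodeParms predictors.

```python
import re
import zlib

# "N G obj << ... >> stream<EOL>". The tempered dot (?!endobj) stops the
# dictionary match from running past the end of its own object.
STREAM_OBJ = re.compile(
    rb"(\d+)\s+(\d+)\s+obj\s*<<((?:(?!endobj).)*?)>>\s*stream(?:\r\n|\n)", re.S
)

def decode_flate_streams(data: bytes) -> dict:
    """Return {(object, generation): decoded bytes or None} for FlateDecode streams."""
    results = {}
    for m in STREAM_OBJ.finditer(data):
        key = (int(m.group(1)), int(m.group(2)))
        dictionary = m.group(3)
        if b"/FlateDecode" not in dictionary:
            continue
        start = m.end()
        # A direct /Length gives the exact byte count of the stream data;
        # group 2 is set when /Length is an indirect reference ("12 0 R").
        length = re.search(rb"/Length\s+(\d+)(\s+\d+\s+R)?", dictionary)
        if length and not length.group(2):
            raw = data[start:start + int(length.group(1))]
        else:
            # Indirect or missing /Length: fall back to the endstream keyword.
            end = data.find(b"endstream", start)
            raw = data[start:end if end != -1 else len(data)].rstrip(b"\r\n")
        try:
            results[key] = zlib.decompress(raw)
        except zlib.error:
            results[key] = None  # corrupt or deliberately malformed stream
    return results
```

A `None` result is itself interesting: malformed streams that a forgiving reader still renders are a common evasion trick.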

Object referencing another object

11 0 obj                    % object number 11
<<
    /Type /Page
    /Contents 15 0 R        % indirect reference to object 15, generation 0
>>
endobj
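Because every indirect reference has the form `object generation R`, the references an object makes can be pulled out with a regular expression. A minimal sketch with a hypothetical `references_in` helper; it does not skip text inside strings, so a string that happens to contain such a pattern would be a false positive.

```python
import re

# An indirect reference is "object-number generation-number R"; the
# lookarounds stop it from matching inside longer numbers or names.
REFERENCE = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+R(?![A-Za-z0-9])")

def references_in(obj_body: bytes) -> list:
    """Return the (object, generation) pairs referenced from one object's text."""
    return [(int(n), int(g)) for n, g in REFERENCE.findall(obj_body)]
```

Following these pairs from object to object is how you trace, for example, an /OpenAction back to the JavaScript it runs.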

When you start analyzing a PDF, the best place to begin is with the keywords present in the file. From there, you can analyze the individual objects that look potentially malicious. Use pdf-parser's -a switch to display statistics about all the objects in the file.

pdf-parser.py file.pdf -a 

Use the -s switch to search the objects for a keyword (pdf-parser does not search inside stream contents).

pdf-parser.py file.pdf -s /URI 
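A rough Python equivalent of this kind of search, assuming the file is small enough to scan with regular expressions: the hypothetical `search_objects` helper splits the file into `obj ... endobj` blocks, cuts out stream data, and returns the numbers of the objects whose dictionaries contain the keyword.

```python
import re

# One indirect object: "N G obj ... endobj".
INDIRECT_OBJECT = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.S)

def search_objects(data: bytes, keyword: bytes) -> list:
    """Return the numbers of indirect objects whose dictionaries contain keyword."""
    hits = []
    for num, _gen, body in INDIRECT_OBJECT.findall(data):
        # Drop stream data so only the dictionary text is searched.
        dictionary = re.sub(rb"stream\r?\n.*?endstream", b"", body, flags=re.S)
        if keyword in dictionary:
            hits.append(int(num))
    return hits
```

This is a plain substring search, so a keyword hidden in a compressed stream or written with hex escapes will not be found.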

You can use the -o switch if you want to display the content of a specific object.

pdf-parser.py file.pdf -o 5

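Once you have found an object of interest, you can dump it to a file for further analysis. This is especially useful for JavaScript, which is frequently obfuscated within the PDF: once it is in its own file, you can work with it as you would with any other JavaScript and start analyzing the code to identify its purpose. The -d switch writes the object's stream content to the named file; adding -f passes the stream through its filters (such as FlateDecode) first, so the dump contains decoded data rather than compressed bytes.

pdf-parser.py file.pdf -o 5 -f -d object5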
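Dumping works on stream content, but JavaScript is often stored directly in the /JS entry as a literal string instead of a stream. Such a string has to be unescaped (balanced parentheses, backslash escapes, and octal codes) before it reads as normal JavaScript. A minimal sketch with hypothetical helper names; it only handles literal strings, not hex strings or /JS entries that reference a stream object.

```python
import re

# Single-character escapes allowed in a PDF literal string.
ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f",
           b"(": b"(", b")": b")", b"\\": b"\\"}

def read_literal_string(data: bytes, start: int) -> bytes:
    """Decode the PDF literal string whose opening '(' is at data[start]."""
    out = bytearray()
    depth = 0
    i = start
    while i < len(data):
        c = data[i:i + 1]
        if c == b"\\":
            nxt = data[i + 1:i + 2]
            octal = re.match(rb"[0-7]{1,3}", data[i + 1:i + 4])
            if nxt in ESCAPES:
                out += ESCAPES[nxt]
                i += 2
            elif octal:  # \ddd: one to three octal digits
                out.append(int(octal.group(), 8) & 0xFF)
                i += 1 + len(octal.group())
            elif nxt in (b"\r", b"\n"):  # backslash + EOL: line continuation
                i += 2
                if nxt == b"\r" and data[i:i + 1] == b"\n":
                    i += 1
            else:  # unknown escape: the backslash itself is dropped
                i += 1
            continue
        if c == b"(":
            depth += 1
            if depth > 1:  # balanced inner parentheses are kept as text
                out += c
        elif c == b")":
            depth -= 1
            if depth == 0:
                return bytes(out)
            out += c
        else:
            out += c
        i += 1
    raise ValueError("unterminated literal string")


def extract_js_strings(data: bytes) -> list:
    """Return the decoded text of every /JS entry written as a literal string."""
    return [read_literal_string(data, m.end() - 1)
            for m in re.finditer(rb"/JS\s*\(", data)]
```

The decoded bytes can then be written to a .js file and deobfuscated like any other script.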

Another useful technique is to use the -r switch to find every object that references a specific object.

pdf-parser.py file.pdf -r 5
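The reference lookup can be approximated in a few lines of Python. In this sketch (the `referencing_objects` name is hypothetical), each `obj ... endobj` block is searched for an indirect reference to the target object number; references that appear inside strings or stream data are not filtered out.

```python
import re

# One indirect object: "N G obj ... endobj".
INDIRECT_OBJECT = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.S)

def referencing_objects(data: bytes, target: int) -> list:
    """Return the numbers of objects that contain an indirect reference to target."""
    # (?<!\d) keeps "15 0 R" from counting as a reference to object 5.
    ref = re.compile(rb"(?<!\d)" + str(target).encode() + rb"\s+\d+\s+R(?![A-Za-z0-9])")
    return [int(num) for num, _gen, body in INDIRECT_OBJECT.findall(data)
            if ref.search(body)]
```

Walking these results backwards from a suspicious object (for example, the one holding /JS) shows which action or page triggers it.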

Tools

Tool            Description
pdfid.py        Displays the keywords present in a PDF
pdf-parser.py   Parses a PDF and analyzes its objects

Malicious PDF files can operate in different ways. Some abuse the format's ability to interact with other programs to perform malicious actions directly, while others act as droppers or downloaders for the next stage. The key takeaways are:

  • Look for risky and unusual objects.
  • Locate, extract and decode code within the file.
