Move Fast and Don’t Break Things (Part 2): Automated Malware De-obfuscation by Accurate API Monitoring
In our previous blog post, we showed how hypervisor-based API monitoring can achieve accurate logging of API calls at high performance, resulting in a more detailed view of the malware’s internal behavior.
In this blog post we show three practical examples of how this more detailed view can be used in real-world malware analysis:
- getting the unencrypted C2 traffic,
- automatically de-obfuscating scripts,
- and making configuration extraction easier.
The logged API calls are provided by VMRay and used in multiple ways:
- The entire function log can be downloaded in both a human-readable text format and a machine-parsable XML format.
- The strings which appear in the function log as parameters or return values are also extracted for each process as a text file.
VMRay uses these strings for malware detection and classification by matching YARA rules and running antivirus scans on them. VMRay users can also provide their own YARA rules for the platform to run on the function strings. The feature also complements the strings tool: strings extracts the strings that appear statically in a file, while VMRay provides the strings that appear in API calls while the sample executes.
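To make the difference between static and runtime strings concrete, here is a minimal Python sketch. It is not how the strings tool or VMRay is implemented; the single-byte XOR encoding, the key, and the c2.example URL are assumptions chosen purely for illustration.

```python
import re


def static_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII, similar in spirit to the strings tool."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


def xor_bytes(data: bytes, key: int) -> bytes:
    """Single-byte XOR, a stand-in for whatever encoding a real sample uses."""
    return bytes(b ^ key for b in data)


# Hypothetical sample: a plain marker followed by an XOR-encoded URL.
KEY = 0xAA
PREFIX = b"\x00\x01config\x00"
ENCODED = xor_bytes(b"http://c2.example/gate", KEY)
SAMPLE = PREFIX + ENCODED

# What a static scan of the file sees: only the plain marker.
on_disk = static_strings(SAMPLE)

# What the sample would hand to an API call after decoding itself.
at_runtime = xor_bytes(ENCODED, KEY).decode("ascii")
```

The static scan only finds the plain marker, while the decoded URL exists only at runtime, which is exactly where API monitoring can observe it.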
Intercepting and detecting the de-obfuscated network traffic
Malware authors want to evade firewalls and sandboxes which would detect their attack based on network traffic. To hide network indicators, they implement custom communication protocols which include a mix of encryption and custom obfuscation. Typically, the sample creates the message it wants to send, then obfuscates and encrypts it. To detect the malware or classify its family based on network traffic, we should intercept the message before the obfuscation.
The malware’s goal with a message is to get some information to the server. Such information can be the identifier of the malware client, information collected from the host, versioning data, stolen credentials, and much more. The malware uses a predefined format to construct a message from this information that can be interpreted by the server. To make constructing messages easier, many malware families use built-in OS API functions. VMRay monitors these calls, exposing the message before it is obfuscated and sent to the C2 server. VMRay also runs YARA rules on the strings extracted from these calls – using this feature we are able to write detections for the non-obfuscated C2 message.
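A detection on the non-obfuscated message can be prototyped outside of YARA as well. The sketch below scans function strings for lines that look like a beacon template made of key=format-specifier pairs joined by "&". The template shape and the field names in the usage example are assumptions for illustration, not values taken from any specific sample.

```python
import re

# One or more printf-style specifiers, e.g. %u or %08x%08x.
SPEC = r"(?:%[0-9]*[a-zA-Z]+)+"
# A single key=specifier pair, e.g. version=%u.
PAIR = r"\w+=" + SPEC
# Assumption: a beacon template has at least four pairs joined by '&'.
BEACON_TEMPLATE = re.compile(rf"^{PAIR}(?:&{PAIR}){{3,}}$")


def find_beacon_templates(function_strings):
    """Return the lines that look like a key=%spec&key=%spec beacon template."""
    hits = []
    for line in function_strings:
        candidate = line.strip()
        if BEACON_TEMPLATE.match(candidate):
            hits.append(candidate)
    return hits


# Hypothetical function strings; the field names are made up.
example_lines = [
    "kernel32.dll",
    "soft=%u&version=%u&user=%08x%08x&server=%u&id=%u",
    "a=1&b=2&c=3&d=4",
    "  os=%s&arch=%u  ",
]
beacon_hits = find_beacon_templates(example_lines)
```

Requiring several pairs keeps short, generic query strings from matching; a production rule would more likely pin the exact field names of the family in question.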
Example Analysis: Ursnif RM3
Ursnif uses a format string to create network beacons, then encrypts and submits them to the C2 server. The VMRay function log provides the network beacon format string, which can be used to identify the family, and often the exact variant.
VMRay Analyzer Report: https://www.vmray.com/analyses/330bf7ae4ba7/report/overview.html
Direct link to full function log: https://www.vmray.com/analyses/330bf7ae4ba7/logs/flog.txt
Direct link to function strings of the Ursnif process: https://www.vmray.com/analyses/330bf7ae4ba7/report/function_strings_process_1.txt
Automating de-obfuscation of scripts
To bypass static detection, malicious scripts are almost always obfuscated. The script interpreter (PowerShell, cmd, cscript) executes the obfuscated script instruction by instruction, and these instructions de-obfuscate the next ones from strings. Finally, the real, de-obfuscated script lines are executed as well. When monitoring the interpreter’s API calls, the de-obfuscated script is often a parameter to a function call just like any other. This means that we can get lines of the de-obfuscated script by simply executing the malicious script and precisely monitoring the API calls made by the interpreter. VMRay extracts these function strings from the monitored API calls for each process and makes them downloadable as a single file. Scrolling towards the end of this file often shows the de-obfuscated script.
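The mechanism can be illustrated with a toy model in Python. This is only a sketch of the idea, not of any real interpreter or of VMRay's monitoring: the reversed-Base64 obfuscation, the execute_line function, and the c2.example URL are all hypothetical, and nothing is actually executed.

```python
import base64
import functools


def monitored(log):
    """Decorator factory: record each call's name and arguments in log."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            log.append((func.__name__, args))
            return func(*args)
        return wrapper
    return decorate


def obfuscate(script: str) -> str:
    """Toy obfuscation: Base64-encode the script, then reverse the text."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")[::-1]


def run_obfuscated(blob: str, execute) -> None:
    """What the obfuscated script does: decode itself, then execute the result."""
    stage = base64.b64decode(blob[::-1]).decode("utf-8")
    execute(stage)


function_log = []  # stands in for the sandbox's function log


@monitored(function_log)
def execute_line(line: str) -> None:
    """Hypothetical interpreter API; intentionally does nothing."""


blob = obfuscate("POST http://c2.example/index.php")
run_obfuscated(blob, execute_line)
```

The obfuscated blob contains no readable indicator, yet the entry recorded in function_log holds the plain script line, because the script has to de-obfuscate itself before it can pass anything to the interpreter.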
Example analysis: Brushaloader
Brushaloader is an obfuscated VBScript loader, which simply requests other VBScripts to execute from the C2 server. VMRay automates de-obfuscation of this malware by monitoring the API calls made by the interpreter, cscript.exe, and extracting strings used as API parameters for the process. These strings contain the de-obfuscated script sending home a POST request.
The server’s reply to the POST request is a second VBScript, also visible in VMRay’s function strings.
After this second script is executed, Brushaloader keeps communicating with the C2 to query further commands. The polite replies from the server are also visible.
VMRay Analyzer Report: https://www.vmray.com/analyses/a54c4c2c8777/report/overview.html
Direct link to the function strings of the cscript.exe process: https://www.vmray.com/analyses/a54c4c2c8777/report/function_strings_process_1.txt
Making configuration extraction easier
Malware binaries are usually generated by malware builder tools. Builders allow the attacker to change the malware’s configuration, such as the C2 URLs, modules, injected processes and timing, without understanding how the malware works or even recompiling it.
Malware analysts often want to extract these configurations as part of their research. Creating and maintaining robust, automated configuration extractor scripts is normally time-consuming and requires manual reverse-engineering. VMRay’s granular API monitoring can often help with this.
When the malware payload starts, it needs to parse its configuration to read parameters such as C2 URLs. To parse the configuration, malware often uses convenience functions for converting between types (e.g. string to integer, ASCII to Unicode), or for splitting and parsing strings. These API calls make the plain-text configuration items visible to monitoring.
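Once the plain-text configuration items show up in the function strings, pulling out network indicators can be as simple as a pair of regular expressions. The following is a rough sketch under the assumption that the function strings were downloaded as plain text; the sample input uses documentation-reserved addresses and a made-up URL.

```python
import re

# Deliberately loose patterns; real extractors usually validate further.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
IP_PORT_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}:\d{1,5}\b")


def extract_indicators(function_strings: str) -> dict:
    """Collect unique URLs and ip:port endpoints from function strings text."""
    return {
        "urls": sorted(set(URL_RE.findall(function_strings))),
        "endpoints": sorted(set(IP_PORT_RE.findall(function_strings))),
    }


# Hypothetical excerpt of a function strings file.
example_text = """\
kernel32.dll
198.51.100.7:449
http://c2.example/gate.php
192.0.2.1:443
198.51.100.7:449
"""
indicators = extract_indicators(example_text)
```

The patterns do not check octet or port ranges, so the output is a candidate list for an analyst to review rather than a verified configuration.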
Example analysis: Trickbot
Trickbot uses configurations for each module and the loader itself. The malware stores these valuable files on disk, but, unfortunately, they are encrypted securely with AES. During its execution, Trickbot reads these files from disk, decrypts and parses them. The decrypted configurations are in XML format, and most notably contain large lists of C2 addresses. The XMLs appear in the VMRay function log as parameters as they are parsed, making it easy for analysts to extract configurations.
VMRay Analyzer Report: https://www.vmray.com/analyses/e1a3d8c2c842/report/overview.html
Direct link to function log of the analysis: https://www.vmray.com/analyses/e1a3d8c2c842/logs/flog.txt
Direct link to function strings of the Trickbot process loading the configuration: https://www.vmray.com/analyses/e1a3d8c2c842/report/function_strings_process_17.txt
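For XML configurations like the ones described above, a small script can pull the server list out of the function strings. The sketch below assumes that each configuration appears as one complete XML document on a single line and that servers are stored in elements named srv; both are assumptions made for illustration, and the tag name is a parameter so it can be adapted to what the function strings actually contain.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a complete XML document sits on one line, possibly surrounded
# by other text such as a parameter name or quotes.
XML_FRAGMENT = re.compile(r"<(\w+)>.*</\1>")


def extract_servers(lines, server_tag="srv"):
    """Return the text of every server_tag element found in XML fragments."""
    servers = []
    for line in lines:
        match = XML_FRAGMENT.search(line)
        if match is None:
            continue
        try:
            root = ET.fromstring(match.group(0))
        except ET.ParseError:
            continue  # not well-formed XML after all, skip the line
        for element in root.iter(server_tag):
            if element.text:
                servers.append(element.text.strip())
    return servers


# Hypothetical function strings; addresses are documentation-reserved.
example_lines = [
    "ntdll.dll",
    'data = "<mcconf><ver>1</ver><servs><srv>192.0.2.10:443</srv>'
    '<srv>198.51.100.20:449</srv></servs></mcconf>"',
    "<broken><srv>203.0.113.5:443</broken>",
]
servers = extract_servers(example_lines)
```

Lines that merely look like XML but fail to parse are skipped, so truncated fragments in the function strings do not abort the extraction.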
As the examples above show, much can be learned when scripts are de-obfuscated and the function strings are parsed. However, this level of analysis is simply not possible with traditional sandbox technology. In designing these systems, the engineers were constrained to an either/or choice: perform a rapid but shallow analysis, or a thorough but painfully slow one.
Unlike traditional sandbox technology, VMRay has the ability to be both fast and accurate because it operates entirely in the hypervisor layer. This unique design allows our software to monitor all the API calls made by a code sample, giving analysts deep insight into malware behavior that is lacking in other solutions.