In this blog post, we’ll walk through the first version of the VMRay Analyzer IDA Plugin, which uses the output of VMRay Analyzer to enrich IDA Pro static analysis with behavior-based data. The plugin adds comments to dynamically resolved API calls within IDA to show the resolved function, its parameters, its return value and a timestamp.
Figure 1: Example of an annotation
Logging API calls in sandboxes that rely on API hooking is unreliable: calls to unhooked functions are lost, and the sample can call other, lower-level functions in place of the hooked ones. VMRay Analyzer’s hypervisor-based approach logs exactly the API calls that the sample made. Because VMRay Analyzer extracts the exact function calls and the memory addresses they were called from, we can automatically show them in IDA.
In this blog post, we will demonstrate how malware analysts can use the VMRay Analyzer IDA plugin on a Pony sample.
View the VMRay Analyzer Report
The sample we chose for this analysis belongs to the Pony malware family, a simple but very common information stealer. In the wild, the Pony sample was hosted at “92.63[.]197.60/p.exe”. It isn’t properly packed, so the only things that hinder static analysis are that many functions are resolved dynamically and that, as IDA warns, the import table is destroyed.
Figure 2: IDA warning for destroyed import table
The VMRay Analyzer report shows a YARA match for the Pony family and multiple VTI rule matches for information stealing capabilities: reading credentials from applications including FTP clients and browsers, and also brute-forcing the password of a user account.
Figure 3: VMRay Analyzer Report for the Pony sample
The VMRay Analyzer IDA plugin adds an annotation to dynamically resolved function calls based on the execution in the sandbox and colors the annotated function calls. This information is visible in the text view, graph view and the pseudocode view.
In the example below, the sample tries to read Internet Explorer’s stored HTTP basic authentication passwords using the CredEnumerateA function. The function parameters and the return value are also commented. From the return value we can tell that the call failed during the sandbox run, because Internet Explorer had no stored HTTP basic auth credentials on this specific VM. Since CredEnumerateA is dynamically resolved, this would normally be visible only as a “call dword_” instruction.
Figure 4: Function resolved by the IDA plugin
Figure 5: Without the plugin
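To make the idea of an annotation concrete, here is a minimal Python sketch of how one logged call could be turned into a comment string. The record layout, the field names and the output format are assumptions made for illustration, not the plugin’s actual implementation, and the argument values are made up rather than taken from the report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiCall:
    """One logged API call (hypothetical layout, not the VMRay log format)."""
    address: int                      # address of the call instruction
    name: str                         # resolved API name
    args: dict                        # parameter name -> logged value
    ret: str                          # logged return value
    timestamp: Optional[int] = None   # time of the call, if available


def format_annotation(call: ApiCall, show_timestamp: bool = True) -> str:
    """Build a one-line comment: optional [timestamp], name, parameters, return value."""
    params = ", ".join(f"{key}={value}" for key, value in call.args.items())
    text = f"{call.name}({params}) returned {call.ret}"
    if show_timestamp and call.timestamp is not None:
        text = f"[{call.timestamp}] {text}"
    return text


# Illustrative values only.
call = ApiCall(0x401000, "CredEnumerateA", {"Flags": "0x0"}, "0", 4821)
print(format_annotation(call))
```

Inside IDA, a string like this would typically be attached to the call instruction as a comment, so that it shows up next to the otherwise opaque indirect call.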
The coloring is useful for identifying which branch the sample took when it was executed. This is especially effective in the pseudocode view where it is clearly visible that the conditions before the call were satisfied.
Figure 6: The plugin’s result in the pseudocode view
In addition to the API parameters and return value, the plugin can also add a timestamp to the annotation (the number in square brackets). Figure 7 shows the Pony sample enumerating the stored RDP credentials by calling the same function with the “TERMSRV/*” filter. From the timestamp, we can tell that this section of code was reached sometime after the Internet Explorer credential enumeration. The timestamps help set up a timeline, which can be challenging to do when only relying on static analysis.
Figure 7: Enumeration of stored RDP credentials
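The timeline idea can be sketched in a few lines of Python: given call records that carry a timestamp, sorting by that timestamp gives the order in which the annotated code locations were reached. The record type, the timestamp unit and the addresses below are hypothetical and only serve as an example.

```python
from typing import NamedTuple


class LoggedCall(NamedTuple):
    timestamp: int   # hypothetical unit, e.g. ticks since analysis start
    address: int     # address of the call instruction
    name: str        # resolved API name


def build_timeline(calls):
    """Return the calls ordered by timestamp (sorted() is stable for ties)."""
    return sorted(calls, key=lambda c: c.timestamp)


# Illustrative values: two call sites of the same API, logged out of order.
calls = [
    LoggedCall(9120, 0x402B10, "CredEnumerateA"),
    LoggedCall(4821, 0x401F3C, "CredEnumerateA"),
]

for c in build_timeline(calls):
    print(f"[{c.timestamp}] {c.address:#x} {c.name}")
```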
If a section of code is executed multiple times, the same API call is made repeatedly from the same address. In this scenario, the plugin shows only the first few calls so that the IDA view does not become huge. This helps to filter out insignificant information and allows analysts to focus on what is relevant. In the figure below, the sample is brute-forcing the password of the guest account by calling LogonUserA with different lpszPassword arguments. The annotation shows the first 5 brute-force attempts. Without the annotations, this function call would be unpleasant to analyze statically: the function is resolved dynamically, and its arguments are also dynamic.
Figure 8: Brute-forcing the password of the Guest account
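The truncation of repeated calls could look roughly like the following Python sketch, which groups annotations by call address and keeps only the first few per address. The limit of 5 mirrors the figure above; how the plugin actually chooses and applies its limit is an assumption here, and the address and passwords are invented.

```python
from collections import defaultdict

MAX_SHOWN = 5  # matches the 5 attempts visible in the figure; assumed, not confirmed


def first_calls_per_address(calls, limit=MAX_SHOWN):
    """Group (address, text) pairs by address, keeping the first `limit` per address."""
    shown = defaultdict(list)
    for address, text in calls:
        if len(shown[address]) < limit:
            shown[address].append(text)
    return dict(shown)


# Eight hypothetical brute-force attempts from the same call site.
attempts = [
    (0x403000, f'LogonUserA(lpszPassword="guess{i}")') for i in range(8)
]
result = first_calls_per_address(attempts)
print(len(result[0x403000]))
```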
Retrieving system information: this time the API address is stored in the EAX register, which the plugin also resolves.
Figure 9: Querying system information
The sample also steals certificates:
Figure 10: Enumeration of certificates
Then it enumerates the usernames on the machine.
Figure 11: Username enumeration
The first step is submitting the sample to VMRay Analyzer. VMRay Analyzer executes the sample and logs its behavior, including the API calls it made. The full analysis is available as a downloadable ZIP file.
The second step is to download the ZIP file which contains a log of all API calls made by the sample.
Figure 12: Function Log Example
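For readers who want to look at the log outside of IDA, the following Python sketch shows one way to pull a text file out of a ZIP archive with the standard library. The member name `logs/function_log.txt` and its content are placeholders invented for this example; the real archive layout and log format are not described in this post.

```python
import io
import zipfile


def read_members(archive, predicate):
    """Return {member name: decoded text} for members whose name satisfies predicate."""
    out = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if predicate(name):
                out[name] = zf.read(name).decode("utf-8", errors="replace")
    return out


# Build a stand-in archive in memory so the sketch runs without a real report.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Hypothetical member names and content.
    zf.writestr("logs/function_log.txt", "CredEnumerateA(...) returned 0\n")
    zf.writestr("report/summary.txt", "unrelated\n")
buf.seek(0)

logs = read_members(buf, lambda n: n.endswith("function_log.txt"))
print(list(logs))
```

With a downloaded archive, the same function can be called with the path to the ZIP file instead of the in-memory buffer.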
Step three is to open the sample with IDA for static analysis. In the Output window the IDA plugin logs that it started.
Figure 13: The plugin logs that it’s loaded successfully
Step four is loading the downloaded, zipped report via the File/Load file menu.
Figure 14: Loading the sandbox report into the plugin
If necessary, the plugin helps with rebasing the sample and matching the memory regions to the PE sections. For this sample, the choice is self-explanatory, since only one section is a possible match. The plugin uses heuristics to calculate a match percentage.
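The two ideas in this step can be sketched in Python as follows. Rebasing is plain address arithmetic between the base address seen at run time and the base address of the IDA database. The byte-comparison score is a deliberately naive stand-in for the plugin’s heuristics, which are not documented here; all names and values are assumptions.

```python
def rebase(address: int, runtime_base: int, ida_base: int) -> int:
    """Translate an address observed at run time into the IDA database's address space."""
    return address - runtime_base + ida_base


def match_percent(region: bytes, section: bytes) -> float:
    """Percentage of identical bytes over the overlapping length (naive, assumed heuristic)."""
    n = min(len(region), len(section))
    if n == 0:
        return 0.0
    same = sum(a == b for a, b in zip(region, section))
    return 100.0 * same / n


# Illustrative values: the sample was loaded at 0x1200000 in the sandbox,
# while the IDA database uses 0x400000 as its image base.
print(hex(rebase(0x1201000, runtime_base=0x1200000, ida_base=0x400000)))
print(match_percent(b"\x90\x90\xcc\xc3", b"\x90\x90\x90\xc3"))
```

A score like this would be computed for every candidate pairing of memory region and PE section, and the best-scoring pairing offered to the analyst as the default.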
The user can choose to also show the timestamp of each API call. This can also be useful for showing the order of execution of subroutines in IDA.
Figure 15: Choosing a timestamp for each API call
For analyzing an unpacked payload, a malware sandbox is great for finding the main behaviors of a sample: extracting IOCs and network activity and establishing a timeline of events. However, for understanding the internal structure of the sample and finding every feature it is capable of, it’s even better to combine dynamic analysis with static analysis. The VMRay Analyzer IDA plugin makes this combination easier and helps in situations where static analysis would get stuck.
VMRay Analyzer customers can download the IDA plugin from the Integration page within the web interface.