A new malware family called Stealc was released recently. It is a piece of spyware designed to copy files, credentials and other sensitive information from the victim’s hard drive and make them available to the attacker. It also employs a variety of techniques to evade detection, including one technique based on remotely storing a hard disk-based hardware ID, which prevents the sample from infecting the same machine twice, and thus avoids revealing malicious behavior in any future analysis runs.
In this blog post, we want to highlight how Stealc, which has been shown to have parallels to other known malware families (such as Vidar, Raccoon, Mars and RedLine), steals sensitive data from its victims and how it tries to evade detection. We have supported our analysis with in-depth research into how this malware and its techniques have been implemented.
Malicious Behavior of Stealc
VMRay Platform uses a dynamic analysis method, which means that the malware is actually executed in a virtual environment, where the actions of the malware are recorded and later evaluated against detection rules to identify malicious behavior.
These detection rules are known as VMRay Threat Identifiers (VTIs). In our case, they reveal plenty of malicious behavior, ranging from capturing screenshots, reading sensitive e-mail and web browser data, to searching for cryptocurrency wallets (see Figure 1).
The VTIs also provide additional information on what exactly the malware is trying to do if you open one of the listings (see Figure 2).
In this case, the malware tried to find cryptocurrency wallets for Bitcoin, Ethereum and Electrum Bitcoin Wallet. Additionally, one of our triggers suggests that large amounts of data are uploaded to a remote server – a good indicator that some of the information is likely copied to the attacker (see Figure 3).
All of this behavior leads VMRay Platform to correctly conclude that this must be a malicious executable, classified as Spyware.
Now we have a better understanding of what kind of information the stealer is collecting, but it is still unclear how the information is sent to the attacker.
Let us look into the Network tab, which displays all the recorded network traffic mapped to the associated Windows API calls that were used to send or receive the data. This is a powerful tool to analyze network traffic – due to our unique and extensive monitoring approach, we can sometimes even capture communication data before it becomes encrypted.
Stealc network activity
The Network tab shows some interesting behavior (see Figure 4). We can see that the malicious sample tries to download a set of DLL files.
These libraries are used by benign popular applications such as web browsers to access and, if required, decrypt confidential information belonging to the application itself – Stealc abuses these libraries to collect the same confidential information, but with malicious intent (see Table below).
As an example, let’s take “sqlite3.dll” as shown in Figure 4 above. This library allows Stealc to access a local database created in the SQLite database engine, which is also the method employed by Mozilla Firefox for storing user session cookies. When an attacker obtains a session cookie, they can potentially use it to access the victim’s account without needing the second factor of authentication (e.g., a one-time password or a biometric). In essence, the attacker bypasses the 2FA mechanism by piggybacking on the authenticated session established by the victim.
Another use-case for these libraries is decrypting all saved login information, such as usernames, e-mail addresses and passwords.
|Library|Purpose|
|---|---|
|sqlite3.dll|Accessing SQLite databases, e.g., to extract cookies.|
|nss3.dll, freebl3.dll, softokn3.dll, mozglue.dll|These libraries provide low-level and security-related functionality to Mozilla Firefox, such as cryptographic algorithms, e.g., to decrypt passwords.|
|msvcp140.dll, vcruntime140.dll|Libraries related to Microsoft Visual C++ Redistributables, required for some of the functionality.|
Stealc uses the network connection not only to download additional DLLs, but also to receive its instructions from a remote server. The server responds, for example, with specific filenames to search for. Analyzing this network behavior becomes easier thanks to one of the best features of our platform for malware researchers: the function log (or “flog” for short). This file contains all observed calls to the Windows API in chronological order, with human-readable function and parameter names. There is a reason why we internally see the flog as the malware analyst’s Swiss Army knife.
In this case, we find the base64-encoded network communication string in the function log, as well as the decoded version, which allows us to analyze the communication in more depth (see Figure 5). In one of the exchanges, for example, the C2 server asks our sample to collect information regarding the MetaMask crypto wallet and other web browser extensions, mostly related to crypto wallets and password managers.
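When working with raw traffic captures or other logs that only contain the encoded form, the decoding step is easy to reproduce. The following is a minimal Python sketch; the function name `try_decode_base64` and the sample message are hypothetical placeholders and are not taken from a real Stealc exchange.

```python
import base64
import binascii
from typing import Optional


def try_decode_base64(value: str) -> Optional[str]:
    """Return the decoded text if `value` is valid base64, otherwise None."""
    try:
        raw = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        # Not valid base64 (wrong alphabet, bad padding, non-ASCII input).
        return None
    # Replace undecodable bytes instead of failing on binary payloads.
    return raw.decode("utf-8", errors="replace")


# Hypothetical placeholder message, encoded here only for demonstration.
sample = base64.b64encode(b"hypothetical C2 message").decode("ascii")
print(try_decode_base64(sample))
```

Strict validation (`validate=True`) keeps the helper from silently "decoding" arbitrary strings, which is useful when scanning a log in which most entries are not base64 at all.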
Stealc’s Encrypted Strings
Now that we have gathered all behavior-based information, we take a closer look into how the malware is implemented. A look at the code gives us a sense that this malware is likely written in C/C++. However, there are nearly no sensible human-readable strings present.
Since we already know that this malware searches for certain sensitive files, it must store file paths and search terms somewhere, but the strings associated with that process are nowhere to be found (see Figure 6).
Malware often uses obfuscation and encryption to try and hide important information, for example to evade static analysis tools such as antivirus signatures. In the case of Stealc, the strings are stored in an encrypted manner and are decrypted at runtime during the initialization step of the malware.
We have analyzed the sample to identify how the encryption takes place. This not only helps us to better understand the inner workings of the malware but also to develop a config extractor later on. Config extractors are a tremendously helpful addition to the VMRay Platform: they provide our customers with a malware family classification as well as high-quality IOCs by automatically extracting configuration data such as C2 URLs and encryption keys, without requiring manual reverse engineering.
We have identified the encryption algorithm as RC4, which matches earlier reports about Stealc. However, we also found an issue that all current Stealc decryptors seem to suffer from: randomly placed null bytes in the decrypted output. A closer inspection reveals a key difference between the usual RC4 implementation and the one in Stealc: a ciphertext byte is not XORed with the keystream if the result would be a null byte (demonstrated in pseudo-code in Figure 7).
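The deviation can be illustrated with a short Python sketch. It is our reading of the behavior described above rather than a reproduction of the sample's code: the key scheduling and keystream generation are standard RC4, and the function names `rc4_keystream` and `stealc_style_decrypt` are hypothetical.

```python
def rc4_keystream(key: bytes):
    """Yield the standard RC4 keystream for a non-empty `key`."""
    # Key-scheduling algorithm (KSA).
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA).
    i = j = 0
    while True:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) & 0xFF]


def stealc_style_decrypt(key: bytes, data: bytes) -> bytes:
    """RC4 decryption that leaves a byte untouched if XOR would yield 0x00."""
    out = bytearray()
    for cipher_byte, key_byte in zip(data, rc4_keystream(key)):
        plain_byte = cipher_byte ^ key_byte
        # Assumed deviation: skip the XOR when the result would be a null byte.
        out.append(plain_byte if plain_byte != 0 else cipher_byte)
    return bytes(out)


# Well-known RC4 test vector: key "Key", plaintext "Plaintext".
print(stealc_style_decrypt(b"Key", bytes.fromhex("BBF316E8D940AF0AD3")))
```

A standard RC4 decryptor emits `0x00` wherever a ciphertext byte happens to equal the keystream byte, which would explain the stray null bytes. With the extra condition, those positions keep their original value instead.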
One powerful feature is VMRay’s Function Strings, which is a collection of all strings that were passed as an argument to API calls during the analysis of a process. You can access this log file by going to the Behavior tab, selecting the relevant process and opening up “Extracted Function Strings” to download the file (see Figure 8).
YARA & Detection Engineering Tips
Another use of the function strings, other than helping malware researchers to extract useful information even for packed samples, is the possibility of writing YARA rules based on these runtime strings.
This gives detection engineers a way to identify certain malware families that is more resistant to code changes. Threat actors continually evolve their software and often evade existing YARA rules, so basing rules on runtime function strings makes them more robust.
Additionally, obfuscated scripts are challenging to detect as they can change widely from version to version or depending on the packer, but runtime strings often reveal unique identifiers or sometimes even the unobfuscated version of the script, which allows us to write YARA rules in these difficult cases as well. For Stealc, the function strings log indeed reveals the decrypted strings, which were harder to extract before (see Figure 9). Here, we see the decrypted configuration, including the expiration date and the URL to the C2 server.
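Before writing a rule, it can help to triage the downloaded function strings for obvious indicators. The Python sketch below pulls URL-like entries out of such a file. It assumes the file is plain text, which we have not specified here, and the helper name `extract_urls` and the sample strings (including the `example.invalid` host) are hypothetical placeholders.

```python
import re
from typing import List

# Deliberately loose pattern: anything that looks like an http(s) URL.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_urls(function_strings: str) -> List[str]:
    """Return unique URL-like strings in order of first appearance."""
    seen: List[str] = []
    for match in URL_RE.finditer(function_strings):
        url = match.group(0)
        if url not in seen:
            seen.append(url)
    return seen


# Hypothetical excerpt; real function strings files will look different.
excerpt = "\n".join([
    "kernel32.dll",
    "http://c2.example.invalid/gate.php",
    "http://c2.example.invalid/gate.php",
    "C:\\Users\\Public",
])
print(extract_urls(excerpt))
```

Candidates found this way still need manual review before they are promoted to YARA strings or IOCs, since benign URLs are passed to API calls as well.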
Evasion and Obfuscation Techniques used by Stealc
We have observed a number of evasion techniques that Stealc employs to avoid detection by antivirus and sandboxing technologies. One interesting technique uses a unique hardware identifier to limit a machine to just a single infection, which could be intended as an evasion technique on platforms where the hardware ID does not change between analysis runs. This would limit the detonation to the very first run and avoid revealing malicious behavior in future runs.
Additionally, we have identified common evasion techniques like checking for the size of RAM or refusing to run on machines with certain language settings. In summary, we have found the following evasion and obfuscation techniques.
Hardware ID check
Stealc uses the serial number of the volume on the main hard disk to generate a unique identifier (see Figure 10). This hardware ID is sent to the C2 server (see Figure 11), which likely checks if this ID has ever been seen before (and thus has been infected before already), in which case the sample is terminated.
While we do not have access to the server-side code where this logic is implemented, we think there are two likely explanations for this behavior: (1) to avoid reinfecting the same machine and collecting duplicate data, and (2) as an evasion technique against dynamic, behavior-based analyzers.
In the latter case, the first analysis would reveal malicious behavior while any analysis runs in the future on the same virtual machine would force the sample to terminate itself – if the volume serial number remained identical in-between runs and was thus banned.
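The effect on repeated analysis runs can be modeled in a few lines of Python. This is a purely hypothetical stand-in for the server-side logic, which we have not seen; the class name `HardwareIdRegistry` and the placeholder serial numbers are assumptions made for illustration.

```python
class HardwareIdRegistry:
    """Hypothetical model of a server that remembers hardware IDs."""

    def __init__(self) -> None:
        self._seen = set()

    def first_contact(self, hardware_id: str) -> bool:
        """Return True only the first time a given ID is reported."""
        if hardware_id in self._seen:
            return False
        self._seen.add(hardware_id)
        return True


registry = HardwareIdRegistry()

# An analysis VM that is reset to the same snapshot keeps its serial number.
static_serial_runs = [registry.first_contact("AAAA-0001") for _ in range(3)]

# An analysis VM whose volume serial number changes on every run.
fresh_serial_runs = [registry.first_contact(f"BBBB-{n:04d}") for n in range(3)]

print(static_serial_runs)
print(fresh_serial_runs)
```

Under this model, only the first run with a static serial number would detonate, while an environment that varies the volume serial number between runs would look like a new machine every time.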
Size of RAM
Execution is aborted if the machine has less than 1 GB of RAM, which is typical of some virtualized environments (see Figure 12).
Limit to certain languages
Stealc does not run on machines where the user language is set to Russian, Ukrainian, Belarusian, Kazakh or Uzbek. Because this list is hard-coded and does not appear to be configurable, it is a deliberate choice by the developers (see Figure 13).
Avoid antivirus emulators
Execution is aborted if the computer name is set to “HAL9TH” and the user name is “JohnDoe”, which indicates that the sample is being emulated by Windows Defender (see Figure 14).
Another check for antivirus sandboxing is implemented through a call to VirtualAllocExNuma, which is often not implemented in emulated environments.
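The environment checks described above (RAM size, language and emulator names) can be summarized as a small Python helper that reports which of them a given analysis environment would trigger. This is a sketch from the defender's perspective and not the sample's code: the names `AnalysisEnvironment` and `triggered_checks` are hypothetical, 1 GB is assumed to mean 1024³ bytes, languages are compared by name rather than by the identifiers the sample actually uses, and the VirtualAllocExNuma check is not modeled.

```python
from dataclasses import dataclass
from typing import List

GIB = 1024 ** 3  # Assumption: "1GB" is interpreted as 1024^3 bytes.
EXCLUDED_LANGUAGES = {"Russian", "Ukrainian", "Belarusian", "Kazakh", "Uzbek"}


@dataclass
class AnalysisEnvironment:
    ram_bytes: int
    user_language: str
    computer_name: str
    user_name: str


def triggered_checks(env: AnalysisEnvironment) -> List[str]:
    """List the modeled checks that would abort execution in `env`."""
    reasons = []
    if env.ram_bytes < GIB:
        reasons.append("ram_below_1gb")
    if env.user_language in EXCLUDED_LANGUAGES:
        reasons.append("excluded_language")
    # Both names must match, as described for the emulator check.
    if env.computer_name == "HAL9TH" and env.user_name == "JohnDoe":
        reasons.append("defender_emulator_names")
    return reasons


emulator_like = AnalysisEnvironment(512 * 1024 ** 2, "English", "HAL9TH", "JohnDoe")
workstation_like = AnalysisEnvironment(8 * GIB, "English", "DESKTOP-01", "alice")
print(triggered_checks(emulator_like))
print(triggered_checks(workstation_like))
```

A helper like this is mainly useful as a checklist when configuring an analysis machine so that a sample does not exit before showing any behavior.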
Indirect loading of DLL functions
Instead of importing functions statically, Stealc dynamically traverses the Process Environment Block to import DLL functions (see Figure 15), which is a well-known technique of dynamically resolving imports to avoid AV detection.
Encrypted strings / function names
As mentioned earlier, most strings, including function names, are base64 encoded and RC4 encrypted, and are only decrypted at runtime.
Stealc uses random bytes and jumps to confuse disassemblers and decompilers, thus impeding manual analysis by threat researchers. See the following code snippet where a random byte was placed in the middle of the code, surrounded by a jump instruction that would skip this random byte when executed.
During static analysis, this requires the disassembler to have a deeper understanding of the code to avoid being fooled. Most static analysis tools do not possess this understanding and thus produce an incorrect disassembly (see Figure 16). We fix this by defining the random byte as data and excluding it from being parsed as code, which reveals the correct disassembly.
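For many occurrences, the manual fix can be approximated with a script. The Python sketch below assumes the simplest possible encoding of the trick, a two-byte x86 short jump over exactly one byte (`EB 01`) followed by the junk byte, and overwrites the junk byte with a NOP (`0x90`). The exact byte pattern used by Stealc is not spelled out here, so both the pattern and the function name `neutralize_junk_bytes` are assumptions.

```python
JMP_SHORT_OVER_ONE = b"\xeb\x01"  # x86: jmp short to the byte after the next one
NOP = 0x90


def neutralize_junk_bytes(code: bytes) -> bytes:
    """Replace the byte skipped by each `EB 01` jump with a NOP."""
    patched = bytearray(code)
    i = 0
    while i + 2 < len(patched):
        if patched[i:i + 2] == JMP_SHORT_OVER_ONE:
            patched[i + 2] = NOP  # The skipped byte is never executed.
            i += 3
        else:
            i += 1
    return bytes(patched)


# push ebp; jmp +1; <junk 0xE8>; mov ebp, esp
original = bytes.fromhex("55eb01e88bec")
print(neutralize_junk_bytes(original).hex())
```

A raw byte scan like this can also match `EB 01` inside operands or data, so the result should be applied to known code regions only and verified in the disassembler.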
Note that the behavior-based monitoring approach utilized by VMRay Platform is not confused by this as static analysis via disassembly is not necessary to detect malicious behavior on our end.
Detecting the new versions of Stealc
While researching this malware family, we found a few versions very similar to Stealc but with key differences, suggesting that these are likely updated versions. In these cases, the developers removed some of the evasion techniques, namely the check for the RAM size (see Figure 12), as well as the checks designed to detect antivirus emulators (see Figure 14). Notably, they have also decided to encrypt those few strings that had remained unencrypted in the original version, probably to avoid purely YARA-based detection methods.
This also demonstrates our robustness against these kinds of changes as our behavior-based VMRay Platform still identifies Stealc as malicious in the newest version.
Based on this analysis, we have developed a config extractor which allows customers to peek into the configuration built into the executable by the attacker.
This enables our customers to upload a sample and get a listing of all the important configuration parameters, such as remote servers the malware communicates with, the expiration date and the encryption key (see Figure 17). In addition, the config extractor provides access to high-quality IOCs which can be used to proactively secure environments.
Here at VMRay, we are always on the lookout for malware families that have the potential to become a prominent tool among threat actors. Detecting these threats early allows us to investigate their behavior and be prepared to protect the assets of our customers.
One of the strengths of VMRay Platform is its ability to detect new attacks before malware families are known to researchers. The behavior-based analysis at the core of the detection engine can detect malicious behavior before static detection signatures are developed by threat researchers. In fact, threat researchers can further benefit from the detailed report on the behavior of the malware that our product generates, which can serve as a strong basis for additional in-depth manual analysis.
In malware research, one wants to quickly understand what a piece of malware does and how it does it. In this post, we have seen how different features of VMRay Platform, such as the VTIs, the function log, and the function strings, provide a quick but still deep overview of the core mechanisms of the malware in just a few minutes of analysis time.
Sample (Jan/Feb 2023)
Sample (Mar 2023)
Sample (April 2023)