In this post, we'll demonstrate a process for decoding a Visual Basic Script (.vbs) file that contains an encoded PowerShell script used to download Remcos malware from Google Drive.
We'll manually analyse and deobfuscate both the VBS and the PowerShell, and develop a decoder to obtain IOCs and decoded values.
File Link
Hash b632a2ab492dbe0f71c18cab99b61bded82cbb66696f2d30c9bc354605ebb136
Initial Analysis and Cleaning Up The Script
We can begin by moving the file into a safe analysis machine and unzipping it with the password "infected".
As the file is a .vbs script, we can directly open it inside of a text editor.
From here, we can immediately see a large number of comments with junk text. Note that each of these comments begins with a single quote (').
These comments do not provide any value to the script, so we can go ahead and remove them straight away using regex. You could also manually highlight and remove them if regex is not your thing.
I will go ahead and remove the comments with a regex of ^'.*\s+
^ - Only match at the start of each line (this avoids removing any quotes that are used in strings or "legitimate" places)
' - Look for a single quote (at the start of each line)
.* - Grab everything after the single quote
\s+ - Grab any spacing at the end of the line (useful for removing newlines)
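The same clean-up can be sketched in Python for anyone who prefers a script to an editor's "Replace All". This is a minimal sketch rather than the tooling used in this post: the sample text and the strip_comment_lines name are made up for illustration, and re.MULTILINE is needed so that ^ matches at the start of every line instead of only at the start of the file.

```python
import re

# Same pattern as above; MULTILINE makes ^ match at the start of each line.
COMMENT_LINE = re.compile(r"^'.*\s+", re.MULTILINE)


def strip_comment_lines(vbs_source: str) -> str:
    """Remove lines starting with a single quote, plus the whitespace after them."""
    return COMMENT_LINE.sub("", vbs_source)


# Hypothetical sample, not taken from the real script.
sample = (
    "'junk comment one\n"
    'Set shell = CreateObject("WScript.Shell")\n'
    "'more junk text\n"
    'msg = "it\'s kept"\n'
)

cleaned = strip_comment_lines(sample)
print(cleaned)
```

Because \s+ also matches newlines, it swallows blank lines and any indentation that directly follows a removed comment. That is harmless here, but worth knowing if the indentation matters to you.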
After hitting "Replace All", this regex was able to remove 1516 lines from the code.
The script now has 80 lines remaining (from a previous 1609).
We can now begin to see some functionality related to grabbing the local time from the system using WMI objects. This doesn't look super interesting so I'll scroll down and come back later.
There are some seemingly random variables being created. Some contain integers, and some contain junk text.
These don't seem to provide any value, but they also don't take up too much space. I'll go ahead and leave them and move on.
Scrolling down more, we can see a reference to WScript.Shell, as well as a partial reference to PowerShell. Following the PowerShell reference, there appears to be a PowerShell script that has been broken up into multiple pieces.
I will go ahead and focus on the "broken" script, assuming that the aim of the initial obfuscated piece is to use WScript.Shell to execute the obfuscated PowerShell command.
By temporarily disabling word wrapping, you can obtain a clearer overview of the obfuscated PowerShell script.
Identifying the Embedded PowerShell Script
Here we can see that the script is broken up into about 20 strings which are all concatenated together.
Now you could manually take each line and add them together, but instead, I will use regex again to clean everything up.
I will begin by copying the PowerShell strings into a new file and removing the "Randomize" line seen in the previous screenshot on line 69.
A new file allows me to attempt decoding without "breaking" the original .vbs script. This also allows me to return to the previous script if I need additional context on the decoded content.
I will go ahead and remove the string concatenation at the beginning of each line. This can be done manually or with a regex.
The results should look something like this.
I will also go ahead and remove the quotes at the beginning and end of each line.
This can be done manually or with a regex, whichever is preferred. I personally used the regex "\s+", which removes any pair of quotes with only whitespace (\s) in between (e.g. a quote, followed by a newline, followed by a quote).
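For reference, the quote-joining step can be sketched in Python as follows. The chunks below are hypothetical stand-ins for the real PowerShell fragments, the join_quoted_chunks name is invented for this example, and the sketch assumes the concatenation prefix has already been removed so that each line is just a quoted string.

```python
import re


def join_quoted_chunks(text: str) -> str:
    """Merge adjacent quoted chunks, then drop the one remaining outer pair of quotes."""
    # A closing quote, some whitespace (e.g. a newline), then an opening quote.
    merged = re.sub(r'"\s+"', "", text.strip())
    if len(merged) >= 2 and merged[0] == '"' and merged[-1] == '"':
        merged = merged[1:-1]
    return merged


# Hypothetical chunks, one quoted string per line.
chunks = '"Write-"\n"Host "\n"hello"'

joined = join_quoted_chunks(chunks)
print(joined)
```

Since the pattern requires at least one whitespace character between the two quotes, an escaped quote inside a VBS string (written as two quotes back to back) is left alone.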
After applying this regex and changing the syntax highlighting from "Visual Basic" to "PowerShell", we are left with the following content.
Beginning of PowerShell Script Analysis
We can see that the resulting PowerShell begins with a Minimif function, followed by lots of calls to Minimif and some more encoded values.
Before proceeding, I will go ahead and run the script through a generic beautifier. This is to add newlines and spacing that will make the script much easier to read.
Note that a generic beautifier has a tendency to break PowerShell scripts, but this is fine since we don't intend to execute it.
Moving the beautified script back into a text editor, we can see that it consists almost entirely of obfuscated values being passed to Minimif.
Analysing the Obfuscation Routine
The Minimif function begins to make sense if we give each variable a meaningful name.
At first glance, the function appears to take every 8th character of each encoded string. It steps through the string, taking the characters at positions 8, 16, 24 and so on, all the way to the end of the string.
Verifying The Obfuscation
With a theory that the decoding is taking the 8th character from each string, we can go ahead and verify this with a single encoded string.
Here is the first encoded string from the PowerShell Script.
Deobfuscation With Python
Using a simple Python script, we can test the decoding method. The first value immediately decodes to a URL for a Google Drive file.
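A decoder along these lines can be very small. The sketch below is an assumed reconstruction based on the description above, not a copy of the script used in this post: it treats "8th character" as positions 8, 16, 24 and so on (index 7 when counting from zero), and encode_for_demo is a hypothetical helper that exists only to build test input, since the real encoded strings are not reproduced here.

```python
def decode(encoded: str, step: int = 8) -> str:
    """Keep every `step`-th character (positions 8, 16, 24, ... by default)."""
    return encoded[step - 1::step]


def encode_for_demo(plain: str, step: int = 8, filler: str = "#") -> str:
    """Hypothetical inverse of decode(), used only to create sample input."""
    return "".join(filler * (step - 1) + ch for ch in plain)


# Placeholder value, not an IOC from the sample.
demo = encode_for_demo("https://example.com")

print(demo[:24])      # first three 8-character blocks of the padded string
print(decode(demo))   # the hidden characters, one per block
```

Slicing keeps the decoder to a single expression and naturally ignores any trailing characters that do not fill a complete block of 8.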
Instead of using Python, we could also go ahead and use another regex to decode the encoded text.
Deobfuscation With Regex
The below regex looks for blocks of 8 characters and stores the 8th character of each block in a capture group. This capture group can be referenced as $1 in the replacement.
This regex is able to decode the text, obtaining the same value as the Python script.
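To check that the regex and slicing approaches agree, the sketch below applies a capture-group pattern in Python. The pattern .{7}(.) is an assumed equivalent of the one described above (seven characters to skip, then one to keep), and the sample string is made up for illustration.

```python
import re

# Seven characters to skip, then one captured character to keep.
EIGHTH_CHAR = re.compile(r".{7}(.)", re.DOTALL)


def decode_with_regex(encoded: str) -> str:
    """Join the captured 8th character of every complete 8-character block."""
    return "".join(EIGHTH_CHAR.findall(encoded))


# Hypothetical encoded value: four blocks hiding "pwsh".
sample = "0000000p1111111w2222222s3333333h"

print(decode_with_regex(sample))
print(sample[7::8])  # the slicing approach gives the same result

# A find-and-replace with the capture group leaves an incomplete final block behind.
replaced_with_tail = EIGHTH_CHAR.sub(r"\1", sample + "xyz")
print(replaced_with_tail)
```

The last line shows one difference to keep in mind: a find-and-replace keeps any leftover characters that do not form a full block of 8, whereas collecting the matches discards them.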
We can use this to our advantage and decode the remaining values using CyberChef.
Deobfuscation Using CyberChef
We can begin by prototyping a Regular Expression that takes the original PowerShell script and obtains all values between quotes.
The regex '[^']+' can achieve this. It looks for a single quote, followed by one or more characters that are not a single quote, and ending with another single quote.
Here we can use the Regular Expression and "Highlight Matches" functions to confirm our prototype.
With the Regular Expression working as intended, we can change "Highlight Matches" to "List Matches".
This will list only the encoded values in the script.
From here we can go ahead and apply a "Fork", which means we can act on each encoded value individually. We can also go ahead and remove the single quotes from each line.
After applying the fork and removing quotes, we should have something like this: all of the encoded values, separated by newlines. It looks like junk, but we'll fix that in a second.
With the output looking as expected, we can go ahead and apply our previous regex to the CyberChef Recipe.
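For anyone who would rather script this than use CyberChef, the recipe steps map onto a few lines of Python. This is a sketch under the same assumptions as before (every 8th character is kept), and the script text and pad helper below are hypothetical: they only imitate the shape of the real PowerShell, with encoded values in single quotes passed to Minimif.

```python
import re

# Single quote, one or more non-quote characters, single quote.
QUOTED = re.compile(r"'[^']+'")


def decode(encoded: str, step: int = 8) -> str:
    """Keep every `step`-th character of the encoded string."""
    return encoded[step - 1::step]


def decode_all(powershell_source: str) -> list[str]:
    """List every quoted value, strip its quotes and decode it individually."""
    matches = QUOTED.findall(powershell_source)   # "List matches"
    return [decode(m[1:-1]) for m in matches]     # fork, remove quotes, decode


def pad(plain: str) -> str:
    """Hypothetical helper that hides each character behind 7 filler characters."""
    return "".join("#" * 7 + ch for ch in plain)


# Made-up script fragment, not the real sample.
script = f"$a = Minimif '{pad('iex')}'; $b = Minimif '{pad('AppData')}'"

decoded_values = decode_all(script)
print(decoded_values)
```

Decoding each match separately plays the role of the "Fork": one value that decodes badly does not affect the others.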
Final Output
Applying the above recipe, each of the encoded lines will be individually decoded according to the regex we provided.
We can now see all decoded values from the PowerShell script.
This includes references to the Google Drive URL, PowerShell, BitsTransfer, the AppData folder, as well as additional base64 encoding.
The combination of these values implies that the script uses PowerShell to download a base64-encoded file to the AppData folder. The download is performed over BITS (Background Intelligent Transfer Service), using the BitsTransfer PowerShell module.
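Once the values are decoded, pulling out network IOCs can also be scripted. The sketch below is only illustrative: the decoded values are placeholders rather than the real output, the extract_urls name is invented for this example, and the URL pattern is deliberately simple.

```python
import re

# Deliberately simple: scheme, then anything up to whitespace or a quote.
URL = re.compile(r"https?://[^\s'\"]+")


def extract_urls(values: list[str]) -> list[str]:
    """Return the unique URLs found across all decoded values, sorted."""
    found = set()
    for value in values:
        found.update(URL.findall(value))
    return sorted(found)


# Placeholder decoded values, NOT the real IOCs from the sample.
decoded_placeholder = [
    "https://example.invalid/download?id=PLACEHOLDER",
    "BitsTransfer",
    "$env:APPDATA",
]

urls = extract_urls(decoded_placeholder)
print(urls)
```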
At this point, the script is now successfully decoded and IOCs obtained.
Conclusion
We've now successfully decoded the script and obtained all decoded values. We manually analysed a script and removed decoy comments, identified an embedded PowerShell script, and ultimately extracted and decoded all encoded values.
We've also looked at a simple but interesting method of obfuscation and demonstrated multiple means of successfully decoding (Python, Regex/CyberChef).