Anti-Hooking checks of SmokeLoader 2018

SmokeLoader is a fairly old but still very popular bot, generally used to drop other malware families or to deploy additional modules that implement some nice features.

The other day I was checking a sample from a recent campaign, and while stepping through the loader I found some interesting stuff I hadn't seen before. In the latest releases of the 2018 version of SmokeLoader, the authors have implemented some anti-debugging checks, as well as anti-VM, anti-disassembly and general anti-analysis tricks.

CERT Polska did a great job describing most of them. Some are really neat, like the one that calculates the address of the next instruction based on the values of BeingDebugged and NtGlobalFlag.

However, the anti-hooking check is not described in that blog post (maybe it wasn't present yet), and it actually prevents SmokeLoader from detonating on Cuckoo Sandbox and possibly other sandboxes.

The assembly snippet of this check is the following:

I've commented the assembly to make it easier to read.

Basically, the bot has a list of Windows functions that it checks for userland hooks. To do this, it compares the first bytes of each function with hardcoded byte patterns commonly used when hooking functions in userland.

Looking at Cuckoo's monitor source code, we can see that it uses the same opcodes for hooking.

The Windows functions checked at this stage are:

ntdll.ZwOpenProcess
ntdll.ZwTerminateProcess
ntdll.ZwCreateSection
ntdll.ZwMapViewOfSection
ntdll.ZwUnmapViewOfSection
ntdll.ZwClose
ntdll.ZwAllocateVirtualMemory
ntdll.ZwFreeVirtualMemory
ntdll.ZwWriteVirtualMemory
ntdll.LdrLoadDll

Finally, if three or more of these functions are detected to be hooked, SmokeLoader simply terminates its execution.

This isn't a new technique to detect hooks, but it's always nice to see these checks implemented in real-world malware.

You may find the hashes and samples in the Malware Traffic campaign post, but I've also uploaded the unpacked SmokeLoader sample to VT: 26f02a2ed9a1f1902862101f70e361d7

Reverse engineering a CS:GO cheating software

TL;DR: A technical, low-level analysis of the cheat, including its licensing and the differences between the public and private versions.

CS:GO is one of the most popular competitive online games, with 520,285 current players as I write these lines. As in any other competition-driven game, cheaters arise, and especially in the CS community they have become a serious problem.

Today we are taking a look at the public and private versions of a cheat for this game!

I won't mention the name of the cheat, to avoid giving the developers free advertising and because it's not necessary for this post, but if you're into this topic, you'll probably guess which one it is.

Before we start, it's important to mention that I got hold of a private version build through an alternative channel 😈. This means I never paid the developer, so I didn't support their business in any way! Damn you, cheaters!

Public vs Private version

This cheat is quite accessible, as the developer provides a public (free) version with all the features for users to try. The main "downside" is that the public cheat is, obviously, detected by VAC, so if you use it on a VAC-protected server, it's only a matter of time before your account gets VAC-banned.

Here is where the paid private version comes into play: Customers get a unique build that is guaranteed to be undetected.

Licensing

Each private version build of the cheat is tied to a machine, to prevent piracy, reselling, and so on.

The licensing routine reads the SystemDrive environment variable and, using DeviceIoControl with IOCTL_DISK_GET_DRIVE_GEOMETRY, queries the geometry of that hard drive. It then reads the Processor Brand String using the cpuid instruction.

This information is formatted into a string, hashed with SHA1, and mutated with a custom ASCII rotation algorithm:

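#include <string.h>

/* Cleaned-up version of the decompiled license mutation loop.
   Digits '0'-'9' are shifted by '!' (0x21), i.e. mapped to 'Q'-'Z';
   every other character (the hex letters) is shifted by 5. */
static void mutate_license(char *sha1_hex)
{
    for (size_t i = 0; i < strlen(sha1_hex); i++) {
        unsigned char c = (unsigned char)sha1_hex[i];
        if ((unsigned int)(c - '0') > 9)   /* not a digit */
            sha1_hex[i] = (char)(c + 5);
        else                               /* digit */
            sha1_hex[i] = (char)(c + '!');
    }
}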

The resulting string is your unique license, which you send to the cheat developer when you buy it. In return, you get a build that only works on the computer that generated this license.

How the cheat works

This is an external cheat, which means all the work is done outside the CS:GO process (no DLL injection).

The first thing it does is open the csgo.exe process and get the base addresses of client.dll and engine.dll.

Then it uses byte patterns to find game structures (offsets) in memory. These patterns usually match opcodes in the game binaries that reference memory pointers or other useful information. The cheat also uses patterns to find game functions and strings.

For example, one of the patterns is:

89 0D ? ? ? ? 8B 0D ? ? ? ? 8B F2 8B C1 83 CE 08

If we look for these bytes in the client.dll file, we get the following hit:

0x102bdf1d  890de815f214  mov dword [0x14f215e8], ecx
0x102bdf23  8b0d5ccaec12  mov ecx, dword [0x12ecca5c]
0x102bdf29  8bf2          mov esi, edx
0x102bdf2b  8bc1          mov eax, ecx
0x102bdf2d  83ce08        or esi, 8

This means the pattern is looking for one of the global memory references in the first two lines of the disassembly.

As mentioned, the cheat also uses patterns to locate game functions. For instance, with the following pattern it locates the start of the function the game uses to execute in-game console commands:

55 8B EC 8B ? ? ? ? ? 81 F9 ? ? ? ? 75 0C A1 ? ? ? ? 35 ? ? ? ? EB 05 8B 01 FF 50 34 50 A1

This one is found in engine.dll:

     0x100aa300  55            push ebp
     0x100aa301  8bec          mov ebp, esp
     0x100aa303  8b0d54345b10  mov ecx, dword [0x105b3454]
     0x100aa309  81f938345b10  cmp ecx, 0x105b3438
 ,=< 0x100aa30f  750c          jne 0x100aa31d
 |   0x100aa311  a168345b10    mov eax, dword [0x105b3468]
 |   0x100aa316  3538345b10    xor eax, 0x105b3438
,==< 0x100aa31b  eb05          jmp 0x100aa322
|`-> 0x100aa31d  8b01          mov eax, dword [ecx]
|    0x100aa31f  ff5034        call dword [eax + 0x34]
`--> 0x100aa322  50            push eax
     0x100aa323  a1f8325a10    mov eax, dword [0x105a32f8]
     [...]

If the cheat wants to run an in-game console command, it can allocate memory in the game process, write the arguments into that memory, and then call CreateRemoteThread with the start address set to the beginning of this function.

Once the cheat has located everything it needs, it starts a bunch of threads, each implementing one of the features. These threads monitor and manipulate the game memory using ReadProcessMemory and WriteProcessMemory.

By changing the values of the internal game structures at will, the cheat implements the features it offers.

I have identified some of the functions and renamed them in my pseudocode:

CreateThread(0, 0, (LPTHREAD_START_ROUTINE)aimassist, 0, 0, 0);
CreateThread(0, 0, (LPTHREAD_START_ROUTINE)aimlock, 0, 0, 0);
CreateThread(0, 0, (LPTHREAD_START_ROUTINE)bunnyhop, 0, 0, 0);
CreateThread(0, 0, (LPTHREAD_START_ROUTINE)anti_flash, 0, 0, 0);
CreateThread(0, 0, (LPTHREAD_START_ROUTINE)sub_403F0E, 0, 0, 0);
CreateThread(0, 0, (LPTHREAD_START_ROUTINE)esp_hack, 0, 0, 0);
CreateThread(0, 0, (LPTHREAD_START_ROUTINE)radar_hack, 0, 0, 0);
CreateThread(0, 0, (LPTHREAD_START_ROUTINE)kill_message, 0, 0, 0);
// Poll every 100 ms until all eight flags are set (presumably by the threads above once ready)
while ( !byte_4F1081
     || !byte_4F1054
     || !byte_4F1082
     || !byte_4F10C9
     || !byte_4F1062
     || !byte_4F1040
     || !byte_4F1090
     || !byte_4F1028 )
  Sleep(0x64u);
// Default config
cfg_antiflash = 1;
cfg_aimlock = 1;
cfg_killmessage = 1;
cfg_radarhack = 1;
byte_4F1032 = 0;
cfg_glowesp = 1;
byte_4F10C0 = 0;
cfg_bunnyhop = 1;
cfg_aimassist = 1;
cfg_reload();
// WaitForSingleObject returns WAIT_OBJECT_0 (0) once csgo.exe exits; until then keep applying config changes
while ( WaitForSingleObject(csgo_prochandler, 0) != 0 )
  cfg_changes_loop();
CloseHandle(csgo_prochandler);
j_exit(0);

Private version protection

The public version is poorly protected: the strings are encrypted with a simple algorithm, but there is no code obfuscation or PE packing.

On the other hand, the private version is protected with Themida, a commercial packer that, depending on its configuration, can be quite effective at protecting executables.

It's very likely that they use Themida for two purposes:

  1. Protect the license check from being patched. The program could be manipulated at runtime to accept any license on a given computer, but reconstructing a fully working version of the packed executable and patching it can be quite tricky.
  2. Second, and most importantly, prevent VAC signatures from detecting the cheat while it runs. Themida can protect the program's original opcodes even while it is loaded in memory and running, and writing signatures (patterns) for those opcodes is one of the methods VAC uses to detect cheaters.

Closing

Compared to other cheats, this one is simple in terms of functionality, but still quite effective.

Bear in mind that the CS:GO binaries used for the analysis are not from the latest game update, as I wrote this post a week ago. The binaries I used are:

942fa5d3ef9e0328157b34666327461cee10aae74b26af135b8589dc35a5abc7 client.dll
e6f3eda5877f2584aeb43122a85d0861946c7fb4222d0cb6f3bc30034e4d3e24 engine.dll
1a5bb2b0ae9f2e6ef757c834eeb2c360a59dce274b2e4137706031f629e6455f csgo.exe

This means that the cheat signatures may have been slightly modified to work with the new executables, and the offsets probably won't be the same if these binaries changed in the latest version of the game.

Sricam AP003 by Sricctv

I got this camera not long ago, and as usually happens, in addition to its main purpose it provided a few hours of fun!

It's currently one of the cheapest wireless IP cameras: you can find it for around $40 depending on the store. The manufacturer is Sricctv, a Shenzhen-based company specializing in CCTV.

It uses Linux 2.6 and has a MIPS processor (MIPS 24K V4.12).

Firmware

I couldn't find the firmware on the official website, and the vendor didn't agree to send me the latest version. Luckily, I got hold of the firmware for a camera similar to mine, so I could study the system a bit without messing with the hardware.

The firmware file format is pretty straightforward: a 32-byte header string, the package size as a 4-byte little-endian value, a ZIP file with the contents, and a 32-byte footer:

00000000 77 69 66 69 2d 63 61 6d 65 72 61 2d 73 79 73 2d |wifi-camera-sys-|
00000010 71 65 74 79 69 70 61 64 67 6a 6c 7a 63 62 6d 6e |qetyipadgjlzcbmn|
00000020 43 17 05 00 50 4b 03 04 0a 00 00 00 00 00 e7 7e |C...PK.........~|
*
00051760 00 07 13 05 00 00 00 77 69 66 69 2d 63 61 6d 65 |.......wifi-came|
00051770 72 61 2d 65 6e 64 2d 6e 76 78 6b 68 66 73 6f 75 |ra-end-nvxkhfsou|
00051780 74 65 71 7a 68 70 6f |teqzhpo|

The upgrader handles two types of upgrades: system upgrades and web app upgrades.

- System upgrades overwrite the main system binaries, located in /system/system/, and have this header and footer combination: wifi-camera-sys-qetyipadgjlzcbmn, wifi-camera-end-nvxkhfsouteqzhpo.

- Web app upgrades overwrite the contents in /system/www/ and have this header and footer combination: wifi-camera-app-qazwsxedcrfvtgba, wifi-camera-end-yhnujmzaqxswcdef.

Interestingly, web app upgrades are expected to contain a password-protected ZIP file, but system upgrades are not. As the upgrader is in the system firmware image, we can look at the binary and locate the hardcoded password.

Telnet access for everyone!

While there are several ports listening on the camera, the most interesting are probably 23 (telnet) and 81 (default HTTP panel). After extracting the firmware, we found a nice string:

root:LSiuY7pOmZG2s:0:0:Administrator:/:/bin/sh

The password hash for root (a traditional DES crypt hash) is easy to crack: 123456

So now we can log in to the camera via telnet with the root account and access the whole file system.

This can also be used to disclose the whole configuration, including the username and password of the web panel's admin account:

$ telnet 192.168.1.111
Trying 192.168.1.111...
Connected to 192.168.1.111.
Escape character is '^]'.

(none) login: root
Password:

BusyBox v1.12.1 (2013-03-02 13:26:40 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.

# cd /system/www/

# cat system.ini

...

We can change the root password, but as the password file is in a volatile partition of the file system, the default will be restored after a reboot.

Connect the camera and say Hi to the Internet!

This bothered me a lot when I connected the camera for the first time. If your router has UPnP enabled, which is very common in SOHO routers, the camera uses this protocol to open an external (Internet-facing) port on your router and forward it to the port where the web management service is listening. By default, this port is 81.

If you haven't set up your credentials yet, the camera is wide open to everyone. And if a vulnerability is found in the service, no matter what your configuration is, the camera will be exposed to prying eyes.

This is probably a "convenience" for non-technical users, so they can connect from external networks using the P2P app provided by the vendor. The camera also gets the external IP of your network by connecting to www.ip138.com, so the app knows where to connect.

Conclusion

If you care about your privacy, this camera is not for you. I guess you get what you pay for: the camera has good specifications and performance, but the software design is just horrible.

rdtsc x86 instruction to detect virtual machines

A new version of pafish has recently been released. It comes with a set of detections that are completely new to the project (read: not new techniques), based on CPU information. To get this information, the code uses the rdtsc and cpuid x86 instructions.

Here we are going to look at the rdtsc technique and how it is used to detect VMs.

What is rdtsc?

Wikipedia's description is pretty straightforward [1]:

The Time Stamp Counter (TSC) is a 64-bit register present on all x86 processors since the Pentium. It counts the number of cycles since reset. The instruction RDTSC returns the TSC in EDX:EAX. In x86-64 mode, RDTSC also clears the higher 32 bits of RAX and RDX. Its opcode is 0F 31.

So it is a counter incremented on each CPU cycle.

Well, it actually depends on the processor.

Initially, this value counted the actual internal processor clock cycles. It was meant for developers to measure how many cycles a routine takes to complete, so it was good for measuring performance.

In recent Intel processor families, this counter increases at a constant rate, determined by the maximum frequency the processor could run at during that boot. Maximum does not mean current, as power-saving features can dynamically change the processor's speed. This means it is no longer good for measuring performance, because the processor frequency can change at runtime and skew the metric. On the other hand, it can now be used to measure time.

This is explained much better in reference [2].

So, how is this used to detect VMs?

On a physical (host) system, subtracting the counter values returned by two consecutive rdtsc instructions yields a very small number of cycles.

On the other hand, doing the same on a virtualized (guest) system, the difference can be much bigger. This is caused by the overhead of running inside the virtual machine.

I wrote a small program to verify this behaviour: it performs the subtraction ten times, sleeping for a while in between. You can get the source from here.

This is similar to what pafish does. The output on a physical machine looks like this:

$ gcc -O2 vmfun.c && ./a.out
(81889337556698 - 81889337556746) rdtsc difference: 48
(81891335245484 - 81891335245508) rdtsc difference: 24
(81893332927964 - 81893332927988) rdtsc difference: 24
(81895330659684 - 81895330659708) rdtsc difference: 24
(81897326984696 - 81897326984720) rdtsc difference: 24
(81899324782460 - 81899324782520) rdtsc difference: 60
(81901322471630 - 81901322471690) rdtsc difference: 60
(81903320069632 - 81903320069656) rdtsc difference: 24
(81905317727808 - 81905317727832) rdtsc difference: 24
(81907314531066 - 81907314531078) rdtsc difference: 12
difference average is: 32

Try compiling and running this code with different compiler optimizations if you want to have some fun ;)

This is the theory, but in practice it depends on the virtualization product, its configuration, and the number of cores assigned to the guest system.

For instance, VMware virtualizes the TSC by default. This can be disabled, although it is not recommended, and TSC virtualization can also be tweaked in the configuration. References [3] and [4] have much more information about this.

There is also a substantial difference when the VM has two or more cores assigned. With one core, the differences are not that big and get close to those of a physical processor, although occasional peaks happen. With two or more cores, the differences are consistently much bigger.

I suspect the second behaviour is caused by CPU ready times, which is explained in references [5] and [6].

Have a look at the following example in VirtualBox:

One core assigned, note the peaks

Two cores assigned, the differences are large and consistent

So we can conclude two things.

The first one is that this method is not always reliable, as it heavily depends on the processor and the virtualization product.

The second one is that, if I were running a sandbox cluster, I would try to assign only one core to each guest machine. Not only because it makes this method less reliable, but also for performance reasons.

Our fabulous sandbox uses an emulator instead of a VM, should I care about this?

Well, generally speaking, you should not need to care about this specific method. Emulators replicate the whole machine's hardware, including the CPU at the lowest level (binary translation), so they have their own TSC implementation, and the cycle count for a routine should be similar to that of a physical CPU.

We can verify this by running our test program in QEMU:

QEMU is nice

I hope you enjoyed the post and this new pafish release, thanks to mlw.re members for helping me with the tests :)

Check out the references for more information on this topic and for a general understanding of how VMs and emulators work!

References:

[1] https://en.wikipedia.org/wiki/Time_Stamp_Counter

[2] https://randomascii.wordpress.com/2011/07/29/rdtsc-in-the-age-of-sandybridge/

[3] http://www.vmware.com/files/pdf/VMware_paravirtualization.pdf

[4] http://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf

[5] https://virtualblocks.wordpress.com/2010/06/22/cpu-ready-over-built-vm-or-over-utilized-host/

[6] http://www.spug.co.uk/?p=294
