Reverse engineering a CS:GO cheating software

TL;DR: A low-level technical analysis of the cheat, including its licensing scheme and the differences between the public and private versions.

CS:GO is one of the most popular competitive online games; it has 520,285 players online as I write these lines. As in any other competition-driven game, cheaters arise, and in the CS community especially they have become a serious problem.

Today we are taking a look at the public and private versions of a cheat for this game!

I won't mention the name of the cheat to avoid giving them free advertisement and because it's not necessary for this post, but if you're into this topic, you'll probably guess.

Before we start, it's important to mention that I managed to get a private version build through an alternative channel 😈. This means I never paid the developer, so I didn't support their business in any way! Damn you, cheaters!

Public vs Private version

This cheat is quite accessible, as the developer provides a public (free) version with all the capabilities for users to try. The most important "downside" is that the public cheat is obviously detected by VAC, so if you use it on a VAC-protected server, it's only a matter of time before your account gets VAC-banned.

This is where the paid private version comes into play: customers get a unique build that is guaranteed to be undetected.

Licensing

Each private build of the cheat is tied to a single machine to prevent piracy, reselling, and so on.

The license procedure gets the SystemDrive environment variable and, using DeviceIoControl with the IOCTL_DISK_GET_DRIVE_GEOMETRY control code, reads the geometry of that drive (cylinders, tracks per cylinder, sectors per track, bytes per sector). Then the Processor Brand String is read using the cpuid instruction.

This information is formatted into a string and hashed with SHA-1, and the hex digest is then mutated with a custom ASCII rotation algorithm:

for ( i = 0; i < v16; v16 = strlen((const char *)&sha1_hex) ) {
    v18 = *((char *)&sha1_hex + i);
    if ( (unsigned int)(v18 - '0') > 9 )
        *((_BYTE *)&sha1_hex + i) = v18 + 5;    // not a digit: shift by 5
    else
        *((_BYTE *)&sha1_hex + i) = v18 + '!';  // digit: shift by '!' (0x21)
    ++i;
}

The resulting string is your unique license. You send it to the cheat developer when you buy the cheat, and in return you get a build that only works on the computer that generated this license.
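To make the rotation concrete, here is a minimal C sketch of the same transformation, written as a standalone function. The name mutate_license is made up for this example, and the sketch assumes the SHA-1 digest is a lowercase hex string; the decompiled loop does not tell us which case the cheat uses.

```c
#include <stddef.h>
#include <string.h>

/* Apply the rotation from the decompiled loop, in place:
 * '0'..'9' are shifted by '!' (0x21), every other byte by 5.
 * mutate_license is a hypothetical name, not taken from the cheat. */
void mutate_license(char *hex)
{
    size_t len = strlen(hex);

    for (size_t i = 0; i < len; ++i) {
        unsigned char c = (unsigned char)hex[i];

        if ((unsigned int)(c - '0') > 9)
            hex[i] = (char)(c + 5);    /* 'a'..'f' become 'f'..'k' */
        else
            hex[i] = (char)(c + '!');  /* '0'..'9' become 'Q'..'Z' */
    }
}
```

With lowercase input, the sixteen hex symbols 0123456789abcdef map to QRSTUVWXYZfghijk, so the license is just the digest in a different alphabet and the mapping is trivially reversible.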

How the cheat works

This cheat is an external cheat, which means all the work is done outside the CS:GO process (no DLL injection).

The first thing it does is open the csgo.exe process and get the base addresses of client.dll and engine.dll.

Then it uses byte patterns to find game structures (offsets) in memory. These patterns usually match opcodes in the game binaries where memory pointers, or other useful values, are referenced. The cheat also uses patterns to find game functions and strings.

For example, one of the patterns is:

89 0D ? ? ? ? 8B 0D ? ? ? ? 8B F2 8B C1 83 CE 08

If we look for these bytes in the client.dll file, we get the following hit:

0x102bdf1d  890de815f214  mov dword [0x14f215e8], ecx
0x102bdf23  8b0d5ccaec12  mov ecx, dword [0x12ecca5c]
0x102bdf29  8bf2          mov esi, edx
0x102bdf2b  8bc1          mov eax, ecx
0x102bdf2d  83ce08        or esi, 8

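The wildcards (?) cover the two 4-byte absolute addresses, which change between builds, while the fixed bytes are the surrounding opcodes. So this pattern is looking for one of the global memory references in the first two disassembly lines.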
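Matching a pattern with wildcards is simple to sketch. The C code below is not taken from the cheat; it is a minimal illustration that works on a plain byte buffer (for example, the contents of a DLL read from disk). The names pat_byte, parse_pattern, find_pattern and read_u32_le are all made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One pattern element: a byte value, or a wildcard that matches anything. */
typedef struct {
    uint8_t value;
    int     wildcard;
} pat_byte;

/* Parse text such as "89 0D ? ? 8B" into out. Tokens are assumed to be
 * two hex digits or a single '?'. Returns the number of elements parsed. */
size_t parse_pattern(const char *text, pat_byte *out, size_t max)
{
    size_t n = 0;

    while (*text != '\0' && n < max) {
        if (*text == ' ') {
            ++text;
        } else if (*text == '?') {
            out[n].value = 0;
            out[n].wildcard = 1;
            ++n;
            ++text;
        } else {
            char *end;
            unsigned long v = strtoul(text, &end, 16);

            if (end == text)
                break;                 /* not a hex token: stop parsing */
            out[n].value = (uint8_t)v;
            out[n].wildcard = 0;
            ++n;
            text = end;
        }
    }
    return n;
}

/* Return the offset of the first match of pat in buf, or -1 if none. */
long find_pattern(const uint8_t *buf, size_t buf_len,
                  const pat_byte *pat, size_t pat_len)
{
    if (pat_len == 0 || pat_len > buf_len)
        return -1;
    for (size_t i = 0; i + pat_len <= buf_len; ++i) {
        size_t j = 0;

        while (j < pat_len && (pat[j].wildcard || buf[i + j] == pat[j].value))
            ++j;
        if (j == pat_len)
            return (long)i;
    }
    return -1;
}

/* Read the 32-bit little-endian value stored at buf + off. */
uint32_t read_u32_le(const uint8_t *buf, size_t off)
{
    return (uint32_t)buf[off]
         | (uint32_t)buf[off + 1] << 8
         | (uint32_t)buf[off + 2] << 16
         | (uint32_t)buf[off + 3] << 24;
}
```

Running find_pattern with the pattern above over the bytes of the hit and then calling read_u32_le at match offset + 2 yields 0x14f215e8, the address operand of the first mov; offset + 8 yields 0x12ecca5c, the operand of the second.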

As we said, the cheat also uses patterns to locate game functions. For instance, with the following pattern it locates the start of the function the game uses to execute in-game console commands:

55 8B EC 8B ? ? ? ? ? 81 F9 ? ? ? ? 75 0C A1 ? ? ? ? 35 ? ? ? ? EB 05 8B 01 FF 50 34 50 A1

This one is found in engine.dll:

      0x100aa300  55            push ebp
      0x100aa301  8bec          mov ebp, esp
      0x100aa303  8b0d54345b10  mov ecx, dword [0x105b3454]
      0x100aa309  81f938345b10  cmp ecx, 0x105b3438
  ,=< 0x100aa30f  750c          jne 0x100aa31d
  |   0x100aa311  a168345b10    mov eax, dword [0x105b3468]
  |   0x100aa316  3538345b10    xor eax, 0x105b3438
 ,==< 0x100aa31b  eb05          jmp 0x100aa322
 |`-> 0x100aa31d  8b01          mov eax, dword [ecx]
 |    0x100aa31f  ff5034        call dword [eax + 0x34]
 `--> 0x100aa322  50            push eax
      0x100aa323  a1f8325a10    mov eax, dword [0x105a32f8]
      [...]

If the cheat wants to run an in-game console command, it can allocate memory in the game process, pass the arguments to the function through this memory, and use CreateRemoteThread to start a new thread at the beginning of the procedure.

When the cheat has located everything it needs, it starts a bunch of threads, each implementing one of the functionalities. These threads are in charge of monitoring and manipulating the game memory using the functions ReadProcessMemory and WriteProcessMemory.

By changing the values of the internal game structures at will, the cheat achieves the functionalities it offers.

I have identified some of the functions and renamed them in my pseudocode:

CreateThread(0, 0, (LPTHREAD_START_ROUTINE)aimassist, 0, 0, 0);
CreateThread(0, 0, (LPTHREAD_START_ROUTINE)aimlock, 0, 0, 0);
CreateThread(0, 0, (LPTHREAD_START_ROUTINE)bunnyhop, 0, 0, 0);
CreateThread(0, 0, (LPTHREAD_START_ROUTINE)anti_flash, 0, 0, 0);
CreateThread(0, 0, (LPTHREAD_START_ROUTINE)sub_403F0E, 0, 0, 0);
CreateThread(0, 0, (LPTHREAD_START_ROUTINE)esp_hack, 0, 0, 0);
CreateThread(0, 0, (LPTHREAD_START_ROUTINE)radar_hack, 0, 0, 0);
CreateThread(0, 0, (LPTHREAD_START_ROUTINE)kill_message, 0, 0, 0);
while ( !byte_4F1081
     || !byte_4F1054
     || !byte_4F1082
     || !byte_4F10C9
     || !byte_4F1062
     || !byte_4F1040
     || !byte_4F1090
     || !byte_4F1028 )
    Sleep(0x64u);
// Default config
cfg_antiflash = 1;
cfg_aimlock = 1;
cfg_killmessage = 1;
cfg_radarhack = 1;
byte_4F1032 = 0;
cfg_glowesp = 1;
byte_4F10C0 = 0;
cfg_bunnyhop = 1;
cfg_aimassist = 1;
cfg_reload();
while ( WaitForSingleObject(csgo_prochandler, 0) != 0 )
    cfg_changes_loop();
CloseHandle(csgo_prochandler);
j_exit(0);

Private version protection

The public version is poorly protected: the strings are encrypted with a simple algorithm, but there is no code obfuscation or PE packing.

On the other hand, the private version is protected with Themida, a commercial packer that, depending on its configuration, can be quite effective at protecting executables.

It's very likely that they use Themida for two purposes:

  1. Protect the cheat license from being patched. The program could be manipulated to accept any license on any computer, but reconstructing a fully working version of the packed executable and patching it can be quite tricky.
  2. Second and most important, prevent VAC signatures from detecting the cheat while it runs. Themida can protect the original opcodes of the program while it is loaded in memory and running, and writing signatures (patterns) for those opcodes is one of the methods VAC uses to detect cheaters.

Closing

If we compare it to other cheats, this one is simple in terms of functionality, but still quite effective.

Bear in mind that the CSGO binaries used for the analysis are not from the latest game update, as I wrote this one week ago. The binaries I used are:

942fa5d3ef9e0328157b34666327461cee10aae74b26af135b8589dc35a5abc7 client.dll
e6f3eda5877f2584aeb43122a85d0861946c7fb4222d0cb6f3bc30034e4d3e24 engine.dll
1a5bb2b0ae9f2e6ef757c834eeb2c360a59dce274b2e4137706031f629e6455f csgo.exe

This means that the cheat signatures may have been slightly modified to work with the new executables, and the offsets probably won't be the same if these binaries changed in the latest version of the game.

Sricam AP003 by Sricctv

I got this camera not long ago and, as usually happens, in addition to its main purpose it provided some hours of fun!

It's one of the cheapest wireless IP cameras right now; you can find it for around $40 depending on the store. The manufacturer is Sricctv, a company based in Shenzhen that specializes in CCTV.

It uses Linux 2.6 and has a MIPS processor (MIPS 24K V4.12).

Firmware

I couldn't find the firmware on the official website and they didn't agree to send me the latest version. Luckily for me, I got a firmware for a camera similar to mine, so I could study the system a bit without messing with the hardware.

The firmware file format is pretty straightforward. It expects a 32-byte header string, the size of the package as a 4-byte little-endian value, a ZIP file with the contents, and a 32-byte footer:

00000000 77 69 66 69 2d 63 61 6d 65 72 61 2d 73 79 73 2d |wifi-camera-sys-|
00000010 71 65 74 79 69 70 61 64 67 6a 6c 7a 63 62 6d 6e |qetyipadgjlzcbmn|
00000020 43 17 05 00 50 4b 03 04 0a 00 00 00 00 00 e7 7e |C...PK.........~|
*
00051760 00 07 13 05 00 00 00 77 69 66 69 2d 63 61 6d 65 |.......wifi-came|
00051770 72 61 2d 65 6e 64 2d 6e 76 78 6b 68 66 73 6f 75 |ra-end-nvxkhfsou|
00051780 74 65 71 7a 68 70 6f |teqzhpo|

There are two types of upgrades handled by the upgrader: system upgrades and web app upgrades.

- System upgrades overwrite the main system binaries, located in /system/system/, and have this header and footer combination: wifi-camera-sys-qetyipadgjlzcbmn, wifi-camera-end-nvxkhfsouteqzhpo.

- Web app upgrades overwrite the contents of /system/www/ and have this header and footer combination: wifi-camera-app-qazwsxedcrfvtgba, wifi-camera-end-yhnujmzaqxswcdef.
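The container layout can be checked with a few lines of C. This is a sketch under assumptions, not the upgrader's actual logic: it assumes the 4-byte size is exactly the length of the ZIP and that the footer follows it immediately, which is what the hexdump above shows (0x24 + 0x51743 = 0x51767, where the footer starts). The names fw_image and fw_parse are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FW_MAGIC_LEN 32

typedef struct {
    const uint8_t *zip;       /* start of the embedded ZIP file */
    uint32_t       zip_len;   /* length taken from the size field */
    int            is_system; /* 1 = system upgrade, 0 = web app upgrade */
} fw_image;

/* Validate header, size and footer. Returns 0 on success, -1 otherwise. */
int fw_parse(const uint8_t *buf, size_t len, fw_image *out)
{
    static const char sys_head[] = "wifi-camera-sys-qetyipadgjlzcbmn";
    static const char sys_foot[] = "wifi-camera-end-nvxkhfsouteqzhpo";
    static const char app_head[] = "wifi-camera-app-qazwsxedcrfvtgba";
    static const char app_foot[] = "wifi-camera-end-yhnujmzaqxswcdef";
    const size_t overhead = 2 * FW_MAGIC_LEN + 4;
    const char *foot;
    uint32_t size;

    if (len < overhead)
        return -1;

    if (memcmp(buf, sys_head, FW_MAGIC_LEN) == 0) {
        out->is_system = 1;
        foot = sys_foot;
    } else if (memcmp(buf, app_head, FW_MAGIC_LEN) == 0) {
        out->is_system = 0;
        foot = app_foot;
    } else {
        return -1;
    }

    /* 4-byte little-endian size, right after the header */
    size = (uint32_t)buf[32]
         | (uint32_t)buf[33] << 8
         | (uint32_t)buf[34] << 16
         | (uint32_t)buf[35] << 24;

    /* Assumption: header + size field + ZIP + footer is the whole file */
    if ((size_t)size != len - overhead)
        return -1;
    if (memcmp(buf + FW_MAGIC_LEN + 4 + size, foot, FW_MAGIC_LEN) != 0)
        return -1;

    out->zip = buf + FW_MAGIC_LEN + 4;
    out->zip_len = size;
    return 0;
}
```

On success, out->zip points at the PK signature, so the embedded archive can be written to disk and opened with any ZIP tool.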

Interestingly, web app upgrades are expected to contain a password protected ZIP file, but system upgrades are not. As the upgrader is in the system firmware image, we can look at the binary and locate the hardcoded password.

Telnet access for everyone!

While there are several ports listening on the camera, the most interesting are probably 23 (telnet) and 81 (default HTTP panel). When we extracted the firmware, we located a nice string:

root:LSiuY7pOmZG2s:0:0:Administrator:/:/bin/sh

The password hash for root is easy to crack: 123456

So now we can log in to the camera via telnet with the root account and get access to the whole file system.

This can also be used to disclose all the configuration, including the user and password of the admin account in the web panel:

$ telnet 192.168.1.111
Trying 192.168.1.111...
Connected to 192.168.1.111.
Escape character is '^]'.

(none) login: root
Password:

BusyBox v1.12.1 (2013-03-02 13:26:40 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.

# cd /system/www/

# cat system.ini

...

We can change the root password, but as the passwords file is in a volatile partition of the file system, the default will be set again after reboot.

Connect the camera and say Hi to the Internet!

This bothered me a lot when I connected the camera for the first time. If your router has UPnP enabled, which is very common in SOHO routers, the camera will use this protocol to open the external port (Internet facing) of your router and forward it to the port where the web management service is listening. By default this port is 81.

If you haven't set up your credentials yet, the camera is wide open to everyone. And if a vulnerability is found in the service, no matter what your configuration is, the camera will be there for prying eyes.

This is probably a "convenience" for non-technical users to connect from external networks using the P2P app provided by the vendor. The camera will also get the external IP of your network by connecting to www.ip138.com, so the app knows where to connect.

Conclusion

If you care about your privacy, this camera is not for you. I guess you get what you pay for: the camera has good specifications and performance, but the software design is just horrible.

rdtsc x86 instruction to detect virtual machines

A new version of pafish has been recently released. It comes with a set of detections that are completely new for the project (read: not new techniques), which are based on CPU information. To get this information, the code makes use of the rdtsc and cpuid x86 instructions.

Here we are going to look at the rdtsc technique and how it is used to detect VMs.

What is rdtsc?

Wikipedia's description is pretty straightforward [1]:

The Time Stamp Counter (TSC) is a 64-bit register present on all x86 processors since the Pentium. It counts the number of cycles since reset. The instruction RDTSC returns the TSC in EDX:EAX. In x86-64 mode, RDTSC also clears the higher 32 bits of RAX and RDX. Its opcode is 0F 31.

So it is a counter increased in each CPU cycle.

Well, it actually depends on the processor.

Initially, this value was used to count the actual internal processor clock cycles. It was meant for developers to measure how many cycles a routine takes to complete. It was good for measuring performance.

In the latest Intel processor families, this counter increases at a constant rate, which is determined by the maximum frequency the processor can run at for that boot. Maximum does not mean current, as power-saving measures can dynamically change the speed of the processor. This means it is no longer good for measuring performance, because the processor frequency can change at runtime and ruin the metric. On the other hand, it can now be used to measure time.

This is explained much better in reference [2].

So, how is this used to detect VMs?

In a physical (host) system, subtracting the counter values of two consecutive rdtsc instructions gives a very small number of cycles.

On the other hand, doing the same in a virtualized (guest) system, the difference can be much bigger. This is caused by the overhead of actually running inside the virtual machine.

I wrote a small program to verify this behaviour. It does the subtraction ten times, sleeping for a while in between. You can get the source from here.

This is similar to what pafish does. The output on a physical machine looks like this:

$ gcc -O2 vmfun.c && ./a.out
(81889337556698 - 81889337556746) rdtsc difference: 48
(81891335245484 - 81891335245508) rdtsc difference: 24
(81893332927964 - 81893332927988) rdtsc difference: 24
(81895330659684 - 81895330659708) rdtsc difference: 24
(81897326984696 - 81897326984720) rdtsc difference: 24
(81899324782460 - 81899324782520) rdtsc difference: 60
(81901322471630 - 81901322471690) rdtsc difference: 60
(81903320069632 - 81903320069656) rdtsc difference: 24
(81905317727808 - 81905317727832) rdtsc difference: 24
(81907314531066 - 81907314531078) rdtsc difference: 12
difference average is: 32

Try to compile and run this code with different compiler optimizations if you want to have some fun ;)
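For reference, the core of such a measurement can be sketched in C as follows. This is not the vmfun.c source mentioned above, just a minimal illustration: it assumes GCC or Clang on x86 (where x86intrin.h provides __rdtsc), and the names rdtsc_delta, average and looks_virtualized, as well as the threshold parameter, are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   /* __rdtsc() on GCC and Clang */

/* Cycles elapsed between two back-to-back rdtsc reads. */
uint64_t rdtsc_delta(void)
{
    uint64_t first = __rdtsc();
    uint64_t second = __rdtsc();

    return second - first;
}
#endif

/* Integer average of n samples; n must be greater than zero. */
uint64_t average(const uint64_t *samples, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; ++i)
        sum += samples[i];
    return sum / n;
}

/* Hypothetical decision rule: flag the machine when the average delta
 * exceeds a caller-chosen threshold. The right threshold depends on the
 * CPU and the hypervisor, so it is left as a parameter here. */
int looks_virtualized(uint64_t avg, uint64_t threshold)
{
    return avg > threshold;
}
```

Feeding the ten differences from the output above (48, 24, 24, 24, 24, 60, 60, 24, 24, 12) to average gives 32, matching the last line printed by the program.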

This is the theory, but in practice it depends on the virtualization product, its configuration, and the number of cores assigned to the guest system.

For instance, VMware virtualizes the TSC by default. This can be disabled, but it is not recommended. The TSC virtualization can also be tweaked in the configuration. There is much more information about this in references [3] and [4].

There is also a substantial difference when the VM has two or more cores assigned. With one core, the differences are not that big and get close to those of a physical processor, although some peaks can happen from time to time. With two or more cores, the differences are much bigger and more consistent.

I suspect the second behaviour is caused by CPU ready times, which are explained in references [5] and [6].

Have a look at the following example in VirtualBox:

[Figure: one core assigned, note the peaks]

[Figure: two cores assigned, the differences are large and consistent]

So we can conclude two things.

The first one is that this method is not always reliable, as it is heavily dependent on the processor and the virtualization product.

The second one is, if I were running a sandbox cluster, I would try to assign only one core to each guest machine. Not only because it would make this method a bit less reliable, but also for performance.

Our fabulous sandbox uses an emulator instead of a VM, should I care about this?

Well, generally speaking, you should not care about this specific method then. Emulators replicate the whole machine hardware, including the CPU at the lowest level (binary translation), so they have their own TSC implementation, and the cycles used by a routine should be similar to those on a physical CPU.

We can verify this by running our testing program in QEMU:

[Figure: output of the testing program in QEMU. QEMU is nice]

I hope you enjoyed the post and this new pafish release. Thanks to the mlw.re members for helping me with the tests :)

Check out the references for more information on this topic and general understanding on how VMs / emulators work!

References:

[1] https://en.wikipedia.org/wiki/Time_Stamp_Counter

[2] https://randomascii.wordpress.com/2011/07/29/rdtsc-in-the-age-of-sandybridge/

[3] http://www.vmware.com/files/pdf/VMware_paravirtualization.pdf

[4] http://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf

[5] https://virtualblocks.wordpress.com/2010/06/22/cpu-ready-over-built-vm-or-over-utilized-host/

[6] http://www.spug.co.uk/?p=294

Analysis of a Win32 (Neutrino?)/n3nmtx Trojan

I detected this piece a while ago, but didn't have time to dig deeper into it. The detections of the malware sample are quite generic, so for the purpose of this post I'll name it "n3nmtx", based on the mutex it creates at the beginning of the execution. More details on the name at the end of the post.

This sample caught my attention because of the huge number of anti-analysis tricks it deploys. Actually, some of them are stolen from pafish, which makes me feel really bad and forces me to do the proper analysis.

Anti-analysis tricks

Basically, we need to meet some conditions to make the malware run. At the beginning of the execution it sleeps for 0x2710 (10000), then it calls 0x400, which is the function that contains all the anti-analysis tricks. If that procedure doesn't complain, it checks for an internet connection. If it can't connect, execution goes back to the beginning of the loop procedure.

So, instead of patching the whole system to run it, we will patch the malware itself (:D), and radare2 will help us!

To patch the sleep time, the simplest option is to go to the value and change it to something smaller, let's say 0x05. So we copy the binary and open it with r2 in write mode:

$ r2 -w original_patch.exe

Let's seek the address of the sleep value:

[0x0040a3fa]> s 0x00004f32

And in visual mode (V), using the cursor, we can +/- the push value.

We do the same with the other sleep, and we're done.

radiff2 is quite handy to confirm the changes:

$ radiff2 original.exe original_patch.exe
0x00004f32 1027 => 0500 0x00004f32
0x00004fb7 1027 => 0500 0x00004fb7

Great, no more sleeps.

The anti-analysis procedure is just a call, and its return value is never checked. If it were, we could patch the conditional jump, but as it isn't, we will just overwrite the call with NOPs.

Open the file in write mode again, knowing where to put the NOPs and how many of them ...

[0x0040a3fa]> wx 9090909090 @0x00004f0d

And in visual mode we can confirm that the procedure will never be executed, and also that we didn't mess up the opcodes.

At this point, the malware will run instantly on any system, including sandboxes!

Startup

To stay alive after reboots, it creates some entries in the registry, common stuff.

Communication with CnC

The communication with the CnC is done via HTTP requests. It will basically send pings and ask for tasks to do.

The task delivery is interesting because the server will answer with a 404 response, but at the end of the content we can find the command sent to the bot in base64.
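Once the trailing token has been isolated from the response body, recovering the command is a plain base64 decode. The C sketch below shows only that step. How the bot delimits the token inside the 404 page is not covered in this post, so the code assumes the caller already has the base64 text; the names b64_value and b64_decode are made up for this example.

```c
#include <stddef.h>

/* Map one base64 character to its 6-bit value, or -1 if it is invalid. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode standard base64 from in into out, NUL-terminating the result.
 * Decoding stops at '=' padding or at the end of the string.
 * Returns the decoded length, or -1 on an invalid character or when
 * out is too small. */
long b64_decode(const char *in, char *out, size_t out_size)
{
    unsigned int acc = 0;   /* bit accumulator */
    int bits = 0;           /* number of pending bits in acc */
    size_t n = 0;

    for (; *in != '\0' && *in != '='; ++in) {
        int v = b64_value(*in);

        if (v < 0)
            return -1;
        acc = (acc << 6) | (unsigned int)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (n + 1 >= out_size)   /* keep room for the final NUL */
                return -1;
            out[n++] = (char)((acc >> bits) & 0xff);
        }
    }
    if (n >= out_size)
        return -1;
    out[n] = '\0';
    return (long)n;
}
```

For example, the token Y21k decodes to cmd and cmF0ZQ== decodes to rate, two of the command names listed in the next section.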

Commands

The commands accepted by the bot are:

rate
update
loader
findfile
cmd
visit
open
spread
archive
usb
botkiller
http
slow
dwflood
tcp
udp
smart
https
keylogger

Most of them are self-explanatory.

Conclusion

I want to keep this short because, while I was writing the post, I found some public information about the piece.

You can check this McAfee post for more detail on the anti-analysis techniques.

This bot seems to be part of a Neutrino botnet. Kafeine also wrote about it, but the hashes and CnC are different.

My hashes are:

8130f713b2464c77c6500d5d8d37a4b3e3c0e98f2607ca4262a87ee6ae0e88c3 <- the sample used for the post

eeb07cb8aa2cf2bee7b4a893892c08b9d39e90614bf2ff1d0f93a29f97f6815b

You can find them in VT or in our beloved Zoo :)
