Reversing an Android app Protector, Part 1 – Code Obfuscation & RASP

In this series: Part 1, Part 2, Part 3

What started as a ProGuard + basic string encryption + code reflection tool evolved into a multi-platform, complex solution that includes control-flow obfuscation, complex and varied data and resource encryption, bytecode encryption, virtual environment and rooted system detection, application signature and certificate pinning enforcement, native code protection, bytecode virtualization [1], and more.

This article presents the obfuscation techniques used by this app protector, as well as the facilities made available at runtime to protected programs [2]. The analysis that follows was done statically, with JEB 3.20.

Identification

Identifying apps protected by this protector is relatively easy. The default bytecode obfuscation settings seem to place most classes in the o package, and some are renamed to names that are invalid on Windows systems, such as con or aux. Closer inspection of the code reveals stronger hints than obfuscated names: decryption stubs, specific encrypted data, and the presence of certain .so library files are all telltale signs, as shown below.

Running a Global Analysis

Let’s run a Global Analysis (menu Android, Global analysis…) with standard settings on the file and see what gets auto-decrypted and auto-unreflected:

Results subset of Global Analysis (redacted areas are meant to keep the analyzed program anonymous; it is a clean app, whose business logic is irrelevant to the analysis of the app protector)

Lots of strings were decrypted, many of them specific to the app’s business logic itself, others related to RASP – that is, library code embedded within the APK, responsible for performing app signature verification for instance. That gives us valuable pointers to where we should be looking if we’d like to focus on the protection code specifically.

Deobfuscating Code

The first section of this blog focuses on bytecode obfuscation and how JEB deals with it. It is mostly automated, but a final step requires manual assistance to achieve the best results.

Most obfuscated routines exhibit the following characteristics:

  • Dynamically generated strings via the use of per-class decryption routines
  • Most calls to external routines are done via reflection
  • Flow obfuscation via the use of a couple of opaque integer fields – let’s call them OPI0, OPI1. They are class fields generally initialized to 0 and 1.
  • Arithmetic operation obfuscation
  • Garbage code insertion
  • Unusual protected block structure, leading to fragmented try-blocks that the decompiler cannot avoid if it is to produce semantically accurate raw code

As an example, the following class is used to perform app certificate validation in order, for instance, to prevent resigned apps from functioning. A few items were renamed for clarity; decompilation is done with disabled Deobfuscators (MOD1+TAB, untick “Enable deobfuscators”):

Take #1 (snippet) – The protected class is decompiled without deobfuscation in order to show semi-raw output (a few optimizers doing all sorts of code cleanup are not categorized as deobfuscators internally, and will run even if Deobfuscation is disabled). Note that a few items were also renamed for clarity.

In practice, such code is quite hard to comprehend on complex methods. With deobfuscators enabled (the default setting), most of the above will be cleared.

See the re-decompilation of the same class, below.

  • strings are decrypted…
  • …enabling unreflection
  • most obfuscation is removed…
  • except for some control flow obfuscation that remains because JEB was unable to process OPI0/OPI1 directly (highlighted in red below)
Take #2 (full routine) – deobfuscators enabled (default). The red blocks highlight use of opaque variables used to obfuscate control flow.

Let’s give a hint to JEB as to what OPI0/OPI1 are.

  • When analyzing protected apps, you can rename OPI0 and OPI1 to guard0 and guard1, respectively, to allow JEB to clean the code more aggressively
  • Redecompile the class after renaming the fields
Take #3 (full routine) – with explicit guard0/guard1

That final output is clean and readable.

Other obfuscation techniques not exposed in this short routine above are arithmetic obfuscation and other operation complexification techniques. JEB will seamlessly deal with many of them. Example:

is optimized to

To summarize bytecode obfuscation:

  • decryption and unreflection are done automatically [3]
  • garbage clean-up, code clean-up is also generic and done automatically
  • control flow deobfuscation needs a bit of guidance to operate (guard0/guard1 renaming)
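
To make the guard0/guard1 idea concrete, here is a minimal, hand-written sketch of opaque-integer flow obfuscation. It is not taken from the protected app: the class name, method names and the guarded expression are hypothetical, and real protected code is far more convoluted. It only illustrates why a decompiler cannot remove the branch until it is told that the two fields are constants.

```java
// Hypothetical miniature of opaque-integer control flow obfuscation.
// OpaqueGuardDemo and its methods are illustrative names only.
final class OpaqueGuardDemo {
    // Opaque integers: never reassigned, but not declared final, so a
    // decompiler cannot safely assume their values without a hint.
    static int guard0 = 0;
    static int guard1 = 1;

    // Obfuscated form: the branch below is never taken at runtime.
    static int obfuscatedLength(String s) {
        int n = s.length();
        if ((guard0 + guard1) % 2 == 0) { // (0 + 1) % 2 is 1, so this is false
            n = n * 31 + 7;               // garbage code, dead in practice
        }
        return n;
    }

    // What remains once guard0 == 0 and guard1 == 1 are taken as facts.
    static int cleanLength(String s) {
        return s.length();
    }
}
```

Both methods return the same value for any input; the second one is the shape of output one hopes for after the renaming hint.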

Runtime Verification

RASP library routines are used at the developers’ discretion. They consist of a set of classes that the application code can call at any time, to perform tasks such as:

  • App signing verification
  • Debuggability/debugger detection
  • Emulator detection
  • Root detection
  • Instrumentation toolkits detection
  • Certificate pinning
  • Manifest check
  • Permission checks

The client decides when and where to use them as well as what action should be taken on the results. The code itself is protected, that goes without saying.

App Signing Verification

  • Certificate verification uses the PackageManager to retrieve app’s signatures: PackageManager.getPackageInfo(packageName, GET_SIGNATURES).signatures
  • The signatures are hashed and compared to caller-provided values in an IntBuffer or LongBuffer.

Debug Detection

Debuggability check

The following checks must pass:

  • assert that ctx.getApplicationInfo().flags & ApplicationInfo.FLAG_DEBUGGABLE is 0 (ctx being the app Context)
  • check the ro.debuggable property, in two ways to ensure consistency
    • using android.os.SystemProperties.get() (private API)
    • using the getprop binary
  • verify that no hooking framework is detected (see specific section below)

Debugging session check

The following checks must pass:

  • assert that android.os.Debug.isDebuggerConnected() is false
  • verify no tracer process: the TracerPid entry in /proc/<pid>/status must be <= 0
  • verify that no hooking framework is detected (see specific section below)
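
The tracer check can be sketched in plain Java as below. This is an illustration under assumptions, not the protector's code: the class and method names are hypothetical, the status file content is passed in as a string so the logic can be exercised off-device, and treating a missing entry as "no tracer" is a choice made for this sketch.

```java
// Hypothetical helper; parses the text of /proc/<pid>/status.
final class TracerPidCheck {
    private static final String KEY = "TracerPid:";

    // Returns the TracerPid value, or -1 if the entry is missing or malformed.
    static int parseTracerPid(String statusText) {
        for (String line : statusText.split("\n")) {
            if (line.startsWith(KEY)) {
                try {
                    return Integer.parseInt(line.substring(KEY.length()).trim());
                } catch (NumberFormatException e) {
                    return -1;
                }
            }
        }
        return -1;
    }

    // The check passes (no tracer attached) when the value is <= 0.
    static boolean noTracerAttached(String statusText) {
        return parseTracerPid(statusText) <= 0;
    }
}
```

On a device, the input would typically be the content of /proc/self/status.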

Debug key signing

  • enumerate the app’s signatures via PackageInfo.signatures
  • use getSubjectX500Principal() to verify that no certificate has a subject distinguished name (DN) equal to "CN=Android Debug,O=Android,C=US", which is the standard DN for debug certificates generated by the SDK tools
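
A minimal sketch of the DN comparison follows. The class and method names are hypothetical. In an app, the principal would come from X509Certificate.getSubjectX500Principal() on each decoded signing certificate; that step is omitted here so the snippet stays runnable on a desktop JVM.

```java
import javax.security.auth.x500.X500Principal;

// Hypothetical helper illustrating the debug-certificate subject check.
final class DebugCertCheck {
    // Standard subject DN of debug certificates generated by the SDK tools.
    static final X500Principal DEBUG_SUBJECT =
            new X500Principal("CN=Android Debug,O=Android,C=US");

    // X500Principal.equals() compares canonical string forms of the names
    // rather than the raw strings used to build them.
    static boolean isDebugSubject(X500Principal subject) {
        return DEBUG_SUBJECT.equals(subject);
    }
}
```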

Emulator Detection

Emulator detection is done by checking any of the below.

1) All properties defined in system/build.prop are retrieved, hashed, and matched against a small set of hard-coded hashes:

86701cb958c69d64cd59322dfebacede -> property ???
19385aafbb452f39b5079513f668bbeb -> property ???
24ad686ec83d904347c5a916acbe1779 -> property ???
b8c8255febc6c46a3e43b369225ded3e -> property ???
d76386ddf2c96a9a92fc4bc8f829173c -> property ???
15fed45d5ca405da4e6aa9805daf2fbf -> property ??? (unused)

Unfortunately, we were not able to reverse those hashes back to known property strings – however, it was tried only on AOSP emulator images. If anybody wants to help and run the below on other build.prop files, feel free to let us know which property strings those hashes correspond to. Here is the hash verification source, to be run on build.prop files.

2) The following file is readable:

/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq

3) Verify whether any of these QEMU, Genymotion and BlueStacks emulator files exist and are readable:

/dev/qemu_pipe
/dev/socket/baseband_genyd
/dev/socket/genyd
/dev/socket/qemud
/sys/qemu_trace
/system/lib/libc_malloc_debug_qemu.so
/dev/bst_gps
/dev/bst_time
/dev/socket/bstfolderd
/system/lib/libbstfolder_jni.so

4) Check for the presence of wired network interfaces: (via NetworkInterface.getNetworkInterfaces)

eth0
eth1

5) If the app has the READ_PHONE_STATE permission, telephony information is verified; an emulator is detected if any of the below matches (standard emulator image settings):

- "getLine1Number": "15555215554", "15555215556", "15555215558", "15555215560", "15555215562", "15555215564", "15555215566", "15555215568", "15555215570", "15555215572", "15555215574", "15555215576", "15555215578", "15555215580", "15555215582", "15555215584"
- "getNetworkOperatorName": "android"
- "getSimSerialNumber": "89014103211118510720"
- "getSubscriberId": "310260000000000"
- "getDeviceId": "000000000000000", "e21833235b6eef10", "012345678912345"

6) /proc checks:

/proc/ioports: entry "0ff :" (unknown port, likely used by some emulators)
/proc/self/maps: entry "gralloc.goldfish.so" (GF: older emulator kernel name)

7) Property checks (done in multiple ways with consistency checks, as explained earlier). The check fails if any of the entries below is found and its value starts with one of the listed values:

- "ro.product.manufacturer": "Genymotion", "unknown", "chromium"
- "ro.product.device": "vbox86p", "generic", "generic_x86", "generic_x86_64"
- "ro.product.model": "sdk", "emulator", "App Runtime for Chrome", "Android SDK built for x86", "Android SDK built for x86_64"
- "ro.hardware": "goldfish", "vbox86", "ranchu"
- "ro.product.brand": "generic", "chromium"
- "ro.kernel.qemu": "1"
- "ro.secure": "0"
- "ro.build.product": "sdk", "vbox86p", "full_x86", "generic_x86", "generic_x86_64"
- "ro.build.fingerprint": "generic/sdk/generic", "generic_x86/sdk_x86/generic_x86", "generic/google_sdk/generic", "generic/vbox86p/vbox86p", "google/sdk_gphone_x86/generic_x86"
- "ro.bootloader": "unknown"
- "ro.bootimage.build.fingerprint": "Android-x86"
- "ro.build.display.id": "test-"
- "init.svc.qemu-props" (any value)
- "qemu.hw.mainkeys" (any value)
- "qemu.sf.fake_camera" (any value)
- "qemu.sf.lcd_density" (any value)
- "ro.kernel.android.qemud" (any value)
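
The prefix-matching rule used for the property checks in item 7 can be sketched as follows. This is a simplified illustration: only a handful of the rules above are included, the class and method names are hypothetical, and the property values are supplied as a map rather than read from the system (via SystemProperties.get or getprop).

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "property value starts with a listed prefix".
final class EmulatorPropertyCheck {
    // Subset of the rules listed above: property name -> suspicious prefixes.
    // An empty prefix list means "any value".
    static final Map<String, List<String>> RULES = Map.of(
            "ro.hardware", List.of("goldfish", "vbox86", "ranchu"),
            "ro.kernel.qemu", List.of("1"),
            "ro.product.manufacturer", List.of("Genymotion", "unknown", "chromium"),
            "qemu.hw.mainkeys", List.of());

    // props stands in for the property values read from the device.
    static boolean looksLikeEmulator(Map<String, String> props) {
        for (Map.Entry<String, List<String>> rule : RULES.entrySet()) {
            String value = props.get(rule.getKey());
            if (value == null) {
                continue; // property not defined on this system
            }
            if (rule.getValue().isEmpty()) {
                return true; // presence alone is enough
            }
            for (String prefix : rule.getValue()) {
                if (value.startsWith(prefix)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Because the match is a prefix match, a value such as "goldfish_x86" would also be flagged by the "goldfish" rule.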

Hooking Systems Detection

The term covers a wide range of techniques designed to intercept regular control flow in order to examine and/or modify execution.

1) Xposed instrumentation framework detection, by attempting to load any of the classes:

de.robv.android.xposed.XposedBridge
de.robv.android.xposed.XC_MethodHook

Class loading is done in different ways in an attempt to circumvent hooking itself, using Class.forName with a variety of class loaders, custom class loaders and ClassLoader.findLoadedClass, as well as lower-level private methods, such as Class.classForName.

2) Cydia Substrate instrumentation framework detection.

3) ADBI (Android Dynamic Binary Instrumentation) detection

4) Stack frame verification: an exception is generated in order to retrieve a stack frame. The callers are hashed and compared to an expected hard-coded value.

5) Native code checks. This will be detailed in another blog, if time allows.
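
Two of the ideas above, class probing (item 1) and stack frame verification (item 4), can be sketched in plain Java. Everything here is illustrative: the class and method names are hypothetical, only the public Class.forName path is shown, and the String.hashCode-style mixing is a stand-in for the protector's hash, which is not documented in this post.

```java
// Hypothetical sketch of two hooking-detection building blocks.
final class HookProbe {
    // Item 1: a class counts as present if any given loader can resolve it.
    // initialize=false avoids running the probed class's static initializer.
    static boolean isClassPresent(String className, ClassLoader... loaders) {
        for (ClassLoader loader : loaders) {
            try {
                Class.forName(className, false, loader);
                return true;
            } catch (ClassNotFoundException | LinkageError e) {
                // not visible through this loader; try the next one
            }
        }
        return false;
    }

    // Item 4: reduce a stack trace to one value built from caller names.
    // Line numbers and file names are deliberately ignored.
    static int fingerprint(StackTraceElement[] frames) {
        int h = 0;
        for (StackTraceElement f : frames) {
            h = 31 * h + (f.getClassName() + "#" + f.getMethodName()).hashCode();
        }
        return h;
    }

    // Captures the current call chain by creating (not throwing) a Throwable.
    static int currentFingerprint() {
        return fingerprint(new Throwable().getStackTrace());
    }
}
```

A verifier would compare currentFingerprint() against a value recorded for the expected call chain; an injected hook frame changes the sequence of names that is hashed.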

Root Detection

While root detection overlaps with most of the above, it is still another layer of security a determined attacker would have to jump over (or walk around) in order to get protected apps to run on unusual systems. Checks are plenty, and as is the case for all the code described here, heavily obfuscated. If you are analyzing such files, keeping the Deobfuscators enabled and providing guard0/guard1 hints is key to a smooth analysis.

Static initializer of the principal root detection class. Most artifacts indicative of a rooted device are searched for by hash.

Build.prop checks. As was described in emulator detection.

su execution. Attempt to execute su, and verify whether the output of su -c id indicates root

su presence. su is looked up in the following locations:

/data/local/
/data/local/bin/
/data/local/xbin/
/sbin/
/system/bin/
/system/bin/.ext/
/system/bin/failsafe/
/system/sd/xbin/
/system/usr/we-need-root/
/system/xbin/
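
The lookup itself is straightforward; a minimal sketch is shown below. The class and method names are hypothetical and only a few of the directories above are listed. Keep in mind that the protector searches for most artifacts by hash rather than by clear-text path, which this sketch does not reproduce.

```java
import java.io.File;
import java.util.List;

// Hypothetical sketch of a clear-text "su presence" lookup.
final class SuLocator {
    // A few of the directories listed above.
    static final List<String> SU_DIRS = List.of(
            "/sbin/", "/system/bin/", "/system/xbin/", "/data/local/xbin/");

    // Returns the first existing <dir>/<name> path, or null if none exists.
    static String findBinary(String name, List<String> dirs) {
        for (String dir : dirs) {
            File candidate = new File(dir, name);
            if (candidate.exists()) {
                return candidate.getPath();
            }
        }
        return null;
    }
}
```

Calling SuLocator.findBinary("su", SuLocator.SU_DIRS) on a device returns a path only if one of the candidate files exists.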

Magisk detection through mount. Check whether mount can be executed and contains databases/su.db (indicative of Magisk) or whether /proc/mounts contains references to databases/su.db.

Read-only system partitions. Check if any system partition is mounted as read-write (when it should be read-only). The result of mount is examined for any of the following entries marked rw:

/system
/system/bin
/system/sbin
/system/xbin
/vendor/bin
/sbin
/etc
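
A sketch of how such output could be examined is given below. It assumes /proc/mounts-style lines ("device mount-point fs-type options ..."); the exact format printed by the mount binary varies between Android versions, so this is an illustration rather than the protector's parser. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical parser for /proc/mounts-style text.
final class MountCheck {
    // Mount points that are expected to be read-only.
    static final Set<String> WATCHED = Set.of(
            "/system", "/system/bin", "/system/sbin", "/system/xbin",
            "/vendor/bin", "/sbin", "/etc");

    // Returns the watched mount points whose option list contains "rw".
    static List<String> writableSystemMounts(String mounts) {
        List<String> hits = new ArrayList<>();
        for (String line : mounts.split("\n")) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 4 || !WATCHED.contains(fields[1])) {
                continue;
            }
            for (String option : fields[3].split(",")) {
                if (option.equals("rw")) {
                    hits.add(fields[1]);
                    break;
                }
            }
        }
        return hits;
    }
}
```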

Verify installed apps in the hope of finding one whose package name hashes to the hard-coded value:

0x9E6AE9309DBE9ECFL

Unfortunately, that value was not reversed. Let us know if you find which package name generates this hash; the algorithm is shown below:

    public static long hashstring(String str) {
        long h = 0L;
        for(int i = 0; i < str.length(); i++) {
            int c = str.charAt(i);
            h = h << 5 ^ (0xFFFFFFFFF8000000L & h) >> 27 ^ ((long)c);
        }
        return h;
    }

NOTE: App enumeration is performed in two ways to maximize chances of evading partial hooks.

  • Straightforward: PackageManager.getInstalledApplications
  • More convoluted: iterate over all known MAIN intents: PackageManager.queryIntentActivities(new Intent("android.intent.action.MAIN")), derive the package name from the intent via ResolveInfo.activityInfo.packageName

SELinux verification. If the file /sys/fs/selinux/policy cannot be read, the check immediately passes. If it is readable, the policy is examined and hints indicative of a rooted device are looked for by hash comparison:

472001035L
-601740789L

The hashing algorithm is extremely simple, see below. For each byte of the file, the crc is updated and compared to hard-coded values.

long h = 0L;
for (byte b : policyBytes) {  // for each byte of the policy file
    h = (h << 5 ^ ((long)(((char)b)))) & 0x3FFFFFFFL;
    // check h against known list
}

Running processes checks. All running processes and their command-lines are enumerated and hashed, and specific values are indirectly looked up by comparing against hard-coded lists.

APK Check

This verifier parses compressed entries in the APK (zip) file and compares them against well-known, hard-coded CRC values.
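
A minimal sketch of this kind of check, using the standard java.util.zip API, is shown below. Which entries the protector verifies and where the expected values are stored are not covered in this post, so the expected map, the class name and the method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical sketch of an APK entry CRC comparison.
final class ApkCrcCheck {
    // Collects entry name -> CRC-32 as recorded in the archive.
    static Map<String, Long> readCrcs(ZipFile zip) {
        Map<String, Long> crcs = new HashMap<>();
        Enumeration<? extends ZipEntry> entries = zip.entries();
        while (entries.hasMoreElements()) {
            ZipEntry entry = entries.nextElement();
            crcs.put(entry.getName(), entry.getCrc());
        }
        return crcs;
    }

    // Returns the expected entries that are missing or whose CRC differs.
    static List<String> mismatches(Map<String, Long> expected, Map<String, Long> actual) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, Long> e : expected.entrySet()) {
            if (!e.getValue().equals(actual.get(e.getKey()))) {
                bad.add(e.getKey());
            }
        }
        return bad;
    }
}
```

In an app, the ZipFile would be opened on the path returned by Context.getPackageCodePath(), and any name returned by mismatches would be reported as tampering.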

Manifest Check

Consistency checks on the application Manifest consist of enumerating the entries in two different ways and comparing the results. Discrepancies are reported.

  • Open the archive’s MANIFEST.MF file via Context.getAssets(), parse manually
  • Use JarFile(Context.getPackageCodePath()).getManifest().getEntries()

Discrepancies in the Manifest could indicate system hooks attempting to conceal files added to the application.

Permissions Check

This routine checks for permission discrepancies between what’s declared by the app and what the system grants the app.

  • Set A: App permission gathering: all permissions requested and defined by the app, as well as all permissions offered by the system, plus the INTERACT_ACROSS_USERS and INTERACT_ACROSS_USERS_FULL permissions
  • Set B: Retrieve all permissions that exist on the system
  • Define set C = B – A
  • For every permission in C, use checkCallingOrSelfPermission (API 22-) or checkSelfPermission (API 23+) to verify that the permission is not granted.

Permission discrepancies could be used to find out system hooks or unorthodox execution environments.
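
The set logic described in the steps above can be sketched as follows. The class and method names are hypothetical, and the isGranted predicate stands in for checkSelfPermission or checkCallingOrSelfPermission so that the snippet runs without Android classes.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical sketch of the C = B - A permission audit.
final class PermissionAudit {
    // Returns the permissions in C = B - A that are nevertheless granted.
    static Set<String> unexpectedGrants(Set<String> setA, Set<String> setB,
                                        Predicate<String> isGranted) {
        Set<String> c = new TreeSet<>(setB);
        c.removeAll(setA);
        Set<String> unexpected = new TreeSet<>();
        for (String permission : c) {
            if (isGranted.test(permission)) {
                unexpected.add(permission);
            }
        }
        return unexpected;
    }
}
```

A non-empty result is the discrepancy the routine is looking for.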

Note the “X & -(A+1) | ~X & A” checks. Several opaque arithmetic/binary expressions attempt to complicate the control flow. Here, that expression is never equal to v2, and therefore, the if-check will always fail. JEB 3.20 does not clean all those artifacts.
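
As a side note on that expression: in two's complement arithmetic, -(A+1) is the same as ~A, so “X & -(A+1) | ~X & A” reduces to (X & ~A) | (~X & A), which is simply X ^ A. The short sketch below (hypothetical class and method names) spells out the operator precedence and lets the identity be checked on a few values.

```java
// Hypothetical demo of the identity behind the opaque expression.
final class OpaqueXor {
    // The obfuscated form, with Java operator precedence made explicit.
    static int obfuscated(int x, int a) {
        return (x & -(a + 1)) | (~x & a);
    }

    // -(a + 1) == ~a, hence (x & ~a) | (~x & a), i.e. exclusive or.
    static int simplified(int x, int a) {
        return x ^ a;
    }
}
```

The identity also holds at the integer boundaries, since both a + 1 and the negation wrap around consistently.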

Miscellaneous

Other runtime components include library code to perform SSL certificate pinning, as well as obfuscated wrappers around web view clients. None of those are of particular interest.

Wrapper for android.webkit.WebViewClient. Make sure to enable deobfuscators and provide guardX hints. When this is done, most methods will be crystal clear. In fact, the majority of them are simple forwarders.

Conclusion

That’s it for the obfuscation and runtime protection facilities. Key takeaways for analyzing such protected code:

  • Keep the deobfuscators enabled
  • Locate the opaque integers, rename them to guard0/guard1 to give JEB a hint on where control flow deobfuscation should be performed, and redecompile the class

The second part in the series presents bytecode encryption and assets encryption.

  1. VM in VM, repeat ad nauseam – something not new to code protection systems, it’s existed on x86 for more than a decade, but new on Android, and other players in this field, commercial and otherwise, seem to be implementing similar solutions.
  2. So-called “RASP”, a relatively new acronym for Runtime Application Self-Protection
  3. Decryption and unreflection are generic processes of dexdec (the DEX Decompiler plugin); there is nothing specific to this protector here. The vast majority of encrypted data, regardless of the protection system in place, will be decrypted.

Improved Documentation and Manual for JEB

We have refreshed and added lots of content to our online manual for JEB. Have a look at it here:

You will also find a copy of this manual for offline viewing in the [JEB]/doc/manual folder.

In particular, those pages contain lots of material:

It is still a work in progress, and more content is added to it regularly. We are planning to keep the manual properly sync’ed with JEB’s capabilities. The next page that will receive a very large update is the Native Code Analysis section.

Again, thank you for your support. Drop us a line at support@pnfsoftware.com or come over on our Slack channel.

JEB Android Updates – Generic String Decryption, Lambda Recovery, Unreflecting Code, and More

Updated on March 11.

A note about 2020 Q1 updates (versions 3.10 to 3.16) regarding the DEX/Dalvik decompiler modules:

  • Generic String Decryption
  • Lambda Recovery
  • Unreflecting Code
  • Decompiling Java Bytecode
  • Auto-Rename All

Generic String Decryption

JEB ships with a generic deobfuscator that can perform on-the-fly string decryption and other complex optimizations. Although this optimizer performs safe (i.e., guaranteed) optimizations in most cases, it is unsafe in the general case and therefore may be disabled in the options. Refer to the Engines options .parsers.dcmp_dex.EnableDeobfuscators and .parsers.dcmp_dex.EmulationSupport.

Many code protectors offer options to replace immediate string constants by method invocations that perform on-the-fly decryption.

A variety of techniques exist, ranging from simple one-off trivial decryptor methods, to complex schemes involving object(s) creation, complicated decryptors injected in third-party packages, non-trivial logic, junk code meant to slow down analyzers, use of opaque predicates, etc. They are implemented in an infinite number of ways. JEB’s generic deobfuscator can perform quick, safe emulation of the intermediate representation to provide a replacement. It may sometimes fail or bail out due to several reasons, such as performance or pitfalls like anti-emulation and anti-sandboxing techniques.

Example 1

The string decryptor is a static method reading encrypted string data in a class byte array. Also note that code reflection is used.

The above code (blue box) ends up being deobfuscated to:

Example 2:

The decryption methods were injected into library packages, e.g. Gson’s

The above code is deobfuscated to:

With the generic string decryptor optimizer ON

Below, a decryptor that had been injected into the com.google.gson.Gson class:

The concat(String, int) method is not part of standard Gson, of course. It was injected by the protector and is used by (some) code to perform on-the-fly string decryption.

Example 3:

One last example, which was involuntarily – yet, quite timely! – provided by a user:

Static fields initialized with decrypted strings.
After decryption.

Decrypting all strings: The decryptor kicks in when decompiling methods only. At the moment, if a string happens to be successfully decrypted, the optimizer does not attempt to recover all similarly encrypted strings in the code, although it is most certainly an addition that will make it in a future software update.

Rendering: You may quickly identify decrypted strings in the client as they are rendered using a special color associated with the itemId STRING_GENERATED, by default rendered in a flashy pink color in light and dark themes. Hovering over such items will bring up a pop-up with additional origin information, like the underlying code that would have generated that string:

Auto-decrypted strings in pink; overlays with source/origin information.

API:
– From a DEX perspective: Generated strings are artificial. Therefore, IDexString.isArtificial() would return true.
– From a Java/AST perspective: IJavaConstant objects that embed origin information do so using the “origin” tag. Use IJavaConstant.getTags().get("origin") to retrieve it.

Lambda Recovery

JEB attempts to perform Java 8 style lambda recovery and reconstruction.

Desugared Lambdas

Recovery and reconstruction do not rely on any type of metadata [1], such as the special -$$Lambda$ prefixes for classes and methods implementing desugared lambdas in dex 37-.

You may therefore see constructs like this:

This DEX file contains desugared, non-obfuscated lambdas.
This DEX file contained desugared, obfuscated lambdas

Options: Lambda reconstruction can be disabled in the options (Edit, Options, Engines, …). Lambda rendering can also be disabled in the options, as well as on-demand by right-clicking a decompiled view, Rendering Options….

Lambdas options

API Note: In the above cases, the underlying Java AST may be an IJavaNew or IJavaStaticField node. This is not the case for real (not desugared) lambdas, which map to an IJavaCall node – see below.

Real Lambdas

Lambda reconstruction also takes place when the code has not been desugared (which is rare!), i.e. code relying on dex38’s invoke-custom and invoke-polymorphic.

This DEX file contains real lambdas implemented via invoke-custom

API Note: Such lambdas map to an IJavaCall node for which isCustomCall() will return true.

Unreflecting Code

Many code protectors make heavy use of reflection – combined with string encryption, as we’ll see below – to obfuscate code. In practice, reflection is limited to method invocation (static and virtual), static and non-static field setting and getting, and new instance creation. A few examples:

v = Class.forName("java.lang.Integer").getMethod("valueOf",
        String.class).invoke(null, str);
// instead of
v = Integer.valueOf(str);

Class.forName("SomeClassName").getField("b").setInt(x, 4);
// instead of
x.b = 4;

Class.forName("java.lang.String").getConstructor(byte[].class)
        .newInstance(val);
// instead of
new String(val);

Such code is generally protected by a catch-all handler that forwards the cause of any exception raised by a reflection issue:

try {
    // ...
}
catch(Throwable e) {
    throw e.getCause();
}
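
For readers who want to try this outside of a decompiler, here is a small self-contained sketch of the first example together with a handler that forwards the cause, next to the direct form that unreflection aims to recover. The class and method names are hypothetical. Unlike the catch-all handler above, it rethrows the cause as an unchecked exception so that the snippet compiles as ordinary Java source.

```java
import java.lang.reflect.InvocationTargetException;

// Hypothetical demo: reflective call vs. its direct equivalent.
final class ReflectionDemo {
    // Reflective form, in the style an obfuscator might emit.
    static Object reflectedValueOf(String str) {
        try {
            return Class.forName("java.lang.Integer")
                    .getMethod("valueOf", String.class)
                    .invoke(null, str);
        } catch (InvocationTargetException e) {
            // Surface the exception raised by the target method itself.
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) {
                throw (RuntimeException) cause;
            }
            throw new IllegalStateException(cause);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Direct form.
    static Object directValueOf(String str) {
        return Integer.valueOf(str);
    }
}
```

Both forms return equal Integer objects for valid input and raise NumberFormatException for invalid input, which is why replacing one with the other preserves behavior in the common case.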

By default, JEB will attempt to unreflect code. This deobfuscator is potentially unsafe and may be disabled in the options. Note that you always have the ability to choose, for a particular decompilation, whether some options should be temporarily enabled or disabled, by pressing CTRL+TAB (or COMMAND+TAB on macOS) to decompile (same as menu Action, Decompile with options…).

Unsafe deobfuscators can be globally disabled.

So, in a nutshell, code normally decompiled to:

Reflection, not cleaned (malware was obfuscated)

will be decompiled to:

With reflection cleaned

Technical Note: This optimizer works on the Intermediate Representation manipulated by the decompiler, not to be confused with the AST rendered as its output. (The AST cleaner that was described in an older post is more limited than this IR optimizer.)

Last-step failures: Successfully unreflecting code eventually depends on being able to find the intended target method or field matching the provided description (method parameter types or field type). Failure to do so will generate a log like "A candidate field/method/constructor for unreflection was not found".

Decompiling Java Bytecode

JEB supports JLS bytecode decompilation for *.class files and jar-like archives (jar, war, ear, etc.). The Java bytecode is converted to Dalvik using Android’s dx by default. Users may choose to use d8 (not recommended for now) instead by selecting so in the Options.

The resulting DEX file(s) are processed as usual.

You may use this to decompile Android Library files (*.aar files) in JEB.

Examining the android-arch-core-runtime library

Auto-Rename All

JEB 3.13 introduced a new generic action, Auto-Rename All. Its implementation is at the discretion of code plugins. The DEX plugin implements it, therefore users may execute Action, Auto-Rename All… at any time (generally after processing an obfuscated file) in order to rename code items such as field, method, or class names, to something more easily processable for our -limited- human brains.

Look at this horrendous obfuscation scheme below. It’s using right-to-left unicode characters to seriously mess up rendering:

Obfuscated name using RTL Arabic characters

Let’s run Action, Auto-Rename All… on this file:

Auto-Renaming capabilities are provided (optionally) by plugins.
After auto-renaming code items in the above file. Not clearer in terms of meaning, but at least, it’s something we can start working on.

As usual, feel free to join us on Slack, message us on Twitter, or email us privately at support@pnfsoftware.com.

Until next time!

  1. Relying on metadata leads to false negatives in the best case – e.g., when the code has been minified by something like ProGuard; it leads to false positives in the worst case – e.g. forged metadata to incite the decompiler to generate inaccurate or wrong code.

Analyzing Golang Executables

The Go programming language (also known as Golang) has gained popularity during the last few years among malware developers. This can certainly be explained by the relative simplicity of the language, and the cross-compilation ability of its compiler, allowing multi-platform malware development without too much effort.

In this blog post, we dive into Golang executables reverse engineering, and present a Python extension for the JEB decompiler to ease Golang analysis; here is the table of contents:

  1. Golang Basics for Reverse Engineers
  2. Making JEB Great for Golang
    1. Current Status
    2. Finding (and Naming) Routines
    3. Strings Recovery
    4. Types Recovery
  3. Use-Case: Analysis of StealthWorker Malware

The JEB Python script presented in this blog can be found on our GitHub page. Make sure to update JEB to version 3.7+ before running it.

Disclaimer: the analysis in this blog post refers to the current Golang version (1.13) and part of it might become outdated with future releases.

Golang Basics for Reverse Engineers

Feel free to skip this part if you’re already familiar with Golang reverse engineering.

Let’s start with some facts that reverse engineers might find interesting to know before analyzing their first Golang executable.

1. Golang is an open-source language with a pretty active development community. The language was originally created at Google around 2007, and version 1.0 was released in March 2012. Since then, two major versions have been released each year.

2. Golang has a long lineage: in particular many low-level implementation choices — some would say oddities — in Golang can be traced back to Plan9, a distributed operating system on which some Golang creators were previously working.

3. Golang has been designed for concurrency, in particular by providing so-called “goroutines“, which are lightweight threads executing concurrently (but not necessarily in parallel).

Developers can start a new goroutine simply by prefixing a function call by go. A new goroutine will then start executing the function, while the caller goroutine returns and continues its execution concurrently with the callee. Let’s illustrate that with the following Golang program:

package main

import (
	"fmt"
	"time"
)

func myDummyFunc() {
	time.Sleep(1 * time.Second)
	fmt.Println("dummyFunc executed")
}

func main() {
	myDummyFunc() // normal call
	fmt.Println("1 - back in main")

	go myDummyFunc() // !! goroutine call
	fmt.Println("2 - back in main")

	time.Sleep(3 * time.Second)
}

Here, myDummyFunc() is called once normally, and then as a goroutine. Compiling and executing this program results in the following output:

dummyFunc executed
1 - back in main
2 - back in main
dummyFunc executed

Notice how execution was back in main() before the second call to myDummyFunc() printed its message.

Implementation-wise, many goroutines can be executed on a single operating system thread. The Golang runtime takes care of switching goroutines, e.g. whenever one executes a blocking system call. According to the official documentation, “It is practical to create hundreds of thousands of goroutines in the same address space”.

What makes goroutines so “cheap” to create is that they start with a very limited stack space (2048 bytes — since Golang 1.4), which will be increased when needed.

One noticeable consequence for reverse engineers is that native routines (almost) all start with the same prologue. Its purpose is to check if the current goroutine’s stack is large enough, as can be seen in the following CFG:

Fig. 1: Simplified x86 CFG with Golang prologue for stack growth

When the stack space is nearly exhausted, more space will be allocated — actually, the stack will be copied somewhere with enough free space. This particular prologue is present only in routines with local variables.

How to distinguish a goroutine call from a “normal” call when analyzing a binary? Goroutine calls are implemented by calling runtime.newproc, which takes as input the address of the native routine to call, the size of its arguments, and then the actual routine’s arguments.

4. Golang has a concurrent garbage collector (GC): Golang’s GC can free memory while other goroutines are modifying it.

Roughly speaking, when the GC is freeing memory, goroutines report all their memory writes to it, so that concurrent memory modifications are not missed by the current freeing phase. Implementation-wise, when the GC is in the process of marking used memory, all memory writes pass through a “write barrier”, which performs the write and informs the GC.

For reverse engineers this can result in particularly convoluted control flow graphs (CFG). For example, here is the CFG when a global variable globalString is set to newValue:

Fig. 2: Write to global variable globalString (x86 CFG):
before doing the memory write, the code checks if the write barrier is activated,
and if yes calls runtime.gcWriteBarrier()

Not all memory writes are monitored in that manner; the rules for write barriers’ insertion are described in mbarrier.go.

5. Golang comes with a custom compiler tool chain (parser, compiler, assembler, linker), all implemented in Golang. [1] [2]

From a developer’s perspective, it means that once Go is installed on a machine, one can compile for any supported platform (making Golang a language of choice for IoT malware developers). Examples of supported platforms include Windows x64, Linux ARM and Linux MIPS (see “valid combinations of $GOOS and $GOARCH“).

From a reverse engineer’s perspective, the custom Go compiler toolchain means Golang binaries sometimes come with “exotic” features (which therefore can give a hard time to reverse engineering tools).

For example, symbols in Golang Windows executables are implemented using the COFF symbol table (while officially “COFF debugging information [for executable] is deprecated”). The Golang COFF symbol implementation is pretty liberal: symbols’ type is set to a default value, i.e. there is no clear distinction between code and data.

As another example, Windows PE read-only data section “.rdata” has been defined as executable in past Go versions.

Interestingly, Golang compiler internally uses pseudo assembly instructions (with architecture-specific registers). For example, here is a snippet of pseudo-code for ARM (operands are ordered with source first):

MOVW $go.string."hello, world\n"(SB), R0
MOVW R0, 4(R13)
MOVW $13, R0
MOVW R0, 8(R13)
CALL "".dummyFunc(SB)
MOVW.P 16(R13), R15


These pseudo-instructions could not be understood by a classic ARM assembler (e.g. there is no CALL instruction on ARM). Here are the disassembled ARM instructions from the corresponding binary:

LDR R0, #451404h // "hello, world\n" address
STR R0, [SP, #4]
MOV R0, #13
STR R0, [SP, #8]
BL main.dummyFunc
LDR PC, [SP], #16

Notice how the same pseudo-instruction MOVW was converted into either STR or MOV machine instructions. The use of pseudo-assembly comes from Plan9, and allows the Golang assembler parser to easily handle all architectures: the only architecture-specific step is the selection of machine instructions (more details here).

6. Golang uses by default a stack-only calling convention.

Let’s illustrate that with the following diagram, showing the stack’s state when a routine with two integer parameters a and b, and two return values — declared in Go as “func myRoutine(a int, b int) (int, int)” — is called:

Fig. 3: Simplified stack view (stack grows downward), when a routine with two parameters and two return values is called. The return values are reserved slots for the callee.

It is the caller’s responsibility to reserve space for the callee’s parameters and return values, and to free it later on.
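
To make the diagram concrete, here is a minimal Python sketch that computes where each slot sits relative to the stack pointer at callee entry. It is a simplified model, not the compiler's actual algorithm: it assumes a 32-bit x86 target where CALL pushes the return address, and that every parameter and result occupies exactly one machine word. The function name stack_layout and the slot names are hypothetical.

```python
def stack_layout(params, results, word_size=4):
    """Return {slot name: offset from SP at callee entry} for a stack-only call.

    Simplifying assumption: every parameter and result is one machine word,
    and the return address pushed by CALL sits at offset 0.
    """
    layout = {"return address": 0}
    offset = word_size
    # Parameters come first, then the slots reserved by the caller for results.
    for name in list(params) + list(results):
        layout[name] = offset
        offset += word_size
    return layout


# func myRoutine(a int, b int) (int, int), seen from the callee
layout = stack_layout(["a", "b"], ["ret0", "ret1"], word_size=4)
for name, offset in layout.items():
    print("[SP+%d] %s" % (offset, name))
```

Types larger than a word (strings, slices, structures) take several consecutive slots, which this sketch deliberately ignores.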

Note that Golang’s calling convention situation might soon change: since version 1.12, several calling conventions can coexist — the stack-only calling convention remaining the default one for backward compatibility reasons.

7. Golang executables are usually statically-linked, i.e. do not rely on external dependencies 3. In particular they embed a pretty large runtime environment. Consequently, Golang binaries tend to be large: for example, a “hello world” program compiled with Golang 1.13 is around 1.5MB with its symbols stripped.

8. Golang executables embed lots of symbolic information:

  • Debug symbols, implemented as DWARF symbols. These can be stripped at compilation time (command-line option -ldflags "-w").
  • Classic symbols for each executable file format (PE/ELF/Mach-O). These can be stripped at compilation time (command-line option -ldflags "-s").
  • Go-specific metadata, including for example all functions’ entry points and names, and complete type information. These metadata cannot (easily) be stripped, because the Golang runtime needs them: for example, function information is needed to walk the stack for error handling or for garbage collection, while type information serves for runtime type checks.

Of course, Go-specific metadata are very good news for reverse engineers, and parsing these will be one of the purposes of the JEB Python extension described in this blog post.

Making JEB Great for Golang

Current Status

What happens when opening a Golang executable in JEB? Let’s start from the usual “hello world” example:

package main

import "fmt"

func main() {
	fmt.Printf("hello, world\n")
}

If we compile it as a Windows x64 PE file, and open it in JEB, we can notice that its code has only been partially disassembled. Unexplored memory areas can indeed be seen next to code areas in the native navigation bar (right-side of the screen by default):

Fig.4: Navigation bar for Golang PE file
(blue is code, green is data, grey represents area without any code or data)

We can confirm that the grey areas surrounding the blue areas are code, by manually disassembling them (hotkey ‘C’ by default).

Why did JEB disassembler miss this code? As can be seen in the Notifications window, the disassembler used a CONSERVATIVE strategy, meaning that it only followed safe control flow relationships (i.e. branches with known targets) 4.

Because the Go runtime calls most native routines indirectly, in particular when creating goroutines, the JEB disassembler finds few reliable control flow relationships, which explains why some code areas remain unexplored.

Before going on, let’s take a look at the corresponding Linux executable, which we can obtain simply by setting environment variable $GOOS to linux before compiling. Opening the resulting ELF file in JEB puts us in a better situation:

Fig. 5: Navigation bar for Golang ELF file
(blue is code, green is data, grey represents area without any code or data)

Due to the use by default of AGGRESSIVE strategy for disassembling ELF files, JEB disassembler found the whole code area (all code sections were linearly disassembled). In particular this time we can see our main routine, dubbed main.main by the compiler:

Fig. 6: Extract of main.main routine’s disassembly

Are data mixed with code in Golang executables? If yes, that would make AGGRESSIVE disassembly a risky strategy. At this moment (version 1.13 with default Go compiler), this does not seem to be the case:

– Data are explicitly stored in different sections than code, on PE and ELF.

– Switch statements are not implemented with jumptables, a common case of data mixed with code (e.g. in Visual Studio or GCC ARM). Note that Golang provides several switch-like statements, such as the select statement or the type switch statement.

As anything Golang related, the situation might change in future releases (for example, there is still an open discussion to implement jumptables for switch).

Yet, there is still something problematic in our ELF disassembly: the “hello world” string was not properly defined. Following the reference made by LEA instruction in the code, we reach a memory area where many strings have indeed been misrepresented as 1-byte data items:

Fig. 7: Dump of the memory area containing strings. Only the first byte of the strings is defined.

Now that we have a better idea of JEB’s current status, we are going to explain how we extended it with a Python script to ease Golang analysis.

Finding and Naming Routines

The first problem on our road is the incomplete control flow, especially on Windows executables. At first, it might seem that PE files disassembly could be improved simply by setting the disassembler’s strategy to AGGRESSIVE, exactly as for ELF files. While it might be an acceptable quick solution, we can actually improve the control flow in a much safer way by parsing Go metadata.

Parsing “Pc Line Table”

Since version 1.2, Golang executables embed a structure called “pc line table”, also known as pclntab. Once again, this structure (and its name) is a heritage from Plan9, where its original purpose was to associate a program counter value (“pc”) to another value (e.g. a line number in the source code).

The structure has evolved, and now contains a function symbol table, which stores in particular the entry points and names of all routines defined in the binary. The Golang runtime uses it in particular for stack unwinding, call stack printing and garbage collection.

In other words, pclntab cannot be easily stripped from a binary, and provides us with a reliable way to improve our disassembler’s control flow!

First, our script locates pclntab structure (refer to locatePclntab() for the details):

  # non-stripped binary: use symbol
  if findSymbolByName(golangAnalyzer.codeContainerUnit, 'runtime.pclntab') is not None:
    pclntabAddress = findSymbolByName(..., 'runtime.pclntab')

  # stripped binary
  else:
    # PE: brute-force search in .rdata, or in the whole binary if the section is not present
    if [...].getFormatType() == WellKnownUnitTypes.typeWinPe:
      [...]

    # ELF: .gopclntab section if present, otherwise brute-force search
    elif [...].getFormatType() == WellKnownUnitTypes.typeLinuxElf:
      [...]

On stripped binaries (i.e. without classic symbols), we search memory for the magic constant 0xFFFFFFFB that starts pclntab, and then run some checks on the possible fields. Note that it is usually easier to parse Golang ELF files, as important runtime structures are stored in distinct sections.
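
The brute-force search can be sketched in a few lines of standalone Python 3, independent of the JEB API. This is an illustration, not the script's locatePclntab(): it assumes a little-endian binary and the header layout used by the Go versions discussed here (4-byte magic, two zero bytes, one byte giving the instruction size quantum, one byte giving the pointer size). The helper name find_pclntab_candidates is hypothetical.

```python
import struct

# Magic constant starting pclntab, encoded as little-endian (assumption).
PCLNTAB_MAGIC = struct.pack("<I", 0xFFFFFFFB)


def find_pclntab_candidates(data):
    """Yield offsets in `data` (bytes) that look like the start of a pclntab header."""
    pos = data.find(PCLNTAB_MAGIC)
    while pos != -1:
        header = data[pos:pos + 8]
        if (len(header) == 8
                and header[4:6] == b"\x00\x00"   # two zero bytes
                and header[6] in (1, 2, 4)       # instruction size quantum
                and header[7] in (4, 8)):        # pointer size
            yield pos
        pos = data.find(PCLNTAB_MAGIC, pos + 1)


# Usage on a synthetic blob: one plausible header, one bare magic value.
blob = (b"\x00" * 16
        + b"\xfb\xff\xff\xff\x00\x00\x01\x08"
        + b"\xfb\xff\xff\xff\x01\x02\x03\x04")
print([hex(off) for off in find_pclntab_candidates(blob)])
```

A real implementation would add further sanity checks on the fields that follow the header, since the magic value alone can appear by chance in data.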

Second, we parse pclntab and use its function symbol table to disassemble all functions and rename them:

[...]
# enqueue function entry points from pclntab and register their names as labels
for myFunc in pclntab.functionSymbolTable.values():   
 nativeCodeAnalyzer.enqueuePointerForAnalysis(EntryPointDescription(myFunc.startPC), INativeCodeAnalyzer.PERMISSION_FORCEFUL)
 if rename:
   labelManager.setLabel(myFunc.startPC, myFunc.name, True, True, False)

# re-run disassembler with the enqueued entry points
self.nativeCodeAnalyzer.analyze()

Running this on our original PE file allows us to discover all routines, and gives the following navigation bar:

Fig. 8: Navigation bar for Golang PE file after running the script
(blue is code, green is data, grey represents area without any code or data)
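
For readers who want to experiment outside JEB, the walk over the function symbol table can be sketched in standalone Python 3. This is a simplified reading of the Go 1.2-style layout, stated as an assumption: after the 8-byte header comes the number of functions (pointer-sized), then one (entry pc, record offset) pair per function; each function record starts with the entry pc followed by a 32-bit offset to a null-terminated name, all offsets being relative to the start of pclntab. The name parse_function_table is hypothetical, and only little-endian data is handled.

```python
import struct


def parse_function_table(pclntab):
    """Return {entry_pc: name} from a Go 1.2-style pclntab blob (little-endian bytes)."""
    ptr_size = pclntab[7]                      # last header byte: pointer size
    ptr_fmt = "<I" if ptr_size == 4 else "<Q"

    def read_ptr(offset):
        return struct.unpack_from(ptr_fmt, pclntab, offset)[0]

    function_count = read_ptr(8)
    functions = {}
    for index in range(function_count):
        pair_offset = 8 + ptr_size + index * 2 * ptr_size
        start_pc = read_ptr(pair_offset)
        record_offset = read_ptr(pair_offset + ptr_size)
        # Function record: entry pc (pointer-sized), then the name offset (int32).
        name_offset = struct.unpack_from("<i", pclntab, record_offset + ptr_size)[0]
        name_end = pclntab.index(b"\x00", name_offset)
        functions[start_pc] = pclntab[name_offset:name_end].decode("utf-8", "replace")
    return functions
```

The resulting dictionary plays the role of pclntab.functionSymbolTable in the listing above: each key is an entry point to enqueue, each value a label to apply.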

Interestingly, a few of Golang’s runtime routines provide hints about the machine used to compile the binary, for example:

runtime.schedinit(): references Go’s build version. Knowing the exact version helps to investigate possible script parsing failures (as some internal structures might change depending on Go’s version).

runtime.GOROOT(): references Go’s installation folder used during compilation. This might be useful for malware tracking.

These routines are present only if the rest of the code relies on them. If it is the case, FunctionsFinder module highlights them in JEB’s console, and the user can then examine them.

The Remaining Unnamed Routines

Plot twist! A few routines found by the disassembler remain nameless even after FunctionsFinder module parsed pclntab structure. All these routines are adjacent in memory and composed of the same instructions, for example:

Fig. 9: Series of unnamed routines in x86

Long story short, these routines are made for zeroing or copying memory blobs, and are part of two large routines respectively named duff_zero and duff_copy.

These large routines are Duff’s devices made for zeroing/copying memory. They are generated as long unrolled loops of machine instructions. Depending on how many bytes need to be copied/zeroed, the compiler emits a call that lands directly on a particular instruction inside them. For each of these calls, a nameless routine will then be created by the disassembler.

DuffDevicesFinder module identifies such routines with pattern matching on assembly instructions. By counting the number of instructions, it then renames them duff_zero_N/duff_copy_N, with N the number of bytes zeroed/copied.
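
The counting idea can be illustrated with a small Python 3 sketch. It is not the DuffDevicesFinder implementation: it assumes an x86 zeroing device made of 1-byte STOSD instructions (opcode 0xAB), each zeroing 4 bytes, terminated by RET (opcode 0xC3). Other architectures and the copy device use different instruction patterns. The function name name_duff_zero_entry is hypothetical.

```python
def name_duff_zero_entry(code, entry, store_opcode=0xAB, ret_opcode=0xC3, bytes_per_store=4):
    """Return a duff_zero_N label for a call landing at offset `entry` in `code`.

    Assumption: the device is a run of 1-byte x86 STOSD instructions,
    each zeroing `bytes_per_store` bytes, followed by RET.
    Returns None when the bytes do not match that pattern.
    """
    count = 0
    pos = entry
    while pos < len(code) and code[pos] == store_opcode:
        count += 1
        pos += 1
    if count == 0 or pos >= len(code) or code[pos] != ret_opcode:
        return None
    return "duff_zero_%d" % (count * bytes_per_store)


# Usage: a synthetic device of 128 stores followed by RET.
device = bytes([0xAB]) * 128 + bytes([0xC3])
print(name_duff_zero_entry(device, 0))
print(name_duff_zero_entry(device, 120))
```

Entering the device later leaves fewer stores to execute before the RET, hence a smaller N in the generated name.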

Source Files

Interestingly, the pclntab structure also stores the original source file paths. This supports various Golang runtime features, like printing meaningful stack traces, or providing information on callers from a callee (see runtime.Caller()). Here is an example of a stack trace obtained after a panic():

PANIC
goroutine 1 [running]:
main.main()
        C:/Users/[REDACTED]/go/src/hello_panic/hello_panic.go:4 +0x40

The script extracts the list of source files and prints them in logs.

Strings Recovery

The second problem we initially encountered in JEB was the badly defined strings.

What Is a String?

Golang’s strings are stored at runtime in a particular structure called StringHeader with two fields:

type StringHeader struct {
        Data uintptr       // string value
        Len  int           // string size 
}

The string’s characters (pointed to by the Data field) are stored in data sections of the executables, as a series of UTF-8 encoded characters without null-terminators.

Dynamic Allocation

StringHeader structures can be built dynamically, in particular when the string is local to a routine. For example:

Fig. 10: StringHeader instantiation in x86

By default JEB disassembler defines a 1-byte data item (gvar_4AFB52 in previous picture) for the string value, rather than a proper string, because:

  • As the string value is referenced only by LEA instruction, without any hints on the data type (LEA is just loading an “address”), the disassembler cannot type the pointed data accordingly.
  • The string value does not end with a null-terminator, making JEB’s standard strings identification algorithms unable to determine the string’s length when scanning memory.

To find these strings, StringsBuilder module searches for the particular assembly instructions usually used for instantiating StringHeader structures (for x86/x64, ARM and MIPS architectures). We can then properly define a string by fetching its size from the assembly instructions. Here is an example of recovered strings:

Of course, this heuristic will fail if different assembly instructions are employed to instantiate StringHeader structures in future Golang compiler releases (such a change happened in the past, e.g. x86 instructions changed with Golang 1.8).

Static Allocation

StringHeader can also be statically allocated, for example for global variables; in this case the complete structure is stored in the executable. The code referencing such strings employs many different instructions, making pattern matching not suitable.

To find these strings, we scan data sections for possible StringHeader structures (i.e. a Data field pointing to a printable string of size Len). Here is an example of recovered structures:

Fig. 13: Reconstructed StringHeader
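
The scan for statically allocated StringHeader structures can be sketched as follows in standalone Python 3. This is a simplified model rather than the StringsBuilder code: it assumes a little-endian binary, and that the headers and the characters they point to live in the same byte buffer, whereas in a real executable they usually sit in different sections. The names scan_string_headers and is_printable, as well as the max_len bound, are hypothetical.

```python
import struct


def is_printable(raw):
    """True if `raw` decodes as non-empty UTF-8 made of printable characters."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return len(text) > 0 and all(ch.isprintable() or ch in "\r\n\t" for ch in text)


def scan_string_headers(section, base, ptr_size=8, max_len=4096):
    """Yield (header_address, text) for plausible StringHeader structures.

    `section` holds the raw bytes mapped at address `base`.
    """
    fmt = "<QQ" if ptr_size == 8 else "<II"
    for off in range(0, len(section) - 2 * ptr_size + 1, ptr_size):
        data_ptr, length = struct.unpack_from(fmt, section, off)
        start = data_ptr - base
        # The Data field must point inside the buffer, and Len must stay in bounds.
        if not (0 < length <= max_len and 0 <= start and start + length <= len(section)):
            continue
        raw = section[start:start + length]
        if is_printable(raw):
            yield base + off, raw.decode("utf-8")


# Usage: 13 characters (no null terminator needed), then a header (Data, Len).
section = b"hello, world\n".ljust(16, b"\x00") + struct.pack("<QQ", 0x1000, 13)
print(list(scan_string_headers(section, base=0x1000)))
```

Because the length comes from the Len field, the missing null terminator is not a problem for this heuristic.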

The script employs two additional final heuristics, which scan memory for printable strings located between two already-defined strings. This allows us to recover strings missed by the previous heuristics.

When a small local string is used for comparison only, no StringHeader structure gets allocated. The string comparison is done directly by machine instructions; for example, CMP [EAX], 0x64636261 to compare with “abcd” on x86.
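
Such immediates are easy to decode by hand: on a little-endian target the least significant byte of the operand is the first character. The following Python 3 sketch shows the conversion (the helper name immediate_to_text is hypothetical):

```python
def immediate_to_text(value, size=4):
    """Decode a little-endian immediate operand as the ASCII text it is compared with."""
    return value.to_bytes(size, "little").decode("ascii")


print(immediate_to_text(0x64636261))  # prints "abcd"
```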

Types Recovery

Now that we extended JEB to handle the “basics” of Golang analysis, we can turn ourselves to what makes Golang-specific metadata particularly interesting: types.

Golang executables indeed embed descriptions for all types manipulated in the binary, including in particular those defined by developers.

To illustrate that, let’s compile the following Go program, which defines a Struct (Golang’s replacement for classes) with two fields:

package main

type DummyStruct struct{
	boolField bool
	intField int
}

func dummyFunc(s DummyStruct) int{
	return 13 * s.intField
}

func main(){
	s := DummyStruct{boolField: true, intField:37}
	t := dummyFunc(s)
	t += 1
}

Now, if we compile this source code as a stripped x64 executable, and analyze it with TypesBuilder module, the following structure will be reconstructed:

Fig. 14: Structure reconstructed by TypesBuilder, as seen in JEB’s type editor

Not only did we get the structure and its fields’ original names, but we also retrieved the structure’s exact memory layout, including the padding inserted by the compiler to align fields. We can confirm DummyStruct‘s layout by looking at its initialization code in main():

Fig. 15: DummyStruct initialization: intField starts at offset 8, as extracted from type information
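
The padding can be reproduced with the usual alignment rule: each field is placed at the next offset that is a multiple of its alignment, and the total size is rounded up to the largest alignment. The Python 3 sketch below illustrates that rule. It is not TypesBuilder's code (which reads offsets from the type descriptors rather than computing them), and it assumes an x64 target where bool has size and alignment 1 and int has size and alignment 8. The function names are hypothetical.

```python
def align_up(offset, alignment):
    """Round `offset` up to the next multiple of `alignment`."""
    return (offset + alignment - 1) // alignment * alignment


def struct_layout(fields):
    """fields: list of (name, size, alignment). Returns ({name: offset}, total size)."""
    offsets = {}
    offset = 0
    max_alignment = 1
    for name, size, alignment in fields:
        offset = align_up(offset, alignment)   # insert padding if needed
        offsets[name] = offset
        offset += size
        max_alignment = max(max_alignment, alignment)
    return offsets, align_up(offset, max_alignment)


# DummyStruct on x64 (assumed sizes): bool is 1 byte, int is 8 bytes.
offsets, total = struct_layout([("boolField", 1, 1), ("intField", 8, 8)])
print(offsets, total)  # intField lands at offset 8, total size is 16
```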

Why So Much Information?

Before explaining how TypesBuilder parses type information, let’s first understand why this information is needed at all. Here are a few Golang features that rely on types at runtime:

  • Dynamic memory allocation, usually through a call to runtime.newobject(), which takes in input the description of the type to be allocated
  • Dynamic type checking, with statements like type assertions or type switches. Roughly speaking, two types will be considered equal if they have the same type descriptions.
  • Reflection, through the built-in package reflect, which allows manipulating objects of unknown types from their type descriptions

Golang type descriptions can be considered akin to C++ Run-Time Type Information, except that there is no easy way to prevent their generation by the compiler. In particular, even when not using reflection, type descriptors remain present.

For reverse engineers, this is more very good news: knowing types (and their names) will help in understanding the code’s purpose.

Of course, it is certainly doable to obfuscate types, for example by giving them meaningless names at compilation. We did not find any malware using such a technique.

What Is A Type?

In Golang each type has an associated Kind, which can take one of the following values:

const (
    Invalid Kind = iota
    Bool
    Int
    Int8
    Int16
    Int32
    Int64
    Uint
    Uint8
    Uint16
    Uint32
    Uint64
    Uintptr
    Float32
    Float64
    Complex64
    Complex128
    Array
    Chan
    Func
    Interface
    Map
    Ptr
    Slice
    String
    Struct
    UnsafePointer
)

Alongside types usually seen in programming languages (integers, strings, booleans, maps, etc), one can notice some Golang-specific types:

  • Array: fixed-size array
  • Slice: variable-size view of an Array
  • Func: functions; Golang’s functions are first-class citizens (for example, they can be passed as arguments)
  • Chan: communication channels for goroutines
  • Struct: collection of fields, Golang’s replacement for classes
  • Interface: collection of methods, implemented by Structs

The type’s kind is the type’s “category”; what identifies the type is its complete description, which is stored in the following rtype structure:

    type rtype struct {
      size       uintptr
      ptrdata    uintptr  // number of bytes in the type that can contain pointers
      hash       uint32   // hash of type; avoids computation in hash tables
      tflag      tflag    // extra type information flags
      align      uint8    // alignment of variable with this type
      fieldAlign uint8    // alignment of struct field with this type
      kind       uint8    // enumeration for C
      alg        *typeAlg // algorithm table
      gcdata     *byte    // garbage collection data
      str        nameOff  // string form
      ptrToThis  typeOff  // type for pointer to this type, may be zero
    }

The type’s name is part of its description (str field). This means that, for example, one could define an alternate integer type with type myInt int, and myInt and int would then be distinct types (with distinct type descriptors, each of Int kind). In particular, assigning a variable of type myInt to a variable of type int would necessitate an explicit cast.
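
As an illustration, the rtype fields listed above can be decoded with Python's struct module. The sketch assumes a 64-bit little-endian binary and exactly the field order shown above, which is specific to the Go versions discussed here. It also assumes that only the low 5 bits of the kind byte encode the Kind, the upper bits being flags. The names RType and parse_rtype are hypothetical, and this is not the TypesBuilder code.

```python
import struct
from collections import namedtuple

RType = namedtuple(
    "RType", "size ptrdata hash tflag align fieldAlign kind alg gcdata str ptrToThis")

# uintptr, uintptr, uint32, 4 x uint8, 2 pointers, 2 x int32 offsets: 48 bytes on 64-bit.
RTYPE_FORMAT_64 = "<QQIBBBBQQii"
KIND_MASK = (1 << 5) - 1  # assumption: upper bits of the kind byte are flags

KIND_NAMES = [
    "Invalid", "Bool", "Int", "Int8", "Int16", "Int32", "Int64",
    "Uint", "Uint8", "Uint16", "Uint32", "Uint64", "Uintptr",
    "Float32", "Float64", "Complex64", "Complex128",
    "Array", "Chan", "Func", "Interface", "Map", "Ptr", "Slice",
    "String", "Struct", "UnsafePointer",
]


def parse_rtype(blob, offset=0):
    """Decode one rtype at `offset` in `blob`; return (RType, kind name)."""
    rtype = RType(*struct.unpack_from(RTYPE_FORMAT_64, blob, offset))
    return rtype, KIND_NAMES[rtype.kind & KIND_MASK]


# Usage on a synthetic descriptor: a 16-byte structure with a flag bit set in kind.
blob = struct.pack(RTYPE_FORMAT_64, 16, 0, 0x1234, 0, 8, 8, 25 | 0x80, 0, 0, 0x40, 0)
rtype, kind_name = parse_rtype(blob)
print(rtype.size, kind_name)
```

The str and ptrToThis fields are offsets rather than pointers, so resolving the type's name requires the base address of the types range described later.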

The rtype structure only contains general information, and for non-primary types (Struct, Array, Map,…) it is actually embedded into another structure (as the first field), whose remaining fields provide type-specific information.

For example, here is structType, the type descriptor for types with Struct kind:

    type structType struct {
      rtype
      pkgPath name          
      fields  []structField
    }

Here, we have in particular a slice of structField, another structure describing the structure fields’ types and layout.

Finally, types can have methods defined on them: a method is a function with a special argument, called the receiver, which describes the type on which the method applies. For example, here is a method on MyStruct structure (notice the receiver’s name after func):

func (myStruct MyStruct) method1() int{
    ...
}

Where are methods’ types stored? Into yet another structure called uncommonType, which is appended to the receiver’s type descriptor. In other words, a structure with methods will be described by the following structure:

type UncommonStructType struct {
      rtype
      structType
      uncommonType
}

Here is an example of such structure, as seen in JEB after running TypesBuilder module:

Fig. 16: Type descriptor for a structure with methods:
StrucType (with embedded rtype, and referencing StructField),
followed by UncommonType (referencing MethodType)

Parsing type descriptors can therefore be done by starting from rtype (present for all types), and adding wrapper structures around it, if needed. Properly renaming type descriptors in memory greatly helps the analysis, as these descriptors are passed as arguments to many runtime routines (as we will see in StealthWorker’s malware analysis).

The final step is to transform the type descriptors into the actual types (for example, translating a structType into the memory representation of the corresponding structure), which can then be imported into JEB types. For now, TypesBuilder does this final import step for named structures only.

Describing all of Golang’s type descriptors in detail is out of scope for this blog. Refer to TypesBuilder module for the gory details.

Locating Type Descriptors

The last question we have to examine is how to actually locate type descriptors in Golang binaries. This starts with a structure called moduledata, whose purpose is to “record information about the layout of the executable”:

    type moduledata struct {
      pclntable    []byte
      ftab         []functab
      filetab      []uint32
      findfunctab  uintptr
      minpc, maxpc uintptr

      text, etext           uintptr
      noptrdata, enoptrdata uintptr
      data, edata           uintptr
      bss, ebss             uintptr
      noptrbss, enoptrbss   uintptr
      end, gcdata, gcbss    uintptr
      types, etypes         uintptr

      textsectmap []textsect
      typelinks   []int32 // offsets from types
      itablinks   []*itab

      [...REDACTED...]
    }

This structure defines in particular a range of memory dedicated to storing type information (from types to etypes). Then, the typelinks field stores the offsets in that range where type descriptors begin.

So first we locate moduledata, either from a specific symbol for non-stripped binaries, or through a brute-force search. For that, we search for the address of pclntab previously found (first moduledata field), and then apply some checks on its fields.

Second, we start the actual parsing of the types range, which is a recursive process as some types reference other types, during which we apply the type descriptors’ structures.
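
Both steps can be sketched in standalone Python 3. The first helper looks for a pointer-sized value equal to the pclntab address, which is how a moduledata candidate can be found by brute force. The second turns typelinks offsets into absolute addresses of type descriptors. This is an illustration under assumptions (little-endian data, pointer-aligned search, hypothetical helper names and addresses), not the script's implementation, which also validates the other moduledata fields.

```python
import struct


def find_pointer(section, base, target, ptr_size=8):
    """Yield addresses in `section` (mapped at `base`) that hold the value `target`."""
    fmt = "<Q" if ptr_size == 8 else "<I"
    for off in range(0, len(section) - ptr_size + 1, ptr_size):
        if struct.unpack_from(fmt, section, off)[0] == target:
            yield base + off


def type_descriptor_addresses(types_start, types_end, typelinks):
    """Resolve typelinks offsets (relative to `types_start`) to absolute addresses."""
    addresses = []
    for offset in typelinks:
        address = types_start + offset
        if types_start <= address < types_end:   # discard offsets outside the range
            addresses.append(address)
    return addresses


# Usage with made-up addresses: a data blob containing a pointer to pclntab.
section = struct.pack("<QQQ", 0, 0x4B79C0, 3)
print([hex(a) for a in find_pointer(section, 0x500000, 0x4B79C0)])
print([hex(a) for a in type_descriptor_addresses(0x460000, 0x470000, [0x20, 0x1000])])
```

Each resolved address is then the starting point for parsing an rtype and, depending on its kind, the structure wrapping it.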

There is no backward compatibility requirement on the runtime’s internal structures, as Golang executables embed their own runtime. In particular, moduledata and type descriptions are not guaranteed to stay backward compatible with older Golang releases (and they have already been largely modified since their inception).

In other words, TypesBuilder module’s current implementation might become outdated in future Golang releases (and might not properly work on older versions).

Use-Case: StealthWorker

We are now going to dig into a malware dubbed StealthWorker. This malware infects Linux/Windows machines, and mainly attempts to brute-force web platforms, such as WordPress, phpMyAdmin or Joomla. Interestingly, StealthWorker heavily relies on concurrency, making it a target of choice for a first analysis.

The sample we will be analyzing is an x86 Linux version of StealthWorker, version 3.02, whose symbols have been stripped (SHA1: 42ec52678aeac0ddf583ca36277c0cf8ee1fc680).

Reconnaissance

Here is JEB’s console after disassembling the sample and running the script with all modules activated (FunctionsFinder, StringsBuilder, TypesBuilder, DuffDevicesFinder, PointerAnalyzer):

>>> Golang Analyzer <<<
> pclntab parsed (0x84B79C0)
> first module data parsed (0x870EB20)
> FunctionsFinder: 9528 function entry points enqueued (and renamed)
> FunctionsFinder: running disassembler... OK
 > point of interest: routine runtime.GOROOT (0x804e8b0): references Go root path of developer's machine (sys.DefaultGoroot)
 > point of interest: routine runtime.schedinit (0x8070e40): references Go version (sys.TheVersion)
> StringsBuilder: building strings... OK (4939 built strings)
> TypesBuilder: reconstructing types... OK (5128 parsed types - 812 types imported to JEB - see logs)
> DuffDevicesFinder: finding memory zero/copy routines... OK (93 routines identified)
> PointerAnalyzer: 5588 pointers renamed
> see logs (C:\[REDACTED]\log.txt)

Let’s start with some reconnaissance work:

  • The binary was compiled with Go version 1.11.4 (referenced in runtime.schedinit‘s code, as mentioned by the script’s output)
  • Go’s root path on developer’s machine is /usr/local/go (referenced by runtime.GOROOT‘s code)
  • Now, let’s turn to the reconstructed strings; there are too many to draw useful conclusions at this point, but at least we got an interesting IP address (spoiler alert: that’s the C&C’s address):
Fig. 17: Extract of StealthWorker’s strings
as seen in JEB after running the script
  • More interestingly, the list of source files extracted from pclntab (outputted in the script’s log.txt) shows a modular architecture:
> /home/user/go/src/AutorunDropper/Autorun_linux.go
> /home/user/go/src/Check_double_run/Checker_linux.go
> /home/user/go/src/Cloud_Checker/main.go
> /home/user/go/src/StealthWorker/WorkerAdminFinder/main.go
> /home/user/go/src/StealthWorker/WorkerBackup_finder/main.go
> /home/user/go/src/StealthWorker/WorkerBitrix_brut/main.go
> /home/user/go/src/StealthWorker/WorkerBitrix_check/main.go
> /home/user/go/src/StealthWorker/WorkerCpanel_brut/main.go
> /home/user/go/src/StealthWorker/WorkerCpanel_check/main.go
> /home/user/go/src/StealthWorker/WorkerDrupal_brut/main.go
> /home/user/go/src/StealthWorker/WorkerDrupal_check/main.go
> /home/user/go/src/StealthWorker/WorkerFTP_brut/main.go
> /home/user/go/src/StealthWorker/WorkerFTP_check/main.go
> /home/user/go/src/StealthWorker/WorkerHtpasswd_brut/main.go
> /home/user/go/src/StealthWorker/WorkerHtpasswd_check/main.go
> /home/user/go/src/StealthWorker/WorkerJoomla_brut/main.go
> /home/user/go/src/StealthWorker/WorkerJoomla_check/main.go
> /home/user/go/src/StealthWorker/WorkerMagento_brut/main.go
> /home/user/go/src/StealthWorker/WorkerMagento_check/main.go
> /home/user/go/src/StealthWorker/WorkerMysql_brut/main.go
> /home/user/go/src/StealthWorker/WorkerOpencart_brut/main.go
> /home/user/go/src/StealthWorker/WorkerOpencart_check/main.go
> /home/user/go/src/StealthWorker/WorkerPMA_brut/main.go
> /home/user/go/src/StealthWorker/WorkerPMA_check/WorkerPMA_check.go
> /home/user/go/src/StealthWorker/WorkerPostgres_brut/main.go
> /home/user/go/src/StealthWorker/WorkerSSH_brut/main.go
> /home/user/go/src/StealthWorker/WorkerWHM_brut/main.go
> /home/user/go/src/StealthWorker/WorkerWHM_check/main.go
> /home/user/go/src/StealthWorker/WorkerWP_brut/main.go
> /home/user/go/src/StealthWorker/WorkerWP_check/main.go
> /home/user/go/src/StealthWorker/Worker_WpInstall_finder/main.go
> /home/user/go/src/StealthWorker/Worker_wpMagOcart/main.go
> /home/user/go/src/StealthWorker/main.go

Each main.go corresponds to a Go package, and it’s quite obvious from the paths that each of them targets a specific web platform. Moreover, there seem to be mainly two types of packages: WorkerTARGET_brut, and WorkerTARGET_check.

There is no information regarding the time of compilation in Golang executables. In particular, executables’ timestamps are set to a fixed value at compilation, in order to always generate the same executable from a given input.

  • Let’s dig a bit further by looking at main package, which is where execution begins; here are its routines with pretty informative names:
Fig. 18: main’s package routines

Additionally there is a series of type..hash* and type..eq* methods for main package:

Fig. 19: Hashing methods (automatically generated for complex types)

These methods are automatically generated for types equality and hashing, and therefore their presence indicates that non-trivial custom types are used in main package (as we will see below).

We can also examine main.init() routine. The init() routine is generated for each package by Golang’s compiler to initialize the other packages that this package relies on, and the package’s global variables:

Fig. 20: Packages initialization from main.init()

Along with the previously seen packages, one can notice some interesting custom packages:

  • github.com/remeh/sizedwaitgroup: a re-implementation of Golang’s WaitGroup (a mechanism to wait for goroutines termination), but with a limit on the number of goroutines started concurrently. As we will see, StealthWorker’s developer takes special care to not overload the infected machine.
  • github.com/sevlyar/go-daemon: a library to write daemon processes in Go.

Golang packages’ paths are part of a global namespace, and it is considered best practice to use GitHub’s URLs as package paths for external packages to avoid conflicts.

Concurrent Design

In this blog, we will not dig into the implementation of each of StealthWorker’s packages, as it has already been done several times. Rather, we will focus on the concurrent design made to organize the work between these packages.

Let’s start with an overview of StealthWorker’s architecture:

Fig. 21: StealthWorker’s design overview

At first, a goroutine executing getActiveProject() regularly retrieves a list of “projects” from the C&C server. Each project is identified by a keyword (wpChk for WordPress checker, ssh_b for SSH brute-forcer, etc).

From there, the real concurrent work begins: five goroutines executing PrepareTaskFunc() retrieve a list of targets for each project, and then distribute work to “Workers”. There are several interesting quirks here:

  • To allow PrepareTaskFunc() goroutines to communicate with Worker() goroutines, a Channel is instantiated:
Fig. 22: Channel’s instantiation

As can be seen from the channel type descriptor (parsed and renamed by the script), the Channel is made for objects of type interface {}, the empty interface. In other words, objects of any type can be sent through it, and it can be used both for sending and receiving (“direction:both”).

PrepareTaskFunc() will then receive from the C&C server a list of targets for a given project, as JSON objects, and for each target will instantiate a specific structure. We already noticed these structures when looking at main package’s routines; here are their reconstructed forms in the script’s logs:

> struct main.StandartBrut (4 fields):
    - string Host (offset:0)
    - string Login (offset:8)
    - string Password (offset:10)
    - string Worker (offset:18)

> struct main.StandartChecker (5 fields):
    - string Host (offset:0)
    - string Subdomains (offset:8)
    - string Subfolder (offset:10)
    - string Port (offset:18)
    - string Worker (offset:20)

> struct main.WPBrut (5 fields):
    - string Host (offset:0)
    - string Login (offset:8)
    - string Password (offset:10)
    - string Worker (offset:18)
    - int XmlRpc (offset:20)

> struct main.StandartBackup (7 fields):
    - string Host (offset:0)
    - string Subdomains (offset:8)
    - string Subfolder (offset:10)
    - string Port (offset:18)
    - string FileName (offset:20)
    - string Worker (offset:28)
    - int64 SLimit (offset:30)

> struct main.WpMagOcartType (5 fields):
    - string Host (offset:0)
    - string Login (offset:8)
    - string Password (offset:10)
    - string Worker (offset:18)
    - string Email (offset:20)

> struct main.StandartAdminFinder (6 fields):
    - string Host (offset:0)
    - string Subdomains (offset:8)
    - string Subfolder (offset:10)
    - string Port (offset:18)
    - string FileName (offset:20)
    - string Worker (offset:28)

> struct main.WPChecker (6 fields):
    - string Host (offset:0)
    - string Subdomains (offset:8)
    - string Subfolder (offset:10)
    - string Port (offset:18)
    - string Worker (offset:20)
    - int Logins (offset:28)

Note that all structures have Worker and Host fields. The structure (one per target) will then be sent through the channel.

  • On the other side of the channel, a Worker() goroutine will fetch the structure, and use reflection to generically process it (i.e. without knowing a priori which structure was sent):
Fig. 23: StealthWorker’s use of reflection to retrieve a field from an unknown structure

Finally, depending on the value in Worker field, the corresponding worker’s code will be executed. There are two types of workers: brute-forcing workers, which try to log into the target through a known web platform, and checking workers, which test the existence of a certain web platform on the target.

From a design point-of-view, there is a difference between the two types of workers: checking workers internally rely on another Channel, in which the results are going to be written, and fetched by another goroutine named saveGood(), which reports to the C&C. On the other hand, brute-forcing workers do their task and directly report to the C&C server.

  • Interestingly, the maximum number of Worker() goroutines can be configured by giving a parameter to the executable (preceded by the argument dev). According to the update mechanism, it seems that the usual value for this maximum is 400. Then, the previously mentioned SizedWaitGroup package serves to ensure the number of goroutines stays below this value:
Fig. 24: Worker’s creation loop
SizedWaitGroup.Add() blocks when the maximum number of goroutines has been reached. Each main.Worker() will release its slot when terminating.

We can imagine that the maximum amount of workers is tuned by StealthWorker’s operators to lower the risk of overloading infected machines (and drawing attention).
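The bounded worker creation loop can be approximated with a semaphore. The sketch below is a Python stand-in for the Go SizedWaitGroup pattern, under the assumption that acquiring a slot plays the role of Add() and releasing it the role of Done(); the function name `run_bounded` is ours.

```python
import threading


def run_bounded(tasks, limit):
    """Run each callable in `tasks` on its own thread, at most `limit` at a time.

    Returns the highest number of workers observed running simultaneously.
    """
    slots = threading.BoundedSemaphore(limit)
    lock = threading.Lock()
    running = 0
    peak = 0
    threads = []

    def worker(task):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            task()
        finally:
            with lock:
                running -= 1
            slots.release()       # frees the slot, like Done()

    for task in tasks:
        slots.acquire()           # blocks while `limit` workers run, like Add()
        t = threading.Thread(target=worker, args=(task,))
        threads.append(t)
        t.start()

    for t in threads:             # wait for all workers, like Wait()
        t.join()
    return peak
```

With a limit of 400, the creation loop would simply stall on the acquire call until one of the running workers terminates.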

There are two additional goroutines, respectively executing routines KnockKnock() and CheckUpdate(). Both of them simply run specific tasks concurrently (and infinitely): the former sends a “ping” message to the C&C server, while the latter asks for an updated binary to execute.

What’s Next? Decompilation!

The provided Python script should allow users to properly analyze Linux and Windows Golang executables with JEB. It should also be a good example of what can be done with the JEB API to handle “exotic” native platforms.

Regarding Golang reverse engineering, we have remained at the disassembler level for now, but decompiling Golang native code to clean pseudo-C is clearly a reachable goal for JEB. There are a few important steps to implement first, like properly handling Golang’s stack-only calling convention (with multiple return values), or generating type libraries for the Golang runtime.

So… stay tuned for more Golang reverse engineering!


References

A few interesting readings for reverse engineers wanting to dig into Golang’s internals:

  1. The Golang compiler was originally inherited from Plan9 and was written in C, in order to solve the bootstrapping problem (how to compile a new language?), and also to “easily” implement segmented stacks, the original way of dealing with goroutine stacks. The process of translating the original C compiler to Golang for release 1.5 has been described in detail here and here.
  2. There are alternate compilers, e.g. gccgo and gollvm.
  3. Golang also allows compiling ‘modules’, which can be loaded dynamically. Nevertheless, for malware writers, statically-linked executables remain the usual choice.
  4. Readers interested in the internals of the JEB disassembler engine should refer to our recent REcon presentation.

The (Long) Journey To A Multi-Architecture Disassembler

Last week we presented a talk at REcon on the internals of JEB’s native disassembler.

During this talk, we focused on some of the research problems we encountered while developing our custom disassembler engine — the foundation for JEB native decompiler.

The talk’s video can be found here, and interested readers can find an extended version of the slide deck here.


Android metaresources.arsc

One of our users recently reported an Android resources.arsc file seemingly unprocessed by JEB. Upon closer inspection, it turned out this file was not a regular binary resources file, but instead, a compressed resources container serving as a generator for localized resources.arsc. Older versions of Google Play (e.g., com.android.vending 11.6.18) and other official Google applications have been using this type of file, which is stored as a raw asset and sometimes named metaresources.arsc.

I decided to have a quick look. However, for better or worse, what was planned as a superficial exploration turned into a deep-dive into the rabbit-hole that was the “meta-arsc” parsing code.

Those files, as said above, are used to generate localized (non-English) resources.arsc files. That means that the client application can generate lightweight resources files on the fly. And presumably, APKs as well. Since this mechanism seems to be primarily used by the Play Store app, a reasonable use case could be Dynamic Delivery.

  • Full support was added into JEB (3.5-Beta)
  • A brief description of the file format can be found below
  • The fully annotated JDB2 is here (as well as the source apk) if you’d like to write your own implementation of a parser and localized arsc generator. The parser and generator have been thoroughly deobfuscated and commented where need be. Package: com.google.d.a.a.a.a. Client code: FilteredResourceHelper.
    Drop both files (jdb2, apk) in a folder and open the JDB2 file in JEB

What does it look like when metaresources.arsc is processed in JEB?

JEB arsc_meta plugin, here seen processing a metaresources.arsc file. All localized resources.arsc files are generated and attached as children of the original meta file.
A French localized resources file generated from a metaresources container. JEB processes those files as regular, stand-alone arsc files, and provides textual output similar to the one generated by the aapt2-dump tool from the Android SDK.

Binary Format

Disclaimer: Specification is a work-in-progress. Refer to the JDB2 annotation and code to fill in the gaps.

metaresources.arsc=
BE_UINT32                     cnt           count of languages
BE_UINT16[cnt]                langs         2-char language codes
MetaEntry[cnt]                metaentries   meta entries matching the language codes
CompressedResourceTableChunk  restab        a compressed resource table (code: 0x1002)
EOF                           -             not necessarily the EOF, but all metares
                                            files examined contained a single resource
                                            chunk, which is a compressed resource table

MetaEntry=
BE_UINT32                     magic         the value 'META'
BE_UINT32                     entrysize     complete entry size (including the above
                                            magic) in bytes
BE_UINT16                     lang          language code
VAR_INT32                     cnt1          .
VAR_INT32[cnt1]               offsets1      a custom serialization of java.util.BitSet
                                            (refer to JDB2 for details) holding
                                            positions for strings and string styles
                                            stored in string pool chunks
VAR_INT32                     cnt2          .
VAR_INT32[cnt2]               offsets2      offsets to Table Package chunk entries
                                            (types, typespecs)

=> Compressed entries:
- 5 types exist, basically non-XML chunk types
- Their type code is the same as arsc's with the 0x1000 bit set
- List of chunks:
StringPool=     refer to JDB2, class CompressedStringPoolChunk
ResourceTable=  refer to JDB2, class CompressedResourceTableChunk
Package=        refer to JDB2, class CompressedPackageChunk
Type=           refer to JDB2, class CompressedTypeChunk
TypeSpec=       refer to JDB2, class CompressedTypeSpecChunk
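As an illustration of the layout above, here is a minimal Python sketch that walks the fixed-size part of the container: the language table, then each MetaEntry header, using entrysize to skip over the variable-length body. It relies only on the work-in-progress description in this post; decoding language codes as two ASCII characters is an assumption, the VAR_INT32 fields are deliberately left unparsed, and the function name is ours.

```python
import struct


def parse_meta_header(data):
    """Return (langs, entries, restab_offset) for a metaresources-like buffer.

    `entries` holds one (lang, offset, entrysize) tuple per MetaEntry.
    """
    (cnt,) = struct.unpack_from(">I", data, 0)
    off = 4
    # Assumption: each BE_UINT16 language code is two ASCII characters.
    langs = [data[off + 2 * i: off + 2 * i + 2].decode("ascii") for i in range(cnt)]
    off += 2 * cnt

    entries = []
    for _ in range(cnt):
        magic, size = struct.unpack_from(">4sI", data, off)
        if magic != b"META":
            raise ValueError(f"bad magic at offset {off}: {magic!r}")
        if size < 10:
            raise ValueError(f"entry at offset {off} is too small: {size}")
        lang = data[off + 8: off + 10].decode("ascii")
        entries.append((lang, off, size))
        off += size               # entrysize covers the whole entry, magic included
    return langs, entries, off    # `off` now points at the compressed resource table
```

The returned offset is where the compressed resource table chunk is expected to start; parsing that chunk requires the details documented in the JDB2.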

New version of Androsig

This post is a follow-up on a previous article: we have updated the Androsig plugin and the pre-generated set of library signatures.

Reminder: Androsig is a JEB plugin used to sign and match library code for Android applications.

The purpose of the plugin is to help deobfuscate lightly-obfuscated applications that perform name mangling and hierarchy flattening (such as ProGuard and other common Java and Dalvik protectors). Using our collection of signatures for common libraries, library code can be recognized; methods and classes can be renamed; package hierarchies can be rebuilt.

Examples

Below, an example of what that looks like on a test app:

Matched libraries on a sample app bundling the Android Support package

Another example: running Androsig on a large app (Vidmate 4.0809), see the reconstructed glide/… sub-packages below:

Matched libraries on a PlayStore app

Installation

1) Download the latest version of the compiled binary plugin and drop it into the JEB coreplugins/ folder. If you are running JEB 3.4+, the plugin should come bundled with your JEB installation.

Link: JebAndroidSigPlugin-1.1.x.jar

This single JAR offers two plugin entry-points, as can be seen in the picture below:

2) Then download and extract the latest signatures package to your [JEB]/coreplugins/android_sigs/ folder.

Link: androsig_1.1_db_20190515.zip

The user interface is unchanged, so you can refer to the previous article for matching, generating, results and parameters.

Native Signatures Generation

JEB 3.3 ships with our internal tool SiglibGen to generate signatures for native routines. Until now, users could sign individual routines only from the JEB user interface (menu Native > Create Signature for Procedure), or with the auto-signing mode.

With the release of SiglibGen, users can now create signatures for whole files in batch mode, notably executables (PE, ELF), libraries (Microsoft COFF and AR files), and JDB2 (JEB project files) 1.

In this post, we will explain how SiglibGen allows power-users to generate custom signature libraries, in order to quickly identify similar code between different executables.

Signature Libraries (siglibs)

Signature libraries are stored in <JEB install folder>/siglibs folder. Each signature contains a set of features identifying a routine (detailed below), and a set of attributes representing the knowledge about the routine (name, internal labels, comments…).

JEB currently ships with signature libraries for x86/x64 Microsoft Visual Studio libraries (from Visual Studio 2008 to 2017), and for ARM/ARM64 Android NDKs (from NDKr10 to NDKr19). These signatures will be automatically loaded when a suitable file is opened (see File>Engines>Signature Libraries for the complete list of available signature libraries).

These compiler signatures are intended to be “false positive free”, i.e. they should only identify the exact same routine (though it can be mapped at a different location). Therefore, the signatures can be blindly trusted by users, and by JEB automatic analysis 2.

But users might want to generate their own signature libraries, for example in the following scenarios:

  • The user analyzed an unknown executable. The resulting JDB2 file can then be signed, such that all routines can be identified in other executables and related information (name, comments, labels) can be imported.
  • The user found out that an executable is statically linked with a public library. The library can then be compiled with symbols and signed, such that the library routines will be renamed in the analyzed executable 3.

Use Case: Operation ShadowHammer

To illustrate the signature generation process, we are going to use the recent attack dubbed “Operation ShadowHammer” as an example. This operation was originally documented by Kaspersky. Roughly summarized, malicious code was inserted into a legitimate ASUS automatic update tool named “ASUS Live Update Utility” 4.

In this use case, we are going to put ourselves in the shoes of an analyst willing to understand the trojanized ASUS installers. We do not intend to analyze them in depth (it has been done several times already), but rather to show how SiglibGen can accelerate the analysis.

At first, we got our hands on three samples, originally mentioned in CounterCept’s analysis with their date of use:

SHA-256                                                           Date Of Use
6aedfef62e7a8ab7b8ab3ff57708a55afa1a2a6765f86d581bc99c738a68fc74  June
736bda643291c6d2785ebd0c7be1c31568e7fa2cfcabff3bd76e67039b71d0a8  September
9a72f971944fcb7a143017bc5c6c2db913bbb59f923110198ebd5a78809ea5fc  October

Oldest Sample

Quick Analysis

An analyst would likely start looking at the oldest sample (6aedfef6…), in order to investigate possible evolution of the attack. In this sample, the installer’s main() routine was modified to load a malicious PE executable from its resources:

JEB Project View. The embedded executable can be seen in resources 5.

Here is the memory map after opening the malicious executable in JEB:

Embedded PE navigation view. Blue is code, cyan is code identified by siglib, green is data.

The large chunks of cyan correspond to routines identified as being part of “Microsoft Visual C++ 2010 /MT” libraries. Then, we analyzed the remaining seven routines (the blue chunk in the navigation view), and renamed them as follows:

Malicious Routines (our names)

These routines implement the following logic: check whether one of the machine’s MAC addresses matches a hardcoded list and, if so, download a payload (otherwise, a .idx log file is dropped).

Now in order to re-use this knowledge on more recent trojanized ASUS installers, let’s generate signatures for this first sample.

Generating Signatures

In order to sign the analyzed file, we are going to create a configuration file from the sample file provided in <JEB install folder>/siglibs/custom:

;------------------------------------------------------------------------------
; *** SAMPLE *** JEB Signature Library configuration file
;------------------------------------------------------------------------------

;template file used to configure the generation of a *.siglib file for JEB

;how to generate the siglib specified by this file?
;open a terminal and execute: (eg, on Windows)
;  $ ..\..\jeb_wincon.bat -c --siglibgen=sample-siglib.cfg

;(mandatory) name of the folder containing files to sign
; must be in the same folder as this configuration file 
input_folder_name=

;(mandatory) processor type 
; see com.pnfsoftware.jeb.core.units.codeobject.ProcessorType 
; eg: X86, X86_64, ARM, ARM64, MIPS, MIPS64
processor=

;(mandatory) output siglib file name
; '.siglib' extension will be appended to it
; IMPORTANT! once generated, this file must be moved to the <JEB>/siglibs/ folder
; (user generated siglibs have to be manually loaded)
output_file_name=mysiglib

;(mandatory) unique identifier for your siglib
; keep it < 0 and decrement for each package you generate
uuid=-1

;(mandatory) *absolute* path to JEB typelibs folder, usually <JEB>/typelibs
typelibs_folder=

;(mandatory) name of your package
; e.g. 'Microsoft Visual C++ 2008 signatures' (without '')
package_name=

;(mandatory) package version
package_version=0

;(optional) description of your package
package_description=

;(optional) package author
package_author=

;(mandatory) list of features included in each signature
; i.e. the characteristics of the signed routines serving to identify them
; see com.pnfsoftware.jeb.core.units.code.asm.sig.NativeFeatureSignerID
; note: defaults should be suitable for most cases. ROUTINE_SIZE must always be included. 
features=ROUTINE_SIZE,ROUTINE_CODE_HASH,CALLED_ROUTINE_NAME_ONLY_EXTERN 

;(mandatory) list of attributes included in each signature
; i.e. additional knowledge on the signed routines conveyed by signatures 
; (other than routine name)
; see com.pnfsoftware.jeb.core.units.code.asm.sig.NativeAttributeSignerID
attributes=COMMENT,LABEL

A particularly interesting part of this configuration is the features field, where users can select the characteristics of the routine they want to put in signatures. The complete feature list can be found here; here are the features we included in our case (the default ones):

Feature Name                     Description
ROUTINE_SIZE                     Size of the routine (number of instructions).
ROUTINE_CODE_HASH                Custom hash computed from the routine assembly code.
CALLED_ROUTINE_NAME_ONLY_EXTERN  Names of the external routines called by the signed routine.

Note that by including ROUTINE_CODE_HASH, our signatures will only match routines with the exact same code (but possibly mapped at a different location). The use of CALLED_ROUTINE_NAME_ONLY_EXTERN makes it possible to distinguish wrapper routines that have the same code but call different API routines.
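To make the role of each feature concrete, here is a toy Python model of feature-based matching. It is not JEB's implementation: the hash below is a plain SHA-256 over instruction mnemonics, used as a stand-in for the custom code hash, and the routine bodies and names are made up for the example.

```python
import hashlib


def signature_key(mnemonics, extern_calls):
    """Build a lookup key from the three default features."""
    # Stand-in for ROUTINE_CODE_HASH: hashing mnemonics only keeps the key
    # independent of the address the routine is mapped at.
    code_hash = hashlib.sha256(" ".join(mnemonics).encode()).hexdigest()
    return (len(mnemonics), code_hash, tuple(sorted(extern_calls)))


# Two wrappers with identical instructions but different API targets.
body = ["push", "call", "pop", "ret"]
siglib = {
    signature_key(body, ["CreateFileW"]): "wrap_CreateFileW",
    signature_key(body, ["DeleteFileW"]): "wrap_DeleteFileW",
}


def identify(mnemonics, extern_calls):
    """Return the known routine name, or None when no signature matches."""
    return siglib.get(signature_key(mnemonics, extern_calls))
```

In this model, the size and code hash alone would make both wrappers collide; adding the names of the called external routines is what tells them apart, and any change to the code yields no match at all.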

Here is the specific configuration file shadowhammer-oldest.cfg we made for this first sample:

input_folder_name=input
processor=X86
output_file_name=shadowhammer-6aedfef6
uuid=-1
typelibs_folder=[...REDACTED...]\typelibs
package_name=ShadowHammer -- sample 6aedfef6 (oldest)
package_version=0
package_description=Signatures generated from the analysis of the oldest sample known
package_author=Joan Calvet
features=ROUTINE_SIZE,ROUTINE_CODE_HASH,CALLED_ROUTINE_NAME_ONLY_EXTERN 
attributes=COMMENT,LABEL
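Before launching a generation, the rules stated in the sample file's comments can be checked with a few lines of Python. This is a hypothetical pre-flight helper, not part of JEB or SiglibGen; it only encodes what the comments say (mandatory keys, ROUTINE_SIZE always included, a negative uuid).

```python
MANDATORY = ("input_folder_name", "processor", "output_file_name", "uuid",
             "typelibs_folder", "package_name", "package_version",
             "features", "attributes")


def parse_siglib_cfg(text):
    """Parse 'key=value' lines, ignoring blank lines and ';' comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip()
    return cfg


def check_siglib_cfg(cfg):
    """Return a list of problems; the list is empty when the basic rules are met."""
    problems = [f"missing: {k}" for k in MANDATORY if not cfg.get(k)]
    features = [f.strip() for f in cfg.get("features", "").split(",")]
    if "ROUTINE_SIZE" not in features:
        problems.append("features must include ROUTINE_SIZE")
    try:
        if int(cfg.get("uuid", "")) >= 0:
            problems.append("uuid should be negative")
    except ValueError:
        problems.append("uuid is not an integer")
    return problems
```

Running `check_siglib_cfg(parse_siglib_cfg(open("shadowhammer-oldest.cfg").read()))` on a complete file should return an empty list.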

Then we put the JDB2 file of the analyzed sample into the input folder (see configuration’s input_folder_name field). SiglibGen can then be called by executing JEB startup script (e.g. jeb_wincon.bat) with the following flags:

$ jeb -c --siglibgen=shadowhammer-oldest.cfg

The generated signature libraries will then be written in the output folder. In our case, SiglibGen signed our seven routines, as indicated in siggen_stat.log file 6:

> Package created on 2019.05.01.15.29.23
> metadata: X86/ShadowHammer -- sample 6aedfef6 (oldest)/0/Signatures generated from the analysis of the oldest sample known/Joan Calvet/1556738959
> # sigs created: 7
> # very small routines: 0
> # small routines: 0
> # medium routines: 6
> # large routines: 1
> # unnamed routines: 1
> # blacklisted routines: 0
> # duplicated routines: 0

We can now copy shadowhammer-6aedfef6.siglib to <JEB>/siglibs/ folder. It will now be available under File>Engines>Signature Libraries to be manually loaded.

Second Sample Analysis

Now, it is time to turn to the second sample (736bda6432…). The workflow is quite different from the previous one: a routine call has been inserted into the Visual Studio library method __crtExitProcess, which is called whenever the program exits:

Trojanized __crtExitProcess. Call to __crtCorExitProcess was replaced by a call to malicious code.

The astute reader might wonder why the routine is still named __crtExitProcess(), as if it were the original one, if one of its calls has been rewritten to point elsewhere. In this case, the routine’s name comes from the fact that several caller routines were identified as library code (and are known to call __crtExitProcess()), as indicated by the routine header comment “Routine’s name comes from a caller […]”.

Following the dubious call, we end up decrypting the malicious payload, which is then executed. We can load the malicious dump in JEB with the x86 processor and the correct base address. After manually defining the code area, we obtain the following navigation view:

Memory dump’s initial navigation view. Blue is code.

For now, no compiler signature libraries were loaded because it is a memory dump without a proper PE header. As we know the previous malicious sample was compiled with Visual Studio 2010 /MT libraries, we can manually load the corresponding signatures (File>Engines>Signature Libraries). Here is the navigation bar at this time:

Memory dump’s navigation view with Visual Studio 2010 /MT signature libraries loaded. Blue is code, cyan is library code.

Most of the code has been identified. Now, we can load the custom signatures we generated from the previous sample, and we end up with two more routines being identified (i.e. miscreants directly re-used them from the first sample):

We can now look at the non-identified routines, without having to reanalyze the duplicates.

Finally, after having analyzed the remaining routines, we can generate a new signature library, following the same steps previously described. This time we put two samples in the input folder (the trojanized installer’s JDB2, and the memory dump’s JDB2). Eight routines are then signed.

Third Sample Analysis

The most recent sample (9a72f971944f…) follows the same logic as the previous one, namely it dynamically decrypts the malicious code, which is then executed. As previously, we load the memory dump in JEB with Visual Studio 2010 /MT signatures:

Memory dump’s navigation view with Visual Studio 2010 /MT signature libraries loaded.

Finally, we load the ShadowHammer signature libraries generated from the previous two samples:

Memory dump’s navigation view with Visual Studio 2010 /MT and ShadowHammer signature libraries loaded.

At this point, only one malicious routine has not been identified (the large blue area in the navigation view). We can now focus on it, knowing that the rest of the code is the same.

If we open the two binaries side-by-side, we can rapidly pinpoint that the unidentified routine has indeed been modified between the two samples. For example:

It appears the hardcoded list of searched MAC addresses (represented by their MD5 hashes) has been modified between the two samples.
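The targeting check and the comparison between samples can be sketched in a few lines of Python. This is an assumption-laden illustration: it hashes the six raw bytes of each MAC address with MD5, which may differ from the exact input format used by the malware, and the helper names are ours.

```python
import hashlib


def mac_md5(mac):
    """MD5 of the 6 raw bytes of a MAC address written as 'aa:bb:cc:dd:ee:ff'."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return hashlib.md5(raw).hexdigest()


def is_targeted(local_macs, target_hashes):
    """True if any local MAC address hashes to one of the hardcoded values."""
    return any(mac_md5(m) in target_hashes for m in local_macs)


def diff_targets(old, new):
    """Return (added, removed) hashes between two samples' hardcoded lists."""
    return sorted(set(new) - set(old)), sorted(set(old) - set(new))
```

Once the two hash lists have been extracted from the unidentified routine in each dump, `diff_targets` gives the entries that were added or dropped between the samples.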

Conclusion

We hope this blog post demonstrated how SiglibGen allows users to speed up their analysis by easily re-using their work. Remember that signatures can be generated in a lighter manner directly from the JEB UI (as shown in the auto-signing mode video). As usual, do not hesitate to contact us if you have any questions (email, Twitter, Slack).

Note: SiglibGen might set the .parsers.*.AnalysisStyle and .parsers.*.AllowAdvancedAnalysis engines options to specific values suitable for signature generation, without restoring the original values after the generation. For now, JEB power-users have to manually restore these two engines options to the intended values after having generated signatures (menu Edit>Options>Engines). This will be fixed in the next release, JEB 3.4.

Annex: SiglibGen Log Files

A typical SiglibGen run will produce several log files (in the same folder):

File Name           Purpose
siggen_stat.log     Summary log (number of signatures created, etc). A new entry is appended to the log file at each signature generation.
siggen_report.html  Complete HTML log file; each signed routine is shown with the corresponding features and attributes.
conflicts.txt       Conflict resolution file; users can tweak here the decisions taken when several routines have the same features (and then regenerate the signatures).
removals.txt        Removals resolution file; users can tweak here the automatic decisions regarding removing certain signatures (and then regenerate the signatures).

  1. More formats could be handled, do not hesitate to contact us if you have such needs.
  2. While the signatures shown in this blog post will also be generated in a false positive free manner, SiglibGen allows building more flexible signatures; this will be the topic of another blog post.
  3. If signatures were built to be strict (i.e. not allowing any modifications to the original routine), this can be far from trivial, as the library needs to be compiled with the exact same options as the analyzed executable.
  4. There are numerous excellent analyses available for Operation ShadowHammer, like the one from CounterCept.
  5. Note that thanks to JEB recursive processing, the embedded executable does not need to be extracted, and can be directly analyzed within the original JEB project.
  6. See Annex for a description of all log files produced by SiglibGen.

Debugging Android apps on Android Pie and above

Update (May 2024): Debugging oddities with API levels 34+, native code debugging issues with API level 34. Refer to the end of this post.

Update (March 2020): The issue described in the original post, namely the impossibility of reading locals that do not have associated DebugLocalInfo on Android 9 (P) and 10 (Q), was fixed in Android 11 (R).

[Original Post] Issues related to reading local vars

Lower-level components of the Dalvik debugging stack, namely JDWP, JVM TI, and JVM DI implementations, were upgraded in Android Pie. It is something we indirectly noticed after installing P.beta-1 in the Spring of 2018. We did not look into it right away, for lack of time and because reversers could easily avoid the roadblocks by following our recommendation to debug apps (non-debuggable and debuggable alike) using API levels 21 (Lollipop) to 27 (Oreo). Those roadblocks manifested in JEB as the following:

  • An empty local variable panel (with the exception of this for non-static methods)
  • Type 35 JDWP errors reported in the console, indicating that an invalid slot was being accessed
Missing locals in a debugging session

A type 35 error in this context means an invalid local slot is being accessed. In the example shown above, it would mean accessing a slot outside of [0, 10] (per the .registers directive) since the method declares a frame of 11 registers.

The second type of noticeable errors (not visible in the screenshot) were mix-ups between variable indices. Normally, and up to the JDWP implementation used before Android Pie, indices used to access slots were Java-style parameter indices (represented in Dalvik as pX), instead of Dalvik-style indices (vX). Converting from one to the other is trivial, assuming the method's staticity and prototype are known. It is a matter of generating pX so that they end up at the bottom of the frame. In the case above:

v0    p2
v1    p3
v2    p4
..
v9
v10   p0
v11   p1

When issuing a JDWP request (16,1) to read frame slots, we would normally use pX indices. It is no longer the case with Android P and Q: vX indices are to be used.
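The index conversion can be written as a one-line rotation. The sketch below is illustrative only: it assumes a frame of `registers` slots whose last `ins` slots hold the parameters (including this for non-static methods), and the values 12 and 2 used in the usage note are chosen simply because they reproduce the listing above. Function names are ours.

```python
def dalvik_to_param_style(v, registers, ins):
    """Map Dalvik register vX to the parameter-first index used before Android P."""
    if not 0 <= v < registers:
        raise ValueError("register outside of the frame")
    # Parameters live in the last `ins` registers; rotating by `ins`
    # moves them to indices 0..ins-1 and pushes the locals after them.
    return (v + ins) % registers


def param_style_to_dalvik(p, registers, ins):
    """Inverse mapping, from a parameter-first index back to vX."""
    if not 0 <= p < registers:
        raise ValueError("index outside of the frame")
    return (p - ins) % registers
```

For instance, `dalvik_to_param_style(10, 12, 2)` yields 0 and `dalvik_to_param_style(0, 12, 2)` yields 2. On Android P and Q the conversion is simply skipped, since vX indices are sent as-is.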

Open up JEB, start debugging any APK, switch to the Terminal view, and type ‘info’. In the context of JDWP, this JEB Debugger command issues Info and Sizes requests. Notice the differences:

=> On Android Oreo (API 27):

VM> info
Debuggee is running on ?
VM information: JDWP:"Android Runtime 2.1.0" v1.6 (VM:Dalvik v1.6.0)
VM identifier sizes: f=8,m=8,o=8,rt=8,fr=8

=> On Android Pie (API 28) and Android Q:

VM> info
Debuggee is running on ?
VM information: JDWP:"Java Debug Wire Protocol (Reference Implementation) version 1.8
JVM Debug Interface version 1.2
JVM version 0 (Dalvik, )" v1.8 (VM:Dalvik v0)
VM identifier sizes: f=4,m=4,o=8,rt=8,fr=8

Notice the reported version 1.2 for JVM DI, previously unspecified, and the reported version 1.8 for JDWP, likely the cause of the breakage. Also note the ID encoding size updates. JDWP had been reporting a 1.6 version number, as well as field and method IDs encoded on 8 bytes, for as long as I can remember.
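The identifier sizes matter because every later reply must be decoded with them. As a minimal sketch, the body of a JDWP IDSizes reply is five big-endian 32-bit integers (field, method, object, reference type and frame ID sizes); the short keys below mirror the labels printed by the info command, and the helper names are ours.

```python
import struct


def parse_id_sizes(body):
    """Decode the body of a JDWP VirtualMachine.IDSizes reply (five big-endian ints)."""
    f, m, o, rt, fr = struct.unpack(">5i", body[:20])
    return {"f": f, "m": m, "o": o, "rt": rt, "fr": fr}


def read_id(data, offset, size):
    """Read one identifier of `size` bytes; returns (value, next_offset)."""
    return int.from_bytes(data[offset:offset + size], "big"), offset + size
```

A client that hardcodes 8-byte field and method IDs would misparse replies from a VM announcing `f=4,m=4`, which is why the sizes should always be taken from this reply.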

The vX/pX index issue was easily solved. It took a little while to crack the second issue. A superficial browsing of AOSP did not show anything fruitful, but after digging around, it seemed clear that this updated implementation of JDWP used CodeItem variables’ debug information to determine which variables are worth checking, and using what type.

In JEB, right-click and select Rendering Options, tick Show Debug Directives to display variable definition and re-definition information. In the example above, the APK holds information stating that v0 is being used as a boolean starting at address 2, and v1 as a String starting at address 4. Android P+’s JDWP implementation does use this information to validate local variable accesses.

See below: at address 2, v0 has been declared and is rendered. v1 has not been declared yet, the debugger cannot read it (we’ll get error 35).

Single-step: at address 4, v1 is declared. Although it is uninitialized, the debugger can successfully read the var:

So, up until P, this metadata, when present (almost all Release-type builds of legitimate and malware files alike discard it), had been considered indicative. Now, the debugger takes it literally. There are multiple candidate reasons as to why, but an obvious one is safety. JDWP has been known to have the potential to crash the VM when receiving read requests for frame variables using a bad type. E.g., requesting to read an integer-holding slot as a reference would most likely crash the target VM. Using the type information provided in metadata, a debugger server can now ensure that a request to read a slot as type T is indeed valid, assuming the metadata is legitimate. Since the primary use case is to debug applications inside an IDE, which holds the source information used to generate valid debug metadata, the assumption is fair.
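The validation behaviour described above can be modelled in a few lines of Python. This is a simplified sketch, not the AOSP implementation: it ignores end-of-scope and re-definition directives, and answering a wrong-type request with TYPE_MISMATCH (code 34 in the JDWP specification) is our assumption about what a server could do.

```python
INVALID_SLOT = 35    # JDWP error code reported in the console
TYPE_MISMATCH = 34   # assumption: possible answer to a read with the wrong type

# (register, start_address, type) entries mirroring the debug directives
# described above: v0 is a boolean from address 2, v1 a String from address 4.
DEBUG_LOCALS = [(0, 2, "Z"), (1, 4, "Ljava/lang/String;")]


def check_read(register, pc, requested_type, debug_locals=DEBUG_LOCALS):
    """Return 0 if the read is allowed, else a JDWP-style error code."""
    for reg, start, typ in debug_locals:
        if reg == register and pc >= start:
            return 0 if typ == requested_type else TYPE_MISMATCH
    # No declaration covers this register at this address.
    return INVALID_SLOT
```

Reading v1 at address 2 returns 35, while the same request at address 4 succeeds, matching the single-step behaviour observed in the debugger.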

Validating access to local vars has the interesting side effect of acting as an anti-debugging feature. While debugging the app remains possible, not being able to easily read some locals (parameters pX are always readable though) can be quite an annoyance.

In the future, how could we work around the JDWP limitation? Well, aside from the obvious cop-out “use Oreo or below”, an idea would be to extend JEB’s --makeapkdebug option (which generates a debuggable version of a non-debuggable APK) to insert DEX metadata specifying that all variables of a frame are used and of a given type. That may not work depending on the type of validation performed by the DEX verifier, but it’s something worth exploring. Maybe more simply, an alternative could be a custom AOSP build that disables that feature. Or better yet, finding out whether a system property exists to disable/enable that JDWP functionality.

A final note: debugging non-debuggable APKs on Android Pie or above also proved more difficult, if not practically impossible, compared to Oreo and below. Here’s a solution for rooted phones (found when browsing around AOSP commits):

> adb root
> adb shell setprop dalvik.vm.dex2oat-flags --debuggable
> adb shell stop
> adb shell start

[May 2024] JDWP DDMS packet on Android 14 and above

With API levels 34+ (Android 14 and above), JEB 5.12 and below may report JDWP errors regarding unhandled JDWP packets:

[E] Unknown JDWP command packet: [JDWP:id=19,fl=00h,cc=(-57,1),dl=14016]

The command set -57 is a custom JDWP command set for Android, used for DDMS (Dalvik Debug Monitor Server). JEB does not support those requests. Starting with JEB 5.13, those target-issued requests will be made more explicit (specifically saying they are DDM messages), but also less verbose (below INFO level) to avoid cluttering the logger.
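The negative command set is an artifact of reading an unsigned byte as a signed one: 199 (0xC7) becomes -57. The Python sketch below decodes the 11-byte header of a JDWP command packet (length, id, flags, command set, command) and shows both readings; interpreting `dl` in the log line as the payload length, i.e. the total length minus the header, is our assumption.

```python
import struct

# length, id, flags, command set (read as a signed byte), command
HEADER = struct.Struct(">IIBbB")


def parse_command_header(packet):
    """Decode the fixed header of a JDWP command packet."""
    length, pid, flags, cmdset, cmd = HEADER.unpack_from(packet, 0)
    return {
        "id": pid,
        "flags": flags,
        "cmdset_signed": cmdset,
        "cmdset_unsigned": cmdset & 0xFF,
        "cmd": cmd,
        "data_length": length - HEADER.size,
    }
```

A client that wants to ignore such packets quietly can test for command set 199 right after decoding the header and skip the payload.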

[May 2024] Problems debugging native code on Android 14

JEB has difficulties debugging native code with Android 14 / API level 34. The native threads may suspend over an invalid memory access (SIGSEGV, signal 11) received when simply interacting with the app:

[I] [GDB] [SIGNAL] type=GENERIC signal=11 tid=7185 @ 58C00ECCh

The issue is only present with API 34. It is not present with API levels <=33 and seems to have disappeared with the current preview of API 35 (upcoming Vanilla Ice Cream release).

We do not have a workaround in place at this time, and can only recommend avoiding debugging of native code with Android 14 / API level 34. We will update this post if/when a workaround is found.

Android Updates in JEB 3.2

As the latest update makes its way to all users (changelog), it is a good time to quickly recap additions related to Android analysis that made it into JEB versions 3.1.4, 3.1.5, and 3.2.

Dalvik Decompiler Updates

The newest releases of JEB contain several improvements to the Dalvik decompiler. I will highlight only a couple that users may find interesting. 1

Enumerations

Compiled Java enumerations can be complicated beasts. JEB attempts to re-sugar them to the best of its ability. On failure, regular classes extending java.lang.Enum will be rendered.

Obfuscation sometimes destroys important synthetic fields and structures that allow recovery heuristics to work. However, support should function reasonably well, even on enumeration data that was intentionally shuffled to generate decompilation errors. Moreover, and in keeping with the spirit of interactivity in JEB, enumerated fields can be renamed, and it is done consistently over the code base, including over reconstructed switches making use of such enums.

Decompiled enums in android.arch.lifecycle. Renaming and cross-referencing enumerated constants is supported.

Custom enumerated constants are also properly reconstructed, including:

  • Field annotations
  • Custom initializers (see below)
  • Additional methods and method overrides
In this complex enumeration, the red block shows a custom initializer. Other interesting bits are the use of overrides and custom methods, annotations, as well as default and non-default constructors.

Switches

Support was recently added for switch-on-enum and switch-on-string (partial support for the latter, to be continued in the next software update).

This successfully reconstructed switch-on-string is implemented as a double-switch idiom by dx (a sparse switch on hashCode/equals to generate custom indices i, followed by a packed-switch on i). Not all switches are implemented like this. Regular if-conditional trees may be strategically generated by optimizing compilers.
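The double-switch idiom is easier to picture with a small model. The Python sketch below mimics it with a reimplementation of Java's String.hashCode(); the case labels ("Aa", "BB", "C") are made up for the example, and the two functions stand in for the two switches emitted by the compiler.

```python
def java_hash(s):
    """String.hashCode() for strings of BMP characters, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h & 0x80000000 else h


def index_of(s):
    """First switch: sparse switch on hashCode, then equals(), yields an index."""
    i = -1
    h = java_hash(s)
    if h == 2112:            # "Aa" and "BB" share this hash code
        if s == "Aa":
            i = 0
        elif s == "BB":
            i = 1
    elif h == 67:            # "C"
        if s == "C":
            i = 2
    return i


def dispatch(s):
    """Second switch: dense switch on the index computed above."""
    return {0: "case Aa", 1: "case BB", 2: "case C"}.get(index_of(s), "default")
```

The equals() checks are required because distinct strings can collide on the same hash code, as "Aa" and "BB" do here; a decompiler has to recognize both stages to rebuild a single switch on the string.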

Inner classes, Anonymous classes

We improved rendering support for named and anonymous inner classes. Properly rendering anonymous classes in particular is made difficult by the fact that some of their arguments are captured from the outer classes. Properly rendering anonymous constructors, with exact argument types and position, is also challenging.

Lately, a user sent us a sample making use of an anonymous class initializer to hide string decryption code. See below:

  • The anonymous class extends Android’s OnActivityResultListener; the object is instantiated, then immediately discarded.
  • Decryption code takes place in the initializer. Note the captured arguments from the outer container method __m: i, _b. Access to other private class fields is made via synthetic accessor calls that were re-sugared into seemingly direct field access (BA._b).
Pseudo-moot anonymous class with an instance initializer attempting to conceal string decryption code.
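The trick can be reproduced in a few lines of plain Java. This is a minimal sketch, not the sample's actual code: `Listener` is a hypothetical stand-in for the Android listener type, the single-byte XOR is a toy cipher chosen for brevity, and all names are made up.

```java
import java.nio.charset.StandardCharsets;

// Sketch: an anonymous class whose instance initializer does the real work.
final class AnonInitDemo {
    // Hypothetical stand-in for an Android listener interface.
    interface Listener {
        boolean onResult(int code);
    }

    // Private outer field; older compilers reach it via a synthetic accessor.
    private static String decoded;

    static String hide(final int key, final byte[] data) {
        new Listener() {
            // Instance initializer: runs as part of construction.
            // 'key' and 'data' are captured from the enclosing method.
            {
                byte[] out = new byte[data.length];
                for (int n = 0; n < data.length; n++) {
                    out[n] = (byte) (data[n] ^ key); // toy XOR "decryption"
                }
                decoded = new String(out, StandardCharsets.US_ASCII);
            }

            @Override
            public boolean onResult(int code) {
                return false; // never called; the object is discarded
            }
        };
        return decoded;
    }
}
```

At the bytecode level, the captured values become extra constructor parameters and hidden fields of the anonymous class, which is why exact argument types and positions matter when rendering it.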

Plugin options

Remember that some decompiler properties are publicly available in the options: (menu: Edit, Options, Advanced, Engines)

  • All Dalvik decompilation options: see the .parsers.dcmp_dex.* namespace
  • All Java rendering options of decompiled code: see the .parsers.dcmp_dex.text.* namespace

1) Rendering options are real-time options that can be changed after the fact to customize the output. Right-click on a decompiled class output, and select Rendering Options:

2) Decompilation options are used to guide and customize the decompilation. They can be changed in the Engines options, or more simply, when performing a decompilation itself, by invoking “Decompile with Options…” instead of “Decompile”.

Keyboard shortcut for “Decompile with Options”:
CTRL+TAB (Windows, Linux) or COMMAND+TAB (macOS)

Bring up the “Decompile with Options” dialog by using CTRL+TAB/COMMAND+TAB when decompiling. Hover over properties to get extra documentation in the tooltip.

API additions

Essential updates to:

  • IJavaSwitch: additional methods to access switch-on-enum and switch-on-string data
  • IJavaForEach: additional type introduced to manipulate for-each statements: for(Type var: iterator_or_array) { … }
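For context on what a for-each statement is reconstructed from, the sketch below shows the two standard lowerings side by side: an indexed loop for arrays and an iterator loop for Iterable types. Names are illustrative, and this says nothing about how IJavaForEach itself is structured.

```java
import java.util.Iterator;
import java.util.List;

// Sketch: source-level for-each loops next to their approximate lowered forms.
final class ForEachDesugared {
    static int sumArraySugared(int[] a) {
        int total = 0;
        for (int v : a) {
            total += v;
        }
        return total;
    }

    // Arrays: the compiler copies the reference, caches the length, and indexes.
    static int sumArrayLowered(int[] a) {
        int total = 0;
        int[] arr = a;
        int len = arr.length;
        for (int i = 0; i < len; i++) {
            int v = arr[i];
            total += v;
        }
        return total;
    }

    static int sumListSugared(List<Integer> l) {
        int total = 0;
        for (int v : l) {
            total += v;
        }
        return total;
    }

    // Iterables: the compiler calls iterator(), then hasNext()/next().
    static int sumListLowered(List<Integer> l) {
        int total = 0;
        Iterator<Integer> it = l.iterator();
        while (it.hasNext()) {
            int v = it.next(); // unboxing is implicit in the source form
            total += v;
        }
        return total;
    }
}
```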

Other changes, What next

JEB 3.2 contains other improvements, such as:

  • Better auto-naming, including default usage of debug data, if present (can be disabled in the options)
  • Improved typing and type propagation
  • Additional IR and AST optimizations
  • Better exceptional flow processing
  • Rendering of try-catch, synchronized blocks, etc.
  • Decompilation of invoke-polymorphic (invoke-custom is not supported; see the part on lambdas and method handles below)

We have more planned for the coming releases, including:

  • Improved support for switch-on-string. As said earlier, some of those switches, when properly detected, are re-sugared into legal Java-8 switch-on-string. However, the nature of those high-level constructs (they are implemented as double-conditionals, sometimes double-switches) makes it quite hard in some cases to provide proper reconstruction. It is something that will be improved in the future.
  • Support for generics. We had decided not to implement Java 5-style generics since the information, when provided, is stored as pure metadata and should not be trusted. However, in practice, it turns out to be helpful when auditing legitimate, non-obfuscated compiled apps. We will add optional support for that in a coming release.
  • Support for try-with-resources. try(resource)/catch/finally is a very high-level idiom that is difficult to reconstruct. Optimizing compilers generate a substantial amount of additional, highly optimized code to implicitly catch exceptions and auto-close resources, making it extra difficult to reconstruct in the general case. We will likely introduce partial support before the summer.
  • Lambdas. It is a planned addition. We will soon be re-sugaring Android implementations of Java 8+ lambdas into proper lambda functions. Same goes for method handles (::). That’s quite exciting and may pave the way for a hypothetical Kotlin decompiler, since that language implicitly and explicitly relies on lambdas extensively.
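To illustrate why try-with-resources is hard to recover, the sketch below places a one-resource statement next to a hand-written approximation of its expansion, loosely following the translation given in the Java Language Specification. Real compiler output differs in detail and is often optimized further; the `Res` type and all names are illustrative.

```java
// Sketch: try-with-resources and an approximate hand-lowered equivalent.
final class TryWithResourcesDesugared {
    // Minimal resource that records when it is opened and closed.
    static final class Res implements AutoCloseable {
        private final StringBuilder log;

        Res(StringBuilder log) {
            this.log = log;
            log.append("open;");
        }

        @Override
        public void close() {
            log.append("close;");
        }
    }

    static String sugared() {
        StringBuilder log = new StringBuilder();
        try (Res r = new Res(log)) {
            log.append("body;");
        }
        return log.toString();
    }

    static String lowered() {
        StringBuilder log = new StringBuilder();
        Res r = new Res(log);
        Throwable primary = null;
        try {
            log.append("body;");
        } catch (Throwable t) {
            primary = t; // remember the body's exception
            throw t;
        } finally {
            if (primary != null) {
                try {
                    r.close();
                } catch (Throwable suppressed) {
                    // A close() failure must not mask the primary exception.
                    primary.addSuppressed(suppressed);
                }
            } else {
                r.close();
            }
        }
        return log.toString();
    }
}
```

Each additional resource nests another copy of this pattern, which gives an idea of how much implicit code a decompiler has to fold back into a single statement.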

Debuggable APK Generation

For several reasons, it is easier to debug Android applications explicitly marked debuggable in their Manifest.

  • Debugging a non-debuggable APK requires root access to the operating system, which means rooting a production phone, using an emulator image built as userdebug 2, or building a custom userdebug image from AOSP.
  • All of the above solutions have shortcomings: rooted production builds and userdebug builds expose features that non-rooted production builds do not have, and can be fingerprinted as such; debugging native code of applications on non-rooted devices requires replacing system-level utilities; the API level and OS features also play a role, e.g., SE-Android needs to be disabled on recent OS versions for debugging to work.

In many cases, rebuilding a release app into a debug-mode app (with <application android:debuggable="true" …>) is a viable solution, and one that obviously does not require root. Many users implement this solution via apktool. However, the tool frequently fails to decode complex APKs, let alone rebuild them with different settings.

We have introduced a feature in JEB that makes rebuilding a non-debuggable APK into a debuggable one easy and fast:

$ jeb_wincon.bat -c --makeapkdebug -- file.apk

Upon success, file_debuggable.apk will be generated. Sign it (Android SDK’s apksigner), install it on your device, and start debugging. Remember that this solution has its shortcomings as well! Anti-debugging code may check at runtime that the app is not debuggable, as would be expected. More elaborate solutions implement certificate pinning-style checks, where the code verifies that it is signed using a specific certificate. Be careful when debugging rebuilt APKs.

This malware app was made debuggable
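The two kinds of runtime checks mentioned above can be sketched in plain Java. This is an assumption-laden illustration, not code from any real protector: on Android the flags would come from ApplicationInfo.flags and the certificate bytes from the package's signing information, whereas here both are passed in as parameters so the sketch runs anywhere. The class and helper names are made up.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of checks a rebuilt, re-signed, debuggable APK would fail.
final class DebuggableCheck {
    // Same value as android.content.pm.ApplicationInfo.FLAG_DEBUGGABLE (1 << 1).
    static final int FLAG_DEBUGGABLE = 0x2;

    // True if the application flags mark the app as debuggable.
    static boolean isDebuggable(int applicationFlags) {
        return (applicationFlags & FLAG_DEBUGGABLE) != 0;
    }

    // Hypothetical helper: SHA-256 of the raw bytes.
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Pinning-style check: compare the signing certificate's digest to a baked-in value.
    static boolean signedBy(byte[] certificateBytes, byte[] expectedSha256) {
        return MessageDigest.isEqual(sha256(certificateBytes), expectedSha256);
    }
}
```

A rebuilt APK trips the first check because the debuggable flag is now set, and the second because re-signing necessarily changes the certificate.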

Keyboard Shortcuts for Script

Bind your JEB Python scripts to a keyboard shortcut by adding a line at the top of your script:

#?shortcut=xxx

where xxx is your keyboard shortcut, eg: Ctrl+Shift+T

Permitted keyboard modifiers are Ctrl, Shift, Alt, as well as the generic Mod1, mapping to macOS’s Command (Apple) key, or Control on Windows/Linux.

Sublime Text 3 Extension

Are you writing Python scripts to automate your JEB reversing tasks? If so, try the “JEB Script Development Helper” package available on Sublime Text’s Package Control.

JEB Python scripts with Sublime Text

To install it:

  • Install ST3
  • Install Package Control
  • Open Package Control and install a new extension
  • Search for “JEB” and install the extension

The extension provides:

  • Auto-completion on JEB types and attributes
  • Auto-import of JEB classes: CTRL+ALT+I on a class name
  • Easy creation of a script skeleton (CTRL+SHIFT+P, “JEB: Create a new script”)
  • Easy updates to the latest API doc, usually published right after a new release (CTRL+SHIFT+P, “JEB: Update to latest API doc file”)

API changes

Recent API changes are not specific to Android components of JEB. You will find updated sample code on GitHub.

  1. If you are seeing unintended changes or bugs related to this update, let us know so that we can fix things quickly.
  2. Emulator here means an emulator running a userdebug Android build, as Google-provided images are