Deobfuscation ratings, inlining “fat” functions, and breaking opaque predicates

In this post, we take a quick look at a couple of relatively novel protection techniques found in the wild. The class we are looking at is com.X (SHA256: a519e4a20586807665d82ea28892e2ede184807868552f23210bf10c05727980).

Have a look at the decompiled code, with standard JEB options. It was auto-deobfuscated and thoroughly cleaned by dexdec, JEB’s Dalvik decompiler:

Decompilation of com.X with standard options (it’s been deobfuscated, and JEB is letting you know about it by providing deobfuscation ratings or scores as method comments)

A note on deobfuscation ratings

Two items to notice:

  • Some method outputs are collapsed: their direct output was deemed useless because their code was inlined in the corresponding callers. You may re-expand them with the Dash (-) action key, or via the Action menu, Collapse/Expand command.
  • Some decompiled methods have an auto-comment specifying a deobfuscation rating and score. This score is calculated from the results of the IR optimizers tagged as DEOBFUSCATOR. If the score reaches a threshold, the rating (LOW, which is not shown; MEDIUM; HIGH; EXTRA) is specified in the decompilation output, as a hint to the user that the low-level code is protected and that the high-level decompilation was deobfuscated and cleaned.

The deobfuscation ratings for several methods of com.X are high. It looks like this class received a significant amount of protection. However, after clean-up, the meaningful code consists of two one-liner methods: one storing a timestamp (method gg), the other one calculating an elapsed time (method gf).

Let’s have a look at the decompiled code with deobfuscators disabled: redecompile the code with CMD1+TAB (Action menu, Decompile with Options…) and untick “Enable deobfuscator optimizers”.

dexdec options when redecompiling with Action, Decompile with options…

The re-decompilation result is as follows:

Decompilation of com.X with deobfuscators disabled

There is quite a lot to look at here, mainly the fat routines and the opaque predicates.

Inlining “fat” functions

We see that gf calls new with a set of fixed integers (v, v1) as well as its own identityHashCode (v3, essentially a pseudo-random number). Similarly, gg also redirects to new, with a different set of arguments.

The methods gf() and gg() are wrappers calling the method new() with various keys

A quick examination of new shows that two code paths may be executed, based on the values of the provided triplet (v, v1, v2):

Decompilation of the synthetic “fat” function new, holding the real code of gf and gg

So, what happened? The protection of class com.X consisted of taking the code bodies of gf and gg, merging them into a single method new (hence the name “fat”), and changing the code of gf and gg to trampoline into new with selectors that pick the proper code path.

Here is an easier representation of that process, with a single selector (instead of a triplet):

// UNPROTECTED CLASS C
class C {
  int fld1;

  int f1(int x) {
    return 25 + x;
  }

  int f2() {
    return 31 * fld1;
  }
}

// PROTECTED CLASS C
class protected_C {
  int fld1;

  // f1 and f2 are now trampolines: they pack their arguments and pass a selector
  int f1(int x) {
    return (int)fat_routine(new Object[]{this, x}, 1);
  }

  int f2() {
    return (int)fat_routine(new Object[]{this}, 2);
  }

  // the fat routine holds the original bodies of f1 and f2
  static Object fat_routine(Object[] params, int selector) {
    if(selector == 1) {
      return 25 + (int)params[1];
    }
    else if(selector == 2) {
      return 31 * ((protected_C)params[0]).fld1;
    }
    throw new RuntimeException();  // should not happen
  }
}

Although the above code is trivial, we can use it to highlight two complications the decompiler will face when dealing with the more complex implementations produced by a real code protection system:

  • When to decide to inline, i.e. how to detect fat functions? (that question is outside the scope of this blog, and would not be of much interest to most readers)
  • What about complex selectors, such as a triplet with a pseudo-random int?
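To make the second complication concrete, here is a minimal sketch of a fat routine driven by a triplet selector. All names (FatSketch, mark, elapsed, fat, opaque) and all constants are hypothetical, the opaque predicate is a deliberately simple one, and the clock value is passed in as a parameter to keep the example deterministic. None of this is taken from com.X.

```java
// Hypothetical illustration; this is not the code of com.X.
final class FatSketch {
    long start;

    // Trampolines: two fixed integers plus a pseudo-random value (the identity hash).
    void mark(long now) {
        fat(new Object[]{this, now}, 17, 42, System.identityHashCode(this));
    }

    long elapsed(long now) {
        return (Long) fat(new Object[]{this, now}, 91, 7, System.identityHashCode(this));
    }

    // r * (r + 1) is a product of two consecutive integers, hence always even,
    // including when the int arithmetic wraps around. The predicate is always true.
    static boolean opaque(int r) {
        return ((r * (r + 1)) & 1) == 0;
    }

    // The fat routine: both original bodies, guarded by the selector triplet.
    static Object fat(Object[] params, int a, int b, int r) {
        FatSketch self = (FatSketch) params[0];
        long now = (Long) params[1];
        if (a + b == 59 && opaque(r)) {   // path taken by mark()
            self.start = now;
            return null;
        }
        if (a - b == 84 && opaque(r)) {   // path taken by elapsed()
            return now - self.start;
        }
        throw new IllegalStateException("unreachable for the trampolines above");
    }
}
```

A decompiler that inlines fat into mark and elapsed without resolving opaque(r) is left with both guarded branches in each caller; only once the predicate is proven constant can the dead branch be dropped.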

If JEB’s dexdec were to inline the calls to new as it is, we’d end up with the following decomps – not quite what we saw at the beginning of this article!

Decompilation with the deobfuscators re-enabled, but with the opaque predicate breaker disabled

Resolving opaque predicates

Let’s look at method gf. We can see that the pseudo-random selector, after inlining, is used to calculate a predicate that will determine which path to take, i.e. do we execute the actual code for gf, or the code for gg?

The predicate seen in gf can be re-written as:

PRED = 0xDE9B00B0 + (~(0x99525D4B | X) | 1 & X) * 520 + (0x99525D4B & X | ~(X | 0x66ADA2B5)) * -1040 + (0x99525D4A | 0x66ADA2B5 & X | ~(X | 0x66ADA2B5)) * 520 != 1

Internally, JEB does quite a bit to simplify it, and ultimately, when all fast reductions and simplifications are applied, it will use the well-known Z3 SMT solver to break the predicate. In this case, regardless of the value of X, the predicate is true. Therefore, gf will be simplified to:

return X.iz(arr_object);

(Note that method iz is itself a candidate for inlining! At the end, the cleaned-up code shown in the introduction of this article will be generated.)
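The post does not show JEB's internal reductions or its Z3 queries. As an independent sanity check, the predicate above can be evaluated directly in Java, whose 32-bit wrapping int arithmetic is assumed here to match the Dalvik semantics. The class and method names are hypothetical. The check also exposes one reason the predicate is constant: 520 and 1040 are multiples of 8, and so is the constant 0xDE9B00B0, so the left-hand side is always a multiple of 8 (modulo 2^32) and can never equal 1.

```java
// Hypothetical helper; a sampled check, not a proof and not JEB's method.
final class OpaquePredicateCheck {
    // Left-hand side of PRED, with the operator precedence made explicit.
    static int lhs(int x) {
        int a = ~(0x99525D4B | x) | (1 & x);
        int b = (0x99525D4B & x) | ~(x | 0x66ADA2B5);
        int c = 0x99525D4A | (0x66ADA2B5 & x) | ~(x | 0x66ADA2B5);
        return 0xDE9B00B0 + a * 520 + b * -1040 + c * 520;
    }

    static boolean pred(int x) {
        return lhs(x) != 1;
    }

    // Walks the whole int range with a large stride and checks two things:
    // the predicate holds, and the low three bits of the left-hand side are zero.
    static boolean holdsOnSamples() {
        for (long v = Integer.MIN_VALUE; v <= Integer.MAX_VALUE; v += 65_537) {
            int x = (int) v;
            if (!pred(x) || (lhs(x) & 7) != 0) {
                return false;
            }
        }
        return true;
    }
}
```

Sampling only builds confidence; the divisibility argument (or an SMT solver, as JEB uses) is what covers every value of X.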

The use of Z3 and other external theorem provers that may be used by JEB and its plugins can be disabled in the options (see “Enable predicate breaker”):

The external predicate solver can be disabled in the options

Conclusion

We hope this quick note will shed some light on some newer features or recent upgrades that went into dexdec. Many of those were already present in gendec, the generic decompiler used for anything non-Dalvik, and it was about time to add those advanced clean-up passes into the Dalvik decompiler as well. In a sense, dexdec has caught up and even gone further than gendec on these aspects.

Which leads me to say there will likely be a Part 2 or at least an update for this blog, to highlight another complex deobfuscating task: the simplification of arithmetic operations consisting of bitwise operations and mixed boolean/arithmetic (MBA) expressions.

Stay tuned! Thank you to all our users and readers of this blog 🙂 Do not hesitate to reach out through the usual channels (Slack, email, X).

– Nicolas

Generic Unpacking for APK

Updated on March 19 2024: cover the additions of JEB 5.10 (auto-integration of dex, so files) and JEB 5.11 (unpacker report).

This post presents one of JEB’s components used for Android app reverse engineering: the Generic Unpacker for APK. 1

The unpacker will attempt to emulate the app’s execution in order to collect dex files and native libraries (so files, arm64 only) that would be dynamically generated at runtime. Many APK protectors, whether legitimate or used for malicious purposes, employ such techniques to make the payload Dalvik bytecode more difficult to access and analyze.

How to use the APK unpacker

First, open the target APK in JEB. In some cases, the unpacker module will let you know that there is a high probability that the APK was packed:

In many cases, that heuristic won’t be triggered and no specific hint issued. Either way, you may start the unpacker via the Android menu, Generic Unpacking…

Start the Generic Unpacker via the Android menu

An options dialog will be displayed. The available options are:

  • Maximum duration after which the unpacking process should be aborted (the default is set to 3 minutes, although in most cases, unpacking will stop well before that time-out).
  • Whether or not collected dex should be used during the unpacking process itself (if so, they would be integrated in the current dex unit, to allow their emulation).
  • Whether or not collected so files should be used during the unpacking process itself.
  • Whether monitoring hooks should be set up to allow the generation of a report after the unpacking process completes (the report contains a trace of useful events that can be used to quickly determine how the unpacking process works).

Options dialog for the unpacker

Press “Start” and let the unpacker attempt to recover hidden dex files and so libraries.

After it’s done, a frame dialog will list the unpacker results, consisting of dexdec MESSAGE notifications indicating which dex files were recovered, and where. The logger will display similar information. If the option was selected, the unpacker will also generate and display a report.

For each recovered dex and native library, a corresponding unit will be created under a sub-folder named “unpacked” (highlighted in green, located under the APK unit).

The unpacker has completed and is displaying its results (one dex file was recovered)

Analyzing the collected files

At this point, you may decide to analyze the recovered dex and so file(s) separately. In this case, simply open up the dex/elf unit(s) under “unpacked”, and proceed as normal (another code hierarchy, disassembly view, etc. will be opened).

Dex files integration

You may want to integrate the recovered dex with the already existing bytecode. If you ticked the option “Auto-integrate unpacked dex code to main dex unit”, the integration is automatic (and in many cases, it will allow the unpacker to proceed even further). Otherwise, to do it manually, follow these steps:

  • Right-click on the recovered dex unit, select Extract to… and save the dex to a location of your choice
  • Navigate to the primary dex unit (generally named “Bytecode”) into which you want to integrate that saved dex, and open it with a double-click
  • Go to the Android menu, select Add/Merge additional Dex files… and select the file previously saved
  • The collected dex will be integrated with the existing bytecode unit, and the bytecode hierarchy will reflect that update

Native libs analysis

The recovered arm64 library files may be analyzed separately. If the option “Allow use of unpacked libraries” was ticked, the recovered so files will be used by the unpacker, during unpacking. As was mentioned for dex above, in many cases, it will allow the unpacker to proceed further than normal.

Unpacking report

If the corresponding option was enabled before unpacking, a report will be generated after unpacking. It contains a detailed event trace of what happened, as well as a useful list of the most important unpacking events, that reverse engineers may view as a high-level “signature” of the unpacking code itself. A few examples follow.

Note that the full reports were trimmed; only their first section (“interesting records”) is displayed. The first column indicates the emulation counter when the event occurred, prefixed with either ‘j’ (java) or ‘n’ (native). The second item is the record type. Record-specific strings follow, such as the method signature, string-marshalled parameters, program counter, memory addresses, register values, etc.
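For readers who want to post-process a report, the record layout just described can be split with a small parser. This is a sketch based only on the sample lines shown in this post; the class name, field names, and regular expression are assumptions, not an official report grammar, and the record-specific part is kept as an opaque string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for lines such as "- j#191 JAVA_INVOKE: ...".
final class ReportRecord {
    // dash, side (j or n), '#', counter, record type, ':', free-form details
    private static final Pattern LINE =
            Pattern.compile("^-\\s*([jn])#(\\d+)\\s+([A-Z_]+):\\s*(.*)$");

    final boolean isNative;  // true for 'n' records, false for 'j' records
    final long counter;      // emulation counter at the time of the event
    final String type;       // e.g. JAVA_INVOKE, FILE_ACCESS
    final String details;    // record-specific strings, left unparsed

    private ReportRecord(boolean isNative, long counter, String type, String details) {
        this.isNative = isNative;
        this.counter = counter;
        this.type = type;
        this.details = details;
    }

    // Returns null for lines that are not records (headers, "...", blank lines).
    static ReportRecord parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new ReportRecord("n".equals(m.group(1)), Long.parseLong(m.group(2)),
                m.group(3), m.group(4));
    }
}
```

Grouping the parsed records by type, or keeping only the native ones, is then a matter of a few lines of stream code.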

Report sample 1

This packer does not employ native code. The malware was provided by one of our users. The records indicate that:

  • the custom app’s attachBaseContext() was called
  • an asset was retrieved
  • from it, a custom jar was written
  • that jar (containing a dex, accessible in “unpacked”) was loaded into the app’s process via DexClassLoader

INTERESTING RECORDS BY ORDER OF EXECUTION (JAVA, NATIVE):
- j#191 JAVA_INVOKE: android.content.ContextWrapper.attachBaseContext ? [?]
- j#3614186 JAVA_INVOKE: android.content.res.AssetManager.openNonAssetFd ? ["tracks/radio.ogg"]
- j#15485592 JAVA_NEW: java.io.FileOutputStream ["/data/user/0/com.sekcbrgl.lodczqgwkhw/app_offline/wyhatiq.jar"]
- j#18119837 JAVA_NEW: dalvik.system.DexClassLoader [":/data/user/0/com.sekcbrgl.lodczqgwkhw/app_offline/wyhatiq.jar", "/data/user/0/com.sekcbrgl.lodczqgwkhw/app_offline", "/data/user/0/com.sekcbrgl.lodczqgwkhw/app_offline", ?]
- j#21005588 JAVA_FIELD_GET: android.app.ContextImpl.mPackageInfo ? [?]
- j#21006978 JAVA_FIELD_GET: android.app.ContextImpl.mPackageInfo ?

Report sample 2

This packer does not employ native code. The malware was provided by one of our users.

INTERESTING RECORDS BY ORDER OF EXECUTION (JAVA, NATIVE):
- j#1 JAVA_INVOKE: android.content.ContextWrapper.attachBaseContext ? [?]
- j#16 JAVA_FIELD_GET: android.content.pm.ApplicationInfo.metaData ? [?]
- j#38 JAVA_FIELD_GET: android.content.pm.ApplicationInfo.sourceDir ? ["/data/app/~~wgQXv0VF9Q1KDYlkLS3B5w==/ad.kokolzxs-TA1X_cMfmXCqI7Zt9GTCQA==/base.apk"]
- j#70 JAVA_INVOKE: java.io.File.delete /data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app []
- j#73 JAVA_NEW: java.util.zip.ZipFile [/data/app/~~wgQXv0VF9Q1KDYlkLS3B5w==/ad.kokolzxs-TA1X_cMfmXCqI7Zt9GTCQA==/base.apk]
- j#128 JAVA_INVOKE: java.io.File.mkdirs /data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/META-INF []
- j#130 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/META-INF/123.SF]
- j#446 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/META-INF/123.RSA]
- j#496 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/AndroidManifest.xml]
- j#595 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/androidsupportmultidexversion.txt]
- j#646 JAVA_INVOKE: java.io.File.mkdirs /data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/assets []
- j#648 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/assets/39285EFA.dex]
- j#951 JAVA_INVOKE: java.io.File.mkdirs /data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/assets/apps/H5BF09C00/www/css []
- j#953 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/assets/apps/H5BF09C00/www/css/mui.css]
...
- j#145678 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/resources.arsc]
- j#146652 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/secret-classes.dex]
- j#173441 JAVA_INVOKE: javax.crypto.Cipher.getInstance ["AES/ECB/PKCS5Padding"]
- j#173445 JAVA_INVOKE: javax.crypto.Cipher.getInstance ["AES/ECB/PKCS5Padding"]
- j#173452 JAVA_NEW: javax.crypto.spec.SecretKeySpec [(97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112), "AES"]
- j#173455 JAVA_INVOKE: javax.crypto.Cipher.init ? [1, ?]
- j#173458 JAVA_INVOKE: javax.crypto.Cipher.init ? [2, ?]
- j#173479 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/secret-classes.dex]
- j#173504 JAVA_FIELD_GET: dalvik.system.BaseDexClassLoader.pathList ? [?]
- j#173519 JAVA_FIELD_GET: dalvik.system.DexPathList.dexElements ? [(?)]
- j#173559 JAVA_INVOKE: dalvik.system.DexPathList.makePathElements [[/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/secret-classes.dex], /data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0, []]
- j#173586 JAVA_FIELD_SET: dalvik.system.DexPathList.dexElements ? [(?, ?)]
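The crypto records in report sample 2 are a good example of how much a trace can give away. The SecretKeySpec record lists sixteen byte values, and the two Cipher.init records pass 1 and 2, which are the values of Cipher.ENCRYPT_MODE and Cipher.DECRYPT_MODE. The sketch below rebuilds the key from those bytes and runs a round trip with the same transformation string. The class and method names are hypothetical, and since the report does not say which buffer is decrypted, the sketch works on caller-supplied bytes.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper built from the values visible in the trace.
final class TraceKeySketch {
    // Byte values copied from the SecretKeySpec record.
    static final byte[] KEY = {97, 98, 99, 100, 101, 102, 103, 104,
                               105, 106, 107, 108, 109, 110, 111, 112};

    // The key bytes are printable ASCII.
    static String keyAsText() {
        return new String(KEY, StandardCharsets.US_ASCII);
    }

    // mode is Cipher.ENCRYPT_MODE (1) or Cipher.DECRYPT_MODE (2), as in the trace.
    static byte[] crypt(int mode, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(mode, new SecretKeySpec(KEY, "AES"));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The key spells out the first sixteen lowercase letters of the alphabet, which is the kind of detail worth noting when writing a signature for this packer family.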

Report sample 3

This packer does not employ native code. The malware was analyzed by @cryptax here.

INTERESTING RECORDS BY ORDER OF EXECUTION (JAVA, NATIVE):
- j#1 JAVA_INVOKE: android.content.ContextWrapper.attachBaseContext ? [?]
- j#3444 JAVA_FIELD_GET: android.content.pm.ApplicationInfo.sourceDir ? ["/data/app/~~wgQXv0VF9Q1KDYlkLS3B5w==/com.pmmynubv.nommztx-TA1X_cMfmXCqI7Zt9GTCQA==/base.apk"]
- j#3447 JAVA_FIELD_GET: android.content.pm.ApplicationInfo.dataDir ? ["/data/user/0/com.pmmynubv.nommztx"]
- j#3457 JAVA_FIELD_GET: android.os.Build$VERSION.SDK_INT [33]
- j#6276 JAVA_INVOKE: java.lang.System.getProperty ["java.vm.version"]
- j#6389 JAVA_INVOKE: java.io.File.mkdir /data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g []
- j#6396 JAVA_INVOKE: java.io.File.mkdir /data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo []
- j#9473 JAVA_NEW: java.util.zip.ZipFile [/data/app/~~wgQXv0VF9Q1KDYlkLS3B5w==/com.pmmynubv.nommztx-TA1X_cMfmXCqI7Zt9GTCQA==/base.apk]
- j#10254 JAVA_NEW: java.io.FileOutputStream [/data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo/tmp-base.apk.hFGg8tq17304470999884300019.weg]
- j#10259969 JAVA_INVOKE: java.io.File.renameTo /data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo/tmp-base.apk.hFGg8tq17304470999884300019.weg [/data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo/base.apk.hFGg8tq1.weg]
- j#10259974 JAVA_INVOKE: java.io.File.delete /data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo/tmp-base.apk.hFGg8tq17304470999884300019.weg []
- j#10262055 JAVA_FIELD_GET: dalvik.system.BaseDexClassLoader.pathList ? [?]
- j#10262352 JAVA_FIELD_GET: android.os.Build$VERSION.SDK_INT [33]
- j#10262737 JAVA_INVOKE: dalvik.system.DexPathList.makePathElements [[/data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo/base.apk.hFGg8tq1.weg], /data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo, []]
- j#10262752 JAVA_FIELD_GET: dalvik.system.DexPathList.dexElements ? [(?)]
- j#10262770 JAVA_FIELD_SET: dalvik.system.DexPathList.dexElements ? [(?, ?)]
- j#10262792 JAVA_INVOKE: java.io.File.delete /data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo/base.apk.hFGg8tq1.weg []
- j#10262802 JAVA_INVOKE: java.io.File.delete /data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo/T9etIiaI.uw87 []
- j#10262808 JAVA_INVOKE: java.io.File.delete /data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo []
- j#10263604 JAVA_INVOKE: android.app.ActivityThread.currentActivityThread []
- j#10264168 JAVA_FIELD_GET: android.app.ActivityThread.mBoundApplication ? [?]
- j#10264725 JAVA_FIELD_GET: android.app.ActivityThread$AppBindData.info ? [?]
- j#10265796 JAVA_FIELD_GET: android.app.ActivityThread.mInitialApplication ? [?]
- j#10266370 JAVA_FIELD_GET: android.app.ActivityThread.mAllApplications ? [[?]]
- j#10266905 JAVA_FIELD_GET: android.app.LoadedApk.mApplicationInfo ? [?]
- j#10267542 JAVA_FIELD_GET: android.app.ActivityThread$AppBindData.appInfo ? [?]
- j#10267551 JAVA_FIELD_SET: android.content.pm.ApplicationInfo.className ? ["com.pmmynubv.nommztx.App"]
- j#10267554 JAVA_FIELD_SET: android.content.pm.ApplicationInfo.className ? ["com.pmmynubv.nommztx.App"]
- j#10268095 JAVA_INVOKE: android.app.LoadedApk.makeApplication ? [false, null]
- j#10268749 JAVA_FIELD_SET: android.app.ActivityThread.mInitialApplication ? [?]
- j#10269322 JAVA_FIELD_GET: android.app.ActivityThread.mProviderMap ? [?]

Report sample 4

This packer employs a mix of dex and native code. The malware APK was provided by one of our users.

INTERESTING RECORDS BY ORDER OF EXECUTION (JAVA, NATIVE):
- j#2 JAVA_INVOKE: android.content.ContextWrapper.attachBaseContext ? [?]
- j#25 JAVA_FIELD_GET: android.content.pm.ApplicationInfo.sourceDir ? ["/data/app/~~wgQXv0VF9Q1KDYlkLS3B5w==/com.ddbewkjewujiijejk2ijfe.security-TA1X_cMfmXCqI7Zt9GTCQA==/base.apk"]
- j#28 JAVA_NEW: java.util.zip.ZipFile ["/data/app/~~wgQXv0VF9Q1KDYlkLS3B5w==/com.ddbewkjewujiijejk2ijfe.security-TA1X_cMfmXCqI7Zt9GTCQA==/base.apk"]
- j#97 JAVA_INVOKE: java.io.File.mkdir /data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir []
- j#869 JAVA_NEW: java.io.FileOutputStream [/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/0OO00l111l1l]
- j#969 JAVA_NEW: java.io.FileOutputStream [/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/o0oooOO0ooOo.dat]
- j#1044 JAVA_NEW: java.io.FileOutputStream [/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/tosversion]
- j#1150 JAVA_INVOKE: java.lang.System.loadLibrary ["shell-super.2019"]
- n#94557 REGISTERED_NATIVE: PC=0x7B000D80: msig=Lcom/wrapper/proxyapplication/WrapperProxyApplication;->Ooo0ooO0oO()V @0x100005250
- n#94572 REGISTERED_NATIVE: PC=0x7B000D80: msig=Lcom/wrapper/proxyapplication/CustomerClassLoader;->ShowLogs(Ljava/lang/String;I)I @0x10000318C
- j#1151 JAVA_FIELD_GET: android.app.ContextImpl.mPackageInfo ? [?]
- j#1151 JAVA_FIELD_GET: android.app.LoadedApk.mActivityThread ? [?]
- n#96934 FILE_ACCESS: PC=0x744466AE7C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/o0oooOO0ooOo.dat flags=0x0
- n#97957 FILE_ACCESS: PC=0x74446BC008: path=/proc/self/maps flags=0x0
- n#125423 FILE_ACCESS: PC=0x744466AE7C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/0OO00l111l1l flags=0x2
- n#125432 FILE_ACCESS: PC=0x744466F6E8: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/0OO00l111l1l flags=0x0
- n#126040 FILE_ACCESS: PC=0x744466AE7C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/0OO00l111l1l.lock flags=0x42
- n#129378 FILE_ACCESS: PC=0x74446BC008: path=/proc/self/maps flags=0x0
- n#152428 FILE_ACCESS: PC=0x744465FA2C: path= flags=0x0
- n#152476 FILE_ACCESS: PC=0x744466A178: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir flags=0x0
- n#152484 FILE_ACCESS: PC=0x744466A178: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir flags=0x0
- n#154816 FILE_ACCESS: PC=0x744465FA2C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir flags=0x0
- n#156501 FILE_ACCESS: PC=0x744466AE7C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/tosversion flags=0x0
- n#162810 FILE_ACCESS: PC=0x744466F6E8: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/tx_shell flags=0x0
- n#164410 FILE_ACCESS: PC=0x744466AE7C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/.updateIV.dat flags=0x42
- n#165717 FILE_ACCESS: PC=0x744465FA2C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/00O000ll111l_0.dex flags=0x0
- n#1863052 FILE_ACCESS: PC=0x744466AE7C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/00O000ll111l_0.dex flags=0x42
- n#1863114 FILE_ACCESS: PC=0x744466F6E8: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/00O000ll111l_0.dex flags=0x0
- n#1865062 FILE_ACCESS: PC=0x744466F6E8: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/odexdir/ flags=0x0
- n#1867557 FILE_ACCESS: PC=0x744465FA2C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/oat/ flags=0x0
- n#1867590 FILE_ACCESS: PC=0x744465FA2C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/oat/arm64/ flags=0x0
- n#1867629 FILE_ACCESS: PC=0x74446BC008: path=/proc/self/maps flags=0x0
- n#1886913 MEMORY_READ: PC=0x100035640: addr=0x7466E597F8 size=0x4: 58 00 00 00 ("X\u0000\u0000\u0000")
- n#1886915 MEMORY_READ: PC=0x100035648: addr=0x7466E597F8 size=0x4: 58 00 00 00 ("X\u0000\u0000\u0000")
- n#1890133 FILE_ACCESS: PC=0x744465FA2C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/oat/arm64/00O000ll111l_0.odex flags=0x0
- j#1184 JAVA_FIELD_GET: dalvik.system.BaseDexClassLoader.pathList ? [?]
- j#1305 JAVA_INVOKE: dalvik.system.DexPathList.makePathElements [[/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/00O000ll111l_0.dex], /data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/odexdir/.oat, []]
- j#1337 JAVA_FIELD_GET: dalvik.system.DexPathList$Element.dexFile ? [?]
- j#1387 JAVA_FIELD_GET: dalvik.system.DexPathList.dexElements ? [(?)]
- j#1402 JAVA_FIELD_GET: android.os.Build$VERSION.SDK_INT [33]
- j#1411 JAVA_FIELD_SET: dalvik.system.DexPathList.dexElements ? [(?, ?)]
- n#1892649 MEMORY_READ: PC=0x100035640: addr=0x7466E597F8 size=0x4: 58 00 00 00 ("X\u0000\u0000\u0000")
- n#1892651 MEMORY_READ: PC=0x100035648: addr=0x7466E597F8 size=0x4: 58 00 00 00 ("X\u0000\u0000\u0000")
- n#4903698 FILE_ACCESS: PC=0x74446BC008: path=/proc/12624/maps flags=0x0
- n#4903744 FILE_ACCESS: PC=0x744466AE7C: path=/proc/12624/maps flags=0x0
- n#4903776 FILE_ACCESS: PC=0x74446B0634: path=/proc/12624/maps flags=0x0
- n#4904889 MEMORY_READ: PC=0x10001A540: addr=0x0 size=0x8
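The FILE_ACCESS records of report sample 4 carry a flags value (0x0, 0x2, 0x42). The post does not say how to interpret it. Assuming these are Linux open(2) flags, which is a guess on our part, they can be decoded with the generic Linux constants as sketched below. The class name is hypothetical and only a handful of common flags are covered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder; assumes the values are Linux open(2) flags.
final class OpenFlags {
    static String decode(int flags) {
        List<String> parts = new ArrayList<>();
        // The two low bits hold the access mode.
        switch (flags & 0x3) {
            case 0: parts.add("O_RDONLY"); break;
            case 1: parts.add("O_WRONLY"); break;
            case 2: parts.add("O_RDWR"); break;
            default: parts.add("ACCMODE_3"); break;
        }
        // Generic Linux values for a few common creation and status flags.
        if ((flags & 0x40) != 0) parts.add("O_CREAT");
        if ((flags & 0x80) != 0) parts.add("O_EXCL");
        if ((flags & 0x200) != 0) parts.add("O_TRUNC");
        if ((flags & 0x400) != 0) parts.add("O_APPEND");
        return String.join("|", parts);
    }
}
```

Under that assumption, the records with flags=0x42 (the .lock file, .updateIV.dat, and the second access to the dex file) would be open-for-read-write-and-create operations, while flags=0x0 would be plain reads.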

API

An unpacker is represented by the IGenericUnpacker interface.

The unpacker API

To create an APK unpacker, you may use the IApkUnit.createGenericUnpacker() method. (To retrieve an APK unit from a JEB project, use the project’s findUnit method, or any other IUnit search related method — please refer to sample scripts for example).

Limitations

The unpacker will not be able to handle all cases. Please report any problem or bug you encounter; we will see if anything can be done to support most cases.

In an upcoming update, the IGenericUnpacker API will offer a way for users to write plugins in the form of dex-emulator and native-emulator hooks to do whatever is needed to perform an unpacking task that the built-in code would fail at.

Until next time!

Nicolas

  1. The unpacker was introduced in JEB 5.9; it received significant upgrades in versions 5.10, 5.11.

How To Use JEB – Auto-decrypt strings in protected binary code

This is the second entry in our series showing how to use JEB and its well-known and lesser-known features to reverse engineer malware more efficiently. Part 1 is here.

Today, we’re having a look at an interesting portion of an x86-64 Windows malware that carries encrypted strings. Those strings happen to be decrypted on the fly, the first time they’re required by some calling routine.

SHA256: 056cba26f07ab6eebca61a7921163229a3469da32c81be93c7ee35ddec6260f1. The file is not packed, it was compiled for Intel x86 64-bit processors, using an unknown version of Visual Studio. The file is dropped by another malware and its purpose is reconnaissance and information gathering. Let’s load it in JEB 5.8 and do a standard analysis (default settings).

Initial decompilations

For the sake of showing what mechanism is at play, we’re first looking at sub_1400011F0. Let’s decompile it by pressing the TAB key (menu: Action, Decompile…).

Raw decompilation of sub_1400011F0, before examining its callees.

Then, let’s decompile the callee sub_140001120.

JEB can now thoroughly look at the routine and refine the initial prototype that was applied earlier, when the caller sub_1400011F0 was decompiled. It is now set to: void(LPSTR).

The code itself is a wrapper around CreateProcess; it executes the command line provided as argument.

sub_140001120 executes a command-line with CreateProcess. Note the refined prototype, void(LPSTR).

Press escape to navigate back to the caller, or alternatively, examine the callers by pressing X (menu: Action, Cross-references…) and select sub_1400011F0. You will notice that JEB is now warning us that the decompilation is “stale”.

The initial decompilation of sub_1400011F0 is stale after the decompilation of sub_140001120 yielded a better prototype.

Second decompilation

The reason is that the prototype of sub_140001120 was refined by the second decompilation (to void(LPSTR)), and the caller can now be re-decompiled to a more accurate version.

Let’s redecompile it: press F5 (menu: Window, Refresh). You can see that second decompilation below. What happened to the calls to sub_140001040?

Second decompilation of sub_1400011F0, showing some decrypted strings instead of calls to sub_140001040.

String auto-decryption

Notice the following:

  • A “deobfuscation score” note was added as a method comment (refer to part 1 of the series)
  • The calls to sub_140001040 are gone, they have been replaced by dark-pink strings

JEB also notified us in the console:

Notifications about decrypted strings replaced in decompiled code.

Dark-pink strings represent synthetic strings not present in the binary itself. Here, they are the result of JEB auto-decrypting buffers by emulating the calls to routine sub_140001040, which was identified as a string provider. The decompilation of sub_140001120 is what made this possible: the inferred parameter type LPSTR was back-propagated to the callers, where the argument is the return value of sub_140001040.

Auto-decryption can be very handy. In the case of this malware, we can immediately see what will be executed by CreateProcess: shells executing whoami and dir and redirecting outputs to files in the local folder. However, if necessary, this feature can be disabled via the “Decryptor Options” in the decompiler properties:

  • Menu: Options, Back-end properties… to globally disable this in the future, except for your current project
  • Menu: Options, Specific Project properties… for the current project only
  • Or you may simply redecompile the method with CTRL+TAB (menu: Action, Decompile with options…) and disable the string decryptor for specific code

The string auto-decryptor may be enabled or disabled in the options

The decryptor routine

What is sub_140001040 anyway? Let’s navigate to the routine in the disassembly and decompile it.

A raw decompilation of the decryptor code, sub_140001040

After examination of the code, we can adjust things slightly:

  • The global gvar_140022090 is an array of PCHAR (double-click on the item; rename it with N; change the type to a PCHAR using Y; create an array from that using the * key).
  • The prototype is really PCHAR(int), we can adjust that with Y.
  • The first byte of an entry into encrypted_strings is the number of encrypted bytes remaining in the string; if 0, it is fully decrypted and subsequent calls will not attempt to decrypt bytes again.
  • The variable v3 is the key; let’s rename it with N. Note that the key at (i) is the sum of the previous two keys used by indices (i-1), (i-2); the initial tuple is (0, 1). This looks like a Fibonacci sequence.1

The decryptor (sub_140001040) after analysis.
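The observations listed above can be turned into a small model of a lazy string decryptor. This is a sketch under several assumptions that the text does not confirm: the key is combined with each byte using XOR, the payload is NUL-terminated once decrypted, and the whole entry is decrypted in a single call. The class name, the protect helper, and the test string are hypothetical; this is not the code of sub_140001040.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model of a decrypt-on-first-use string table.
final class LazyStringTable {
    // Assumed layout: entry[0] = number of bytes still encrypted, entry[1..] = payload.
    private final byte[][] entries;

    LazyStringTable(byte[][] entries) {
        this.entries = entries;
    }

    String get(int index) {
        byte[] e = entries[index];
        int remaining = e[0] & 0xFF;
        int k0 = 0, k1 = 1;              // initial key tuple (0, 1)
        for (int i = 1; i <= remaining; i++) {
            e[i] ^= (byte) k0;           // XOR is an assumption
            int next = k0 + k1;          // next key = sum of the previous two
            k0 = k1;
            k1 = next;
        }
        e[0] = 0;                        // mark as decrypted: later calls skip the loop
        int len = 0;
        while (1 + len < e.length && e[1 + len] != 0) {
            len++;
        }
        return new String(e, 1, len, StandardCharsets.US_ASCII);
    }

    // Builds an encrypted entry for the model above (strings of up to 255 bytes).
    static byte[] protect(String s) {
        byte[] plain = s.getBytes(StandardCharsets.US_ASCII);
        byte[] e = new byte[plain.length + 2];   // count byte + payload + NUL
        e[0] = (byte) plain.length;
        int k0 = 0, k1 = 1;
        for (int i = 0; i < plain.length; i++) {
            e[i + 1] = (byte) (plain[i] ^ k0);
            int next = k0 + k1;
            k0 = k1;
            k1 = next;
        }
        return e;
    }
}
```

The in-place update of the count byte is the important design point: it is what makes the routine safe to call repeatedly, and it is also why emulating the call once is enough for a decompiler to recover the plaintext.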

Comparison with GHIDRA

For comparison’s sake, here are the GHIDRA 11 decompilations.

The caller (sub_1400011F0) decompiled by GHIDRA 11.0.
The decryptor (sub_140001040) decompiled by GHIDRA 11.0.
The CreateProcess wrapper (sub_140001120) decompiled by GHIDRA 11.0. Notice that the low-level structure initialization code adds quite a bit of confusion.

Conclusion

JEB decompilers2 do their best to clean up and restore code, and that includes decrypting strings when it is deemed reasonable and safe.

That concludes our second entry in this “How to use JEB” series. In the next episodes, we will look at other features and how to write interesting IR and AST plugins to help us further deobfuscate and beautify decompiled code.

As always, thank you for your support, and happy new year 2024 to All 😊 – Nicolas

  1. Interestingly, the JEB assistant (call it with the BACKTICK key, or menu: Action, Request Assistant…) would like to rename this method to “fibonacci_sequence”! Not quite it, but that’s a relevant hint!
  2. Note the plural: dexdec – the Dex decompiler – has had string auto-decryption via emulation for a while; its users are well-accustomed to seeing dark-pink strings in deobfuscated code!

How To Use JEB – Analyze an obfuscated win32 crypto clipper

We’re kicking off a malware analysis series explaining how to use JEB Decompiler to perform reverse engineering tasks ranging from out-of-the-box actions to complex use cases requiring scripts or custom plugins.

In this first entry, we look at a Windows malware compiled for x86 32-bit targets. The malware is an Ethereum cryptocurrency stealer. It monitors and intercepts clipboard activity to find and replace wallet addresses by an address of its own — presumably, one controlled by the malware authors to collect stolen ether.

Quick look at the malware

The file is 81 KB in size and is compiled for x86 platforms. Although it does not appear to be packed, most metadata elements of the PE header were scrubbed. There is no rich data or timestamp.

SHA256: 503b2dc50262be583633db7b52dca9bcadc698413270047c209818436196c987

Quick look at the file in Hiew

If you are familiar with JEB, its terminology, and the organization of its UI elements, you may skip the next section and go directly to “Examining the code”.

Opening the file in JEB

Let’s fire up JEB. Any recent build (5.7+) with the x86 analysis modules and decompiler will do, i.e. JEB Community Edition or JEB Pro.

We open the file and keep the default settings
A view of the GUI after the initial analysis (from top-left, clockwise: project explorer, main workspace, and code hierarchy)

Project and units

The top-left view shows the project, along with a single artifact (the input file) and the analysis units created by JEB:

  • The artifact file has a blue-round icon
  • The top-level unit is a winpe unit
  • It has one child unit at the moment, named “x86 image”, of type x86.

The bottom-left view shows a list of code routines resulting from the analysis of the file.

Disassembly

By default, the main panel shows the disassembly window.

You may press the SPACE bar to switch to a graph view of the code (menu: Action, Graph…). In the graph view, only a single method is rendered at a time.

CFG (control flow graph) view of a disassembled routine

PE unit

If you wish to have a look at the PE file in more detail, open the winpe unit. Double-click the corresponding node in the project hierarchy.

View of a winpe unit’s “Overview” fragment

The winpe unit view provides several pieces of information, organized in fragments that can be seen below the unit view: Description, Hex Dump, Overview (the default fragment), Sections, Directory Entries, Symbols, etc.

Note that if the PE had not been stripped, we would probably see a compilation timestamp as well as additional sub-units detailing the Rich Header data. For Windows executables, that data is important to perform fine-grained compiler identification.

The Symbols tab lists all symbols advertised by the PE, including imported and exported routines. For example, if you filter on “clip”, you can see multiple win32 routines relating to clipboard access, such as OpenClipboard or SetClipboardData:

The Symbols fragment of the winpe unit view, with a filter applied (“clip”)

Examining the code

Let’s go back to the disassembly offered by the x86 unit. First, notice that the code hierarchy view does not seem to contain well-known methods (static code), typically standard library routines linked at compile-time.

Let’s see why by looking at which siglibs (signature libraries) were applied during the initial analysis (menu: Native, Signature Libraries…). It looks like none were loaded:

The Signature Libraries dialog

Library code identification

Normally, when JEB performs the initial auto-analysis of the code, compiler identification is used to determine whether well-known signature libraries of static code (siglibs) should be loaded and applied to the binary. In this case, compiler identification failed because all header data had been discarded. JEB decided to not load and apply signatures.

To apply them manually, tick the “MSVC x86” boxes. (An alternative is to let JEB know that the file was compiled with MSVC before the analysis starts: when opening the artifact, when the Options panel is displayed, the user may decide to force the compiler to a set-value.)

Forcing a compiler setting before the initial analysis

After doing either of the above ((a) file re-analysis with a compiler identification pre-set; or (b) manual siglibs application), several methods are identified as MSVC code:

Light-blue areas mean the code was matched against well-known signatures

Entry-point and WinMain

Navigate to the executable entry-point (menu: Native, Go to entry-point…).

In the general case, the entry-point of a Windows PE compiled with MSVC is not the high-level entry-point that will contain meaningful code. Although it is relatively easy to find WinMain with a bit of experience, there is a JEB script to help you as well, FindMain.py (available in the samples-script folder, also available on GitHub). Open up the script selector with F2 (menu: File, Scripts, Script selector…).

Run a JEB Python script inside the GUI client

Select the desired script and execute it. The result is displayed in the console:

...
Found high-level entry-point at 0x401175 (branched from 0x401D38)
Renaming entry-point to 'winmain'
...

The code at 0x401175 was auto-renamed to winmain (menu: Action, Rename…).

Initial decompilation

Let’s decompile that method by pressing the TAB key (menu: Action, Decompile…).

Initial decompilation of WinMain

Two items of interest to note at this point:

  • There is a lot of code that appears to be junk or garbage
  • There is a note about some “deobfuscation score”

Junk code

The decompiled WinMain method is about 300 lines of C code. Much of it consists of assignments writing to program globals. At first glance, it looks like it could be some sort of obfuscation. Let’s look at the corresponding assembly code:

Press TAB to go back from a decompilation to the closest matching machine code disassembly line

The snippets have the following structure:
push GARBAGE / pop dword [gXXX]

Or this, assuming edi is callee-saved:
mov edi, gXXX / ... / mov dword [edi+offset], GARBAGE

Later on, we will see how to remove this clutter to make the analysis more pleasant.

Deobfuscation score

A note “deobfuscation score: 6” was inserted as a method comment. That score indicates that some “advanced” clean-up was performed. To see what was cleaned, compare against a decompilation with UNSAFE optimizers turned off: redecompile the method with CTRL+TAB (menu: Action, Decompile with Options…). In this case, a careful examination will point to this area of code:

The opaque predicate calculation is highlighted in green using CTRL+M (menu: Action, Toggle Highlight…)

This predicate looks like the following: if(X*(X+1) % 2 == 0) goto LABEL.

With X being an integer, X*(X+1) is always even, since one of two consecutive integers is even. Therefore, the predicate will always evaluate to true. JEB cleaned this up automatically. (While this particular predicate is trivial, JEB will also attempt to break truly opaque predicates, using the Z3 SMT solver.)
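The claim can be checked quickly in Java. The sketch below is not JEB code; the class and method names are made up for illustration. It evaluates the predicate on a range of 32-bit values, including the ones where X+1 or the product overflows. Wrap-around arithmetic works modulo 2^32, which preserves parity, so the product stays even.

```java
// Illustrative check, not part of JEB: the predicate X*(X+1) % 2 == 0
// holds for 32-bit integers even when the multiplication overflows.
final class OpaquePredicateDemo {

    // The predicate as it appears in the decompiled code.
    static boolean predicate(int x) {
        return x * (x + 1) % 2 == 0;
    }

    // Returns true if the predicate held for every sampled value.
    static boolean holdsOnSamples() {
        int[] edgeCases = {0, 1, -1, 46341, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : edgeCases) {
            if (!predicate(x)) {
                return false;
            }
        }
        for (int x = -100_000; x <= 100_000; x++) {
            if (!predicate(x)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("predicate held on all samples: " + holdsOnSamples());
    }
}
```

A sampled check like this is only a sanity test; the parity argument above is what makes the predicate constant.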

Comparison with GHIDRA

For a point of comparison, you may have a look at the same method decompiled by GHIDRA 10.4 here (default settings were used, just like we did with JEB). The predicate is not cleaned up adequately: extra control-flow edges are left over, leading to AST structuring confusion.

Cleaning up the code

Let’s start with decluttering this code. First of all, why couldn’t the decompiler clean it up on its own? If the globals written to are never read with meaningful intent, then they could be discarded.

The issue is that this is very hard to ensure in the general case. However, in specific cases, sometimes involving manual review, a memory range that is only ever written to may be deemed useless, as is the case here. How do we provide this information to the decompiler? Well, as of version 5.7, we cannot! 1 What we can do though is write a decompiler plugin to clean up the offending IR, and in the process, generate clean(er) code.

IR cleaner plugin

The decompiler accepts several types of plugins, including IR Optimizers (they work on the Intermediate Representation of a routine, as it moves up the decompilation pipeline), and AST optimizers (to clean up or reformat the generated abstract syntax tree of the pseudo-code). In most cases, IR optimizers are well-suited to perform code clean-up or deobfuscation tasks (refer to this blog post for a detailed comparison).

We will write the plugin in Java (we could also write it in Python). It will do the following:

  • Examine each IR statement of a CFG
  • Check if the statement is writing an immediate to some global array: *(array + offset) = value
  • If so, check the array name. If it starts with the prefix “garbage”, consider the statement useless and replace it by a Nop statement

Writing IR plugins is out of scope for this post; we will go over that in detail in a future entry. In the meantime, you can download the plugin code here. Dump the Java file in your JEB’s coreplugins/scripts/ folder. There is no need to close and re-open JEB; it will be picked up at the next decompilation.

public class GarbageCleaner extends AbstractEOptimizer {

	@Override
	public int perform() {
		int cnt = 0;

		for (BasicBlock<IEStatement> b : cfg) {
			for (int i = 0; i < b.size(); i++) {
				IEStatement stm = b.get(i);
				if (stm instanceof IEAssign && stm.asAssign().getDstOperand() instanceof IEMem
						&& stm.asAssign().getSrcOperand() instanceof IEImm) {
					IEMem dst = stm.asAssign().getDstOperand().asMem();
					IEGeneric e = dst.getReference();
					// [xxx + offset] = immediate
					if (e.isOperation(OperationType.ADD)) {
						IEOperation op = e.asOperation();
						if (op.getOperand1().isVar() && op.getOperand2().isImm()) {
							IEVar v = op.getOperand1().asVar();
							IEImm off = op.getOperand2().asImm();
							if (v.isGlobalReference()) {
								long addr = v.getAddress();
								INativeContinuousItem item = ectx.getNativeContext().getNativeItemAt(addr);
								// logger.info("FOUND ITEM %s", item.getName());
								if (item != null && item.getName().startsWith("garbage")) {
									long itemsize = item.getMemorySize();
									if (off.canReadAsLong() && off.getValueAsLong() + dst.getBitsize() / 8 < itemsize) {
										logger.info("FOUND GARBAGE CODE");
										b.set(i, ectx.createNop(stm));
										cnt++;
									}
								}
							}
						}
					}
				}
			}
		}

		if (cnt > 0) {
			cfg.invalidateDataFlowAnalysis();
		}
		return cnt;
	}
}

Note that by design, the plugin is not specific to this malware. We will be able to re-use it in future analyses: all global arrays prefixed with “garbage” will be treated by the decompiler as junk recipients, and cleaned-up accordingly!
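To make the plugin’s decision rule easier to follow outside of JEB, here is a minimal standalone model of it. Region, Write and GarbageWriteFilter are hypothetical stand-ins invented for this sketch, not JEB API types, and dropping a write stands in for replacing the IR statement with a Nop.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the plugin's rule. None of these types belong to the JEB API.
final class GarbageWriteFilter {

    /** Hypothetical stand-in for a named global item (e.g. an array). */
    static final class Region {
        final String name;
        final long start;
        final long size;

        Region(String name, long start, long size) {
            this.name = name;
            this.start = start;
            this.size = size;
        }
    }

    /** Hypothetical stand-in for the statement *(base + offset) = immediate. */
    static final class Write {
        final long base;
        final long offset;
        final int widthInBytes;

        Write(long base, long offset, int widthInBytes) {
            this.base = base;
            this.offset = offset;
            this.widthInBytes = widthInBytes;
        }
    }

    // A write is junk if it targets a region named "garbage..." and fits inside it.
    static boolean isGarbage(Write w, List<Region> regions) {
        for (Region r : regions) {
            if (r.start == w.base && r.name.startsWith("garbage")) {
                return w.offset >= 0 && w.offset + w.widthInBytes <= r.size;
            }
        }
        return false;
    }

    // Keeps only the writes that are not junk.
    static List<Write> clean(List<Write> writes, List<Region> regions) {
        List<Write> kept = new ArrayList<>();
        for (Write w : writes) {
            if (!isGarbage(w, regions)) {
                kept.add(w);
            }
        }
        return kept;
    }
}
```

One deliberate difference: this sketch accepts a write that ends exactly at the end of the region and rejects negative offsets, whereas the plugin’s comparison is strict and would leave such a last-slot write in place.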

Defining the garbage array

At this point, we need to determine where that array is. Some examination of the code leads to the following boundaries (roughly): start at 0x41597E, spans over 0x100 bytes. Navigate to the disassembly; create an array using the STAR key (menu: Native, Create/Edit Array…); specify its characteristics.

Creating a global array of 0x100 bytes. This is the garbage array.

As soon as the array is created, the disassembly will change to what can be seen below. At the same time, the decompilations using that array will be invalidated; that is the case for WinMain. You may see that another extra-comment was added by the decompiler: “Stale decompilation – Refresh this view to re-decompile this code”. Such decompilations are read-only until a new one is generated.

The array is now created. The decompilation of WinMain becomes stale.

Before redecompiling, remember we need to rename our array with a label starting with “garbage”. Set the caret on the array, hit the key N (menu: Action, Rename…) and set your new name, e.g., garbageArray1.

Now you may go back to the decompilation view of WinMain and hit F5 (menu: Windows, Refresh…) to regenerate a decompilation.

Decompiled WinMain after the garbage array-assigns were cleaned-up by the plugin

The code above is much nicer to look at – and much easier to work on!

Quick analysis

The method at 0x401000, called by WinMain, is decrypting the thief’s wallet address, and generating two hexstring versions of it (ascii and unicode).

Decrypting the target wallet address. The decompilation is shown after proper types were applied on the data structures accessed (encrypted wallet address, hexstrings, etc.) and better names given to those vars

The loop in WinMain is doing the following:

  • Every second, it queries the Windows clipboard with OpenClipboard
  • It checks if it contains text strings or unicode strings
  • If the string is 42 characters in length and starts with “0x”, it proceeds (an Ethereum wallet address is 20 bytes, so its hexadecimal representation is 40 characters, plus 2 for the “0x” prefix)
  • It checks whether the string differs from the attacker’s wallet address
  • If it does, it replaces the clipboard data with the attacker’s wallet address using SetClipboardData
  • Finally, the other contents found in the clipboard are discarded
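The decision logic of that loop can be summarized with a short Java sketch. It is a paraphrase of the steps listed above, not the malware’s code: the class and method names are invented, the addresses are placeholders, and the exact-match comparison is an assumption.

```java
// Illustrative paraphrase of the clipper's decision logic; all names are hypothetical.
final class ClipboardFilterSketch {

    // 20-byte address => 40 hex characters, plus the "0x" prefix => 42 characters.
    static boolean looksLikeEthAddress(String s) {
        return s != null && s.length() == 42 && s.startsWith("0x");
    }

    // Returns what the clipboard would hold after one pass of the loop.
    static String filter(String clipboardText, String attackerAddress) {
        if (looksLikeEthAddress(clipboardText) && !clipboardText.equals(attackerAddress)) {
            return attackerAddress;
        }
        return clipboardText;
    }

    // Builds a placeholder address ("0x" followed by 40 copies of one digit).
    static String placeholderAddress(char digit) {
        StringBuilder sb = new StringBuilder("0x");
        for (int i = 0; i < 40; i++) {
            sb.append(digit);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String attacker = placeholderAddress('a');
        System.out.println(filter(placeholderAddress('1'), attacker));
        System.out.println(filter("not an address", attacker));
    }
}
```

Note that, per the description above, the test is only on length and prefix; the sketch does not validate that the 40 characters are hexadecimal digits either.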

Well-known literals

In JEB, you may replace immediates by well-known literals found in type libraries (aka typelibs, such as the win32 typelibs, which were automatically loaded when the analysis of the PE file started). To do that, select the immediate, then hit CTRL+N (menu: Action, Replace…), and select the desired literal. 2

For example, per the MSDN, GetClipboardData uses CF_xxx constants to indicate the type of data. We can ask JEB to replace GetClipboardData(13) by GetClipboardData(CF_UNICODETEXT) using the Action/Replace handler:

Replacing 13 by CF_UNICODETEXT in a call to GetClipboardData

Conclusion

That concludes the first blog in this “How to use JEB” series. In the next episodes, we will look at other features, dig deeper into writing IR plugins, look into types and types creation, and reverse other architectures, including exotic code.

To learn more, we encourage you to:

  • Explore this blog, as it contains many technical entries and how-to’s.
  • Look at the sample code (scripts and plugins) shipping with JEB; it will get you started on using the API to write your own extensions.
  • Join our Slack channel to engage with other users in the community and ask questions if you’re stuck on anything.

Thank you very much & Stay tuned 🙂 Happy Holiday to All 🎄

  1. The plugin written to analyze this malware may ship in some upcoming version of JEB.
  2. In many cases, JEB will do that automatically, and it should be the case here.

JEB Assistant

Update: With JEB 5.6, several restrictions are lifted to make the Assistant available for Java decompiled output generated by dexdec (it is currently limited to C output generated by gendec).

Starting from JEB 5.2, you may use the experimental “JEB Assistant” to infer names for decompiled methods and method parameters.

Below is a decompiled aarch64 routine found in the BPFDoor malware. A raw decompilation does not produce any useful name (the default routine name is sub_40157C).

An unnamed arm64 decompiled routine

You may click the “Call the Assistant” button (also available via the Action menu, Request Assistant handler, or the back-tick keyboard shortcut) to query the assistant via JEB.IO. At the time of writing, a JEB.IO account is not required to access the assistant.

Upon first request, a disclaimer will be shown, letting you know that the decompiled code must be sent to our server:

The disclaimer is shown the first time the assistant is called

The assistant may return a better name for the method and its parameters. Sometimes, the names may be incorrect, yet provide some insight into what the method is doing. Other times, they may be entirely out of scope! It is always better to take the provided results as hints, rather than absolute truths.

In the case of our mysterious method, the assistant did provide valuable information: decryptData(data, size, key). Indeed, the method is a decryption function — more specifically, rc4 with a pre-computed sbox. The parameter names are (almost) correct.
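For readers unfamiliar with that variant, here is a minimal Java sketch of textbook RC4 where the stream phase starts from an already-computed s-box. It is not the BPFDoor routine: the class and method names are invented, and the key-scheduling helper is only there to produce an s-box for the demonstration.

```java
import java.nio.charset.StandardCharsets;

// Textbook RC4, split so that crypt() starts from a pre-computed s-box.
final class Rc4Sketch {

    // Standard RC4 key scheduling; used here only to obtain an s-box for the demo.
    static int[] keySchedule(byte[] key) {
        int[] s = new int[256];
        for (int i = 0; i < 256; i++) {
            s[i] = i;
        }
        int j = 0;
        for (int i = 0; i < 256; i++) {
            j = (j + s[i] + (key[i % key.length] & 0xFF)) & 0xFF;
            int t = s[i]; s[i] = s[j]; s[j] = t;
        }
        return s;
    }

    // Keystream generation and XOR. The s-box is copied so the caller's copy is untouched.
    static byte[] crypt(byte[] data, int[] sbox) {
        int[] s = sbox.clone();
        byte[] out = new byte[data.length];
        int i = 0;
        int j = 0;
        for (int n = 0; n < data.length; n++) {
            i = (i + 1) & 0xFF;
            j = (j + s[i]) & 0xFF;
            int t = s[i]; s[i] = s[j]; s[j] = t;
            out[n] = (byte) (data[n] ^ s[(s[i] + s[j]) & 0xFF]);
        }
        return out;
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }
}
```

Since RC4 is a XOR stream cipher, calling crypt twice with the same s-box returns the original data, which is why a single routine serves for both encryption and decryption.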

You may decide to apply the suggested method name directly. The suggested parameter names are not applied automatically.

The assistant is providing the suggestions, it is up to the user to apply them

This feature is experimental. Currently, several limitations apply:

  • The assistant is limited to decompiled native routines. It will not work for dex/dalvik decompilations. The assistant works with routines as well as decompiled classes.
  • The assistant will refuse to work on overly long routines (whose decompilation exceeds several thousand characters).
  • The assistant is not available via the JEB API and requests are rate-limited (at most one every 5 seconds).

On the plus side, a JEB.IO account is not required at this time to use the assistant! Anybody can use it to (sometimes) gain insight into obscure decompilations. We hope it will help you in your reverse-engineering efforts. Please let us know your feedback through the usual channels (email, Slack, etc.).

Until next time 🙂 — Nicolas.

Control-flow unflattening in the wild

Both JEB decompiler engines 1 ship with code optimizers capable of rebuilding methods whose control-flow was transformed by flattening obfuscators.

Image © Tigress (University of Arizona)

Control-flow flattening, sometimes referred to as chenxification2, is an obfuscation technique employed to destructure a routine’s control flow. While a compiled routine is typically composed of a number of basic blocks having low ingress and egress counts, a flattened routine may exhibit an outlier node having high input and high output edge counts, and generally, a very high centrality in the graph (in terms of vertex betweenness). Practically speaking, the original method M is reduced to a many-way conditional block H evaluating an expression VPC, dispatching the flow of execution to units of code, each one performing a part of M, updating VPC, and looping back to H. In effect, the original structured code is reduced to a large switch-like block, whose execution is guided by a synthetic variable VPC. Therefore, the original flow of control, critical to infer meaning while performing manual reverse-engineering, is lost. 3
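As a toy illustration of the H/VPC model described above, the following Java sketch shows a small routine next to a hand-flattened equivalent. It was written for this explanation only; it is not the output of any particular obfuscator, and the names are arbitrary.

```java
// Hand-made illustration of control-flow flattening; not produced by a real obfuscator.
final class FlatteningDemo {

    // Original, structured routine M: sum of the integers 1..n.
    static int original(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // The same routine after flattening: a dispatcher H switches on a synthetic
    // variable vpc; each case performs a part of M, updates vpc, and loops back.
    static int flattened(int n) {
        int sum = 0;
        int i = 0;
        int vpc = 0;
        while (true) {
            switch (vpc) {
            case 0:  // loop initialization
                sum = 0;
                i = 1;
                vpc = 1;
                break;
            case 1:  // loop condition
                vpc = (i <= n) ? 2 : 3;
                break;
            case 2:  // loop body and increment
                sum += i;
                i++;
                vpc = 1;
                break;
            case 3:  // exit
                return sum;
            default:
                throw new IllegalStateException("unexpected vpc: " + vpc);
            }
        }
    }
}
```

Even on a routine this small, the loop is no longer visible as a loop; an unflattener has to track the values taken by vpc to rebuild the original edges between the case blocks.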

We upgraded dexdec‘s control flow unflattener earlier this year. 4 The v2 of the unflattener is more generic than our original implementation. It is able to cover cases in which the obfuscated code does not map to the clean model presented above, e.g. cases where the dispatcher stands out.

This week, we encountered an instance of code that was auto-deobfuscated to clean code and thought it’d be a good example to show how useful generic deobfuscation of such code can be. It seems that the obfuscator that was used to protect the original code was BlackObfuscator, a project used by clean apps and malware alike.

Hash: 92ae23580c83642ad0e50f19979b9d2122f28d8b3a9d4b17539ce125ae8d93eb

Before deobfuscation.

After deobfuscation, the code looks like:

After deobfuscation.

If you encounter examples where the unflattener does not perform adequately, please let us know. We’ll see if they can be fixed or upgraded to cover obfuscation corner-cases.

Thank you & until next time — Nicolas.

  1. dexdec is JEB’s dex/dalvik decompiler, gendec is JEB’s generic decompiler used for native code and any code other than dex/dalvik
  2. A term coined by University of Arizona’s Pr. Christian Collberg for the fact that an early description of this technique was presented by Dr. Chenxi Wang in her PhD thesis
  3. Control-flow flattening can be seen as a particular case of code virtualization, which was covered in previous blog entries.
  4. JEB 4.25 released on Jan 17 2023

Recovering JNI registered natives, recovering protected string constants

This is part 2 of the blog that introduced the major addition that shipped with JEB Pro 4.29: the ability for the dex decompiler to call into the native analysis pipeline, the generic decompiler and native code emulator.

Today, we demo how to use two plugins shipping with JEB 4.30, making use of the emulators to recover information protected by a native code library found in several APKs, libpairipcore.so.

Recovering statically registered native routines

The first plugin can be used to discover native routines registered via JNI’s RegisterNatives. As a reminder, when calling a native method from Java, the JNI will see if exported routines with specific names derived from the Java method signature exist in the process. Alternatively, bindings between a Java native method and its actual body can be done with RegisterNatives. Typically, this is achieved in JNI_OnLoad, the primary entry-point. However, it does not need to be; other techniques exist to further obfuscate the target call site of a Java native method, such as unregistration/re-registration, the obfuscation of JNI_OnLoad, etc. More information can be found here.
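To illustrate the name-based resolution mentioned above, here is a small Java sketch that derives the short JNI export name from a class and method name, following the escaping rules of the JNI specification. The JniNames class is a helper invented for this example, and the overloaded-method form (which appends the argument signature) is left out.

```java
// Derives the short JNI export name for a native method. Helper written for
// illustration; long names used for overloaded methods are not handled.
final class JniNames {

    // Escapes one name component according to the JNI name-mangling rules.
    static String mangle(String s) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < s.length(); k++) {
            char c = s.charAt(k);
            if (c == '.' || c == '/') {
                sb.append('_');
            } else if (c == '_') {
                sb.append("_1");
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
                sb.append(c);
            } else {
                // other characters are emitted as _0xxxx, lowercase hex
                sb.append(String.format("_0%04x", (int) c));
            }
        }
        return sb.toString();
    }

    static String shortName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        System.out.println(shortName("com.pairip.VMRunner", "executeVM"));
    }
}
```

A library that binds its methods through RegisterNatives does not need to export any symbol with such a name, which is why the export table alone may not reveal where a native method is implemented.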

In its current state, the plugin will attempt to emulate a SO library’s JNI_OnLoad on its own, without the context of the app process it would normally run on. The advantage is that the plugin is usable on libraries recovered without their container app (APK or otherwise). The drawback is that it may fail in complex cases, since the full app context is not available to this plugin. (Note that the second plugin does not suffer from this limitation.)

Open an APK or Elf SO file(s), run the “Recover statically-registered natives (Android)” plugin.
Set optional name filters or architecture filters as needed.
The results will be visible in the log. In this case, it looks like the aarch64 library libpairipcore.so registered one method for com.pairip.VMRunner.executeVM, and mapped it to a routine at 0x5F180.

Recovering constants removed from the Dex

The second plugin makes use of an IEmulatedAndroid object to simulate an execution environment and execute code that may be restoring static string constants removed from the Dex by code protection systems.

We can imagine that the code protection pass works as such:

String constants are being removed during a protection pass.
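In code form, the idea could look like the Java sketch below. This is a guess at the general shape, made for illustration: the class names, the field, and the body of restore() are all hypothetical, and a real protector would have restore() rebuild the values through far less readable means.

```java
// Hypothetical before/after view of a class whose string constant was removed.
final class ProtectionPassSketch {

    // Before protection: the constant is a plain literal stored in the Dex.
    static final class Before {
        static final String GREETING = "hello";
    }

    // After protection: the literal is gone; a static initializer calls restore(),
    // which is responsible for putting the value back at run time.
    static final class After {
        static String GREETING;

        static {
            restore();
        }

        // Stand-in body: a real implementation would not be this transparent.
        private static void restore() {
            GREETING = new String(new char[] {'h', 'e', 'l', 'l', 'o'});
        }
    }

    public static void main(String[] args) {
        System.out.println(After.GREETING);
    }
}
```

Under that assumption, statically reading the protected class tells you nothing about the field’s value; the restoring code has to be executed, or emulated, to recover it.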

The implementation details of restore() are not relevant to this blog entry. In the case of that particular app, it involves calling into a highly obfuscated native library called libpairipcore.so.

The plugin requires a full APK. It will emulate a static method selected by the user and let them know about the constants that were restored.

The plugin workflow is as follows:

After loading an APK, the plugin may let the user know that the code was protected.
Execute the “Recover removed Dex constants” plugin.
The user will be asked to input the no-arg static method that should be simulated. If a suitable one is found, it may be pre-populated by the plugin.
The execution can be lengthy, from several seconds to several minutes. Recovered strings are registered as field comments as well as decompiler events in the relevant dexdec unit of your project.

Conclusion

That’s it for today. Make sure to update to JEB Pro 4.30 if you want to use those plugins.

I would encourage power-users to explore JEB’s API, in particular IDState, EState/EEmulator and IEmulatedAndroid, if they want to experiment or work on code that requires specific hooks (dex hooks, jvm sandbox hooks, native emu hooks, native memory hooks – refer to the registerXxxHooks methods in IDState) for the emulators to operate properly.

Until next time — Nicolas.

Android JNI and Native Code Emulation

JEB 4.29 finally bridges the gap between the dex analysis modules in charge of code emulation (dexdec‘s IDState and co.) and their counterparts in the native code analysis pipeline (gendec‘s EEmulator, EState and co.).

The emulation of JNI routines from dexdec unlocks use-cases that are now becoming commonplace, such as:

  • Object consumption relying on native code calls to make reverse-engineering harder. The typical case is the retrieval of encrypted strings where part of the decryption code is bytecode, part is native code.
  • General app tweaking done on the native side, such as field setting, field reading, method invocation, object creation, etc.

Example

Here is an example of what could not be done by JEB <4.29:

//
// dex code:
//

package a.b;

class X {
  ...
  native String decrypt(char[] array, int key1, int key2);
  ...
  String f() {
    return decrypt(new char[]{'K', 'F', 'C'}, 4, 3);
  }
  ...
}

//
// native code:
//

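// simplified code for method `dec` mapping to `a.b.X.decrypt`
jstring dec(JNIEnv* env, jobject thiz, jcharArray array, jint a, jint b) {
  jsize len = (*env)->GetArrayLength(env, array);
  // a jcharArray is an opaque handle: fetch its elements before reading them
  jchar* in = (*env)->GetCharArrayElements(env, array, NULL);
  if(in == NULL) {
    return NULL;
  }
  jchar out[len];
  for(jsize i = 0; i < len; i++) {
    out[i] = (jchar)(in[i] - (a - b));
  }
  // JNI_ABORT: release without copying back, since the input was not modified
  (*env)->ReleaseCharArrayElements(env, array, in, JNI_ABORT);
  return (*env)->NewString(env, out, len);
}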

JEB used to decompile X.f() to:

String f() {
  return decrypt(new char[]{'K', 'F', 'C'}, 4, 3);
}

JEB 4.29, if the native emulator is enabled, is able to return a simpler version:

String f() {
  return "JEB";
}
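To see why “JEB” is the expected result, the native routine’s arithmetic can be replayed in plain Java. The class below is a standalone re-implementation written for this explanation; it is not what JEB executes, since JEB emulates the native code itself.

```java
// Plain-Java replay of the native routine's arithmetic, for illustration only.
final class DecryptSimulation {

    // Mirrors the native loop: out[i] = array[i] - (key1 - key2).
    static String decrypt(char[] array, int key1, int key2) {
        char[] out = new char[array.length];
        for (int i = 0; i < array.length; i++) {
            out[i] = (char) (array[i] - (key1 - key2));
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // With key1 = 4 and key2 = 3, each character is shifted down by one.
        System.out.println(decrypt(new char[] {'K', 'F', 'C'}, 4, 3));
    }
}
```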

Preparation

Currently, the native emulator is disabled by default. In order to let dexdec use it, edit your dexdec-emu.cfg file (located in your coreplugins/ folder, or in the GUI, Android menu, handler Emulator Settings…):

  • Mandatory: set enable_native_code_emulator to true
  • Recommended: increase the values of emu_max_duration and emu_max_itercount (the reason being that the analysis of native images by the native code plugins can be quite time-consuming).

You will also need a JEB Pro license to use this feature.

Output

As usual, the auto-decryption of an item will also emit an event, which can be collected programmatically and is visible in the Decompiler’s “Events” fragment in the GUI.

Items whose address is formatted as @LIB:<lib.so>@NativeAddress are decrypted native items that were found in the SO image at some point.

Decrypted strings collected by the decompiler

Similarly, decrypted items found in decompiled code are rendered using a purple’ish pink (by default) in the GUI.

If native code was involved in the decryption, the on-hover pop-up will let you know:

Decryption of that string required emulation of native code

API

The native emulator(s) managed by a dexdec‘s IDState can be customized with the following newly-added methods and types:

  • enableNativeCodeEmulator / isNativeCodeEmulatorEnabled : enable or disable the native emulator (the master setting is pulled from your config file, dexdec-emu.cfg)
  • registerNativeEmulatorHooks / unregisterNativeEmulatorHooks : hooks into the evaluation (emulation) of the native code – refer to the appropriate hooks interfaces. The hooks receive a reference to the controlling EEmulator.
  • registerNativeEmulatorMemoryHooks / unregisterNativeEmulatorMemoryHooks : hooks into the memory accesses of the emulator’s state – refer to the appropriate hooks interfaces. The hooks receive a reference to the target EState object.

Conclusion

Interfacing both emulators offers many possibilities to improve the reverse-engineering experience of complex binaries and applications.

There is more that can be done, which will be discussed in future blog posts:

  • Retrieval of statically registered natives (through JNIEnv’s RegisterNatives) as opposed to native routines automatically resolved using the JNI naming conventions.
  • Automatic unpacking of native code.
  • Use of the native emulator in custom scripts and plugins.

Note that this feature is currently limited to JEB Pro.

The JNI native code emulator will work with x86, x64, and arm64 code (we may add support for arm in the near future). Needless to say, it is still in experimental mode! Therefore, you may encounter strange results or problems while analyzing code making use of it. Please send us error reports to support@pnfsoftware.com.

Until next time, and once again, thank you to our amazing users for their continued support and kind words 🙂 — Nicolas.

IR and AST Optimizers in Decompilers

The following is a small guide that will help users writing decompiler plugins decide whether they need to work at the IR (Intermediate Representation) level or at the AST (Abstract Syntax Tree) level. The recommendations apply to both JEB decompiler engines, dexdec (for Android Dex/Dalvik) and gendec (generic decompiler engine).

Decompilation Pipeline

A method undergoing decompilation goes through the following simplified pipeline:

  1. The low-level native code (machine code or bytecode) is converted to low-level IR
  2. Some augmentations take place, including SSA transformation and typing
  3. IR processors lift and clean the low-level IR
  4. The final high-level IR is converted to an AST
  5. AST processors clean and beautify the code
  6. The final AST is rendered as pseudo-code

The steps 3 (IR processing) and 5 (AST processing) are customizable by the user through JEB’s API. Indeed, custom plugins are sometimes necessary to perform work not done by JEB’s built-in optimizers.

IR vs AST

The following comparison between IR and AST will help you decide which plugin is better suited to perform some type of work.

  • The number of IR elements to deal with is substantially smaller than the AST counterpart. As such, it may be easier to learn at first. The AST being more abstract and closer to final pseudo code, there are necessarily more types of elements (e.g. a Break element, representing a break; statement, does not exist at the IR level). However, modifying IR statements requires more care than modifying the AST tree.
  • The IR of a method is a flat sequence of instructions, organized into basic blocks. The flow of execution between the blocks is clear and concise. On the other hand, the AST being a tree, its navigation is not as straight-forward as a flat IR listing. While the concept of blocks exists, they are not necessarily basic blocks, and the flow of execution in the AST is not trivial to determine.
  • A consequence of the above is that data analysis is easier done at the IR level than at the AST level. The IR framework provides Data Flow Analysis objects with easy-to-use ways to determine where and by what variables are being accessed. This is a fundamental prerequisite for many non-trivial optimizers whose goal is code cleaning or restructuring (e.g. constant and variable propagation, dead code elimination, etc.).
  • Continuing the above, the IR framework generally offers more facilities and helpers to perform advanced optimization, such as deobfuscation. Examples: dexdec offers an emulator and sandbox engine at the IR level, something unavailable at the AST level; gendec offers a pattern-matching facility that makes the development of complex IR rewriting rules easy.
  • The AST is closer to the final generated pseudo-code. As such, it is a place of choice to perform final beautification or clean-up passes. High-level clean-up, requiring the insertion of AST elements with no IR equivalents, can only be done at the AST level.

Generally, working at the AST level will seem more approachable and an easier entry point to writing decompiler plugins. However, in most cases, IR processors will be better suited to perform non-trivial optimizations and deobfuscation.

Development

For dexdec, IR and AST plugins can be developed as compiled jars or as plugin scripts (Java or Python). Plugin scripts are extremely convenient for quick prototyping. See example code in your JEB coreplugins/scripts/ folder.

For gendec, IR and AST plugins can be developed as compiled jars only. Support for plugin scripts will come soon.

Resources

This blog contains several tutorials on how to get started with writing IR and AST plugins for both dexdec and gendec.

You will also find examples in this GitHub repository.

API Reference: dexdec IR, dexdec AST, gendec IR, gendec AST

Reversing dProtect

In this post, we’re having a look at the first release of dProtect (v 1.0) by Romain Thomas. dProtect is a fork of ProGuard that provides four additional self-explanatory configuration flags:

  • -obfuscate-strings
  • -obfuscate-constants
  • -obfuscate-arithmetic
  • -obfuscate-control-flow (via flattening & opaque predicates — unfortunately, I was unable to get this flag to work, so it’s something we’ll have to revisit in the future.)

Let’s see how JEB’s dexdec’s built-in optimizers as well as custom IR plugins can be used to defeat some implementations of strings obfuscation, constants obfuscation, and arithmetic operations obfuscation.

Strings Obfuscation

The test method is as follows:

// targeted by: -obfuscate-strings
public String provideString() {
    return "hello dProtect";
}

Let’s disable dexdec’s built-in deobfuscators (CTRL+TAB to decompile, untick “Enable deobfuscators”) to get a chance to look at the obfuscated code. It decompiles to:

public static String a(String arg14) {
    StringBuilder v0 = new StringBuilder();
    int v1 = ((int)DPTest1.b[4]) ^ 1684628051;
label_8:
    while(v1 < arg14.length()) {
        int v2 = arg14.charAt(v1);
        while(true) {
            int v9 = v2 ^ -1;
            v0.append(((char)((((int)DPTest1.b[10]) ^ 0x2AE022E9) + v2 + (((int)DPTest1.b[3]) ^ 0x35A299BD) + (((int)DPTest1.b[10]) ^ 0x2AE022E9 ^ -1 | v9) - ((((int)DPTest1.b[10]) ^ 0x2AE022E9) + v2 - ((((int)DPTest1.b[10]) ^ 0x2AE022E9) + v2 + (((int)DPTest1.b[3]) ^ 0x35A299BD) + (((int)DPTest1.b[10]) ^ 0x2AE022E9 ^ -1 | v9))))));
            long[] v3 = DPTest1.b;
            int v6 = v1 ^ -1;
            v1 = v1 + (((int)v3[3]) ^ 0x35A299BD) + (((int)v3[3]) ^ 0x35A299BD) + (((int)v3[3]) ^ 0x35A299BD ^ -1 | v6) + ((((int)v3[3]) ^ 0x35A299BD) + v1 - ((((int)v3[3]) ^ 0x35A299BD) + v1 + (((int)v3[3]) ^ 0x35A299BD) + (((int)v3[3]) ^ 0x35A299BD ^ -1 | v6)));
            if((DPTest1.a + (((int)v3[3]) ^ 0x35A299BD)) % (((int)v3[7]) ^ 0x2B0F969A) != 0) {
                continue label_8;
            }
        }
    }

    return v0.toString();
}

public String provideString() {
    return DPTest1.a("歬歡歨歨歫欤歠歔歶歫歰歡歧歰");
}

A decryptor method a(String):String was generated by dProtect. It performs various computations to decrypt the input string.
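
The decompiled decryptor can also be simplified by hand. Folding its two table lookups with the values that the class initializer of DPTest1 assigns (b[10] = 0x2AE049ED and b[3] = 0x35A299BC, both visible in the static block shown later in this post) gives a key of 0x6B04 and the constant 1. The long expression passed to append() then appears to reduce to a plain XOR of each character with that key. The standalone Java sketch below illustrates this reading. The class and method names are hypothetical; it is a manual re-implementation for checking purposes, not the output or the internals of dexdec.

```java
// Sketch only: a hand-simplified reading of the decompiled decryptor a(String).
// Class and method names are hypothetical; the constants come from DPTest1's static block.
class DProtectStringSketch {
    // ((int)DPTest1.b[10]) ^ 0x2AE022E9, with b[10] = 0x2AE049ED
    static final int KEY = 0x2AE049ED ^ 0x2AE022E9;   // 0x6B04
    // ((int)DPTest1.b[3]) ^ 0x35A299BD, with b[3] = 0x35A299BC
    static final int ONE = 0x35A299BC ^ 0x35A299BD;   // 1

    // Same shape as the mixed boolean-arithmetic expression given to append().
    static int obfuscated(int k, int c) {
        int or = k + c + ONE + (~k | ~c);   // k + c - (k & c), i.e. k | c
        return or - (k + c - or);           // 2 * (k | c) - (k + c), i.e. k ^ c
    }

    // Equivalent decryptor once the expression is reduced to a XOR.
    static String decrypt(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            out.append((char) (s.charAt(i) ^ KEY));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The string literal passed to a() in provideString(), written with Unicode escapes.
        String enc = "\u6B6C\u6B61\u6B68\u6B68\u6B6B\u6B24\u6B60"
                   + "\u6B54\u6B76\u6B6B\u6B70\u6B61\u6B67\u6B70";
        System.out.println(decrypt(enc));   // hello dProtect
    }
}
```

The two identities used in the comments hold for wrapping 32-bit arithmetic, so the reduction does not depend on the particular key or character values.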

One built-in optimizer that ships with JEB’s dexdec uses the IDState object to perform emulation (explained in a previous blog). It cleans up such code automatically:

provideString() is auto-deobfuscated by JEB’s dexdec

Arithmetic Operations Obfuscation

The test method is as follows:

// targeted by: -obfuscate-arithmetic
public int calculate(int x) {
    return 100 + x;
}

With standard JEB settings (re-tick “Enable deobfuscators” if you had disabled it), the obfuscated code decompiles to:

static {
    long[] v0 = new long[12];
    DPTest1.b = v0;
    v0[0] = 0x371C2961L;
    v0[1] = 0x13DD5724L;
    v0[2] = 0x17EB3014L;
    v0[3] = 0x35A299BCL;
    v0[4] = 1684628051L;
    v0[5] = 1720310111L;
    v0[6] = 0x576F77CBL;
    v0[7] = 0x2B0F9698L;
    v0[8] = 360862103L;
    v0[9] = 0x5A9D6037L;
    v0[10] = 0x2AE049EDL;
    v0[11] = 2060383159L;
    DPTest1.a = ((int)v0[11]) ^ 1305664179;
}

public int calculate(int arg4) {
    return arg4 + (((int)DPTest1.b[0]) ^ 0x371C2905);
}

As can be seen, the constant 100 has been replaced by an arithmetic operation, here, a XOR operating on an immediate and a static array element set up in the class initializer.
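
The folding is easy to check by hand. The minimal sketch below recomputes the XOR from the two values visible above. The class and helper names are hypothetical, and the sketch assumes that the array is never written to after class initialization.

```java
// Sketch: fold "((int)table[index]) ^ imm" by hand. Names are hypothetical.
class XorConstantSketch {
    static int fold(long[] table, int index, int imm) {
        // Mirrors the decompiled expression: narrow the long element to int, then XOR.
        return ((int) table[index]) ^ imm;
    }

    public static void main(String[] args) {
        long[] b = new long[12];
        b[0] = 0x371C2961L;                           // value set in the static initializer
        System.out.println(fold(b, 0, 0x371C2905));   // 100
    }
}
```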

JEB does not ship with overly complex deobfuscators operating on arrays, because it is near-impossible in the general case to assess their finality (i.e. to answer definitively the question “will values be changed during the program execution?”). However, to solve particular cases of obfuscation, writing a custom IR plugin is an acceptable solution. (Have a look at this post to get started on dexdec IR plugins.)

Let’s check DOptUnsafeArrayAccessSubst.java, a sample IR plugin that ships with JEB (folder coreplugins/scripts/) and does exactly what we need: detecting the use of static array elements and replacing them by their actual values. We can enable the plugin by removing the “.DISABLED” extension. Now redecompile (CTRL+TAB). And… well, nothing has changed! It is time to examine the plugin code carefully, maybe even use your favorite IDE to troubleshoot and augment it. Here is what prevented the original plugin from kicking in: it was looking for IR elements of the form IDArrayElt ^ IDImm. However, the IR it got was (<int>IDArrayElt) ^ IDImm. The array element was cast to int, so the operand of the XOR was an IDOperation, not an IDArrayElt.

The DOptUnsafeArrayAccessSubstV2.java plugin takes care of that (refer to isLikeArrayElt method).
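
To make the difference between the two matching strategies concrete, here is a toy model written in plain Java. It does not use the JEB API: every type and method name below is a hypothetical stand-in for the IR shapes discussed above, and it is not meant to mirror the actual code of the plugin.

```java
// Toy expression model, NOT the JEB API. Every name below is a hypothetical
// stand-in for the IR shapes discussed above (IDArrayElt, IDImm, IDOperation).
class ArrayEltMatcherSketch {
    static abstract class Expr {}
    static final class Imm extends Expr {
        final int value;
        Imm(int value) { this.value = value; }
    }
    static final class StaticArrayElt extends Expr {   // e.g. DPTest1.b[index]
        final int index;
        StaticArrayElt(int index) { this.index = index; }
    }
    static final class CastInt extends Expr {          // (int) operand
        final Expr operand;
        CastInt(Expr operand) { this.operand = operand; }
    }
    static final class Xor extends Expr {
        final Expr left, right;
        Xor(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    // Strict matching: only a bare array element is accepted.
    static StaticArrayElt matchStrict(Expr e) {
        return (e instanceof StaticArrayElt) ? (StaticArrayElt) e : null;
    }

    // Looser matching: also look through one int cast wrapped around the element.
    static StaticArrayElt matchThroughCast(Expr e) {
        if (e instanceof CastInt) {
            e = ((CastInt) e).operand;
        }
        return matchStrict(e);
    }

    // Folds "elt ^ imm" or "imm ^ elt" into a constant; returns null if the shape does not match.
    static Imm fold(Expr e, long[] table, boolean lookThroughCasts) {
        if (!(e instanceof Xor)) return null;
        Expr a = ((Xor) e).left, b = ((Xor) e).right;
        if (a instanceof Imm) { Expr t = a; a = b; b = t; }   // XOR is commutative
        if (!(b instanceof Imm)) return null;
        StaticArrayElt elt = lookThroughCasts ? matchThroughCast(a) : matchStrict(a);
        if (elt == null) return null;
        return new Imm(((int) table[elt.index]) ^ ((Imm) b).value);
    }

    public static void main(String[] args) {
        long[] table = new long[12];
        table[0] = 0x371C2961L;
        Expr e = new Xor(new CastInt(new StaticArrayElt(0)), new Imm(0x371C2905));
        System.out.println(fold(e, table, false));        // null: the cast hides the element
        System.out.println(fold(e, table, true).value);   // 100
    }
}
```

With strict matching the cast node hides the array element and nothing is folded; looking through the cast recovers the constant 100 of calculate().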

Now we can redecompile, and things are deobfuscated as expected:

calculate() is deobfuscated by DOptUnsafeArrayAccessSubstV2

Constants Scrambling

Finally, let’s have a look at how constants obfuscation is achieved. The documentation gives examples of cryptographic-like S-boxes being initialized. The test method is as follows:

// targeted by: -obfuscate-constants
public void initArray(int[] a) {
    a[0] = 0x61707865;
    a[1] = 0x3320646e;
    a[2] = 0x79622d32;
    a[3] = 0x6b206574;
}

Out of the box, JEB decompiles the obfuscated code to:

static {
    long[] v0 = new long[12];
    DPTest1.b = v0;
    v0[0] = 0x371C2961L;
    v0[1] = 0x13DD5724L;
    v0[2] = 0x17EB3014L;
    v0[3] = 0x35A299BCL;
    v0[4] = 1684628051L;
    v0[5] = 1720310111L;
    v0[6] = 0x576F77CBL;
    v0[7] = 0x2B0F9698L;
    v0[8] = 360862103L;
    v0[9] = 0x5A9D6037L;
    v0[10] = 0x2AE049EDL;
    v0[11] = 2060383159L;
    DPTest1.a = ((int)v0[11]) ^ 1305664179;
}

public void initArray(int[] arg5) {
    long[] v0 = DPTest1.b;
    arg5[1684628051 ^ ((int)v0[4])] = 133800250 ^ ((int)v0[5]);
    arg5[0x35A299BD ^ ((int)v0[3])] = 0x644F13A5 ^ ((int)v0[6]);
    arg5[0x2B0F969A ^ ((int)v0[7])] = 0x6CE07CA5 ^ ((int)v0[8]);
    arg5[0x13DD5727 ^ ((int)v0[1])] = ((int)v0[9]) ^ 0x31BD0543;
}

Note that synthetic static arrays are used here as well, as was the case for the arithmetic operations obfuscation pass. Therefore, let’s try the DOptUnsafeArrayAccessSubstV2 plugin. As a careful examination of the above code may suggest, the plugin fails to deobfuscate this code on the first go. The reason: if you examine the IR produced while debugging the plugin, you will notice that the static array elements are accessed via a local variable (v0, above). In IR, that variable is an IDVar. Therefore, we need to check whether this variable references a static array. We will do that by using the data flow analysis facility made available to all dexdec plugins (public field dfa of optimizers subclassing AbstractDOptimizer):

...
analyzeChains();  // initialize the `dfa` member field
Long defaddr = dfa.checkSingleDef(insnAddress, varid);  // use-def chains
...
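
The idea behind the single-definition check can be illustrated with a toy model in plain Java. It does not use the JEB API and is far more conservative than real use-def chains: it only accepts a variable that is assigned exactly once in the whole body, before the use. All names, including the local checkSingleDef helper, are hypothetical.

```java
// Toy model, NOT the JEB API. A method body is an array of statements; an entry is
// either an assignment "var = <source>" or null for any other kind of statement.
class SingleDefSketch {
    static final class Assign {
        final int varId;
        final String staticField;   // name of the static field read, or null if the source is something else
        Assign(int varId, String staticField) {
            this.varId = varId;
            this.staticField = staticField;
        }
    }

    // Returns the index of the only assignment to varId if it precedes useIndex, else null.
    static Integer checkSingleDef(Assign[] body, int useIndex, int varId) {
        Integer def = null;
        for (int i = 0; i < body.length; i++) {
            if (body[i] != null && body[i].varId == varId) {
                if (def != null) return null;   // more than one definition: give up
                def = i;
            }
        }
        return (def != null && def < useIndex) ? def : null;
    }

    // Resolves a read of "var[index]" at useIndex to table[index] when var can only hold the static array.
    static Long resolve(Assign[] body, int useIndex, int varId, int index, String field, long[] table) {
        Integer def = checkSingleDef(body, useIndex, varId);
        if (def == null || !field.equals(body[def].staticField)) return null;
        return table[index];
    }

    public static void main(String[] args) {
        long[] b = new long[12];
        b[4] = 1684628051L;
        // v0 = DPTest1.b; followed by a statement that reads v0[4]
        Assign[] body = { new Assign(0, "DPTest1.b"), null };
        Long raw = resolve(body, 1, 0, 4, "DPTest1.b", b);
        System.out.println(1684628051 ^ raw.intValue());   // 0, the first index written by initArray()
    }
}
```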

The improved plugin can be found here: DOptUnsafeArrayAccessSubstV3.java

The obfuscated code is now processed as expected, and dexdec generates the following decompilation:

initArray() is deobfuscated by DOptUnsafeArrayAccessSubstV3

Conclusion and Future Work

dProtect is a great project to provide code obfuscation for the masses. Its compatibility with ProGuard makes integration into new and existing Android projects a breeze. I have little doubt many developers will try it out in the future. Let’s see how upcoming upgrades to the obfuscators fare against the decompiler!

In future posts, we will have a look at dProtect’s control-flow obfuscation (once I’ve got it to work!) and we will see how O-MVLL, the LLVM-based native code obfuscator counterpart, fares against JEB’s gendec (generic decompiler for native code).

Until next time! – Nicolas