Deobfuscation ratings, inlining “fat” functions, and breaking opaque predicates

In this post, we are taking a quick look at a relatively novel protection technique found in the wild. The class we are looking at is com.X (SHA256: a519e4a20586807665d82ea28892e2ede184807868552f23210bf10c05727980).

Have a look at the decompiled code, with standard JEB options. It was auto-deobfuscated and thoroughly cleaned by dexdec, JEB’s Dalvik decompiler:

Decompilation of com.X with standard options (it’s been deobfuscated, and JEB is letting you know about it by providing deobfuscation ratings or scores as method comments)

A note on deobfuscation ratings

Two items to notice:

  • Some method outputs are collapsed: their direct output was deemed useless because their code was inlined in the corresponding callers. You may re-expand them with the Dash (-) action key, or via the Action menu, Collapse/Expand command.
  • Some decompiled methods have an auto-comment specifying a deobfuscation rating and score. This score is calculated from the results of the IR optimizers tagged as DEOBFUSCATOR. If the score reaches a threshold, the rating (MEDIUM, HIGH, or EXTRA; LOW ratings are not shown) is specified in the decompilation output. This hints to the user that the low-level code is protected, and that the high-level decompilation was deobfuscated and cleaned.

The deobfuscation ratings for several methods of com.X are high. It looks like this class received a significant amount of protection. However, after clean-up, the meaningful code consists of two one-liner methods: one storing a timestamp (method gg), the other one calculating an elapsed time (method gf).

Let’s have a look at the decompiled code with deobfuscators disabled: redecompile the code with CTRL+TAB (Action menu, Decompile with Options…), and untick “Enable deobfuscator optimizers”.

dexdec options when redecompiling with Action, Decompile with options…

The re-decompilation result is as follows:

Decompilation of com.X with deobfuscators disabled

There is quite a lot to look at here, mainly the fat routines and the opaque predicates.

Inlining “fat” functions

We see that gf calls new with a set of fixed integers (v, v1) as well as the identityHashCode of itself (v3, essentially a pseudo-random number). Similarly, gg also redirects to new, with a different set of arguments.

The methods gf() and gg() are wrappers calling the method new() with various keys

A quick examination of new shows that two code paths may be executed, based on the values of the provided triplet (v, v1, v2):

Decompilation of the synthetic “fat” function new, holding the real code of gf and gg

So, what happened? The protection of class com.X consisted of three steps. It took the bodies of gf and gg and merged them into a single method, new (hence the name “fat”). It then changed the code of gf and gg to trampoline into new, passing selectors that pick the proper code to execute.

Here is a simpler representation of that process, with a single selector (instead of a triplet):

// UNPROTECTED CLASS C
class C {
  int fld1;

  int f1(int x) {
    return 25 + x;
  }

  int f2() {
    return 31 * fld1;
  }
}

// PROTECTED CLASS C
class protected_C {
  int fld1;

  // f1 and f2 now only trampoline into the fat routine: their arguments
  // (including "this") are packed into an array, along with a selector
  int f1(int x) {
    return (int)fat_routine(new Object[]{this, x}, 1);
  }

  int f2() {
    return (int)fat_routine(new Object[]{this}, 2);
  }

  // holds the merged bodies of the original f1 and f2
  static Object fat_routine(Object[] params, int selector) {
    if(selector == 1) {
      return 25 + (int)params[1];
    }
    else if(selector == 2) {
      return 31 * ((protected_C)params[0]).fld1;
    }
    throw new RuntimeException();  // should not happen
  }
}

Although the above code is trivial, we can use it to highlight two complications the decompiler faces when dealing with the more complex implementations produced by a real code protection system:

  • When should the decompiler decide to inline, i.e., how can fat functions be detected? (that question is outside the scope of this blog post, and would not be of much interest to most readers)
  • What about complex selectors, such as a triplet with a pseudo-random int?
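The first complication can be made concrete with a hand-written Java sketch (not dexdec output). It shows what inlining fat_routine into f1 amounts to when the selector is a plain constant. The call is replaced by the routine's body with selector bound to 1. After that, the selector == 2 branch and the throw are dead code and can be pruned. The class and method names (InliningSketch, f1Inlined, f1Simplified) are hypothetical, chosen for illustration only.

```java
// Hypothetical illustration: manually inlining fat_routine into f1 (selector = 1).
public class InliningSketch {
    int fld1;

    // Step 1: the call site is replaced by fat_routine's body, with
    // params = {this, x} and selector bound to the constant 1.
    int f1Inlined(int x) {
        Object[] params = new Object[]{this, x};
        int selector = 1;
        Object result;
        if (selector == 1) {
            result = 25 + (int) params[1];
        } else if (selector == 2) {
            result = 31 * ((InliningSketch) params[0]).fld1;
        } else {
            throw new RuntimeException();  // unreachable once selector is known
        }
        return (int) result;
    }

    // Step 2: constant propagation makes (selector == 1) always true; removing the
    // dead branches and the now-useless params array gives back the original f1.
    int f1Simplified(int x) {
        return 25 + x;
    }

    public static void main(String[] args) {
        InliningSketch c = new InliningSketch();
        for (int x = -1000; x <= 1000; x++) {
            if (c.f1Inlined(x) != c.f1Simplified(x)) {
                throw new AssertionError("mismatch at x = " + x);
            }
        }
        System.out.println("f1Inlined and f1Simplified agree");
    }
}
```

With a constant selector, this clean-up is ordinary constant propagation. The second question is harder because the selector is no longer a simple constant.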

If JEB’s dexdec were to inline the calls to new as is, we’d end up with the following decompilations, which are not quite what we saw at the beginning of this article!

Decompilation with the deobfuscators re-enabled, but with the opaque predicate breaker disabled

Resolving opaque predicates

Let’s look at method gf. We can see that the pseudo-random selector, after inlining, is used to calculate a predicate that will determine which path to take, i.e. do we execute the actual code for gf, or the code for gg?

The predicate seen in gf can be re-written as:

PRED = 0xDE9B00B0 + (~(0x99525D4B | X) | 1 & X) * 520 + (0x99525D4B & X | ~(X | 0x66ADA2B5)) * -1040 + (0x99525D4A | 0x66ADA2B5 & X | ~(X | 0x66ADA2B5)) * 520 != 1

Internally, JEB does quite a bit to simplify it, and ultimately, when all fast reductions and simplifications are applied, it will use the well-known Z3 SMT solver to break the predicate. In this case, regardless of the value of X, the predicate is true. Therefore, gf will be simplified to:
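The claim that the predicate holds regardless of X can be spot-checked directly. The Java sketch below evaluates the left-hand side of PRED using ordinary int arithmetic, which wraps modulo 2^32. It assumes X is a 32-bit value, as in Dalvik. A bit-by-bit case analysis treats bit 0 separately from the other 31 bits and uses the fact that 0x66ADA2B5 is the bitwise complement of 0x99525D4A. That analysis indicates the left-hand side is in fact 0 for every X, which is why the != 1 test never fails. The sampling below is only a sanity check, not a proof, and it is not how JEB's simplifier or Z3 proceed.

```java
import java.util.Random;

// Spot check (not a proof) of the opaque predicate seen in gf, transcribed as-is.
public class OpaquePredicateCheck {
    // Left-hand side of PRED. Java int arithmetic wraps modulo 2^32, and & binds
    // tighter than |, so "~(0x99525D4B | x) | 1 & x" means "~(...) | (1 & x)".
    static int lhs(int x) {
        return 0xDE9B00B0
                + (~(0x99525D4B | x) | 1 & x) * 520
                + (0x99525D4B & x | ~(x | 0x66ADA2B5)) * -1040
                + (0x99525D4A | 0x66ADA2B5 & x | ~(x | 0x66ADA2B5)) * 520;
    }

    static boolean pred(int x) {
        return lhs(x) != 1;
    }

    public static void main(String[] args) {
        int[] edgeCases = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE, 0x99525D4B, 0x66ADA2B5};
        for (int x : edgeCases) {
            if (!pred(x)) throw new AssertionError("PRED is false for x = " + x);
        }
        Random rnd = new Random(42);  // fixed seed for reproducibility
        for (int i = 0; i < 1_000_000; i++) {
            int x = rnd.nextInt();
            if (!pred(x)) throw new AssertionError("PRED is false for x = " + x);
        }
        System.out.println("PRED held for all sampled values; lhs(0) = " + lhs(0));
    }
}
```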

return X.iz(arr_object);

(Note that method iz is itself a candidate for inlining! In the end, the cleaned-up code shown in the introduction of this article is generated.)

The use of Z3 and other external theorem provers that may be used by JEB and its plugins can be disabled in the options (see “Enable predicate breaker”):

The external predicate solver can be disabled in the options

Conclusion

We hope this quick note will shed some light on some newer features or recent upgrades that went into dexdec. Many of those were already present in gendec, the generic decompiler used for anything non-Dalvik, and it was about time to add those advanced clean-up passes into the Dalvik decompiler as well. In a sense, dexdec has caught up and even gone further than gendec on these aspects.

Which leads me to say there will likely be a Part 2 or at least an update for this blog, to highlight another complex deobfuscating task: the simplification of arithmetic operations consisting of bitwise operations and mixed boolean/arithmetic (MBA) expressions.

Stay tuned! Thank you to all our users and readers of this blog 🙂 Do not hesitate to reach out through the usual channels (Slack, email, X).

– Nicolas

Generic Unpacking for APK

Updated on March 19, 2024: covers the additions of JEB 5.10 (auto-integration of dex and so files) and JEB 5.11 (unpacker report).

This post presents one of JEB’s components used for Android app reverse engineering: the Generic Unpacker for APK.1

The unpacker will attempt to emulate the app’s execution in order to collect dex files and native libraries (so files, arm64 only) that would be dynamically generated at runtime. Many APK protectors, whether legitimate or used for malicious purposes, employ such techniques to make the payload Dalvik bytecode more difficult to access and analyze.

How to use the APK unpacker

First, open the target APK in JEB. In some cases, the unpacker module will let you know that there is a high probability that the APK was packed.

In many cases, that heuristic won’t be triggered and no specific hint will be issued. Either way, you may start the unpacker via the Android menu, Generic Unpacking…

Start the Generic Unpacker via the Android menu

An options dialog will be displayed. The available options are:

  • Maximum duration after which the unpacking process should be aborted (the default is set to 3 minutes, although in most cases, unpacking will stop well before that time-out).
  • Whether or not collected dex files should be used during the unpacking process itself (if so, they would be integrated into the current dex unit, to allow their emulation).
  • Whether or not collected so files should be used during the unpacking process itself.
  • Whether monitoring hooks should be set up to allow the generation of a report after the unpacking process completes (the report contains a trace of useful events that can be used to quickly determine how the unpacking process works).
Options dialog for the unpacker

Press “Start” and let the unpacker attempt to recover hidden dex files and so libraries.

After it’s done, a frame dialog will list the unpacker results, consisting of dexdec MESSAGE notifications indicating which dex files were recovered, and where. The logger will display similar information. If the option was selected, the unpacker will also generate and display a report.

For each recovered dex and native library, a corresponding unit will be created under a sub-folder named “unpacked” (highlighted in green, located under the APK unit).

The unpacker has completed and is displaying its results (one dex file was recovered)

Analyzing the collected files

At this point, you may decide to analyze the recovered dex and so file(s) separately. In this case, simply open the dex/elf unit(s) under “unpacked”, and proceed as normal (another code hierarchy, disassembly view, etc. will be opened).

Dex files integration

You may want to integrate the recovered dex with the already existing bytecode. If you ticked the option “Auto-integrate unpacked dex code to main dex unit”, the integration is automatic (and in many cases, it will allow the unpacker to proceed even further). Otherwise, to do it manually, follow these steps:

  • Right-click on the recovered dex unit, select Extract to… and save the dex to a location of your choice
  • Navigate to the primary dex unit (generally named “Bytecode”) into which you want to integrate the saved dex, and open it with a double-click
  • Go to the Android menu, select Add/Merge additional Dex files… and select the file previously saved
  • The collected dex will be integrated with the existing bytecode unit, and the bytecode hierarchy will reflect that update

Native libs analysis

The recovered arm64 library files may be analyzed separately. If the option “Allow use of unpacked libraries” was ticked, the recovered so files will be used by the unpacker during unpacking. As mentioned above for dex files, in many cases this will allow the unpacker to proceed further than normal.

Unpacking report

If the corresponding option was enabled before unpacking, a report will be generated after unpacking. It contains a detailed event trace of what happened. It also lists the most important unpacking events, which reverse engineers may view as a high-level “signature” of the unpacking code itself. A few examples follow.

Note that the full reports were trimmed; only their first section (“interesting records”) is displayed. The first column indicates the emulation counter at the time the event occurred, prefixed with either ‘j’ (java) or ‘n’ (native). The second item is the record type. Record-specific strings follow, such as the method signature, string-marshalled parameters, program counter, memory addresses, register values, etc.
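If you want to post-process such reports outside of JEB, a small parser can split each record into the fields described above. The Java sketch below is hypothetical and is not part of the JEB API. It assumes only the line layout visible in the samples that follow:

  • a leading dash
  • the j/n prefix and the counter, joined by #
  • the record type, followed by a colon
  • the record-specific text

Judging from sample 4, the Java and native counters appear to be independent, so the prefix is kept as a separate field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a JEB API): parses "interesting records" lines.
public class ReportRecordParser {
    public static final class Rec {
        public final char domain;     // 'j' = Java (dex) emulation, 'n' = native emulation
        public final long counter;    // emulation counter when the event occurred
        public final String type;     // record type, e.g. JAVA_INVOKE, FILE_ACCESS
        public final String details;  // record-specific text (signature, parameters, PC, ...)

        Rec(char domain, long counter, String type, String details) {
            this.domain = domain;
            this.counter = counter;
            this.type = type;
            this.details = details;
        }
    }

    // Assumed layout: "- <j|n>#<counter> <TYPE>: <details>"
    private static final Pattern LINE =
            Pattern.compile("^-\\s+([jn])#(\\d+)\\s+([A-Z_]+):\\s*(.*)$");

    // Returns null for lines that are not records (headers, "..." elisions, blanks).
    public static Rec parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new Rec(m.group(1).charAt(0), Long.parseLong(m.group(2)), m.group(3), m.group(4));
    }

    public static void main(String[] args) {
        String[] lines = {
            "INTERESTING RECORDS BY ORDER OF EXECUTION (JAVA, NATIVE):",
            "- j#1150 JAVA_INVOKE: java.lang.System.loadLibrary [\"shell-super.2019\"]",
            "- n#97957 FILE_ACCESS: PC=0x74446BC008: path=/proc/self/maps flags=0x0",
            "...",
        };
        // Example use: keep only file accesses made by native code.
        List<Rec> fileAccesses = new ArrayList<>();
        for (String l : lines) {
            Rec r = parse(l);
            if (r != null && r.domain == 'n' && r.type.equals("FILE_ACCESS")) {
                fileAccesses.add(r);
            }
        }
        for (Rec r : fileAccesses) {
            System.out.println(r.counter + " " + r.details);
        }
    }
}
```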

Report sample 1

This packer does not employ native code. The malware was provided by one of our users. The records indicate that:

  • the custom app’s attachBaseContext() was called
  • an asset was retrieved
  • from it, a custom jar was written
  • that jar (containing a dex, accessible in “unpacked”) was loaded into the app’s process via DexClassLoader
INTERESTING RECORDS BY ORDER OF EXECUTION (JAVA, NATIVE):
- j#191 JAVA_INVOKE: android.content.ContextWrapper.attachBaseContext ? [?]
- j#3614186 JAVA_INVOKE: android.content.res.AssetManager.openNonAssetFd ? ["tracks/radio.ogg"]
- j#15485592 JAVA_NEW: java.io.FileOutputStream ["/data/user/0/com.sekcbrgl.lodczqgwkhw/app_offline/wyhatiq.jar"]
- j#18119837 JAVA_NEW: dalvik.system.DexClassLoader [":/data/user/0/com.sekcbrgl.lodczqgwkhw/app_offline/wyhatiq.jar", "/data/user/0/com.sekcbrgl.lodczqgwkhw/app_offline", "/data/user/0/com.sekcbrgl.lodczqgwkhw/app_offline", ?]
- j#21005588 JAVA_FIELD_GET: android.app.ContextImpl.mPackageInfo ? [?]
- j#21006978 JAVA_FIELD_GET: android.app.ContextImpl.mPackageInfo ?

Report sample 2

This packer does not employ native code. The malware was provided by one of our users.

INTERESTING RECORDS BY ORDER OF EXECUTION (JAVA, NATIVE):
- j#1 JAVA_INVOKE: android.content.ContextWrapper.attachBaseContext ? [?]
- j#16 JAVA_FIELD_GET: android.content.pm.ApplicationInfo.metaData ? [?]
- j#38 JAVA_FIELD_GET: android.content.pm.ApplicationInfo.sourceDir ? ["/data/app/~~wgQXv0VF9Q1KDYlkLS3B5w==/ad.kokolzxs-TA1X_cMfmXCqI7Zt9GTCQA==/base.apk"]
- j#70 JAVA_INVOKE: java.io.File.delete /data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app []
- j#73 JAVA_NEW: java.util.zip.ZipFile [/data/app/~~wgQXv0VF9Q1KDYlkLS3B5w==/ad.kokolzxs-TA1X_cMfmXCqI7Zt9GTCQA==/base.apk]
- j#128 JAVA_INVOKE: java.io.File.mkdirs /data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/META-INF []
- j#130 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/META-INF/123.SF]
- j#446 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/META-INF/123.RSA]
- j#496 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/AndroidManifest.xml]
- j#595 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/androidsupportmultidexversion.txt]
- j#646 JAVA_INVOKE: java.io.File.mkdirs /data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/assets []
- j#648 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/assets/39285EFA.dex]
- j#951 JAVA_INVOKE: java.io.File.mkdirs /data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/assets/apps/H5BF09C00/www/css []
- j#953 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/assets/apps/H5BF09C00/www/css/mui.css]
...
- j#145678 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/resources.arsc]
- j#146652 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/secret-classes.dex]
- j#173441 JAVA_INVOKE: javax.crypto.Cipher.getInstance ["AES/ECB/PKCS5Padding"]
- j#173445 JAVA_INVOKE: javax.crypto.Cipher.getInstance ["AES/ECB/PKCS5Padding"]
- j#173452 JAVA_NEW: javax.crypto.spec.SecretKeySpec [(97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112), "AES"]
- j#173455 JAVA_INVOKE: javax.crypto.Cipher.init ? [1, ?]
- j#173458 JAVA_INVOKE: javax.crypto.Cipher.init ? [2, ?]
- j#173479 JAVA_NEW: java.io.FileOutputStream [/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/secret-classes.dex]
- j#173504 JAVA_FIELD_GET: dalvik.system.BaseDexClassLoader.pathList ? [?]
- j#173519 JAVA_FIELD_GET: dalvik.system.DexPathList.dexElements ? [(?)]
- j#173559 JAVA_INVOKE: dalvik.system.DexPathList.makePathElements [[/data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0/app/secret-classes.dex], /data/user/0/ad.kokolzxs/app_io.dcloud.application.DCloudApplication_exDir_1.0, []]
- j#173586 JAVA_FIELD_SET: dalvik.system.DexPathList.dexElements ? [(?, ?)]

Report sample 3

This packer does not employ native code. The malware was analyzed by @cryptax here.

INTERESTING RECORDS BY ORDER OF EXECUTION (JAVA, NATIVE):
- j#1 JAVA_INVOKE: android.content.ContextWrapper.attachBaseContext ? [?]
- j#3444 JAVA_FIELD_GET: android.content.pm.ApplicationInfo.sourceDir ? ["/data/app/~~wgQXv0VF9Q1KDYlkLS3B5w==/com.pmmynubv.nommztx-TA1X_cMfmXCqI7Zt9GTCQA==/base.apk"]
- j#3447 JAVA_FIELD_GET: android.content.pm.ApplicationInfo.dataDir ? ["/data/user/0/com.pmmynubv.nommztx"]
- j#3457 JAVA_FIELD_GET: android.os.Build$VERSION.SDK_INT [33]
- j#6276 JAVA_INVOKE: java.lang.System.getProperty ["java.vm.version"]
- j#6389 JAVA_INVOKE: java.io.File.mkdir /data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g []
- j#6396 JAVA_INVOKE: java.io.File.mkdir /data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo []
- j#9473 JAVA_NEW: java.util.zip.ZipFile [/data/app/~~wgQXv0VF9Q1KDYlkLS3B5w==/com.pmmynubv.nommztx-TA1X_cMfmXCqI7Zt9GTCQA==/base.apk]
- j#10254 JAVA_NEW: java.io.FileOutputStream [/data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo/tmp-base.apk.hFGg8tq17304470999884300019.weg]
- j#10259969 JAVA_INVOKE: java.io.File.renameTo /data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo/tmp-base.apk.hFGg8tq17304470999884300019.weg [/data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo/base.apk.hFGg8tq1.weg]
- j#10259974 JAVA_INVOKE: java.io.File.delete /data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo/tmp-base.apk.hFGg8tq17304470999884300019.weg []
- j#10262055 JAVA_FIELD_GET: dalvik.system.BaseDexClassLoader.pathList ? [?]
- j#10262352 JAVA_FIELD_GET: android.os.Build$VERSION.SDK_INT [33]
- j#10262737 JAVA_INVOKE: dalvik.system.DexPathList.makePathElements [[/data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo/base.apk.hFGg8tq1.weg], /data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo, []]
- j#10262752 JAVA_FIELD_GET: dalvik.system.DexPathList.dexElements ? [(?)]
- j#10262770 JAVA_FIELD_SET: dalvik.system.DexPathList.dexElements ? [(?, ?)]
- j#10262792 JAVA_INVOKE: java.io.File.delete /data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo/base.apk.hFGg8tq1.weg []
- j#10262802 JAVA_INVOKE: java.io.File.delete /data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo/T9etIiaI.uw87 []
- j#10262808 JAVA_INVOKE: java.io.File.delete /data/user/0/com.pmmynubv.nommztx/tgqaqjyg7g/hf8U6UUIwiqaGgo []
- j#10263604 JAVA_INVOKE: android.app.ActivityThread.currentActivityThread []
- j#10264168 JAVA_FIELD_GET: android.app.ActivityThread.mBoundApplication ? [?]
- j#10264725 JAVA_FIELD_GET: android.app.ActivityThread$AppBindData.info ? [?]
- j#10265796 JAVA_FIELD_GET: android.app.ActivityThread.mInitialApplication ? [?]
- j#10266370 JAVA_FIELD_GET: android.app.ActivityThread.mAllApplications ? [[?]]
- j#10266905 JAVA_FIELD_GET: android.app.LoadedApk.mApplicationInfo ? [?]
- j#10267542 JAVA_FIELD_GET: android.app.ActivityThread$AppBindData.appInfo ? [?]
- j#10267551 JAVA_FIELD_SET: android.content.pm.ApplicationInfo.className ? ["com.pmmynubv.nommztx.App"]
- j#10267554 JAVA_FIELD_SET: android.content.pm.ApplicationInfo.className ? ["com.pmmynubv.nommztx.App"]
- j#10268095 JAVA_INVOKE: android.app.LoadedApk.makeApplication ? [false, null]
- j#10268749 JAVA_FIELD_SET: android.app.ActivityThread.mInitialApplication ? [?]
- j#10269322 JAVA_FIELD_GET: android.app.ActivityThread.mProviderMap ? [?]

Report sample 4

This packer employs a mix of dex and native code. The malware APK was provided by one of our users.

INTERESTING RECORDS BY ORDER OF EXECUTION (JAVA, NATIVE):
- j#2 JAVA_INVOKE: android.content.ContextWrapper.attachBaseContext ? [?]
- j#25 JAVA_FIELD_GET: android.content.pm.ApplicationInfo.sourceDir ? ["/data/app/~~wgQXv0VF9Q1KDYlkLS3B5w==/com.ddbewkjewujiijejk2ijfe.security-TA1X_cMfmXCqI7Zt9GTCQA==/base.apk"]
- j#28 JAVA_NEW: java.util.zip.ZipFile ["/data/app/~~wgQXv0VF9Q1KDYlkLS3B5w==/com.ddbewkjewujiijejk2ijfe.security-TA1X_cMfmXCqI7Zt9GTCQA==/base.apk"]
- j#97 JAVA_INVOKE: java.io.File.mkdir /data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir []
- j#869 JAVA_NEW: java.io.FileOutputStream [/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/0OO00l111l1l]
- j#969 JAVA_NEW: java.io.FileOutputStream [/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/o0oooOO0ooOo.dat]
- j#1044 JAVA_NEW: java.io.FileOutputStream [/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/tosversion]
- j#1150 JAVA_INVOKE: java.lang.System.loadLibrary ["shell-super.2019"]
- n#94557 REGISTERED_NATIVE: PC=0x7B000D80: msig=Lcom/wrapper/proxyapplication/WrapperProxyApplication;->Ooo0ooO0oO()V @0x100005250
- n#94572 REGISTERED_NATIVE: PC=0x7B000D80: msig=Lcom/wrapper/proxyapplication/CustomerClassLoader;->ShowLogs(Ljava/lang/String;I)I @0x10000318C
- j#1151 JAVA_FIELD_GET: android.app.ContextImpl.mPackageInfo ? [?]
- j#1151 JAVA_FIELD_GET: android.app.LoadedApk.mActivityThread ? [?]
- n#96934 FILE_ACCESS: PC=0x744466AE7C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/o0oooOO0ooOo.dat flags=0x0
- n#97957 FILE_ACCESS: PC=0x74446BC008: path=/proc/self/maps flags=0x0
- n#125423 FILE_ACCESS: PC=0x744466AE7C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/0OO00l111l1l flags=0x2
- n#125432 FILE_ACCESS: PC=0x744466F6E8: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/0OO00l111l1l flags=0x0
- n#126040 FILE_ACCESS: PC=0x744466AE7C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/0OO00l111l1l.lock flags=0x42
- n#129378 FILE_ACCESS: PC=0x74446BC008: path=/proc/self/maps flags=0x0
- n#152428 FILE_ACCESS: PC=0x744465FA2C: path= flags=0x0
- n#152476 FILE_ACCESS: PC=0x744466A178: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir flags=0x0
- n#152484 FILE_ACCESS: PC=0x744466A178: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir flags=0x0
- n#154816 FILE_ACCESS: PC=0x744465FA2C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir flags=0x0
- n#156501 FILE_ACCESS: PC=0x744466AE7C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/tosversion flags=0x0
- n#162810 FILE_ACCESS: PC=0x744466F6E8: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/tx_shell flags=0x0
- n#164410 FILE_ACCESS: PC=0x744466AE7C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/.updateIV.dat flags=0x42
- n#165717 FILE_ACCESS: PC=0x744465FA2C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/00O000ll111l_0.dex flags=0x0
- n#1863052 FILE_ACCESS: PC=0x744466AE7C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/00O000ll111l_0.dex flags=0x42
- n#1863114 FILE_ACCESS: PC=0x744466F6E8: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/00O000ll111l_0.dex flags=0x0
- n#1865062 FILE_ACCESS: PC=0x744466F6E8: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/odexdir/ flags=0x0
- n#1867557 FILE_ACCESS: PC=0x744465FA2C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/oat/ flags=0x0
- n#1867590 FILE_ACCESS: PC=0x744465FA2C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/oat/arm64/ flags=0x0
- n#1867629 FILE_ACCESS: PC=0x74446BC008: path=/proc/self/maps flags=0x0
- n#1886913 MEMORY_READ: PC=0x100035640: addr=0x7466E597F8 size=0x4: 58 00 00 00 ("X\u0000\u0000\u0000")
- n#1886915 MEMORY_READ: PC=0x100035648: addr=0x7466E597F8 size=0x4: 58 00 00 00 ("X\u0000\u0000\u0000")
- n#1890133 FILE_ACCESS: PC=0x744465FA2C: path=/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/oat/arm64/00O000ll111l_0.odex flags=0x0
- j#1184 JAVA_FIELD_GET: dalvik.system.BaseDexClassLoader.pathList ? [?]
- j#1305 JAVA_INVOKE: dalvik.system.DexPathList.makePathElements [[/data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/00O000ll111l_0.dex], /data/user/0/com.ddbewkjewujiijejk2ijfe.security/files/prodexdir/odexdir/.oat, []]
- j#1337 JAVA_FIELD_GET: dalvik.system.DexPathList$Element.dexFile ? [?]
- j#1387 JAVA_FIELD_GET: dalvik.system.DexPathList.dexElements ? [(?)]
- j#1402 JAVA_FIELD_GET: android.os.Build$VERSION.SDK_INT [33]
- j#1411 JAVA_FIELD_SET: dalvik.system.DexPathList.dexElements ? [(?, ?)]
- n#1892649 MEMORY_READ: PC=0x100035640: addr=0x7466E597F8 size=0x4: 58 00 00 00 ("X\u0000\u0000\u0000")
- n#1892651 MEMORY_READ: PC=0x100035648: addr=0x7466E597F8 size=0x4: 58 00 00 00 ("X\u0000\u0000\u0000")
- n#4903698 FILE_ACCESS: PC=0x74446BC008: path=/proc/12624/maps flags=0x0
- n#4903744 FILE_ACCESS: PC=0x744466AE7C: path=/proc/12624/maps flags=0x0
- n#4903776 FILE_ACCESS: PC=0x74446B0634: path=/proc/12624/maps flags=0x0
- n#4904889 MEMORY_READ: PC=0x10001A540: addr=0x0 size=0x8

API

An unpacker is represented by the IGenericUnpacker interface.

The unpacker API

To create an APK unpacker, you may use the IApkUnit.createGenericUnpacker() method. (To retrieve an APK unit from a JEB project, use the project’s findUnit method, or any other IUnit search-related method; please refer to the sample scripts for examples.)

Limitations

The unpacker will not be able to handle all cases. Please report any problems or bugs you encounter; we will see what can be done to support as many cases as possible.

In an upcoming update, the IGenericUnpacker API will let users write plugins in the form of dex-emulator and native-emulator hooks. These hooks can do whatever is needed to complete an unpacking task that the built-in code would fail at.

Until next time!

Nicolas

  1. The unpacker was introduced in JEB 5.9; it received significant upgrades in versions 5.10 and 5.11.

How To Use JEB – Auto-decrypt strings in protected binary code

This is the second entry in our series showing how to use JEB and its well-known and lesser-known features to reverse engineer malware more efficiently. Part 1 is here.

Today, we’re having a look at an interesting portion of an x86-64 Windows malware sample that carries encrypted strings. Those strings happen to be decrypted on the fly, the first time they’re required by some calling routine.

SHA256: 056cba26f07ab6eebca61a7921163229a3469da32c81be93c7ee35ddec6260f1. The file is not packed, it was compiled for Intel x86 64-bit processors, using an unknown version of Visual Studio. The file is dropped by another malware and its purpose is reconnaissance and information gathering. Let’s load it in JEB 5.8 and do a standard analysis (default settings).

Initial decompilations

For the sake of showing what mechanism is at play, we’re first looking at sub_1400011F0. Let’s decompile it by pressing the TAB key (menu: Action, Decompile…).

Raw decompilation of sub_1400011F0, before examining its callees.

Then, let’s decompile the callee sub_140001120.

JEB can now thoroughly look at the routine and refine the initial prototype that was applied earlier, when the caller sub_1400011F0 was decompiled. It is now set to: void(LPSTR).

The code itself is a wrapper around CreateProcess; it executes the command line provided as argument.

sub_140001120 executes a command-line with CreateProcess. Note the refined prototype, void(LPSTR).

Press escape to navigate back to the caller, or alternatively, examine the callers by pressing X (menu: Action, Cross-references…) and select sub_1400011F0. You will notice that JEB is now warning us that the decompilation is “stale”.

The initial decompilation of sub_1400011F0 is stale after the decompilation of sub_140001120 yielded a better prototype.

Second decompilation

The reason is that the prototype of sub_140001120 was refined (to void(LPSTR)) when that routine was decompiled. Its caller, sub_1400011F0, can therefore be re-decompiled into a more accurate version.

Let’s redecompile it: press F5 (menu: Window, Refresh). You can see that second decompilation below. What happened to the calls to sub_140001040?

Second decompilation of sub_1400011F0, showing some decrypted strings instead of calls to sub_140001040.

String auto-decryption

Notice the following:

  • A “deobfuscation score” note was added as a method comment (refer to part 1 of the series)
  • The calls to sub_140001040 are gone; they have been replaced by dark-pink strings

JEB also notified us in the console:

Notifications about decrypted strings replaced in decompiled code.

Dark-pink strings represent synthetic strings not present in the binary itself. Here, they are the result of JEB auto-decrypting buffers by emulating the calls to routine sub_140001040, which was identified as a string provider. The decompilation of sub_140001120 helped: the inferred parameter type LPSTR was back-propagated to its caller. In that caller, the argument passed was the return value of sub_140001040.

Auto-decryption can be very handy. In the case of this malware, we can immediately see what will be executed by CreateProcess: shells executing whoami and dir and redirecting outputs to files in the local folder. However, if necessary, this feature can be disabled via the “Decryptor Options” in the decompiler properties:

  • Menu: Options, Back-end properties… to globally disable this in the future, except for your current project
  • Menu: Options, Specific Project properties… for the current project only
  • Or you may simply redecompile the method with CTRL+TAB (menu: Action, Decompile with options…) and disable the string decryptor for that specific code
The string auto-decryptor may be enabled or disabled in the options

The decryptor routine

What is sub_140001040 anyway? Let’s navigate to the routine in the disassembly and decompile it.

A raw decompilation of the decryptor code, sub_140001040

After examination of the code, we can adjust things slightly:

  • The global gvar_140022090 is an array of PCHAR (double-click on the item; rename it with N; change the type to a PCHAR using Y; create an array from that using the * key).
  • The prototype is really PCHAR(int), we can adjust that with Y.
  • The first byte of an entry into encrypted_strings is the number of encrypted bytes remaining in the string; if 0, it is fully decrypted and subsequent calls will not attempt to decrypt bytes again.
  • The variable v3 is the key; let’s rename it with N. Note that the key at index i is the sum of the previous two keys, used at indices i-1 and i-2; the initial tuple is (0, 1). This looks like a Fibonacci sequence.1
The decryptor (sub_140001040) after analysis.
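The lazy, decrypt-once mechanism described in the list above can be illustrated with a hypothetical Java re-implementation. Only two details come from the analysis:

  • the first byte of an entry counts the bytes still encrypted (0 means already decrypted)
  • the key follows a Fibonacci-like sequence starting from (0, 1)

Everything else is an assumption. The names are made up. Each byte is combined with the key by XOR with the key's low 8 bits. The whole remainder of the string is decrypted in one pass. The real routine in the sample may differ on any of these points.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical re-implementation of a lazily decrypted string table (see assumptions above).
public class LazyStringTable {
    // Each entry: [0] = number of bytes still encrypted (0 = already decrypted),
    //             [1..] = string bytes, decrypted in place on first access.
    private final byte[][] entries;

    public LazyStringTable(byte[][] entries) {
        this.entries = entries;
    }

    // Fibonacci-like key stream k(0) = 0, k(1) = 1, k(i) = k(i-1) + k(i-2), wrapping on overflow.
    // ASSUMPTION: each byte is XORed with the low 8 bits of its key (XOR is its own inverse).
    static void applyKeyStream(byte[] buf, int off, int len) {
        int cur = 0, next = 1;
        for (int i = 0; i < len; i++) {
            buf[off + i] ^= (byte) cur;
            int sum = cur + next;
            cur = next;
            next = sum;
        }
    }

    // Decrypts on the first call only; later calls return the cached plaintext.
    public String get(int index) {
        byte[] e = entries[index];
        int remaining = e[0] & 0xFF;
        if (remaining != 0) {
            applyKeyStream(e, 1, remaining);  // simplification: decrypt everything at once
            e[0] = 0;                         // mark the entry as fully decrypted
        }
        return new String(e, 1, e.length - 1, StandardCharsets.US_ASCII);
    }

    // Helper for testing: builds an encrypted entry (at most 255 bytes in this sketch).
    public static byte[] encrypt(String s) {
        byte[] plain = s.getBytes(StandardCharsets.US_ASCII);
        byte[] e = new byte[plain.length + 1];
        e[0] = (byte) plain.length;
        System.arraycopy(plain, 0, e, 1, plain.length);
        applyKeyStream(e, 1, plain.length);
        return e;
    }

    public static void main(String[] args) {
        LazyStringTable table = new LazyStringTable(new byte[][]{encrypt("example string")});
        System.out.println(table.get(0));  // first call: decrypts in place
        System.out.println(table.get(0));  // second call: no decryption needed
    }
}
```

Keeping the counter in the entry itself is what lets the provider skip decryption on later calls. That is also why emulating the provider, as JEB does, recovers the plaintext regardless of how many times the string is requested.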

Comparison with GHIDRA

For comparison’s sake, here are GHIDRA 11 decompilations.

The caller (sub_1400011F0) decompiled by GHIDRA 11.0.
The decryptor (sub_140001040) decompiled by GHIDRA 11.0.
The CreateProcess wrapper (sub_140001120) decompiled by GHIDRA 11.0. Notice that the low-level structure initialization code adds quite a bit of confusion.

Conclusion

JEB decompilers2 do their best to clean up and restore code, and that includes decrypting strings when it is deemed reasonable and safe.

That concludes our second entry in this “How to use JEB” series. In the next episodes, we will look at other features and how to write interesting IR and AST plugins to help us further deobfuscate and beautify decompiled code.

As always, thank you for your support, and happy new year 2024 to All 😊 – Nicolas

  1. Interestingly, the JEB assistant (call it with the BACKTICK key, or menu: Action, Request Assistant…) would like to rename this method to “fibonacci_sequence”! Not quite it, but that’s a relevant hint!
  2. Note the plural: dexdec (the Dex decompiler) has had string auto-decryption via emulation for a while; its users are well accustomed to seeing dark-pink strings in deobfuscated code!

How To Use JEB – Analyze an obfuscated win32 crypto clipper

We’re kicking off a malware analysis series explaining how to use JEB Decompiler to perform reverse engineering tasks ranging from out-of-the-box actions to complex use cases requiring scripts or custom plugins.

In this first entry, we look at a Windows malware sample compiled for x86 32-bit targets. The malware is an Ethereum cryptocurrency stealer. It monitors and intercepts clipboard activity to find wallet addresses and replace them with an address of its own, presumably one controlled by the malware authors to collect stolen ether.

Quick look at the malware

The file is 81Kb in size and was compiled for x86 platforms. Although it does not appear to be packed, most metadata elements of the PE header were stripped. There is no Rich header data or timestamp.

SHA256: 503b2dc50262be583633db7b52dca9bcadc698413270047c209818436196c987

Quick look at the file in Hiew

If you are familiar with JEB, its terminology, and the organization of its UI elements, you may skip the next section and go directly to “Examining the code”.

Opening the file in JEB

Let’s fire up JEB. Any recent build (5.7+) with the x86 analysis modules and decompiler will do, e.g., JEB Community Edition or JEB Pro.

We open the file and keep the default settings
A view of the GUI after the initial analysis (from top-left, clockwise: project explorer, main workspace, and code hierarchy)

Project and units

The top-left view shows the project, along with a single artifact (the input file) and the analysis units created by JEB:

  • The artifact file has a round blue icon
  • The top-level unit is a winpe unit
  • It has one child unit at the moment, named “x86 image”, of type x86.

The bottom-left view shows a list of code routines resulting from the analysis of the file.

Disassembly

By default, the main panel shows the disassembly window.

You may press the SPACE bar to switch to a graph view of the code (menu: Action, Graph…). In the graph view, only a single method is rendered at a time.

CFG (control flow graph) view of a disassembled routine

PE unit

If you wish to look at the PE file in more detail, open the winpe unit by double-clicking the corresponding node in the project hierarchy.

View of a winpe unit’s “Overview” fragment

The winpe unit view provides a variety of information, organized in fragments that can be seen below the unit view: Description, Hex Dump, Overview (the default fragment), Sections, Directory Entries, Symbols, etc.

Note that if the PE had not been stripped, we would probably see a compilation timestamp as well as additional sub-units detailing the Rich Header data. For Windows executables, that data is important to perform fine-grained compiler identification.

The Symbols tab lists all symbols advertised by the PE, including imported and exported routines. For example, if you filter on “clip”, you can see multiple win32 routines relating to clipboard access, such as OpenClipboard or SetClipboardData:

The Symbols fragment of the winpe unit view, with a filter applied (“clip”)

Examining the code

Let’s go back to the disassembly offered by the x86 unit. First, notice that the code hierarchy view does not seem to contain well-known methods (static code), typically standard library routines linked at compile-time.

Let’s see why by looking at which siglibs (signature libraries) were applied during the initial analysis (menu: Native, Signature Libraries…). It looks like none were loaded:

The Signature Libraries dialog

Library code identification

Normally, when JEB performs the initial auto-analysis of the code, compiler identification is used to determine whether well-known signature libraries of static code (siglibs) should be loaded and applied to the binary. In this case, compiler identification failed because all header data had been discarded, so JEB did not load or apply any signatures.

To apply them manually, tick the “MSVC x86” boxes. (Alternatively, you can tell JEB that the file was compiled with MSVC before the analysis starts: when the Options panel is displayed while opening the artifact, force the compiler setting to the desired value.)

Forcing a compiler setting before the initial analysis

After doing either of the above (manually applying the siglibs, or re-analyzing the file with a compiler preset), several methods are identified as MSVC code:

Light-blue areas mean the code was matched against well-known signatures

Entry-point and WinMain

Navigate to the executable entry-point (menu: Native, Go to entry-point…).

In the general case, the entry-point of a Windows PE compiled with MSVC is not the high-level entry-point that contains meaningful code. Although it is relatively easy to find WinMain with a bit of experience, there is also a JEB script to help you: FindMain.py (available in the samples-script folder and on GitHub). Open up the script selector with F2 (menu: File, Scripts, Script selector…).

Run a JEB Python script inside the GUI client

Select the desired script and execute it. The result is displayed in the console:

...
Found high-level entry-point at 0x401175 (branched from 0x401D38)
Renaming entry-point to 'winmain'
...

The code at 0x401175 was auto-renamed to winmain (menu: Action, Rename…).

Initial decompilation

Let’s decompile that method by pressing the TAB key (menu: Action, Decompile…).

Initial decompilation of WinMain

Two items of interest to note at this point:

  • There is a lot of code that appears to be junk or garbage
  • There is a note about some “deobfuscation score”

Junk code

The decompiled WinMain method is about 300 lines of C code. Much of it consists of assignments to program globals. At first glance, it looks like some sort of obfuscation. Let’s look at the corresponding assembly code:

Press TAB to go back from a decompilation to the closest matching machine code disassembly line

The snippets have the following structure:
push GARBAGE / pop dword [gXXX]

Or this, which relies on edi being callee-saved (preserved across calls):
mov edi, gXXX / ... / mov dword [edi+offset], GARBAGE
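To get a feel for how mechanical the first pattern is, here is a hypothetical text-level scanner that flags a push of an immediate directly followed by a pop into a global. It assumes one instruction per line, written as in the snippet above (gXXX-style global labels, immediates such as 0x10 or 1234h). It is not a JEB API; the plugin presented later works on the IR instead, which also covers the edi-based variant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class JunkPatternScanner {
    // "push <immediate>": 0x-prefixed hex, or MASM-style hex/decimal with optional 'h' suffix
    private static final Pattern PUSH_IMM =
            Pattern.compile("push\\s+(0x[0-9A-Fa-f]+|[0-9][0-9A-Fa-f]*h?)");
    // "pop dword [gXXX]": pop into a global label (hypothetical naming, as in the snippet)
    private static final Pattern POP_GLOBAL =
            Pattern.compile("pop\\s+dword\\s+\\[(g\\w+)\\]");

    /** Returns the indices of push instructions that start a push-imm / pop-global pair. */
    static List<Integer> findPushPopPairs(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i + 1 < lines.size(); i++) {
            if (PUSH_IMM.matcher(lines.get(i).trim()).matches()
                    && POP_GLOBAL.matcher(lines.get(i + 1).trim()).matches()) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

Register pushes such as push eax are ignored on purpose: only immediates written straight into globals match the junk pattern.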

Later on, we will see how to remove this clutter to make the analysis more pleasant.

Deobfuscation score

A note, “deobfuscation score: 6”, was inserted as a method comment. That score indicates that some “advanced” clean-up was performed. In this case, a careful examination, as well as a comparison against a decompilation with UNSAFE optimizers turned off (redecompile the method with CTRL+TAB, menu: Action, Decompile with Options…), will point to this area of code:

The opaque predicate calculation is highlighted in green using CTRL+M (menu: Action, Toggle Highlight…)

This predicate looks like the following: if(X*(X+1) % 2 == 0) goto LABEL.

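With X being an integer, X*(X+1) is the product of two consecutive integers, one of which is even, so it is always even. This still holds with 32-bit wraparound, since reducing modulo 2^32 preserves parity. Therefore, the predicate always evaluates to true, and JEB cleaned it up automatically. (While this particular predicate is trivial, JEB will also attempt to break truly opaque predicates using the Z3 SMT solver.)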
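The claim is easy to sanity-check. The sketch below evaluates the predicate with Java's 32-bit int arithmetic, which wraps on overflow like the x86 code does. Sampling is only an illustration, not a proof; a solver such as Z3 would establish the property for all 2^32 inputs.

```java
class OpaquePredicateCheck {
    /** The predicate from the decompiled code: X*(X+1) % 2 == 0, with wrapping 32-bit ints. */
    static boolean predicate(int x) {
        return x * (x + 1) % 2 == 0;
    }

    /** True if the predicate holds for every x in [lo, hi]. */
    static boolean holdsOnRange(int lo, int hi) {
        // a long counter avoids an infinite loop when hi == Integer.MAX_VALUE
        for (long x = lo; x <= hi; x++) {
            if (!predicate((int) x)) {
                return false;
            }
        }
        return true;
    }
}
```

Even at the overflow boundaries (Integer.MAX_VALUE, Integer.MIN_VALUE) the product stays even, because wraparound only discards high-order bits and parity lives in the lowest bit.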

Comparison with GHIDRA

For comparison, you may have a look at the same method decompiled by GHIDRA 10.4 here (default settings were used, just like we did with JEB). The predicate is not adequately cleaned up: extra control-flow edges are left over, which confuses AST structuring.

Cleaning up the code

Let’s start by decluttering this code. First of all, why couldn’t the decompiler clean it up on its own? If the globals written to are never read with meaningful intent, then they could be discarded.

The issue is that this is very hard to guarantee in the general case. However, in specific cases, sometimes involving manual review, a written-to global memory range may be deemed useless, as is the case here. How do we provide this information to the decompiler? Well, as of version 5.7, we cannot!¹ What we can do, though, is write a decompiler plugin to clean up the offending IR and, in the process, generate clean(er) code.

IR cleaner plugin

The decompiler accepts several types of plugins, including IR Optimizers (they work on the Intermediate Representation of a routine, as it moves up the decompilation pipeline), and AST optimizers (to clean-up or reformat the generated abstract syntax tree of the pseudo-code). In most cases, IR optimizers are well-suited to perform code clean-up or deobfuscation tasks (refer to this blog post for a detailed comparison).

We will write the plugin in Java (we could also write it in Python). It will do the following:

  • Examine each IR statement of a CFG
  • Check if the statement is writing an immediate to some global array: *(array + offset) = value
  • If so, check the array name. If it starts with the prefix “garbage” (and the write falls within the array bounds), consider the statement useless and replace it with a Nop statement

Writing IR plugins is out of scope for this post; we will go over that in detail in a future entry. In the meantime, you can download the plugin code here. Drop the Java file in your JEB’s coreplugins/scripts/ folder. There is no need to close and re-open JEB; it will be picked up at the next decompilation.

public class GarbageCleaner extends AbstractEOptimizer {

	@Override
	public int perform() {
		int cnt = 0;

		for (BasicBlock<IEStatement> b : cfg) {
			for (int i = 0; i < b.size(); i++) {
				IEStatement stm = b.get(i);
				if (stm instanceof IEAssign && stm.asAssign().getDstOperand() instanceof IEMem
						&& stm.asAssign().getSrcOperand() instanceof IEImm) {
					IEMem dst = stm.asAssign().getDstOperand().asMem();
					IEGeneric e = dst.getReference();
					// [xxx + offset] = immediate
					if (e.isOperation(OperationType.ADD)) {
						IEOperation op = e.asOperation();
						if (op.getOperand1().isVar() && op.getOperand2().isImm()) {
							IEVar v = op.getOperand1().asVar();
							IEImm off = op.getOperand2().asImm();
							if (v.isGlobalReference()) {
								long addr = v.getAddress();
								INativeContinuousItem item = ectx.getNativeContext().getNativeItemAt(addr);
								// logger.info("FOUND ITEM %s", item.getName());
								if (item != null && item.getName().startsWith("garbage")) {
									long itemsize = item.getMemorySize();
									// the whole write [off, off + size) must fit inside the item
									if (off.canReadAsLong() && off.getValueAsLong() >= 0
											&& off.getValueAsLong() + dst.getBitsize() / 8 <= itemsize) {
										logger.info("FOUND GARBAGE CODE");
										b.set(i, ectx.createNop(stm));
										cnt++;
									}
								}
							}
						}
					}
				}
			}
		}

		if (cnt > 0) {
			cfg.invalidateDataFlowAnalysis();
		}
		return cnt;
	}
}

Note that by design, the plugin is not specific to this malware. We will be able to re-use it in future analyses: all global arrays prefixed with “garbage” will be treated by the decompiler as junk recipients, and cleaned-up accordingly!

Defining the garbage array

At this point, we need to determine where that array is. Some examination of the code leads to the following (rough) boundaries: it starts at 0x41597E and spans 0x100 bytes. Navigate to the disassembly, create an array using the STAR key (menu: Native, Create/Edit Array…), and specify its characteristics.
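The boundaries translate into a simple address test. The sketch below uses the start and size found above and checks whether a write of a given width lands entirely inside the array, the same kind of bounds test the plugin performs on the IR. The class and method names are made up for illustration.

```java
class GarbageArrayBounds {
    // From the analysis above: the garbage array starts at 0x41597E and spans 0x100 bytes
    static final long START = 0x41597EL;
    static final long SIZE = 0x100L;

    /** Exclusive end address of the array. */
    static long end() {
        return START + SIZE;
    }

    /** True if a write of writeSize bytes at address lies entirely inside the array. */
    static boolean containsWrite(long address, int writeSize) {
        long off = address - START;
        return off >= 0 && off + writeSize <= SIZE;
    }
}
```

With these values the array covers [0x41597E, 0x415A7E): a dword write at 0x415A7A is the last one that fits, while one at 0x415A7B straddles the end.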

Creating a global array of 0x100 bytes. This is the garbage array.

As soon as the array is created, the disassembly will change to what can be seen below. At the same time, the decompilations using that array will be invalidated; that is the case for WinMain. You may see that an extra comment was added by the decompiler: “Stale decompilation – Refresh this view to re-decompile this code”. Such decompilations are read-only until a new one is generated.

The array is now created. The decompilation of WinMain becomes stale.

Before redecompiling, remember that we need to rename our array with a label starting with “garbage”. Set the caret on the array, hit the N key (menu: Action, Rename…), and set your new name, e.g., garbageArray1.

Now you may go back to the decompilation view of WinMain and hit F5 (menu: Windows, Refresh…) to regenerate a decompilation.

Decompiled WinMain after the garbage array-assigns were cleaned-up by the plugin

The code above is much nicer to look at – and much easier to work on!

Quick analysis

The method at 0x401000, called by WinMain, decrypts the thief’s wallet address and generates two hexstring versions of it (ASCII and Unicode).
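For readers less familiar with Windows string encodings, the sketch below shows what "two hexstring versions" of a 20-byte address could look like: one byte per character for the ASCII form, and UTF-16LE (two bytes per character) for the Unicode form, presumably matching the two clipboard formats the main loop checks. The "0x" prefix and lowercase digits are assumptions; this is not the malware's actual routine.

```java
import java.nio.charset.StandardCharsets;

class WalletHexStrings {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Formats raw address bytes as "0x" + lowercase hex digits (assumed format). */
    static String toHexString(byte[] address) {
        StringBuilder sb = new StringBuilder("0x");
        for (byte b : address) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb.toString();
    }

    /** ASCII version: one byte per character. */
    static byte[] asAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    /** Unicode version as used by Windows wide strings: UTF-16LE, no BOM. */
    static byte[] asUnicode(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }
}
```

A 20-byte address yields a 42-character string, i.e., 42 bytes in ASCII and 84 bytes in UTF-16LE.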

Decrypting the target wallet address. The decompilation is shown after proper types were applied on the data structures accessed (encrypted wallet address, hexstrings, etc.) and better names given to those vars

The loop in WinMain is doing the following:

  • Every second, it queries the Windows clipboard with OpenClipboard
  • It checks if it contains text strings or unicode strings
  • If the string is 42 characters long and starts with “0x”, it proceeds (an Ethereum wallet address is 20 bytes, so its hexadecimal representation is 40 characters, plus the 2-character “0x” prefix)
  • It checks whether the string is the attacker’s own wallet address
  • If not, it replaces the clipboard contents with the attacker’s wallet address using SetClipboardData
  • Finally, the other contents found in the clipboard are discarded
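The decision logic of the loop fits in a few lines. The sketch below models it under the description above: only the length and the “0x” prefix are tested (the remaining 40 characters are not validated as hex digits), and the comparison with the attacker’s address is assumed to be an exact, case-sensitive match. It is a model for illustration, not code lifted from the sample.

```java
class ClipperCheck {
    /** Heuristic from the analysis: 42 characters ("0x" + 40 hex digits) starting with "0x". */
    static boolean looksLikeEthAddress(String s) {
        return s != null && s.length() == 42 && s.startsWith("0x");
    }

    /** Returns what the clipboard would contain after one iteration of the loop (modeled). */
    static String replacement(String clipboard, String attackerAddress) {
        if (looksLikeEthAddress(clipboard) && !clipboard.equals(attackerAddress)) {
            return attackerAddress; // would be written back with SetClipboardData
        }
        return clipboard; // left untouched
    }
}
```

The weak check also explains why any 42-character string starting with “0x” would be swapped, whether or not it is a valid address.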

Well-known literals

In JEB, you may replace immediates by well-known literals found in type libraries (aka typelibs, such as the win32 typelibs, which were automatically loaded when the analysis of the PE file started). To do that, select the immediate, then hit CTRL+N (menu: Action, Replace…), and select the desired literal.²

For example, according to MSDN, GetClipboardData uses CF_xxx constants to indicate the type of data. We can ask JEB to replace GetClipboardData(13) with GetClipboardData(CF_UNICODETEXT) using the Action/Replace handler:

Replacing 13 with CF_UNICODETEXT in a call to GetClipboardData
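Conceptually, the replacement is a reverse lookup from a numeric value to a named constant of the expected type. The sketch below hard-codes a few standard clipboard format identifiers from the Windows SDK to illustrate the idea; it says nothing about how JEB's typelibs are actually stored or queried.

```java
import java.util.Map;

class ClipboardFormats {
    // Subset of the standard Windows clipboard format identifiers (winuser.h)
    private static final Map<Integer, String> CF_NAMES = Map.of(
            1, "CF_TEXT",
            2, "CF_BITMAP",
            7, "CF_OEMTEXT",
            8, "CF_DIB",
            13, "CF_UNICODETEXT",
            15, "CF_HDROP");

    /** Returns the symbolic name of a clipboard format, or the number itself if unknown. */
    static String render(int format) {
        return CF_NAMES.getOrDefault(format, Integer.toString(format));
    }
}
```

For instance, "GetClipboardData(" + ClipboardFormats.render(13) + ")" produces GetClipboardData(CF_UNICODETEXT), while an unknown value such as 99 is left as a plain number.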

Conclusion

That concludes the first entry in this “How to use JEB” series. In the next episodes, we will look at other features, dig deeper into writing IR plugins, look into types and type creation, and reverse other architectures, including exotic code.

To learn more, we encourage you to:

  • Explore this blog, as it contains many technical entries and how-to’s.
  • Look at the sample code (scripts and plugins) shipping with JEB; it will get you started on using the API to write your own extensions.
  • Join our Slack channel to engage with other users in the community and ask questions if you’re stuck on anything.

Thank you very much & Stay tuned 🙂 Happy Holiday to All 🎄

  1. The plugin written to analyze this malware may ship in some upcoming version of JEB.
  2. In many cases, JEB will do that automatically, and it should be the case here.

JEB Assistant

Update: With JEB 5.6, several restrictions are lifted to make the Assistant available for Java decompiled output generated by dexdec (it was initially limited to C output generated by gendec).

Starting from JEB 5.2, you may use the experimental “JEB Assistant” to infer names for decompiled methods and method parameters.

Below is a decompiled aarch64 routine found in the BPFDoor malware. A raw decompilation does not produce any useful name (the default routine name is sub_40157C).

An unnamed arm64 decompiled routine

You may click the “Call the Assistant” button (also available via the Action menu, Request Assistant handler, or the back-tick keyboard shortcut) to query the assistant via JEB.IO. At the time of writing, a JEB.IO account is not required to access the assistant.

Upon first request, a disclaimer will be shown, letting you know that the decompiled code must be sent to our server:

The disclaimer is shown the first time the assistant is called

The assistant may return a better name for the method and its parameters. Sometimes, the names may be incorrect, yet provide some insight into what the method is doing. Other times, they may be entirely out of scope! It is always better to take the provided results as hints, rather than absolute truths.

In the case of our mysterious method, the assistant did provide valuable information: decryptData(data, size, key). Indeed, the method is a decryption function; more specifically, RC4 with a precomputed S-box. The parameter names are (almost) correct.
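“RC4 with a precomputed S-box” means the key-scheduling step (KSA) was done ahead of time and only its output, the 256-entry S-box, is shipped in the binary; the routine then runs the keystream generator (PRGA) over the data. The sketch below is textbook RC4 split along that line. The names and the int[] S-box representation are illustrative choices, not the layout used by BPFDoor.

```java
class Rc4Precomputed {
    /** Key-scheduling algorithm (KSA): builds the S-box from the key. */
    static int[] ksa(byte[] key) {
        int[] s = new int[256];
        for (int i = 0; i < 256; i++) {
            s[i] = i;
        }
        for (int i = 0, j = 0; i < 256; i++) {
            j = (j + s[i] + (key[i % key.length] & 0xFF)) & 0xFF;
            int t = s[i]; s[i] = s[j]; s[j] = t;
        }
        return s;
    }

    /** PRGA from an already-computed S-box; XORs the keystream into data in place. */
    static void crypt(int[] precomputedSbox, byte[] data) {
        int[] s = precomputedSbox.clone(); // keep the stored S-box reusable
        int i = 0, j = 0;
        for (int n = 0; n < data.length; n++) {
            i = (i + 1) & 0xFF;
            j = (j + s[i]) & 0xFF;
            int t = s[i]; s[i] = s[j]; s[j] = t;
            data[n] ^= (byte) s[(s[i] + s[j]) & 0xFF];
        }
    }
}
```

Since RC4 is a stream cipher, the same call decrypts: running crypt twice with the same S-box restores the original bytes.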

You may decide to apply the suggested method name directly. The suggested parameter names are not applied automatically.

The assistant is providing the suggestions, it is up to the user to apply them

This feature is experimental. Currently, several limitations apply:

  • The assistant is limited to decompiled native routines. It will not work for dex/dalvik decompilations. The assistant works with routines as well as decompiled classes.
  • The assistant will refuse to work on overly long routines (whose decompilation exceeds several thousand characters).
  • The assistant is not available via the JEB API and requests are rate-limited (at most one every 5 seconds).

On the plus side, a JEB.IO account is not required at this time to use the assistant! Anybody can use it to (sometimes) gain insight into obscure decompilations. We hope it will help you in your reverse-engineering efforts. Please let us know your feedback through the usual channels (email, Slack, etc.).

Until next time 🙂 — Nicolas.