Updated on March 11.
A note about 2020 Q1 updates (versions 3.10 to 3.16) regarding the DEX/Dalvik decompiler modules:
- Generic String Decryption
- Lambda Recovery
- Unreflecting Code
- Decompiling Java Bytecode
- Auto-Rename All
Generic String Decryption
JEB ships with a generic deobfuscator that can perform on-the-fly string decryption and other complex optimizations. Although this optimizer performs safe (i.e., guaranteed) optimizations in most cases, it is unsafe in the general case and may therefore be disabled in the options. Refer to the Engines options .parsers.dcmp_dex.EnableDeobfuscators and .parsers.dcmp_dex.EmulationSupport.
Many code protectors offer options to replace immediate string constants by method invocations that perform on-the-fly decryption.
A variety of techniques exist, ranging from simple one-off trivial decryptor methods to complex schemes involving object creation, complicated decryptors injected in third-party packages, non-trivial logic, junk code meant to slow down analyzers, use of opaque predicates, etc. The number of possible implementations is practically unlimited. JEB’s generic deobfuscator can perform quick, safe emulation of the intermediate representation to provide a replacement. It may sometimes fail or bail out for several reasons, such as performance constraints or pitfalls like anti-emulation and anti-sandboxing techniques.
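To make the idea concrete, here is a minimal sketch of what a protected string can look like at the source level. The class name, the method names d and e, the key, and the Base64-plus-XOR scheme are all hypothetical; real protectors use many other schemes, and this is not taken from any specific product.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical protected class: names, key, and scheme are illustrative only.
final class ObfuscatedStrings {
    // Decryptor stub of the kind a protector might inject:
    // Base64-decode the constant, then XOR each byte with a rolling key.
    static String d(String enc, int key) {
        byte[] data = Base64.getDecoder().decode(enc);
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (data[i] ^ (key + i));
        }
        return new String(data, StandardCharsets.UTF_8);
    }

    // Inverse operation, used here only to build demo input.
    // A protector would perform this step at build time.
    static String e(String plain, int key) {
        byte[] data = plain.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (data[i] ^ (key + i));
        }
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        String enc = e("https://example.com/api", 0x5A);
        // Protected code contains an opaque call instead of a constant.
        String url = d(enc, 0x5A);
        // A deobfuscator able to emulate d() can show the literal instead.
        System.out.println(url);
    }
}
```

Because d depends only on its constant arguments, emulating the call is enough to replace it with the resulting string, which is the kind of replacement described above.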
Example 1:
The above code (blue box) ends up being deobfuscated to:
Example 2:
The above code is deobfuscated to:
Below, a decryptor that had been injected into the com.google.gson.Gson() class:
Example 3:
One last example, which was involuntarily – yet, quite timely! – provided by a user:
Decrypting all strings: The decryptor kicks in when decompiling methods only. At the moment, if a string happens to be successfully decrypted, the optimizer does not attempt to recover all similarly encrypted strings in the code, although this addition will most certainly make it into a future software update.
Rendering: You may quickly identify decrypted strings in the client as they are rendered using a special color associated with the itemId STRING_GENERATED, by default rendered in a flashy pink color in light and dark themes. Hovering over such items will bring up a pop-up with additional origin information, like the underlying code that would have generated that string:
API:
– From a DEX perspective: Generated strings are artificial. Therefore, IDexString.isArtificial() would return true.
– From a Java/AST perspective: IJavaConstant objects that embed origin information do so using the “origin” tag. Use IJavaConstant.getTags().get("origin") to retrieve it.
Lambda Recovery
JEB attempts to perform Java 8 style lambda recovery and reconstruction.
Desugared Lambdas
Recovery and reconstruction do not rely on any type of metadata [1], such as the special prefix -$$Lambda$ for classes and methods implementing desugared lambdas in dex 37 and below.
You may therefore see constructs like this:
Options: Lambda reconstruction can be disabled in the options (Edit, Options, Engines, …). Lambda rendering can also be disabled in the options, as well as on-demand by right-clicking a decompiled view, Rendering Options….
API Note: In the above cases, the underlying Java AST may be an IJavaNew or IJavaStaticField node. This is not the case for real (not desugared) lambdas, which map to an IJavaCall node – see below.
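For readers unfamiliar with desugaring, the sketch below is a hand-written approximation of what a compiler may emit for two lambdas: a stateless one, typically exposed through a static singleton field, and a capturing one, instantiated with new. All class, field, and method names are hypothetical, and actual desugared output differs in naming and detail.

```java
import java.util.function.IntUnaryOperator;

// Hand-written approximation of desugared output; all names are hypothetical.
final class DesugarDemo {
    // Stateless lambda `x -> x + 1`: a class with a shared singleton instance.
    static final class Lambda0 implements IntUnaryOperator {
        static final Lambda0 INSTANCE = new Lambda0();
        private Lambda0() {}
        @Override public int applyAsInt(int x) { return lambda0(x); }
    }

    // Capturing lambda `x -> x * factor`: captured values become fields.
    static final class Lambda1 implements IntUnaryOperator {
        private final int factor;
        Lambda1(int factor) { this.factor = factor; }
        @Override public int applyAsInt(int x) { return lambda1(factor, x); }
    }

    // Synthetic methods holding the original lambda bodies.
    static int lambda0(int x) { return x + 1; }
    static int lambda1(int factor, int x) { return x * factor; }

    // Original source: return x -> x + 1;
    static IntUnaryOperator increment() { return Lambda0.INSTANCE; }

    // Original source: return x -> x * factor;
    static IntUnaryOperator scale(int factor) { return new Lambda1(factor); }

    public static void main(String[] args) {
        System.out.println(increment().applyAsInt(41));
        System.out.println(scale(3).applyAsInt(5));
    }
}
```

The two factory methods correspond to the two shapes mentioned in the API note: a static field read in the stateless case and an object creation in the capturing case.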
Real Lambdas
Lambda reconstruction also takes place when the code has not been desugared (which is rare!), i.e. code relying on dex38’s invoke-custom and invoke-polymorphic.
API Note: Such lambdas map to an IJavaCall node for which isCustomCall() will return true.
Unreflecting Code
Many code protectors make heavy use of reflection – combined with string encryption, as we’ll see below – to obfuscate code. In practice, reflection is limited to method invocation (static and virtual), static and non-static field setting and getting, and new instance creation. A few examples:
v = Class.forName("java.lang.Integer").getMethod("valueOf", String.class).invoke(null, str); // instead of v = Integer.valueOf(str);
Class.forName("SomeClassName").getField("b").setInt(x, 4); // instead of x.b = 4;
Class.forName("java.lang.String").getConstructor(byte[].class).newInstance(val); // instead of new String(val);
Such code is generally protected by a catch-all handler that forwards the cause of any exception raised by a reflection issue:
try {
    // ...
}
catch(Throwable e) {
    throw e.getCause();
}
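The following self-contained sketch puts the reflective and direct forms side by side using only standard java.lang.reflect calls. The class and method names are illustrative, and the handler is narrowed to InvocationTargetException so that getCause() is never null in this demo.

```java
import java.lang.reflect.InvocationTargetException;

// Illustrative only: class and method names are not taken from any protector.
final class ReflectionDemo {
    // Reflective form, as an obfuscator might emit it.
    static Object parseReflective(String str) throws Throwable {
        try {
            return Class.forName("java.lang.Integer")
                    .getMethod("valueOf", String.class)
                    .invoke(null, str);
        } catch (InvocationTargetException e) {
            // Forward the exception raised by the target method itself.
            throw e.getCause();
        }
    }

    // Direct form that unreflection aims to recover.
    static Object parseDirect(String str) {
        return Integer.valueOf(str);
    }

    // Reflective constructor call, equivalent to new String(val).
    static String buildReflective(byte[] val) throws Throwable {
        try {
            return (String) Class.forName("java.lang.String")
                    .getConstructor(byte[].class)
                    .newInstance((Object) val); // cast keeps the array as one argument
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(parseReflective("42").equals(parseDirect("42")));
        System.out.println(buildReflective(new byte[] {'a', 'b', 'c'}));
    }
}
```

Forwarding the cause means callers observe the same exception type as with the direct call, for instance a NumberFormatException for malformed input, which is why the two forms are interchangeable from the caller's point of view.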
By default, JEB will attempt to unreflect code. This deobfuscator is potentially unsafe and may be disabled in the options. Note that you always have the ability to choose, for a particular decompilation, whether some options should be temporarily enabled or disabled, by pressing CTRL+TAB (or COMMAND+TAB on macOS) to decompile (same as menu Action, Decompile with options…).
So, in a nutshell, code normally decompiled to:
will be decompiled to:
Technical Note: This optimizer works on the Intermediate Representation manipulated by the decompiler, not to be confused with the AST rendered as its output. (The AST cleaner that was described in an older post is more limited than this IR optimizer.)
Last-step failures: Successfully unreflecting code eventually depends on being able to find the intended target method or field matching the provided description (method parameter types or field type). Failure to do so will generate a log like "A candidate field/method/constructor for unreflection was not found".
Decompiling Java Bytecode
JEB supports Java bytecode decompilation for *.class files and jar-like archives (jar, war, ear, etc.). The Java bytecode is converted to Dalvik using Android’s dx by default. Users may instead choose d8 (not recommended for now) in the Options.
The resulting DEX file(s) are processed as usual.
You may use this to decompile Android Library files (*.aar files) in JEB.
Auto-Rename All
JEB 3.13 introduced a new generic action, Auto-Rename All. Its implementation is at the discretion of code plugins. The DEX plugin implements it; users may therefore execute Action, Auto-Rename All… at any time (generally after processing an obfuscated file) in order to rename code items such as fields, methods, or classes to something more easily processable for our -limited- human brains.
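As a rough illustration of what such a renaming pass involves, the sketch below builds a rename map for identifiers that contain anything other than ASCII letters, digits, '_' or '$'. This is not JEB's algorithm: the class name, the detection policy, and the generated name format are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical renaming policy; not how JEB's Auto-Rename All is implemented.
final class RenameSketch {
    // Flags names containing characters outside [A-Za-z0-9_$] (crude on purpose).
    static boolean needsRename(String name) {
        if (name.isEmpty()) {
            return true;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '_' || c == '$';
            if (!ok) {
                return true;
            }
        }
        return false;
    }

    // Maps each offending name to a stable, readable replacement such as m_0000.
    static Map<String, String> buildRenames(List<String> names, String prefix) {
        Map<String, String> renames = new LinkedHashMap<>();
        int counter = 0;
        for (String name : names) {
            // Special names such as <init> and <clinit> must be left alone.
            if (name.startsWith("<")) {
                continue;
            }
            if (needsRename(name) && !renames.containsKey(name)) {
                renames.put(name, String.format(Locale.ROOT, "%s%04d", prefix, counter++));
            }
        }
        return renames;
    }

    public static void main(String[] args) {
        List<String> names = List.of("\u202E\u05D0\u05D1", "onCreate", "<init>", "\u0627");
        System.out.println(buildRenames(names, "m_").values());
    }
}
```

A real tool would also have to avoid collisions with existing names and keep overriding methods consistent across a class hierarchy, which this sketch does not attempt.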
Look at this horrendous obfuscation scheme below. It’s using right-to-left Unicode characters to seriously mess up rendering:
Let’s run Action, Auto-Rename All… on this file:
As usual, feel free to join us on Slack, message us on Twitter, or email us privately at support@pnfsoftware.com.
Until next time!
–
[1] Relying on metadata leads to false negatives in the best case – e.g., when the code has been minified by something like ProGuard; it leads to false positives in the worst case – e.g., forged metadata to incite the decompiler to generate inaccurate or wrong code.