Ethereum Smart Contract Decompiler

Update: March 8, 2022:
– The most up-to-date version of this document can be found in the Manual 
Update: Dec 8, 2021:
– Reference section with list of special translations for EVM opcodes
Update: Jan 2, 2019:

– The full EVM decompiler shipped with JEB 3.0-beta.8
– Download a sample JEB Python script showing how to use the API
Update: Nov 20, 2018:
– We uploaded the decompiled code of an interested contract, the second part of the PolySwarm challenge (a good write-up can be found here)

We’re excited to announce that the pre-release of our Ethereum smart contract decompiler is available. We hope that it will become a tool of choice for security auditors, vulnerability researchers, and reverse engineers examining opaque smart contracts running on Ethereum platforms.

TL;DR: Download the demo build and start reversing contracts

Keep on reading to learn about the current features of the decompiler; how to use it and understand its output; its current limitations, and planned additions.

This opaque multisig wallet is holding more than USD $22 million as of 10/26/2018 (on mainnet, address 0x3DBB3E8A5B1E9DF522A441FFB8464F57798714B1)

Overall decompiler features

The decompiler modules provide the following specific capabilities:

    • The EVM decompiler takes compiled smart contract EVM 1 code as input, and decompiles them to Solidity-like source code.
    • The initial EVM code analysis passes determine contract’s public and private methods, including implementations of public methods synthetically generated by compilers.
    • Code analysis attempts to determine method and event names and prototypes, without access to an ABI.
  • The decompiler also attempts to recover various high-level constructs, including:
      • Implementations of well-known interfaces, such as ERC20 for standard tokens, ERC721 for non-fungible tokens, MultiSigWallet contracts, etc.
      • Storage variables and types
    • High-level Solidity artifacts and idioms, including:
        • Function mutability attributes
      • Function payability state
      • Event emission, including event name
      • Invocations of address.send() or address.transfer()
      • Precompiled contracts invocations

On top of the above, the JEB back-end and client platform provide the following standard functionality:

    • The decompiler uses JEB’s optimizations pipeline to produce high-level clean code.
    • It uses JEB code analysis core features, and therefore permits: code refactoring (eg, consistently renaming methods or fields), commenting and annotating, navigating (eg, cross references), typing, graphing, etc.
    • Users have access to the intermediate-level IR representation as well as high-level AST representations though the JEB API.
  • More generally, the API allows power-users to write extensions, ranging from simple scripts in Python to complex plugins in Java.

Our Ethereum modules were tested on thousands of smart contracts active on Ethereum mainnet and testnets.

Basic usage

Open a contract via the “File, Download Ethereum Contract…” menu entry.

You will be offered two options:

  • Open a binary file already stored on disk
  • Download 2 and open a contract from one of the principal Ethereum networks: mainnet, rinkeby, ropsten, or kovan:
    • Select the network
    • Provide the contract 20-byte address
    • Click Download and select a file destination
Open a contract via the File, Open smart contract menu entry

Note that to be recognized as EVM code, a file must:

  • either have a “.evm-bytecode” extension: in this case, the file may contain binary or hex-encoded code;
  • or have be a “.runtime” or “.bin-runtime” extension (as generated by the solc Solidity compiler), and contain hex-encoded Solidity generated code.

If you are opening raw files, we recommend you append the “.evm-extension” to them in order to guarantee that they will be processed as EVM contract code.

Contract Processing

JEB will process your contract file and generate a DecompiledContract class item to represent it:

The Assembly view on the right panel shows the processed code.

To switch to the decompiled view, select the “Decompiled Contract” node in the Hierarchy view, and press TAB (or right-click, Decompile).

Right-click on items to bring up context menus showing the principal commands and shortcuts.
The decompiled view of a contract.

The decompiled contract is rendered in Solidity-like code: it is mostly Solidity code, but not entirely; constructs that are illegal in Solidity are used throughout the code to represent instructions that the decompiler could not represent otherwise. Examples include: low-level statements representing some low-level EVM instructions, memory accesses, or very rarely, goto statements. Do not expect a DecompiledContract to be easily recompiled.

Code views

You may adjust the View panels to have side-by-side views if you wish to navigate the assembly and high-level code at the same time.

  • In the assembly view, within a routine, press Space to visualize its control flow graph.
  • To navigate from assembly to source, and back, press the TAB key. The caret will be positioned on the closest matching instruction.
Side by side views: assembly and source

Contract information

In the Project Explorer panel, double click the contract node (the node with the official Ethereum Foundation logo), and then select the Description tab in the opened view to see interesting information about the processed contract, such as:

  • The detected compiler and/or its version (currently supported are variants of Solidity and Vyper compilers).
  • The list of detected routines (private and public, with their hashes).
  • The Swarm hash of the metadata file, if any.
The contract was identified as being compiled with Solidity <= 0.4.21

Commands

The usual commands can be used to refactor and annotate the assembly or decompiled code. You will find the exhaustive list in the Action and Native menus. Here are basic commands:

  • Rename items (methods, variables, globals, …) using the N key
  • Navigate the code by examining cross-references, using the X key (eg, find all callers of a method and jump to one of them)
  • Comment using the Slash key
  • As said earlier, the TAB key is useful to navigate back and forth from the low-level EVM code to high-level decompiled code

We recommend you to browser the general user manual to get up to speed on how to use JEB.

Rename an item (eg, a variable) by pressing the N key

Remember that you can change immediate number bases and rendering by using the B key. In the example below, you can see a couple of strings present in the bad Fomo3D contract, initially rendered in Hex:

All immediates are rendered as hex-strings by default.
Use the B key to cycle through base (10, 16, etc.) and rendering (number, ascii)

Understanding decompiled contracts

This section highlights idioms you will encounter throughout decompiled pseudo-Solidity code. The examples below show the JEB UI Client with an assembly on the left side, and high level decompiled code on the right side. The contracts used as examples are live contracts currently active Ethereum mainnet.

We also highlight current limitations and planned additions.

Dispatcher and public functions

The entry-point function of a contract, at address 0, is generally its dispatcher. It is named start() by JEB, and in most cases will consist in an if-statement comparing the input calldata hash (the first 4 bytes) to pre-calculated hashes, to determine which routine is to be executed.

  • JEB attempts to determine public method names by using a hash dictionary (currently containing more than 140,000 entries).
  • Contracts compiled by Solidity generally use synthetic (compiler generated) methods as bridges between public routines, that use the public Ethereum ABI, and internal routines, using a compiler-specific ABI. Those routines are identified as well and, if their corresponding public method was named, will be assigned a similar name __impl_{PUBLIC_NAME}.

NOTE/PLANNED ADDITION: currently, JEB does not attempt to process input data of public routines and massage it back into an explicit prototype with regular variables. Therefore, you will see low-level access to CALLDATA bytes within public methods.

A dispatcher.

Below, see the public method collectToken(), which is retrieving its first parameter – a 20 byte address – from the calldata.

A public method reading its arguments from CALLDATA bytes.

Interface discovery

At the time of writing, implementation of the following interfaces can be detected: ERC20, ERC165, ERC721, ERC721TokenReceiver, ERC721Metadata, ERC721Enumerable, ERC820, ERC223, ERC777, TokenFallback used by ERC223/ERC777 interfaces, as well as the common MultiSigWallet interface.

Eg, the contract below was identified as an ERC20 token implementation:

This contract implements all methods specified by the ERC20 interface.

Function attributes

JEB does its best to retrieve:

  • low-level state mutability attributes (pure, read-only, read-write)
  • the high-level Solidity ‘payable’ attribute, reserved for public methods

Explicitly non-payable functions have lower-level synthetic stubs that verify that no Ether is being received. They REVERT if it is is the case. If JEB decides to remove this stub, the function will always have an inline comment /* non payable */ to avoid any ambiguity.

The contract below shows two public methods, one has a default mutability state (non-payable); the other one is payable. (Note that the hash 0xFF03AD56 was not resolved, therefore the name of the method is unknown and was set to sub_AF; you may also see a call to the collect()’s bridge function __impl_collect(), as was mentioned in the previous section).

Two public methods, one is payable, the other is not and will revert if it receives Ether.

Storage variables

The pre-release decompiler ships with a limited storage reconstructor module.

  • Accesses to primitives (int8 to int256, uint8 to uint256) is reconstructed in most cases
  • Packed small primitives in storage words are extracted (eg, a 256-bit storage word containing 2x uint8 and 1x int32, and accessed as such throughout the code, will yield 3 contract variables, as one would expect to see in a Solidity contract
Four primitive storage variables were reconstructed.

However, currently, accesses to complex storage variables, such as mappings, mappings of mappings, mappings of structures, etc. are not simplified. This limitation will be addressed in the full release.

When a storage variable is not resolved, you will see simple “storage[…]” assignments, such as:

Unresolved storage assignment, here, to a mapping.

Due to how storage on Ethereum is designed (a key-value store of uint256 to uint256), Solidity internally uses a two-or-more indirection level for computing actual storage keys. Those low-level storage keys depend on the position of the high level storage variables. The KECCAK256 opcode is used to calculate intermediate and final keys. We will detail this mechanism in detail in a future blog post.

Precompiled contracts

Ethereum defines four pre-compiled contracts at addresses 1, 2, 3, 4. (Other addresses (5-8) are being reserved for additional pre-compiled contracts, but this is still at the ERC stage.)

JEB identifies CALLs that will eventually lead to pre-compiled code execution, and marks them as such in decompiled code: call_{specific}.

The example below shows the __impl_Receive (named recovered) method of the 34C3 CTF contract, which calls into address #2, a pre-compiled contract providing a fast implementation of SHA-256.

This contract calls address 2 to calculate the SHA-256 of a binary blob.

Ether send()

Solidity’s send can be translated into a lower-level call with a standard gas stipend and zero parameters. It is essentially used to send Ether to a contract through the target contract fallback function.

NOTE: Currently, JEB renders them as send(address, amount) instead of address.send(amount)

The contract below is live on mainnet. It is a simple forwarder, that does not store ether: it forwards the received amount to another contract.

This contract makes use of address.send(…) to send Ether

Ether transfer()

Solidity’s transfer is an even higher-level variant of send that checks and REVERTs with data if CALL failed. JEB identifies those calls as well.

NOTE: Currently, JEB renders them as transfer(address, amount) instead of address.transfer(amount)

This contract makes use of address.transfer(…) to send Ether

Event emission

JEB attempts to partially reconstruct LOGx (x in 1..4) opcodes back into high-level Solidity “emit Event(…)”. The event name is resolved by reversing the Event method prototype hash. At the time of writing, our dictionary contains more than 20,000 entries.

If JEB cannot reverse a LOGx instruction, or if LOG0 is used, then a lower-level log(…) call will be used.

NOTE: currently, the event parameters are not processed; therefore, the emit construct used in the decompiled code has the following form: emit Event(memory, size[, topic2[, topic3[, topic4]]]). topic1 is always used to store the event prototype hash.

An Invocation of LOG4 reversed to an “emit Deposit(…)” event emission

API

JEB API allows automation of complex or repetitive tasks. Back-end plugins or complex scripts can be written in Python or Java. The API update that ship with JEB 3.0-beta.6 allow users to query decompiled contract code:

  • access to the intermediate representation (IR)
  • access to the final Solidity-like representation (AST)

API use is out-of-scope here. We will provide examples either in a subsequent blog post or on our public GitHub repository.

Additional References

List of EVM opcodes that receive special translation: link (on GitHub)

Conclusion

As said in the introduction, if you are reverse engineering opaque contracts (that is, most contracts on Ethereum’s mainnet), we believe you will find JEB useful.

You may give a try to the pre-release by downloading the demo here. Please let us know your feedback: we are planning a full release before the end of the year.

As always, thank you to all our users and supporters. -Nicolas

  1. EVM: Ethereum Virtual Machine
  2. This Open plugin uses Etherscan to retrieve the contract code

Reverse Engineering WebAssembly

Note: Download a demo of JEB Decompiler here.

We published a paper deep-diving into WebAssembly from a reverse engineer point of view (wasm format, bytecode, execution environment, implementation details, etc.).

The paper annex details how JEB can be used to analyze and decompiler WebAssembly modules.

Code and decompilation view of a WebAssembly module

Thank you – Nicolas.

JEB3 Beta is available

The demo builds of JEB3 Beta are available for download on our website. The full builds are also available, however, you will need to make that demand explicit by emailing us.

What’s new in JEB3? This major release contains hundreds of changes, which can be roughly categorized as follows:

  • New desktop client. The JEB3 client is leaner and faster than the client that shipped with JEB2. It also comes with a Dark theme, supports configurable keyboard shortcuts, and easily supports multiple instances.
  • Interactive global graphs. On top of the interactive control flow graphs, JEB3 presents the user with additional smart, global graphs, such as call graphs and class graphs.
  • Improved native decompilation pipeline. A large bulk of the update as well as future trend for JEB3 is refining and opening access to our native code decompilers. We will publish several blogs regarding advanced use of decompilers, including how to use the API to customize a decompilation, write intermediate optimization passes, or even write a custom decompiler or custom analysis modules.
  • Intel x86 decompilers. JEB Pro ships with our Intel x86 32-bit decompiler and Intel x86 64-bit decompiler modules. You can already try them out in the demos.
  • Additional decompilers. We are planning to ship additional decompilers. In fact JEB3 Beta already ships with a WebAssembly decompiler. It can be used to decompile web apps or EOS smart contracts to C. We will soon provide an Ethereum decompiler as well.
  • C++ class reconstruction. The full builds will ship with experimental support for class hierarchy discovery and reconstruction of Visual Studio-compiled x86 stripped programs, as well as C++ decompilation, as was demo’ed in this YouTube video.
  • More Type Libraries. Our type library system was improved, and we generated typelibs for the following environments:
    • Android NDK on ARM 32-bit
    • Android NDK on ARM 64-bit
    • Android NDK on x86 32-bit
    • Android NDK on x86 64-bit
    • Windows win32 on Intel x86 32-bit
    • Windows win32 on Intel x86 64-bit
    • Windows win32 on ARM 32-bit
    • Windows win32 on ARM 64-bit
    • Windows DDK on Intel x86 32-bit
    • Windows DDK on Intel x86 64-bit
    • Linux glibc on Intel x86 32-bit
    • Linux glibc on ARM 32-bit
    • Linux glibc on MIPS 32-bit
  • More Signature Libraries. JEB3 ships with complete library signature sets for:
    • Android NDK libraries. Common libraries (libc, libc++, zlib, etc.) are signed from from NDK v11 up to the latest version (v17 as of 08/18).
    • Visual Studio compiled binaries. This system allows the recognition of statically linked library code in binaries compiled for x86 and x86-64 architectures.
  • Full support for Windows malware analysis. The Intel decompilers, Windows type libraries and signature libraries make JEB a great platform to analyze win32 malware or malicious kernel drivers.

If you are a registered user, you can request to be put on the early adopters list and use JEB3 right now. You may also decide to wait and automatically receive your build when it becomes publicly available for all. The release date is scheduled for the early Fall.

DEX Version 39, Dalvik and ART Opcode Overlaps, and JEB 2.3.11

JEB 2.3.11 is out We’re getting close to completion on our 2.3 branch! 1

Before we get into the matter of this blog post, a couple of noteworthy changes in terms of licensing:

  • The Android Basic builds require an active Internet connection; however, if the JEB license is current, we allow a much longer grace period before requesting a connection with our licensing server. This is to take care of scenarios where the connectivity would drop for a relatively extended period of time on either end.
  • Most interestingly, expired licenses of all types may now be used past their expiration date to reload and work on existing JDB2. New projects cannot be created with expired licenses though.

In terms of features, JEB 2.3.11 includes upgrades to our ARM64, MIPS64 and x86-64 parsers 2, as well as fixes and additions to the DEX parser. One interesting update, which prompted writing this blog post, is the support of DEX 39 opcodes.

DEX 39 Opcodes

Here they are, per the official documentation:

  • const-method-handle vAA, method_handle@BBBB
  • const-method-type vAA, proto@BBBB

Version 39 of the DEX format will be supported with the release of Android P 3. DEX 38 had been introduced to support Oreo’s new opcodes related to dynamic programming. We wrote a lengthy post about them on this very blog.

The new instructions const-method-handle and const-method-types are natural additions to retrieve method handles (basically, the same as a function pointer in C, a concept foreign to the JVM until lambdas and functional-style programming made its way into the language) and method prototypes. Those opcodes simply query into the prototypes and handles pools.

In fact, support for those two opcodes was added in JEB months ago,  right after their introduction in ART, which dates back to September 2017 (AOSP commit). Now, if you’ve been following through the Dalvik, DEX and ART intricacies, you may know that we are facing opcode overlaps:

  • The original non-optimized DEX instruction set spans from 0 to 0xFF, with undefined ranges (inclusive brackets omitted for clarity): 3E-43, 73, 79-7A, E3-FF
    • DEX 38 defines the range FA-FD (4x new invoke-xxx)
    • DEX 39 defines the range FE-FE for the aforementioned new opcodes (2x new const-method-xxx)
  • The now defunct optimized DEX (ODEX) set, predating ART, used the reserved sub-range E3-FE
  • The deadborn extended set used FF as an extension code to address 2-byte opcodes (FFxx); they were defined but unimplemented in Ice Cream Sandwich, and removed soon after in Jelly Bean.
  • Finally, ART opcodes: also used for optimizing DEX execution, those opcodes use the 73 and E3-FF ranges

ART opcodes in E3-FE are not necessarily the same as the original ODEX’s! The following table recaps the differences between ODEX and OART:

legend: red= removed in ART, orange= moved, green= added in ART

When you feed a piece of optimized DEX file to JEB, it may not know which instruction set to use. Normally, the following rules apply:

  • For stand-alone (within or outside an APK) DEX files advertising a version code less than or equal to 37, the legacy ODEX set would be used if any opcodes within that range are encountered;
  • For DEX files with version 38 or above, or that are part of an OAT ELF file, the newer ART set will be used.

However, if the determination is incorrect (eg, you are opening a stand-alone DEX 37 file using ART opcodes), you may manually specify which optimized opcodes set the Dalvik parser should use by opening the project’s settings (Edit/Options, Advanced…), and setting the property DalvikParserMode 4 to:

  • 0: legacy DEX (default value)
  • 50: ART
  • 100: DEX 38
  • 110: DEX 39
  • 1000: latest

That’s it for today’s DEX clarifications. Remember to upgrade to JEB 2.3.11. On a side-note, let us know if you’d like to be part of our group of early testers: those users receive beta builds ahead of time (eg, JEB 2.3.12-beta this week).

Thank you.

  1. A couple more updates are in the pipe before we start publishing betas of JEB 3.
  2. The x86 modules now support the newest AVX-512 instruction set, although we do not decompile it
  3. Per Google’s habits, we may expect a beta of Android P with API level 28 this Spring
  4. That property is not as accessible as we’d like; an upcoming update will clarify and improve the UX around that.

A new APK Resources Decoder with de-Obfuscation Capabilities

The latest JEB release ships with our all-new Android resources (ARSC) decoder, designed to reliably handle tweaked, obfuscated, and sometimes malformed resource files.

As it appears that optimizing resources for space (eg, the WeChat team has made their compressor/refactoring module publicly available,  etc.) or complexity (eg, commercial app protectors have been doing it for some time now) is becoming more and more commonplace, we hope that our users will come to appreciate this new module.

Here are the key points, followed by examples of what to expect from the new module.

ARSC Decoder Workflow

In terms of workflow, nothing changes: starting with JEB 2.3.10, the new Android Resources decoder module is enable by default.

If you ever need to switch back to the legacy module, simply open the Options, Advanced panel, filter on AndroidResourcesDecoderSelector and set the value to 1 (instead of 2).

ARSC Decoder Output

In terms of output, users should see improvements in at least three areas:

  • First, the module can deal with obfuscated resources and malformed files better, resulting in lower failure rates. Ideally, we’d like to get as close as possible to a 0-failure, so please report issues!
  • Second, flattened, renamed, or generally refactored resources are handled as well, and the original res/ folder will be reconstructed, resulting in a readable Resources sub-tree.
  • Finally, the module can generate an aapt2-like text output to cope with the limitations of AOSP’s aapt/aapt2 (eg, crashes); the output can be quite large, so currently, aapt2-like output generation is disabled by default. To enable it,  go to the Options, Advanced panel, filter on AndroidResourcesGenerateAapt2LikeOutput and set the value to true.  The output will be visible as an additional fragment of the APK unit view:

aapt2-like output on a file that failed aapt2

Additional Input (APK Frameworks)

By default, the latest Android framework (currently API 27) is dropped by JEB in [HOME_FOLDER]/.jeb-android-frameworks/1.apk.

If an app you are analyzing requires additional framework libraries, drop them as [package_id].apk in that folder, and you should be good to go.

Example 1: flattened resources in a banking app

Here’s a sample that demonstrates what the output looks like with an app found on VirusTotal. The app is called itsme, the apk is protected by resources refactoring (res/ folder flattening) and trimming (renaming of files, name-less resource objects, etc.).

Have a look at the APK contents:

Protected app contents

aapt2 fails on it (resource id overlap):

error: trying to add resource 'be.bmid.itsme:attr/' with ID 0x7f010001 but resource already has ID 0x7f010000.

apktool 2.3.1 cannot reconstruct the resource tree either. Resources are moved to an unknown/ folder; on non-Linux system, resources manipulation also fail due to illegal character names.

JEB does its best to rebuild the resources tree, and renames illegally named resources as well across the Resources base, consistently:

A rebuilt resources tree, originally obfuscated by GuardSquare (?)

Example 2: tweaked xml

The second file is a version of the Xapo Bitcoin wallet app 1, also found on VirusTotal. This app does not fail aapt2, however, it does fail other tools, including apktool 2.3.1

I: Using Apktool 2.3.1 on 96cbabe2fb11c78a283348b2f759dc742f18368e0d65c5d0a15aefb4e0bdc645
I: Loading resource table...
I: Decoding AndroidManifest.xml with resources...
I: Loading resource table from file: [...]/1.apk
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 8601

The resources are flattened and renamed; the XML resources are oddly structured and stretch the XML specifications as well.

JEB handles things smoothly.

Conclusion

There are many more examples of “stretched” resources in APKs we’ve come across, however we cannot share them at the moment.

If you come across unsupported scenarios or bugs, feel free to issue a report, we’ll happily investigate and update the module.

  1. https://xapo.com/

Language Translation Contribution in Python; VirusTotal Hash Check Plugin in Java.

This post is geared toward power-users who would like to take advantage of API additions that shipped with the latest JEB update.1

TL;DR: see below for a language translation contribution in Python, and a VirusTotal hash check plugin in Java.

Contributions

With JEB 2.3.6, users can now write their own unit contribution plugins in Python (or Java, of course).

First, let’s recap: JEB extensions consist of back-end plugins, and front-end scripts. Front-end scripts are written in Python and execute in the context  of a client (generally, the UI client, but it could also be a script executed by a headless, command-line JEB client). Back-end plugins form a more diverse realm: they consist of parser plugins (eg, disassemblers, decompilers, decoders, etc.), generic engines plugins, and contribution plugins.  They are mostly written in Java – although that is slowly changing as we are adding program-wide support for JEB extensions in Python.

Contribution plugins can enhance the output produced by parser plugins. A concrete example: an interactive disassembly or other text output (eg, a decompiled piece of Java or C code) is made of text items; a contribution can provide additional information to a client about a given item, when the client requests it. When it comes to the main JEB UI client, that information can be requested when a user hovers its mouse over an interactive text item.

Several contributions are already built-in, such as those providing live variable and register values when debugging a program; or the Javadoc contribution that displays API documentation on Java disassembly. Users may also write their own contributions.

  • Contributions extend IUnitContribution;
  • They can target any type of unit;
  • They can be written in Java or in Python;
  • They are plugins,  and as such, should be dropped into the JEB’s coreplugins/ folder (Python contributions will need a Jython package in that folder as well);
  • A Python contribution must be named exactly like the contribution class name (in the above below, SampleContribution.py)

The skeleton of a Python contribution that would enhance all code units would look like:

class SampleContributionPlugin(IUnitContribution):

  def __init__(self):
    pass
  
  def getPluginInformation(self):
    return PluginInformation(...)

  def isTarget(self, unit):
    return isinstance(unit, ICodeUnit)

  def setPrimaryTarget(self, unit):
    self.target = unit

  def getPrimaryTarget(self):
    return self.target

  def getItemInformation(self, targetUnit, itemId, itemText):
    # provide info about an item or a bit of text

  def getLocationInformation(self, targetUnit, location):
    return None

We uploaded a sample contribution plugin that works for text documents produced by any type of parser plugin (eg, disassembly, decompiled code, etc.). The contribution uses Google to provide real-time translations of the text snippet your mouse pointer is currently on:

The translation contribution translates foreign language text items to English when the user hovers their mouse over them; here, an Arabic string found in a malware sample of Mirai is being translated.

Note that you do not need a Google API key for it to work: the plugin scrapes Google search out; as such it is quite brittle and will almost certainly break in the future, but keep in mind this is a demo/sample to get you started for your own contributions.

VirusTotal Report Plugin

On a side-note, JEB 2.3.6 also ships with a VirusTotal hash checker plugin (disabled by default). This plugin automatically checks the hash of top-level units against the VirusTotal database.

We open-sourced it on GitHub (VirusTotalReportPlugin.java).

To set it up, run File, Plugins, Execute an Engines Plugin, VT Report Plugin:

To set up the VT plugin, you will need a VT API key.

Then, enter your VirusTotal API key; you’re good to go. Newly processed files will be automatically checked against VT and a log message as well as a notification will be stored to let you know the outcome.

The notification produced by the JEB VT plugin: here, the file looks bad (28 anti-virus products marked it as such)

That’s it for today — until next time!

Introducing the JEB Malware Sharing Network

Update: Oct. 12: Python script to query the API

We are very excited to announce that JEB 2.3.6 integrates with a new project we called the Malware Sharing Network. It allows reverse engineers to share samples anonymously, in a give-and-take fashion. The more and the better you give, the more and the better you will receive.

  • Files are shared with PNF Software (they are not shared directly with other users);
  • Contributions and users are algorithmically ranked and scored;
  • In exchange for their contributions, users receive more files, based on their score.

The goal is to offer a platform for reversers that can (and wish to) share malware files to easily do it, with the added incentive of receiving samples in return — including relatively high-value files that may not be accessible to most users, such as files that are not publicly downloadable on most malware trackers; or files that are not present on malware databases at all, including VirusTotal.

Obviously, the service is entirely optional. Any user, including users of the demo version, may use it whenever they please.

Getting started

The latest JEB update will let you know about the Malware Sharing Network right after you upgrade. You may also click the Share button in the toolbar at any time to get started.

Sharing a sample is easy!

First time users should create an account. You will only need an email address and a password. Click the “Create an Account” button to sign up.

Log in to the Malware Sharing Network

Once you’ve successfully logged in, you will be able to view your profile. Things like your sharing score and other stats are displayed.

User profile and stats

Sharing a File

Any time you are working in JEB, you can decide to share the primary file being worked on by clicking the Share button or the Share entry in the File menu:

Before sharing a file, you may:

  • redact the sample name;
  • add a text comment;
  • select a Determination, among four choices (“Unknown”, “Clean”, “Unsure” and “Malicious”).

By hitting the Share button, you will submit the file to PNF Software. It will be added to our file portal, get scored, and eventually, be shared with other users who are participating in this sample exchange program.

When your score gets high enough, you will receive samples. They will be accessible from our website, and also, using the Malware Sharing Network back-end API.

API for Scripting

After successfully logging in, you may have noticed that the API key field was populated. Power-users will be able to use it to perform automation and scripting with our back-end, such as querying samples by hashes, uploading and downloading files, etc. It’s all standard HTTP-POST queries with JSON responses.

A Python wrapper to  issue simple API queries can be found on our public GitHub repository.  First make sure to set up your API key (either in source, or create an environment variable JEBIO_APIKEY, or pass it as a parameter if you are importing the script as a library).

Queries return JSON output,  except for download requests, that return binary attachments. The return “code” variable is set to 0 on success, !=0 on error.

Here are a few examples:

Query a file hash:
$ jebio.py check 42aaa93a894a69bfcbc21823b09e4ea9f723c428
42aaa93a894a69bfcbc21823b09e4ea9f723c428: {
 "code": 0,
 "created": "2017-10-09 16:24:31",
 "filesize": 75599,
 "filestatus": 0,
 "md5hash": "879322cfd1c1b3b1813a27c3e311f1a5",
 "sha1hash": "42aaa93a894a69bfcbc21823b09e4ea9f723c428",
 "sha256hash": "57ae463e6bc53a38512c58a878370338dcfe0fb59eeedfd9b3e7959fe7c149d1",
 "userdetails": {
 "comments": "",
 "created": "2017-10-09 16:24:31",
 "determination": 0,
 "filename": "Raasta.apk"
 }
}

Note: the userdetails section is present only if you uploladed the file yourself.

Upload a file:
$ jebio.py upload 1.apk
1.apk: {
 "code": 0,
 "uploadeventid": 155
}
Download a file: (subject to permission)
$ jebio.py download a2ba1bacc996b90b37a2c93089692bf5f30f1d68
a2ba1bacc996b90b37a2c93089692bf5f30f1d68: downloaded to ba1d6f317214d318b2a4e9a9663bc7ec867a6c845affecad1290fd717cc74f29.zip (password: "infected")

DEX and APK Updates in JEB 2.3.5

This post highlights changes and additions related to Android app processing that shipped with JEB 2.3.5 (and the upcoming 2.3.6 release). Per usual, consult the full changelog for a complete list of changes.

Contributions for Units

We added plugin support for unit contributions. These back-end extensions can be written in Python! Practically, contributions for text documents (eg, disassembly) take the form of pop-ups when the user hovers the mouse over a text item. Several JEB modules already ship with contributions, eg the Live Registers view of the jdb/gdb/lldb debbuggers plugins.

With JEB 2.3.6, users may write their own contribution in Java or Python. They extend the IUnitContribution interface and are fairly straightforward to implement. (We will upload an example of a cross-unit contribution written in Python on GitHub shortly.)

JEB 2.3.5 ships with a Javadoc contribution, whose immediate use can be seen in the Dalvik disassembly view of an APK: hover over an interactive code item to display its documentation. (The plugin works whether your system is connected to the Internet or not.)

The javadoc contribution kicks-in when hovering on a type name or method name, here, newWakeLock().

DEX Header Summary

The DEX disassembly view now starts with a comment header summarizing the principal features of the bytecode, and optionally, its containing application (APK) unit.

Basic information is identified, such as package names, application details (if there is one1), activities and other end-point classes, as well as dangerous permission groups.

Various APK and DEX features of a known Android malware; notice that some phone and text permissions are requested by the app.

This legitimate APK is not an application, and the disassembly header emphasizes this fact.

Full Field and Method Refactoring

Up until JEB 2.3.4, renaming fields and methods only renamed the directly accessed field/method reference. We now support renaming “related” references as well, to cover cases like method overrides or “out-of-class” field access.

Here is a simple example with fields:

class A {
    int x;
    void f() {x = 1;}    //(1)
}

class B extends A {
    void g() {x = 2;}    //(2)
}

Technically, accessing x in (1) is not the same as in (2): f() uses a reference to A; g() uses a reference to B. However, the same concrete field is being accessed — because B is not defining (masking in this case) its own field named x. Even if B were to define its own field x (of type int or else), we could still access A.x by casting thisto B.  Similar issues arise with methods, with the added complexity of interface definitions and overrides.

JEB now handles renaming those references properly. Also remember that viewing the list of cross-references (key: X) does not display related references. You can see those by executing the Overrides action (key: O).

Various accesses to field A.i0 (here accessing it via type B) can be seen by using the O key. The O key also works for method references.

Miscellaneous API Updates

The API was augmented in various places. This blog being focused on Android changes, have a look at the definition updates in those interfaces:

  • IDexUnit and IDexFile: those interfaces have been present since day 1 or almost; we added a few convenience routines such as getDisassembly(). Remember that IDexUnit represents an entire DEX unit, possibly the result of an underlying merger of several DEX files, if the app in question is a multi-DEX one. If you need to access physical details of a given classesX.dex, use the corresponding IDexFile object, which can be retrieved via the master IDexUnit.
  • IApkUnit: also a well-known interface; several convenience methods were added to access common Android Manifest properties, such as activities, services, providers, receivers, etc. Obviously, you may access the Manifest directly (it is an IXmlUnit) and perform your own XML navigation.
  • IXApkUnit: this new interface represents Extended APK (XAPK) files and is self-explanatory.
  • ICertificateUnit: the certificate unit is also self-explanatory. It offers a direct reference to a parsed X509 certificate object.

 

  1. Unlike what the official doc says, a Manifest tag may not contain an Application element.

Debugging Dynamically Loaded DEX Bytecode Files

The JEB 2.3.2 release contains several enhancements of our JDWP and GDB/LLDB1 debugger clients used to debug both the Dalvik bytecode and native code of Android applications.

Dynamically loaded DEX files

In this post, we wanted to highlight a neat addition to our Dalvik debugger. Up until now, we did not support debugging several DEX files within a single debugging session. 2

So, we decided to add support for debugging DEX files loaded in a dynamic fashion. Below is a use-case, step-by-step study of a simple app whose workflow goes along these lines:

  1. A routine in the principal classes.dex file looks for an encrypted asset
  2. That asset is extracted and decrypted; it is a Jar file containing additional DEX bytecode
  3. The Jar file is dynamically loaded using DexClassLoader, and its code is executed

Now, we want to debug that additional bytecode. How do we proceed?

An example of debugging dynamically loaded bytecode

The app is called EnDyna (a benign crackme-like app, download it here). It offers a simple text box, and waits for the user to input a passcode. When entering the proper passcode, a success message is displayed.

The app requires the right password

Open the app in JEB. It contains a seemingly-encrypted asset file called edd.bin.

Encrypted asset file

A closer look at the MainActivity class shows that the edd.bin file is extracted, decrypted (using a simple XOR cipher) and loaded using DexClassLoader in order to validate the user input.

Passcode verification routine

Let’s attach the debugger to the app, and set a breakpoint where the call to the DexClassLoader constuctor is made.

A breakpoint was set on the DexClassLoader constructor invocation

We then trigger the verify() routine by inputting a passcode and hitting the Verify button. Our breakpoint is immediately hit. By examining the stackframe of the paused thread, we can retrieve the class loader variables and see where the decrypted DEX file was written to – and is about to get loaded from.

The decrypted Jar file about to be loaded from the path referenced by the stack variable v8

We use the Dalvik debugger interpreter to retrieve the file (command “pull”).

We now have the Jar file containing our dynamically-loaded DEX file in hand! We load it in JEB by adding an additional artifact to the project (command File, Add an Artifact…).

After processing is complete, the Android debugger notices that the added artifact contains a DEX file, and integrates it in its list of managed units.

We can set a breakpoint on the method of the second DEX file that’s about to be called.3

The second DEX file; notice the decompiled chk() method on the right-side. Here, we set a breakpoint on the method’s first instruction. It’s about to be called from MainActivity.verify(), in the primary classes.dex file.

We resume execution, our breakpoint is hit: we can start debugging the dynamically dropped DEX file!

Of course, all of the above actions can be automated by a Python script or a Java plugin. (We will upload a sample script that hooks DexClassLoader on our public GitHub repository shortly.)

We published a short video that demos the above steps, have a look at it if you want to know precisely the steps that we took to get to debug the additional DEX file.

Thank you – stay tuned for more updates, and happy debugging!

  1. Our native GDB debugger client underwent a major revamp, as we upgraded to the LLDB debugger server instead of gdbserver. More details in a separate post!
  2. It was a non-issue for standard multi-DEX APKs since JEB automatically merges them into a single, virtual DEX file, bypassing the 64Kref limits if it has to
  3. Note that the class in question (com.xyz.kf.Ver) may not even be loaded at this point; it is perfectly fine to do so: JEB handles dynamically loaded types fine and will register breakpoints timely and accordingly.

Android O and DEX 38: Dalvik Opcodes for Dynamic Invocation

Android O – API level 26 – upgrades the DEX format in order to provide support for dynamic invocation via two new Dalvik opcodes: invoke-polymorphic and invoke-custom.1

In this post, we will:

  1. Do a brief recap of how dynamic invocation is achieved in Java
  2. Present the changes made to the DEX file format in Android O
  3. Explain what the new dynamic invocation instructions can do and how they work
  4. Show code samples to generate DEX version 38 files
  5. Have a quick look at dynamic invocations in the context of app obfuscation

Note that JEB supports DEX version 38, as well as version 39 additions. That includes API support for programmatic access to the new pools and invoke-polymorphic, invoke-custom instructions, via the IDexUnit entry point interface.

Java’s invokedynamic

As a reminder, and quoting from the official Oracle doc (emphasis mine):

“[invokedynamic] improves implementations of compilers and runtime systems for dynamic languages on the JVM. It does this by allowing the language implementer to define custom linkage behavior. This contrasts with other JVM instructions such as invokevirtual, in which linkage behavior specific to Java classes and interfaces is hard-wired by the JVM.”

If that sounded like gibberish to you, you may want to get up to speed on dynamic invocation in Java – in particular, read the javadoc of MethodHandle and CallSite. We will (re)explain a bit in this post, but it is definitely not the main purpose of it. On top of the official Oracle doc as well as the original JSR, I recommend this article from the author of ByteBuddy.

Back in the Dalvik world

Up until DEX v35/v37, the way to invoke code in Dalvik was through one of the 5 invocation instructions:

  • invoke-virtual for virtual methods (Java’s invokevirtual)
  • invoke-static for static methods (Java’s invokestatic)
  • invoke-interface for methods called on interface types (Java’s invokeinterface)
  • invoke-super for super-class methods  (Java’s invokespecial)
  • invoke-direct for constructors  (Java’s invokespecial, again)
invoke-virtual Ljava/lang/String;->length()I, v

Each one of these takes a method item, which specifies a type (class or interface) as well as a method reference – ie, the “hard-wiring” part mentioned in the above quote. Java is statically typed, and the bytecode reflects that.2 That is, until invokedynamic was introduced with Java 7.

So, what is the Dalvik equivalent of Java’s invokedynamic?

Actually, there are 4 (2×2):

  • invoke-polymorphic (as well as invoke-polymorphic/range), which does “half” of what invokedynamic can do;
  • invoke-custom (as well as invoke-custom/range), which does the other, more powerful “half”.

invoke-custom requires additional pool elements, namely method handle items and call site items. Let’s walk over the DEX format additions to support those additional pools.

DEX version 38 changes

Most DEX files have version number 35. Android Nougat introduced version 37, which did not bring any structural changes (the new version code indicated support for Java 8’s default methods). If you were wondering why Dalvik did not have the equivalent of JVM’s invokedynamic, well, brace yourself: DEX version 38 is coming.

The header magic is now “DEX\x0A038\x00”. The updated file layout shows two additional pools: call_site_ids and method_handles.

However, the header size is still 70h bytes, and therefore, contains neither the offset to, nor the count of items, for those pools. Where are they?

Let’s turn to the DEX map. Sure enough, new types were introduced: TYPE_CALL_SITE_ID_ITEM (7) and TYPE_METHOD_HANDLE_ITEM (8). We can parse the map, find those two entries, and start parsing the pools.

  • call site item is essentially an array of DEX Values. The array contains at least 3 entries:
    • a method handle index (as in: a Java MethodHandle) to a bootstrap linker method;
    • a dynamic method name, the one to be dynamically resolved
    • a dynamic method type (as in: a DEX prototype);
    • additional arguments. More on this later when we discuss invoke-custom.
  • A method handle item contains:
    • a type, indicating whether the method handle is a method invoker or a field accessor;
    • and a method id or field id, depending on the aforementioned type.

As far as other changes go, obviously, the DEX Value entries can be of two additional types: VALUE_METHOD_TYPE (0x15) that references the prototypes pool, and VALUE_METHOD_HANDLE (0x16) that references the method handles pool. (Note that there is no VALUE_CALL_SITE.)

Now, let’s see how those pools are used by the new invoke instructions, and how those instructions work.

Dalvik’s invoke-polymorphic

Below are the specifications of invoke-polymorphic taken from Android Source:

invoke-polymorphic MH.invoke, prototype, {mh, [args]}

invoke-polymorphic is used to invoke a method handle using one of two @PolymorphicSignature 3 methods of the MethodHandle object: invoke() or invokeExact(). It takes at least 3 arguments:

  • A method reference to either MethodHandle.invoke or MethodHandle.invokeExact (MH.invoke)
  • The prototype of the method to be executed
  • A method handle (mh) of the target

See the example below: MethodHandle.invoke() is used on the method handle v0; the target method has the prototype (I)Object. Therefore, v1 is of type int; the return value will be of type Object.

invoke-polymorphic used in a Dalvik version 38 file

The return type as well as parameter types are specified in the prototype item, instead of a static method item — hello, polymorphism. Of course, the target method handle must reference a method of such type, either exactly, if MethodHandle.invokeExact() is used, or have compatibility with the type (via  conversion operations), using MethodHandle.invoke().

Wait, That looks like a normal invocation!

You would be semi-right to think so. After all, we are executing invoke() or invokeExact() the old fashion way here… so, why need an additional opcode? First, remember that those methods have polymorphic signatures; their prototype is determined at compile-time. Therefore, there are two options (using the example above):

  • either the bytecode references an invoke with an (I)Object prototype: in this case, we could simply call invoke-virtual on an artificial invoke(I)Object. This is the case with the Java bytecode: invokevirtual is used;
  • or the bytecode references the generic invoke([Object)Object: in this case, the invocation would require an additional prototype argument. Hence the requirement for a new invoke opcode. This is the case with the Dalvik bytecode: invoke-polymorphic was created. It  takes not one, but two pool indexes.4

Can’t I do the same with reflection?

You may be wondering what the point of these convoluted constructions is… After all, couldn’t we do the same with reflection? The answer is mostly yes, however, remember that invokedynamic has a different goal than introspection: the goal of invokedynamic is to provide an efficient low-level primitive meant to execute dynamic call sites, and therefore, enable the implementation of dynamic languages on top of the JVM.

Practically, and as far as Java goes,  they enable the implementation of Java 8 lambdas without the use of pre-compiled anonymous inner classes.

Also practically, true polymorphism means we are no longer dealing with the auto-boxing casts associated with Reflection API calls. MethodHandle.invoke() is a very particular method – as said above, it is has a polymorphic  signature, inferred at compile-time based from the types of arguments and return value provided in the call. Nothing like actual code to show what we mean here.

Sample Code

The example below has a triple-purpose:

  • Set up your environment to generate DEX v38 files;
  • Generate DEX files containing invoke-polymorphic instructions;
  • Compare MethodHandle.invoke() vs reflection.

First, download Android Studio 2.3. It turns out that at the time of writing, compiling DEX v38 with a non-Jack toolchain (using AS 3.0) produces invalid DEX files.  Make sure to use the Android Gradle plugin 2.3.3 or above. Finally, make sure to use SDK level 26 in your module-level Gradle script, and add the “o-b2” option to allow the generation of the new invoke instructions (thank you, Tsuyoshi Murakami):

android.defaultConfig.jackOptions {
    enabled = true
    additionalParameters 'jack.android.min-api-level': 'o-b2'
}

Now, let’s compile this 30-line activity class:

public class MainActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        String text = "";
        try {
            text = String.format("dynamic=%s reflect=%s", execDynamically(), execViaReflection());
        }
        catch(Throwable e) {
            text = e.toString();
        }
        Log.d("DexVer38", text);
        TextView tv = new TextView(this);
        tv.setText(text);
        setContentView(tv);
    }

    static Object execDynamically() throws Throwable {
        return MethodHandles.lookup().findStatic(MainActivity.class,"foo",
                MethodType.methodType(int.class, String.class, int.class)).invoke("hello", 2);
    }

    static Object execViaReflection() throws Throwable {
        return MainActivity.class.getDeclaredMethod("foo", String.class, int.class).invoke(null, "hello", 2);
    }

    static int foo(String s, int i) {
        return s.charAt(i);
    }
}

Both execDynamically() and execViaReflection() methods eventually invoke foo(“hello”, 2) and return its result:

$ adb logcat -s "DexVer38"
[...] D DexVer38: dynamic=108 reflect=108

However, while the polymorphic MethodHandle.invoke() of execDynamically truly takes a String as first argument, an int as second argument, and returns an int; we know it is not the case with the non-polymorphic invocation used by the Method.invoke(): casts are in place to box/unbox the int primitives to/from an Integer object.

Open the resulting DEX file in JEB:

MethodHandle and invoke-polymorphic vs Reflection and invoke-cirtual

Carefully look at the disassembly of both methods:

  • invoke-polymorphic’s MethodHandle.invoke handles any prototype, as long as the referenced method matches it
  • reflection’s Method.invoke is called using a traditional invoke, and therefore, its arguments must be a an array of Object, and its return value an Object — hence, the casts.

I hope this sheds some light on invoke-polymorphic, in terms of MethodHandle uses and resulting differences in the bytecode.

Dalvik’s invoke-custom

Below are the specifications of invoke-custom taken from Android Source:

invoke-custom callsite, {arguments}

Dalvik’s invoke-custom ~= Java’s invokedynamic

Before we explain the mechanics behind invoke-custom, remember that unlike the legacy invoke-xxx instructions, it does not take a reference to the LType;->method() that will be executed. Both will be determined at run-time.

The invoke-custom instruction first resolves and then invokes a call site:

  • Initially, an invoke-custom instruction is an unlinked state: its call site has yet to be created. It is the resolution stage:
    • The runtime checks if a CallSite object exists for the provided callsite index
    • If not, a new CallSite object is created using the data provided by the call site item at the corresponding pool index, via a bootstrap linker method
    • The invoke-custom is now in a linked state
  • When the invoke-custom is in a linked site, the CallSite object’s MethodHandle is invoked.

The following diagram summarizes the bootstrap process of linking an unlinked invoke-custom:

Delaying the resolution and creation of the callsite until runtime allows the VM to take the decision of which type and which method should the execution flow be dispatched to.

Bear in mind that in standard Java, crafting explicit code using dynamic invocation is currently not possible. That limitation can be circumvented with custom toolchains (such as Android’s Jack, as we’ll see below). However, a prime candidate for implicit use of dynamic invocations are of course lambdas. Lambda functions have been supported since Android Nougat and are currently compiled using virtual invocations. It is safe to say that we should see lambdas using invoke-custom in the near future, maybe as early as the release candidate of Android O.

Sample Code

Currently, crafting high-level Java code that produces invoke-custom is convoluted and artificial — unfortunately, lambdas are still desugared into statically invoked methods of synthetic inner classes.

Two possible options are:

  1. Crafting Dalvik code manually , or via a custom tool, or via a bytecode manipulation library. It is outside the scope of this post;
  2. Use the soon-to-be-deprecated Jack toolchain and custom Jack annotations to generate bootstrap methods.

Using the second approach, we can generate code that contains correct call site item pools. However, at the moment, those DEX files do not pass the Verifier.

That being said, the generated bytecode looks fine. Have a look at the sample below:

public class MainActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        String text = "";
        try {
            text = "" + execCustom();
        }
        catch(Throwable e) {
            text = formatThrowable(e);
        }
        Log.d("DexVer38", text);
        TextView tv = new TextView(this);
        tv.setText(text);
        setContentView(tv);
    }

    public static String formatThrowable(Throwable t) {
        Writer writer = new StringWriter();
        PrintWriter out = new PrintWriter(writer);
        t.printStackTrace(out);
        return writer.toString();
    }

    public static Character execCustom() throws Throwable {
        return foo("hello", Integer.valueOf(2));
    }

    @CalledByInvokeCustom(
            invokeMethodHandle = @LinkerMethodHandle(kind = MethodHandleKind.INVOKE_STATIC,
                    enclosingType = MainActivity.class,
                    name = "linkerMethod",
                    argumentTypes = {MethodHandles.Lookup.class, String.class, MethodType.class}),
            name = "foo",
            returnType = Character.class,
            argumentTypes = {String.class, Integer.class})
    static Character foo(String s, Integer i) {
        return s.charAt(i);
    }

    private static CallSite linkerMethod(MethodHandles.Lookup caller, String name, MethodType methodType)
            throws NoSuchMethodException, IllegalAccessException {
        return new ConstantCallSite(caller.findStatic(caller.lookupClass(), name, methodType));
    }
}

Using the CalledByInvokeCustom annotation, we can specify that foo() must be dynamically invoked. The code is a bit artificial, and the linked method trivial, but see how the seemingly static call to foo() in execCustom() was compiled to the following bytecode:

invoke-custom used in a Dalvik version 38 file

Note that the JEB syntax for invoke-custom call sites is temporary and subject to change. At the moment, the pool’s call site is displayed within double curly brackets:

{{ MethodReference / DynamicMethodName / DynamicPrototype / [additional arguments, …] }}

JEB will decompile those constructs to an invocation of the bootstrap linker method, followed by a call to invoke() on the returned CallSite’s method handle. In a real environment, the bootstrap method would be executed just once. Indeed, high-level Java code cannot reflect all forms and uses of those low-level constructs.

Keep in mind that invoke-custom‘s purpose is much broader than this dummy example. As said in the previous section, we should expect it to initially be used when generating Java 8’s lambdas. They may not be extremely popular – not yet – in traditional Java programming circles, but Google’s big push on Kotlin for Android O, including:

  • Kotlin integration in Android Studio, facilitating adoption;
  • Kotlin full compatibility with Java, allowing mixed code base during migration;
  • Kotlin’s affinities with dynamically-typed languages;

may be indicators that invoke-custom (and invoke-polymorphic) will be used to power new language features for Android app development in the near future.

Dynamic invocation used for obfuscation

Finally, let’s conclude this post with a note on obfuscation, and generally, unintended, unplanned, or at least non-primary use cases, for MethodHandle.

Just like reflection has been heavily used by all5 Dalvik protectors and obfuscators to hide API calls and make static code flow analysis difficult, we should expect MethodHandle and CallSite to be used in similar ways.

MethodHandle objects have more restrictions than pure reflection though, eg, in terms of the scope of what can be retrieved. Obviously, they cannot be used to retrieve types dynamically — which means there is no equivalent to Reflection’s Class.forName(“…”). However, they can be used to retrieve handles on methods, constructors, and fields, and therefore could be mixed in with standard reflection-based obfuscation techniques.

As for invoke-custom: parsing and analysis of the call site items pool will be required to retrieve references to boostrap linker methods, and determine their effect on code.

So, exciting times ahead! We should all be excited to see those new dynamic invoke opcodes used by apps in the future, as well as the potential they bear in terms of new languages (or more realistically, new language features) that they can provide for Android app development.

  1. And their /range counterparts. Essentially, this update is the Android implementation of JSR292
  2. That point is debated; however, the invoke call sites exhibit the static nature of type binding in the bytecode.
  3. The PolymorphicSignature annotation is defined within MethodHandle and visible only to types declared  in the java.lang.invoke package
  4. Both ways are conceptually valid. In the first case, we are assuming that an infinity of MethodHandle.invoke signatures exist. In the second case, we consider that MethodHandle.invoke true prototype is ([Object)Object; that means we must provide the actual prototype separately, via a new opcode.
  5. The Dalvik verifier is quite strict and limits the classes of obfuscation that can be applied onto bytecode.