Ethereum Analysis

Note

This manual page assumes familiarity with common JEB actions and views. We also assume a minimal amount of knowledge of the Ethereum Virtual Machine bytecode and framework of operations. This was adapted from contents originally published on our blog.

Capabilities

The Ethereum plugin provides the following specific capabilities:

The EVM code analyzer determines a contract's public and private methods, including implementations of public methods synthetically generated by compilers. This analysis attempts to determine method and event names and prototypes, without access to an ABI.
The EVM decompiler decompiles analyzed EVM code to Solidity-like source code. The decompiler attempts to recover various high-level constructs, including:
- Implementations of well-known interfaces, such as ERC20 for standard tokens, ERC721 for non-fungible tokens, MultiSigWallet contracts, etc.
- Storage variables and types
- High-level Solidity artifacts and idioms, including:
  - Function mutability attributes
  - Function payability state
  - Event emission, including event name
  - Invocations of address.send() or address.transfer()
  - Precompiled contracts invocations

Basic usage

Open a contract via the File, Open smart contract... menu entry.

You will be offered two options:

Open a binary file already stored on disk
Or download and open a contract from one of the principal Ethereum or Ethereum-compatible mainnets and testnets:
- Select the network
- Provide the contract 20-byte address
- Click Download and select a file destination

Open a contract via the File, Open smart contract menu entry

Note

To be recognized as EVM code, a file must:

have a ".evm-bytecode" extension: in this case, the file may contain binary or hex-encoded code;
or have a ".runtime" or ".bin-runtime" extension (as generated by the solc Solidity compiler) and contain hex-encoded Solidity generated code.

If you are opening raw files, we recommend you append the ".evm-bytecode" extension to them in order to guarantee that they will be processed as EVM contract code.
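The recognition rules in this note can be approximated by a small pre-processing helper, for instance when preparing a batch of raw files. This is only a sketch of the stated rules, not JEB's actual detection logic; the names `is_hex_text`, `looks_like_evm_file` and `to_evm_bytecode_name` are hypothetical.

```python
# Extensions listed in the note above. The matching logic itself is an
# assumption made for illustration; JEB's real detection may differ.
ANY_ENCODING_EXT = ".evm-bytecode"            # binary or hex-encoded code
HEX_ONLY_EXTS = (".runtime", ".bin-runtime")  # hex-encoded solc output


def is_hex_text(data: bytes) -> bool:
    """Return True if data is hex text (optional 0x prefix, even digit count)."""
    text = data.decode("ascii", errors="replace").strip()
    if text.lower().startswith("0x"):
        text = text[2:]
    if not text or len(text) % 2:
        return False
    return all(ch in "0123456789abcdefABCDEF" for ch in text)


def looks_like_evm_file(name: str, data: bytes) -> bool:
    """Approximate the extension and encoding rules stated in the note."""
    lowered = name.lower()
    if lowered.endswith(ANY_ENCODING_EXT):
        return True
    if lowered.endswith(HEX_ONLY_EXTS):
        return is_hex_text(data)
    return False


def to_evm_bytecode_name(name: str) -> str:
    """Append ".evm-bytecode" to a raw file name unless it is already there."""
    if name.lower().endswith(ANY_ENCODING_EXT):
        return name
    return name + ANY_ENCODING_EXT
```

Renaming a raw dump with `to_evm_bytecode_name("dump.bin")` yields `dump.bin.evm-bytecode`, which falls under the first rule regardless of how the content is encoded.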

Contract Processing

JEB will process your contract file and generate a class item to represent it:

The Assembly view on the right panel shows the processed code.

To switch to the decompiled view, select the "Decompiled Contract" node in the Code Hierarchy view, and press Tab (or right-click, Decompile).

Right-click on items to bring up context menus showing the principal commands and shortcuts.

A decompiled contract is rendered in Solidity-like code: it is mostly Solidity code, but not entirely. Constructs that are illegal in Solidity are used throughout the code to represent instructions that the decompiler could not represent otherwise. Examples include statements representing some low-level EVM instructions, memory accesses, or, very rarely, goto statements. Do not expect a decompiled contract to be easily recompiled.

Code views

You may adjust the View panels to have side-by-side views if you wish to navigate the assembly and high-level code at the same time.

In the assembly view, within a routine, press Space to visualize its control flow graph.
To navigate from assembly to source, and back, press the Tab key. The caret will be positioned on the closest matching instruction.

Contract information

In the Project Explorer panel, double click the contract node (the node with the official Ethereum Foundation logo), and then select the Description tab in the opened view to see interesting information about the processed contract, such as:

The file metadata, if any was found and successfully parsed (Solidity-generated metadata is recovered).
The detected compiler and/or its version (currently supported are variants of Solidity and Vyper compilers).
The list of detected routines (private and public, with their hashes).

The contract was identified as being compiled with Solidity >= 0.4.22, specifically solc 0.6.12.

Exposed metadata: IPFS hash, Solidity compiler version. Keep in mind that metadata is indicative and should not be trusted.

Commands

The usual commands can be used to refactor and annotate the assembly or decompiled code. You will find the exhaustive list in the Action and Native menus. Here are basic commands:

Rename items (methods, variables, globals, ...) using the N key
Navigate the code by examining cross-references using the X key (e.g., find all callers of a method and jump to one of them)
Comment using the Slash key
As said earlier, the Tab key is useful to navigate back and forth from low-level EVM code to high-level decompiled code

Rename an item (e.g., a variable) by pressing the N key

Immediate number bases and rendering can be changed by using the B key. In the example below, you can see a couple of strings present in the bad Fomo3D contract, initially rendered in Hex:

All immediates are rendered as hex-strings by default.

Use the B key to cycle through base (10, 16, etc.) and rendering (number, ascii)
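To make the effect of cycling through bases and renderings concrete, the following sketch shows three views of one immediate. It is not tied to JEB in any way; the function name `renderings` and the sample value are made up for illustration.

```python
def renderings(value: int) -> dict:
    """Return a few alternative renderings of a non-negative immediate."""
    size = max(1, (value.bit_length() + 7) // 8)
    raw = value.to_bytes(size, "big")
    # Show printable ASCII only; other bytes become '.' in this sketch.
    ascii_text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return {"hex": hex(value), "dec": str(value), "ascii": ascii_text}


views = renderings(0x68756D616E)
# views["hex"] is "0x68756d616e" and views["ascii"] is "human"
```

The same bytes are far easier to recognize as a string than as a hexadecimal number, which is why switching the rendering is often the first step when reading revert messages or embedded constants.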

Understanding decompiled contracts

This section highlights idioms you will encounter throughout decompiled pseudo-Solidity code. The examples below show the GUI client set up to display EVM disassembly on the left side, and high-level decompiled code on the right side. The contracts used as examples are live contracts active on mainnet.

Dispatcher and public functions

The entry-point function of a contract, at address 0, is generally its dispatcher. JEB names it start(). In most cases it consists of an if-statement that compares the function selector (the first 4 bytes of the input CALLDATA, a truncated hash of the method prototype) to pre-calculated hashes in order to determine which routine is to be executed.

JEB attempts to determine public method names by using a hash dictionary (currently containing more than 340,000 entries).
Contracts compiled by Solidity generally use synthetic (compiler generated) methods as bridges between public routines, that use the public Ethereum ABI, and internal routines, using a compiler-specific ABI. Those routines are identified as well and, if their corresponding public method was named, will be assigned a similar name __impl_{PUBLIC_NAME}.
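The dispatching and naming scheme described above can be modeled in a few lines. This is a simplified sketch and not JEB's implementation: the three-entry dictionary stands in for the real hash dictionary, the helper names are hypothetical, and deriving the fallback name from a routine offset is an assumption.

```python
# A tiny stand-in for a selector dictionary. These four-byte values are
# well-known ERC20 selectors; a real dictionary is far larger.
KNOWN_SELECTORS = {
    0xA9059CBB: "transfer(address,uint256)",
    0x70A08231: "balanceOf(address)",
    0x18160DDD: "totalSupply()",
}


def selector_of(calldata: bytes):
    """Return the 4-byte selector as an int, or None if calldata is too short."""
    if len(calldata) < 4:
        return None
    return int.from_bytes(calldata[:4], "big")


def resolve_method(calldata: bytes, routine_offset: int) -> str:
    """Name the routine a dispatcher would branch to.

    Unknown selectors get a sub_<offset> placeholder name (assumed scheme).
    """
    name = KNOWN_SELECTORS.get(selector_of(calldata))
    if name is None:
        return "sub_%X" % routine_offset
    return name.split("(")[0]


def impl_name(public_name: str) -> str:
    """Build the bridge-routine name pattern __impl_{PUBLIC_NAME}."""
    return "__impl_" + public_name
```

For example, calldata starting with `a9059cbb` resolves to `transfer`, and its internal counterpart would be named `__impl_transfer`.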

Limitation

Currently, JEB does not attempt to process input data of public routines and massage it back into an explicit prototype with regular variables. Therefore, you will see low-level access to CALLDATA bytes within public methods.

Below, see the public method collectToken(), which is retrieving its first parameter – a 20 byte address – from the calldata.

A public method reading its arguments from CALLDATA bytes.
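When reading such low-level accesses, it helps to keep the standard ABI layout in mind: the 4-byte selector is followed by 32-byte argument words, and an address occupies the low 20 bytes of its word. The sketch below decodes calldata the same way by hand; the helper names are hypothetical and the selector bytes in the usage note are a placeholder.

```python
def read_word(calldata: bytes, index: int) -> int:
    """Read the index-th 32-byte argument word following the 4-byte selector."""
    start = 4 + 32 * index
    word = calldata[start:start + 32]
    if len(word) != 32:
        raise ValueError("calldata too short for argument %d" % index)
    return int.from_bytes(word, "big")


def read_address(calldata: bytes, index: int) -> str:
    """Extract an address: the low 160 bits of a 32-byte argument word."""
    value = read_word(calldata, index) & ((1 << 160) - 1)
    return "0x%040x" % value
```

With `calldata = bytes.fromhex("deadbeef") + bytes(12) + bytes.fromhex("11" * 20)`, where `deadbeef` is a placeholder selector, `read_address(calldata, 0)` returns the 20-byte address made of `11` bytes.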

Interface discovery

At the time of writing, implementation of the following interfaces can be detected: ERC20, ERC165, ERC721, ERC721TokenReceiver, ERC721Metadata, ERC721Enumerable, ERC820, ERC223, ERC777, TokenFallback used by ERC223/ERC777 interfaces, as well as the common MultiSigWallet interface.

E.g., the contract below was identified as an ERC20 token implementation:

This contract implements all methods specified by the ERC20 interface.
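One simple way to picture interface detection is as a subset test on the selectors found in the dispatcher. The sketch below applies that idea to ERC20; it is an assumption about how such a check could work, not a description of JEB's detection logic, and the function names are hypothetical.

```python
# Selectors of the six functions required by the ERC20 interface.
ERC20_SELECTORS = {
    0x18160DDD: "totalSupply()",
    0x70A08231: "balanceOf(address)",
    0xA9059CBB: "transfer(address,uint256)",
    0x23B872DD: "transferFrom(address,address,uint256)",
    0x095EA7B3: "approve(address,uint256)",
    0xDD62ED3E: "allowance(address,address)",
}


def missing_erc20_methods(dispatched_selectors) -> list:
    """List the ERC20 prototypes that the dispatcher does not handle."""
    present = set(dispatched_selectors)
    return sorted(proto for sel, proto in ERC20_SELECTORS.items()
                  if sel not in present)


def implements_erc20(dispatched_selectors) -> bool:
    """True when every ERC20 selector is handled by the dispatcher."""
    return not missing_erc20_methods(dispatched_selectors)
```

A contract may expose additional public methods; only the absence of a required selector rules the interface out in this model.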

Function attributes

JEB does its best to retrieve:

low-level state mutability attributes (pure, read-only, read-write)
the high-level Solidity ‘payable’ attribute, reserved for public methods

Explicitly non-payable functions have lower-level synthetic stubs that verify that no Ether is being received. They REVERT if that is the case. If JEB decides to remove this stub, the function will always have an inline comment /* non payable */ to avoid any ambiguity.

The contract below shows two public methods, one has a default mutability state (non-payable); the other one is payable. Note that the hash 0xFF03AD56 was not resolved, therefore the name of the method is unknown and was set to sub_AF; you may also see a call to collect()’s bridge function __impl_collect(), as was mentioned in the previous section.

Two public methods, one is payable, the other is not and will revert if it receives Ether.

Storage variables

The current decompiler has a rather limited storage reconstructor module.

Accesses to primitives (int8 to int256, uint8 to uint256) are reconstructed in most cases.
Packed small primitives in storage words are extracted. For example, a 256-bit storage word containing 2x uint8 and 1x int32, and accessed as such throughout the code, will yield 3 contract variables, as one would expect to see in a Solidity contract.

Four primitive storage variables were reconstructed.
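The packing mentioned above follows Solidity's storage layout, where consecutive small variables share one 256-bit word and the first declared item sits in the low-order bits. The sketch below unpacks the example word (2x uint8 and 1x int32); the variable names `a`, `b`, `c` and the helper names are hypothetical.

```python
def unpack_field(word: int, bit_offset: int, bit_size: int, signed: bool = False) -> int:
    """Extract one packed primitive from a 256-bit storage word."""
    value = (word >> bit_offset) & ((1 << bit_size) - 1)
    # Reinterpret as two's complement when the field is a signed type.
    if signed and value >> (bit_size - 1):
        value -= 1 << bit_size
    return value


# Assumed declaration order: uint8 a; uint8 b; int32 c;
# Each entry is (name, bit offset, bit size, signed).
LAYOUT = [("a", 0, 8, False), ("b", 8, 8, False), ("c", 16, 32, True)]


def unpack_word(word: int) -> dict:
    """Split a storage word into the variables described by LAYOUT."""
    return {name: unpack_field(word, off, size, signed)
            for name, off, size, signed in LAYOUT}
```

In EVM code the same extraction shows up as shift (or division) and mask sequences around SLOAD, which is the pattern a storage reconstructor has to recognize.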

Limitation

Currently, accesses to complex storage variables, such as mappings, mappings of mappings, mappings of structures, etc. are not simplified. This limitation will be addressed in a future update.

When a storage variable is not resolved, you will see simple storage[...] assignments, such as:

Unresolved storage assignment, here, to a mapping.

Due to how storage on Ethereum is designed (a key-value store of uint256 to uint256), Solidity internally uses two or more levels of indirection for computing actual storage keys. Those low-level storage keys depend on the position of the high-level storage variables. The KECCAK256 opcode can be used to calculate intermediate and final keys. We will cover this mechanism in detail in a future blog post.
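As an illustration of this key derivation, Solidity stores the value of a mapping entry at keccak256(pad32(key) ++ pad32(position)), where position is the slot index of the mapping variable; nested mappings apply the rule repeatedly. The sketch below assumes value-type keys that fit in 32 bytes. It embeds a compact pure-Python Keccak-256 so that it is self-contained (Python's `hashlib.sha3_256` uses different padding and yields different digests); the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1
RATE = 136  # Keccak-256 rate in bytes (1088 bits)


def _rol64(value: int, shift: int) -> int:
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def _keccak_f1600(lanes):
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 lane matrix."""
    lfsr = 1
    for _ in range(24):
        # Theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # Rho and Pi
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # Chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # Iota (round constants generated by an LFSR)
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Keccak-256 with the original 0x01 padding, as used by the EVM."""
    padded = bytearray(data)
    padded += b"\x01" + b"\x00" * (RATE - len(data) % RATE - 1)
    padded[-1] |= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for start in range(0, len(padded), RATE):
        for i in range(RATE // 8):
            chunk = padded[start + 8 * i:start + 8 * i + 8]
            lanes[i % 5][i // 5] ^= int.from_bytes(chunk, "little")
        lanes = _keccak_f1600(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def mapping_slot(key: int, position: int) -> int:
    """Storage key of mapping[key] for a mapping declared at slot `position`."""
    preimage = key.to_bytes(32, "big") + position.to_bytes(32, "big")
    return int.from_bytes(keccak256(preimage), "big")
```

For a mapping of mappings declared at slot `p`, the entry `m[k1][k2]` lives at `mapping_slot(k2, mapping_slot(k1, p))`. Recognizing a `storage[...]` index as the result of such a hash chain is what would let a decompiler present it as a named mapping access.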

Precompiled contracts

Ethereum defines several pre-compiled contracts at low addresses, including addresses 1 through 8.

JEB identifies CALLs that will eventually lead to pre-compiled code execution, and marks them as such in decompiled code: call_{specific}.

The example below shows the __impl_Receive (name recovered) method of the 34C3 CTF contract, which calls into address #2, a pre-compiled contract providing a fast implementation of SHA-256.

This contract calls address 2 to calculate the SHA-256 of a binary blob.
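The mapping from call target to pre-compiled routine can be pictured as a lookup table. In the sketch below the address assignments are the standard ones, but the rendered names (for example `call_sha256`) are only a guess at what the call_{specific} pattern expands to, and the helper names are hypothetical.

```python
import hashlib

# Standard pre-compiled contracts at addresses 1 through 8.
PRECOMPILES = {
    1: "ecrecover",
    2: "sha256",
    3: "ripemd160",
    4: "identity",
    5: "modexp",
    6: "ecadd",
    7: "ecmul",
    8: "ecpairing",
}


def render_call(target: int) -> str:
    """Return an assumed call_{specific} name, or plain "call" otherwise."""
    name = PRECOMPILES.get(target)
    return "call_" + name if name else "call"


def run_sha256_precompile(data: bytes) -> bytes:
    """Model of address 2: return the SHA-256 digest of the input bytes."""
    return hashlib.sha256(data).digest()
```

A contract that CALLs address 2 with a binary blob as input receives the same 32 bytes that `run_sha256_precompile` returns for that blob.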

Ether send()

Solidity's send can be translated into a lower-level call with a standard gas stipend and zero parameters. It is essentially used to send Ether to a contract through the target contract's fallback function.

Currently, JEB renders them as send(address, amount) instead of address.send(amount).

The contract below is live on mainnet. It is a simple forwarder, that does not store ether: it forwards the received amount to another contract.

This contract makes use of address.send(...) to send Ether

Ether transfer()

Solidity’s transfer is an even higher-level variant of send that checks and REVERTs with data if CALL failed. JEB identifies those calls as well.

Currently, JEB renders them as transfer(address, amount) instead of address.transfer(amount).

This contract makes use of address.transfer(...) to send Ether
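The difference between the two idioms is easiest to see in a toy model: both wrap the same low-level call, but send reports failure through its return value while transfer turns failure into a revert. The sketch below uses the free-function form send(address, amount) and transfer(address, amount); the balance dictionary, the `Revert` exception and the helper `low_level_call` are hypothetical simplifications, and no gas is actually metered.

```python
GAS_STIPEND = 2300  # gas forwarded by Solidity's send() and transfer()


class Revert(Exception):
    """Stands in for the EVM REVERT outcome in this toy model."""


def low_level_call(balances: dict, sender: str, target: str, amount: int, gas: int) -> bool:
    """Toy CALL with empty input data: move `amount` if the sender can pay.

    `gas` is accepted only to mirror the shape of the real call.
    """
    if amount < 0 or balances.get(sender, 0) < amount:
        return False
    balances[sender] = balances.get(sender, 0) - amount
    balances[target] = balances.get(target, 0) + amount
    return True


def send(balances: dict, sender: str, target: str, amount: int) -> bool:
    """send(): report failure through the return value."""
    return low_level_call(balances, sender, target, amount, GAS_STIPEND)


def transfer(balances: dict, sender: str, target: str, amount: int) -> None:
    """transfer(): same call, but a failure becomes a revert."""
    if not send(balances, sender, target, amount):
        raise Revert("transfer failed")
```

In decompiled code, the distinguishing feature is therefore what follows the CALL: a result that is merely returned or tested suggests send, whereas a check that reverts on failure suggests transfer.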

Event emission

JEB attempts to partially reconstruct LOGx (x in 1..4) opcodes back into high-level Solidity emit Event(...). The event name is resolved by reversing the Event method prototype hash.

If JEB cannot reverse a LOGx instruction, or if LOG0 is used, then a lower-level log(...) call will be used.

Currently, the event parameters are not processed; therefore, the emit construct used in the decompiled code has the following form: emit Event(memory, size[, topic2[, topic3[, topic4]]]). topic1 is always used to store the event prototype hash.

An invocation of LOG4 reversed to an `emit Deposit(...)` event emission
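The rendering rule described in this section can be summarized as a lookup on the first topic. The sketch below is an approximation, not JEB's implementation: the one-entry dictionary stands in for a real event hash dictionary (the entry is the well-known hash of Transfer(address,address,uint256)), the function name `render_log` is hypothetical, and the exact argument list of the fallback log(...) form is an assumption.

```python
# Hash of the ERC20 event prototype Transfer(address,address,uint256).
TRANSFER_TOPIC = 0xDDF252AD1BE2C89B69C2B068FC378DAA952BA7F163C4A11628F55A4DF523B3EF

KNOWN_EVENTS = {TRANSFER_TOPIC: "Transfer"}


def render_log(mem_offset: int, size: int, topics: list) -> str:
    """Render a LOGx as emit Event(memory, size[, topic2...]) or log(...)."""
    args = [hex(mem_offset), hex(size)]
    name = KNOWN_EVENTS.get(topics[0]) if topics else None
    if name is None:
        # LOG0, or an unknown prototype hash: keep every topic visible.
        return "log(%s)" % ", ".join(args + [hex(t) for t in topics])
    # topic1 identified the event, so only the remaining topics are printed.
    return "emit %s(%s)" % (name, ", ".join(args + [hex(t) for t in topics[1:]]))
```

For instance, `render_log(0x80, 0x20, [TRANSFER_TOPIC, 0x1, 0x2])` produces `emit Transfer(0x80, 0x20, 0x1, 0x2)`: the data still has to be read from the memory range, since event parameters are not decoded.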

API

The EVM analysis modules are built onto the native code analysis pipeline of JEB. Therefore, standard APIs can be used to automate analysis tasks. In particular, the decompiler API gives access to:

the intermediate representation (IR)
the final Solidity-like representation (AST)

Example

This sample script demonstrates how to retrieve the decompiled EVM code of an Ethereum contract and print out AST nodes: code

Refer to "Extending JEB" to get started with developing scripts or plugins for JEB.