Update: March 8, 2022:
– The most up-to-date version of this document can be found in the Manual
Update: Dec 8, 2021:
– Reference section with list of special translations for EVM opcodes
Update: Jan 2, 2019:
– The full EVM decompiler shipped with JEB 3.0-beta.8
– Download a sample JEB Python script showing how to use the API
Update: Nov 20, 2018:
– We uploaded the decompiled code of an interesting contract, the second part of the PolySwarm challenge (a good write-up can be found here)
We’re excited to announce that the pre-release of our Ethereum smart contract decompiler is available. We hope that it will become a tool of choice for security auditors, vulnerability researchers, and reverse engineers examining opaque smart contracts running on Ethereum platforms.
TL;DR: Download the demo build and start reversing contracts
Keep reading to learn about the decompiler's current features, how to use it and understand its output, its current limitations, and planned additions.
Overall decompiler features
The decompiler modules provide the following specific capabilities:
- The initial EVM code analysis passes determine contract’s public and private methods, including implementations of public methods synthetically generated by compilers.
- Code analysis attempts to determine method and event names and prototypes, without access to an ABI.
- The decompiler also attempts to recover various high-level constructs, including:
  - Implementations of well-known interfaces, such as ERC20 for standard tokens, ERC721 for non-fungible tokens, MultiSigWallet contracts, etc.
  - Storage variables and types
  - High-level Solidity artifacts and idioms, including:
    - Function mutability attributes
    - Function payability state
    - Event emission, including event name
    - Invocations of address.send() or address.transfer()
    - Precompiled contracts invocations
On top of the above, the JEB back-end and client platform provide the following standard functionality:
- The decompiler uses JEB’s optimization pipeline to produce clean high-level code.
- It uses JEB code analysis core features, and therefore permits: code refactoring (eg, consistently renaming methods or fields), commenting and annotating, navigating (eg, cross references), typing, graphing, etc.
- Users have access to the intermediate representation (IR) as well as the high-level AST representation through the JEB API.
- More generally, the API allows power-users to write extensions, ranging from simple scripts in Python to complex plugins in Java.
Our Ethereum modules were tested on thousands of smart contracts active on Ethereum mainnet and testnets.
Basic usage
Open a contract via the “File, Download Ethereum Contract…” menu entry.
You will be offered two options:
- Open a binary file already stored on disk
- Download and open a contract from one of the principal Ethereum networks: mainnet, rinkeby, ropsten, or kovan:
  - Select the network
  - Provide the contract 20-byte address
  - Click Download and select a file destination
Note that to be recognized as EVM code, a file must:
- either have a “.evm-bytecode” extension: in this case, the file may contain binary or hex-encoded code;
- or have a “.runtime” or “.bin-runtime” extension (as generated by the solc Solidity compiler), and contain hex-encoded Solidity generated code.
If you are opening raw files, we recommend you append the “.evm-bytecode” extension to them in order to guarantee that they will be processed as EVM contract code.
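As a rough illustration of the file-naming rule above, the following Python sketch saves hex-encoded code under a “.evm-bytecode” name so that it can be opened as a binary file already stored on disk. The helper name save_as_evm_bytecode is hypothetical, and stripping a leading "0x" prefix is an assumption made for safety; the text above only states that binary or hex-encoded content is accepted.

```python
from pathlib import Path


def save_as_evm_bytecode(hex_code: str, dest_stem: str) -> Path:
    """Write hex-encoded EVM code to '<dest_stem>.evm-bytecode'.

    Hypothetical helper, not part of JEB.
    """
    cleaned = hex_code.strip()
    # Assumption: drop an optional "0x" prefix before saving.
    if cleaned.lower().startswith("0x"):
        cleaned = cleaned[2:]
    # Sanity check: bytes.fromhex raises ValueError on malformed hex.
    bytes.fromhex(cleaned)
    path = Path(dest_stem + ".evm-bytecode")
    path.write_text(cleaned)
    return path
```

For example, save_as_evm_bytecode("0x6080604052", "sample") would create sample.evm-bytecode containing the text 6080604052.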
Contract Processing
JEB will process your contract file and generate a DecompiledContract class item to represent it:
To switch to the decompiled view, select the “Decompiled Contract” node in the Hierarchy view, and press TAB (or right-click, Decompile).
The decompiled contract is rendered in Solidity-like code: it is mostly Solidity code, but not entirely; constructs that are illegal in Solidity are used throughout the code to represent instructions that the decompiler could not represent otherwise. Examples include: low-level statements representing some low-level EVM instructions, memory accesses, or very rarely, goto statements. Do not expect a DecompiledContract to be easily recompiled.
Code views
You may adjust the View panels to have side-by-side views if you wish to navigate the assembly and high-level code at the same time.
- In the assembly view, within a routine, press Space to visualize its control flow graph.
- To navigate from assembly to source, and back, press the TAB key. The caret will be positioned on the closest matching instruction.
Contract information
In the Project Explorer panel, double click the contract node (the node with the official Ethereum Foundation logo), and then select the Description tab in the opened view to see interesting information about the processed contract, such as:
- The detected compiler and/or its version (currently supported are variants of Solidity and Vyper compilers).
- The list of detected routines (private and public, with their hashes).
- The Swarm hash of the metadata file, if any.
Commands
The usual commands can be used to refactor and annotate the assembly or decompiled code. You will find the exhaustive list in the Action and Native menus. Here are basic commands:
- Rename items (methods, variables, globals, …) using the N key
- Navigate the code by examining cross-references, using the X key (eg, find all callers of a method and jump to one of them)
- Comment using the Slash key
- As said earlier, the TAB key is useful to navigate back and forth from the low-level EVM code to high-level decompiled code
We recommend that you browse the general user manual to get up to speed on how to use JEB.
Remember that you can change immediate number bases and rendering by using the B key. In the example below, you can see a couple of strings present in the bad Fomo3D contract, initially rendered in Hex:
Understanding decompiled contracts
This section highlights idioms you will encounter throughout decompiled pseudo-Solidity code. The examples below show the JEB UI Client with the assembly on the left side, and the high-level decompiled code on the right side. The contracts used as examples are live contracts currently active on Ethereum mainnet.
We also highlight current limitations and planned additions.
Dispatcher and public functions
The entry-point function of a contract, at address 0, is generally its dispatcher. It is named start() by JEB, and in most cases will consist of an if-statement comparing the method hash found in the first 4 bytes of the input calldata to pre-calculated hashes, to determine which routine is to be executed.
- JEB attempts to determine public method names by using a hash dictionary (currently containing more than 140,000 entries).
- Contracts compiled by Solidity generally use synthetic (compiler generated) methods as bridges between public routines, that use the public Ethereum ABI, and internal routines, using a compiler-specific ABI. Those routines are identified as well and, if their corresponding public method was named, will be assigned a similar name __impl_{PUBLIC_NAME}.
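The dispatcher logic and the dictionary lookup described above can be sketched in a few lines of Python. This is only an illustration, not JEB's implementation: the three entries are well-known ERC20 selectors, while the function name resolve_method, the "fallback" label and the "unknown_…" placeholder are hypothetical choices for this example.

```python
# A tiny stand-in for a hash dictionary: 4-byte selector -> method prototype.
KNOWN_SELECTORS = {
    0xA9059CBB: "transfer(address,uint256)",
    0x70A08231: "balanceOf(address)",
    0x18160DDD: "totalSupply()",
}


def resolve_method(calldata: bytes) -> str:
    """Mimic a dispatcher: read the first 4 bytes and look them up."""
    if len(calldata) < 4:
        # Too short to carry a selector; a contract would typically
        # route this to its fallback function.
        return "fallback"
    selector = int.from_bytes(calldata[:4], "big")
    # Hypothetical placeholder name when the hash is not in the dictionary.
    return KNOWN_SELECTORS.get(selector, "unknown_%08X" % selector)
```

A selector missing from the dictionary, such as 0xFF03AD56, simply stays unresolved, which is why some decompiled methods keep a generic name.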
NOTE/PLANNED ADDITION: currently, JEB does not attempt to process input data of public routines and massage it back into an explicit prototype with regular variables. Therefore, you will see low-level access to CALLDATA bytes within public methods.
Below, see the public method collectToken(), which retrieves its first parameter, a 20-byte address, from the calldata.
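To make this kind of low-level calldata access easier to read, here is a minimal Python sketch of what such code does, assuming the standard Ethereum ABI layout: a 4-byte method hash followed by 32-byte words, with an address stored in the low 20 bytes of its word. The function name read_address_arg is hypothetical.

```python
def read_address_arg(calldata: bytes, index: int = 0) -> str:
    """Return the address held in the index-th 32-byte argument word."""
    start = 4 + 32 * index          # skip the 4-byte method hash
    word = calldata[start:start + 32]
    if len(word) != 32:
        # Unlike the EVM, which zero-pads short reads, this sketch fails loudly.
        raise ValueError("calldata too short for argument %d" % index)
    # An address occupies the low-order 20 bytes of the 32-byte word.
    return "0x" + word[12:].hex()
```

In decompiled code the same operation typically appears as a calldata read at offset 4 followed by a 160-bit mask.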
Interface discovery
At the time of writing, implementation of the following interfaces can be detected: ERC20, ERC165, ERC721, ERC721TokenReceiver, ERC721Metadata, ERC721Enumerable, ERC820, ERC223, ERC777, TokenFallback used by ERC223/ERC777 interfaces, as well as the common MultiSigWallet interface.
Eg, the contract below was identified as an ERC20 token implementation:
Function attributes
JEB does its best to retrieve:
- low-level state mutability attributes (pure, read-only, read-write)
- the high-level Solidity ‘payable’ attribute, reserved for public methods
Explicitly non-payable functions have lower-level synthetic stubs that verify that no Ether is being received. They REVERT if that is the case. If JEB decides to remove this stub, the function will always have an inline comment /* non payable */ to avoid any ambiguity.
The contract below shows two public methods, one has a default mutability state (non-payable); the other one is payable. (Note that the hash 0xFF03AD56 was not resolved, therefore the name of the method is unknown and was set to sub_AF; you may also see a call to the collect()’s bridge function __impl_collect(), as was mentioned in the previous section).
Storage variables
The pre-release decompiler ships with a limited storage reconstructor module.
- Accesses to primitives (int8 to int256, uint8 to uint256) are reconstructed in most cases
- Packed small primitives in storage words are extracted (eg, a 256-bit storage word containing 2x uint8 and 1x int32, and accessed as such throughout the code, will yield 3 contract variables, as one would expect to see in a Solidity contract)
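The packing example above (2x uint8 and 1x int32 in one storage word) can be illustrated with a short Python sketch. It assumes Solidity's usual layout, in which the first declared variable occupies the lowest-order bits of the word; the variable names a, b, c and both function names are hypothetical.

```python
def unpack_field(word: int, bit_offset: int, bit_size: int,
                 signed: bool = False) -> int:
    """Extract one packed primitive from a 256-bit storage word."""
    raw = (word >> bit_offset) & ((1 << bit_size) - 1)
    # Sign-extend when the top bit of a signed field is set.
    if signed and raw >> (bit_size - 1):
        raw -= 1 << bit_size
    return raw


def unpack_slot(word: int) -> dict:
    """Split a word holding 'uint8 a; uint8 b; int32 c;' (assumed layout)."""
    return {
        "a": unpack_field(word, 0, 8),
        "b": unpack_field(word, 8, 8),
        "c": unpack_field(word, 16, 32, signed=True),
    }
```

The shift-and-mask pairs in unpack_field are the low-level patterns that a storage reconstructor has to recognize in order to present three separate variables.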
However, currently, accesses to complex storage variables, such as mappings, mappings of mappings, mappings of structures, etc. are not simplified. This limitation will be addressed in the full release.
When a storage variable is not resolved, you will see simple “storage[…]” assignments, such as:
Due to how storage on Ethereum is designed (a key-value store of uint256 to uint256), Solidity internally uses two or more levels of indirection for computing actual storage keys. Those low-level storage keys depend on the position of the high-level storage variables. The KECCAK256 opcode is used to calculate intermediate and final keys. We will describe this mechanism in detail in a future blog post.
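As a hedged illustration of one such indirection, the sketch below follows Solidity's documented layout for a mapping with a value-type key: the storage key is the Keccak-256 hash of the 32-byte padded map key concatenated with the 32-byte position of the mapping variable. The function names are hypothetical, and the hash function is passed in as a parameter because Python's standard library has no Keccak-256 (hashlib.sha3_256 uses a different padding and gives different results).

```python
from typing import Callable


def mapping_key_preimage(key: int, slot: int) -> bytes:
    """Build the 64-byte buffer that gets hashed: pad32(key) || pad32(slot)."""
    return key.to_bytes(32, "big") + slot.to_bytes(32, "big")


def mapping_storage_key(key: int, slot: int,
                        keccak256: Callable[[bytes], bytes]) -> int:
    """Compute the storage key of mapping[key] for a mapping at 'slot'.

    'keccak256' must be supplied by the caller (e.g. from a third-party
    library); it is an assumption of this sketch, not a stdlib function.
    """
    return int.from_bytes(keccak256(mapping_key_preimage(key, slot)), "big")
```

For a mapping of mappings, the result of the first computation is used as the slot of the second one, which is where the additional levels of indirection come from.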
Precompiled contracts
Ethereum defines four pre-compiled contracts at addresses 1, 2, 3, 4. (Other addresses (5-8) are being reserved for additional pre-compiled contracts, but this is still at the ERC stage.)
JEB identifies CALLs that will eventually lead to pre-compiled code execution, and marks them as such in decompiled code: call_{specific}.
The example below shows the __impl_Receive (name recovered) method of the 34C3 CTF contract, which calls into address #2, a pre-compiled contract providing a fast implementation of SHA-256.
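For reference, the four pre-compiled contracts mentioned above are commonly known as ecrecover (1), sha256 (2), ripemd160 (3) and identity (4). The Python sketch below labels a call target accordingly and emulates the contract at address 2. The label format and function names are hypothetical; the exact suffixes JEB uses for call_{specific} are not reproduced here.

```python
import hashlib

# Addresses of the four pre-compiled contracts and their usual names.
PRECOMPILES = {1: "ecrecover", 2: "sha256", 3: "ripemd160", 4: "identity"}


def describe_call_target(address: int) -> str:
    """Return a call label for a target address (hypothetical format)."""
    name = PRECOMPILES.get(address)
    return "call_%s" % name if name else "call"


def emulate_sha256_precompile(input_data: bytes) -> bytes:
    """What a call into address 2 returns: the 32-byte SHA-256 digest."""
    return hashlib.sha256(input_data).digest()
```

A contract calling address 2 with some input bytes therefore gets back the same 32 bytes that emulate_sha256_precompile returns for that input.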
Ether send()
Solidity’s send can be translated into a lower-level call with a standard gas stipend and zero parameters. It is essentially used to send Ether to a contract through the target contract fallback function.
NOTE: Currently, JEB renders them as send(address, amount) instead of address.send(amount)
The contract below is live on mainnet. It is a simple forwarder, that does not store ether: it forwards the received amount to another contract.
Ether transfer()
Solidity’s transfer is an even higher-level variant of send that checks and REVERTs with data if CALL failed. JEB identifies those calls as well.
NOTE: Currently, JEB renders them as transfer(address, amount) instead of address.transfer(amount)
Event emission
JEB attempts to partially reconstruct LOGx (x in 1..4) opcodes back into high-level Solidity “emit Event(…)”. The event name is resolved by reversing the Event method prototype hash. At the time of writing, our dictionary contains more than 20,000 entries.
If JEB cannot reverse a LOGx instruction, or if LOG0 is used, then a lower-level log(…) call will be used.
NOTE: currently, the event parameters are not processed; therefore, the emit construct used in the decompiled code has the following form: emit Event(memory, size[, topic2[, topic3[, topic4]]]). topic1 is always used to store the event prototype hash.
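The rendering rules above can be mimicked with a small Python sketch: when topic1 is found in a dictionary of event prototype hashes, an emit statement is produced, otherwise a lower-level log(…) call is used. The single dictionary entry is the well-known hash of Transfer(address,address,uint256); the function name render_log and the hexadecimal formatting of arguments are hypothetical choices.

```python
# Stand-in for an event dictionary: prototype hash (topic1) -> event name.
KNOWN_EVENTS = {
    0xDDF252AD1BE2C89B69C2B068FC378DAA952BA7F163C4A11628F55A4DF523B3EF: "Transfer",
}


def render_log(mem_offset: int, size: int, topics: list) -> str:
    """Render a LOGx instruction as 'emit Event(...)' or 'log(...)'."""
    args = ["0x%X" % mem_offset, "0x%X" % size]
    name = KNOWN_EVENTS.get(topics[0]) if topics else None
    if name is None:
        # LOG0, or topic1 could not be reversed: keep every topic as-is.
        args += ["0x%X" % t for t in topics]
        return "log(%s)" % ", ".join(args)
    # topic1 is consumed by the event name; remaining topics are kept.
    args += ["0x%X" % t for t in topics[1:]]
    return "emit %s(%s)" % (name, ", ".join(args))
```

With the Transfer hash as topic1 and two more topics, this yields a string of the form emit Transfer(memory, size, topic2, topic3), matching the construct described in the note.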
API
The JEB API allows automation of complex or repetitive tasks. Back-end plugins or complex scripts can be written in Python or Java. The API update that ships with JEB 3.0-beta.6 allows users to query decompiled contract code:
- access to the intermediate representation (IR)
- access to the final Solidity-like representation (AST)
API use is out-of-scope here. We will provide examples either in a subsequent blog post or on our public GitHub repository.
Additional References
List of EVM opcodes that receive special translation: link (on GitHub)
Conclusion
As said in the introduction, if you are reverse engineering opaque contracts (that is, most contracts on Ethereum’s mainnet), we believe you will find JEB useful.
You may try the pre-release by downloading the demo here. Please send us your feedback: we are planning a full release before the end of the year.
As always, thank you to all our users and supporters. -Nicolas
Just trying to understand why one would need a decompiler for smart contracts when, along with the ABI, the smart contract code is also available on the Etherscan website.
Actually, the majority of active smart contracts – at least on Ethereum’s mainnet – are opaque: they ship with neither source nor ABI.
As an example, head over to Etherscan’s Top Accounts section. You will see plenty of contracts handling from hundreds of thousands to millions of dollars in ETH, without published source code.
Hi Nicolas, are you able to backtrace map() types key generation parameters with the JEB decompiler now?
I downloaded and installed the demo version.
But I don’t see the option:
“Open Smart Contract”
What do I need to do for it to appear?
Thanks
It would be “File, Download Ethereum Contract…”. I will update the post, thanks.