Solidity is the smart contract language of the Ethereum blockchain. It gets compiled into bytecode by the solc compiler. As one might expect, the compiled bytecode is intended to be executed by a computer, or rather by the Ethereum Virtual Machine (EVM) distributed across all of the nodes participating in the Ethereum blockchain. As bytecode, it lacks the context of the original source code that would make it human readable.
If all we have is the compiled Solidity bytecode of a smart contract, how do we know what it does? If there’s documentation about what it does, great. But what if it’s missing, incomplete, or we don’t trust it? We can try running the smart contract, perhaps in a sandboxed environment, with various inputs and observe the outputs, but many smart contracts are complex and linked to other smart contracts or hard coded Ethereum addresses.
Here’s a very simple example of Solidity, Test1.sol, based on the example in the evmdis README:
And here it is assembled into bytecode with solc, the Solidity compiler:
We could try disassembling the bytecode, using a tool to translate each opcode into a human readable instruction. The result would still be fairly opaque, with auto-generated variable names, and with concise, higher-level constructs like loops and branches expanded into a verbose long form of assembly instructions.
Not particularly easy to read.
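To make the idea of a plain opcode-by-opcode disassembly concrete, here is a minimal Python sketch. It is not the tool used in this post: the `disassemble` function and its small `OPCODES` table are illustrative names of our own, and only a subset of EVM opcodes is covered. Anything outside that subset is printed as `UNKNOWN`.

```python
# Minimal linear EVM disassembler. Illustrative only: the opcode table is a
# small subset and the function name is hypothetical, not part of any tool.
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x03: "SUB", 0x04: "DIV",
    0x10: "LT", 0x11: "GT", 0x14: "EQ", 0x15: "ISZERO", 0x16: "AND",
    0x34: "CALLVALUE", 0x35: "CALLDATALOAD", 0x36: "CALLDATASIZE",
    0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE", 0x54: "SLOAD",
    0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST",
    0xF3: "RETURN",
}


def disassemble(hex_code):
    """Return a list of (offset, text) pairs for ASCII hex bytecode."""
    text = hex_code.strip()
    if text.startswith("0x"):
        text = text[2:]
    code = bytes.fromhex(text)

    listing = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32 carry 1..32 bytes of immediate data.
            size = op - 0x5F
            operand = code[pc + 1:pc + 1 + size]
            listing.append((pc, "PUSH%d 0x%s" % (size, operand.hex())))
            pc += 1 + size
            continue
        if 0x80 <= op <= 0x8F:
            name = "DUP%d" % (op - 0x7F)
        elif 0x90 <= op <= 0x9F:
            name = "SWAP%d" % (op - 0x8F)
        else:
            name = OPCODES.get(op, "UNKNOWN_0x%02x" % op)
        listing.append((pc, name))
        pc += 1
    return listing


for offset, text in disassemble("6060604052"):
    print("%04x  %s" % (offset, text))
```

Even on five bytes of input, the output is one line per instruction with no structure, which is why a longer contract quickly becomes hard to follow.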
Let’s say we had the source code to a smart contract, and wanted to see if it was the same as or similar to a compiled smart contract already on the blockchain. We could compile the source code to bytecode and perform a byte-by-byte comparison. This is unlikely to work in practice, as compiler output tends to change between versions as optimizations are introduced. Even across two different executions on the same system, the resulting compiled bytecode can be different. Solidity has a goal of being deterministic, but hasn’t always been. In the case of The DAO, an extensive effort was undertaken to validate that the deployed DAO bytecode matched the source code, and compiler non-determinism made that effort challenging.
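The byte-by-byte comparison itself is simple. As a rough sketch (the helper name `first_difference` is ours, and both inputs are assumed to be ASCII hex strings without a `0x` prefix), it could look like this:

```python
def first_difference(hex_a, hex_b):
    """Return the byte offset where two bytecode strings first differ.

    Returns None when the two are identical. If one input is a strict
    prefix of the other, the offset is the length of the shorter one.
    """
    a = bytes.fromhex(hex_a)
    b = bytes.fromhex(hex_b)
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


print(first_difference("6060604052", "6060614052"))
```

A check like this only answers "identical or not". It says nothing about how similar two contracts are once a single byte shifts, which is the limitation the rest of this post works around.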
It’s worth noting that Etherscan has a handy online facility to verify deployed smart contract bytecode vs. the source code. It allows selection of the full range of solc tags to try compiling against.
For offline smart contract verification, Nick Johnson’s evmdis is a Solidity bytecode disassembler which takes a slightly different approach to disassembly. It implements a static analysis technique called abstract interpretation, which simulates the execution of a sample of Solidity bytecode (here’s another useful tutorial on the subject [PDF]). The evmdis readme has a good summary, but essentially it runs the program and tracks unique permutations of the program’s stack. It also breaks the program into logical basic blocks and translates series of simple expressions into compound ones. The output is a more concise series of assembly instructions and jump labels organized into logical blocks, a kind of “summarized assembly” that a human can more easily analyze and reason about.
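To give a feel for just one of those steps, splitting bytecode into basic blocks, here is a small Python sketch. It is our own simplification and not evmdis’s code: it does no stack tracking or expression folding, and the `basic_blocks` name and `TERMINATORS` set are assumptions made for illustration. A new block starts at offset 0, at every `JUMPDEST`, and right after any instruction that ends control flow.

```python
# Opcodes after which straight-line execution cannot simply continue:
# STOP, JUMP, JUMPI, RETURN, REVERT, INVALID, SELFDESTRUCT.
TERMINATORS = {0x00, 0x56, 0x57, 0xF3, 0xFD, 0xFE, 0xFF}
JUMPDEST = 0x5B


def basic_blocks(code):
    """Return the sorted start offsets of basic blocks in raw bytecode."""
    starts = set()
    if code:
        starts.add(0)
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            starts.add(pc)
        # PUSH1..PUSH32 are followed by 1..32 bytes of data that must be
        # skipped, so a 0x5b inside push data is not mistaken for JUMPDEST.
        width = 1 + (op - 0x5F if 0x60 <= op <= 0x7F else 0)
        pc += width
        if op in TERMINATORS and pc < len(code):
            starts.add(pc)
    return sorted(starts)


# PUSH1 0x04, JUMP, STOP, JUMPDEST, STOP
print(basic_blocks(bytes.fromhex("600456005b00")))
```

Once the code is cut into blocks like this, each block can be analyzed and summarized on its own, which is what makes the final output readable.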
We learned a few lessons from playing around with this. First, there are some crucial options to pass to the Solidity compiler, solc, to make things work:
If we had provided just the --bin flag instead of --bin-runtime, solc would have automatically wrapped our smart contract with code to load the smart contract itself onto the blockchain. If we then tried to simulate execution of this bytecode with evmdis, we wouldn’t get any useful output, because we would just end up running the ‘loader’ code and not the contained smart contract.
The --optimize flag is like the -O flag to gcc: it tries to optimize the compiled code.
The -o . flag specifies the current directory as the output directory. Otherwise the output would go to stdout, along with some extra text that we’d need to strip off.
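If you want to script this step, the three flags described above can be assembled in a short Python wrapper. This is only a sketch: the function names are ours, and it assumes a `solc` binary is available on the PATH.

```python
import subprocess


def solc_command(source_path, out_dir="."):
    """Build the solc invocation using the flags discussed above."""
    return ["solc", "--bin-runtime", "--optimize", "-o", out_dir, source_path]


def compile_runtime(source_path, out_dir="."):
    """Run solc; raises CalledProcessError if compilation fails.

    Assumes a solc binary is installed and on the PATH.
    """
    subprocess.run(solc_command(source_path, out_dir), check=True)


print(" ".join(solc_command("Test1.sol")))
```

Keeping the command construction separate from the call makes it easy to swap --bin-runtime for --bin and see the difference for yourself.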
Next we download and install evmdis:
evmdis expects the raw hex data (ASCII base-16 representation of Solidity bytecode) output by solc to be piped to it.
Still not super compact but more so than the raw decoded assembly.
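The piping step can also be driven from Python. The sketch below assumes the evmdis binary is on the PATH under the name `evmdis` and needs no extra flags for runtime bytecode; both of those are assumptions on our part, and the helper names are hypothetical. The `normalize_hex` helper strips whitespace and an optional `0x` prefix so that only raw hex is sent to the tool.

```python
import subprocess


def normalize_hex(text):
    """Strip whitespace and an optional 0x prefix; validate the hex."""
    cleaned = "".join(text.split())
    if cleaned.startswith("0x"):
        cleaned = cleaned[2:]
    bytes.fromhex(cleaned)  # raises ValueError if this is not valid hex
    return cleaned


def run_evmdis(hex_code, binary="evmdis"):
    """Pipe hex bytecode to evmdis on stdin and return its stdout.

    Assumes the binary name and that no flags are required.
    """
    result = subprocess.run(
        [binary],
        input=normalize_hex(hex_code),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(normalize_hex(" 0x6060\n6040 52\n"))
```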
Let’s try making a trivial change to our test smart contract, compiling and comparing it. We’ll put this in Test2.sol.
Not very enlightening, although we can see there are differences. Let’s try diffing the evmdis output.
The difference here is fairly clear. It’s worth noting that at this point we were getting tired of piping, mv’ing and diffing files. We extended evmdis so you can now just do this:
Feel free to play around with our modifications. You can pass in Solidity, which will automatically be compiled with solc, or bytecode in ASCII hex format. It retains the original evmdis functionality of parsing stdin, if desired. If one input is provided, it will be disassembled and displayed. If two inputs are provided, they will be disassembled and compared. Optionally, the bytecode and source Solidity (if available) can be compared as well.
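The display-or-compare behavior is easy to approximate with Python’s standard `difflib` module. The following is not our modified evmdis, just a sketch of the same idea: `show_or_compare` is a hypothetical name, and the inputs are assumed to be disassembly listings that have already been produced as text.

```python
import difflib


def show_or_compare(listings):
    """Return lines to display for one or two disassembly listings.

    One listing is returned as-is; two listings are returned as a
    unified diff, which is empty when they are identical.
    """
    if len(listings) == 1:
        return listings[0].splitlines()
    if len(listings) == 2:
        first, second = listings
        return list(difflib.unified_diff(
            first.splitlines(),
            second.splitlines(),
            fromfile="first",
            tofile="second",
            lineterm="",
        ))
    raise ValueError("expected one or two listings")


for line in show_or_compare(["PUSH1 0x01\nMSTORE", "PUSH1 0x02\nMSTORE"]):
    print(line)
```

Diffing the summarized listing rather than the raw hex is what keeps unrelated byte shifts from drowning out the real change.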
We also added the ability to pass in an Ethereum smart contract address, which will download the bytecode and use that as an input. However, we are not making these modifications public, as they rely on scraping a third-party website and we don’t want to be a source of noise for them.
It does allow us to come back to something we mentioned earlier in this post: comparing the deployed DAO smart contract to its source code.
For The DAO, we can use an earlier version of solc with our modified version of evmdis:
We can see from the output that there are a lot of changes, across a total of 5540 lines of disassembled code. This is a lot to look at, but far less than the raw disassembly:
We’re going to leave things here for now. One challenge you might have in testing out evmdis or solc against existing Solidity source code is finding a runnable copy of a particular solc version. We spent a lot of time getting solc 0.3.2 working as part of this exercise, and it’s the basis of our next post.