🔥 Core Concepts
Overview
greed relies on analyses provided by Gigahorse – a fantastic analysis framework for EVM-based smart contracts developed by Dedaub. While Gigahorse offers precise CFG reconstruction and various out-of-the-box data-flow analyses (such as storage layout partial reconstruction, tainted calls, etc.), it lacks the capability to perform classic symbolic execution of a contract's code. For example, it cannot automatically discover the value of CALLDATA needed to reach a specific instruction. Additionally, we found using Datalog rules (or via the produced artifacts) to be less user-friendly when trying to quickly prototype research ideas. This led to the creation of greed!
When designing greed, we wanted to provide both a convenient Python wrapper for Gigahorse's analyses and symbolic execution capabilities similar to the popular tool Mythril. Unlike Mythril, we don't offer a one-click solution to discover smart contract vulnerabilities; instead, we provide a powerful and user-friendly smart contract binary analysis platform (those familiar with angr will find many similarities). That being said, one can implement all the analyses offered by Mythril on top of greed.
We believe that greed provides a more flexible way to prototype research tools and smart contract analyses. This is accomplished through (1) a modular (angr-inspired) design of the symbolic executor, (2) the inherited precision of the CFG analysis offered by Gigahorse, and (3) the conciseness of the TAC-IR (intermediate representation) that strips away the unnecessary complexities of the stack-based EVM design.
To use greed, it is first necessary to analyze a target contract binary with Gigahorse, obtaining all the decompilation artifacts and CFG reconstruction. If you are just getting started with greed, please refer to our quick start guide!
Three Address Code
Our symbolic executor, greed, is based on Gigahorse's T(hree) A(ddress) C(ode) intermediate representation – TAC-IR: a convenient IR that transforms the (EVM) stack-based opcodes into a register based representation.
For instance, the classic function transferFrom
in TAC would look like this:
function transferFrom(address,address,uint256)() public {
Begin block 0x1ca
prev=[], succ=[0x1d1, 0x1d5]
=================================
0x1cb: v1cb = CALLVALUE
0x1cc: v1cc = ISZERO v1cb
0x1cd: v1cd(0x1d5) = CONST
0x1d0: JUMPI v1cd(0x1d5), v1cc
Begin block 0x1d1
prev=[0x1ca], succ=[]
=================================
0x1d1: v1d1(0x0) = CONST
0x1d4: REVERT v1d1(0x0), v1d1(0x0)
...
One thing to keep in mind when using greed is: every variable defined by the TAC IR will become a virtual register. For instance, when you symbolically execute the transferFrom
function, you can access to content of the variable v1cb
like this:
CALLVALUE
.
Project
To create a greed Project you will just:
Where target_dir
is the path to the folder holding the Gigahorse analyses artifacts.
The project will import in greed some of the analysis results generated by Gigahorse (e.g., functions, blocks, and statements), and will make them available through handy object attributes:
>>> proj.code # raw bytecode
>>> proj.function_at # dictionary of all the functions in the contract
>>> proj.block_at # dictionary of all the basic block in the contract
>>> proj.statement_at # dictionary of all the (TAC) opcode in the contract
>>> proj.callgraph # callgraph of the entire contract code
CFG
Gigahorse provides state-of-the-art CFG reconstruction for smart contract binaries. This information is automatically imported and available via the greed's API.
Every basic block in the CFG is terminated by an unconditional jump (i.e., JUMP
), a conditional jump (JUMPI
), or a terminating opcode (STOP
/REVERT
/RETURN
).
To access the CFG information of specific functions in the contract you can do the following:
>>> proj.function_at['0x0'].cfg
>>> proj.function_at['0x0'].cfg.dump("mycfg.dot") # this will dump the CFG in .dot format
>>> proj.function_at['0x0'].cfg.bbs # this will return a list of block objects belonging to the CFG
Similarly, to access a specific Block
or Statement
information:
The Block
and Statement
objects, once fetched, offer numerous other attributes for advanced inspection.
State
Similary to angr, a SimState (i.e., State
) represents the state of the smart contract at a specific opcode. You can imagine the State as a snapshot of the execution at a specific program location.
You can create your first state from a project using the Factory
object:
>>> entry_state = proj.factory.entry_state(xid=1) # the xid represent an unique identifier for the current symbolic execution.
When no initial context is specified, every environment variable (e.g., balance, calldata, calldata size, block, etc...) will be an unconstrained symbolic variable. Conversely, you can pass an initial context by passing a dictionary and specifying some concrete values:
>>> ctx = {
"CALLDATA" : "0x001122330099aabb77ff00",
"CALLDATASIZE" : 11,
"CALLER" : "0x6b6Ae9eDA977833B379f4b49655124b4e5c64086",
"ORIGIN" : "0x6b6Ae9eDA977833B379f4b49655124b4e5c64086",
"ADDRESS" : "0x1f9840a85d5aF5bf1D1762F925BDADdC4201F984",
}
>>> entry_state = proj.factory.entry_state(xid=1,init_ctx=ctx)
This will set the the CALLDATA for the symbolic execution to the value 0x001122330099aabb77ff00
, a CALLDATASIZE of 11 bytes, ADDRESS
of the currently executing contract to 0x1f9840a85d5aF5bf1D1762F925BDADdC4201F984
and CALLER
/ORIGIN
to 0x6b6Ae9eDA977833B379f4b49655124b4e5c64086
.
A state offers many API for its inspection:
>>> entry_state.curr_stmt # print the current statement ready for execution
>>> entry_state.revert # weather the state reverted or not
>>> entry_state.instruction_count # how many instructions have been executed up to this point
>>> entry_state.pc # current program counter
>>> entry_state.registers # overview of all the virtual registers defined during the execution up to this point
Symbolic Execution
Symbolic execution is the process of progressing SimState(s) and propagating the symbolic variables according to the program's execution.
Imagine, for instance, that the condition of a JUMPI
contains a symbolic variable X.
If the condition checks for X != 0
, and the constraints over the variable X make it possible for X != 0
and X == 0
, greed will create two different State(s), one at the location specified by the JUMPI (adding a constraint of X != 0
to the state), and one at the fallthrough location (adding a constraint of X == 0
).
Simulation Manager
To begin symbolic execution you need to instantiate a Simulation Manager
.
The simulation manager is a concept borrowed from angr, used to orchestrate the progression of the SimState(s).
Exploration Techniques can be used to provide a strategy to explore the states of the program. For instance, an exploration technique could be the classic breadth-first search (BFS) or depth-first search (DFS), but also more complicated strategies such as Directed Search.
You can instantiate a simulation manager via the Factory in this way:
By default, the employed exploration technique will be BFS. You can change this behavior by installing new exploration techniques that implement the Exploration Technique
object interface. For example:
>>> from greed.exploration_techniques import DirectedSearch
>>> directed_search = DirectedSearch(proj.statement_at['0x1e'])
>>> simgr.use_technique(directed_search)
Finally, to start the exploration:
The simulation manager operates using stashes
. You can imagine a stash as a "bucket" of states generated by the simulation manager during symbolic execution. For example, the stash active
contains all the states that are currently "active" in the simulation manager.
The simulation manager will process (or "step") every state in the active
stash individually to create a successor state.
If the state has no successors, the simulation manager will move the state to the deadended
stash.
If the execution instead terminates with a REVERT
opcode, the state will be moved to the errored
stash.
Exploration techniques can defined new stashes to implement more complicated program exploration strategies.
Solver
The default solver employed by greed is Yices2 with QF_ABV
logic.
We made this decision based on the impressive results obtained in the latest SMT solving competition, and based on the results showed by Frank et al. in ETHBMC.
That being said, greed offers a modular architecure and implementing support for a new solver backend is quite straigthforward.
By default, we give unlimited time to the solver to solve the constraints. However, it can be sometimes helpful to timeout after a certain amount of time. You can do this by using the option SOLVER_TIMEOUT
. Keep in mind that, when the timeout fires, the state will be reported as errored
.