[EthMag] Proposed scheme for encoding Ethereum state into a Verkle tree

Cryptocurrency News and Public Mining Pools

[EthMag] Proposed scheme for encoding Ethereum state into a Verkle tree

[EthMag] Proposed scheme for encoding Ethereum state into a Verkle tree

Link: https://ethereum-magicians.org/t/proposed-verkle-tree-scheme-for-ethereum-state/5805

One of the things that I want to do at the same time as moving the current Ethereum state to a Verkle tree to cut witness sizes is to move it from the current 2-layer "trie inside a trie" structure to a single-layer structure.

You may have seen diagrams of the Ethereum state like this before:

https://preview.redd.it/paa4fwmlk3471.png?width=1488&format=png&auto=webp&s=c2bd8d1c96c15f24bfca6fbef65ab11b4134f469

The state consists of a set of accounts, where each account contains a header. Each header in turn contains a pointer to a piece of code, and a root of another whole account-specific tree. This structure has a few really annoying problems:

  1. It's complicated. There's a lot of extra code that needs to be written to handle the tree-inside-a-tree mechanism, as well as the special cases for accessing things in the account header that are not storage (ie. nonce/balance/code). This complexity appears in many places: database reads/writes, Merkle proof construction, Merkle proof verification, syncing, caching…
  2. It's unbalanced: there are individual subtrees that have a huge amount of data in them. This is annoying for state sync protocols, because you can't make any assumption that what looks like a "small portion" of the state (eg. all objects starting with 0x7A5) will actually be small
  3. It doubles worst-case tree attack lengths
  4. It becomes even more complicated to figure out the interplay between state and mechanisms such as state expiry. Anything that manipulates the state piece-by-piece would need separate logic for account-level and storage-slot-level manipulations

I propose an alternative, which fits the entire Ethereum state into a single key:value trie. That is, every position in the state (eg. (address, storage_slot), (address, NONCE), (address, code_chunk_index)) gets mapped to a single 32-byte key, which is where it is stored in the trie. As a witness-size-optimization feature, the trie is designed in such a way that the last byte of the key is always a separate commitment (Verkle trees are made up of commitments, where each commitment can have many children, in our case up to 256). Here's an example of how this might look:

https://preview.redd.it/0uqek39cp3471.png?width=700&format=png&auto=webp&s=5978af80832b143d0d5caf7085a92c37612a645c

There is no "trie of tries" design, and there is also no special serialization for a header; instead, header and code are both mapped to keys, just like storage. Values which share the first 31 bytes of their key are put into the same bottom-layer commitment; this saves witness space for common use cases (verifying many header fields or code chunks or adjacent storage slots). Note that accounts will no longer be "under one subtree" the way they are today; different parts of the storage of an account will be in different locations in the state trie.

For the sake of illustration, here's part of the code in my proposal, which shows how values inside addresses will get mapped to trie keys:

def get_tree_key(address: Address, tree_index: int, sub_index: int): return ( hash(address + tree_index.to_bytes(32, 'big'))[:31] + bytes([sub_index]) ) def get_tree_key_for_balance(address: Address): return get_tree_key(address, 0, 1) def get_tree_key_for_nonce(address: Address): return get_tree_key(address, 0, 2) def get_storage_slot_tree_key(address: Address, storage_key: int): if storage_key < 64: return get_tree_key(address, 0, 64 + storage_key) else: return get_tree_key(address, 2**248 + storage_key // 256, storage_key % 256) 

Notice in particular how values in the header get mapped to trie keys that share the first 31 bytes, so they are part of the same bottom-level commitment, and adjacent storage slots (and code too) benefit from the same treatment.

Additionally, storage slots 0…63 have the privilege of being part of the same bottom-level commitment as the header, so witness size for accessing these frequently used slots is really low. Gas costs (see next section) will be adjusted to take advantage of this; the current proposal charges a flat 200 gas per slot for these first 64 slots.

Witness gas cost reform

An important reform that needs to happen alongside this is gas cost reform, finishing the work that was started in EIP 2929 by making gas costs closely map to witness data usage in all cases. This allows us to ensure tight witness size bounds for Ethereum blocks (even including worst-case attacks on the trie that require 2**80 computing power, we're looking at ~6 MB witnesses; under more "normal" circumstances witness size will be <2 MB).

A necessary part of this is introducing gas costs for accessing code chunks (see this proto-EIP; the current plan is ~200 gas per 31-byte chunk). That proto-EIP goes further, however, and harmonizes gas costs more generally so that they align with this trie structure: 1900 gas for a new branch proof, and 200 gas for a new value in the same bottom-level commitment.

Some preliminary analysis of code chunk gas costs have shown that this may increase contract calling cost by ~10%. However, this new gas scheme will also introduce some important benefits for developers. A notable benefit is that gas costs for accessing many adjacent storage slots (eg. single persistent variables in Solidity are typically mapped to storage slots 0, 1, 2….n) will drop from ~2100 per storage slot to ~200 per storage slot after the first. Hence, well-designed contracts could easily get savings that make up for the increased cost of code access. Additionally, the per-chunk and per-subtree costs mean that contracts that call to libraries will no longer be much less efficient than large contracts that keep all their code together in one piece.

submitted by /u/vbuterin
[link] [comments]