Geth v1.13 follows closely on the heels of the v1.12 release, even though its main feature has been in development for six years. The new version introduces a new database model for storing the Ethereum state, which is faster and supports proper pruning. Junk no longer accumulates on disk, and offline pruning is no longer needed.
A note on our measurements: the reported database sizes exclude approximately 589GB of ancient data, which is the same across all configurations. A full sync using the hash scheme exceeded our 1.8TB SSD at around block 15.43M. The difference in size compared to snap sync is due to compaction overhead.
The new data model was implemented because the old way of storing the Ethereum state did not allow for efficient pruning. The previous approach relied on a variety of hacks and tricks to slow the accumulation of junk, but these were never a satisfactory solution. Enabling proper pruning required several changes to Geth’s codebase.
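In the new data model, state trie nodes are stored on disk keyed by their path, not their hash. This has two consequences. First, nodes with the same content but different paths are now stored separately, whereas hash-based keys deduplicated them. Second, storing multiple state tries in the database introduces a different form of deduplication: a node written at a path that is already in use overwrites the node stored there. As a result, the database can contain only one state trie at a time, and it is this overwriting that makes pruning work. Previously, with hash-based keys, nodes from older tries stayed in the database alongside the current ones, which made pruning difficult.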
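The difference between the two keying schemes can be illustrated with a toy key-value store. This is a minimal sketch, not Geth's actual code: `nodeStore`, `putByHash`, and `putByPath` are hypothetical names, the paths are made up, and SHA-256 from the standard library stands in for the Keccak-256 hash that Ethereum uses.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// nodeStore is a toy stand-in for the key-value database holding trie nodes.
type nodeStore map[string][]byte

// putByHash stores a node under the hash of its content (old scheme, simplified).
// SHA-256 is an assumption made to stay within the standard library.
func putByHash(db nodeStore, content []byte) {
	h := sha256.Sum256(content)
	db[string(h[:])] = content
}

// putByPath stores a node under its path in the trie (new scheme, simplified).
func putByPath(db nodeStore, path string, content []byte) {
	db[path] = content
}

func main() {
	hashDB, pathDB := nodeStore{}, nodeStore{}

	// The node at path "a/b" is written, then updated by a later block.
	putByHash(hashDB, []byte("balance=1"))
	putByHash(hashDB, []byte("balance=2"))
	putByPath(pathDB, "a/b", []byte("balance=1"))
	putByPath(pathDB, "a/b", []byte("balance=2"))

	fmt.Println("hash-keyed entries:", len(hashDB)) // 2: the stale node is still there
	fmt.Println("path-keyed entries:", len(pathDB)) // 1: the old node was overwritten
}
```

In the hash-keyed store the superseded node lingers until something explicitly deletes it, while in the path-keyed store the update itself removes it.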
To handle potential side-chain switches and optimize performance, Geth’s persistent state does not track the chain head directly. Instead, the trie changes made in the last 128 blocks are kept in memory as diff layers. If there are multiple competing branches, all of them are tracked in memory, and as the chain moves forward, the oldest diff layer is flattened down into the persistent state. This allows for fast reorganizations within the top 128 blocks. Geth also has a dirty cache between the persistent state and the diff layers, which accumulates writes and reduces disk usage.
For reorganizations deeper than the in-memory diff layers can cover, Geth introduces reverse diffs. Each reverse diff allows converting the post-state of a block back to its pre-state. The last 90,000 reverse diffs are stored on disk, and Geth can use them to roll the persistent state back, switch to a different side-chain, and process blocks on top if needed.
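The idea of a reverse diff can be sketched in a few lines: while applying a block's changes, record what each touched path held beforehand, so the block can later be undone. This is an illustrative sketch only. The names `prev`, `reverseDiff`, `apply`, and `rollback` are hypothetical, the state is a plain map rather than a trie, and the 90,000-entry retention limit is not modeled.

```go
package main

import "fmt"

// prev records what a path held before a block was applied.
type prev struct {
	value   string
	existed bool
}

// reverseDiff maps each path touched by a block to its pre-block content.
type reverseDiff map[string]prev

// apply writes a block's changes into state and returns the reverse diff
// needed to undo them.
func apply(state, changes map[string]string) reverseDiff {
	rd := reverseDiff{}
	for path, node := range changes {
		old, ok := state[path]
		rd[path] = prev{value: old, existed: ok}
		state[path] = node
	}
	return rd
}

// rollback converts the post-state of a block back into its pre-state.
func rollback(state map[string]string, rd reverseDiff) {
	for path, p := range rd {
		if p.existed {
			state[path] = p.value
		} else {
			delete(state, path)
		}
	}
}

func main() {
	state := map[string]string{"a/b": "v1"}

	rd := apply(state, map[string]string{"a/b": "v2", "c/d": "new"})
	fmt.Println(len(state), state["a/b"]) // 2 v2

	rollback(state, rd)
	fmt.Println(len(state), state["a/b"]) // 1 v1
}
```

Applying a sequence of such reverse diffs newest first would walk the state back block by block, after which blocks from another branch could be processed on top.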
Given the significant changes in Geth’s internals, Geth v1.13.0 offers two modes of operation. The old data model is still supported and remains the default. Users can switch to the new data model by resyncing the state (ancient data can be kept). To do this, either resync manually or run “geth removedb” and, when asked, delete the state database while keeping the ancient database. Afterward, start Geth with the “--state.scheme=path” flag. If a previous database exists and no state scheme is specified, Geth will use the scheme of the existing database. Even so, it is recommended to always specify “--state.scheme=path” explicitly, to be on the safe side. If no major issues are found with the path model implementation, Geth v1.14.x will likely switch to it as the default format.
For those running private Geth networks using “geth init”, it is necessary to specify “--state.scheme” for the init step as well, to avoid creating an old-style database. For archive node operators, the new data model is compatible with archive nodes.
Overall, the new data model and its built-in pruning make Geth v1.13 a significant step forward for Ethereum state storage: junk no longer accumulates on disk, and offline pruning is no longer needed.