Discussion on KIP-237

Summary of KIP-237

Over time, node operators face growing challenges with disk storage as blockchain data steadily consumes more space, often forcing costly migrations to larger disks. Kaia has focused on optimizing storage, particularly for the state trie, by introducing two methods: state migration and live pruning. KIP-237 now extends these efforts to further reduce storage consumption, targeting three additional data types: header, body, and receipt. A preliminary experiment on a full mainnet node showed that KIP-237 can cut total disk usage by 50% within just three days of background compression.

Technical Outlook

The key idea behind reducing storage for headers, bodies, and receipts is compression. Unlike the state trie, these components are retained in full by every node type, and older data is accessed less and less frequently over time. KIP-237 addresses this by compressing cold data at runtime without causing downtime. When an API call such as klay_getTransactionByHash requests compressed data, the data is decompressed on demand and temporarily cached for faster subsequent access.
For frequently accessed data, such as recent blocks, the compression module bypasses compression to ensure fast API responses.
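As a rough illustration of that read path, consider the Go sketch below. It is not the actual Kaia implementation; the chunkStore type, its in-memory hot/cold maps, the choice of gzip, and the readBody helper are all assumptions made purely to show the cold-data-decompressed-on-demand-and-cached pattern described above.

```go
package compression

import (
	"bytes"
	"compress/gzip"
	"errors"
	"io"
	"sync"
)

var errNotFound = errors.New("record not found")

// chunkStore is a hypothetical view of the chain database: recent ("hot")
// records are kept uncompressed, older ("cold") records are stored as
// gzip-compressed blobs. Illustrative only, not the KIP-237 layout.
type chunkStore struct {
	hot   map[uint64][]byte // recent blocks, stored as-is for fast access
	cold  map[uint64][]byte // older blocks, compressed in the background
	cache sync.Map          // decompressed cold records, cached temporarily
}

// readBody returns the block body for the given number. Hot data is
// returned directly; cold data is decompressed on demand and cached so
// repeated queries (e.g. via klay_getTransactionByHash) stay fast.
func (s *chunkStore) readBody(number uint64) ([]byte, error) {
	if raw, ok := s.hot[number]; ok {
		return raw, nil // recent block: compression is bypassed
	}
	if cached, ok := s.cache.Load(number); ok {
		return cached.([]byte), nil // decompressed earlier, served from cache
	}
	compressed, ok := s.cold[number]
	if !ok {
		return nil, errNotFound
	}
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	raw, err := io.ReadAll(zr)
	if err != nil {
		return nil, err
	}
	s.cache.Store(number, raw) // keep the decompressed copy for future reads
	return raw, nil
}
```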

Implication

When a node starts with the compression flag enabled, it operates normally while compression runs in the background. Throughout the process, all data remains fully accessible without downtime.
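To make the "no downtime" point concrete, the hypothetical chunkStore from the sketch above could gain a background pass like the one below. The retention window, batch strategy, and helper names are assumptions for illustration; a real node would run this incrementally over the on-disk database with proper locking, which is omitted here for brevity.

```go
// compressBlock moves one block from the hot area into the cold,
// gzip-compressed area. Reads via readBody keep working before, during,
// and after the move, which is what "no downtime" means in practice.
func (s *chunkStore) compressBlock(number uint64) error {
	raw, ok := s.hot[number]
	if !ok {
		return nil // already compressed or not present
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		return err
	}
	if err := zw.Close(); err != nil {
		return err
	}
	s.cold[number] = buf.Bytes()
	delete(s.hot, number)
	return nil
}

// compressOldBlocks compresses every hot block older than the retention
// window below the current head. A node would run this in the background,
// a batch at a time, while continuing to serve API requests normally.
func (s *chunkStore) compressOldBlocks(head, retention uint64) error {
	if head <= retention {
		return nil
	}
	for number := range s.hot {
		if number < head-retention {
			if err := s.compressBlock(number); err != nil {
				return err
			}
		}
	}
	return nil
}
```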

How To Use

Similar to live pruning, new flags have been introduced; the execution details will be outlined in KIP-237.

FAQ

  • Q: Is old data deleted?
    A: No, KIP-237 is a lossless optimization. All data is preserved and restructured using compression.
  • Q: Can I configure compression options such as the algorithm, chunk size, and retention?
    A: No, these options are currently fixed based on preliminary experiments for generally optimal usage.
  • Q: Is it possible to fully recover compressed data as if compression had never been applied?
    A: Not yet supported, but open for discussion with reasonable justification.
  • Q: Can I enable or disable the compression feature at any time?
    A: Yes, you can toggle it on or off whenever needed.
  • Q: Does compression require higher system specifications than the recommended requirements?
    A: No, the compression overhead is negligible, so your existing machine should be sufficient.

Idea

The Kaia team is dedicated to continually reducing storage usage and welcomes contributions and innovative ideas.


This would reduce node storage costs with minimal impact on operation.

While fixed options simplify implementation, allowing some degree of configurability (e.g., algorithm selection or chunk size) could cater to diverse use cases and hardware setups. For example, operators with high-performance machines might prefer more aggressive compression settings.

The inability to fully recover compressed data as if compression had never been applied could be a limitation for certain use cases, such as forensic analysis or debugging. Adding this capability in future iterations would enhance the flexibility and appeal of the proposal.

Compression introduces additional processing steps, which could potentially expose vulnerabilities (e.g., decompression bombs). Ensuring robust security measures during implementation will be critical.
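For reference, one common mitigation for decompression bombs, independent of whichever algorithm KIP-237 ultimately uses, is to cap how much data the decompressor is allowed to emit. The sketch below uses gzip and an arbitrary size limit purely as an assumption; it is not the KIP-237 code.

```go
package safety

import (
	"bytes"
	"compress/gzip"
	"errors"
	"io"
)

// errTooLarge is returned when the decompressed output exceeds the cap,
// which is the typical symptom of a decompression bomb.
var errTooLarge = errors.New("decompressed data exceeds size limit")

// decompressBounded inflates gzip data but refuses to emit more than
// maxSize bytes, so a maliciously crafted record cannot exhaust memory.
func decompressBounded(compressed []byte, maxSize int64) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	// Read at most maxSize+1 bytes: seeing the extra byte means the
	// payload is over the limit.
	limited := io.LimitReader(zr, maxSize+1)
	raw, err := io.ReadAll(limited)
	if err != nil {
		return nil, err
	}
	if int64(len(raw)) > maxSize {
		return nil, errTooLarge
	}
	return raw, nil
}
```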

Hi @Zarilex,

[Configuration space]
The suggested argument makes sense. However, we are concerned about the potential misuse of compression configurations. In some software and libraries, the available options are intentionally limited to prevent misbehavior by users without a deep understanding of the underlying mechanisms, so only general settings are provided. If more control is needed, a manual patch could be a viable approach.

[Revert]
Compressed data retains the original information but appears in a different format before decompression. From a forensic analysis perspective, the decompressed data format is not significantly problematic since it is merely another representation of the original data. For debugging purposes, I believe working directly with the raw (compressed) data—either through manual code patches or third-party libraries—is preferable. In this case, querying a specific range of blocks to collect decompressed data follows a similar approach and may be more convenient.
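As a concrete example of that approach, a short client can walk a block range over JSON-RPC and collect the decompressed results returned by the node. The endpoint URL and block range below are placeholders, and the snippet assumes a locally reachable Kaia HTTP RPC endpoint.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// rpcRequest is a minimal JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
	ID      int           `json:"id"`
}

// fetchBlock asks the node for one block via klay_getBlockByNumber.
// The node decompresses the record on demand, so the caller always
// receives the plain (decompressed) representation.
func fetchBlock(endpoint string, number uint64) (json.RawMessage, error) {
	req := rpcRequest{
		JSONRPC: "2.0",
		Method:  "klay_getBlockByNumber",
		Params:  []interface{}{fmt.Sprintf("0x%x", number), true},
		ID:      1,
	}
	body, err := json.Marshal(req)
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out struct {
		Result json.RawMessage `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Result, nil
}

func main() {
	// Placeholder endpoint and range; adjust to the node being inspected.
	const endpoint = "http://localhost:8551"
	for n := uint64(1000000); n < uint64(1000010); n++ {
		block, err := fetchBlock(endpoint, n)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("block %d: %d bytes of JSON\n", n, len(block))
	}
}
```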

[Vulnerability]
Compression and decompression are applied to local data only, so I believe this new module will not introduce an additional attack vector.

Thank you for your comments. Please let me know if there are any further items to address.
