Bug Fix: On-Demand Backfill Ignores StartBlock On New BN
Introduction
Block nodes (BNs) store and serve block data, and backfilling is the process that lets a new or lagging node catch up with the latest state of the blockchain. A bug has been identified in the on-demand backfill mechanism of the hiero-ledger block node: a new or empty block node disregards the configured backfill.startBlock property and backfills from block 0 instead, which is inefficient and can cause problems.
Understanding the Problem: The startBlock Configuration
The backfill.startBlock configuration property specifies the block number from which a block node should begin backfilling. This is useful when a node does not need to synchronize from the very beginning of the blockchain, because it saves time and resources. However, when a new block node with empty storage receives a TOO_FAR_BEHIND message, it backfills from the LatestAvailableBlock up to the LatestBlockKnownToNetwork. For an empty node, LatestAvailableBlock defaults to block 0, so the backfill.startBlock configuration is effectively ignored.
This behavior causes several problems. First, it wastes computational resources on blocks that are not relevant to the node's intended state. Second, it prolongs synchronization, delaying the node's readiness to participate in the network. Finally, it can introduce inconsistencies if the node relies on a specific starting point for its operations. The system should instead respect the configured backfill.startBlock value when a new block node with empty storage needs to backfill, so that it does not synchronize from block 0 unnecessarily.
Steps to Reproduce the Bug
To demonstrate this bug, follow these steps:
- Set up a source block node (BN1): Create a block node (BN1) containing blocks within a specific range, for example, from block 1000 to 2000. This will serve as the source of blocks for the backfilling process.
- Create a new, empty block node (BN2): Set up a new block node (BN2) with empty storage. This represents the scenario where a node is starting fresh and needs to synchronize with the network.
- Configure backfilling: Configure BN2 to backfill its data from BN1, and set backfill.startBlock on BN2 to a non-zero value within BN1's range (for example, 1000). This establishes the connection between the two nodes for the purpose of data synchronization.
- Disable greedy backfill and block streaming: Disable the greedy backfill feature and prevent the streaming of blocks to BN2. This ensures that the backfilling process is triggered manually, allowing for controlled testing.
- Manually trigger the TOO_FAR_BEHIND message: Simulate a scenario where BN2 is significantly behind the current state of the network by manually sending a TOO_FAR_BEHIND message to BN2. This will initiate the on-demand backfill process.
- Observe the logs: Examine the logs of BN2 to observe the starting block number of the backfilling process. You will notice that it starts from block 0, despite the backfill.startBlock property being configured to a different value. This confirms the existence of the bug.
The expected behavior is for the backfilling process to respect the backfill.startBlock configuration and start from the specified block number.
Root Cause Analysis
The root cause of this bug lies in the logic that determines the starting block for the on-demand backfill process. When a block node receives a TOO_FAR_BEHIND message, it starts the backfill from its LatestAvailableBlock and never consults the configured startBlock value. For a new node with no existing blocks, LatestAvailableBlock defaults to 0, so the backfill starts from block 0 regardless of the backfill.startBlock configuration. Because nothing compares the two values and selects the larger one, the wrong starting block is used. This is not aligned with the intended purpose of backfill.startBlock, which is to let administrators choose the starting point for the backfilling process.
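The selection logic described above can be modeled in a few lines. This is a minimal sketch, not the block node's actual code: the class, the method names, and the TreeSet used as storage are hypothetical stand-ins, and the empty-storage default of 0 is taken from the description above.

```java
import java.util.TreeSet;

public class CurrentBackfillStart {
    // Hypothetical stand-in for the storage lookup: highest stored block number,
    // or 0 when storage is empty (the default described in the article).
    static long latestAvailableBlock(TreeSet<Long> storedBlocks) {
        return storedBlocks.isEmpty() ? 0L : storedBlocks.last();
    }

    // Models the behavior described above: the configured start block is
    // passed in but never consulted when choosing where to start.
    static long currentStart(TreeSet<Long> storedBlocks, long configuredStartBlock) {
        return latestAvailableBlock(storedBlocks);
    }

    public static void main(String[] args) {
        TreeSet<Long> emptyStorage = new TreeSet<>();
        // New, empty node with backfill.startBlock assumed to be 1000.
        long start = currentStart(emptyStorage, 1000L);
        System.out.println("backfill starts at block " + start); // prints 0
    }
}
```

With empty storage the configured value of 1000 has no effect on the result, which matches what the logs show in the reproduction steps.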
Proposed Solution
To address this bug, the on-demand backfill logic needs to be modified to ensure that the backfill.startBlock configuration is respected even when the block node is new and empty. The proposed solution involves comparing the LatestAvailableBlock with the backfill.startBlock value and selecting the larger of the two as the starting block for the backfilling process. This can be achieved by implementing a simple Math.max() comparison, as demonstrated in the following code snippet:
// we should create new Gap and a new task to backfill it
long lastPersistedBlock =
        context.historicalBlockProvider().availableBlocks().max();
// if lastPersistedBlock is less than backfill start block, we start from backfill start block
long startBackfillFrom = Math.max(lastPersistedBlock, backfillConfiguration.startBlock());
long newestBlockKnown = notification.blockNumber();
LongRange gap = new LongRange(startBackfillFrom, newestBlockKnown);
This code snippet first retrieves the LatestAvailableBlock from the historical block provider. It then compares this value with the backfill.startBlock configuration using Math.max(). The larger of the two values is then assigned to the startBackfillFrom variable, which is subsequently used as the starting block for the backfilling process. By implementing this change, the on-demand backfill mechanism will correctly respect the backfill.startBlock configuration, even when the block node is new and empty.
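To see how the Math.max() comparison behaves in different situations, here is a self-contained sketch that can be run outside the block node. The class, the method names, and the TreeSet storage model are hypothetical, and the value 0 for empty storage follows the description earlier in this article rather than the real implementation.

```java
import java.util.TreeSet;

public class FixedBackfillStart {
    // Hypothetical stand-in for the storage lookup: highest stored block number,
    // or 0 when storage is empty (assumed from the article's description).
    static long latestAvailableBlock(TreeSet<Long> storedBlocks) {
        return storedBlocks.isEmpty() ? 0L : storedBlocks.last();
    }

    // Proposed selection: start from whichever is larger, the latest stored
    // block or the configured start block.
    static long fixedStart(TreeSet<Long> storedBlocks, long configuredStartBlock) {
        return Math.max(latestAvailableBlock(storedBlocks), configuredStartBlock);
    }

    public static void main(String[] args) {
        // Case 1: new, empty node. The configured start block wins.
        System.out.println(fixedStart(new TreeSet<>(), 1000L)); // prints 1000

        // Case 2: node that already holds blocks past the configured start.
        // The latest stored block wins, so stored blocks are not fetched again.
        TreeSet<Long> stored = new TreeSet<>();
        for (long b = 1000L; b <= 1500L; b++) {
            stored.add(b);
        }
        System.out.println(fixedStart(stored, 1000L)); // prints 1500
    }
}
```

The second case shows why a maximum is used instead of always returning the configured value: a node that has already progressed beyond startBlock continues from where it left off.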
Benefits of the Solution
Implementing the proposed solution offers several benefits:
- Efficient backfilling: By starting the backfilling process from the correct block number, the solution avoids unnecessary synchronization of irrelevant blocks, saving computational resources and time.
- Consistent node state: By respecting the backfill.startBlock configuration, the solution ensures that the block node starts with the intended state, preventing potential inconsistencies.
- Improved node readiness: By reducing the time required for backfilling, the solution allows the block node to become ready to participate in the network more quickly.
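As a rough illustration of the efficiency gain, the sketch below compares the size of the requested gap before and after the fix, using the numbers from the reproduction scenario (newest known block 2000, backfill.startBlock assumed to be 1000). It assumes the gap is an inclusive range of block numbers; the class and method names are hypothetical.

```java
public class BackfillGapSize {
    // Number of blocks in the inclusive range [start, end]; 0 if end < start.
    static long blocksInGap(long start, long end) {
        return end < start ? 0L : end - start + 1;
    }

    public static void main(String[] args) {
        long newestBlockKnown = 2000L;

        // Before the fix: an empty node starts from block 0.
        long before = blocksInGap(0L, newestBlockKnown);
        // After the fix: the node starts from the configured start block.
        long after = blocksInGap(1000L, newestBlockKnown);

        System.out.println("before fix: " + before + " blocks"); // 2001
        System.out.println("after fix:  " + after + " blocks");  // 1001
        System.out.println("avoided:    " + (before - after));   // 1000
    }
}
```

In this scenario the 1000 avoided blocks (0 through 999) are also blocks that BN1 does not hold, so requesting them could never have succeeded.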
Conclusion
The bug in the on-demand backfill mechanism, which causes new and empty block nodes to ignore the backfill.startBlock configuration, can lead to inefficient and potentially problematic behavior. By implementing the proposed solution, which involves comparing the LatestAvailableBlock with the backfill.startBlock value and selecting the larger of the two, the on-demand backfill mechanism can be corrected to ensure efficient, consistent, and timely synchronization of block nodes. This fix is crucial for maintaining the integrity and reliability of the hiero-ledger system.