Zebra/zebra-network/src
Henry de Valence cad38415b2
network: fix bug in inventory advertisement handling (#1022)
* network: fix bug in inventory advertisement handling

The RFC https://zebra.zfnd.org/dev/rfcs/0003-inventory-tracking.html described
the use of a `broadcast` channel in place of an `mpsc` channel to get
ring-buffer behavior, keeping a bound on the size of the channel but dropping
old entries when the channel is full.

However, it didn't explicitly describe how this works (the `broadcast` channel
returns a `RecvError::Lagged(u64)` to inform receivers that they lost
messages), so the lag-handling wasn't implemented and I didn't notice in
review.
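
The ring-buffer behavior can be sketched with a std-only toy model. This is
not Tokio's implementation: `Ring`, `drain`, and the local `RecvError` enum are
hypothetical names used for illustration only. The sketch assumes that a
receiver which falls behind is told how many messages it missed and then
resumes from the oldest retained entry.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the error a lagging receiver sees.
#[derive(Debug, PartialEq)]
enum RecvError {
    /// The receiver missed this many messages because they were overwritten.
    Lagged(u64),
    /// Nothing new to read right now.
    Empty,
}

/// Toy ring buffer: bounded capacity, oldest entries dropped when full.
struct Ring<T> {
    capacity: usize,
    buf: VecDeque<T>,
    /// Sequence number of the oldest entry still held in `buf`.
    head_seq: u64,
}

impl<T: Clone> Ring<T> {
    fn new(capacity: usize) -> Self {
        Ring { capacity, buf: VecDeque::new(), head_seq: 0 }
    }

    /// Sending never blocks or fails: a full buffer drops its oldest entry.
    fn send(&mut self, value: T) {
        if self.buf.len() == self.capacity {
            self.buf.pop_front();
            self.head_seq += 1;
        }
        self.buf.push_back(value);
    }

    /// `cursor` is the sequence number the receiver wants to read next.
    fn recv(&self, cursor: &mut u64) -> Result<T, RecvError> {
        if *cursor < self.head_seq {
            // Entries were overwritten: report the gap, then skip ahead.
            let missed = self.head_seq - *cursor;
            *cursor = self.head_seq;
            return Err(RecvError::Lagged(missed));
        }
        let idx = (*cursor - self.head_seq) as usize;
        match self.buf.get(idx) {
            Some(v) => {
                *cursor += 1;
                Ok(v.clone())
            }
            None => Err(RecvError::Empty),
        }
    }
}

/// Drain what is available, treating `Lagged` as an indicator rather than a
/// failure. Returns the items read and the number of dropped messages.
fn drain<T: Clone>(ring: &Ring<T>, cursor: &mut u64) -> (Vec<T>, u64) {
    let mut items = Vec::new();
    let mut dropped = 0;
    loop {
        match ring.recv(cursor) {
            Ok(v) => items.push(v),
            Err(RecvError::Lagged(n)) => dropped += n, // count it, keep reading
            Err(RecvError::Empty) => break,
        }
    }
    (items, dropped)
}
```

The point of `drain` is that the lag report is counted and then ignored; using
`?` on the result of `recv` would instead turn it into an error for the caller.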

Instead, the `?` operator bubbled the lag error all the way up from
`InventoryRegistry::poll_inventory` through `<PeerSet as Service>::poll_ready`
through various Tower wrappers to users of the peer set.  The error propagation
is bad enough, because it caused client errors that shouldn't have happened,
but there's a worse interaction.

The `Service` contract distinguishes between request errors (from
`Service::call`, scoped to the request) and service errors (from
`Service::poll_ready`, scoped to the service).  The `Service` contract
specifies that once a service returns an error from `poll_ready`, the service
can be assumed to be failed permanently.

I believe (but haven't tested or carefully worked through the details) that
this caused various tower middleware to report the entire peer set service as
permanently failed due to a transient inventory "error" (more of an indicator),
and I suspect that this is the cause of #1003, where all of the sync
component's requests end up failing because the peer set reported that it
failed permanently.  I am able to reproduce #1003 locally before this change
and unable to reproduce it locally after this change, though I have not tested
exhaustively.
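
The suspected interaction can be illustrated with a simplified, synchronous
model. This is not Tower code: the `Ready` trait, `Inner`, and `Latching` are
hypothetical, and `Latching` only stands in for a wrapper that relies on the
contract by remembering the first `poll_ready` error. Whether any particular
middleware behaves this way is an assumption.

```rust
/// Simplified, synchronous stand-in for the readiness half of a service.
/// (Hypothetical trait, not the real `Service` trait.)
trait Ready {
    fn poll_ready(&mut self) -> Result<(), String>;
}

/// Hypothetical inner service whose readiness check sees one transient lag.
struct Inner {
    lag_once: bool,
    /// If true, the lag is returned as an error (the behavior before the
    /// fix); if false, it is handled locally.
    propagate_lag: bool,
}

impl Ready for Inner {
    fn poll_ready(&mut self) -> Result<(), String> {
        if self.lag_once {
            self.lag_once = false;
            if self.propagate_lag {
                return Err("inventory channel lagged".to_string());
            }
            // Handled locally: the lag is only an indicator, so carry on.
        }
        Ok(())
    }
}

/// Hypothetical wrapper that treats any readiness error as permanent.
struct Latching<S> {
    inner: S,
    failed: Option<String>,
}

impl<S: Ready> Ready for Latching<S> {
    fn poll_ready(&mut self) -> Result<(), String> {
        // Once failed, never consult the inner service again.
        if let Some(e) = &self.failed {
            return Err(e.clone());
        }
        match self.inner.poll_ready() {
            Ok(()) => Ok(()),
            Err(e) => {
                self.failed = Some(e.clone());
                Err(e)
            }
        }
    }
}
```

With `propagate_lag: true`, every later readiness check on the wrapper fails
even though the inner service has recovered; with `propagate_lag: false`, the
wrapper never sees an error.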

* network: add metric for dropped inventory advertisements

Co-authored-by: teor <teor@riseup.net>
2020-09-07 21:24:31 -07:00
..
peer Use ok_or for constants, rather than a redundant closure 2020-09-02 14:26:26 +10:00
peer_set network: fix bug in inventory advertisement handling (#1022) 2020-09-07 21:24:31 -07:00
protocol Rename old references to BlockHeaderHash and BlockHeight (#1002) 2020-09-04 15:40:48 -07:00
address_book.rs network: add AddressBook::potentially_connected_peers(). 2020-09-07 11:13:15 -07:00
config.rs chain: move Network, NetworkUpgrade to parameters 2020-08-17 11:46:34 -07:00
constants.rs chain: move Network, NetworkUpgrade to parameters 2020-08-17 11:46:34 -07:00
lib.rs fix: Split a clippy allow, so its comment is clearer 2020-09-01 11:40:18 -04:00
meta_addr.rs cleanup warnings throughout codebase 2020-05-27 15:42:29 -04:00
peer.rs Move server.rs to connection.rs and change imports. 2020-01-16 13:20:03 -05:00
peer_set.rs Implement Inventory Tracking RFC (#963) 2020-09-01 14:28:54 -07:00
policies.rs Fix sync algorithm. (#887) 2020-08-12 16:48:01 -07:00
protocol.rs Refactor protocol into internal, external modules. 2019-11-27 05:06:01 -05:00
timestamp_collector.rs Upgrade tokio, futures, hyper to released versions. 2019-12-13 17:42:15 -05:00