Commit Graph

177 Commits

Author SHA1 Message Date
Jane Lusby 431f194c0f
propagate errors out of zebra_network::init (#435)
Prior to this change, the service returned by `zebra_network::init` would spawn background tasks that could silently fail, causing unexpected errors in the zebra_network service.

This change modifies the `PeerSet` that backs `zebra_network::init` to store the `JoinHandle` of every background task it depends on. The `PeerSet` then checks this set of futures to see whether any of them has exited with an error or a panic, and if one has, it returns that error from `poll_ready`.
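
The idea of keeping every handle and checking it before reporting readiness can be sketched as follows. This is a minimal illustration, not the code from this commit: it uses `std::thread` handles instead of tokio tasks so that it needs no external crates, and `BackgroundTasks`, `check`, `healthy_task`, and `failing_task` are hypothetical names.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical result type for a background task.
type TaskResult = Result<(), String>;

/// Holds the handle of every background task the service depends on.
struct BackgroundTasks {
    handles: Vec<JoinHandle<TaskResult>>,
}

impl BackgroundTasks {
    /// Spawns each job on its own thread and keeps the handle.
    fn spawn_all(jobs: Vec<fn() -> TaskResult>) -> Self {
        let handles = jobs.into_iter().map(thread::spawn).collect();
        BackgroundTasks { handles }
    }

    /// Plays the role of the check described for `poll_ready`: any task
    /// that has exited with an error or a panic is reported to the caller.
    fn check(&mut self) -> Result<(), String> {
        let mut i = 0;
        while i < self.handles.len() {
            if self.handles[i].is_finished() {
                let handle = self.handles.swap_remove(i);
                match handle.join() {
                    Ok(Ok(())) => {} // exited cleanly, nothing to report
                    Ok(Err(e)) => return Err(format!("task failed: {}", e)),
                    Err(_) => return Err("task panicked".to_string()),
                }
            } else {
                i += 1;
            }
        }
        Ok(())
    }
}

/// Example task that exits without an error.
fn healthy_task() -> TaskResult {
    Ok(())
}

/// Example task that fails.
fn failing_task() -> TaskResult {
    Err("boom".to_string())
}
```

A caller would run `check` before each readiness report, so a failure in `failing_task` surfaces as `Err("task failed: boom")` instead of being lost.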
2020-06-09 12:24:28 -07:00
Jane Lusby 8c178c3ee4
fix panic in seed subcommand (#401)
Co-authored-by: Jane Lusby <jane@zfnd.org>

Prior to this change, the seed subcommand would consistently encounter a panic in one of the background tasks, but would continue running after the panic. This indicates two bugs.

First, zebrad was not configured to treat panics as non-recoverable and instead used the tokio defaults, which catch panics in tasks and return them via the join handle if available, or print them if the join handle has been discarded. This is likely a poor fit for zebrad as an application: we do not need to maximize uptime or minimize the extent of an outage should one of our tasks or services start panicking. Ignoring a panic increases our risk of observing invalid state, which can cause hard-to-diagnose bugs. To deal with this, we've switched the default panic behavior from `unwind` to `abort`. This makes panics fail immediately and take down the entire application, regardless of where they occur, which is consistent with our treatment of misbehaving connections.

The second bug is the panic itself, which was triggered by a duplicate entry in the initial_peers set. To fix this, we've switched the storage for the peers from a `Vec` to a `HashSet`, which has similar properties but guarantees that its elements are unique.
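
The effect of the `Vec` to `HashSet` switch can be illustrated with a small sketch. The function name `unique_peers` and the string-based input are assumptions for the example, not the configuration code from this commit.

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Parses configured peer addresses into a set, so a duplicate entry
/// collapses into one element. Unparseable entries are skipped here
/// purely to keep the sketch short.
fn unique_peers(configured: &[&str]) -> HashSet<SocketAddr> {
    configured
        .iter()
        .filter_map(|entry| entry.parse().ok())
        .collect()
}
```

Passing the same address twice, for example `unique_peers(&["192.0.2.1:8233", "192.0.2.1:8233"])`, yields a set with a single entry, so later code never sees the duplicate.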
2020-05-27 17:40:12 -07:00
Jane Lusby b6b35364f3 cleanup warnings throughout codebase 2020-05-27 15:42:29 -04:00
Henry de Valence 3ed75cb626 Tweak peer set metrics.
- Add a total peers metric to prevent races between measurements of
  ready/unready peers (which can cause the sum to be wrong).
- Add an outbound request counter.
2020-02-21 06:48:25 -05:00
Henry de Valence 43b2d35dda Crawl for more peers when we exhaust candidates. 2020-02-21 06:48:25 -05:00
Henry de Valence afa2c2347f fmt 2020-02-21 06:48:25 -05:00
Henry de Valence 00edcae0c2 Add metrics for the crawler and candidate set. 2020-02-14 20:14:05 -05:00
Henry de Valence 75d3d44fb3 Metrics MVP: add two metrics and export them to Prometheus.
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
2020-02-14 20:14:05 -05:00
Henry de Valence 8000f888fd Connect to multiple peers concurrently.
The previous outbound peer connection logic got requests to connect to new
peers and processed them one at a time, making single connection attempts
and retrying if the connection attempt failed.  This was quite slow, because
many connections fail, and we have to wait for timeouts.  Instead, this logic
connects to new peers concurrently (up to 50 at a time).
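
A rough sketch of bounded concurrent dialing, assuming a caller-supplied `dial` function that returns whether a connection attempt succeeded. It uses scoped OS threads pulling from a shared queue rather than the async tasks in the real code, and `dial_concurrently` is a hypothetical name.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;

/// Dials every candidate with at most `limit` attempts in flight at once
/// and returns the candidates whose attempt succeeded (in no fixed order).
fn dial_concurrently<A, F>(candidates: Vec<A>, limit: usize, dial: F) -> Vec<A>
where
    A: Send,
    F: Fn(&A) -> bool + Sync,
{
    // Never start more workers than there are candidates.
    let workers = limit.min(candidates.len());
    let queue = Mutex::new(VecDeque::from(candidates));
    let connected = Mutex::new(Vec::new());

    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                // The lock guard is dropped at the end of this statement,
                // so the queue is not held while a dial is in progress.
                let next = queue.lock().unwrap().pop_front();
                match next {
                    Some(addr) => {
                        if dial(&addr) {
                            connected.lock().unwrap().push(addr);
                        }
                    }
                    None => break, // queue exhausted
                }
            });
        }
    });

    connected.into_inner().unwrap()
}
```

With `limit` set to 50, a slow or timed-out attempt occupies only one worker while the others keep working through the queue, which is the speedup the message describes.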
2020-02-14 18:23:41 -05:00
Henry de Valence 2c0f48b587 Refactor connection logic and try a block request.
Attempting to implement requests for block data revealed a problem with
the previous connection logic.  Block data is requested by sending a
`getdata` message with hashes of the requested blocks; the peer responds
with a sequence of `block` messages with the blocks themselves.

However, this wasn't possible to handle with the previous connection
logic, which could only convert a single Bitcoin message into a
Response.  Instead, we factor out the message handling logic into a
Handler, which can statefully accumulate arbitrary data into a Response
and signal completion.  This is still pretty ugly but it does work.
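
A simplified sketch of a stateful handler of this kind is shown below. The types are hypothetical stand-ins: block hashes are plain `u64` values, and `Message`, `Response`, and `Handler` only mirror the names used in this message, not the real definitions.

```rust
use std::collections::HashSet;

/// Hypothetical, heavily reduced message type.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    Block { hash: u64 },
    Ping(u64),
}

/// Hypothetical response type.
#[derive(Debug, PartialEq)]
enum Response {
    Blocks(Vec<u64>),
}

/// Accumulates `block` messages until every requested hash has arrived.
enum Handler {
    Finished,
    GetBlocks { wanted: HashSet<u64>, received: Vec<u64> },
}

impl Handler {
    fn new(hashes: &[u64]) -> Self {
        Handler::GetBlocks {
            wanted: hashes.iter().copied().collect(),
            received: Vec::new(),
        }
    }

    /// Feeds one message to the handler. Returns `Some(response)` once the
    /// request is complete; unrelated messages are ignored.
    fn process(&mut self, msg: Message) -> Option<Response> {
        if let Handler::GetBlocks { wanted, received } = self {
            if let Message::Block { hash } = msg {
                if wanted.remove(&hash) {
                    received.push(hash);
                }
            }
            if wanted.is_empty() {
                let blocks = std::mem::take(received);
                *self = Handler::Finished;
                return Some(Response::Blocks(blocks));
            }
        }
        None
    }
}
```

Here a request for two blocks returns `None` for the first `Block` message and for any unrelated `Ping`, then `Some(Response::Blocks(..))` when the second block arrives.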

As a side effect, the HeartbeatNonceMismatch error is removed; because
the Handler now tries to process messages until it comes to a Response,
it just ignores mismatched nonces (and will eventually time out).

The previous Mempool and Transaction requests were removed but could be
re-added in a different form later.  Also, the `Get` prefixes are
removed from `Request` to tidy the name.
2020-02-10 09:03:56 -08:00
Henry de Valence 8d58dd804f Note that tracing causes clippy false positives
Thanks @hawkw for pointing this out.
2020-02-05 12:42:32 -08:00
Henry de Valence f04f4f0b98 Apply clippy fixes 2020-02-05 12:42:32 -08:00
Henry de Valence 2965187b91 Upgrade tokio, futures, hyper to released versions. 2019-12-13 17:42:15 -05:00
Henry de Valence 36cd6d6e06 cargo fmt 2019-11-27 23:53:36 -05:00
Henry de Valence ad6525574b Rename PeerConnector -> peer::Connector 2019-11-27 23:53:36 -05:00
Henry de Valence 778e49b127 Rename PeerHandshake -> peer::Handshake 2019-11-27 23:53:36 -05:00
Henry de Valence 77191e62f6 Remove outdated fixup note. 2019-11-27 23:53:36 -05:00
Henry de Valence da78603d3a Rename `PeerClient` to `peer::Client`. 2019-11-27 23:53:36 -05:00
Henry de Valence 4fbc8270a2 Move PeerSet initialization into a submodule. 2019-11-27 05:06:01 -05:00
Henry de Valence 47ec2e2689 Remove stub discover module. 2019-10-22 19:06:08 -07:00
Henry de Valence c3ec235a5b Suppress unused import warnings. 2019-10-22 19:06:08 -07:00
Henry de Valence ed2ee9d42f Add a PeerConnector wrapper around PeerHandshake 2019-10-22 19:06:08 -07:00
Henry de Valence e1a35490af Move the CandidateSet to its own file.
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
2019-10-22 19:06:08 -07:00
Henry de Valence b1832ce593 Initial work to add a crawl-and-dial task.
This responds to peerset demand by connecting to additional peers.

Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
2019-10-22 19:06:08 -07:00
Henry de Valence 5847b490da Move PeerSet setup logic into a peer_set::init() 2019-10-18 16:11:01 -07:00
Henry de Valence f6e62b0f5e Remove failure from zebra-chain, zebra-network.
Failure uses a distinct Fail trait rather than the standard library's
Error trait, which causes a lot of interoperability problems with tower
and other Error-using crates.  Since failure was created, the standard
library's Error trait was improved, and its conveniences are now
available without the custom Fail trait using `thiserror` (for easy
error derives) and `anyhow` (for a better boxed Error).
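
For illustration, an error type built only on the standard library's `Error` trait is sketched below. The `Display` and `Error` impls are written by hand to keep the example free of external crates; they are the kind of boilerplate a `thiserror` derive generates. `HandshakeError`, `check_version`, and `MIN_VERSION` are hypothetical.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type.
#[derive(Debug)]
enum HandshakeError {
    Timeout,
    UnsupportedVersion(u32),
}

impl fmt::Display for HandshakeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HandshakeError::Timeout => write!(f, "handshake timed out"),
            HandshakeError::UnsupportedVersion(v) => {
                write!(f, "unsupported version {}", v)
            }
        }
    }
}

// Implementing the standard trait lets the type convert into a boxed error.
impl Error for HandshakeError {}

/// A boxed standard error, the shape that generic service code often expects.
type BoxError = Box<dyn Error + Send + Sync + 'static>;

fn check_version(version: u32) -> Result<(), BoxError> {
    const MIN_VERSION: u32 = 2; // hypothetical threshold for the example
    if version < MIN_VERSION {
        return Err(HandshakeError::UnsupportedVersion(version).into());
    }
    Ok(())
}
```

Because `HandshakeError` implements the standard trait, `.into()` converts it to a boxed error without any adapter, and callers can still recover the concrete type with `downcast_ref`.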
2019-10-16 13:16:52 -04:00
Henry de Valence ae1a164ff8
Beginning of peerset implementation. (#62)
* Don't expose submodules of zebra_network::peer.

* PeerSet, PeerDiscover stubs.

Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>

* Initial work on PeerSet.

This is adapted from the MIT-licensed tower-balance implementation.

* Use PeerSet in the connect stub.
2019-10-10 18:15:24 -07:00