Zebra

Commit Graph

Author	SHA1	Message	Date
Janito Vaqueiro Ferreira Filho	1010773b05	Keep track of background peer tasks (#3253) * Refactor to create heartbeat sender function Move the code that's part of the heartbeat task into a separate helper function. * Move `Client` initialization down Keep it closer to where it's actually used, and make it easier to add new fields to `Client` for the connection and heartbeat tasks. * Add background task handles to `Client` type Prepare it to be able to check for panics or errors from the background tasks. * Add dummy background tasks to `ClientTestHarness` Spawn simple timeout tasks as mock connection and heartbeat tasks. * Fix `PeerSet` tests that use `ClientTestHarness` Building a `ClientTestHarness` requires a Tokio runtime to be set up, so the calls were moved into the `async` block. * Refactor to create `set_task_exited_error` Make the code reusable for both background tasks. * Check heartbeat task for errors Periodically poll it to check if the task has unexpectedly stopped. * Check if connection background task has stopped The client service should stop if the connection background task has exited, because then it's not able to receive any replies. * Allow aborting mocked `Client` background tasks Wrap the background tasks in `Abortable`, so that they can be aborted through the `ClientTestHarness`. * Test if stopped connection task is detected Check that stopping the background connection task is something that the `Client` instance detects and handles correctly. * Test if stopped heartbeat task is detected Check that stopping the background heartbeat task is something that the `Client` instance detects and handles correctly. * Allow setting custom background tasks Will be used later to create background tasks that panic. * Test if `Client` handles panics in connection task Use a mock background connection task that panics immediately, and check that the `Client` handles it gracefully. * Test if `Client` handles panics in heartbeat task Use a mock background heartbeat task that panics immediately, and check that the `Client` handles it gracefully. * Change ticket referenced by `TODO` The previously linked issue was a broad plan to improve Zebra's shutdown behavior, while the new issue is more specific, and can be scheduled sooner. Co-authored-by: teor <teor@riseup.net> Co-authored-by: teor <teor@riseup.net>	2021-12-22 01:35:38 +00:00
teor	6814525a7a	Update async correctness docs and the async in Zebra RFC (#3243) * Justify that the ErrorSlot Mutex is deadlock-safe * Document cancellation safety in the async RFC * Document task starvation in the async RFC Co-authored-by: Marek <mail@marek.onl>	2021-12-21 07:10:15 +00:00
teor	f176bb59a2	Stop ignoring some connection errors that could make the peer set hang (#3200) * Drop peer services if their cancel handles are dropped * Exit the client task if the heartbeat task exits * Allow multiple errors on a connection without panicking * Explain why we don't need to send an error when the request is cancelled * Document connection fields * Make sure connections don't hang due to spurious timer or channel usage * Actually shut down the client when the heartbeat task exits * Add tests for unready services * Close all senders to peer when `Client` is dropped * Return a Client error if the error slot has an error * Add tests for peer Client service errors * Make Client drop and error cleanups consistent * Use a ClientDropped error when the Client struct is dropped * Test channel and error state in peer Client tests * Move all Connection cleanup into a single method * Add tests for Connection * fix typo in comment Co-authored-by: Conrado Gouvea <conrado@zfnd.org> Co-authored-by: Conrado Gouvea <conrado@zfnd.org> Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>	2021-12-15 14:52:44 +00:00
teor	1835ec2c8d	Add diagnostics for peer set hangs (#3203) * Use a named CancelHeartbeatTask unit struct for the channel type * Prefer cancel handles in selects, if both are ready * Fix message metrics to just show the command name * Add metrics for internal requests and responses * Add internal requests and responses to the messages dashboard * Add a canceled metric, and peer addresses to request and response metrics * Add a canceled messages graph * Add connection state metrics for currently open connections * Fix the connection state graph with new metrics * Always send an error before dropping pending responses * Move error detail logging into `fail_with` * Delete an unused timer future * Make error strings in metrics less verbose * Downgrade some error logs to info * Remove a redundant expect * Avoid unnecessary allocations for connection state metrics * Fix missed updates to mempool and block gossip metrics	2021-12-14 21:11:03 +00:00
teor	a92c431c03	Ignore NotFound errors in the syncer (#3131)	2021-12-02 11:28:20 -03:00
teor	ab471b0db0	Revert "Stop returning NotFound errors, use the response instead" (#3124) * Revert "Stop returning NotFound errors, use the response instead" This reverts commit 45871f6915c0b294502bf04917c42fdcd3b1075c. * Fix clippy warnings * Downgrade a frequent log to debug level	2021-12-01 05:09:54 +00:00
teor	a358c410f5	Stop closing connections on unexpected messages, Credit: Equilibrium (#3120) * Ignore unsupported messages from peers * Ignore unknown message commands from peers * Implement Display for Request, Response, Handler, connection::State * Stop ignoring some completed `Response`s * Stop returning NotFound errors, use the response instead Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>	2021-11-30 19:26:17 +00:00
teor	7457edcb86	Stop asking users to report peer errors, fix a common peer error (#3054) * Stop treating inv with mixed item types as a connection error * Remove unused connection errors * Stop asking users to create bug reports for peer errors	2021-11-15 11:32:18 -03:00
teor	3e03d48799	Limit the number of outbound peer connections (#2944) * Limit the number of outbound connections in the crawler * Make zebra-network channel bounds depend on config.peerset_initial_target_size * Bias Zebra towards outbound connections And turn connection limits into `Config` methods. * Downgrade some connection logs to debug * Remove verbose or outdated fields in tracing logs * Clarify connection limits Includes: - `fastmod OUTBOUND_PEER_BIAS_FRACTION OUTBOUND_PEER_BIAS_DENOMINATOR zebra` - clarify connection limit documentation Clarify inventory channel capacity * Add zebra_network::initialize tests with limited numbers of peers * Avoid cooperative async task starvation in the peer crawler and listener If we don't yield in these loops, they can run for a long time before tokio forces them to yield. * Test the crawler with small connection limits And use the multi-threaded runtime to avoid long hangs. * Stop using the multi-threaded executor in tests where it's not needed * Avoid starvation for every connection Adds yields after inbound successes and initial peer connections. * Add a crawler peer connection success test * Add outbound connection limit tests * Improve outbound tests	2021-10-27 21:28:51 +00:00
teor	c8ad19080a	Improve logging for initial peer connections (#2896) Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>	2021-10-18 18:43:12 +00:00
teor	905b90d6a1	Refactor and document correctness for std::sync::Mutex in ErrorSlot	2021-04-21 16:39:06 -04:00
teor	375c8d8700	Fix a deadlock between the crawler and dialer, and other hangs (#1950) * Stop ignoring inbound message errors and handshake timeouts To avoid hangs, Zebra needs to maintain the following invariants in the handshake and heartbeat code: - each handshake should run in a separate spawned task (not yet implemented) - every message, error, timeout, and shutdown must update the peer address state - every await that depends on the network must have a timeout Once the Connection is created, it should handle timeouts. But we need to handle timeouts during handshake setup. * Avoid hangs by adding a timeout to the candidate set update Also increase the fanout from 1 to 2, to increase address diversity. But only return permanent errors from `CandidateSet::update`, because the crawler task exits if `update` returns an error. Also log Peers response errors in the CandidateSet. * Use the select macro in the crawler to reduce hangs The `select` function is biased towards its first argument, risking starvation. As a side-benefit, this change also makes the code a lot easier to read and maintain. * Split CrawlerAction::Demand into separate actions This refactor makes the code a bit easier to read, at the cost of sometimes blocking the crawler on `candidates.next()`. That's ok, because `next` only has a short (< 100 ms) delay. And we're just about to spawn a separate task for each handshake. * Spawn a separate task for each handshake This change avoids deadlocks by letting each handshake make progress independently. * Move the dial task into a separate function This refactor improves readability. * Fix buggy future::select function usage And document the correctness of the new code.	2021-04-07 10:25:10 -03:00
teor	72e2e83828	Revert "introduce Transition enum" This reverts commit `6906f87ead`.	2021-02-24 13:07:31 -08:00
teor	359015b2be	Revert "Only reject pending client requests when the peer has errored" This reverts commit `e06705ed81`.	2021-02-24 13:07:31 -08:00
teor	1a70d807b6	Revert "make sure peer/error.s comments are up to date" This reverts commit `6f205a1812`.	2021-02-24 13:07:31 -08:00
Jane Lusby	6f205a1812	make sure peer/error.s comments are up to date	2021-02-19 14:11:35 -08:00
teor	e06705ed81	Only reject pending client requests when the peer has errored - Add an `ExitClient` transition, used when the internal client channel is closed or dropped, and there are no more pending requests - Ignore pending requests after an `ExitClient` transition - Reject pending requests when the peer has caused an error (the `Exit` and `ExitRequest` transitions) - Remove `PeerError::ConnectionDropped`, because it is now handled by `ExitClient`. (Which is an internal error, not a peer error.)	2021-02-19 14:11:35 -08:00
Jane Lusby	6906f87ead	introduce Transition enum	2021-02-19 14:11:35 -08:00
Henry de Valence	f93deb1cac	network: fix missing {0} in PeerError::Serialization	2020-12-01 19:16:41 -08:00
Henry de Valence	4df5632752	network: handle Message::NotFound as a response This cleans up the response processing logic a little bit along the way, but the overall division of responsibility should be better documented in a future commit.	2020-09-20 10:21:18 -07:00
Henry de Valence	3c993f33b1	network: add PeerError::WrongMessage This lets us distinguish between cases where the message was unsupported (e.g., BIP11 messages), and cases where the message was uninterpretable in context (e.g., unsolicited messages).	2020-09-20 10:21:18 -07:00
Henry de Valence	4a41c9254d	network: avoid panic when shutting down cleanly. When the connection sees the client_rx channel close it knows it will never get any more requests, and it should terminate. But instead of terminating, it errored itself, and the method to error itself tries to pull all the outstanding client requests from the channel in order to fail them before it shuts down. This results in reading from a closed channel, causing a panic. Instead we return cleanly rather than failing (since we know there are no outstanding requests, as the channel is closed).	2020-07-22 18:04:45 +10:00
Jane Lusby	df18ac72c5	fix sharedpeererror to propagate tracing context	2020-06-17 14:38:26 -07:00
Jane Lusby	4b9e4520ce	cleanup API for arc based error type (#469) Co-authored-by: Jane Lusby <jane@zfnd.org>	2020-06-12 11:29:42 -07:00
Jane Lusby	8276bed400	reinstate reject error variant	2020-05-27 15:42:29 -04:00
Jane Lusby	b6b35364f3	cleanup warnings throughout codebase	2020-05-27 15:42:29 -04:00
George Tankersley	df79fa75e0	Implement minimal version handshaking (#295) Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com> Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>	2020-04-13 18:33:15 -04:00
Deirdre Connolly	8c0b00109f	Remove PeerError::DeadServer, unused, unneeded Resolves #251	2020-03-12 16:23:08 -04:00
Henry de Valence	7049f9d891	Add a FindBlocks request to get initial block hashes. Bitcoin does this either with `getblocks` (returns up to 500 following block hashes) or `getheaders` (returns up to 2000 following block headers, not just hashes). However, Bitcoin headers are much smaller than Zcash headers, which contain a giant Equihash solution block, and many Zcash blocks don't have many transactions in them, so the block header is often similarly sized to the block itself. Because we're aiming to have a highly parallel network layer, it seems better to use `getblocks` to implement `FindBlocks` (which is necessarily sequential) and parallelize the processing of the block downloads.	2020-02-14 18:23:41 -05:00
Henry de Valence	2c0f48b587	Refactor connection logic and try a block request. Attempting to implement requests for block data revealed a problem with the previous connection logic. Block data is requested by sending a `getdata` message with hashes of the requested blocks; the peer responds with a sequence of `block` messages with the blocks themselves. However, this wasn't possible to handle with the previous connection logic, which could only convert a single Bitcoin message into a Response. Instead, we factor out the message handling logic into a Handler, which can statefully accumulate arbitrary data into a Response and signal completion. This is still pretty ugly but it does work. As a side effect, the HeartbeatNonceMismatch error is removed; because the Handler now tries to process messages until it comes to a Response, it just ignores mismatched nonces (and will eventually time out). The previous Mempool and Transaction requests were removed but could be re-added in a different form later. Also, the `Get` prefixes are removed from `Request` to tidy the name.	2020-02-10 09:03:56 -08:00
Henry de Valence	f04f4f0b98	Apply clippy fixes	2020-02-05 12:42:32 -08:00
Deirdre Connolly	82e246d87b	Merge pull request #135 from ZcashFoundation/130 On receipt of a Filter(Load|Add|Clear) message, disconnect from peer	2019-12-05 14:06:05 -05:00
Henry de Valence	d1b3e8fe6b	Rename PeerServer -> peer::Server	2019-11-27 23:53:36 -05:00
Henry de Valence	da78603d3a	Rename `PeerClient` to `peer::Client`.	2019-11-27 23:53:36 -05:00
Henry de Valence	6db852fab2	Refactor protocol into internal, external modules. This commit just moves things around and patches import paths.	2019-11-27 05:06:01 -05:00
Deirdre Connolly	49c5265d41	Add Rejected variant to PeerError enum, for now	2019-11-26 19:35:49 -05:00
Henry de Valence	ed2ee9d42f	Add a PeerConnector wrapper around PeerHandshake	2019-10-22 19:06:08 -07:00
Deirdre Connolly	adffc4239d	Partially complete heartbeats to peer	2019-10-21 15:55:18 -04:00
Henry de Valence	db7ac53f3b	Add a Mutex<HashSet<Nonce>> to detect self-conns.	2019-10-17 09:34:18 -07:00
Henry de Valence	f6e62b0f5e	Remove failure from zebra-chain, zebra-network. Failure uses a distinct Fail trait rather than the standard library's Error trait, which causes a lot of interoperability problems with tower and other Error-using crates. Since failure was created, the standard library's Error trait was improved, and its conveniences are now available without the custom Fail trait using `thiserror` (for easy error derives) and `anyhow` (for a better boxed Error).	2019-10-16 13:16:52 -04:00

40 Commits