Zebra

Commit Graph

Author	SHA1	Message	Date
Deirdre Connolly	bf72d6dbc0	Update zebra-network/src/peer/handshake.rs Co-authored-by: teor <teor@riseup.net>	2021-05-18 14:02:19 +10:00
teor	d2a8985dbc	Reliability: Add inbound canonical addresses to the address book Add canonical addresses from inbound connections to the address book, so that Zebra can use them for reconnection attempts. Use the newly added `NeverAttemptedAlternate` state for these addresses, so we try gossiped addresses first, then canonical addresses. This avoids duplicate connections to inbound peers.	2021-05-18 14:02:19 +10:00
teor	b0b8b2f61a	Add extra instrumentation for initialize and handshakes (#2122 ) * Instrument the crawl task When we created the crawl task, we forgot to instrument it with the global span. This fix makes sure that the git and network span appears on crawl logs. * Instrument the connector * Improve handshake instrumentation Make some spans debug, so there are not too many spans. * Add the address to initial peer connection errors	2021-05-17 16:49:16 -04:00
teor	7969459b19	Security: Move the Verack response after the version check (#2121 ) We should do as many local checks as possible, before sending further messages.	2021-05-17 16:39:44 -04:00
teor	f541f85792	Send unspecified addresses and client services for isolated connections	2021-05-14 23:45:42 +10:00
teor	9160365d06	Fix a comment	2021-05-14 23:45:42 +10:00
teor	a8a0d6450c	Security: stop gossiping temporary inbound remote addresses to peers - stop putting inbound addresses in the address book - drop address book entries that can't be used for outbound connections - distinguish between temporary inbound and permanent outbound peer addresses - also create variants to handle proxy connections (but don't use them yet) - avoid tracking connection state for isolated connections - document security constraints for the address book and peer set	2021-05-14 23:45:42 +10:00
teor	905b90d6a1	Refactor and document correctness for std::sync::Mutex in ErrorSlot	2021-04-21 16:39:06 -04:00
teor	3f45735f3f	Use futures:🔒:Mutex for the nonce set	2021-04-21 01:39:49 -04:00
teor	ad272f2bee	Make sure handshake version negotiation always has a timeout As part of this change, refactor handshake version negotiation into its own function.	2021-04-19 18:31:28 -04:00
teor	2cecd52a10	Fix comment typo	2021-04-19 10:11:22 -04:00
teor	8fb12f07a1	Fix outdated comment	2021-04-19 10:11:22 -04:00
teor	eabadb8301	Make heartbeats wait for the connection queue to empty, with a timeout Also cleanup the heartbeat code, so each heartbeat request/response runs in a future with a single timeout.	2021-04-19 10:11:22 -04:00
teor	375c8d8700	Fix a deadlock between the crawler and dialer, and other hangs (#1950 ) * Stop ignoring inbound message errors and handshake timeouts To avoid hangs, Zebra needs to maintain the following invariants in the handshake and heartbeat code: - each handshake should run in a separate spawned task (not yet implemented) - every message, error, timeout, and shutdown must update the peer address state - every await that depends on the network must have a timeout Once the Connection is created, it should handle timeouts. But we need to handle timeouts during handshake setup. * Avoid hangs by adding a timeout to the candidate set update Also increase the fanout from 1 to 2, to increase address diversity. But only return permanent errors from `CandidateSet::update`, because the crawler task exits if `update` returns an error. Also log Peers response errors in the CandidateSet. * Use the select macro in the crawler to reduce hangs The `select` function is biased towards its first argument, risking starvation. As a side-benefit, this change also makes the code a lot easier to read and maintain. * Split CrawlerAction::Demand into separate actions This refactor makes the code a bit easier to read, at the cost of sometimes blocking the crawler on `candidates.next()`. That's ok, because `next` only has a short (< 100 ms) delay. And we're just about to spawn a separate task for each handshake. * Spawn a separate task for each handshake This change avoids deadlocks by letting each handshake make progress independently. * Move the dial task into a separate function This refactor improves readability. * Fix buggy future::select function usage And document the correctness of the new code.	2021-04-07 10:25:10 -03:00
teor	306fa88214	Document the correctness of Poll::Pending wakeups	2021-03-27 08:55:49 -04:00
teor	b329892665	Add a comment about a zcashd inv message bug	2021-03-26 11:26:59 -04:00
teor	6fe81d8992	Make MetaAddr.last_seen into a private field	2021-03-26 07:23:49 +10:00
Jack Grigg	7a8cae9321	Tag message metrics by type	2021-03-17 09:38:07 +10:00
Jack Grigg	e51f33a4b9	Use interoperable names for common metrics These names match the equivalent metrics in zcashd, enabling common metrics to be collected across both node types.	2021-03-17 09:38:07 +10:00
teor	895bb43ead	Clippy: Fix inconsistent struct member orders lint	2021-03-01 23:31:18 -05:00
teor	9c3f236075	Stop sending blocks and transactions on error	2021-02-25 08:44:57 -08:00
teor	78f162733d	Revert "leverage return value for propagating errors" This reverts commit `e6cb20e13f`.	2021-02-24 13:07:31 -08:00
teor	72e2e83828	Revert "introduce Transition enum" This reverts commit `6906f87ead`.	2021-02-24 13:07:31 -08:00
teor	a5e89f4f2b	Revert "accidental drop on mustusesender" This reverts commit `5ec8d09e0d`.	2021-02-24 13:07:31 -08:00
teor	d60226a3cf	Revert "rustfmt" This reverts commit `9d9734ea81`.	2021-02-24 13:07:31 -08:00
teor	359015b2be	Revert "Only reject pending client requests when the peer has errored" This reverts commit `e06705ed81`.	2021-02-24 13:07:31 -08:00
teor	663ed6c842	Revert "Remove remaining references to fail_with" This reverts commit `5e4bf804aa`.	2021-02-24 13:07:31 -08:00
teor	3c225550ee	Revert "rename transitions from Exit to Close" This reverts commit `cfc4717b98`.	2021-02-24 13:07:31 -08:00
teor	86dc66dfa9	Revert "deduplicate match arms in handle_client_request" This reverts commit `2adee7b31a`.	2021-02-24 13:07:31 -08:00
teor	292a4391e2	Revert "update comments throughout connection.rs" This reverts commit `651d352ce1`.	2021-02-24 13:07:31 -08:00
teor	fc44a97925	Revert "remove unnecessary Option around request timeout" This reverts commit `c3724031df`.	2021-02-24 13:07:31 -08:00
teor	e06120cd36	Revert "ensure peer/client.rs comments are up to date" This reverts commit `2266886a53`.	2021-02-24 13:07:31 -08:00
teor	1a70d807b6	Revert "make sure peer/error.s comments are up to date" This reverts commit `6f205a1812`.	2021-02-24 13:07:31 -08:00
teor	3b2077fcfd	Revert "Apply suggestions from code review" This reverts commit `736092abb8`.	2021-02-24 13:07:31 -08:00
Jane Lusby	736092abb8	Apply suggestions from code review Co-authored-by: teor <teor@riseup.net>	2021-02-19 14:11:35 -08:00
Jane Lusby	6f205a1812	make sure peer/error.s comments are up to date	2021-02-19 14:11:35 -08:00
Jane Lusby	2266886a53	ensure peer/client.rs comments are up to date	2021-02-19 14:11:35 -08:00
Jane Lusby	c3724031df	remove unnecessary Option around request timeout	2021-02-19 14:11:35 -08:00
Jane Lusby	651d352ce1	update comments throughout connection.rs	2021-02-19 14:11:35 -08:00
Jane Lusby	2adee7b31a	deduplicate match arms in handle_client_request	2021-02-19 14:11:35 -08:00
Jane Lusby	cfc4717b98	rename transitions from Exit to Close	2021-02-19 14:11:35 -08:00
teor	5e4bf804aa	Remove remaining references to fail_with	2021-02-19 14:11:35 -08:00
teor	e06705ed81	Only reject pending client requests when the peer has errored - Add an `ExitClient` transition, used when the internal client channel is closed or dropped, and there are no more pending requests - Ignore pending requests after an `ExitClient` transition - Reject pending requests when the peer has caused an error (the `Exit` and `ExitRequest` transitions) - Remove `PeerError::ConnectionDropped`, because it is now handled by `ExitClient`. (Which is an internal error, not a peer error.)	2021-02-19 14:11:35 -08:00
teor	9d9734ea81	rustfmt	2021-02-19 14:11:35 -08:00
Jane Lusby	5ec8d09e0d	accidental drop on mustusesender	2021-02-19 14:11:35 -08:00
Jane Lusby	6906f87ead	introduce Transition enum	2021-02-19 14:11:35 -08:00
Jane Lusby	e6cb20e13f	leverage return value for propagating errors	2021-02-19 14:11:35 -08:00
teor	5424e1d8ba	Fix candidate set address state handling (#1709 ) Design: - Add a `PeerAddrState` to each `MetaAddr` - Use a single peer set for all peers, regardless of state - Implement time-based liveness as an `AddressBook` method, rather than a `PeerAddrState` variant - Delete `AddressBook.by_state` Implementation: - Simplify `AddressBook` changes using `update` and `take` modifier methods - Simplify the `AddressBook` iterator implementation, replacing it with methods that are more obviously correct - Consistently collect peer set metrics Documentation: - Expand and update the peer set documentation We can optimise later, but for now we want simple code that is more obviously correct.	2021-02-18 11:18:32 +10:00
Alfredo Garcia	d7c40af2a8	Fix shutdown panics (#1637 ) * add a shutdown flag in zebra_chain::shutdown * fix network panic on shutdown * fix checkpoint panic on shutdown	2021-02-03 19:03:28 +10:00
teor	983e94f9e4	Add a TODO for inbound error handling cleanup	2021-02-03 08:32:10 +10:00
teor	b551d81f8d	Explain why we stay connected on Inbound errors We might be syncing using this peer, so it's ok to just ignore any internal errors in their Inbound requests, and drop the request.	2021-01-27 12:08:49 -08:00
teor	05fff8e6f7	Revert "Stop panicking when fail_with is called twice on a connection" But keep the extra error information.	2021-01-18 00:23:36 -05:00
teor	4fe81da953	Improve logging for connection state errors	2021-01-18 00:23:36 -05:00
teor	a6c1cd3c35	Stop panicking when fail_with is called twice on a connection We can't rule out the connection state changing between the state checks and any eventual failures, particularly in the presence of async code. So we turn this panic into a warning.	2021-01-18 00:23:36 -05:00
teor	44c8fafc29	Stop processing the request after failing an overloaded connection zebra-network's Connection expects that `fail_with` is only called once per connection, but the overload handling code continues to process the current request after an overload error, potentially leading to further failures. Closes #1599	2021-01-18 00:23:36 -05:00
teor	0f0fb93b5c	Update some comments in zebra-network Add ticket numbers, and update based on design decisions and new code.	2021-01-15 09:02:10 -05:00
teor	b7d0a40ee1	Revert unused instrument macros Reverts most of "Instrument some functions to try to locate the panic"	2021-01-06 13:07:23 -08:00
teor	6d3aa0002c	Ensure received client request oneshots are used via the type system The `peer::Client` translates `Request`s into `ClientRequest`s, which it sends to a background task. If the send is `Ok(())`, it will assume that it is safe to unconditionally poll the `Receiver` tied to the `Sender` used to create the `ClientRequest`. We enforce this invariant via the type system, by converting `ClientRequest`s to `InProgressClientRequest`s when they are received by the background task. These conversions are implemented by `ClientRequestReceiver`. Changes: * Revert `ClientRequest` so it uses a `oneshot::Sender` * Add `InProgressClientRequest`, which is the same as `ClientRequest`, but has a `MustUseOneshotSender` * `impl From<ClientRequest> for InProgressClientRequest` * Add a new `ClientRequestReceiver` type that wraps a `mpsc::Receiver<ClientRequest>` * `impl Stream<InProgressClientRequest> for ClientRequestReceiver`, converting the successful result of `inner.poll_next_unpin` into an `InProgressClientRequest` * Replace `client_rx: mpsc::Receiver<ClientRequest>` in `Connection` with the new `ClientRequestReceiver` type * `impl From<mpsc::Receiver<ClientRequest>> for ClientRequestReceiver`	2021-01-06 13:07:23 -08:00
teor	df1b0c8d58	Defer a timeout fix until later	2021-01-06 13:07:23 -08:00
teor	d5cfd5ad5f	Clarify the ClientRequest invariant Co-authored-by: Jane Lusby <jlusby42@gmail.com>	2021-01-06 13:07:23 -08:00
teor	f8ff2e9c0b	Add more sends before dropping ClientRequests This fix also changes heartbeat behaviour in the following ways: * if the queue is full, the connection is closed. Previously, the sender would wait until the queue had emptied * if the queue flush fails, Zebra panics, because it can't send an error on the ClientRequest sender, so the invariant is broken	2021-01-06 13:07:23 -08:00
teor	3e711ccc8a	Instrument some functions to try to locate the panic	2021-01-06 13:07:23 -08:00
teor	fa29fca917	Panic when must-use senders are dropped before use Add a MustUseOneshotSender, which panics if its inner sender is unused. Callers must call `send()` on the MustUseOneshotSender, or ensure that the sender is canceled. Replaces an unreliable panic in `Client::call()` with a reliable panic when a must-use sender is dropped.	2021-01-06 13:07:23 -08:00
teor	b03809ebe3	Add the invalid state to an unreachable panic message	2021-01-06 13:07:23 -08:00
teor	86136c7b5c	Stop ignoring errors when the new state is AwaitingRequest The previous code would send a Nil message on the Sender, even if the result was actually an error.	2021-01-06 13:07:23 -08:00
teor	da5084a10a	Split the 3-level match using a temporary	2021-01-06 13:07:23 -08:00
teor	fd23c46726	Remove a redundant fmt::Display bound	2021-01-06 13:07:23 -08:00
teor	3892894ffa	Call ClientRequest.tx.send() even if there is an error Previously, tx would be dropped before send if: - the success case would have used tx to wait for further messages, - but the response was actually an error. Instead, send the error on `tx` and call `fail_with()` using the same error. To support this change, allow `fail_with()` to take a `PeerError` or a `SharedPeerError`.	2021-01-06 13:07:23 -08:00
teor	28f3186182	Mark ClientRequest and State::AwaitingResponse as must_use	2021-01-06 13:07:23 -08:00
teor	b1f14f47c6	Rewrite GetData handling to match the zcashd implementation (#1518 ) * Rewrite GetData handling to match the zcashd implementation `zcashd` silently ignores missing blocks, but sends found transactions followed by a `NotFound` message: `e7b425298f/src/main.cpp (L5497)` This is significantly different to the behaviour expected by the old Zebra connection state machine, which expected `NotFound` for blocks. Also change Zebra's GetData responses to peer request so they ignore missing blocks. * Stop hanging on incomplete transaction or block responses Instead, if the peer sends an unexpected block, unexpected transaction, or NotFound message: 1. end the request, and return a partial response containing any items that were successfully received 2. if none of the expected blocks or transactions were received, return an error, and close the connection	2021-01-04 13:25:35 +10:00
Henry de Valence	bfbc737b6c	network: don't cancel heartbeat requests The cancellation implementation changes made to the connection state machine mean that if a response oneshot is dropped, the connection will avoid cancelling the request. So the heartbeat task does have to wait on the response.	2020-12-02 02:18:13 -05:00
Henry de Valence	f93deb1cac	network: fix missing {0} in PeerError::Serialization	2020-12-01 19:16:41 -08:00
Henry de Valence	18cf5e0249	network: use short Display for Message in spans This makes the span data more compact (e.g., `msg_as_req{msg=block}`) and restores the Debug impl for Message to show all of the data contained in the message. The full message is added as a single event at trace level in the span to preserve the previous full-inspectability.	2020-12-01 19:16:41 -08:00
Jane Lusby	a91d0f0bb6	Include short sha in log messages and error urls (#1410 ) As we approach our alpha release we've decided we want to plan ahead for the user bug reports we will eventually receive. One of the bigger issues we foresee is determining exactly what version of the software users are running, and particularly how easy it may or may not be for users to accidentally discard this information when reporting bugs. To defend against this, we've decided to include the exact git sha for any given build in the compiled artifact. This information will then be re-exported as a span early in the application startup process, so that all logs and error messages should include the sha as their very first span. We've also added this sha as issue metadata for `color-eyre`'s github issue url auto generation feature, which should make sure that the sha is easily available in bug reports we receive, even in the absence of logs. Co-authored-by: teor <teor@riseup.net>	2020-12-01 12:13:20 -08:00
Henry de Valence	add94c1c45	deps: move to tokio 0.3, tower 0.4 This change is mostly mechanical, with the exception of the changes to the `tower-batch` middleware. This middleware was adapted from `tower::buffer`, and the `tower::buffer` code was changed to implement its own bounded queue, because Tokio 0.3 removed the `mpsc::Sender::poll_send` method. See `ddc64e8d4d` for more context on the Tower changes. To match Tower as closely as possible in order to be able to upstream `tower-batch`, those changes are copied from `tower::Buffer` to `tower-batch`.	2020-11-20 10:08:16 -08:00
Henry de Valence	8e709bfa88	network: don't fail on unsolicited messages These messages might be unsolicited, or they might be a response to a request we already canceled. So don't fail the whole connection, just drop the message and move on.	2020-10-26 12:05:35 -07:00
Henry de Valence	13daefa729	network: handle request cancellation in Connection We handle request cancellation in two places: before we transition into the AwaitingResponse state, and while we are in AwaitingResponse. We need both places, or else if we started processing a request, we wouldn't process the cancellation until the timeout elapsed. The first is a check that the oneshot is not already canceled. For the second, we wait on a cancellation, either from a timeout or from the tx channel closing.	2020-10-26 12:05:35 -07:00
teor	1e97691fc8	Fix some "needless lifetime" clippy lints These lints seem to be new in clippy nightly.	2020-10-12 08:54:23 +10:00
Henry de Valence	b72c249b96	network: add a metric+warning when shedding load	2020-09-21 09:26:39 -07:00
Henry de Valence	4df5632752	network: handle Message::NotFound as a response This cleans up the response processing logic a little bit along the way, but the overall division of responsibility should be better documented in a future commit.	2020-09-20 10:21:18 -07:00
Henry de Valence	64905563d1	network: remove glob import in message-handling This clarifies which parts are the handler state and which parts are the incoming message.	2020-09-20 10:21:18 -07:00
Henry de Valence	9c021025a7	network: fill in remaining request/response pairs	2020-09-20 10:21:18 -07:00
Henry de Valence	b289cb9164	network: clean up GetHeaders, GetBlocks modeling	2020-09-20 10:21:18 -07:00
Henry de Valence	3c993f33b1	network: add PeerError::WrongMessage This lets us distinguish between cases where the message was unsupported (e.g., BIP11 messages), and cases where the message was uninterpretable in context (e.g., unsolicited messages).	2020-09-20 10:21:18 -07:00
Henry de Valence	430176dd0d	network: clean up message-as-request translation	2020-09-20 10:21:18 -07:00
Henry de Valence	1d3892e1dc	network: rename alias to BoxError This is shorter and consistent with Tower (which is why we use it in the first place).	2020-09-18 18:34:25 -07:00
teor	1f7af0a779	Update the inv message processing comment Cleanup after PR #1028.	2020-09-09 15:29:38 -07:00
Jane Lusby	1b17691dda	improve logging	2020-09-08 12:37:34 -07:00
Jane Lusby	81a3ad3a0d	filter inventory advertisements correctly	2020-09-08 12:37:34 -07:00
Henry de Valence	3f150eb16e	network: implement transaction request handling. (#1016 ) This commit makes several related changes to the network code: - adds a `TransactionsByHash(HashSet<transaction::Hash>)` request and `Transactions(Vec<Arc<Transaction>>)` response pair that allows fetching transactions from a remote peer; - adds a `PushTransaction(Arc<Transaction>)` request that pushes an unsolicited transaction to a remote peer; - adds an `AdvertiseTransactions(HashSet<transaction::Hash>)` request that advertises transactions by hash to a remote peer; - adds an `AdvertiseBlock(block::Hash)` request that advertises a block by hash to a remote peer; Then, it modifies the connection state machine so that outbound requests to remote peers are handled properly: - `TransactionsByHash` generates a `getdata` message and collects the results, like the existing `BlocksByHash` request. - `PushTransaction` generates a `tx` message, and returns `Nil` immediately. - `AdvertiseTransactions` and `AdvertiseBlock` generate an `inv` message, and return `Nil` immediately. Next, it modifies the connection state machine so that messages from remote peers generate requests to the inbound service: - `getdata` messages generate `BlocksByHash` or `TransactionsByHash` requests, depending on the content of the message; - `tx` messages generate `PushTransaction` requests; - `inv` messages generate `AdvertiseBlock` or `AdvertiseTransactions` requests. Finally, it refactors the request routing logic for the peer set to handle advertisement messages, providing three routing methods: - `route_p2c`, which uses p2c as normal (default); - `route_inv`, which uses the inventory registry and falls back to p2c (used for `BlocksByHash` or `TransactionsByHash`); - `route_all`, which broadcasts a request to all ready peers (used for `AdvertiseBlock` and `AdvertiseTransactions`).	2020-09-08 10:16:29 -07:00
teor	b5c653ed93	Use ok_or for constants, rather than a redudant closure * Use ok_or for constants in zebra-network * Use ok_or for constants in zebra-consensus	2020-09-02 14:26:26 +10:00
Jane Lusby	88557ddd0a	address more comments	2020-09-01 21:01:38 -04:00
Jane Lusby	d933abeebf	fix typo	2020-09-01 21:01:38 -04:00
Jane Lusby	96c8809348	Implement Inventory Tracking RFC (#963 ) * Add .cargo to the gitignore file * Implement Inventory Tracking RFC * checkpoint * wire together the inventory registry * add comment documenting condition * make inventory registry optional	2020-09-01 14:28:54 -07:00
Henry de Valence	f91b91b6d8	network: clarify comment on Default for handshake::Builder Co-authored-by: Jane Lusby <jlusby42@gmail.com>	2020-09-01 13:56:00 -07:00
Henry de Valence	fddba7a336	network: remove handshake::Builder::with_addr Use the listen_addr field already specified in the config. Also, derive Clone for Handshake<S>. Co-authored-by: Jane Lusby <jane@zfnd.org>	2020-09-01 13:56:00 -07:00
Henry de Valence	a5b6f39850	network: don't leak our exact time skew in handshakes.	2020-09-01 13:56:00 -07:00
Henry de Valence	60a0b8c382	network: change Handshake::new to a Builder. This allows more detailed control over the handshake parameters.	2020-09-01 13:56:00 -07:00
Henry de Valence	103b663c40	chain: rename BlockHeight to block::Height	2020-08-17 11:46:34 -07:00
Henry de Valence	61dea90e2f	chain: rename BlockHeaderHash to block::Hash This is the first in a sequence of changes that change the block:: items to not include Block as a prefix in their name, in accordance with the Rust API guidelines.	2020-08-17 11:46:34 -07:00

1 2 3 4 5

229 Commits