Zebra

Commit Graph

Author	SHA1	Message	Date
Janito Vaqueiro Ferreira Filho	4c4dbfe7cd	Reject connections from outdated peers (#2519 ) * Simplify state service initialization in test Use the test helper function to remove redundant code. * Create `BestTipHeight` helper type This type abstracts away the calculation of the best tip height based on the finalized block height and the best non-finalized chain's tip. * Add `best_tip_height` field to `StateService` The receiver endpoint is currently ignored. * Return receiver endpoint from service constructor Make it available so that the best tip height can be watched. * Update finalized height after finalizing blocks After blocks from the queue are finalized and committed to disk, update the finalized block height. * Update best non-finalized height after validation Update the value of the best non-finalized chain tip block height after a new block is committed to the non-finalized state. * Update finalized height after loading from disk When `FinalizedState` is first created, it loads the state from persistent storage, and the finalized tip height is updated. Therefore, the `best_tip_height` must be notified of the initial value. * Update the finalized height on checkpoint commit When a checkpointed block is committed, it bypasses the non-finalized state, so there's an extra place where the finalized height has to be updated. * Add `best_tip_height` to `Handshake` service It can be configured using the `Builder::with_best_tip_height`. It's currently not used, but it will be used to determine if a connection to a remote peer should be rejected or not based on that peer's protocol version. * Require best tip height to init. `zebra_network` Without it the handshake service can't properly enforce the minimum network protocol version from peers. Zebrad obtains the best tip height endpoint from `zebra_state`, and the test vectors simply use a dummy endpoint that's fixed at the genesis height. * Pass `best_tip_height` to proto. ver. negotiation The protocol version negotiation code will reject connections to peers if they are using an old protocol version. An old version is determined based on the current known best chain tip height. * Handle an optional height in `Version` Fall back to the genesis height if `None` is specified. * Reject connections to peers on old proto. versions Avoid connecting to peers that are on protocol versions that don't recognize a network update. * Document why peers on old versions are rejected Describe why it's a security issue above the check. * Test if `BestTipHeight` starts with `None` Check if initially there is no best tip height. * Test if best tip height is max. of latest values After applying a list of random updates where each one either sets the finalized height or the non-finalized height, check that the best tip height is the maximum of the most recently set finalized height and the most recently set non-finalized height. * Add `queue_and_commit_finalized` method A small refactor to make testing easier. The handling of requests for committing non-finalized and finalized blocks is now more consistent. * Add `assert_block_can_be_validated` helper Refactor to move into a separate method some assertions that are done before a block is validated. This is to allow moving these assertions more easily to simplify testing. * Remove redundant PoW block assertion It's also checked in `zebra_state::service::check::block_is_contextually_valid`, and it was getting in the way of tests that received a gossiped block before finalizing enough blocks. * Create a test strategy for test vector chain Splits a chain loaded from the test vectors in two parts, containing the blocks to finalize and the blocks to keep in the non-finalized state. * Test committing blocks update best tip height Create a mock blockchain state, with a chain of finalized blocks and a chain of non-finalized blocks. Commit all the blocks appropriately, and verify that the best tip height is updated. Co-authored-by: teor <teor@riseup.net>	2021-08-08 23:52:52 +00:00
teor	bcd5f2c50d	Gossip dynamic local listener ports to peers (#2277 ) * Gossip dynamically allocated listener ports to peers Previously, Zebra would either gossip port `0`, which is invalid, or skip gossiping its own dynamically allocated listener port. * Improve "no configured peers" warning And downgrade from error to warning, because inbound-only nodes are a valid use case. * Move random_known_port to zebra-test * Add tests for dynamic local listener ports and the AddressBook Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>	2021-06-23 07:59:06 +10:00
teor	1a57023eac	Security: Use canonical SocketAddrs to avoid duplicate peer connections, Feature: Send local listener to peers (#2276 ) * Always send our local listener with the latest time Previously, whenever there was an inbound request for peers, we would clone the address book and update it with the local listener. This had two impacts: - the listener could conflict with an existing entry, rather than unconditionally replacing it, and - the listener was briefly included in the address book metrics. As a side-effect, this change also makes sanitization slightly faster, because it avoids some useless peer filtering and sorting. * Skip listeners that are not valid for outbound connections * Filter sanitized addresses Zebra based on address state This fix correctly prevents Zebra gossiping client addresses to peers, but still keeps the client in the address book to avoid reconnections. * Add a full set of DateTime32 and Duration32 calculation methods * Refactor sanitize to use the new DateTime32/Duration32 methods * Security: Use canonical SocketAddrs to avoid duplicate connections If we allow multiple variants for each peer address, we can make multiple connections to that peer. Also make sure sanitized MetaAddrs are valid for outbound connections. * Test that address books contain the local listener address Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>	2021-06-22 02:16:59 +00:00
teor	92828bbb29	Reliability: send local listener address to peers When peers ask for peer addresses, add our local listener address to the set of addresses, sanitize, then truncate. Sanitize shuffles addresses, so if there are lots of addresses in the address book, our address will only be sent to some peers.	2021-05-18 14:02:19 +10:00
teor	458c26f1e3	Limit initial candidate set fanout to the number of initial peers If there is a small number of initial peers, and they are slow, the initial candidate set update can appear to hang. To avoid this issue, limit the initial candidate set fanout to the number of initial peers. Once the initial peers have sent us more peer addresses, there is no need to limit the fanouts for future updates. Reported by Niklas Long of Equilibrium.	2021-05-18 07:54:03 +10:00
teor	b0b8b2f61a	Add extra instrumentation for initialize and handshakes (#2122 ) * Instrument the crawl task When we created the crawl task, we forgot to instrument it with the global span. This fix makes sure that the git and network span appears on crawl logs. * Instrument the connector * Improve handshake instrumentation Make some spans debug, so there are not too many spans. * Add the address to initial peer connection errors	2021-05-17 16:49:16 -04:00
teor	a8a0d6450c	Security: stop gossiping temporary inbound remote addresses to peers - stop putting inbound addresses in the address book - drop address book entries that can't be used for outbound connections - distinguish between temporary inbound and permanent outbound peer addresses - also create variants to handle proxy connections (but don't use them yet) - avoid tracking connection state for isolated connections - document security constraints for the address book and peer set	2021-05-14 23:45:42 +10:00
Kirill Fomichev	afac2c2846	Use the default port for configured listen addresses with no port (#2043 ) * Allow use listen address in config without port * update comments * remove not used alias * use Network::default_port * Move tests and use toml instead json * change error message * Make match more readable Co-authored-by: teor <teor@riseup.net>	2021-04-21 23:14:29 +00:00
teor	0203d1475a	Refactor and document correctness for std::sync::Mutex<AddressBook>	2021-04-21 17:14:47 -04:00
teor	a417c7c8c7	Use meaningful names for select! variables	2021-04-13 23:56:16 -04:00
teor	fb95de99a6	Refactor the dial result into a From impl	2021-04-13 18:52:49 -04:00
teor	375c8d8700	Fix a deadlock between the crawler and dialer, and other hangs (#1950 ) * Stop ignoring inbound message errors and handshake timeouts To avoid hangs, Zebra needs to maintain the following invariants in the handshake and heartbeat code: - each handshake should run in a separate spawned task (not yet implemented) - every message, error, timeout, and shutdown must update the peer address state - every await that depends on the network must have a timeout Once the Connection is created, it should handle timeouts. But we need to handle timeouts during handshake setup. * Avoid hangs by adding a timeout to the candidate set update Also increase the fanout from 1 to 2, to increase address diversity. But only return permanent errors from `CandidateSet::update`, because the crawler task exits if `update` returns an error. Also log Peers response errors in the CandidateSet. * Use the select macro in the crawler to reduce hangs The `select` function is biased towards its first argument, risking starvation. As a side-benefit, this change also makes the code a lot easier to read and maintain. * Split CrawlerAction::Demand into separate actions This refactor makes the code a bit easier to read, at the cost of sometimes blocking the crawler on `candidates.next()`. That's ok, because `next` only has a short (< 100 ms) delay. And we're just about to spawn a separate task for each handshake. * Spawn a separate task for each handshake This change avoids deadlocks by letting each handshake make progress independently. * Move the dial task into a separate function This refactor improves readability. * Fix buggy future::select function usage And document the correctness of the new code.	2021-04-07 10:25:10 -03:00
teor	1a159dfcb6	Add more methods for creating MetaAddrs This refactor lets us remove `MetaAddr::update_last_seen()`.	2021-03-26 07:23:49 +10:00
teor	5a30268d7a	Log address metrics when the peer set has no ready peers	2021-03-17 10:47:04 +10:00
Jane Lusby	03aa6f671f	Implement outbound connection rate limiting - includes config rename with alias (#1855 ) * Implement outbound connection rate limiting * fix breaking change on config Co-authored-by: teor <teor@riseup.net>	2021-03-10 01:36:05 +00:00
teor	d4f2f27218	Add global span to spawned network tasks (#1761 ) Closes #1575	2021-02-20 08:36:50 +10:00
teor	e61b5e50a2	Diagnostics for CI port conflict failures (#1766 ) Log a "Trying..." message before each listener opens, to see if the delay is inside Zebra, or in the test harness or OS. Also report the configured and actual ports where possible, for better diagnostics.	2021-02-18 12:15:09 -03:00
teor	8d1c498234	Log initial peer connection failures And standardise another log message	2021-02-17 09:21:53 -05:00
teor	e85441c914	Add a correctness comment to justify the revert	2021-02-16 05:52:54 +10:00
teor	a02a00a3f5	Revert "Stop using CallAllUnordered in peer_set::add_initial_peers (#1705 )" This reverts commit `241c7ad849`.	2021-02-16 05:52:54 +10:00
Alfredo Garcia	241c7ad849	Stop using CallAllUnordered in peer_set::add_initial_peers (#1705 ) * use ServiceExt::oneshot and FuturesUnordered Co-authored-by: teor <teor@riseup.net>	2021-02-09 08:16:02 +10:00
Alfredo Garcia	221512c733	Async DNS seeder lookups (#1662 ) * replace to_socket_addrs * refactor `resolve()` into `resolve_host()` * use `resolve_host()` to resolve config peers * add DNS_LOOKUP_TIMEOUT constant * don't block the main thread in initialize	2021-02-03 12:20:26 +10:00
Alfredo Garcia	4b34482264	Add hints to port conflict and lock file panics (#1535 ) * add hint for port error * add issue filter for port panic * add lock file hint * add metrics endpoint port conflict hint * add hint for tracing endpoint port conflict * add acceptance test for resource conflicts * Split out common conflict test code into a function * Add state, metrics, and tracing conflict tests * Add a full set of stderr acceptance test functions This change makes the stdout and stderr acceptance test interfaces identical. * move Zcash listener opening * add todo about hint for disk full * add constant for lock file * match path in state cache * don't match windows cache path * Use Display for state path logs Avoids weird escaping on Windows when using Debug * Add Windows conflict error messages * Turn PORT_IN_USE_ERROR into a regex And add another alternative Windows-specific port error Co-authored-by: teor <teor@riseup.net> Co-authored-by: Jane Lusby <jane@zfnd.org>	2021-01-29 22:36:33 +10:00
Jane Lusby	15698245e1	Deduplicate metrics dependencies (#1561 ) ## Motivation This PR is motivated by the regression identified in https://github.com/ZcashFoundation/zebra/issues/1349. That PR notes that the metrics stopped working for most of the crates other than `zebrad`. ## Solution This PR resolves the regression by deduplicating the `metrics` crate dependency. During a recent change we upgraded the metrics version in `zebrad` and a couple other of our crates, but we never updated the dependencies in `zebra-state`, `zebra-consensus`, or `zebra-network`. This caused the metrics macros to attempt to retrieve the current metrics exporter through the wrong function. We would install the metrics exporter in `0.13`, but then attempt to look it up through the `0.12` crate, which contains a different instance of the metrics exporter static variable which is unset. Doing this causes the metrics macros to return `None` for the current exporter after which they just silently give up. ## Related Issues closes https://github.com/ZcashFoundation/zebra/issues/1349 ## Follow Up Work I noticed we have quite a few duplicate dependencies in our tree. We might be able to save some compilation time by auditing those and deduplicating them as much as possible. - https://github.com/ZcashFoundation/zebra/issues/1582 Co-authored-by: teor <teor@riseup.net>	2021-01-12 12:28:56 +10:00
teor	8e2f08221f	Add peer set tracing and unreachable panics (#1468 ) Add some extra tracing and panics to double-check our assumptions about the peer set state machine.	2020-12-14 11:00:39 +10:00
Henry de Valence	00c4f4f0e6	network: record cause of handshake failure	2020-12-01 19:16:41 -08:00
Henry de Valence	add94c1c45	deps: move to tokio 0.3, tower 0.4 This change is mostly mechanical, with the exception of the changes to the `tower-batch` middleware. This middleware was adapted from `tower::buffer`, and the `tower::buffer` code was changed to implement its own bounded queue, because Tokio 0.3 removed the `mpsc::Sender::poll_send` method. See `ddc64e8d4d` for more context on the Tower changes. To match Tower as closely as possible in order to be able to upstream `tower-batch`, those changes are copied from `tower::Buffer` to `tower-batch`.	2020-11-20 10:08:16 -08:00
Henry de Valence	6dd7318d3b	deps: use Tower 0.4 from git instead of 0.3.1. This addresses at least three pain points: - we were affected by bugs that were already fixed in git, but not in the released crate; - we can use service combinators to transform requests and responses; - we can use the hedge middleware. The version in git is still marked as 0.3.1 but these changes will be part of tower 0.4: https://github.com/tower-rs/tower/issues/431	2020-09-21 14:16:56 -07:00
Henry de Valence	170f588ffb	network: document load-shedding behavior This was part of the original design and is described in the Connection internals, but we never documented it externally.	2020-09-18 18:34:25 -07:00
Henry de Valence	1d3892e1dc	network: rename alias to BoxError This is shorter and consistent with Tower (which is why we use it in the first place).	2020-09-18 18:34:25 -07:00
Jane Lusby	96c8809348	Implement Inventory Tracking RFC (#963 ) * Add .cargo to the gitignore file * Implement Inventory Tracking RFC * checkpoint * wire together the inventory registry * add comment documenting condition * make inventory registry optional	2020-09-01 14:28:54 -07:00
Henry de Valence	fddba7a336	network: remove handshake::Builder::with_addr Use the listen_addr field already specified in the config. Also, derive Clone for Handshake<S>. Co-authored-by: Jane Lusby <jane@zfnd.org>	2020-09-01 13:56:00 -07:00
Henry de Valence	1b5a824584	network: fix bug in BIP37 relay flag handling. The relay flag in the version message is used in conjunction with BIP37 to receive bloom-filtered transactions. When it is set to false, transactions are not relayed until a bloom filter is set. Since we don't implement BIP37 (it's not useful for shielded transactions), this means we'll never receive transactions.	2020-09-01 13:56:00 -07:00
Henry de Valence	60a0b8c382	network: change Handshake::new to a Builder. This allows more detailed control over the handshake parameters.	2020-09-01 13:56:00 -07:00
Henry de Valence	948b067808	chain: move Network, NetworkUpgrade to parameters Also, avoid using star-imports of the enum variants, which pollutes the namespace.	2020-08-17 11:46:34 -07:00
teor	109666cc48	fix: Tweak the network listener log (#886 )	2020-08-12 14:22:54 -07:00
Henry de Valence	299afe13df	zebra-network tweaks. (#877 ) * network: move gossiped peer selection logic into address book. * network: return BoxService from init. * zebrad: add note on why we truncate the gossiped peer list Co-authored-by: Jane Lusby <jlusby42@gmail.com> * Remove unused .rustfmt.toml Many of these options are never actually loaded by our CI because of a channel mismatch, where they're not applied on stable but only on nightly (see the logs from a rustfmt job). This means that we can get different settings when running `cargo fmt` on the nightly and stable channels, which was causing a CI failure on this PR. Reverting back to the default rustfmt settings avoids this problem and keeps us in line with upstream rustfmt. There's no loss to us since we were using the defaults anyways. Co-authored-by: Jane Lusby <jlusby42@gmail.com>	2020-08-11 13:07:44 -07:00
Alfredo Garcia	9c387521bd	Print endpoint addresses at startup (#867 ) * print tracing and metrics endpoints in startup * print network address in startup	2020-08-10 12:47:26 -07:00
Henry de Valence	3d46ab746a	Clean up options in network config section. (#839 ) Closes #536. This removes: - the user-agent (we can add a mechanism to specify extra BIP14 components later, if any users ask us for that feature); - the EWMA parameters (these were put in the config just to avoid making a choice); - the peer connection timeout (we can change the default value if anyone ever has a problem with it); - the peer set request buffer size (setting this too low can make the application deadlock); The new peer interval is left in.	2020-08-06 11:29:00 -07:00
teor	6be0f8ed2f	fix: Warn if the listener port is for the wrong network We'll fix the underlying defaults in #660, with the rest of the listeners.	2020-07-29 16:03:52 +10:00
Henry de Valence	217c25ef07	network: propagate tracing Spans through peer connection	2020-07-09 11:15:06 -07:00
Dimitris Apostolou	ba81d7d4c0	Fix typos	2020-07-07 11:13:49 -07:00
Jane Lusby	431f194c0f	propagate errors out of zebra_network::init (#435 ) Prior to this change, the service returned by `zebra_network::init` would spawn background tasks that could silently fail, causing unexpected errors in the zebra_network service. This change modifies the `PeerSet` that backs `zebra_network::init` to store all of the `JoinHandle`s for each background task it depends on. The `PeerSet` then checks this set of futures to see if any of them have exited with an error or a panic, and if they have it returns the error as part of `poll_ready`.	2020-06-09 12:24:28 -07:00
Jane Lusby	8c178c3ee4	fix panic in seed subcommand (#401 ) Co-authored-by: Jane Lusby <jane@zfnd.org> Prior to this change, the seed subcommand would consistently encounter a panic in one of the background tasks, but would continue running after the panic. This is indicative of two bugs. First, zebrad was not configured to treat panics as non recoverable and instead defaulted to the tokio defaults, which are to catch panics in tasks and return them via the join handle if available, or to print them if the join handle has been discarded. This is likely a poor fit for zebrad as an application, we do not need to maximize uptime or minimize the extent of an outage should one of our tasks / services start encountering panics. Ignoring a panic increases our risk of observing invalid state, causing all sorts of wild and bad bugs. To deal with this we've switched the default panic behavior from `unwind` to `abort`. This makes panics fail immediately and take down the entire application, regardless of where they occur, which is consistent with our treatment of misbehaving connections. The second bug is the panic itself. This was triggered by a duplicate entry in the initial_peers set. To fix this we've switched the storage for the peers from a `Vec` to a `HashSet`, which has similar properties but guarantees uniqueness of its keys.	2020-05-27 17:40:12 -07:00
Jane Lusby	b6b35364f3	cleanup warnings throughout codebase	2020-05-27 15:42:29 -04:00
Henry de Valence	43b2d35dda	Crawl for more peers when we exhaust candidates.	2020-02-21 06:48:25 -05:00
Henry de Valence	afa2c2347f	fmt	2020-02-21 06:48:25 -05:00
Henry de Valence	00edcae0c2	Add metrics for the crawler and candidate set.	2020-02-14 20:14:05 -05:00
Henry de Valence	8000f888fd	Connect to multiple peers concurrently. The previous outbound peer connection logic got requests to connect to new peers and processed them one at a time, making single connection attempts and retrying if the connection attempt failed. This was quite slow, because many connections fail, and we have to wait for timeouts. Instead, this logic connects to new peers concurrently (up to 50 at a time).	2020-02-14 18:23:41 -05:00
Henry de Valence	2965187b91	Upgrade tokio, futures, hyper to released versions.	2019-12-13 17:42:15 -05:00
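
Commit 4c4dbfe7cd describes a `BestTipHeight` helper that starts with no height and reports the maximum of the most recently set finalized and non-finalized heights. The following is a minimal sketch of that rule only, not Zebra's implementation: the `BestTipSketch` type, its method names, and the `u32` height alias are hypothetical, and the watch channel mentioned in the commit is left out.

```rust
/// Hypothetical stand-in for a block height; not Zebra's own height type.
type Height = u32;

/// Illustrative tracker for the rule described in commit 4c4dbfe7cd:
/// the best tip is the larger of the latest finalized height and the
/// latest non-finalized height, and is `None` until either is set.
#[derive(Debug, Default)]
struct BestTipSketch {
    finalized: Option<Height>,
    non_finalized: Option<Height>,
}

impl BestTipSketch {
    /// Record the most recent finalized tip height.
    fn set_finalized_height(&mut self, height: Height) {
        self.finalized = Some(height);
    }

    /// Record the most recent best non-finalized tip height.
    fn set_best_non_finalized_height(&mut self, height: Height) {
        self.non_finalized = Some(height);
    }

    /// `Option` orders `None` below every `Some`, so `max` returns
    /// `None` only when neither height has been set.
    fn best_tip_height(&self) -> Option<Height> {
        self.finalized.max(self.non_finalized)
    }
}

fn main() {
    let mut tip = BestTipSketch::default();
    assert_eq!(tip.best_tip_height(), None);

    tip.set_finalized_height(10);
    tip.set_best_non_finalized_height(12);
    assert_eq!(tip.best_tip_height(), Some(12));

    // Finalizing past the non-finalized tip moves the best tip forward.
    tip.set_finalized_height(15);
    assert_eq!(tip.best_tip_height(), Some(15));
}
```

Keeping the two heights separate and combining them on read matches the commit's description of independent updates from the finalized and non-finalized states; how the real code stores or publishes the value is not shown in this log.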

56 Commits