Zebra

Commit Graph

Author	SHA1	Message	Date
teor	ce45198c17	Fix comment typo: overflow -> underflow	2021-06-01 16:44:45 +10:00
teor	34702f22b6	clippy: remove needless clone and collect	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	83ac1519e9	Add proptest for future `last_seen` correction Given a generated list of gossiped peers, ensure that after running the `validate_addrs` function none of the resulting peers have a `last_seen` time that's after the specified limit.	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	63672a2633	Test underflow handling If the calculation to apply the compensation offset overflows or underflows, the reported times are too far apart, and could be sent on purpose by a malicious peer, so all addresses from that peer should be rejected.	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	f263d85aa4	Test `last_seen` time being equal to the limit If the most recent `last_seen` time reported by a peer is exactly the limit, the offset doesn't need to be applied because no times are in the future.	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	f4a7026aa3	Test that offset is applied to all gossiped peers Use some mock gossiped peers where some have `last_seen` times in the past and some have times in the future. Check that all the peers have an offset applied to them by the `validate_addrs` function. This tests if the offset is applied to all peers that a malicious peer gossiped to us.	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	60f660e53f	Test if validation doesn't offset past times Use some mock gossiped peers that all have `last_seen` times in the past and check that they don't have any changes to the `last_seen` times applied by the `validate_addrs` function.	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	3c9c920bbd	Test if validation offsets times in the future Use some mock gossiped peers that all have `last_seen` times in the future and check that they all have a specific offset applied to them.	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	82452621e0	Remove empty list of peers check The `limit_last_seen_times` can now safely handle an empty list.	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	966430d400	Update security note to be broader Focus on what can go wrong, and not on the specific causes. Co-authored-by: teor <teor@riseup.net>	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	f3419b7baf	Handle overflow when applying offset If an overflow occurs, the reported `last_seen` times are either very wrong or malicious, so reject all addresses gossiped by that peer.	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	5b8f33390c	Add comment to describe purpose Make it clear why all peers have the time offset applied to them. Co-authored-by: teor <teor@riseup.net>	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	9eac43a8bb	Apply offset to all times received from a peer If any of the times gossiped by a peer are in the future, apply the necessary offset to all the times gossiped by that peer. This ensures that all gossiped peers from a malicious peer are moved further back in the queue. Co-authored-by: teor <teor@riseup.net>	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	fa35c9b4f1	Only apply offset to times in the future Times in the past don't have any security implications, so there's no point in trying to apply the offset to them as well.	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	876d515dd6	Improve documentation - Make the security impact clearer and in a separate section. - Instead of listing an assumption as almost a side-note, describe it clearly inside a `Panics` section. Co-authored-by: teor <teor@riseup.net>	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	54809a1b89	Don't trust reported peer `last_seen` times Due to clock skew, the peers could end up at the front of the reconnection queue or far at the back. The solution to this is to offset the reported times by the difference between the most recent reported sight (in the remote clock) and the current time (in the local clock).	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	14ecc79f01	Use `DateTime32` in `validate_addrs`	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	b891a96a6d	Improve ergonomics by returning `impl Iterator` Returning `impl IntoIterator` means that the caller will always be forced to call `.into_iter()`, and returning `impl Iterator` still allows them to call `.into_iter()` because it becomes the identity function.	2021-06-01 03:42:08 -03:00
teor	2685fc746e	Remove CandidateSet state and add last seen time limit to candidate_set::validate_addrs (#2177)	2021-05-21 02:21:13 +00:00
teor	752358d236	Fix some candidate set and meta addr doc links (#2174) Suggested by jvff.	2021-05-21 11:40:14 +10:00
teor	c7ea1395e7	Security: Fix CandidateSet timeout and fanout * Refactor: Split CandidateSet::update into separate functions * Security: Apply a timeout to the entire CandidateSet::update * Security: Stop using very large fanout limits during initialization Previously, Zebra used the number of resolved peer addresses. So it was possible for all peers to fail, and for Zebra to hang on the first update. And Zebra could send a fanout for each initial peer, regardless of whether their connection was successful. Also: - wait for at least one successful peer before trying an update - warn if there are no successful initial peers	2021-05-21 06:51:34 +10:00
teor	92828bbb29	Reliability: send local listener address to peers When peers ask for peer addresses, add our local listener address to the set of addresses, sanitize, then truncate. Sanitize shuffles addresses, so if there are lots of addresses in the address book, our address will only be sent to some peers.	2021-05-18 14:02:19 +10:00
teor	458c26f1e3	Limit initial candidate set fanout to the number of initial peers If there is a small number of initial peers, and they are slow, the initial candidate set update can appear to hang. To avoid this issue, limit the initial candidate set fanout to the number of initial peers. Once the initial peers have sent us more peer addresses, there is no need to limit the fanouts for future updates. Reported by Niklas Long of Equilibrium.	2021-05-18 07:54:03 +10:00
teor	b0b8b2f61a	Add extra instrumentation for initialize and handshakes (#2122) * Instrument the crawl task When we created the crawl task, we forgot to instrument it with the global span. This fix makes sure that the git and network span appears on crawl logs. * Instrument the connector * Improve handshake instrumentation Make some spans debug, so there are not too many spans. * Add the address to initial peer connection errors	2021-05-17 16:49:16 -04:00
teor	a8a0d6450c	Security: stop gossiping temporary inbound remote addresses to peers - stop putting inbound addresses in the address book - drop address book entries that can't be used for outbound connections - distinguish between temporary inbound and permanent outbound peer addresses - also create variants to handle proxy connections (but don't use them yet) - avoid tracking connection state for isolated connections - document security constraints for the address book and peer set	2021-05-14 23:45:42 +10:00
Kirill Fomichev	afac2c2846	Use the default port for configured listen addresses with no port (#2043) * Allow use listen address in config without port * update comments * remove not used alias * use Network::default_port * Move tests and use toml instead json * change error message * Make match more readable Co-authored-by: teor <teor@riseup.net>	2021-04-21 23:14:29 +00:00
teor	0203d1475a	Refactor and document correctness for std::sync::Mutex<AddressBook>	2021-04-21 17:14:47 -04:00
teor	2ed8bb00cf	Clarify CandidateSet state diagram We get inbound connections on the listener port, but the important part is the inbound connection itself.	2021-04-21 01:37:43 -04:00
teor	a417c7c8c7	Use meaningful names for select! variables	2021-04-13 23:56:16 -04:00
teor	fb95de99a6	Refactor the dial result into a From impl	2021-04-13 18:52:49 -04:00
teor	375c8d8700	Fix a deadlock between the crawler and dialer, and other hangs (#1950) * Stop ignoring inbound message errors and handshake timeouts To avoid hangs, Zebra needs to maintain the following invariants in the handshake and heartbeat code: - each handshake should run in a separate spawned task (not yet implemented) - every message, error, timeout, and shutdown must update the peer address state - every await that depends on the network must have a timeout Once the Connection is created, it should handle timeouts. But we need to handle timeouts during handshake setup. * Avoid hangs by adding a timeout to the candidate set update Also increase the fanout from 1 to 2, to increase address diversity. But only return permanent errors from `CandidateSet::update`, because the crawler task exits if `update` returns an error. Also log Peers response errors in the CandidateSet. * Use the select macro in the crawler to reduce hangs The `select` function is biased towards its first argument, risking starvation. As a side-benefit, this change also makes the code a lot easier to read and maintain. * Split CrawlerAction::Demand into separate actions This refactor makes the code a bit easier to read, at the cost of sometimes blocking the crawler on `candidates.next()`. That's ok, because `next` only has a short (< 100 ms) delay. And we're just about to spawn a separate task for each handshake. * Spawn a separate task for each handshake This change avoids deadlocks by letting each handshake make progress independently. * Move the dial task into a separate function This refactor improves readability. * Fix buggy future::select function usage And document the correctness of the new code.	2021-04-07 10:25:10 -03:00
teor	de6d1c93f3	Clarify a comment	2021-04-07 18:56:38 +10:00
teor	83b88f5b7a	Merge pull request #1972 from ZcashFoundation/peer-set-demand-deadlock-doc Document peer set deadlock resistance	2021-04-01 22:50:17 -04:00
teor	306fa88214	Document the correctness of Poll::Pending wakeups	2021-03-27 08:55:49 -04:00
teor	1a159dfcb6	Add more methods for creating MetaAddrs This refactor lets us remove `MetaAddr::update_last_seen()`.	2021-03-26 07:23:49 +10:00
teor	6fe81d8992	Make MetaAddr.last_seen into a private field	2021-03-26 07:23:49 +10:00
teor	5a30268d7a	Log address metrics when the peer set has no ready peers	2021-03-17 10:47:04 +10:00
Jack Grigg	e51f33a4b9	Use interoperable names for common metrics These names match the equivalent metrics in zcashd, enabling common metrics to be collected across both node types.	2021-03-17 09:38:07 +10:00
teor	e50692bd51	CandidateSet: Add Listener Port Connections Inbound connections on the Zcash protocol listener port perform a handshake. If the handshake is successful, it adds the peer to the AddressBook.	2021-03-09 23:05:18 -05:00
Jane Lusby	03aa6f671f	Implement outbound connection rate limiting - includes config rename with alias (#1855) * Implement outbound connection rate limiting * fix breaking change on config Co-authored-by: teor <teor@riseup.net>	2021-03-10 01:36:05 +00:00
teor	d4f2f27218	Add global span to spawned network tasks (#1761) Closes #1575	2021-02-20 08:36:50 +10:00
teor	e61b5e50a2	Diagnostics for CI port conflict failures (#1766) Log a "Trying..." message before each listener opens, to see if the delay is inside Zebra, or in the test harness or OS. Also report the configured and actual ports where possible, for better diagnostics.	2021-02-18 12:15:09 -03:00
teor	5424e1d8ba	Fix candidate set address state handling (#1709) Design: - Add a `PeerAddrState` to each `MetaAddr` - Use a single peer set for all peers, regardless of state - Implement time-based liveness as an `AddressBook` method, rather than a `PeerAddrState` variant - Delete `AddressBook.by_state` Implementation: - Simplify `AddressBook` changes using `update` and `take` modifier methods - Simplify the `AddressBook` iterator implementation, replacing it with methods that are more obviously correct - Consistently collect peer set metrics Documentation: - Expand and update the peer set documentation We can optimise later, but for now we want simple code that is more obviously correct.	2021-02-18 11:18:32 +10:00
teor	86169f6412	Update PeerSet metrics after every change (#1727)	2021-02-18 07:06:59 +10:00
teor	8d1c498234	Log initial peer connection failures And standardise another log message	2021-02-17 09:21:53 -05:00
teor	e85441c914	Add a correctness comment to justify the revert	2021-02-16 05:52:54 +10:00
teor	a02a00a3f5	Revert "Stop using CallAllUnordered in peer_set::add_initial_peers (#1705)" This reverts commit `241c7ad849`.	2021-02-16 05:52:54 +10:00
Alfredo Garcia	241c7ad849	Stop using CallAllUnordered in peer_set::add_initial_peers (#1705) * use ServiceExt::oneshot and FuturesUnordered Co-authored-by: teor <teor@riseup.net>	2021-02-09 08:16:02 +10:00
Alfredo Garcia	221512c733	Async DNS seeder lookups (#1662) * replace to_socket_addrs * refactor `resolve()` into `resolve_host()` * use `resolve_host()` to resolve config peers * add DNS_LOOKUP_TIMEOUT constant * don't block the main thread in initialize	2021-02-03 12:20:26 +10:00
Alfredo Garcia	4b34482264	Add hints to port conflict and lock file panics (#1535) * add hint for port error * add issue filter for port panic * add lock file hint * add metrics endpoint port conflict hint * add hint for tracing endpoint port conflict * add acceptance test for resource conflicts * Split out common conflict test code into a function * Add state, metrics, and tracing conflict tests * Add a full set of stderr acceptance test functions This change makes the stdout and stderr acceptance test interfaces identical. * move Zcash listener opening * add todo about hint for disk full * add constant for lock file * match path in state cache * don't match windows cache path * Use Display for state path logs Avoids weird escaping on Windows when using Debug * Add Windows conflict error messages * Turn PORT_IN_USE_ERROR into a regex And add another alternative Windows-specific port error Co-authored-by: teor <teor@riseup.net> Co-authored-by: Jane Lusby <jane@zfnd.org>	2021-01-29 22:36:33 +10:00
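
The commits from `54809a1b89` through `83ac1519e9` describe how `validate_addrs` stops trusting gossiped `last_seen` times: if any time is after the limit, every time from that peer is shifted back by the same offset; a most recent time exactly at the limit needs no offset; an empty list is handled; and an offset that underflows rejects all of that peer's addresses. The Rust sketch below illustrates that final behaviour under stated assumptions: times are plain `u32` seconds since the UNIX epoch (standing in for `DateTime32`), and `GossipedPeer` plus this particular signature of `limit_last_seen_times` are hypothetical, not Zebra's actual types.

```rust
/// Hypothetical stand-in for a gossiped peer address. `last_seen` is in
/// seconds since the UNIX epoch, as reported by the remote peer's clock.
#[derive(Clone, Debug, PartialEq)]
struct GossipedPeer {
    addr: &'static str,
    last_seen: u32,
}

/// Sketch: limit gossiped `last_seen` times to `limit` (e.g. the local time).
///
/// Returns `None` if every address from this peer should be rejected.
fn limit_last_seen_times(peers: Vec<GossipedPeer>, limit: u32) -> Option<Vec<GossipedPeer>> {
    // An empty list has no most recent time, so there is nothing to adjust.
    let most_recent = match peers.iter().map(|peer| peer.last_seen).max() {
        Some(time) => time,
        None => return Some(peers),
    };

    // No time is after the limit (this includes a maximum exactly equal to
    // the limit), so no offset is needed.
    if most_recent <= limit {
        return Some(peers);
    }

    // Shift every time back by the same amount, so the most recent time lands
    // on the limit and the relative order of the peer's addresses is kept.
    // This cannot overflow, because `most_recent > limit`.
    let offset = most_recent - limit;

    // `checked_sub` returns `None` on underflow, and collecting into
    // `Option<Vec<_>>` then yields `None`, rejecting every address.
    peers
        .into_iter()
        .map(|peer| {
            let last_seen = peer.last_seen.checked_sub(offset)?;
            Some(GossipedPeer { last_seen, ..peer })
        })
        .collect()
}
```

Because every address is shifted by the same offset, a peer that gossips even one far-future time pushes all of its addresses further back in the reconnection queue, which is the goal stated in `9eac43a8bb`. With unsigned seconds the sketch only needs an underflow check; the real code may also need to guard against overflow, as `f3419b7baf` describes.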
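
Commit `b891a96a6d` argues that returning `impl Iterator` is more ergonomic than returning `impl IntoIterator`. The small illustration below uses hypothetical functions (`evens` and `evens_into_iter` are not Zebra code). It relies on the standard library's blanket `impl<I: Iterator> IntoIterator for I`, which makes `.into_iter()` the identity on any iterator, so callers of the `impl Iterator` version can still call it.

```rust
/// Returns `impl Iterator`: callers can chain adapters directly.
fn evens(values: Vec<u32>) -> impl Iterator<Item = u32> {
    values.into_iter().filter(|value| value % 2 == 0)
}

/// Same logic returning `impl IntoIterator`: callers must call
/// `.into_iter()` before using iterator adapters such as `map`.
fn evens_into_iter(values: Vec<u32>) -> impl IntoIterator<Item = u32> {
    values.into_iter().filter(|value| value % 2 == 0)
}
```

For example, `evens(v).map(|x| x * 10)` compiles as written, `evens_into_iter(v).into_iter().map(|x| x * 10)` needs the extra call, and `evens(v).into_iter()` still compiles because it is a no-op.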

108 Commits