Commit Graph

108 Commits

Author SHA1 Message Date
teor ce45198c17
Fix comment typo: overflow -> underflow 2021-06-01 16:44:45 +10:00
teor 34702f22b6 clippy: remove needless clone and collect 2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho 83ac1519e9 Add proptest for future `last_seen` correction
Given a generated list of gossiped peers, ensure that after running the
`validate_addrs` function none of the resulting peers have a `last_seen`
time that's after the specified limit.
2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho 63672a2633 Test underflow handling
If the calculation to apply the compensation offset overflows or
underflows, the reported times are too far apart, and could have been sent
on purpose by a malicious peer, so all addresses from that peer should
be rejected.
2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho f263d85aa4 Test `last_seen` time being equal to the limit
If the most recent `last_seen` time reported by a peer is exactly the
limit, the offset doesn't need to be applied because no times are in the
future.
2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho f4a7026aa3 Test that offset is applied to all gossiped peers
Use some mock gossiped peers where some have `last_seen` times in the
past and some have times in the future. Check that all the peers have
an offset applied to them by the `validate_addrs` function.

This tests if the offset is applied to all peers that a malicious peer
gossiped to us.
2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho 60f660e53f Test if validation doesn't offset past times
Use some mock gossiped peers that all have `last_seen` times in the
past and check that they don't have any changes to the `last_seen` times
applied by the `validate_addrs` function.
2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho 3c9c920bbd Test if validation offsets times in the future
Use some mock gossiped peers that all have `last_seen` times in the
future and check that they all have a specific offset applied to them.
2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho 82452621e0 Remove empty list of peers check
The `limit_last_seen_times` can now safely handle an empty list.
2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho 966430d400 Update security note to be broader
Focus on what can go wrong, and not on the specific causes.

Co-authored-by: teor <teor@riseup.net>
2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho f3419b7baf Handle overflow when applying offset
If an overflow occurs, the reported `last_seen` times are either very
wrong or malicious, so reject all addresses gossiped by that peer.
2021-06-01 03:42:08 -03:00
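A minimal sketch of the offset-and-reject behaviour this commit describes, under stated assumptions: times are plain `u32` seconds rather than Zebra's `DateTime32`, and the function below is an illustration that only borrows the `limit_last_seen_times` name, not the project's actual signature. With unsigned times the failure case is an underflow, which `checked_sub` reports as `None`.

```rust
/// Illustrative only: `u32` seconds stand in for `DateTime32`, and this
/// signature is an assumption, not Zebra's real API.
///
/// Returns `None` when applying the offset would underflow, which means
/// every address gossiped by that peer should be rejected.
fn limit_last_seen_times(last_seen: &[u32], limit: u32) -> Option<Vec<u32>> {
    // An empty list has nothing to correct.
    let newest = match last_seen.iter().max() {
        Some(newest) => *newest,
        None => return Some(Vec::new()),
    };

    // No reported time is after the limit, so no offset is needed.
    if newest <= limit {
        return Some(last_seen.to_vec());
    }

    // Cannot underflow here, because `newest > limit`.
    let offset = newest - limit;

    // Apply the same offset to every time. Collecting into `Option<Vec<_>>`
    // yields `None` as soon as any single subtraction underflows.
    last_seen
        .iter()
        .map(|time| time.checked_sub(offset))
        .collect()
}
```

Collecting into an `Option` keeps the all-or-nothing rule in one place: a single out-of-range time discards the whole batch from that peer.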
Janito Vaqueiro Ferreira Filho 5b8f33390c Add comment to describe purpose
Make it clear why all peers have the time offset applied to them.

Co-authored-by: teor <teor@riseup.net>
2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho 9eac43a8bb Apply offset to all times received from a peer
If any of the times gossiped by a peer are in the future, apply the
necessary offset to all the times gossiped by that peer. This ensures
that all gossiped peers from a malicious peer are moved further back in
the queue.

Co-authored-by: teor <teor@riseup.net>
2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho fa35c9b4f1 Only apply offset to times in the future
Times in the past don't have any security implications, so there's no
point in trying to apply the offset to them as well.
2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho 876d515dd6 Improve documentation
- Make the security impact clearer and in a separate section.
- Instead of listing an assumption as almost a side-note, describe it
  clearly inside a `Panics` section.

Co-authored-by: teor <teor@riseup.net>
2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho 54809a1b89 Don't trust reported peer `last_seen` times
Due to clock skew, the peers could end up at the front of the
reconnection queue or far at the back. The solution to this is to offset
the reported times by the difference between the most recent reported
sight (in the remote clock) and the current time (in the local clock).
2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho 14ecc79f01 Use `DateTime32` in `validate_addrs` 2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho b891a96a6d Improve ergonomics by returning `impl Iterator`
Returning `impl IntoIterator` means that the caller will always be
forced to call `.into_iter()`, and returning `impl Iterator` still
allows them to call `.into_iter()` because it becomes the identity
function.
2021-06-01 03:42:08 -03:00
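The ergonomic difference described in this commit can be sketched with two hypothetical functions (the names, the `u32` item type, and the filtering rule are all assumptions for illustration, not Zebra code):

```rust
/// Hypothetical: returning `impl IntoIterator` forces callers to call
/// `.into_iter()` before they can use any iterator adaptor.
fn validate_into_iterator(addrs: Vec<u32>) -> impl IntoIterator<Item = u32> {
    addrs.into_iter().filter(|addr| *addr != 0)
}

/// Hypothetical: returning `impl Iterator` lets callers chain adaptors
/// directly. Calling `.into_iter()` on it still compiles, because every
/// `Iterator` implements `IntoIterator` by returning itself.
fn validate_iterator(addrs: Vec<u32>) -> impl Iterator<Item = u32> {
    addrs.into_iter().filter(|addr| *addr != 0)
}

/// Counts valid addresses using each return style.
fn count_valid(addrs: Vec<u32>) -> (usize, usize, usize) {
    let via_into_iterator = validate_into_iterator(addrs.clone()).into_iter().count();
    let direct = validate_iterator(addrs.clone()).count();
    let redundant_into_iter = validate_iterator(addrs).into_iter().count();
    (via_into_iterator, direct, redundant_into_iter)
}
```

Both return types work in a `for` loop; the difference only shows up when the caller wants adaptors such as `map`, `filter`, or `count`.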
teor 2685fc746e
Remove CandidateSet state and add last seen time limit to candidate_set::validate_addrs (#2177) 2021-05-21 02:21:13 +00:00
teor 752358d236
Fix some candidate set and meta addr doc links (#2174)
Suggested by jvff.
2021-05-21 11:40:14 +10:00
teor c7ea1395e7 Security: Fix CandidateSet timeout and fanout
* Refactor: Split CandidateSet::update into separate functions
* Security: Apply a timeout to the entire CandidateSet::update
* Security: Stop using very large fanout limits during initialization

Previously, Zebra used the number of resolved peer addresses.
So it was possible for all peers to fail, and for Zebra to hang on the
first update.

And Zebra could send a fanout for each initial peer, regardless
of whether their connection was successful.

Also:
- wait for at least one successful peer before trying an update
- warn if there are no successful initial peers
2021-05-21 06:51:34 +10:00
teor 92828bbb29 Reliability: send local listener address to peers
When peers ask for peer addresses, add our local listener address to the
set of addresses, sanitize, then truncate. Sanitize shuffles addresses,
so if there are lots of addresses in the address book, our address will
only be sent to some peers.
2021-05-18 14:02:19 +10:00
teor 458c26f1e3 Limit initial candidate set fanout to the number of initial peers
If there is a small number of initial peers, and they are slow, the
initial candidate set update can appear to hang. To avoid this issue,
limit the initial candidate set fanout to the number of initial peers.

Once the initial peers have sent us more peer addresses, there is no need
to limit the fanouts for future updates.

Reported by Niklas Long of Equilibrium.
2021-05-18 07:54:03 +10:00
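As a rough sketch of the rule in this commit, the fanout for the first update can be capped at the number of initial peers, while later updates use the normal fanout. The function name and parameters below are hypothetical, chosen only for illustration:

```rust
/// Hypothetical helper: pick how many peers to ask for addresses.
///
/// `initial_peer_count` is `Some` only for the first update, when the
/// fanout should not exceed the number of initial peers.
fn update_fanout(default_fanout: usize, initial_peer_count: Option<usize>) -> usize {
    match initial_peer_count {
        // First update: never wait on more peers than we started with.
        Some(count) => default_fanout.min(count),
        // Later updates: peers have gossiped more addresses, so no cap.
        None => default_fanout,
    }
}
```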
teor b0b8b2f61a
Add extra instrumentation for initialize and handshakes (#2122)
* Instrument the crawl task

When we created the crawl task, we forgot to instrument it with the
global span. This fix makes sure that the git and network spans appear on
crawl logs.

* Instrument the connector

* Improve handshake instrumentation

Make some spans debug, so there are not too many spans.

* Add the address to initial peer connection errors
2021-05-17 16:49:16 -04:00
teor a8a0d6450c Security: stop gossiping temporary inbound remote addresses to peers
- stop putting inbound addresses in the address book
- drop address book entries that can't be used for outbound connections
  - distinguish between temporary inbound and permanent outbound peer
    addresses
  - also create variants to handle proxy connections
    (but don't use them yet)
  - avoid tracking connection state for isolated connections
- document security constraints for the address book and peer set
2021-05-14 23:45:42 +10:00
Kirill Fomichev afac2c2846
Use the default port for configured listen addresses with no port (#2043)
* Allow listen addresses in the config without a port

* update comments

* remove unused alias

* use Network::default_port

* Move tests and use toml instead of json

* change error message

* Make match more readable

Co-authored-by: teor <teor@riseup.net>
2021-04-21 23:14:29 +00:00
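One way to fall back to a default port, sketched with the standard library only. The function name is hypothetical, and the caller supplies the default port (the commit takes it from `Network::default_port`); this is an illustration of the idea, not the code from the pull request:

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical parser: accept `ip:port`, or a bare IP address that gets
/// `default_port` appended. Returns `None` for anything else.
fn parse_listen_addr(input: &str, default_port: u16) -> Option<SocketAddr> {
    // A full socket address already carries its own port.
    if let Ok(addr) = input.parse::<SocketAddr>() {
        return Some(addr);
    }

    // Otherwise, try a bare IPv4 or IPv6 address and add the default port.
    input
        .parse::<IpAddr>()
        .ok()
        .map(|ip| SocketAddr::new(ip, default_port))
}
```

Trying the full `SocketAddr` form first means an explicit port always wins over the default.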
teor 0203d1475a Refactor and document correctness for std::sync::Mutex<AddressBook> 2021-04-21 17:14:47 -04:00
teor 2ed8bb00cf Clarify CandidateSet state diagram
We get inbound connections on the listener port,
but the important part is the inbound connection
itself.
2021-04-21 01:37:43 -04:00
teor a417c7c8c7 Use meaningful names for select! variables 2021-04-13 23:56:16 -04:00
teor fb95de99a6 Refactor the dial result into a From impl 2021-04-13 18:52:49 -04:00
teor 375c8d8700
Fix a deadlock between the crawler and dialer, and other hangs (#1950)
* Stop ignoring inbound message errors and handshake timeouts

To avoid hangs, Zebra needs to maintain the following invariants in the
handshake and heartbeat code:
- each handshake should run in a separate spawned task
  (not yet implemented)
- every message, error, timeout, and shutdown must update the peer address state
- every await that depends on the network must have a timeout

Once the Connection is created, it should handle timeouts.
But we need to handle timeouts during handshake setup.

* Avoid hangs by adding a timeout to the candidate set update

Also increase the fanout from 1 to 2, to increase address diversity.

But only return permanent errors from `CandidateSet::update`, because
the crawler task exits if `update` returns an error.

Also log Peers response errors in the CandidateSet.

* Use the select macro in the crawler to reduce hangs

The `select` function is biased towards its first argument, risking
starvation.

As a side-benefit, this change also makes the code a lot easier to read
and maintain.

* Split CrawlerAction::Demand into separate actions

This refactor makes the code a bit easier to read, at the cost of
sometimes blocking the crawler on `candidates.next()`.

That's ok, because `next` only has a short (< 100 ms) delay. And we're
just about to spawn a separate task for each handshake.

* Spawn a separate task for each handshake

This change avoids deadlocks by letting each handshake make progress
independently.

* Move the dial task into a separate function

This refactor improves readability.

* Fix buggy future::select function usage

And document the correctness of the new code.
2021-04-07 10:25:10 -03:00
teor de6d1c93f3
Clarify a comment 2021-04-07 18:56:38 +10:00
teor 83b88f5b7a
Merge pull request #1972 from ZcashFoundation/peer-set-demand-deadlock-doc
Document peer set deadlock resistance
2021-04-01 22:50:17 -04:00
teor 306fa88214 Document the correctness of Poll::Pending wakeups 2021-03-27 08:55:49 -04:00
teor 1a159dfcb6 Add more methods for creating MetaAddrs
This refactor lets us remove `MetaAddr::update_last_seen()`.
2021-03-26 07:23:49 +10:00
teor 6fe81d8992 Make MetaAddr.last_seen into a private field 2021-03-26 07:23:49 +10:00
teor 5a30268d7a Log address metrics when the peer set has no ready peers 2021-03-17 10:47:04 +10:00
Jack Grigg e51f33a4b9 Use interoperable names for common metrics
These names match the equivalent metrics in zcashd, enabling common
metrics to be collected across both node types.
2021-03-17 09:38:07 +10:00
teor e50692bd51 CandidateSet: Add Listener Port Connections
Inbound connections on the Zcash protocol listener port
perform a handshake. If the handshake is successful, it
adds the peer to the AddressBook.
2021-03-09 23:05:18 -05:00
Jane Lusby 03aa6f671f
Implement outbound connection rate limiting - includes config rename with alias (#1855)
* Implement outbound connection rate limiting
* fix breaking change on config

Co-authored-by: teor <teor@riseup.net>
2021-03-10 01:36:05 +00:00
teor d4f2f27218
Add global span to spawned network tasks (#1761)
Closes #1575
2021-02-20 08:36:50 +10:00
teor e61b5e50a2
Diagnostics for CI port conflict failures (#1766)
Log a "Trying..." message before each listener opens, to see if the
delay is inside Zebra, or in the test harness or OS.

Also report the configured and actual ports where possible, for better
diagnostics.
2021-02-18 12:15:09 -03:00
teor 5424e1d8ba
Fix candidate set address state handling (#1709)
Design:
- Add a `PeerAddrState` to each `MetaAddr`
- Use a single peer set for all peers, regardless of state
- Implement time-based liveness as an `AddressBook` method, rather than
  a `PeerAddrState` variant
- Delete `AddressBook.by_state`

Implementation:
- Simplify `AddressBook` changes using `update` and `take` modifier
  methods
- Simplify the `AddressBook` iterator implementation, replacing it with
  methods that are more obviously correct
- Consistently collect peer set metrics

Documentation:
- Expand and update the peer set documentation

We can optimise later, but for now we want simple code that is more
obviously correct.
2021-02-18 11:18:32 +10:00
teor 86169f6412
Update PeerSet metrics after every change (#1727) 2021-02-18 07:06:59 +10:00
teor 8d1c498234 Log initial peer connection failures
And standardise another log message
2021-02-17 09:21:53 -05:00
teor e85441c914 Add a correctness comment to justify the revert 2021-02-16 05:52:54 +10:00
teor a02a00a3f5 Revert "Stop using CallAllUnordered in peer_set::add_initial_peers (#1705)"
This reverts commit 241c7ad849.
2021-02-16 05:52:54 +10:00
Alfredo Garcia 241c7ad849
Stop using CallAllUnordered in peer_set::add_initial_peers (#1705)
* use ServiceExt::oneshot and FuturesUnordered

Co-authored-by: teor <teor@riseup.net>
2021-02-09 08:16:02 +10:00
Alfredo Garcia 221512c733
Async DNS seeder lookups (#1662)
* replace to_socket_addrs
* refactor `resolve()` into `resolve_host()`
* use `resolve_host()` to resolve config peers
* add DNS_LOOKUP_TIMEOUT constant
* don't block the main thread in initialize
2021-02-03 12:20:26 +10:00
Alfredo Garcia 4b34482264
Add hints to port conflict and lock file panics (#1535)
* add hint for port error
* add issue filter for port panic
* add lock file hint
* add metrics endpoint port conflict hint
* add hint for tracing endpoint port conflict
* add acceptance test for resource conflicts
* Split out common conflict test code into a function
* Add state, metrics, and tracing conflict tests

* Add a full set of stderr acceptance test functions

This change makes the stdout and stderr acceptance test interfaces
identical.

* move Zcash listener opening
* add todo about hint for disk full
* add constant for lock file
* match path in state cache
* don't match windows cache path

* Use Display for state path logs

Avoids weird escaping on Windows when using Debug

* Add Windows conflict error messages

* Turn PORT_IN_USE_ERROR into a regex

And add another alternative Windows-specific port error

Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Jane Lusby <jane@zfnd.org>
2021-01-29 22:36:33 +10:00