Given a generated list of gossiped peers, ensure that after running the
`validate_addrs` function none of the resulting peers have a `last_seen`
time that's after the specified limit.
If the calculation to apply the compensation offset overflows or
underflows, the reported times are too distant apart, and could be sent
on purpose by a malicious peer, so all addresses from that peer should
be rejected.
Use some mock gossiped peers where some have `last_seen` times in the
past and some have times in the future. Check that all the peers have
an offset applied to them by the `validate_addrs` function.
This tests if the offset is applied to all peers that a malicious peer
gossiped to us.
Use some mock gossiped peers that all have `last_seen` times in the
past and check that they don't have any changes to the `last_seen` times
applied by the `validate_addrs` function.
If any of the times gossiped by a peer are in the future, apply the
necessary offset to all the times gossiped by that peer. This ensures
that all gossiped peers from a malicious peer are moved further back in
the queue.
Co-authored-by: teor <teor@riseup.net>
- Make the security impact clearer and in a separate section.
- Instead of listing an assumption as almost a side-note, describe it
clearly inside a `Panics` section.
Co-authored-by: teor <teor@riseup.net>
Due to clock skew, the peers could end up at the front of the
reconnection queue or far at the back. The solution to this is to offset
the reported times by the difference between the most recent reported
sight (in the remote clock) and the current time (in the local clock).
Returning `impl IntoIterator` means that the caller will always be
forced to call `.into_iter()`, and returning `impl Iterator` still
allows them to call `.into_iter()` because it becomes the identity
function.
* Refactor: Split CandidateSet::update into separate functions
* Security: Apply a timeout to the entire CandidateSet::update
* Security: Stop using very large fanout limits during initialization
Previously, Zebra used the number of resolved peer addresses.
So it was possible for all peers to fail, and for Zebra to hang on the
first update.
And Zebra could send a fanout for each initial peer, regardless
of whether their connection was successful.
Also:
- wait for at least one successful peer before trying an update
- warn if there are no successful initial peers
When peers ask for peer addresses, add our local listener address to the
set of addresses, sanitize, then truncate. Sanitize shuffles addresses,
so if there are lots of addresses in the address book, our address will
only be sent to some peers.
If there is a small number of initial peers, and they are slow, the
initial candidate set update can appear to hang. To avoid this issue,
limit the initial candidate set fanout to the number of initial peers.
Once the initial peers have sent us more peer addresses, there is no need
to limit the fanouts for future updates.
Reported by Niklas Long of Equilibrium.
* Instrument the crawl task
When we created the crawl task, we forgot to instrument it with the
global span. This fix makes sure that the git and network span appears on
crawl logs.
* Instrument the connector
* Improve handshake instrumentation
Make some spans debug, so there are not too many spans.
* Add the address to initial peer connection errors
- stop putting inbound addresses in the address book
- drop address book entries that can't be used for outbound connections
- distinguish between temporary inbound and permanent outbound peer
addresses
- also create variants to handle proxy connections
(but don't use them yet)
- avoid tracking connection state for isolated connections
- document security constraints for the address book and peer set
* Allow use listen address in config without port
* update comments
* remove not used alias
* use Network::default_port
* Move tests and use toml instead json
* change error message
* Make match more readable
Co-authored-by: teor <teor@riseup.net>
* Stop ignoring inbound message errors and handshake timeouts
To avoid hangs, Zebra needs to maintain the following invariants in the
handshake and heartbeat code:
- each handshake should run in a separate spawned task
(not yet implemented)
- every message, error, timeout, and shutdown must update the peer address state
- every await that depends on the network must have a timeout
Once the Connection is created, it should handle timeouts.
But we need to handle timeouts during handshake setup.
* Avoid hangs by adding a timeout to the candidate set update
Also increase the fanout from 1 to 2, to increase address diversity.
But only return permanent errors from `CandidateSet::update`, because
the crawler task exits if `update` returns an error.
Also log Peers response errors in the CandidateSet.
* Use the select macro in the crawler to reduce hangs
The `select` function is biased towards its first argument, risking
starvation.
As a side-benefit, this change also makes the code a lot easier to read
and maintain.
* Split CrawlerAction::Demand into separate actions
This refactor makes the code a bit easier to read, at the cost of
sometimes blocking the crawler on `candidates.next()`.
That's ok, because `next` only has a short (< 100 ms) delay. And we're
just about to spawn a separate task for each handshake.
* Spawn a separate task for each handshake
This change avoids deadlocks by letting each handshake make progress
independently.
* Move the dial task into a separate function
This refactor improves readability.
* Fix buggy future::select function usage
And document the correctness of the new code.
Log a "Trying..." message before each listener opens, to see if the
delay is inside Zebra, or in the test harness or OS.
Also report the configured and actual ports where possible, for better
diagnostics.
Design:
- Add a `PeerAddrState` to each `MetaAddr`
- Use a single peer set for all peers, regardless of state
- Implement time-based liveness as an `AddressBook` method, rather than
a `PeerAddrState` variant
- Delete `AddressBook.by_state`
Implementation:
- Simplify `AddressBook` changes using `update` and `take` modifier
methods
- Simplify the `AddressBook` iterator implementation, replacing it with
methods that are more obviously correct
- Consistently collect peer set metrics
Documentation:
- Expand and update the peer set documentation
We can optimise later, but for now we want simple code that is more
obviously correct.
* replace to_socket_addrs
* refactor `resolve()` into `resolve_host()`
* use `resolve_host()` to resolve config peers
* add DNS_LOOKUP_TIMEOUT constant
* don't block the main thread in initialize
* add hint for port error
* add issue filter for port panic
* add lock file hint
* add metrics endpoint port conflict hint
* add hint for tracing endpoint port conflict
* add acceptance test for resource conflics
* Split out common conflict test code into a function
* Add state, metrics, and tracing conflict tests
* Add a full set of stderr acceptance test functions
This change makes the stdout and stderr acceptance test interfaces
identical.
* move Zcash listener opening
* add todo about hint for disk full
* add constant for lock file
* match path in state cache
* don't match windows cache path
* Use Display for state path logs
Avoids weird escaping on Windows when using Debug
* Add Windows conflict error messages
* Turn PORT_IN_USE_ERROR into a regex
And add another alternative Windows-specific port error
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Jane Lusby <jane@zfnd.org>