Zebra avoids having a majority of addresses from a single peer by asking
3 peers for new addresses.
Also update a bunch of security comments and related documentation.
* Make proptest dependencies consistent between chain and network
* Implement Arbitrary for InventoryHash and use it in tests
* Impl Arbitrary for MetaAddr and use it in tests
Also test some extreme times in MetaAddr sanitization.
* Stop ignoring inbound message errors and handshake timeouts
To avoid hangs, Zebra needs to maintain the following invariants in the
handshake and heartbeat code:
- each handshake should run in a separate spawned task
(not yet implemented)
- every message, error, timeout, and shutdown must update the peer address state
- every await that depends on the network must have a timeout
Once the Connection is created, it should handle timeouts.
But we need to handle timeouts during handshake setup.
* Avoid hangs by adding a timeout to the candidate set update
Also increase the fanout from 1 to 2, to increase address diversity.
But only return permanent errors from `CandidateSet::update`, because
the crawler task exits if `update` returns an error.
Also log Peers response errors in the CandidateSet.
* Use the select macro in the crawler to reduce hangs
The `select` function is biased towards its first argument, risking
starvation.
As a side-benefit, this change also makes the code a lot easier to read
and maintain.
* Split CrawlerAction::Demand into separate actions
This refactor makes the code a bit easier to read, at the cost of
sometimes blocking the crawler on `candidates.next()`.
That's ok, because `next` only has a short (< 100 ms) delay. And we're
just about to spawn a separate task for each handshake.
* Spawn a separate task for each handshake
This change avoids deadlocks by letting each handshake make progress
independently.
* Move the dial task into a separate function
This refactor improves readability.
* Fix buggy future::select function usage
And document the correctness of the new code.
* Move the preallocate tests into their own files
And move the MetaAddr proptest into its own file.
Also do some minor formatting and cleanups.
Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
* Implement SafePreallocate. Resolves#1880
* Add proptests for SafePreallocate
* Apply suggestions from code review
Comments which did not include replacement code will be addressed in a follow-up commit.
Co-authored-by: teor <teor@riseup.net>
* Rename [Safe-> Trusted]Allocate. Add doc and tests
Add tests to show that the largest allowed vec under TrustedPreallocate
is small enough to fit in a Zcash block/message (depending on type).
Add doc comments to all TrustedPreallocate test cases.
Tighten bounds on max_trusted_alloc for some types.
Note - this commit does NOT include TrustedPreallocate
impls for JoinSplitData, String, and Script.
These impls will be added in a follow up commit
* Implement SafePreallocate. Resolves#1880
* Add proptests for SafePreallocate
* Apply suggestions from code review
Comments which did not include replacement code will be addressed in a follow-up commit.
Co-authored-by: teor <teor@riseup.net>
* Rename [Safe-> Trusted]Allocate. Add doc and tests
Add tests to show that the largest allowed vec under TrustedPreallocate
is small enough to fit in a Zcash block/message (depending on type).
Add doc comments to all TrustedPreallocate test cases.
Tighten bounds on max_trusted_alloc for some types.
Note - this commit does NOT include TrustedPreallocate
impls for JoinSplitData, String, and Script.
These impls will be added in a follow up commit
* Impl TrustedPreallocate for Joinsplit
* Impl ZcashDeserialize for Vec<u8>
* Arbitrary, TrustedPreallocate, Serialize, and tests for Spend<SharedAnchor>
Co-authored-by: teor <teor@riseup.net>
`sanitize` could be misused in two ways:
* accidentally modifying the addresses in the address book itself
* forgetting to sanitize new fields added to `MetaAddr`
This change prevents accidental modification by taking `&self`, and
explicitly creates a new sanitized `MetaAddr` with all fields listed.
Zebra's latest alpha checkpoints on Canopy activation, continues our work on NU5, and fixes a security issue.
Some notable changes include:
## Added
- Log address book metrics when PeerSet or CandidateSet don't have many peers (#1906)
- Document test coverage workflow (#1919)
- Add a final job to CI, so we can easily require all the CI jobs to pass (#1927)
## Changed
- Zebra has moved its mandatory checkpoint from Sapling to Canopy (#1898, #1926)
- This is a breaking change for users that depend on the exact height of the mandatory checkpoint.
## Fixed
- tower-batch: wake waiting workers on close to avoid hangs (#1908)
- Assert that pre-Canopy blocks use checkpointing (#1909)
- Fix CI disk space usage by disabling incremental compilation in coverage builds (#1923)
## Security
- Stop relying on unchecked length fields when preallocating vectors (#1925)
Zebra already uses `Read::take` to enforce message, body, and block
maximum sizes.
So using `Read::take` on untrusted sizes can result in short reads,
without a corresponding `UnexpectedEof` error. (The old code was
correct, but copying it elsewhere would have been risky.)
* Add NU5 variant to NetworkUpgrade
* Add consensus branch ID for NU5
* Add network protocol versions for NU5
* Add NU5 to the protocol::version_consistent test
* Make unimplemented panic messages more specific
* Block target spacing doesn't change in NU5
* add comments for future updates for NU5
Co-authored-by: teor <teor@riseup.net>
* Retry each peer DNS a few times individually
We retry each peer individually, as well as retrying if there are no
peers in the combined list.
DNS failures are correlated, so all peers can fail DNS, leaving Zebra
with a small list of custom-configured IP address peers.
Individual retries avoid this issue.
* Rename parse_peers to resolve_peers
Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
- Add an `ExitClient` transition, used when the internal client channel
is closed or dropped, and there are no more pending requests
- Ignore pending requests after an `ExitClient` transition
- Reject pending requests when the peer has caused an error
(the `Exit` and `ExitRequest` transitions)
- Remove `PeerError::ConnectionDropped`, because it is now handled by
`ExitClient`. (Which is an internal error, not a peer error.)
Log a "Trying..." message before each listener opens, to see if the
delay is inside Zebra, or in the test harness or OS.
Also report the configured and actual ports where possible, for better
diagnostics.
Design:
- Add a `PeerAddrState` to each `MetaAddr`
- Use a single peer set for all peers, regardless of state
- Implement time-based liveness as an `AddressBook` method, rather than
a `PeerAddrState` variant
- Delete `AddressBook.by_state`
Implementation:
- Simplify `AddressBook` changes using `update` and `take` modifier
methods
- Simplify the `AddressBook` iterator implementation, replacing it with
methods that are more obviously correct
- Consistently collect peer set metrics
Documentation:
- Expand and update the peer set documentation
We can optimise later, but for now we want simple code that is more
obviously correct.
* replace to_socket_addrs
* refactor `resolve()` into `resolve_host()`
* use `resolve_host()` to resolve config peers
* add DNS_LOOKUP_TIMEOUT constant
* don't block the main thread in initialize
* add hint for port error
* add issue filter for port panic
* add lock file hint
* add metrics endpoint port conflict hint
* add hint for tracing endpoint port conflict
* add acceptance test for resource conflics
* Split out common conflict test code into a function
* Add state, metrics, and tracing conflict tests
* Add a full set of stderr acceptance test functions
This change makes the stdout and stderr acceptance test interfaces
identical.
* move Zcash listener opening
* add todo about hint for disk full
* add constant for lock file
* match path in state cache
* don't match windows cache path
* Use Display for state path logs
Avoids weird escaping on Windows when using Debug
* Add Windows conflict error messages
* Turn PORT_IN_USE_ERROR into a regex
And add another alternative Windows-specific port error
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Jane Lusby <jane@zfnd.org>
* Bump versions where appropriate
Tested with cargo install --locked --path etc
* Remove fixed panics from 'Known Issues'
* Change to alpha release series in the README
Co-authored-by: teor <teor@riseup.net>
The clippy unknown lints attribute was deprecated in
nightly in rust-lang/rust#80524. The old lint name now produces a
warning.
Since we're using `allow(unknown_lints)` to suppress warnings, we need to
add the canonical name, so we can continue to build without warnings on
nightly.
But we also need to keep the old name, so we can continue to build
without warnings on stable.
And therefore, we also need to disable the "removed lints" warning,
otherwise we'll get warnings about the old name on nightly.
We'll need to keep this transitional clippy config until rustc 1.51 is
stable.
We can't rule out the connection state changing between the state checks
and any eventual failures, particularly in the presence of async code.
So we turn this panic into a warning.
zebra-network's Connection expects that `fail_with` is only called once
per connection, but the overload handling code continues to process the
current request after an overload error, potentially leading to further
failures.
Closes#1599
## Motivation
This PR is motivated by the regression identified in https://github.com/ZcashFoundation/zebra/issues/1349. That PR notes that the metrics stopped working for most of the crates other than `zebrad`.
## Solution
This PR resolves the regression by deduplicating the `metrics` crate dependency. During a recent change we upgraded the metrics version in `zebrad` and a couple other of our crates, but we never updated the dependencies in `zebra-state`, `zebra-consensus`, or `zebra-network`. This caused the metrics macros to attempt to retrieve the current metrics exporter through the wrong function. We would install the metrics exporter in `0.13`, but then attempt to look it up through the `0.12` crate, which contains a different instance of the metrics exporter static variable which is unset. Doing this causes the metrics macros to return `None` for the current exporter after which they just silently give up.
## Related Issues
closes https://github.com/ZcashFoundation/zebra/issues/1349
## Follow Up Work
I noticed we have quite a few duplicate dependencies in our tree. We might be able to save some compilation time by auditing those and deduplicating them as much as possible.
- https://github.com/ZcashFoundation/zebra/issues/1582
Co-authored-by: teor <teor@riseup.net>
The `peer::Client` translates `Request`s into `ClientRequest`s, which
it sends to a background task. If the send is `Ok(())`, it will assume
that it is safe to unconditionally poll the `Receiver` tied to the
`Sender` used to create the `ClientRequest`.
We enforce this invariant via the type system, by converting
`ClientRequest`s to `InProgressClientRequest`s when they are received by
the background task. These conversions are implemented by
`ClientRequestReceiver`.
Changes:
* Revert `ClientRequest` so it uses a `oneshot::Sender`
* Add `InProgressClientRequest`, which is the same as `ClientRequest`,
but has a `MustUseOneshotSender`
* `impl From<ClientRequest> for InProgressClientRequest`
* Add a new `ClientRequestReceiver` type that wraps a
`mpsc::Receiver<ClientRequest>`
* `impl Stream<InProgressClientRequest> for ClientRequestReceiver`,
converting the successful result of `inner.poll_next_unpin` into an
`InProgressClientRequest`
* Replace `client_rx: mpsc::Receiver<ClientRequest>` in `Connection`
with the new `ClientRequestReceiver` type
* `impl From<mpsc::Receiver<ClientRequest>> for ClientRequestReceiver`
This fix also changes heartbeat behaviour in the following ways:
* if the queue is full, the connection is closed. Previously, the sender
would wait until the queue had emptied
* if the queue flush fails, Zebra panics, because it can't send an error
on the ClientRequest sender, so the invariant is broken