Commit Graph

184 Commits

Author SHA1 Message Date
teor c608260256
Support witnessed transaction IDs in zebra-network requests and responses (#2638)
* Rename internal network requests for wide transaction IDs

fastmod TransactionsByHash TransactionsById zebra*
fastmod AdvertiseTransactions AdvertiseTransactionIds zebra*
fastmod MempoolTransactions MempoolTransactionIds zebra*
fastmod TransactionHashes TransactionIds zebra*

* Update network transaction request/response comments

* Rename a transaction hash method for wide transaction IDs

fastmod transaction_hashes transaction_ids zebra-network

* Add UnminedTxId methods and conversions for InventoryHash

* Map WtxIds to unmined transaction network messages

Also, use UnminedTxId and UnminedTx in:
* Zebra's internal request and response format, and
* external Zcash network protocol messages.

* Enable WtxId mempool inventory tracking for peers

* Further clarify transaction IDs

* Use Witnessed rather than Wide for transaction IDs

And rename narrow to legacy when it only applies to v1-v4 transactions.
Otherwise, rename it to mined ID.

* Rename a missed binding
* Remove an incorrectly named binding

Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
2021-08-18 22:55:24 +00:00
teor 1a57023eac
Security: Use canonical SocketAddrs to avoid duplicate peer connections, Feature: Send local listener to peers (#2276)
* Always send our local listener with the latest time

Previously, whenever there was an inbound request for peers, we would
clone the address book and update it with the local listener.

This had two impacts:
- the listener could conflict with an existing entry,
  rather than unconditionally replacing it, and
- the listener was briefly included in the address book metrics.

As a side-effect, this change also makes sanitization slightly faster,
because it avoids some useless peer filtering and sorting.

* Skip listeners that are not valid for outbound connections

* Filter Zebra's sanitized addresses based on address state

This fix correctly prevents Zebra gossiping client addresses to peers,
but still keeps the client in the address book to avoid reconnections.

* Add a full set of DateTime32 and Duration32 calculation methods

* Refactor sanitize to use the new DateTime32/Duration32 methods

* Security: Use canonical SocketAddrs to avoid duplicate connections

If we allow multiple variants for each peer address, we can make multiple
connections to that peer.

Also make sure sanitized MetaAddrs are valid for outbound connections.

* Test that address books contain the local listener address

Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
2021-06-22 02:16:59 +00:00
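
The canonical address idea from the commit above (#2276) can be illustrated with a minimal sketch. It assumes that the "variants" of a peer address are the IPv4 form and the IPv4-mapped IPv6 form (`::ffff:a.b.c.d`). The function name `canonical_socket_addr` is a hypothetical name used for illustration, and is not necessarily the code Zebra uses.

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical helper (an assumption for illustration): convert an
/// IPv4-mapped IPv6 socket address to its plain IPv4 form, so that both
/// spellings of the same peer compare equal.
pub fn canonical_socket_addr(addr: SocketAddr) -> SocketAddr {
    match addr.ip() {
        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
            // `::ffff:a.b.c.d` becomes `a.b.c.d`, keeping the same port.
            Some(v4) => SocketAddr::new(IpAddr::V4(v4), addr.port()),
            // Any other IPv6 address is returned unchanged.
            None => addr,
        },
        // IPv4 addresses are already in the canonical form for this sketch.
        IpAddr::V4(_) => addr,
    }
}
```

If an address book is keyed by the canonical form, `[::ffff:127.0.0.1]:8233` and `127.0.0.1:8233` map to one entry, so only one connection is attempted for that peer.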
Alfredo Garcia 96a1b661f0
Rate limit initial genesis block download retries, Credit: Equilibrium (#2255)
* implement and test a rate limit in `request_genesis()`
* add `request_genesis_is_rate_limited` test to sync
* add ensure_timeouts constraint for GENESIS_TIMEOUT_RETRY
* Suppress expected warning logs in zebrad tests

Co-authored-by: teor <teor@riseup.net>
2021-06-09 23:39:51 +00:00
teor 92828bbb29 Reliability: send local listener address to peers
When peers ask for peer addresses, add our local listener address to the
set of addresses, sanitize, then truncate. Sanitize shuffles addresses,
so if there are lots of addresses in the address book, our address will
only be sent to some peers.
2021-05-18 14:02:19 +10:00
teor 74e155ff9f
Spelling: gossipped -> gossiped (#2119) 2021-05-07 13:01:11 +02:00
Kirill Fomichev 5b2f1cdfd5
Add journald support through tracing-journald (#2034)
* Add journald support through tracing-journald

* change journald to use_journald

* more fixes
2021-04-22 09:31:06 +10:00
teor 96b3c94dbc
Add the new commit count and git hash to the version (#2038)
* Use the git version + new commit count + hash for the app version

This helps diagnose bugs in versions of Zebra built from git branches,
rather than git version tags.

* Fill in assert

* Also log semver string

* Fix syntax

* Handle vergen using the cargo package version or raw git tag

* s/Semver/SemVer/

Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
2021-04-21 22:14:36 +00:00
teor 0203d1475a Refactor and document correctness for std::sync::Mutex<AddressBook> 2021-04-21 17:14:47 -04:00
Kirill Fomichev 43e792b9a4
Update to vergen 5, add branch, commit time, and build target to the panic metadata, automatically update app version from crate version (#2029)
* build(deps): bump vergen from 3.2.0 to 5.1.1

* fix hardcoded version for Tracing struct

* add additional metadata

* remove extra allocations for metadata

* Remove zebrad code version from release checklist

The zebrad code automatically uses the crate version now.

* Sort panic metadata into rough categories

Co-authored-by: teor <teor@riseup.net>
2021-04-20 06:48:14 +10:00
teor a417c7c8c7 Use meaningful names for select! variables 2021-04-13 23:56:16 -04:00
Alfredo Garcia 5ec05e91e1 update version strings for v1.0.0-alpha.6 2021-04-08 18:48:34 -04:00
teor 306fa88214 Document the correctness of Poll::Pending wakeups 2021-03-27 08:55:49 -04:00
teor 829a6f11c5 Document the behaviour of the `select!` macro 2021-03-27 08:55:49 -04:00
Deirdre Connolly ca1d2de87d
Bump versions for v1.0.0-alpha.5 (#1932)
Zebra's latest alpha checkpoints on Canopy activation, continues our work on NU5, and fixes a security issue.

Some notable changes include:

## Added
- Log address book metrics when PeerSet or CandidateSet don't have many peers (#1906)
- Document test coverage workflow (#1919)
- Add a final job to CI, so we can easily require all the CI jobs to pass (#1927)

## Changed
- Zebra has moved its mandatory checkpoint from Sapling to Canopy (#1898, #1926)
  - This is a breaking change for users that depend on the exact height of the mandatory checkpoint.

## Fixed
- tower-batch: wake waiting workers on close to avoid hangs (#1908)
- Assert that pre-Canopy blocks use checkpointing (#1909)
- Fix CI disk space usage by disabling incremental compilation in coverage builds (#1923)

## Security
- Stop relying on unchecked length fields when preallocating vectors (#1925)
2021-03-22 22:05:01 -04:00
Alfredo Garcia d49eaab68e
Bump versions for zebrad 1.0.0-alpha.4 (#1913)
* Bump versions for zebrad 1.0.0-alpha.4

* add Cargo.lock
2021-03-16 21:12:37 -03:00
Jack Grigg bae9a7ecd5 Expose binary data in metrics
This enables slicing and aggregating metrics based on zebrad version:
https://www.robustperception.io/exposing-the-software-version-to-prometheus
2021-03-17 09:38:07 +10:00
teor d494af1e90 Document how the syncer resists memory DoS 2021-03-11 06:24:46 -05:00
teor c6358b157c Reduce inbound concurrency to limit memory usage
Inbound malicious blocks can use a large amount of RAM when
deserialized. Limit inbound concurrency, so that the total amount
of RAM remains small.
2021-03-11 06:24:46 -05:00
teor 7558f74c78 Bump versions for zebrad 1.0.0-alpha.3 2021-02-23 10:39:13 -05:00
teor e61b5e50a2
Diagnostics for CI port conflict failures (#1766)
Log a "Trying..." message before each listener opens, to see if the
delay is inside Zebra, or in the test harness or OS.

Also report the configured and actual ports where possible, for better
diagnostics.
2021-02-18 12:15:09 -03:00
teor 972103d797 Fix tracing macro syntax 2021-02-17 11:09:22 -05:00
teor 253d1c02b3 Make sync logging a bit less verbose
And tweak some log content
2021-02-17 11:09:22 -05:00
teor cc7d5bd2ad
Update comments for the inbound service (#1740) 2021-02-16 06:14:40 +10:00
teor 372a432179
Update the call_all comment in Inbound (#1737) 2021-02-16 06:14:16 +10:00
teor 0b76352468
Document a state_contains bug (#1715)
* Document a state_contains bug in the syncer and Inbound
2021-02-10 09:05:14 +10:00
Deirdre Connolly 0c5daa8410 Bump versions for zebrad 1.0.0-alpha.2
Including tower-batch bump to 0.2.0, tower-fallback to 0.2.0, zebra-script to 1.0.0-alpha.3
2021-02-09 16:14:29 -05:00
teor dce11358d7
Log when the syncer awaits peer readiness (#1714) 2021-02-10 07:09:27 +10:00
Alfredo Garcia d7c40af2a8
Fix shutdown panics (#1637)
* add a shutdown flag in zebra_chain::shutdown
* fix network panic on shutdown
* fix checkpoint panic on shutdown
2021-02-03 19:03:28 +10:00
teor 6679a124e3 Require Inbound setup handlers to provide a result
Rather than having them default to `Ok(())`, which is incorrect
for some error handlers.
2021-02-03 08:32:10 +10:00
teor 09c8c89462 Make sure FailedInit never escapes Inbound::poll_ready 2021-02-03 08:32:10 +10:00
teor 134a5e78bd Consistently use `network_setup` for the Inbound Setup 2021-02-03 08:32:10 +10:00
teor 1c8362fe01 Remove unused imports 2021-02-03 08:32:10 +10:00
Jane Lusby 4cf331562c combine network setup into an exhaustive match 2021-02-03 08:32:10 +10:00
Jane Lusby 4d6ef89248 avoid using async blocks to avoid lifetime bug with generators 2021-02-03 08:32:10 +10:00
Jane Lusby 685a592399 Add clonable wrapper around TryRecvError 2021-02-03 08:32:10 +10:00
teor 6ffeb670ed Log the failed response in an unreachable panic 2021-02-03 08:32:10 +10:00
teor eac4fd181a Add a Setup enum to manage Inbound network setup internal state
This change encodes a bunch of invariants in the type system,
and adds explicit failure states for:
* a closed oneshot,
* bugs in the initialization code.
2021-02-03 08:32:10 +10:00
teor 32b032204a Consistently return Response::Nil during setup
And log an info-level message as a diagnostic, in case setup takes a
long time.
2021-02-03 08:32:10 +10:00
teor 94eb91305b Stop using ServiceExt::call_all due to buffer bugs
ServiceExt::call_all leaks Tower::Buffer reservations, so we can't use
it in Zebra.

Instead, use a loop in the returned future.

See #1593 for details.
2021-02-03 08:32:10 +10:00
teor 64bc45cd2e Fix state readiness hangs for Inbound
Use `ServiceExt::oneshot` to perform state requests.

Explain that `ServiceExt::call_all` calls `poll_ready` internally.
Document a state service invariant imposed by `ServiceExt::call_all`.
2021-02-03 08:32:10 +10:00
teor 4d1a2fd02e Make the Inbound invariant clearer 2021-02-03 08:32:10 +10:00
teor 2a25b9ee72 Remove services that are never `call`ed from Inbound
Uses the `ServiceExt::oneshot` design pattern from #1593.
2021-02-03 08:32:10 +10:00
Alfredo Garcia 4b34482264
Add hints to port conflict and lock file panics (#1535)
* add hint for port error
* add issue filter for port panic
* add lock file hint
* add metrics endpoint port conflict hint
* add hint for tracing endpoint port conflict
* add acceptance test for resource conflicts
* Split out common conflict test code into a function
* Add state, metrics, and tracing conflict tests

* Add a full set of stderr acceptance test functions

This change makes the stdout and stderr acceptance test interfaces
identical.

* move Zcash listener opening
* add todo about hint for disk full
* add constant for lock file
* match path in state cache
* don't match windows cache path

* Use Display for state path logs

Avoids weird escaping on Windows when using Debug

* Add Windows conflict error messages

* Turn PORT_IN_USE_ERROR into a regex

And add another alternative Windows-specific port error

Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Jane Lusby <jane@zfnd.org>
2021-01-29 22:36:33 +10:00
teor 21b0360114 Limit concurrent inbound gossiped block requests
Uses the "load shed directly" design pattern from #1618.
2021-01-29 11:02:26 +10:00
teor 3d9888f736 Rewrite a sync comment 2021-01-29 11:02:26 +10:00
Deirdre Connolly 1b09538277
Bump versions for zebrad 1.0.0-alpha.1 (#1646)
* Bump versions where appropriate

Tested with cargo install --locked --path etc

* Remove fixed panics from 'Known Issues'

* Change to alpha release series in the README

Co-authored-by: teor <teor@riseup.net>
2021-01-27 20:31:39 -05:00
teor 391c53aa60 Move BoxError to zebrad's lib.rs
For consistency with other crates.
2021-01-27 12:14:27 -08:00
teor 9cdf41f5f4
Panic if the lookahead limit is misconfigured (#1589) 2021-01-14 14:06:30 +10:00
teor 92d95d4be5 Refactor inbound members into a consistent order
And add download comments
2021-01-13 20:46:25 -05:00
teor fb76eb2e6b Add download and verify timeouts to the inbound service 2021-01-13 20:46:25 -05:00
teor 973aec8ccc Refactor sync members into a consistent order
And add comments about correctness and usage.
2021-01-13 20:46:25 -05:00
teor c2893dce51 Warn when the user's configured lookahead limit is ignored 2021-01-13 20:46:25 -05:00
teor 3699bbdae6 Add some additional sync correctness constraints
And adjust the sync restart delay as a consequence.
2021-01-13 20:46:25 -05:00
teor cef0a492d8 Add a timeout to sync service block verification
This timeout stops the sync service hanging when it is missing required
blocks, but the lookahead queue is full of dependent verify tasks, so the
missing blocks never get downloaded.
2021-01-13 20:46:25 -05:00
teor b1f14f47c6
Rewrite GetData handling to match the zcashd implementation (#1518)
* Rewrite GetData handling to match the zcashd implementation

`zcashd` silently ignores missing blocks, but sends found transactions
followed by a `NotFound` message:
e7b425298f/src/main.cpp (L5497)

This is significantly different to the behaviour expected by the old
Zebra connection state machine, which expected `NotFound` for blocks.

Also change Zebra's GetData responses to peer requests so they ignore
missing blocks.

* Stop hanging on incomplete transaction or block responses

Instead, if the peer sends an unexpected block, unexpected transaction,
or NotFound message:
1. end the request, and return a partial response containing any items
   that were successfully received
2. if none of the expected blocks or transactions were received, return
   an error, and close the connection
2021-01-04 13:25:35 +10:00
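
The two numbered rules in the commit above (#1518) can be sketched as a small function. This is a simplified model under stated assumptions, not Zebra's connection state machine: items are identified by a plain `u32`, and `PeerMessage`, `ResponseError`, and `collect_response` are hypothetical names introduced for illustration.

```rust
/// Hypothetical message type: an item (block or transaction) identified
/// by an ID, or a `NotFound` message.
pub enum PeerMessage {
    Item(u32),
    NotFound,
}

/// Hypothetical error: none of the expected items arrived, so the caller
/// should fail the request and close the connection.
#[derive(Debug, PartialEq)]
pub struct ResponseError;

/// Collect expected items until the response is complete, or until the
/// peer sends an unexpected item or `NotFound`.
pub fn collect_response(
    expected: &[u32],
    messages: impl IntoIterator<Item = PeerMessage>,
) -> Result<Vec<u32>, ResponseError> {
    let mut received = Vec::new();
    for message in messages {
        match message {
            PeerMessage::Item(id) if expected.contains(&id) && !received.contains(&id) => {
                received.push(id);
                if received.len() == expected.len() {
                    break; // every expected item has arrived
                }
            }
            // Unexpected item or NotFound: end the request instead of hanging.
            _ => break,
        }
    }
    if received.is_empty() {
        Err(ResponseError) // rule 2: nothing useful was received
    } else {
        Ok(received) // rule 1: a complete or partial response
    }
}
```

In this sketch, a partial response is still a success, so the caller can use the items that did arrive and re-request the rest from another peer.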
teor 69fcf64d6c
Disable issue URLs for "duplicate hash" errors (#1517)
In our README, we tell users to ignore these errors, so we should also
disable the issue URL.

Also include the hash in the error. (We don't want the span active for
all messages, we just want the hash in the error.)
2020-12-16 08:14:42 +10:00
Alfredo Garcia 41833340c1
downgrade remaining version strings to 1.0.0-alpha.0 (#1488) 2020-12-15 11:21:00 +10:00
Deirdre Connolly 44e1051dee Debug 2020-12-09 13:06:18 -05:00
Deirdre Connolly 25f6fd25b3 Test catching panic 2020-12-09 13:06:18 -05:00
teor 97d1a81b7c Automatically disable colors when tracing to a file 2020-12-02 10:25:44 -08:00
Jane Lusby fceef849cf remove unused mutability to defuse deadlock 2020-12-01 11:03:13 -05:00
Henry de Valence 1df9284444 zebrad: add a use_color option to the tracing config.
This is useful for creating searchable logs without having to filter color codes after the fact.
2020-11-30 15:25:50 -08:00
Henry de Valence e8c16b172f zebrad: pass TracingSection to Tracing component 2020-11-30 15:25:50 -08:00
Alfredo Garcia 4544463059
Inbound `FindBlocks` and `FindHeaders` (#1347)
* implement inbound `FindBlocks`
* Handle inbound peer FindHeaders requests
* handle request before having any chain tip
* Split `find_chain_hashes` into smaller functions

Add a `max_len` argument to support `FindHeaders` requests.

Rewrite the hash collection code to use heights, so we can handle the
`stop` hash and "no intersection" cases correctly.

* Split state height functions into "any chain" and "best chain"
* Rename the best chain block method to `best_block`
* Move fmt utilities to zebra_chain::fmt
* Summarise Debug for some Message variants

Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2020-12-01 07:30:37 +10:00
Henry de Valence fa02b266ca clippy 2020-11-25 10:55:44 -08:00
Henry de Valence de8415dcb1 tidy spans 2020-11-25 10:55:44 -08:00
Henry de Valence 05837797b1 tidy imports 2020-11-25 10:55:44 -08:00
Henry de Valence 77bf327b07 fix errors (2) 2020-11-25 10:55:44 -08:00
Henry de Valence 527f4d39ed fix errors 2020-11-25 10:55:44 -08:00
Henry de Valence e645e3bf0c remove async 2020-11-25 10:55:44 -08:00
Henry de Valence 6569977549 test compile change 2020-11-25 10:55:44 -08:00
Alfredo Garcia 486e55104a create Downloads for Inbound 2020-11-25 10:55:44 -08:00
Henry de Valence 2a4a89c002 state,zebrad: tidy span levels for good INFO output
This provides useful and not too noisy output at INFO level.  We do an
info-level message on every block commit instead of trying to do one
message every N blocks, because this is useful both for initial block
sync as well as continuous state updates on new blocks.
2020-11-23 14:16:39 +10:00
Henry de Valence f0810b028d state,consensus,sync: shorten span lengths
These changes help reduce the size of the resulting spans, making the
output more compact.  Together they save about 30-40 characters.
2020-11-23 14:16:39 +10:00
Henry de Valence ba3c19142c deps: update hyper, metrics to tokio 0.3
The metrics code becomes much simpler because the current version of the
metrics crate builds its own single-threaded runtime on a dedicated worker
thread, so no dependency on the main Zebra Tokio runtime is required.
2020-11-20 10:08:16 -08:00
Henry de Valence add94c1c45 deps: move to tokio 0.3, tower 0.4
This change is mostly mechanical, with the exception of the changes to the
`tower-batch` middleware.  This middleware was adapted from `tower::buffer`,
and the `tower::buffer` code was changed to implement its own bounded queue,
because Tokio 0.3 removed the `mpsc::Sender::poll_send` method.  See

ddc64e8d4d

for more context on the Tower changes.  To match Tower as closely as possible
in order to be able to upstream `tower-batch`, those changes are copied from
`tower::Buffer` to `tower-batch`.
2020-11-20 10:08:16 -08:00
Henry de Valence 4953f21670 fixup! zebrad: hack to skip alreadyverified errors 2020-11-18 03:09:06 -05:00
Henry de Valence aa7538ab15 zebrad: hack to skip alreadyverified errors 2020-11-17 14:56:27 -08:00
Henry de Valence e55392b61e zebrad: explicitly select the threaded scheduler. 2020-11-17 14:56:27 -08:00
Henry de Valence 6de824bd99 zebrad: remove block verification timeout
Because we set the lookahead limit to be at least twice the size of a checkpoint, we don't have a risk of timeouts.
2020-11-17 14:56:27 -08:00
Henry de Valence e9c847bbd7 zebrad: avoid a borrow in the ChainSync future 2020-11-17 14:56:27 -08:00
Henry de Valence b632a24436 zebrad: add diagnostics on cancelled download tasks 2020-11-17 14:56:27 -08:00
Henry de Valence ec411574ee zebrad: improve sync diagnostics 2020-11-17 14:56:27 -08:00
Henry de Valence e0c92167bc Revert "Hedge every syncer block download request"
This reverts commit 656bd24ba7.

The Hedge middleware keeps a pair of histograms, writing into one in the
current time interval and reading from the previous time interval's
data.  This means that the reverted change resulted in doubling all
block downloads until after at least the second measurement interval
(which means that the time measurements are also incorrect, as they're
operating under double the network load...)
2020-11-12 16:45:47 -05:00
Alfredo Garcia 128643d81e
Call `zebra_test::init` where needed. (#1227)
* Add missing `zebra_test::init()` to zebra-chain
* Add missing `zebra_test::init()` to zebra-consensus
* Add missing `zebra_test::init()` to zebra-network
* Add missing `zebra_test::init()` to zebra-state
* Add missing `zebra_test::init()` to zebra-test
* Add missing `zebra_test::init()` to zebrad
2020-11-10 10:29:25 +10:00
Henry de Valence 0ad648fb6a zebrad: make lookahead limit configurable.
Sets the default value to the previous lookahead limit.  My testing on
mainnet suggested that the newly lower value (changed when the
checkpoint frequency was decreased) is low enough to cause stalls, even
when using hedged requests.
2020-11-01 10:47:46 -08:00
teor 92c623eddf Log each genesis download
This change helps us diagnose sync hangs.
2020-10-28 11:31:04 -04:00
teor 656bd24ba7 Hedge every syncer block download request
Remove the minimum data points from the syncer hedge configuration.
When there are no data points, hedge sends the second request
immediately.

When there are fewer than 1/(1-latency_percentile) data points (20),
hedge delays the second request by the highest recent download time.

This change should improve genesis and post-restart sync latency.
2020-10-28 11:31:04 -04:00
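
The arithmetic in the commit above can be checked with a short sketch. A latency percentile of 0.95 is an assumption inferred from the stated result of 20 data points, since 1/(1-0.95) = 20. The functions below are hypothetical and only model the behaviour described in the message. They do not reproduce the histogram used by the real hedge middleware.

```rust
use std::time::Duration;

/// Number of data points needed before a percentile estimate is
/// meaningful: 1/(1 - latency_percentile), rounded to the nearest integer.
pub fn min_data_points(latency_percentile: f64) -> usize {
    (1.0 / (1.0 - latency_percentile)).round() as usize
}

/// Hypothetical model of the delay before the second (hedged) request.
pub fn hedge_delay(recent: &[Duration], latency_percentile: f64) -> Duration {
    // No data points: send the second request immediately.
    if recent.is_empty() {
        return Duration::ZERO;
    }
    let mut sorted = recent.to_vec();
    sorted.sort();
    // Too few data points: wait for the highest recent download time.
    if sorted.len() < min_data_points(latency_percentile) {
        return sorted[sorted.len() - 1];
    }
    // Enough data points: use a nearest-rank percentile estimate.
    let rank = (latency_percentile * sorted.len() as f64).ceil() as usize;
    sorted[rank.clamp(1, sorted.len()) - 1]
}
```

With no data points, every download is hedged immediately. That matches the doubling of block downloads described in the later revert of this commit.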
Henry de Valence 4c960c4e6d zebrad: treat duplicate downloads as an error
We should error if we notice that we're attempting to download the same
blocks multiple times, because that indicates that peers reported bad
information to us, or we got confused trying to interpret their
responses.
2020-10-26 12:05:35 -07:00
Henry de Valence 4127d086ea zebrad: clarify hedge layering motivation
Co-authored-by: teor <teor@riseup.net>
2020-10-26 12:05:35 -07:00
Henry de Valence 253bab042e sync: add a concurrency limit for block downloads 2020-10-26 12:05:35 -07:00
Henry de Valence 0a405c737d zebrad: check state in obtaintips, not extendtips.
The original sync algorithm split the sync process into two phases, one
that obtained prospective chain tips, and another that attempted to
extend those chain tips as far as possible until encountering an error
(at which point the prospective state is discarded and the process
restarts).

Because a previous implementation of this algorithm didn't properly
enforce linkage between segments of the chain while extending tips,
sometimes it would get confused and fail to discard responses that did
not extend a tip.  To mitigate this, a check against the state was
added.  However, this check can cause stalls while checkpointing,
because when a checkpoint is reached we may suddenly need to commit
thousands of blocks to the state.  Because the sync algorithm now has
a `CheckedTip` structure that ensures that a new segment of hashes
actually extends an existing one, we don't need to check against the
state while extending a tip, because we don't get confused while
interpreting responses.

This change results in significantly smoother progress on mainnet.
2020-10-26 12:05:35 -07:00
Henry de Valence ce2ac3336f zebrad: add debug message before state check
This reveals that there may be contention in access to the state, as
this takes a long time.
2020-10-26 12:05:35 -07:00
Henry de Valence 91469faf3c zebrad: eliminate duplicate span in sync 2020-10-26 12:05:35 -07:00
Henry de Valence b5a43f4516 zebrad: remove implementation details from docs
The timeout behavior in zebra-network is an implementation detail, not a
feature of the public API.  So it shouldn't be mentioned in the doc
comments -- if we want timeout behavior, we have to layer it ourselves.
2020-10-26 12:05:35 -07:00
Henry de Valence 1d7309afe2 zebrad: correctly handle duplicates in DownloadSet
Using the cancel_handles, we can deduplicate requests.  This is
important to do, because otherwise when we insert the second cancel
handle, we'd drop the first one, cancelling an existing task for no
reason.
2020-10-26 12:05:35 -07:00
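
The duplicate handling described in the commit above can be sketched with the standard library only. This is an illustration under stated assumptions: `CancelHandle` stands in for a oneshot sender whose drop cancels the task, block hashes are modelled as `u64`, and the `Downloads` type shown here is a hypothetical simplification, not the struct in zebrad.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Stand-in for a oneshot sender: dropping the handle cancels the task.
pub struct CancelHandle {
    cancelled: Arc<AtomicBool>,
}

impl Drop for CancelHandle {
    fn drop(&mut self) {
        self.cancelled.store(true, Ordering::SeqCst);
    }
}

/// Hypothetical download set that tracks one cancel handle per hash.
#[derive(Default)]
pub struct Downloads {
    cancel_handles: HashMap<u64, CancelHandle>,
}

impl Downloads {
    /// Queue a download. Returns the task's cancellation flag, or `None`
    /// if the hash is already queued.
    pub fn queue(&mut self, hash: u64) -> Option<Arc<AtomicBool>> {
        // Check first: inserting a second handle for the same hash would
        // drop the first handle and cancel the existing task.
        if self.cancel_handles.contains_key(&hash) {
            return None;
        }
        let cancelled = Arc::new(AtomicBool::new(false));
        let handle = CancelHandle { cancelled: Arc::clone(&cancelled) };
        self.cancel_handles.insert(hash, handle);
        Some(cancelled)
    }

    /// Cancel every queued download by dropping all the handles.
    pub fn cancel_all(&mut self) {
        self.cancel_handles.clear();
    }
}
```

Checking `contains_key` before `insert` is the design point: `HashMap::insert` replaces and drops the old value, which in this model is exactly the accidental cancellation the commit avoids.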
Henry de Valence 56fe4f4379 zebrad: unify sync restart logic
This lets us keep the main loop simple and just write `continue 'sync;`
to keep going.
2020-10-26 12:05:35 -07:00
Henry de Valence 12d25159c6 zebrad: use hedged requests in sync
The hedge middleware implements hedged requests, as described in _The
Tail At Scale_. The idea is that we auto-tune our retry logic according
to the actual network conditions, pre-emptively retrying requests that
exceed some latency percentile. This would hopefully solve the problem
where our timeouts are too long on mainnet and too slow on testnet.
2020-10-26 12:05:35 -07:00
Henry de Valence 5f229d1475 zebrad: use Downloads in sync
Try to use the better cancellation logic to revert to previous sync
algorithm.  As designed, the sync algorithm is supposed to proceed by
downloading state prospectively and handle errors by flushing the
pipeline and starting over.  This hasn't worked well, because we didn't
previously cancel tasks properly.  Now that we can, try to use something
in the spirit of the original sync algorithm.
2020-10-26 12:05:35 -07:00
Henry de Valence b90581a3d7 zebrad: create a Downloads Stream for syncing.
This makes two changes relative to the existing download code:

1.  It uses a oneshot to attempt to cancel the download task after it
    has started;

2.  It encapsulates the download creation and cancellation logic into a
    Downloads struct.
2020-10-26 12:05:35 -07:00