Commit Graph

240 Commits

Author SHA1 Message Date
teor d076b999f3
Fix syncer download order and add sync tests (#3168)
* Refactor so that RetryLimit::Future is std::marker::Sync

* Make the syncer future std::marker::Send by spawning tips futures

* Download synced blocks in chain order, not HashSet order

* Improve MockService failure messages

* Add closure-based responses to the MockService API

* Move MockChainTip to zebra-chain

* Add a MockChainTipSender type alias

* Support MockChainTip in ChainSync and its downloader

* Add syncer tests for obtain tips, extend tips, and wrong block hashes

* Add block too high tests for obtain tips and extend tips

* Add syncer tests for duplicate FindBlocks response hashes

* Allow longer request delays for mocked services in syncer tests
2022-01-11 14:11:35 -03:00
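The ordering and duplicate-hash changes in this commit can be illustrated with a minimal sketch. It assumes block hashes arrive as a slice that is already in chain order and may contain duplicates. `dedup_in_chain_order` and the `u64` hash stand-in are hypothetical names for illustration, not Zebra's API.

```rust
use std::collections::HashSet;

/// Stand-in for a block hash (assumption: the real type is a 32-byte hash).
type BlockHash = u64;

/// Remove duplicate hashes while keeping the order in which they were received.
///
/// Collecting into a `HashSet` and iterating over it would lose the chain
/// order, so the set is only used to remember which hashes were already seen.
pub fn dedup_in_chain_order(hashes: &[BlockHash]) -> Vec<BlockHash> {
    let mut seen = HashSet::new();
    hashes
        .iter()
        .copied()
        // `insert` returns `false` when the hash was already present.
        .filter(|hash| seen.insert(*hash))
        .collect()
}
```

Downloading from the returned `Vec` front to back then requests blocks in the order the peer listed them.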
Dimitris Apostolou 1a1ce3dbff
Fix typos (#3314) 2022-01-04 11:25:00 +01:00
teor 3b75e912d1
Add a copy-state zebrad command, which copies blocks between two state services (#3175)
* Add a copy-state command, which copies blocks between two state services

* Check blocks were written correctly

* Add extra logging to debug shutdown

* Add a block height limit argument

* Let the target state start from any height

Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
2021-12-22 02:07:52 +00:00
teor a4d1a1801c
Security: Drop blocks that are a long way ahead of the tip (#3167)
* Document the chain verifier

* Drop gossiped blocks that are too far ahead of the tip

* Add extra gossiped block metrics

* Allow extra gossiped blocks, now we have a stricter limit

* Fix a comment

* Check the exact number of blocks in a downloaded block response

* Drop synced blocks that are too far ahead of the tip

* Add extra synced block metrics

* Test dropping gossiped blocks that are too far ahead of the tip

* Allow an extra checkpoint's worth of blocks in the verifier queues

* Actually let's try two extra checkpoints

* Scale extra height limit with lookahead limit

* Also drop blocks that are behind the finalized tip

* Downgrade a noisy log

* Use a debug log for already verified gossiped blocks

* Use debug logs for already verified synced blocks
2021-12-17 13:31:51 -03:00
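The height checks described in this commit could look roughly like the sketch below. It assumes heights are plain integers and that the allowed distance ahead of the tip is a multiple of the lookahead limit. The factor of 2, `classify_block`, and `BlockAction` are hypothetical and chosen only for illustration.

```rust
/// Stand-in for a block height.
type Height = u32;

/// Hypothetical factor: how many lookahead limits ahead of the tip are tolerated.
const EXTRA_LOOKAHEAD_FACTOR: Height = 2;

/// What to do with a gossiped or synced block, based only on its height.
#[derive(Debug, PartialEq, Eq)]
pub enum BlockAction {
    Queue,
    DropTooFarAhead,
    DropBehindFinalized,
}

/// Decide whether a block is worth queueing for verification.
pub fn classify_block(
    block: Height,
    tip: Height,
    finalized_tip: Height,
    lookahead_limit: Height,
) -> BlockAction {
    // The upper bound scales with the lookahead limit (assumed formula).
    let max_height = tip.saturating_add(lookahead_limit.saturating_mul(EXTRA_LOOKAHEAD_FACTOR));

    if block < finalized_tip {
        // Blocks strictly behind the finalized tip can never be added to the chain.
        BlockAction::DropBehindFinalized
    } else if block > max_height {
        // Blocks far ahead of the tip would only take up queue space.
        BlockAction::DropTooFarAhead
    } else {
        BlockAction::Queue
    }
}
```

With a tip at height 100 and a lookahead limit of 20, this sketch accepts heights up to 140 and drops anything higher.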
Alfredo Garcia f01e5bb817
Add and use `debug_skip_parameter_preload` config option (#3197)
* add and use a config option to skip groth16 parameters download

* correct doc

* enable parameters download in `sync_past_mandatory_checkpoint` test

* change logging location

* fix import

* add argument to `create_cached_database_height()`

Co-authored-by: teor <teor@riseup.net>
2021-12-14 21:43:07 +00:00
teor 37808eaadb
Security: When there are no new peers, stop crawler using CPU and writing logs (#3177)
* Stop useless crawler attempts when there are no peers and no crawl responses

* Disable GitHub bug report URLs when the disk is full

* Add help text for the `zebrad start` tracing filter option
2021-12-10 00:19:52 +00:00
teor c85ea18b43
Fix slow Zebra startup times, to reduce CI failures (#3104)
* Tweak a log message

* Only retry failed DNS once, then use the other DNS responses

* Limit broadcasts to half the peers

* Use a longer minimum interval for GetAddr requests

* Reduce the syncer and mempool crawler fanouts

* Stop resetting the mempool twice when it starts up

This spawns two crawlers, which send two fanouts,
so it can use up a lot of peers.

Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
2021-11-30 21:04:32 +00:00
teor 0ef4629232
Automatically download and load sprout parameters (#3085)
* Download and load Sprout parameters using zcash_proofs

Also update some librustzcash dependencies, to avoid duplicate dependencies.

* Update upstream orchard to avoid a compilation error

* Skip librustzcash batch refactor for now, to avoid compilation errors

* Change the cache ID, so we actually cache Sprout

* Move existing file checks into zcash_proofs

* Add a 1 hour timeout to parameter file downloads

* Give other tasks priority, before spawning the download task

* Update to the latest version of our modified librustzcash fork

* Change the cache key for Sprout

* Add 40 minutes to CI timeouts for occasional sprout downloads

* Update to zcash_proofs with split downloads

* Check file sizes to help debug parameter load failures in zcash_proofs

* Start the second download once the first has finished in zcash_proofs

* Document the parameter download task

* Stop hashing existing files twice
2021-11-25 13:26:32 -03:00
teor 68d7198e9f
Re-order Zebra startup, so slow services are launched last (#3091)
* Start network before verifiers

This makes the Groth16 download task start as late as possible.

* Explain why the Groth16 download must happen first

* Speed up Zebra shutdown: skip waiting for the tokio runtime
2021-11-23 17:42:44 +00:00
teor f7202bfbc0
Download Zcash Sapling parameters and load them from cached files (#3057)
* Replace Zcash parameters crates with pre-downloaded local parameter files

* Download Zcash parameters using the `zcashd` script in CI and Docker

* Add a zcash_proofs dependency to zebra-consensus

* Download Sapling parameters using zcash_proofs, rather than fetch-params.sh

* Add a new `zebrad download` subcommand

This command isn't required for normal usage.
But it's useful when testing, or launching multiple Zebra instances.

* Use `zebrad download` in CI to pre-download parameters

* Log a helpful hint if downloading fails

* Allow some duplicate dependencies currently hidden by orchard

* Spawn a separate task to download Groth16 parameters

* Run the parameter download with code coverage

This avoids re-compiling Zebra with and without coverage.

* Update Cargo.lock after rebase

* Try to pass `download` as an argument to `zebrad` in coverage CI

* Fix copy and paste comment typos

* Add path and download examples, like zcash_proofs

* Download params in CI just like zcash_proofs does

* Delete a redundant build step

* Implement graceful shutdown for zebrad start

* Send coverage summary to /dev/null when getting the params path

* Use the correct parameters path and download commands in CI

* Explain pre-downloads

* Avoid calling params_folder twice

* Rename parameter types and methods for consistency

```sh
fastmod SaplingParams SaplingParameters zebra*
fastmod Groth16Params Groth16Parameters zebra*
fastmod PARAMS GROTH16_PARAMETERS zebra*
fastmod params_folder directory zebra*
```

And a manual variable name tweak.

* rustfmt

* Remove a redundant coverage step

Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
2021-11-19 23:02:56 +00:00
teor 303c8cf5ef
Add a queue checker task, to make sure mempool transactions propagate (#2888)
* Guarantee unique IDs in mempool service responses

* Guarantee unique IDs in crawler task mempool Queue requests

Also update the tests to use unique IDs.

* Add a CheckForVerifiedTransactions mempool request

Also document the mempool request and response variants.

* Spawn a QueueChecker task to check for newly verified transactions

This task makes sure that transactions reliably propagate,
rather than relying on peer requests or responses to trigger propagation.

* Update the start command documentation

Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
2021-10-18 19:23:21 +00:00
teor b64ed62777
Add a debug config that enables the mempool (#2862)
* Update some comments

* Add a mempool debug_enable_at_height config

* Rename a field in the mempool crawler

* Propagate syncer channel errors through the crawler

We don't want to ignore these errors, because they might indicate a shutdown.
(Or a bug that we should fix.)

* Use debug_enable_at_height in the mempool crawler

* Log when the mempool is activated or deactivated

* Deny unknown fields and apply defaults for all configs

* Move Duration last, as required for TOML tables

* Add a basic mempool acceptance test

Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
2021-10-13 15:04:49 +00:00
teor b274ee4066
Pass the mempool config to the mempool (#2861)
* Split mempool config into its own module

Also:
- expand config docs
- clean up mempool imports

* Pass the mempool config to the mempool

* Create the transaction sender channel inside the mempool 1/2

This simplifies all the code that calls the mempool.

Also:
- update the mempool enabled state before returning the new mempool
- add some test module doc comments

* Refactor a setup function out of the mempool unit tests 2/2

Also:
- update the setup function to handle the latest mempool changes

* Clarify a comment

Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
2021-10-12 17:31:54 +00:00
Alfredo Garcia 724967d488
Send `AdvertiseTransactionIds` to peers (#2823)
* broadcast transactions to peers after they get inserted into mempool

* remove network argument from mempool init

* remove dbg left

* remove return value in mempool enable call

* rename channel sender and receiver vars

* change unwrap() to expect()

* change the channel to a hashset

* fix build

* fix tests

* rustfmt

* fix tiny space issue inside macro

Co-authored-by: teor <teor@riseup.net>

* check errors/panics in transaction gossip tests

* fix build of newly added tests

* Stop dropping the inbound service and mempool in a test

Keeping the mempool around avoids a transaction broadcast task error,
so we can test that there are no other errors in the task.

* Tweak variable names and add comments

* Avoid unexpected drops by returning a mempool guard in tests

* Use BoxError to simplify service types in tests

* Make all returned service types consistent in tests

We want to be able to change the setup without changing the tests.

Co-authored-by: teor <teor@riseup.net>
2021-10-08 08:59:46 -03:00
teor 04d2cfb3d0
Gossip recently verified block hashes to peers (#2729)
* Implement a task that gossips verified block hashes

* Log an info message for block broadcasts

* Simplify the gossip task

Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>

* Re-use the old tip change if there is no new tip change

Also improve the comments.

* Add an assertion message

* Rename task join handles and futures in start method

* Add a dedicated BlockGossipError type

This type helps distinguish between syncer and state errors.

* Test that committed blocks are gossiped to peers

Also do a minor type cleanup on the existing test code,
replacing `Option<Vec<_>>` with `Vec<_>`.

* Formatting

* Remove excess newlines

Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>

* Clear the initial gossiped blocks during test setup

Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
2021-10-07 07:46:37 -03:00
Janito Vaqueiro Ferreira Filho 5d9893cf31
Send crawled transaction IDs to downloader (#2801)
* Rename type parameter to be more explicit

Replace the single letter with a proper name.

* Remove imports for `Request` and `Response`

The type names will conflict with the ones for the mempool service.

* Attach `Mempool` service to the `Crawler`

Add a field to the `Crawler` type to store a way to access the `Mempool`
service.

* Forward crawled transactions to downloader

The crawled transactions are now sent to the transaction downloader and
verifier, to be included in the mempool.

* Derive `Eq` and `PartialEq` for `mempool::Request`

Make it simpler to use the `MockService::expect_request` method.

* Test if crawled transactions are downloaded

Create some dummy crawled transactions, and let the crawler discover
them. Then check if they are forwarded to the mempool to be downloaded
and verified.

* Don't send empty transaction ID list to downloader

Ignore response from peers that don't provide any crawled transactions.

* Log errors when forwarding crawled transaction IDs

Calling the Mempool service should not fail, so if an error happens it
should be visible. However, errors when downloading individual
transactions can happen from time to time, so there's no need for them
to be very visible.

* Document existing `mempool::Crawler` test

Provide some depth as to what the test expects from the crawler's
behavior.

* Refactor to create `setup_crawler` helper function

Make it easier to reuse the common test setup code.

* Simplify code to expect requests

Now that `zebra_network::Request` implements `Eq`, the call can be
simplified into `expect_request`.

* Refactor to create `respond_with_transaction_ids`

A helper function that checks for a network crawl request and responds
with the given list of crawled transaction IDs.

* Refactor to create `crawler_iterator` helper

A function to intercept and respond to the fanned-out requests sent
during a single crawl iteration.

* Refactor to create `respond_to_queue_request`

Reduce the repeated code necessary to intercept and reply to a request
for queuing transactions to be downloaded.

* Add `respond_to_queue_request_with_error` helper

Intercepts a mempool request to queue transactions to be downloaded, and
responds with an error, simulating an internal problem in the mempool
service implementation.

* Derive `Arbitrary` for `NetworkUpgrade`

This is required for deriving `Arbitrary` for some error types.

* Derive `Arbitrary` for `TransactionError`

Allow random transaction errors to be generated for property tests.

* Derive `Arbitrary` for `MempoolError`

Allow random Mempool errors to be generated for property tests.

* Test if errors don't stop the mempool crawler

The crawler should be robust enough to continue operating even if the
mempool service fails to download transactions or even fails to handle
requests to enqueue transactions.

* Reduce the log level for download errors

They should happen regularly, so there's no need to have them with a
high visibility level.

Co-authored-by: teor <teor@riseup.net>

* Stop crawler if service stops

If `Mempool::poll_ready` returns an error, it's because the mempool
service has stopped and can't handle any requests, so the crawler should
stop as well.

Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
2021-10-05 10:55:42 +10:00
Alfredo Garcia 37595c4b32
Mempool support for transaction expiration (#2774)
* mempool - support transaction expiration

* use `LatestChainTip` instead of state call

* clippy

* remove spawn task

* remove unneeded async from function

* remove return value

* add an `expiry_height_mut()` method to `Transaction` for testing purposes

* fix `remove_expired_transactions()`

* add a `mempool_transaction_expiration()` test

* tidy cleanup to `expiry_height()`

* improve docs

* fix the build

* try fix macos build

* extend tests

* add doc to function

* clippy

* fix build

* start tests at block two
2021-09-29 16:52:44 +00:00
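A minimal sketch of the expiration step follows. It assumes a transaction counts as expired once the chain tip height reaches its expiry height, and that a missing expiry height means the transaction never expires. `MempoolTx` and the `u64` transaction ID are hypothetical stand-ins; only the function name `remove_expired_transactions` comes from the commit message above.

```rust
/// Stand-in for a block height.
type Height = u32;

/// Hypothetical, simplified mempool entry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MempoolTx {
    pub id: u64,
    /// `None` means the transaction does not expire (assumption).
    pub expiry_height: Option<Height>,
}

/// Remove expired transactions in place and return the IDs that were removed.
pub fn remove_expired_transactions(txs: &mut Vec<MempoolTx>, tip_height: Height) -> Vec<u64> {
    let mut removed = Vec::new();

    txs.retain(|tx| match tx.expiry_height {
        // Assumed rule: the next block is past the expiry height, so drop it.
        Some(expiry) if tip_height >= expiry => {
            removed.push(tx.id);
            false
        }
        // Not expired yet, or no expiry height at all.
        _ => true,
    });

    removed
}
```

Returning the removed IDs lets a caller log or reject them without scanning the storage a second time.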
Marek 061ad55144
Sneak chain_tip_change into mempool (#2785)
* Pass ChainTipChange to the mempool

* Fix nits
2021-09-21 17:06:52 +00:00
Conrado Gouvea 957e12e4ca
Pass sync_status to mempool (#2754)
* Pass sync_status to mempool

* Update zebrad/src/components/mempool.rs

Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>

* Remove enabled flag for now; will be handled in #2723

Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
2021-09-15 22:13:29 +00:00
Conrado Gouvea 8825a52bb8
Move transaction download and verify stream into the mempool service (#2741)
* Move transaction downloader and verifier into the mempool service

* add test for `Storage::contains_rejected()`

* Rename DownloadAndVerify->Queue; move should_download_or_verify() to previous impl

* GossipedTx -> Gossip

* Revamp error handling

Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
2021-09-13 16:28:07 -04:00
Conrado Gouvea a2993e8df0
Skip download and verification if the transaction is already in the mempool or state (#2718)
* Check if tx already exists in mempool or state before downloading

* Reorder checks

* Add rejected test; refactor into separate function

* Wrap mempool in buffered service

* Rename RejectedTransactionsById -> RejectedTransactionsIds

* Add RejectedTransactionIds response; fix request name

* Organize imports

* add a test for Storage::rejected_transactions

* add test for mempool `Request::RejectedTransactionIds`

* change buffer size to 1 in the test

Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
2021-09-08 18:51:17 +00:00
Alfredo Garcia 9c220afdc8
Reply to `Request::MempoolTransactionIds` with mempool content (#2720)
* reply to `Request::MempoolTransactionIds`

* remove boilerplate

* get storage from mempool with a method

* change panic message

* try fix for mac

* use normal init instead of init_tests for state service

* newline

* rustfmt

* fix test build
2021-09-02 13:42:31 +00:00
Conrado Gouvea 1ccb2de7c7
Add transaction downloader and verifier (#2679)
* Add transaction downloader

* Changed mempool downloader to be like inbound

* Verifier working (logs result)

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

* Fix coinbase check for mempool, improve is_coinbase() docs

* Change other downloads.rs docs to reflect the mempool downloads.rs changes

* Change TIMEOUTs to downloads.rs; add docs

* Renamed is_coinbase() to has_valid_coinbase_transaction_inputs() and contains_coinbase_input() to has_any_coinbase_inputs(); reorder checks

* Validate network upgrade for V4 transactions; check before computing sighash (for V5 too)

* Add block_ prefix to downloads and verifier

* Update zebra-consensus/src/transaction.rs

Co-authored-by: teor <teor@riseup.net>

* Add consensus doc; add more Block prefixes

Co-authored-by: teor <teor@riseup.net>
2021-09-02 00:06:20 +00:00
teor b6fe816473
Add a `ChainTipChange` type to `await` chain tip changes (#2715)
* Rename ChainTipReceiver to CurrentChainTip

`fastmod ChainTipReceiver CurrentChainTip zebra*`

* Update chain tip documentation and variable names

* Basic chain tip change implementation, without resets

Also includes the following name changes:
```
fastmod CurrentChainTip LatestChainTip zebra*
fastmod chain_tip_receiver latest_chain_tip zebra*
```

* Clarify the difference between `LatestChainTip` and `ChainTipChange`
2021-09-01 22:31:16 +00:00
Janito Vaqueiro Ferreira Filho 8bff71e857
Only enable the mempool crawler after synchronization reaches the chain tip (#2667)
* Store a `SyncStatus` handle in the `Crawler`

The helper type will make it easier to determine if the crawler is
enabled or not.

* Pause crawler if mempool is disabled

Implement waiting until the mempool becomes enabled, so that the crawler
does not run while the mempool is disabled.

If the `MempoolStatus` helper is unable to determine if the mempool is
enabled, stop the crawler task entirely.

* Update test to consider when crawler is paused

Change the mempool crawler test so that it's a proptest that tests
different chain sync. lengths. This leads to different scenarios with
the crawler pausing and resuming.

Co-authored-by: teor <teor@riseup.net>
2021-08-31 10:42:25 +00:00
Janito Vaqueiro Ferreira Filho 83a2e30e33
Create a `SyncStatus` helper type (#2685)
* Create a `SyncStatus` helper type

Keeps track of whether the synchronizer is close to the chain tip.

* Refactor `ChainSync` ctor. to return `SyncStatus`

Change the constructor API so that it returns a higher level construct.

* Test if `SyncStatus` waits for the chain tip

Test if waiting for the chain tip to be reached correctly finishes when
the chain tip is reached. This is done by sending recent sync lengths to
the `SyncStatus` instance, and checking that every time a separate
`SyncStatus` instance determines it has reached the tip the original
instance wakes up.

* Add a temporary attribute to allow dead code

The code added isn't used yet, so we'll add a temporary waiver until
another PR is merged to use them.
2021-08-30 10:01:33 +10:00
teor d2e14b22f9
Refactor BestTipHeight into a generic ChainTip sender and receiver (#2676)
* Rename BestTipHeight so it can be generalised to ChainTipSender

`fastmod BestTipHeight ChainTipSender zebra*`

For senders:
`fastmod best_tip_height chain_tip_sender zebra*`

For receivers:
`fastmod best_tip_height chain_tip_receiver zebra*`

* Rename best_tip_height module to chain_tip

* Wrap the chain tip watch channel in a ChainTipReceiver type

* Create a ChainTip trait to avoid tricky crate dependencies

And add convenience impls for optional and empty chain tips.

* Use the ChainTip trait in zebra-network

* Replace `Option<ChainTip>` with `NoChainTip`

Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
2021-08-27 11:34:33 +10:00
teor ace7aec933
Return a transaction verifier from `zebra_consensus::init` (#2665)
* Return a transaction verifier from `zebra_consensus::init`

This verifier is temporarily created separately from the block verifier's
transaction verifier.

* Return the same transaction verifier used by the block verifier

* Clarify that the mempool verifier is the transaction verifier

Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
2021-08-25 15:07:26 +00:00
Janito Vaqueiro Ferreira Filho 069f7716db
Create initial transaction crawler for the mempool (#2646)
* Create initial `mempool::Crawler` type

The mempool crawler is responsible for periodically asking peers for
transactions to insert into the local mempool. This initial
implementation will periodically ask for transactions, but won't do
anything with them yet.

Also, the crawler is currently configured to be always enabled, but this
should be fixed to avoid crawling while Zebra is still syncing the
chain.

* Add a timeout to peer responses

Prevent the crawler from getting stuck if there's communication with a
peer that takes too long to respond.

* Run the mempool crawler in Zebra

Spawn a task for the crawler when Zebra starts.

* Test if the crawler is sending requests

Create a mock for the `PeerSet` service to intercept requests and verify
that the transaction requests are sent periodically.

* Use `full` Tokio features when testing

Make it simpler to select the features for test builds.

Co-authored-by: teor <teor@riseup.net>

* Link to the issue for crawler activation

Make it easy to navigate from the `TODO` comment to the current project
planning.

Co-authored-by: teor <teor@riseup.net>

* Link to the issue for downloading transactions

Make it easy to navigate from the `TODO` comment to the current project
planning.

Co-authored-by: teor <teor@riseup.net>

Co-authored-by: teor <teor@riseup.net>
2021-08-24 11:23:53 -03:00
teor 6f8f4d8987
Provide recent syncer response lengths as a watch channel (#2602)
* Minimal recent sync lengths implementation

Also includes metrics and logging, to make diagnosing bugs easier.

* Add logging to check what happens when Zebra reaches the chain tip

* Add tests for recent sync lengths

- initially empty
- pruned to correct length
- newest entries go first

* Drop a redundant `/` from a Cargo.lock URL

This seems to be a nightly or beta Rust change,
but hopefully stable just accepts it.

* Use metrics histograms to avoid overwriting values

* Add detailed syncer monitoring dashboard

* Increase the recent sync length to 4

This length makes it easier to distinguish between temporary and
sustained errors/syncs.

Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
2021-08-19 23:16:16 +00:00
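The three properties tested in this commit (initially empty, pruned to the correct length, newest entries first) can be sketched with a bounded deque. This is a simplified stand-in: the watch channel, metrics, and logging mentioned above are left out, and the method names are assumptions.

```rust
use std::collections::VecDeque;

/// Simplified record of the most recent sync response lengths.
#[derive(Debug, Default)]
pub struct RecentSyncLengths {
    lengths: VecDeque<usize>,
}

impl RecentSyncLengths {
    /// The commit message above raises the recent sync length to 4.
    pub const MAX_RECENT_LENGTHS: usize = 4;

    /// Record a new sync length, keeping only the newest entries.
    pub fn push(&mut self, sync_length: usize) {
        // Newest entries go first.
        self.lengths.push_front(sync_length);
        // Prune the oldest entries from the back.
        self.lengths.truncate(Self::MAX_RECENT_LENGTHS);
    }

    /// Copy the current lengths, newest first.
    pub fn snapshot(&self) -> Vec<usize> {
        self.lengths.iter().copied().collect()
    }
}
```

Keeping several recent lengths rather than just the last one is what makes it possible to tell a single empty response apart from a sustained run of them.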
Janito Vaqueiro Ferreira Filho 4c4dbfe7cd
Reject connections from outdated peers (#2519)
* Simplify state service initialization in test

Use the test helper function to remove redundant code.

* Create `BestTipHeight` helper type

This type abstracts away the calculation of the best tip height based on
the finalized block height and the best non-finalized chain's tip.

* Add `best_tip_height` field to `StateService`

The receiver endpoint is currently ignored.

* Return receiver endpoint from service constructor

Make it available so that the best tip height can be watched.

* Update finalized height after finalizing blocks

After blocks from the queue are finalized and committed to disk, update
the finalized block height.

* Update best non-finalized height after validation

Update the value of the best non-finalized chain tip block height after
a new block is committed to the non-finalized state.

* Update finalized height after loading from disk

When `FinalizedState` is first created, it loads the state from
persistent storage, and the finalized tip height is updated. Therefore,
the `best_tip_height` must be notified of the initial value.

* Update the finalized height on checkpoint commit

When a checkpointed block is committed, it bypasses the non-finalized
state, so there's an extra place where the finalized height has to be
updated.

* Add `best_tip_height` to `Handshake` service

It can be configured using the `Builder::with_best_tip_height`. It's
currently not used, but it will be used to determine if a connection to
a remote peer should be rejected or not based on that peer's protocol
version.

* Require best tip height to init. `zebra_network`

Without it the handshake service can't properly enforce the minimum
network protocol version from peers. Zebrad obtains the best tip height
endpoint from `zebra_state`, and the test vectors simply use a dummy
endpoint that's fixed at the genesis height.

* Pass `best_tip_height` to proto. ver. negotiation

The protocol version negotiation code will reject connections to peers
if they are using an old protocol version. An old version is determined
based on the current known best chain tip height.

* Handle an optional height in `Version`

Fall back to the genesis height if `None` is specified.

* Reject connections to peers on old proto. versions

Avoid connecting to peers that are on protocol versions that don't
recognize a network update.

* Document why peers on old versions are rejected

Describe why it's a security issue above the check.

* Test if `BestTipHeight` starts with `None`

Check if initially there is no best tip height.

* Test if best tip height is max. of latest values

After applying a list of random updates where each one either sets the
finalized height or the non-finalized height, check that the best tip
height is the maximum of the most recently set finalized height and the
most recently set non-finalized height.

* Add `queue_and_commit_finalized` method

A small refactor to make testing easier. The handling of requests for
committing non-finalized and finalized blocks is now more consistent.

* Add `assert_block_can_be_validated` helper

Refactor to move into a separate method some assertions that are done
before a block is validated. This is to allow moving these assertions
more easily to simplify testing.

* Remove redundant PoW block assertion

It's also checked in
`zebra_state::service::check::block_is_contextually_valid`, and it was
getting in the way of tests that received a gossiped block before
finalizing enough blocks.

* Create a test strategy for test vector chain

Splits a chain loaded from the test vectors in two parts, containing the
blocks to finalize and the blocks to keep in the non-finalized state.

* Test that committing blocks updates the best tip height

Create a mock blockchain state, with a chain of finalized blocks and a
chain of non-finalized blocks. Commit all the blocks appropriately, and
verify that the best tip height is updated.

Co-authored-by: teor <teor@riseup.net>
2021-08-08 23:52:52 +00:00
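The `BestTipHeight` behaviour described in this commit (starts with `None`, then reports the maximum of the most recently set finalized and non-finalized heights) can be sketched as follows. The field and method names are assumptions, and the real helper also publishes the value to a receiver endpoint, which this sketch omits.

```rust
use std::cmp::max;

/// Stand-in for a block height.
type Height = u32;

/// Simplified tracker for the best known chain tip height.
#[derive(Debug, Default)]
pub struct BestTipHeight {
    finalized: Option<Height>,
    non_finalized: Option<Height>,
}

impl BestTipHeight {
    /// Record the height of the latest finalized block.
    pub fn set_finalized_height(&mut self, height: Height) {
        self.finalized = Some(height);
    }

    /// Record the tip height of the best non-finalized chain.
    pub fn set_best_non_finalized_height(&mut self, height: Height) {
        self.non_finalized = Some(height);
    }

    /// The best tip height, or `None` if no height has been set yet.
    pub fn best(&self) -> Option<Height> {
        // `Option` orders `None` below every `Some`, so `max` handles unset values.
        max(self.finalized, self.non_finalized)
    }
}
```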
teor 74e155ff9f
Spelling: gossipped -> gossiped (#2119) 2021-05-07 13:01:11 +02:00
teor 7e2c3a2fc7 Clarify a duplicate log message 2021-04-21 23:59:29 -04:00
teor 24f1b9bad1
Document the Inbound service in the start module (#1653) 2021-01-29 22:19:06 +10:00
Alfredo Garcia 486e55104a create Downloads for Inbound 2020-11-25 10:55:44 -08:00
Henry de Valence e9c847bbd7 zebrad: avoid a borrow in the ChainSync future 2020-11-17 14:56:27 -08:00
Henry de Valence 253bab042e sync: add a concurrency limit for block downloads 2020-10-26 12:05:35 -07:00
Henry de Valence 65e0c22fbe state: don't pre-buffer the service
There's no reason to return a pre-Buffer'd service (there's no need for
internal access to the state service, as in zebra-network), but wrapping
it internally removes control of the buffer size from the caller.
2020-10-26 12:05:35 -07:00
Henry de Valence cab96aa1a8
zebrad: clarify config help text (#1194) 2020-10-22 15:03:01 +10:00
Alfredo Garcia 21ad6ffc47
Reverse displayed endianness of transaction and block hashes (#1171)
* Reverse displayed endianness of transaction and block hashes
* fix zebra-checkpoints utility for new hash order
* Stop using "zebrad revhex" in zebrad-hash-lookup
* Rebuild checkpoint lists in new hash order
This change also adds additional checkpoints to the end of each list.

* Replace TransactionHash with transaction::Hash
This change should have been made in #905, but we missed Debug impls
and some docs.

Co-authored-by: Ramana Venkata <vramana@users.noreply.github.com>
Co-authored-by: teor <teor@riseup.net>
2020-10-22 07:54:02 +10:00
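As a rough illustration of the display change, the sketch below assumes a hash is stored as 32 bytes in serialization order and is shown to users as hex with the byte order reversed. `display_hash` is a hypothetical helper, not the `Display` or `Debug` impl touched by this commit.

```rust
/// Format a 32-byte hash as hex, in reversed byte order.
pub fn display_hash(bytes: &[u8; 32]) -> String {
    bytes
        .iter()
        // Walk the bytes from last to first.
        .rev()
        // Two lowercase hex digits per byte, zero-padded.
        .map(|byte| format!("{:02x}", byte))
        .collect()
}
```

The first stored byte therefore appears as the last two hex digits of the displayed string.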
Henry de Valence 55f46967b2 zebrad: serve blocks from Inbound service
The original version of this commit ran into

https://github.com/rust-lang/rust/issues/64552

again.  Thanks to @yaahc for suggesting a workaround (using futures combinators
to avoid writing an async block).
2020-09-18 18:34:25 -07:00
Henry de Valence 170f588ffb network: document load-shedding behavior
This was part of the original design and is described in the Connection
internals, but we never documented it externally.
2020-09-18 18:34:25 -07:00
Henry de Valence 1d0ebf89c6 zebrad: move seed command into inbound component
Remove the seed command entirely, and make the behavior it provided
(responding to `Request::Peers`) part of the ordinary functioning of the
start command.

The new `Inbound` service should be expanded to handle all request
types.
2020-09-18 18:34:25 -07:00
Henry de Valence 1d3892e1dc network: rename alias to BoxError
This is shorter and consistent with Tower (which is why we use it in the
first place).
2020-09-18 18:34:25 -07:00
teor b1e1291f45 Log inbound peer requests at debug
Logging at info was a bit too verbose.

Also add a short log message.
2020-09-10 09:46:53 -07:00
Henry de Valence 9b6e66c1b9 zebrad: rename Syncer to ChainSync
This name clarifies what is being synced and avoids an agent-noun
construction.
2020-09-10 09:45:52 -07:00
Henry de Valence 0bc79686b8 zebrad: move sync into components module.
Part of #1030.
2020-09-10 09:45:52 -07:00
teor adafe1d189 Restart sync after the first failed ObtainTips
The ObtainTips retry was redundant. The timeout wasn't much shorter, but
it made the code and sync logic more complicated.
2020-09-09 15:35:09 -07:00
teor 2a68ef5acb Update the peerset buffer size and sync timeout
Also add a bunch of comments and documentation for network-constrained
nodes, and for testnet.
2020-09-08 12:44:33 -07:00
teor b062a682b0 Refactor "waiting for pending blocks" log 2020-09-08 12:44:33 -07:00
teor e6e859dce2 Tweak sync timeouts
* increase the EWMA default and decay
* increase the block download retries
* increase the request and block download timeouts
* increase the sync timeout
2020-09-08 12:44:33 -07:00
teor ce12d4dadc Add timeouts for tip responses and block verify tasks 2020-09-08 12:44:33 -07:00
teor 379ce5c1b8 Retry obtain and extend tips on failure 2020-09-08 12:44:33 -07:00
teor 48497d4857
Ignore sync errors when the block is already verified (#980)
* Ignore sync errors when the block is already verified

If we get an error for a block that is already in our state, we don't
need to restart the sync. It was probably a duplicate download.

Also:

Process any ready tasks before reset, so the logs and metrics are
up to date. (But ignore the errors, because we're about to reset.)

Improve sync logging and metrics during the download and verify task.

* Remove duplicate hashes in logs
Co-authored-by: Jane Lusby <jlusby42@gmail.com>

* Log the sync hash span at warn level
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2020-09-04 08:13:00 +10:00
teor 437549d8e9
Always drop the final hash in peer responses (#991)
To work around a zcashd bug that squashes responses together.
2020-09-04 08:09:34 +10:00
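
The workaround described in the commit above can be sketched as follows. This is only an illustration, assuming a peer response is a plain list of hashes; the `Hash` alias and the `drop_final_hash` function are hypothetical names, not the ones used in zebrad.

```rust
/// Hypothetical stand-in for a block hash; the real type lives in zebra-chain.
type Hash = [u8; 32];

/// Unconditionally discard the last hash of a peer response.
/// An empty response stays empty.
fn drop_final_hash(mut hashes: Vec<Hash>) -> Vec<Hash> {
    hashes.pop();
    hashes
}

fn main() {
    let response = vec![[1u8; 32], [2u8; 32], [3u8; 32]];
    let kept = drop_final_hash(response);
    assert_eq!(kept, vec![[1u8; 32], [2u8; 32]]);
    assert!(drop_final_hash(Vec::new()).is_empty());
}
```

Dropping the hash unconditionally costs one extra request per response but avoids having to detect whether a given response was squashed.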
teor c770daa51f
If the first ExtendTips hash is bad, discard it and re-check (#992) 2020-09-04 08:08:19 +10:00
Alfredo Garcia 5485f4429a
Add config path to acceptance tests (#946)
* add and apply config mode to get_child

* remove option to read config from current directory

* remove argument from get_child
2020-09-03 13:13:23 -07:00
Jane Lusby ffdec0cb23
Remove in-memory state service (#974)
* Remove in-memory state service

* make the config compatible with toml again

* checkpoint commit to see how much I still have to revert

* back to the starting point...

* remove unused dependency

* reorganize error handling a bit

* need to make a new color-eyre release now

* reorder again because I have problems

* remove unnecessary helpers

* revert changes to config loading

* add back missing space

* Switch to released color-eyre version

* add back missing newline again...

* improve error message on unix when terminated by signal

* add context to last few asserts in acceptance tests

* instrument some of the helpers

* remove accidental extra space

* try to make this compile on windows

* reorg platform specific code

* hide on_disk module and fix broken link
2020-09-01 12:39:04 -07:00
teor 3fdfcb3179 fix: remove old tips that are behind new tips
This change makes sync less reliant on the exact order of ObtainTips and
ExtendTips responses.
2020-09-01 11:42:48 -04:00
teor 78201b456d feature: Implement checkpoint_sync for checkpoint verification
* add CheckpointList::new_up_to(limit: NetworkUpgrade)
* if checkpoint_sync is false, limit checkpoints to Sapling
* update tests for CheckpointList and chain::init
2020-08-24 15:34:46 +10:00
teor b8e8d4f548 fix: Remove some deeply-nested instrument spans
Closes #923.
2020-08-20 14:52:39 -04:00
Henry de Valence 103b663c40 chain: rename BlockHeight to block::Height 2020-08-17 11:46:34 -07:00
Henry de Valence 61dea90e2f chain: rename BlockHeaderHash to block::Hash
This is the first in a sequence of changes that change the block:: items
to not include Block as a prefix in their name, in accordance with the
Rust API guidelines.
2020-08-17 11:46:34 -07:00
Henry de Valence 948b067808 chain: move Network, NetworkUpgrade to parameters
Also, avoid using star-imports of the enum variants, which pollutes the
namespace.
2020-08-17 11:46:34 -07:00
Henry de Valence 0d1f56ad2f chain: remove utils module
A catch-all utils module can really easily slip into being a place to stash
miscellaneous functions that don't really belong anywhere in particular.
2020-08-17 11:46:34 -07:00
Henry de Valence a79ce97957
Fix sync algorithm. (#887)
* checkpoint: reject older of duplicate verification requests.

If we get a duplicate block verification request, we should drop the older one
in favor of the newer one, because the older request is likely to have been
canceled.  Previously, this code would accept up to four duplicate verification
requests, then fail all subsequent ones.

* sync: add a timeout layer to block requests.

Note that if this timeout is too short, we'll bring down the peer set in a
retry storm.

* sync: restart syncing on error

Restart the syncing process when an error occurs, rather than ignoring it.
Restarting means we discard all tips and start over with a new block locator,
so we can have another chance to "unstuck" ourselves.

* sync: additional debug info

* sync: handle lookahead limit correctly.

Instead of extracting all the completed task results, the previous code pulled
results out until there were fewer tasks than the lookahead limit, then
stopped.  This meant that completed tasks could be left until the limit was
exceeded again.  Instead, extract all completed results, and use the number of
pending tasks to decide whether to extend the tip or wait for blocks to finish.

* network: add debug instrumentation to retry policy

* sync: instrument the spawned task

* sync: streamline ObtainTips/ExtendTips logic & tracing

This change does three things:

1.  It aligns the implementation of ObtainTips and ExtendTips so that they use
the same deduplication method.  This means that when debugging we only have one
deduplication algorithm to focus on.

2.  It streamlines the tracing output to not include information already
included in spans. Both obtain_tips and extend_tips have their own spans
attached to the events, so it's not necessary to add Scope: prefixes in
messages.

3.  It changes the messages to be focused on reporting the actual
events rather than the interpretation of the events (e.g., "got genesis hash in
response" rather than "peer could not extend tip").  The motivation for this
change is that when debugging, the interpretation of events is already known to
be incorrect, in the sense that the mental model of the code (no bug) does not
match its behavior (has bug), so presenting minimally-interpreted events forces
interpretation relative to the actual code.

* sync: hack to work around zcashd behavior

* sync: localize debug statement in extend_tips

* sync: change algorithm to define tips as pairs of hashes.

This is different enough from the existing description that its comments no
longer apply, so I removed them.  A further chunk of work is to change the sync
RFC to document this algorithm.

* sync: reduce block timeout

* state: add resource limits for sled

Closes #888

* sync: add a restart timeout constant

* sync: de-pub constants
2020-08-12 16:48:01 -07:00
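
The lookahead-limit fix in the commit above (drain every completed result, then decide based on the number of pending tasks) might look roughly like this. It is a simplified, synchronous sketch: `Task`, `NextStep`, `drain_completed`, and `next_step` are hypothetical names, and comparing with `<` against the limit is an assumption rather than the syncer's exact condition.

```rust
/// Hypothetical model of a download-and-verify task.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Task {
    Pending,
    /// A finished task, carrying the height it verified.
    Done(u64),
}

#[derive(Debug, PartialEq)]
enum NextStep {
    ExtendTips,
    WaitForBlocks,
}

/// Remove every completed task and return its result,
/// instead of stopping once the count drops below the limit.
fn drain_completed(tasks: &mut Vec<Task>) -> Vec<u64> {
    let mut results = Vec::new();
    tasks.retain(|task| match task {
        Task::Done(height) => {
            results.push(*height);
            false
        }
        Task::Pending => true,
    });
    results
}

/// Decide what to do next from the number of tasks still pending.
/// Assumption: extend only while strictly below the limit.
fn next_step(pending: usize, lookahead_limit: usize) -> NextStep {
    if pending < lookahead_limit {
        NextStep::ExtendTips
    } else {
        NextStep::WaitForBlocks
    }
}

fn main() {
    let mut tasks = vec![
        Task::Pending,
        Task::Done(1),
        Task::Pending,
        Task::Done(2),
        Task::Done(3),
    ];
    let results = drain_completed(&mut tasks);
    assert_eq!(results, vec![1, 2, 3]);
    assert_eq!(tasks.len(), 2);
    assert_eq!(next_step(tasks.len(), 3), NextStep::ExtendTips);
}
```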
Henry de Valence 299afe13df
zebra-network tweaks. (#877)
* network: move gossiped peer selection logic into address book.

* network: return BoxService from init.

* zebrad: add note on why we truncate the gossiped peer list

Co-authored-by: Jane Lusby <jlusby42@gmail.com>

* Remove unused .rustfmt.toml

Many of these options are never actually loaded by our CI because of a channel
mismatch, where they're not applied on stable but only on nightly (see the logs
from a rustfmt job).  This means that we can get different settings when
running `cargo fmt` on the nightly and stable channels, which was causing a CI
failure on this PR.  Reverting back to the default rustfmt settings avoids this
problem and keeps us in line with upstream rustfmt.  There's no loss to us
since we were using the defaults anyways.

Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2020-08-11 13:07:44 -07:00
teor 2550c44d48
Make sync ignore known hashes (#853)
* fix: Handle known ObtainTips correctly

enumerate never returns a value beyond the end of the vector.

* fix: Ignore known tips in ExtendTips

Some peers send us known tips when we try to extend.

* fix: Ignore known hashes when downloading

Despite all our other checks, we still end up downloading some hashes
multiple times.

* fix: Increase the number of retries

The old sync code relied on duplicate block fetches to make progress,
but the last few commits have removed some of those duplicates.

Instead, just retry the fetches that fail.

* fix: Tweak comments

Co-authored-by: Jane Lusby <jlusby42@gmail.com>

* fix: Cleanup the state_contains interface in Sync

* Fix brackets

Oops

Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2020-08-10 16:17:50 -07:00
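
A minimal sketch of the "ignore known hashes" idea from the commit above, assuming the state can be modelled as a set of already-known hashes. The `unknown_hashes` function and the `u64` hash stand-in are hypothetical; the actual syncer queries the state service through a `state_contains` interface. Removing repeats within a single response is an extra assumption added here.

```rust
use std::collections::HashSet;

/// Keep only hashes that are not already known, preserving response order.
/// Repeated hashes inside the same response are also dropped (assumption).
fn unknown_hashes(response: &[u64], known: &HashSet<u64>) -> Vec<u64> {
    let mut seen = HashSet::new();
    response
        .iter()
        .copied()
        // `insert` returns false when the hash was already seen.
        .filter(|hash| !known.contains(hash) && seen.insert(*hash))
        .collect()
}

fn main() {
    let known: HashSet<u64> = [10, 11].into_iter().collect();
    let response = [10, 12, 13, 12, 11, 14];
    assert_eq!(unknown_hashes(&response, &known), vec![12, 13, 14]);
}
```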
teor e95358dbe3 fix: Increase the number of retries
The old sync code relied on duplicate block fetches to make progress,
but the last few commits have removed some of those duplicates.

Instead, just retry the fetches that fail.
2020-08-10 18:58:21 +10:00
teor faac50697c feature: Add a verified blocks metrics counter
We have a counter for pending "download and verify" futures. But these
futures are spawned, so they can complete in any order. They can also
complete before we receive their results.
2020-08-10 15:12:08 +10:00
teor 6aeefcee8b fix: Improve sync diagnostics 2020-08-10 15:12:08 +10:00
Henry de Valence a77328ad7c
Refactor tracing components (#834)
* Split tracing component code into modules.

* Repatriate Tracing and simplify config handling.

We upstreamed our Tracing component, expecting not to have to exert fine
control over the tracing settings.  But this turned out not to be the case, and
now that we want to do other things (flamegraphs, journalctl, opentelemetry,
etc), we end up with really awkward code (as in the current flamegraph
handling).

This also makes use of the changes to `init()` to load the config early to pass
configuration data into the components, which avoids the need for the
refactoring in #775.

Finally, we restore support for the `-v` flag when the filter is unset.  Closes #831.

* Disable tracing and metrics endpoints by default.

Closes #660.

* Switch back to upstream Abscissa.

* Integrate flamegraph support into the new Tracing component.

* Pass -v in acceptance tests to get info-level output.

* Clean up acceptance test code.
2020-08-06 10:29:31 -07:00
Jane Lusby 867dd0b475
Setup tracing-flame for use profiling zebrad (#436)
* Setup tracing-flame for use profiling zebrad

* start work on conditional flamegraph generation

* review time!

* update comments

* Update Cargo.toml

* disable default features for inferno

* reorganize

* missing one trait

* Apply suggestions from code review

* graceful shutdown!

* remove special case handling on ctrlc for cleanup

* rename signal fn to better represent its responsibility

* remove unused global hook for flushing flamegraph

* move tracing logic to the right file

* just copy linkerd's signal handling logic

* update book

* make zebrad app drop on shutdown normally

* Update zebrad/src/components/tokio.rs

Co-authored-by: teor <teor@riseup.net>

* Update zebrad/src/application.rs

Co-authored-by: teor <teor@riseup.net>

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

* cleanup a little

* ooh yea there's an API for that

* setup env-filter for backup subscriber

* document env filter

* document return codes

* forgot to save

* Update book/src/applications/zebrad.md

Co-authored-by: teor <teor@riseup.net>

Co-authored-by: teor <teor@riseup.net>
2020-08-05 16:35:56 -07:00
Henry de Valence 82da4a5326 Remove connect command. 2020-08-04 23:34:45 -07:00
Alfredo Garcia f2d7bb3177
Command execution tests (#690)
* add zebrad acceptance tests
* add custom command test helpers that work with kill
* add and use info event for start and seed commands
* combine conflicting tests into one test case

Co-authored-by: Jane Lusby <jane@zfnd.org>
2020-08-01 16:15:26 +10:00
teor 11090dbf91 feature: Separate Mainnet and Testnet state 2020-07-29 01:45:19 -04:00
Alfredo Garcia 5b3c6e4c6c
Port bash checkpoint scripts to zebra-checkpoints single rust binary (#740)
* make zebra-checkpoints
* fix LOOKAHEAD_LIMIT scope
* add a default cli path
* change doc usage text
* add tracing
* move MAX_CHECKPOINT_HEIGHT_GAP to zebra-consensus
* do byte_reverse_hex in a map
2020-07-25 17:53:00 +10:00
Henry de Valence b59cfc49b7 sync: create requests sequentially to respect backpressure.
This seems like a better design on principle but also appears to give a much
nicer sawtooth pattern of queued blocks in the checkpointer and a much smoother
pattern of block requests.
2020-07-24 18:36:00 -04:00
teor 2acfcf3a90
Make the CheckpointVerifier handle partial restarts (#736)
Also put generic bounds on the BlockVerifier struct,
so we get better compilation errors.
2020-07-24 11:47:48 +10:00
teor 77a1fefa1e
Download genesis (#731)
* feature: Add more CheckpointVerifier tracing

* fix: Download the genesis block
2020-07-23 10:56:52 -07:00
teor c95c825707 fix: Lookup the genesis hash based on the network 2020-07-23 03:46:24 -04:00
Henry de Valence 4a98b8fa0d Add basic metrics to the syncer. 2020-07-22 21:59:00 -07:00
Henry de Valence c2c2a28e8b Improve tracing output in chain verifier 2020-07-22 21:59:00 -07:00
Jane Lusby 7d4e717182
Add block locator request to state layer (#712)
* Add block locator request to state layer

* pass genesis in request

* Update zebrad/src/commands/start/sync.rs

* fix errors
2020-07-22 18:01:31 -07:00
Henry de Valence 49aa41544d sync: try to ignore spurious inv messages.
Closes #697.

Per https://github.com/ZcashFoundation/zebra/issues/697#issuecomment-662742971:

The response to a getblocks message is an inv message with the hashes of the
following blocks. However, inv messages are also sent unsolicited to gossip new
blocks across the network. Normally, this wouldn't be a problem, because for
every other request we filter only for the messages that are relevant to us.
But because the response to a getblocks message is an inv, the network layer
doesn't (and can't) distinguish between the response inv and the unsolicited
inv.

But there is a mitigation we can do. In our sync algorithm we have two phases:
(1) "ObtainTips" to get a set of tips to chase down, (2) repeatedly call
"ExtendTips" to extend those as far as possible. The unsolicited inv messages
have length 1, but when extending tips we expect to get more than one hash. So
we could reject responses in ExtendTips that have length 1 in order to ignore
these messages. This way we automatically ignore gossip messages during initial
block sync (while we're extending a tip) but we don't ignore length-1 responses
while trying to obtain tips (while querying the network for new tips).
2020-07-22 17:55:52 -07:00
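
The mitigation described in the commit above can be expressed as a small filter. This sketch assumes a response is a slice of hashes and uses the hypothetical names `Phase` and `accept_response`; how empty responses are treated is left out of scope here.

```rust
/// The two sync phases named in the commit message.
#[derive(Debug, Clone, Copy)]
enum Phase {
    ObtainTips,
    ExtendTips,
}

/// Decide whether a hash response should be processed.
/// Length-1 responses are rejected only while extending tips,
/// because unsolicited block gossip arrives as a single-hash inv.
fn accept_response(phase: Phase, hashes: &[u64]) -> bool {
    match phase {
        Phase::ObtainTips => true,
        Phase::ExtendTips => hashes.len() != 1,
    }
}

fn main() {
    assert!(accept_response(Phase::ObtainTips, &[42]));
    assert!(!accept_response(Phase::ExtendTips, &[42]));
    assert!(accept_response(Phase::ExtendTips, &[42, 43, 44]));
}
```

The trade-off is that a genuine single-hash extension is also ignored during ExtendTips; the commit accepts this because tip extension normally returns more than one hash.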
teor 9b97ebbd61 feature: Choose checkpoints based on the config 2020-07-23 10:26:25 +10:00
teor 3d721a96a5 feature: Add the state config to the config file 2020-07-23 10:26:25 +10:00
teor 89ac2793d6 feature: Use ChainVerifier in the sync service 2020-07-23 10:26:25 +10:00
Henry de Valence 928b0beb5d sync: unindent fetch task 2020-07-21 20:16:23 -07:00
Henry de Valence b722818e02 sync: remove redundant tracing specifier
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2020-07-21 20:16:23 -07:00
Henry de Valence 1047d2f690 sync: add backpressure to syncer
Closes #617.
Closes #698.

The remaining work on the syncer is alluded to in a new comment:

1. Correctly constructing a block locator object
2. Detecting when we've stopped making progress syncing and restarting obtain_tips.
2020-07-21 20:16:23 -07:00
Alfredo Garcia db2eb80b3e
Create consensus utils and move byte_reverse_hex function to it (#705)
* move byte_reverse_hex function
2020-07-22 12:29:14 +10:00
teor e5bb96715f fix: Reduce sync error logs to info or warn
Network issues are very common.
2020-07-21 10:13:03 -07:00
teor a0dbe85acd fix: Rewrite the config usage comment 2020-07-21 12:58:55 -04:00
teor 851afad01f
fix: Resist CheckpointVerifier memory DoS attacks (#635)
* fix: Resist CheckpointVerifier memory DoS attacks

Allow a maximum of 2 queued blocks at each height, as a tradeoff between
efficient bad block rejection, and memory usage.

Closes #628.

* fix: Make max queued blocks at height equal to fanout

* fix: Just allocate all the capacity upfront

* fix: Use with_capacity(1) and reserve_exact(1)
2020-07-15 13:27:10 -07:00
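
The per-height queue limit from the commit above could be sketched like this. The `Queue` type, its field, and the `u64` block identifier are hypothetical; the limit of 2 comes from the first bullet of the commit message, which a later bullet ties to the fanout.

```rust
use std::collections::BTreeMap;

/// Limit from the commit message; a later bullet makes it equal to the fanout.
const MAX_QUEUED_BLOCKS_PER_HEIGHT: usize = 2;

/// Hypothetical queue of block identifiers, keyed by height.
#[derive(Default)]
struct Queue {
    by_height: BTreeMap<u32, Vec<u64>>,
}

impl Queue {
    /// Queue a block, or return false when the height is already full.
    fn try_queue(&mut self, height: u32, block_id: u64) -> bool {
        let slot = self
            .by_height
            .entry(height)
            .or_insert_with(|| Vec::with_capacity(1));
        if slot.len() >= MAX_QUEUED_BLOCKS_PER_HEIGHT {
            return false;
        }
        // Grow by exactly one element so memory use stays predictable.
        slot.reserve_exact(1);
        slot.push(block_id);
        true
    }
}

fn main() {
    let mut queue = Queue::default();
    assert!(queue.try_queue(100, 1));
    assert!(queue.try_queue(100, 2));
    // A third block at the same height is rejected.
    assert!(!queue.try_queue(100, 3));
    // Other heights are unaffected.
    assert!(queue.try_queue(101, 4));
}
```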
teor 78459afe97 fix: Stop revhex on EOF 2020-07-15 19:19:02 +10:00
teor 12b9fa8ae2
Let zebrad revhex read from stdin (#648)
* Log at warn level for commands that use stdout
* Let zebrad revhex read from stdin

Most unix tools support reading from stdin, so they can be used in
pipelines.

Part of #564.
2020-07-15 16:16:07 +10:00
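
As an illustration of what a stdin-driven revhex does, the sketch below reverses the byte order of each hex line read from any `BufRead` source and stops at end of input. The names `byte_reverse_hex` and `revhex_lines` and the handling of invalid input (skipping the line) are assumptions, not zebrad's actual behavior.

```rust
use std::io::BufRead;

/// Reverse the byte order of a hex string, e.g. "0a1b2c" becomes "2c1b0a".
/// Returns None for odd-length or non-hex input (assumed behavior).
fn byte_reverse_hex(input: &str) -> Option<String> {
    if input.len() % 2 != 0 || !input.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // The input is ASCII, so every 2-byte chunk is valid UTF-8.
    let reversed = input
        .as_bytes()
        .chunks(2)
        .rev()
        .map(|pair| std::str::from_utf8(pair).expect("ASCII hex digits"))
        .collect();
    Some(reversed)
}

/// Process a reader line by line until EOF, skipping invalid lines.
fn revhex_lines<R: BufRead>(reader: R) -> Vec<String> {
    reader
        .lines()
        .map_while(Result::ok)
        .filter_map(|line| byte_reverse_hex(line.trim()))
        .collect()
}

fn main() {
    // A byte slice stands in for stdin; `std::io::stdin().lock()` also works.
    let input = &b"0a1b2c\nffee\n"[..];
    assert_eq!(revhex_lines(input), vec!["2c1b0a", "eeff"]);
}
```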
teor 8b5ec155f0
Consensus refactor (#629)
* Flatten consensus::verify::* to consensus::*
* Move consensus::*::tests into their own files
* Move CheckpointList into its own file
* Move Progress and Target into a types module

QueuedBlock and QueuedBlockList can stay in checkpoint.rs, because
they are tightly coupled to CheckpointVerifier.
2020-07-10 16:51:01 +10:00
Henry de Valence ff4e722cd7 sync: touch up tracing output. 2020-07-09 11:15:06 -07:00
Dimitris Apostolou ba81d7d4c0 Fix typos 2020-07-07 11:13:49 -07:00