* Check if tx already exists in mempool or state before downloading
* Reorder checks
* Add rejected test; refactor into separate function
* Wrap mempool in buffered service
* Rename RejectedTransactionsById -> RejectedTransactionsIds
* Add RejectedTransactionIds response; fix request name
* Organize imports
* add a test for Storage::rejected_transactions
* add test for mempool `Request::RejectedTransactionIds`
* change buffer size to 1 in the test
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
* reply to `Request::MempoolTransactionIds`
* remove boilerplate
* get storage from mempool with a method
* change panic message
* try fix for mac
* use normal init instead of init_tests for state service
* newline
* rustfmt
* fix test build
* Rename ChainTipReceiver to CurrentChainTip
`fastmod ChainTipReceiver CurrentChainTip zebra*`
* Update chain tip documentation and variable names
* Basic chain tip change implementation, without resets
Also includes the following name changes:
```
fastmod CurrentChainTip LatestChainTip zebra*
fastmod chain_tip_receiver latest_chain_tip zebra*
```
* Clarify the difference between `LatestChainTip` and `ChainTipChange`
* Store a `SyncStatus` handle in the `Crawler`
The helper type will make it easier to determine if the crawler is
enabled or not.
* Pause crawler if mempool is disabled
Implement waiting until the mempool becomes enabled, so that the crawler
does not run while the mempool is disabled.
If the `MempoolStatus` helper is unable to determine if the mempool is
enabled, stop the crawler task entirely.
* Update test to consider when crawler is paused
Change the mempool crawler test so that it's a proptest that tests
different chain sync. lengths. This leads to different scenarios with
the crawler pausing and resuming.
Co-authored-by: teor <teor@riseup.net>
* Create a `SyncStatus` helper type
Keeps track if the synchronizer is close to the chain tip or not.
* Refactor `ChainSync` ctor. to return `SyncStatus`
Change the constructor API so that it returns a higher level construct.
* Test if `SyncStatus` waits for the chain tip
Test if waiting for the chain tip to be reached correctly finishes when
the chain tip is reached. This is done by sending recent sync lengths to
the `SyncStatus` instance, and checking that every time a separate
`SyncStatus` instance determines it has reached the tip the original
instance wakes up.
* Add a temporary attribute to allow dead code
The code added isn't used yet, so we'll add a temporary waiver until
another PR is merged to use them.
* Rename BestTipHeight so it can be generalised to ChainTipSender
`fastmod BestTipHeight ChainTipSender zebra*`
For senders:
`fastmod best_tip_height chain_tip_sender zebra*`
For receivers:
`fastmod best_tip_height chain_tip_receiver zebra*`
* Rename best_tip_height module to chain_tip
* Wrap the chain tip watch channel in a ChainTipReceiver type
* Create a ChainTip trait to avoid tricky crate dependencies
And add convenience impls for optional and empty chain tips.
* Use the ChainTip trait in zebra-network
* Replace `Option<ChainTip>` with `NoChainTip`
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Return a transaction verifier from `zebra_consensus::init`
This verifier is temporarily created separately from the block verifier's
transaction verifier.
* Return the same transaction verifier used by the block verifier
* Clarify that the mempool verifier is the transaction verifier
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
* Create initial `mempool::Crawler` type
The mempool crawler is responsible for periodically asking peers for
transactions to insert into the local mempool. This initial
implementation will periodically ask for transactions, but won't do
anything with them yet.
Also, the crawler is currently configured to be always enabled, but this
should be fixed to avoid crawling while Zebra is still syncing the
chain.
* Add a timeout to peer responses
Prevent the crawler from getting stuck if there's communication with a
peer that takes too long to respond.
* Run the mempool crawler in Zebra
Spawn a task for the crawler when Zebra starts.
* Test if the crawler is sending requests
Create a mock for the `PeerSet` service to intercept requests and verify
that the transaction requests are sent periodically.
* Use `full` Tokio features when testing
Make it simpler to select the features for test builds.
Co-authored-by: teor <teor@riseup.net>
* Link to the issue for crawler activation
Make it easy to navigate from the `TODO` comment to the current project
planning.
Co-authored-by: teor <teor@riseup.net>
* Link to the issue for downloading transactions
Make it easy to navigate from the `TODO` comment to the current project
planning.
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: teor <teor@riseup.net>
* Minimal recent sync lengths implementation
Also includes metrics and logging, to make diagnosing bugs easier.
* Add logging to check what happens when Zebra reaches the chain tip
* Add tests for recent sync lengths
- initially empty
- pruned to correct length
- newest entries go first
* Drop a redundant `/` from a Cargo.lock URL
This seems to be a nightly or beta Rust change,
but hopefully stable just accepts it.
* Use metrics histograms to avoid overwriting values
* Add detailed syncer monitoring dashboard
* Increase the recent sync length to 4
This length makes it easier to distinguish between temporary and
sustained errors/syncs.
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Simplify state service initialization in test
Use the test helper function to remove redundant code.
* Create `BestTipHeight` helper type
This type abstracts away the calculation of the best tip height based on
the finalized block height and the best non-finalized chain's tip.
* Add `best_tip_height` field to `StateService`
The receiver endpoint is currently ignored.
* Return receiver endpoint from service constructor
Make it available so that the best tip height can be watched.
* Update finalized height after finalizing blocks
After blocks from the queue are finalized and committed to disk, update
the finalized block height.
* Update best non-finalized height after validation
Update the value of the best non-finalized chain tip block height after
a new block is committed to the non-finalized state.
* Update finalized height after loading from disk
When `FinalizedState` is first created, it loads the state from
persistent storage, and the finalized tip height is updated. Therefore,
the `best_tip_height` must be notified of the initial value.
* Update the finalized height on checkpoint commit
When a checkpointed block is commited, it bypasses the non-finalized
state, so there's an extra place where the finalized height has to be
updated.
* Add `best_tip_height` to `Handshake` service
It can be configured using the `Builder::with_best_tip_height`. It's
currently not used, but it will be used to determine if a connection to
a remote peer should be rejected or not based on that peer's protocol
version.
* Require best tip height to init. `zebra_network`
Without it the handshake service can't properly enforce the minimum
network protocol version from peers. Zebrad obtains the best tip height
endpoint from `zebra_state`, and the test vectors simply use a dummy
endpoint that's fixed at the genesis height.
* Pass `best_tip_height` to proto. ver. negotiation
The protocol version negotiation code will reject connections to peers
if they are using an old protocol version. An old version is determined
based on the current known best chain tip height.
* Handle an optional height in `Version`
Fallback to the genesis height in `None` is specified.
* Reject connections to peers on old proto. versions
Avoid connecting to peers that are on protocol versions that don't
recognize a network update.
* Document why peers on old versions are rejected
Describe why it's a security issue above the check.
* Test if `BestTipHeight` starts with `None`
Check if initially there is no best tip height.
* Test if best tip height is max. of latest values
After applying a list of random updates where each one either sets the
finalized height or the non-finalized height, check that the best tip
height is the maximum of the most recently set finalized height and the
most recently set non-finalized height.
* Add `queue_and_commit_finalized` method
A small refactor to make testing easier. The handling of requests for
committing non-finalized and finalized blocks is now more consistent.
* Add `assert_block_can_be_validated` helper
Refactor to move into a separate method some assertions that are done
before a block is validated. This is to allow moving these assertions
more easily to simplify testing.
* Remove redundant PoW block assertion
It's also checked in
`zebra_state::service::check::block_is_contextually_valid`, and it was
getting in the way of tests that received a gossiped block before
finalizing enough blocks.
* Create a test strategy for test vector chain
Splits a chain loaded from the test vectors in two parts, containing the
blocks to finalize and the blocks to keep in the non-finalized state.
* Test committing blocks update best tip height
Create a mock blockchain state, with a chain of finalized blocks and a
chain of non-finalized blocks. Commit all the blocks appropriately, and
verify that the best tip height is updated.
Co-authored-by: teor <teor@riseup.net>
There's no reason to return a pre-Buffer'd service (there's no need for
internal access to the state service, as in zebra-network), but wrapping
it internally removes control of the buffer size from the caller.
* Reverse displayed endianness of transaction and block hashes
* fix zebra-checkpoints utility for new hash order
* Stop using "zebrad revhex" in zebrad-hash-lookup
* Rebuild checkpoint lists in new hash order
This change also adds additional checkpoints to the end of each list.
* Replace TransactionHash with transaction::Hash
This change should have been made in #905, but we missed Debug impls
and some docs.
Co-authored-by: Ramana Venkata <vramana@users.noreply.github.com>
Co-authored-by: teor <teor@riseup.net>
The original version of this commit ran into
https://github.com/rust-lang/rust/issues/64552
again. Thanks to @yaahc for suggesting a workaround (using futures combinators
to avoid writing an async block).
Remove the seed command entirely, and make the behavior it provided
(responding to `Request::Peers`) part of the ordinary functioning of the
start command.
The new `Inbound` service should be expanded to handle all request
types.
* increase the EWMA default and decay
* increase the block download retries
* increase the request and block download timeouts
* increase the sync timeout
* Ignore sync errors when the block is already verified
If we get an error for a block that is already in our state, we don't
need to restart the sync. It was probably a duplicate download.
Also:
Process any ready tasks before reset, so the logs and metrics are
up to date. (But ignore the errors, because we're about to reset.)
Improve sync logging and metrics during the download and verify task.
* Remove duplicate hashes in logs
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* Log the sync hash span at warn level
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* Remove in-memory state service
* make the config compatible with toml again
* checkpoint commit to see how much I still have to revert
* back to the starting point...
* remove unused dependency
* reorganize error handling a bit
* need to make a new color-eyre release now
* reorder again because I have problems
* remove unnecessary helpers
* revert changes to config loading
* add back missing space
* Switch to released color-eyre version
* add back missing newline again...
* improve error message on unix when terminated by signal
* add context to last few asserts in acceptance tests
* instrument some of the helpers
* remove accidental extra space
* try to make this compile on windows
* reorg platform specific code
* hide on_disk module and fix broken link
* add CheckpointList::new_up_to(limit: NetworkUpgrade)
* if checkpoint_sync is false, limit checkpoints to Sapling
* update tests for CheckpointList and chain::init
This is the first in a sequence of changes that change the block:: items
to not include Block as a prefix in their name, in accordance with the
Rust API guidelines.
* checkpoint: reject older of duplicate verification requests.
If we get a duplicate block verification request, we should drop the older one
in favor of the newer one, because the older request is likely to have been
canceled. Previously, this code would accept up to four duplicate verification
requests, then fail all subsequent ones.
* sync: add a timeout layer to block requests.
Note that if this timeout is too short, we'll bring down the peer set in a
retry storm.
* sync: restart syncing on error
Restart the syncing process when an error occurs, rather than ignoring it.
Restarting means we discard all tips and start over with a new block locator,
so we can have another chance to "unstuck" ourselves.
* sync: additional debug info
* sync: handle lookahead limit correctly.
Instead of extracting all the completed task results, the previous code pulled
results out until there were fewer tasks than the lookahead limit, then
stopped. This meant that completed tasks could be left until the limit was
exceeded again. Instead, extract all completed results, and use the number of
pending tasks to decide whether to extend the tip or wait for blocks to finish.
* network: add debug instrumentation to retry policy
* sync: instrument the spawned task
* sync: streamline ObtainTips/ExtendTips logic & tracing
This change does three things:
1. It aligns the implementation of ObtainTips and ExtendTips so that they use
the same deduplication method. This means that when debugging we only have one
deduplication algorithm to focus on.
2. It streamlines the tracing output to not include information already
included in spans. Both obtain_tips and extend_tips have their own spans
attached to the events, so it's not necessary to add Scope: prefixes in
messages.
3. It changes the messages to be focused on reporting the actual
events rather than the interpretation of the events (e.g., "got genesis hash in
response" rather than "peer could not extend tip"). The motivation for this
change is that when debugging, the interpretation of events is already known to
be incorrect, in the sense that the mental model of the code (no bug) does not
match its behavior (has bug), so presenting minimally-interpreted events forces
interpretation relative to the actual code.
* sync: hack to work around zcashd behavior
* sync: localize debug statement in extend_tips
* sync: change algorithm to define tips as pairs of hashes.
This is different enough from the existing description that its comments no
longer apply, so I removed them. A further chunk of work is to change the sync
RFC to document this algorithm.
* sync: reduce block timeout
* state: add resource limits for sled
Closes#888
* sync: add a restart timeout constant
* sync: de-pub constants
* network: move gossiped peer selection logic into address book.
* network: return BoxService from init.
* zebrad: add note on why we truncate thegossiped peer list
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* Remove unused .rustfmt.toml
Many of these options are never actually loaded by our CI because of a channel
mismatch, where they're not applied on stable but only on nightly (see the logs
from a rustfmt job). This means that we can get different settings when
running `cargo fmt` on the nightly and stable channels, which was causing a CI
failure on this PR. Reverting back to the default rustfmt settings avoids this
problem and keeps us in line with upstream rustfmt. There's no loss to us
since we were using the defaults anyways.
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* fix: Handle known ObtainTips correctly
enumerate never returns a value beyond the end of the vector.
* fix: Ignore known tips in ExtendTips
Some peers send us known tips when we try to extend.
* fix: Ignore known hashes when downloading
Despite all our other checks, we still end up downloading some hashes
multiple times.
* fix: Increase the number of retries
The old sync code relied on duplicate block fetches to make progress,
but the last few commits have removed some of those duplicates.
Instead, just retry the fetches that fail.
* fix: Tweak comments
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* fix: Cleanup the state_contains interface in Sync
* Fix brackets
Oops
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
The old sync code relied on duplicate block fetches to make progress,
but the last few commits have removed some of those duplicates.
Instead, just retry the fetches that fail.
We have a counter for pending "download and verify" futures. But these
futures are spawned, so they can complete in any order. They can also
complete before we receive their results.
* Split tracing component code into modules.
* Repatriate Tracing and simplify config handling.
We upstreamed our Tracing component, expecting not to have to exert fine
control over the tracing settings. But this turned out not to be the case, and
now that we want to do other things (flamegraphs, journalctl, opentelemetry,
etc), we end up with really awkward code (as in the current flamegraph
handling).
This also makes use of the changes to `init()` to load the config early to pass
configuration data into the components, which avoids the need for the
refactoring in #775.
Finally, we restore support for the `-v` flag when the filter is unset. Closes#831.
* Disable tracing and metrics endpoints by default.
Closes#660.
* Switch back to upstream Abscissa.
* Integrate flamegraph support into the new Tracing component.
* Pass -v in acceptance tests to get info-level output.
* Clean up acceptance test code.
* Setup tracing-flame for use profiling zebrad
* start work on conditional flamegraph generation
* review time!
* update comments
* Update Cargo.toml
* disable default features for inferno
* reorganize
* missing one trait
* Apply suggestions from code review
* graceful shutdown!
* remove special case handling on ctrlc for cleanup
* rename signal fn to better represent its responsibility
* remove unused global hook for flushing flamegraph
* move tracing logic to the right file
* just copy linkerd's signal handling logic
* update book
* make zebrad app drop on shutdown normally
* Update zebrad/src/components/tokio.rs
Co-authored-by: teor <teor@riseup.net>
* Update zebrad/src/application.rs
Co-authored-by: teor <teor@riseup.net>
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
* cleanup a little
* ooh yea there's an API for that
* setup env-filter for backup subscriber
* document env filter
* document return codes
* forgot to save
* Update book/src/applications/zebrad.md
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: teor <teor@riseup.net>
* add zebrad acceptance tests
* add custom command test helpers that work with kill
* add and use info event for start and seed commands
* combine conflicting tests into one test case
Co-authored-by: Jane Lusby <jane@zfnd.org>
* make zebra-checkpoints
* fix LOOKAHEAD_LIMIT scope
* add a default cli path
* change doc usage text
* add tracing
* move MAX_CHECKPOINT_HEIGHT_GAP to zebra-consensus
* do byte_reverse_hex in a map
This seems like a better design on principle but also appears to give a much
nicer sawtooth pattern of queued blocks in the checkpointer and a much smoother
pattern of block requests.
Closes#697.
per https://github.com/ZcashFoundation/zebra/issues/697#issuecomment-662742971
The response to a getblocks message is an inv message with the hashes of the
following blocks. However, inv messages are also sent unsolicited to gossip new
blocks across the network. Normally, this wouldn't be a problem, because for
every other request we filter only for the messages that are relevant to us.
But because the response to a getblocks message is an inv, the network layer
doesn't (and can't) distinguish between the response inv and the unsolicited
inv.
But there is a mitigation we can do. In our sync algorithm we have two phases:
(1) "ObtainTips" to get a set of tips to chase down, (2) repeatedly call
"ExtendTips" to extend those as far as possible. The unsolicited inv messages
have length 1, but when extending tips we expect to get more than one hash. So
we could reject responses in ExtendTips that have length 1 in order to ignore
these messages. This way we automatically ignore gossip messages during initial
block sync (while we're extending a tip) but we don't ignore length-1 responses
while trying to obtain tips (while querying the network for new tips).
Closes#617.
Closes#698.
The remaining work on the syncer is alluded to in a new comment:
1. Correctly constructing a block locator object
2. Detecting when we've stopped making progress syncing and restarting obtain_tips.
* fix: Resist CheckpointVerifier memory DoS attacks
Allow a maximum of 2 queued blocks at each height, as a tradeoff between
efficient bad block rejection, and memory usage.
Closes#628.
* fix: Make max queued blocks at height equal to fanout
* fix: Just allocate all the capacity upfront
* fix: Use with_capacity(1) and reserve_exact(1)
* Log at warn level for commands that use stdout
* Let zebrad revhex read from stdin
Most unix tools support reading from stdin, so they can be used in
pipelines.
Part of #564.
* Flatten consensus::verify::* to consensus::*
* Move consensus::*::tests into their own files
* Move CheckpointList into its own file
* Move Progress and Target into a types module
QueuedBlock and QueuedBlockList can stay in checkpoint.rs, because
they are tightly coupled to CheckpointVerifier.
* also download tips and filter tips
* dispatch all block downloads together
* tweek to match henry's changes
* switch to more intuitive match
Co-authored-by: Jane Lusby <jane@zfnd.org>