* Limit the size and age of the ZIP-401 rejected transaction ID list
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
* Fix bug in EvictionList; improve documentation
* Separate public and non-public parts of the documentation
* Fix tests
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
* Fix bug in EvictionList::len()
* Make EvictionList::len() mutable; prune the list inside it
* Limit the size of EvictionList::ordered_entries
* Increase eviction_list_time_mixed time constants to try to make it pass on MacOS
* Simplify logic by assuming refreshes will never happen
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
* Compiling fixes
* Remove MEMPOOL_SIZE and just rely on the ZIP-401 cost limit
Co-authored-by: teor <teor@riseup.net>
* ZIP-401 weighted random mempool eviction
* rename zcash.mempool.total_cost.bytes to zcash.mempool.cost.bytes
Co-authored-by: teor <teor@riseup.net>
* Remove duplicated lines
* Add cost() method to UnminedTx
Update serialization failure messages
* More docs quoting ZIP-401 rules
* Change mempool::Storage::new() to handle Copy-less HashMap, HashSet
* mempool: tidy cost types and evict_one()
* More consensus rule docs
* Refactor calculating mempool costs for Unmined transactions
* Add a note on asymptotic performance of calculating weights of txs in mempool
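As a rough guide to the ZIP-401 rules quoted in these docs, here is a hedged sketch of the cost and eviction weight of an unmined transaction. The constants follow ZIP-401 as published; the `conventional_fee` threshold is passed in rather than hard-coded, and none of this is copied from the Zebra code.
```rust
/// Minimum cost and low-fee penalty, per ZIP-401.
const MIN_COST: u64 = 4_000;
const LOW_FEE_PENALTY: u64 = 16_000;

/// ZIP-401: cost(tx) = max(serialized size in bytes, 4000).
fn cost(serialized_size: u64) -> u64 {
    serialized_size.max(MIN_COST)
}

/// ZIP-401: eviction weight = cost + low fee penalty, where the penalty
/// applies to transactions paying less than the conventional fee.
fn eviction_weight(serialized_size: u64, miner_fee: u64, conventional_fee: u64) -> u64 {
    let penalty = if miner_fee < conventional_fee {
        LOW_FEE_PENALTY
    } else {
        0
    };
    cost(serialized_size) + penalty
}
```
Weighted random eviction then removes transactions with probability proportional to their eviction weight until the mempool's total cost is back under the configured `tx_cost_limit`.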
* Bump test mempool / storage config to avoid weighted random cost limits
* Use mempool tx_cost_limit = u64::MAX for some tests
* Remove failing tests for now
* Allow(clippy::field_reassign_with_default) because of a move on a type that doesn't impl Copy
* Fix mistaken doctest formatting
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
* Increase test timeout for Windows builds
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
* Try simulating a chain growth
* Adjust the transaction expiry height
The mempool evicts expired transactions. When working with mocked data,
appending a new block typically clears the mempool because transactions become
expired. For this reason, the expiry height of each transaction is adjusted so
that it is greater than the new chain tip's height.
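A minimal sketch of that adjustment, using plain `u32` heights instead of the zebra-chain height type:
```rust
/// Bump each mocked transaction's expiry height past the new chain tip,
/// so the mempool doesn't evict it as expired.
fn adjust_expiry_heights(expiry_heights: &mut [u32], new_tip_height: u32) {
    for expiry in expiry_heights.iter_mut() {
        // An expiry height of 0 means the transaction never expires.
        if *expiry != 0 && *expiry <= new_tip_height {
            *expiry = new_tip_height + 1;
        }
    }
}
```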
* Refactor the code so that it works with `VerifiedUnminedTx`
* Fix a typo
* Fix clippy warnings
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
There are a lot of these messages when Zebra starts up.
They might be slowing down CI and causing timeouts.
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
This matches the settings for `sync_large_checkpoints_mainnet`.
Also reduce the number of blocks synced to reduce network load.
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
* Increment the crates that have new commits since the last version
* Increment the crates that depend on crates that have changed
* Increment the version of `zebra-script`
* Use the `zebrad` version in the `zebra-network` user agent string
* Use the `v1.0.0-alpha.19` git tag in `README.md`
* Copy the draft changelog into `CHANGELOG.md`
* Delete bumps
* Update CHANGELOG.md
Co-authored-by: teor <teor@riseup.net>
* Add newly merged PRs
Co-authored-by: teor <teor@riseup.net>
* Create a `NowOrLater` helper type
A replacement for `FutureExt::now_or_never` that ensures that the task
is scheduled for waking up later when the inner future is ready.
* Use `NowOrLater` to fix possible delay bug
Previous usage of `now_or_never` meant that the underlying task wasn't
being scheduled to awake when the `Downloads` stream produced a new
item. Using `NowOrLater` instead fixes that issue.
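A minimal sketch of the idea (not the exact zebrad type): unlike `now_or_never`, the inner future is polled with the real task context, so its waker is registered and the task is woken once the inner future becomes ready.
```rust
use std::{
    future::Future,
    pin::Pin,
    task::{Context, Poll},
};

/// Checks the wrapped future once per poll: `Some(output)` if it's ready,
/// `None` otherwise.
struct NowOrLater<F>(F);

impl<F: Future + Unpin> Future for NowOrLater<F> {
    type Output = Option<F::Output>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        match Pin::new(&mut self.0).poll(cx) {
            Poll::Ready(output) => Poll::Ready(Some(output)),
            // The poll above registered the task's waker with the inner
            // future, so the eventual wake-up is not lost.
            Poll::Pending => Poll::Ready(None),
        }
    }
}
```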
* Increase the restart test timeout to 10 seconds
It shouldn't take this long.
But maybe the CI VMs are under a lot of load?
* Add extensive logging to diagnose CI state reload failures
* Ignore AlreadyInChain error in the syncer
* Split Cancelled errors; add them to should_restart_sync exceptions
* Also filter 'block is already committed'; try to detect a wrong downcast
* Rename tx downloader & verifier metrics
* Add version to mempool metrics
* Add new metrics
* Make sure mempool gauges are zeroed when instances are dropped
* Updated mempool grafana dashboard
* Removed transaction verification dashboard; moved to mempool
* Update mempool dashboard
* Add reason to error labels in mempool dashboard
* Rename some metrics per review
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
* Guarantee unique IDs in mempool service responses
* Guarantee unique IDs in crawler task mempool Queue requests
Also update the tests to use unique IDs.
* Add a CheckForVerifiedTransactions mempool request
Also document the mempool request and response variants.
* Spawn a QueueChecker task to check for newly verified transactions
This task makes sure that transactions reliably propagate,
rather than relying on peer requests or responses to trigger propagation.
* Update the start command documentation
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
* Impl Drop, Default and take() for ActiveState
* Refactor Mempool::poll_ready to check disabled and reset first
Also remove some levels of nesting.
* Use the same code for dropping and resetting the mempool
* Document where the tasks are dropped when switching states
* Log mempool resets at info level
And add heights to mempool enable/disable/reset logs
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
* Guarantee unique IDs in mempool service responses
* Guarantee unique IDs in crawler task mempool Queue requests
Also update the tests to use unique IDs.
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
* Create a new VerifiedUnminedTx containing the miner fee
* Use VerifiedUnminedTx in mempool verification responses
And do a bunch of other cleanups.
* Use VerifiedUnminedTx in mempool download and verifier
* Use VerifiedUnminedTx in mempool storage and verified set
* Impl Display for VerifiedUnminedTx, and some convenience methods
* Use VerifiedUnminedTx in existing tests
* add some additional checks to the acceptance mempool test
* add an additional mempool test
* do proposed fixes to `sync_until`
* Ignore "can't kill an exited process" errors
Co-authored-by: teor <teor@riseup.net>
* Get the transaction fee from utxos
* Return the transaction fee from the verifier
* Avoid calculating the fee for coinbase transactions
Coinbase transactions don't have fees. In the case of a coinbase transaction, the
verifier returns a zero fee.
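A simplified sketch of that calculation, with plain zatoshi amounts and ignoring shielded value balances:
```rust
/// The fee is the value of the spent UTXOs minus the value of the new
/// outputs; coinbase transactions always report a zero fee.
fn miner_fee(
    is_coinbase: bool,
    spent_utxo_values: &[u64],
    output_values: &[u64],
) -> Option<u64> {
    if is_coinbase {
        return Some(0);
    }
    let inputs: u64 = spent_utxo_values.iter().sum();
    let outputs: u64 = output_values.iter().sum();
    // `None` here means the transaction creates more value than it spends,
    // which is invalid.
    inputs.checked_sub(outputs)
}
```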
* Update the result obtained by `Downloads`
* Update some comments
* Add a mempool debug_enable_at_height config
* Rename a field in the mempool crawler
* Propagate syncer channel errors through the crawler
We don't want to ignore these errors, because they might indicate a shutdown.
(Or a bug that we should fix.)
* Use debug_enable_at_height in the mempool crawler
* Log when the mempool is activated or deactivated
* Deny unknown fields and apply defaults for all configs
* Move Duration last, as required for TOML tables
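A sketch of the config pattern these commits describe; the field names are illustrative, not the exact zebrad mempool config. Unknown fields are rejected, missing fields fall back to the `Default` impl, and the `Duration` field comes last because it serializes as a TOML table, and TOML requires tables to come after plain values.
```rust
use std::time::Duration;

use serde::{Deserialize, Serialize};

#[derive(Clone, Debug, Deserialize, Serialize)]
#[serde(deny_unknown_fields, default)]
struct Config {
    /// Force-enable the mempool at this height, for debugging only.
    debug_enable_at_height: Option<u32>,
    /// Serialized as a TOML table, so it must come after plain values.
    crawl_interval: Duration,
}

impl Default for Config {
    fn default() -> Self {
        Self {
            debug_enable_at_height: None,
            // Illustrative default, not the real zebrad value.
            crawl_interval: Duration::from_secs(75),
        }
    }
}
```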
* Add a basic mempool acceptance test
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
* do not advertise rejected transactions
* do not broadcast transactions that are expired
* change dummy var name
* simplify code, performance
* clippy
* add some test coverage
* clippy
Co-authored-by: teor <teor@riseup.net>
* Split mempool config into its own module
Also:
- expand config docs
- clean up mempool imports
* Pass the mempool config to the mempool
* Create the transaction sender channel inside the mempool 1/2
This simplifies all the code that calls the mempool.
Also:
- update the mempool enabled state before returning the new mempool
- add some test module doc comments
* Refactor a setup function out of the mempool unit tests 2/2
Also:
- update the setup function to handle the latest mempool changes
* Clarify a comment
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
* Limit the size of rejection lists when there is a spend conflict
Previously, `insert` would return early with an error,
and skip limiting the rejection list sizes.
* Use prop_assert macros in proptests, rather than assert
* Add `HashSet`s to help spend conflict detection
Keep track of the spent transparent outpoints and the revealed
nullifiers.
Clippy complained that the `ActiveState` had variants with large size
differences, but that was expected, so I disabled that lint on that
`enum`.
* Clear the `HashSet`s when clearing the mempool
Clear them so that they remain consistent with the set of verified
transactions.
* Use `HashSet`s to check for spend conflicts
Store new outputs in their respective `HashSet`s, and abort if a
duplicate output is found.
* Remove inserted outputs when aborting
Restore the `HashSet` to its previous state.
* Remove tracked outputs when removing a transaction
Keep the mempool storage in a consistent state when a transaction is
removed.
* Remove tracked outputs when evicting from mempool
Ensure eviction also keeps the tracked outputs consistent with the
verified transactions.
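A minimal sketch of that tracking, with placeholder outpoint and nullifier types, and checking for conflicts up front rather than inserting and rolling back as the commits above do:
```rust
use std::collections::HashSet;

// Placeholder types; the real code uses zebra-chain's `OutPoint` and
// nullifier types.
type OutPoint = ([u8; 32], u32); // (transaction hash, output index)
type Nullifier = [u8; 32];

#[derive(Default)]
struct SpendTracker {
    spent_outpoints: HashSet<OutPoint>,
    revealed_nullifiers: HashSet<Nullifier>,
}

impl SpendTracker {
    /// Rejects the transaction if it spends an outpoint or reveals a
    /// nullifier already used by another mempool transaction, otherwise
    /// records its outpoints and nullifiers.
    fn insert(
        &mut self,
        outpoints: &[OutPoint],
        nullifiers: &[Nullifier],
    ) -> Result<(), &'static str> {
        let conflict = outpoints.iter().any(|o| self.spent_outpoints.contains(o))
            || nullifiers.iter().any(|n| self.revealed_nullifiers.contains(n));
        if conflict {
            return Err("spend conflict with a transaction already in the mempool");
        }
        self.spent_outpoints.extend(outpoints.iter().copied());
        self.revealed_nullifiers.extend(nullifiers.iter().copied());
        Ok(())
    }

    /// Removes a transaction's outpoints and nullifiers, keeping the tracker
    /// consistent when the transaction is removed or evicted.
    fn remove(&mut self, outpoints: &[OutPoint], nullifiers: &[Nullifier]) {
        for outpoint in outpoints {
            self.spent_outpoints.remove(outpoint);
        }
        for nullifier in nullifiers {
            self.revealed_nullifiers.remove(nullifier);
        }
    }
}
```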
* Refactor to create a `VerifiedSet` helper type
Move the code to handle the output caches into the new type. Also move
the eviction code to make things a little simpler.
* Refactor to have a single `remove` method
Centralize the code that handles the removal of a transaction to avoid
mistakes.
* Move mempool size limiting back to `Storage`
Because the evicted transactions must be added to the rejected list.
* Remove leftover `dbg!` statement
Leftover from some temporary testing code.
Co-authored-by: teor <teor@riseup.net>
* Remove unnecessary `TODO`
It is more speculation than planning, so it doesn't add much value.
Co-authored-by: teor <teor@riseup.net>
* Fix typo in documentation
The verb should match the subject "transactions" which is plural.
Co-authored-by: teor <teor@riseup.net>
* Add a comment to warn about correctness
There's a subtle but important detail in the implementation that should
be made more visible to avoid mistakes in the future.
Co-authored-by: teor <teor@riseup.net>
* Remove outdated comment
Left-over from the attempt to move the eviction into the `VerifiedSet`.
* Improve comment explaining lint removal
Rewrite the comment explaining why the Clippy lint was ignored.
* Check for spend conflicts in `VerifiedSet`
Refactor to avoid API misuse.
* Test rejected transaction rollback
Using two transactions, perform the same test adding a conflict to both
of them to check if the second inserted transaction is properly
rejected. Then remove any conflicts from the second transaction and add
it again. That should work; if it doesn't, it means that rejecting the
second transaction left entries in the cache that it shouldn't have.
* Test removal of multiple transactions
When removing multiple transactions from the mempool storage, all of the
requested transactions should be removed, and any other transactions
should still be there afterwards.
* Increase mempool size to 4, so that spend conflict tests work
If the mempool size is smaller than 4, these tests don't fail on a
trivial removal bug, because we need a minimum number of transactions
in the mempool to trigger the bug.
Also commit a proptest seed that fails on a trivial removal bug.
(This seed fails if we remove indexes in order,
because every index past the first removes the wrong transaction.)
* Summarise transaction data in proptest error output
* Summarise spend conflict field data in proptest error output
* Summarise multiple removal field data in proptest error output
And replace the very large proptest debug output with the new summary.
Co-authored-by: teor <teor@riseup.net>
* broadcast transactions to peers after they get inserted into the mempool
* remove network argument from mempool init
* remove leftover `dbg!`
* remove return value in mempool enable call
* rename channel sender and receiver vars
* change unwrap() to expect()
* change the channel to a hashset
* fix build
* fix tests
* rustfmt
* fix tiny space issue inside macro
Co-authored-by: teor <teor@riseup.net>
* check errors/panics in transaction gossip tests
* fix build of newly added tests
* Stop dropping the inbound service and mempool in a test
Keeping the mempool around avoids a transaction broadcast task error,
so we can test that there are no other errors in the task.
* Tweak variable names and add comments
* Avoid unexpected drops by returning a mempool guard in tests
* Use BoxError to simplify service types in tests
* Make all returned service types consistent in tests
We want to be able to change the setup without changing the tests.
Co-authored-by: teor <teor@riseup.net>
* Split mempool storage errors into tip-based and chain-based
* Expire tip rejections every time we get a new block
FailedVerification and SpendConflict rejections only apply to the current tip.
The next tip can provide missing inputs, or evict conflicting transactions.
* Enforce MAX_EVICTION_MEMORY_ENTRIES for mempool reject lists
* Implement a task that gossips verified block hashes
* Log an info message for block broadcasts
* Simplify the gossip task
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Re-use the old tip change if there is no new tip change
Also improve the comments.
* Add an assertion message
* Rename task join handles and futures in start method
* Add a dedicated BlockGossipError type
This type helps distinguish between syncer and state errors.
* Test that committed blocks are gossiped to peers
Also do a minor type cleanup on the existing test code,
replacing `Option<Vec<_>>` with `Vec<_>`.
* Formatting
* Remove excess newlines
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
* Clear the initial gossiped blocks during test setup
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
* Remove unused mempool storage errors
Preparation for ticket #2819.
Removing these errors means that we don't have to decide
which type of transaction ID match we want for them.
* Remove unused mempool errors, and deduplicate storage errors
* rustfmt
* Add a mempool transaction removal method for mined IDs
And use this method to remove expired transactions,
because all transactions with the same mined ID expire at the same height.
* Remove mined transaction IDs from the mempool
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
* Update versions for zebra v1.0.0-alpha.18 release
* WIP: Initial PR list
* Remove uninteresting version bumps from CHANGELOG
* Categorise and group PRs in CHANGELOG, removing uninteresting PRs
* Further refine and categorise changelog entries
* Fix tag url
* Final changes to CHANGELOG
* Add a changelog description
* Spacing
* Clarify and fix changelog PR descriptions
* Add PRs that are about to be merged
* More slight clarifications
* Spacing
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Rename type parameter to be more explicit
Replace the single letter with a proper name.
* Remove imports for `Request` and `Response`
The type names will conflict with the ones for the mempool service.
* Attach `Mempool` service to the `Crawler`
Add a field to the `Crawler` type to store a way to access the `Mempool`
service.
* Forward crawled transactions to downloader
The crawled transactions are now sent to the transaction downloader and
verifier, to be included in the mempool.
* Derive `Eq` and `PartialEq` for `mempool::Request`
Make it simpler to use the `MockService::expect_request` method.
* Test if crawled transactions are downloaded
Create some dummy crawled transactions, and let the crawler discover
them. Then check if they are forwarded to the mempool to be downloaded
and verified.
* Don't send empty transaction ID list to downloader
Ignore response from peers that don't provide any crawled transactions.
* Log errors when forwarding crawled transaction IDs
Calling the Mempool service should not fail, so if an error happens it
should be visible. However, errors when downloading individual
transactions can happen from time to time, so there's no need for them
to be very visible.
* Document existing `mempool::Crawler` test
Provide some depth as to what the test expects from the crawler's
behavior.
* Refactor to create `setup_crawler` helper function
Make it easier to reuse the common test setup code.
* Simplify code to expect requests
Now that `zebra_network::Request` implements `Eq`, the call can be
simplified into `expect_request`.
* Refactor to create `respond_with_transaction_ids`
A helper function that checks for a network crawl request and responds
with the given list of crawled transaction IDs.
* Refactor to create `crawler_iterator` helper
A function to intercept and respond to the fanned-out requests sent
during a single crawl iteration.
* Refactor to create `respond_to_queue_request`
Reduce the repeated code necessary to intercept and reply to a request
for queuing transactions to be downloaded.
* Add `respond_to_queue_request_with_error` helper
Intercepts a mempool request to queue transactions to be downloaded, and
responds with an error, simulating an internal problem in the mempool
service implementation.
* Derive `Arbitrary` for `NetworkUpgrade`
This is required for deriving `Arbitrary` for some error types.
* Derive `Arbitrary` for `TransactionError`
Allow random transaction errors to be generated for property tests.
* Derive `Arbitrary` for `MempoolError`
Allow random Mempool errors to be generated for property tests.
* Test if errors don't stop the mempool crawler
The crawler should be robust enough to continue operating even if the
mempool service fails to download transactions or even fails to handle
requests to enqueue transactions.
* Reduce the log level for download errors
They should happen regularly, so there's no need to have them with a
high visibility level.
Co-authored-by: teor <teor@riseup.net>
* Stop crawler if service stops
If `Mempool::poll_ready` returns an error, it's because the mempool
service has stopped and can't handle any requests, so the crawler should
stop as well.
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
* Move mempool tests into `tests::vector` sub-module
Make it consistent with other test modules and prepare for adding
property tests.
* Reorder imports
Make it consistent with the general guidelines followed on other
modules.
* Export `ChainTipBlock` and `ChainTipSender`
Allow these types to be used in other crates for testing purposes.
* Derive `Arbitrary` for `ChainTipBlock`
Make it easy to generate random `ChainTipBlock`s for usage in property
tests.
* Refactor to move test methods into `tests` module
Reduce the repeated test configuration attributes and make it easier to
see what is test specific and what is part of the general
implementation.
* Add a `Mempool::dummy_call` test helper method
Performs a dummy call just so that `poll_ready` gets called.
* Use `dummy_call` in existing tests
Replace the custom dummy requests with the helper method.
* Test if the mempool is cleared on chain reset
A chain reset should force the mempool storage to be cleared so that
transaction verification can restart using the new chain tip.
* Test if mempool is cleared on syncer restart
If the block synchronizer falls behind and then starts catching up
again, the mempool should be disabled and therefore the storage should
be cleared.
* Send mined transaction IDs to the download and verify task for cancellation
* Pass a HashSet of transaction hashes to be cancelled
* Add mempool_cancel_mined() test
* Fix starvation in test
* Fix typo in comment
* mempool - support transaction expiration
* use `LatestChainTip` instead of state call
* clippy
* remove spawn task
* remove unneeded async from function
* remove return value
* add an `expiry_height_mut()` method to `Transaction` for testing purposes
* fix `remove_expired_transactions()`
* add a `mempool_transaction_expiration()` test
* tidy cleanup to `expiry_height()`
* improve docs
* fix the build
* try fix macos build
* extend tests
* add doc to function
* clippy
* fix build
* start tests at block two
* Cancel download and verify tasks when the mempool is deactivated
* Refactor enable/disable logic to use a state enum
* Add helper test functions to enable/disable the mempool
* Add documentation about errors on service calls
* Improvements from review
* Improve documentation
* Fix bug in test
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: teor <teor@riseup.net>
* Add `Transaction::spent_outpoints` getter method
Returns an iterator over the UTXO `OutPoint`s spent by the transaction.
* Add `mempool::Error::Conflict` variant
An error representing that a transaction was rejected because it
conflicts with another transaction that's already in the mempool.
* Reject conflicting mempool transactions
Reject including a transaction in the mempool if it spends outputs
already spent by, or reveals nullifiers already revealed by, another
transaction in the mempool.
* Fix typo in documentation
Remove the `r` that was incorrectly added.
Co-authored-by: teor <teor@riseup.net>
* Specify that the conflict is a spend conflict
Make the situation clearer, because there are other types of conflict.
Co-authored-by: teor <teor@riseup.net>
* Clarify that the outpoints are from inputs
Otherwise it could be confused with the transaction's own outputs,
which are also represented as `OutPoint` references.
Co-authored-by: teor <teor@riseup.net>
* Create `storage::tests::vectors` module
Refactor to follow the convention used for other tests.
* Add an `AtLeastOne::first_mut` method
A getter to allow changing the first element.
* Add an `AtLeastOne::push` method
Allow appending elements to the collection.
* Derive `Arbitrary` for `FieldNotPresent`
This is just to make the code that generates arbitrary anchors a bit
simpler.
* Test if conflicting transactions are rejected
Generate two transactions (either V4 or V5) and insert a conflicting
spend, which can be either a transparent UTXO, or a nullifier for one of
the shielded pools. Check that any attempt to insert both transactions
causes one to be accepted and the other to be rejected.
* Delete a TODO comment that we decided not to do
Co-authored-by: teor <teor@riseup.net>
* Add tests for mempool Request::Queue
* Update test to work after refactoring
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
* Send `Response::Nil` instead of sending empty `Message`s
This matches `zcashd`'s behaviour more closely.
In most cases, the network layer filters these out already.
But this change makes the inbound service code clearer.
* revert changes made to `AdvertiseTransactionIds` and `PushTransaction`
* remove newline
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
* Update the expiry TODO
* Clear the mempool at a chain tip reset
* Clear the mempool by using a sync method (#2777)
* Clear the mempool by using a sync method
* Update docs
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
* Refactor last_tip_change()
* Apply suggestions from code review
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Fix brackets
* Use best_tip_block instead of manual borrowing
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Remove return of redundant vector length
An attempt to improve readability by not returning a tuple with a value
that can be derived from the other returned value.
* Refactor `unmined_transactions_in_blocks`
Use a more functional style to try to make it a bit clearer.
* Use ranges in `unmined_transactions_in_blocks`
Allow a finer control over the block range to extract the transactions
from.
* Refactor mempool test code to improve clarity
It was previously not clear that only the first genesis transaction was
being used. The remaining transactions in the genesis block were
discarded and then readded later.
* Replace `oneshot` with `call`
Remove a redundant internal `ready_and` call.
* Return an `impl Iterator` instead of a `Vec<_>`
Remove unnecessary deserializations and heap allocations.
* Refactor `mempool_storage_basic_for_network` test
Make the separation between the transactions expected to be in the
mempool and those expected to be rejected clearer.
* Replace `Iterator` with `DoubleEndedIterator`
Allow getting the last transaction more easily.
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
* Use `MockService` in inbound test
Refactor the `mempool_requests_for_transactions` test so that it uses a
`MockService` instead of the `mock_peer_set` function.
* Use `MockService` in the basic mempool test
Refactor the `mempool_service_basic` test so that it uses a
`MockService` instead of the `mock_peer_set` helper function.
* Remove the `mock_peer_set` helper function
It is not used anymore, since the usages were replaced with
`MockService`s.
* add tests for mempool inbound requests
* Use MockService for transaction verifier
* Refactor creation of mock `peer_set`
Use the same style as the mock transaction verifier.
* Derive `Eq` for `zebra_network::Request`
Make it easy to use the `MockService::expect_request` method.
* Return mocked peer set service from `setup`
Allow it to be used to respond to requests.
* Add bindings for the transaction used for testing
Allow them to be moved into futures later.
* Respond to transaction download request
Make sure that the test transaction appears to the mempool as if it had
been downloaded by the peer set service.
* Assert that no unexpected requests were received
Check that the mempool doesn't send unexpected requests to the peer set
service.
* add tests for mempool inbound requests
* Use MockService for transaction verifier
* add missing `expect_no_requests` to `mempool_advertise_transaction_ids` test
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
* Use `MockService` in inbound test
Refactor the `mempool_requests_for_transactions` test so that it uses a
`MockService` instead of the `mock_peer_set` function.
* Use `MockService` in the basic mempool test
Refactor the `mempool_service_basic` test so that it uses a
`MockService` instead of the `mock_peer_set` helper function.
* Remove the `mock_peer_set` helper function
It is not used anymore, since the usages were replaced with
`MockService`s.
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
* Allow deliberate instances of the new nightly clippy::derivable_impls lint
We want our config defaults to be explicit.
Not so sure about the application defaults, but they also contain a config.
* Also allow unknown lint names
Stable doesn't know about this lint, but nightly does.
* Implement initial service mocking helpers
Adds a [`MockService`] type, which can be configured and built for usage
in unit tests or proptests. The mocked service can then be used to
intercept requests and respond individually to them.
* Use `MockService` in the `mempool::Crawler` test
Refactor it to remove the helper mock function, and use the new
`MockService` helper type.
* Use `MockService` in `CandidateSet` test vectors
Refactor to remove the manual mocking of the peer set service.
* Panic if a response is not sent by `MockService`
Change the current semantics to require all `MockService` usages to
respond to every intercepted request.
A `must_use` attribute was added to the `ResponseSender` so that the
compiler can warn when this doesn't happen.
* Allow generic error types in `MockService`
Replace the hard-coded `BoxError` as the `Service`'s error type with a
generic type parameter. This allows mocking services in locations that
require specific error types.
* Add a `ResponseSender::request` getter
Allow inspecting the request again before responding, and using
information from the request in the response.
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
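A minimal sketch of the mocking pattern these commits describe, not the actual `zebra_test` `MockService` API: a tower service backed by channels, so a test can intercept each request and send back an individual response.
```rust
use std::{
    future::Future,
    pin::Pin,
    task::{Context, Poll},
};

use tokio::sync::{mpsc, oneshot};
use tower::Service;

type BoxError = Box<dyn std::error::Error + Send + Sync>;

/// One intercepted request, paired with the sender the test must use to reply.
pub struct Intercepted<Req, Rsp> {
    pub request: Req,
    pub respond: oneshot::Sender<Rsp>,
}

/// The half handed to the code under test: a normal tower service that
/// forwards every request to the test through a channel.
pub struct MockServiceSketch<Req, Rsp> {
    requests: mpsc::UnboundedSender<Intercepted<Req, Rsp>>,
}

impl<Req, Rsp> MockServiceSketch<Req, Rsp> {
    /// Returns the mock service and the receiver the test uses to intercept
    /// its requests.
    pub fn new() -> (Self, mpsc::UnboundedReceiver<Intercepted<Req, Rsp>>) {
        let (requests, intercepted) = mpsc::unbounded_channel();
        (Self { requests }, intercepted)
    }
}

impl<Req, Rsp> Service<Req> for MockServiceSketch<Req, Rsp>
where
    Rsp: Send + 'static,
{
    type Response = Rsp;
    type Error = BoxError;
    type Future = Pin<Box<dyn Future<Output = Result<Rsp, BoxError>> + Send>>;

    fn poll_ready(&mut self, _cx: &mut Context<'_>) -> Poll<Result<(), BoxError>> {
        Poll::Ready(Ok(()))
    }

    fn call(&mut self, request: Req) -> Self::Future {
        let (respond, response) = oneshot::channel();
        let forwarded = self
            .requests
            .send(Intercepted { request, respond })
            .map_err(|_| "mock test harness was dropped");

        Box::pin(async move {
            forwarded?;
            // If the test never responds, the service call fails loudly.
            response.await.map_err(|_| "test did not send a response".into())
        })
    }
}
```
The test side receives `Intercepted` values, asserts on `request`, and replies on `respond`; `MockService::expect_request`, `expect_no_requests`, and `ResponseSender` wrap this pattern with nicer panics and timeouts.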
* Pass sync_status to mempool
* Update zebrad/src/components/mempool.rs
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
* Remove enabled flag for now; will be handled in #2723
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
* Check if tx already exists in mempool or state before downloading
* Reorder checks
* Add rejected test; refactor into separate function
* Wrap mempool in buffered service
* Rename RejectedTransactionsById -> RejectedTransactionsIds
* Add RejectedTransactionIds response; fix request name
* Organize imports
* add a test for Storage::rejected_transactions
* add test for mempool `Request::RejectedTransactionIds`
* change buffer size to 1 in the test
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
Using `&mut self` as the receiver in the method signatures allows Rust
to infer that the type is properly `Sync`, and therefore `Send`. This
allows removing the `Mutex` work-around.
* Decide if Zebra is at the chain tip
* Avoid division by zero
* Try increasing EVENT_TIMEOUT
* Increase MAX_TEST_EXECUTION
* Implement basic tests
* Resolve Clippy's errors
* change doc comments to normal comments
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
* reply to `Request::MempoolTransactionIds`
* remove boilerplate
* get storage from mempool with a method
* change panic message
* try fix for mac
* use normal init instead of init_tests for state service
* newline
* rustfmt
* fix test build
* Rename ChainTipReceiver to CurrentChainTip
`fastmod ChainTipReceiver CurrentChainTip zebra*`
* Update chain tip documentation and variable names
* Basic chain tip change implementation, without resets
Also includes the following name changes:
```
fastmod CurrentChainTip LatestChainTip zebra*
fastmod chain_tip_receiver latest_chain_tip zebra*
```
* Clarify the difference between `LatestChainTip` and `ChainTipChange`
* Store a `SyncStatus` handle in the `Crawler`
The helper type will make it easier to determine if the crawler is
enabled or not.
* Pause crawler if mempool is disabled
Implement waiting until the mempool becomes enabled, so that the crawler
does not run while the mempool is disabled.
If the `MempoolStatus` helper is unable to determine if the mempool is
enabled, stop the crawler task entirely.
* Update test to consider when crawler is paused
Change the mempool crawler test so that it's a proptest that tests
different chain sync lengths. This leads to different scenarios with
the crawler pausing and resuming.
Co-authored-by: teor <teor@riseup.net>
* Create a `SyncStatus` helper type
Keeps track if the synchronizer is close to the chain tip or not.
* Refactor `ChainSync` ctor. to return `SyncStatus`
Change the constructor API so that it returns a higher level construct.
* Test if `SyncStatus` waits for the chain tip
Test if waiting for the chain tip to be reached correctly finishes when
the chain tip is reached. This is done by sending recent sync lengths to
the `SyncStatus` instance, and checking that every time a separate
`SyncStatus` instance determines it has reached the tip the original
instance wakes up.
* Add a temporary attribute to allow dead code
The code added isn't used yet, so we'll add a temporary waiver until
another PR is merged to use them.
This avoids peer set contention when most peers are busy.
Also exit the task if the peer service returns a readiness error,
because that means it's permanently unusable.
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* First pass at a Mempool Service, incl. a storage layer underneath
* Fixed up Mempool service and storage
* allow dead code where needed
* clippy
* typo
* only drain if the mempool is full
* add a basic storage test
* remove space
* fix test for when MEMPOOL_SIZE changes
* group some imports
* add a basic mempool service test
* add clippy suggestions
* remove unneeded allow(dead_code)
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
Co-authored-by: teor <teor@riseup.net>
* Rename BestTipHeight so it can be generalised to ChainTipSender
`fastmod BestTipHeight ChainTipSender zebra*`
For senders:
`fastmod best_tip_height chain_tip_sender zebra*`
For receivers:
`fastmod best_tip_height chain_tip_receiver zebra*`
* Rename best_tip_height module to chain_tip
* Wrap the chain tip watch channel in a ChainTipReceiver type
* Create a ChainTip trait to avoid tricky crate dependencies
And add convenience impls for optional and empty chain tips.
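A minimal sketch of the trait and its convenience impls, with a plain `u32` height instead of `block::Height`:
```rust
/// A small trait that lets zebra-network depend on "something that knows the
/// chain tip" without depending on zebra-state directly.
trait ChainTip {
    /// The current best tip height, if any chain tip is known.
    fn best_tip_height(&self) -> Option<u32>;
}

/// A chain tip that never exists, e.g. for tests or isolated connections.
struct NoChainTip;

impl ChainTip for NoChainTip {
    fn best_tip_height(&self) -> Option<u32> {
        None
    }
}

/// Convenience impl for optional chain tips.
impl<T: ChainTip> ChainTip for Option<T> {
    fn best_tip_height(&self) -> Option<u32> {
        self.as_ref().and_then(ChainTip::best_tip_height)
    }
}
```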
* Use the ChainTip trait in zebra-network
* Replace `Option<ChainTip>` with `NoChainTip`
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Return a transaction verifier from `zebra_consensus::init`
This verifier is temporarily created separately from the block verifier's
transaction verifier.
* Return the same transaction verifier used by the block verifier
* Clarify that the mempool verifier is the transaction verifier
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
* Create initial `mempool::Crawler` type
The mempool crawler is responsible for periodically asking peers for
transactions to insert into the local mempool. This initial
implementation will periodically ask for transactions, but won't do
anything with them yet.
Also, the crawler is currently configured to be always enabled, but this
should be fixed to avoid crawling while Zebra is still syncing the
chain.
* Add a timeout to peer responses
Prevent the crawler from getting stuck if there's communication with a
peer that takes too long to respond.
* Run the mempool crawler in Zebra
Spawn a task for the crawler when Zebra starts.
* Test if the crawler is sending requests
Create a mock for the `PeerSet` service to intercept requests and verify
that the transaction requests are sent periodically.
* Use `full` Tokio features when testing
Make it simpler to select the features for test builds.
Co-authored-by: teor <teor@riseup.net>
* Link to the issue for crawler activation
Make it easy to navigate from the `TODO` comment to the current project
planning.
Co-authored-by: teor <teor@riseup.net>
* Link to the issue for downloading transactions
Make it easy to navigate from the `TODO` comment to the current project
planning.
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: teor <teor@riseup.net>
* Minimal recent sync lengths implementation
Also includes metrics and logging, to make diagnosing bugs easier.
* Add logging to check what happens when Zebra reaches the chain tip
* Add tests for recent sync lengths
- initially empty
- pruned to correct length
- newest entries go first
* Drop a redundant `/` from a Cargo.lock URL
This seems to be a nightly or beta Rust change,
but hopefully stable just accepts it.
* Use metrics histograms to avoid overwriting values
* Add detailed syncer monitoring dashboard
* Increase the recent sync length to 4
This length makes it easier to distinguish between temporary and
sustained errors/syncs.
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Rename internal network requests for wide transaction IDs
fastmod TransactionsByHash TransactionsById zebra*
fastmod AdvertiseTransactions AdvertiseTransactionIds zebra*
fastmod MempoolTransactions MempoolTransactionIds zebra*
fastmod TransactionHashes TransactionIds zebra*
* Update network transaction request/response comments
* Rename a transaction hash method for wide transaction IDs
fastmod transaction_hashes transaction_ids zebra-network
* Add UnminedTxId methods and conversions for InventoryHash
* Map WtxIds to unmined transaction network messages
Also, use UnminedTxId and UnminedTx in:
* Zebra's internal request and response format, and
* external Zcash network protocol messages.
* Enable WtxId mempool inventory tracking for peers
* Further clarify transaction IDs
* Use Witnessed rather than Wide for transaction IDs
And rename narrow to legacy when it only applies to v1-v4 transactions.
Otherwise, rename it to mined ID.
* Rename a missed binding
* Remove an incorrectly named binding
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Simplify state service initialization in test
Use the test helper function to remove redundant code.
* Create `BestTipHeight` helper type
This type abstracts away the calculation of the best tip height based on
the finalized block height and the best non-finalized chain's tip.
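A minimal sketch of the helper, using plain `u32` heights and a tokio `watch` channel:
```rust
use tokio::sync::watch;

/// The best tip height is the maximum of the most recently set finalized
/// height and non-finalized tip height, published on a watch channel.
struct BestTipHeight {
    finalized: Option<u32>,
    non_finalized: Option<u32>,
    sender: watch::Sender<Option<u32>>,
}

impl BestTipHeight {
    fn new() -> (Self, watch::Receiver<Option<u32>>) {
        let (sender, receiver) = watch::channel(None);
        (
            Self { finalized: None, non_finalized: None, sender },
            receiver,
        )
    }

    fn set_finalized_height(&mut self, height: u32) {
        self.finalized = Some(height);
        self.update();
    }

    fn set_best_non_finalized_height(&mut self, height: Option<u32>) {
        self.non_finalized = height;
        self.update();
    }

    fn update(&mut self) {
        let best = self.finalized.max(self.non_finalized);
        // `send` only fails if every receiver was dropped, which is fine here.
        let _ = self.sender.send(best);
    }
}
```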
* Add `best_tip_height` field to `StateService`
The receiver endpoint is currently ignored.
* Return receiver endpoint from service constructor
Make it available so that the best tip height can be watched.
* Update finalized height after finalizing blocks
After blocks from the queue are finalized and committed to disk, update
the finalized block height.
* Update best non-finalized height after validation
Update the value of the best non-finalized chain tip block height after
a new block is committed to the non-finalized state.
* Update finalized height after loading from disk
When `FinalizedState` is first created, it loads the state from
persistent storage, and the finalized tip height is updated. Therefore,
the `best_tip_height` must be notified of the initial value.
* Update the finalized height on checkpoint commit
When a checkpointed block is committed, it bypasses the non-finalized
state, so there's an extra place where the finalized height has to be
updated.
* Add `best_tip_height` to `Handshake` service
It can be configured using the `Builder::with_best_tip_height`. It's
currently not used, but it will be used to determine if a connection to
a remote peer should be rejected or not based on that peer's protocol
version.
* Require best tip height to init. `zebra_network`
Without it the handshake service can't properly enforce the minimum
network protocol version from peers. Zebrad obtains the best tip height
endpoint from `zebra_state`, and the test vectors simply use a dummy
endpoint that's fixed at the genesis height.
* Pass `best_tip_height` to proto. ver. negotiation
The protocol version negotiation code will reject connections to peers
if they are using an old protocol version. An old version is determined
based on the current known best chain tip height.
* Handle an optional height in `Version`
Fall back to the genesis height if `None` is specified.
* Reject connections to peers on old proto. versions
Avoid connecting to peers that are on protocol versions that don't
recognize a network update.
* Document why peers on old versions are rejected
Describe why it's a security issue above the check.
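A sketch of the check, with simplified types and an assumed `min_version_for_height` lookup (the real code derives the minimum version from the network upgrade active at the best tip height):
```rust
fn reject_old_peer(
    peer_version: u32,
    best_tip_height: Option<u32>,
    min_version_for_height: impl Fn(u32) -> u32,
) -> Result<(), &'static str> {
    // Fall back to the genesis height if no best tip height is known yet.
    let height = best_tip_height.unwrap_or(0);

    if peer_version < min_version_for_height(height) {
        // Connecting to outdated peers is a security issue: they can't relay
        // blocks or transactions for the current network upgrade.
        return Err("peer is using an obsolete protocol version");
    }
    Ok(())
}
```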
* Test if `BestTipHeight` starts with `None`
Check if initially there is no best tip height.
* Test if best tip height is max. of latest values
After applying a list of random updates where each one either sets the
finalized height or the non-finalized height, check that the best tip
height is the maximum of the most recently set finalized height and the
most recently set non-finalized height.
* Add `queue_and_commit_finalized` method
A small refactor to make testing easier. The handling of requests for
committing non-finalized and finalized blocks is now more consistent.
* Add `assert_block_can_be_validated` helper
Refactor to move into a separate method some assertions that are done
before a block is validated. This is to allow moving these assertions
more easily to simplify testing.
* Remove redundant PoW block assertion
It's also checked in
`zebra_state::service::check::block_is_contextually_valid`, and it was
getting in the way of tests that received a gossiped block before
finalizing enough blocks.
* Create a test strategy for test vector chain
Splits a chain loaded from the test vectors in two parts, containing the
blocks to finalize and the blocks to keep in the non-finalized state.
* Test that committing blocks updates the best tip height
Create a mock blockchain state, with a chain of finalized blocks and a
chain of non-finalized blocks. Commit all the blocks appropriately, and
verify that the best tip height is updated.
Co-authored-by: teor <teor@riseup.net>
* Use the block verifier and non-finalized state in the cached state tests
This substantially increases test coverage.
Previously, the cached state tests were configured with
`checkpoint_sync = true`, which only uses the checkpoint
verifier and the finalized state.
* Log the source of blocks in commit_finalized_direct
This lets us check that we're actually testing the non-finalized state
and block verifier in the cached state tests.
It also improves diagnostics for state errors.
* Fail cached state tests if they're using incorrect heights or configs
This makes sure that the cached state tests actually test the transition
from checkpoint to block verification, and the non-finalized state.
* Update versions for zebra v1.0.0-alpha.12 release
* Update Cargo.lock
* Update release checklist with latest version changes to help keep track for future releases
* Remove reference to the fact that tower-fallback was not updated
* add legacy chain check and tests
* improve has_network_upgrade check
* add docs to legacy_chain_check()
* change arbitrary module structure
* change the panic message
* move legacy chain acceptance into existing tests
* use a reduced_branch_id_strategy()
* add docs to strategy function
* add argument to check for legacy chain into sync_until()
* Disable IPv6 tests when $ZEBRA_SKIP_IPV6_TESTS is set
This allows users to disable IPv6 tests in environments where IPv6 is not
configured.
* Add network test env var constants
* Replace env strings with constants
fastmod '"ZEBRA_SKIP_NETWORK_TESTS"' zebra_test::net::ZEBRA_SKIP_NETWORK_TESTS
fastmod '"ZEBRA_SKIP_IPV6_TESTS"' zebra_test::net::ZEBRA_SKIP_IPV6_TESTS
* Add functions to skip network tests
* Replace test network env var checks with test function
fastmod --fixed-strings 'env::var_os(zebra_test::net::ZEBRA_SKIP_NETWORK_TESTS).is_some()' 'zebra_test::net::zebra_skip_network_tests()'
fastmod --fixed-strings 'env::var_os(zebra_test::net::ZEBRA_SKIP_IPV6_TESTS).is_some()' 'zebra_test::net::zebra_skip_ipv6_tests()'
* Remove redundant logging and use statements
* Gossip dynamically allocated listener ports to peers
Previously, Zebra would either gossip port `0`, which is invalid, or skip
gossiping its own dynamically allocated listener port.
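A minimal sketch of where the real listener port comes from when the config asks for port `0`:
```rust
use std::net::{SocketAddr, TcpListener};

/// Bind the Zcash listener. Binding to port 0 asks the OS for a free port,
/// so the port to gossip to peers must be read back from the listener, not
/// taken from the config.
fn open_listener(configured: SocketAddr) -> std::io::Result<(TcpListener, SocketAddr)> {
    let listener = TcpListener::bind(configured)?;
    // This is the address to advertise, even if `configured` had port 0.
    let local_addr = listener.local_addr()?;
    Ok((listener, local_addr))
}
```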
* Improve "no configured peers" warning
And downgrade from error to warning, because inbound-only nodes are a
valid use case.
* Move random_known_port to zebra-test
* Add tests for dynamic local listener ports and the AddressBook
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Always send our local listener with the latest time
Previously, whenever there was an inbound request for peers, we would
clone the address book and update it with the local listener.
This had two impacts:
- the listener could conflict with an existing entry,
rather than unconditionally replacing it, and
- the listener was briefly included in the address book metrics.
As a side-effect, this change also makes sanitization slightly faster,
because it avoids some useless peer filtering and sorting.
* Skip listeners that are not valid for outbound connections
* Filter the addresses Zebra sanitizes and gossips based on address state
This fix correctly prevents Zebra gossiping client addresses to peers,
but still keeps the client in the address book to avoid reconnections.
* Add a full set of DateTime32 and Duration32 calculation methods
* Refactor sanitize to use the new DateTime32/Duration32 methods
* Security: Use canonical SocketAddrs to avoid duplicate connections
If we allow multiple variants for each peer address, we can make multiple
connections to that peer.
Also make sure sanitized MetaAddrs are valid for outbound connections.
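A sketch of the canonicalization idea (not the exact zebra-network code): map IPv4-mapped IPv6 addresses back to plain IPv4, so the same peer can't appear under two different `SocketAddr` variants.
```rust
use std::net::{IpAddr, SocketAddr};

fn canonical_socket_addr(addr: SocketAddr) -> SocketAddr {
    let ip = match addr.ip() {
        IpAddr::V6(v6) => match v6.to_ipv4() {
            // `::ffff:a.b.c.d` and `a.b.c.d` are the same peer.
            Some(v4) => IpAddr::V4(v4),
            None => IpAddr::V6(v6),
        },
        ip @ IpAddr::V4(_) => ip,
    };
    SocketAddr::new(ip, addr.port())
}
```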
* Test that address books contain the local listener address
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
- Add a custom semver match for `zebrad` versions
- Prefer "line contains string" matches, so tests ignore minor changes
- Escape regex meta-characters when a literal string match is intended
- Rename test functions so they are more precise
- Rewrite match internals to remove duplicate code and enable custom matches
- Document match functions
* implement and test a rate limit in `request_genesis()`
* add `request_genesis_is_rate_limited` test to sync
* add ensure_timeouts constraint for GENESIS_TIMEOUT_RETRY
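A minimal sketch of the rate limit; `GENESIS_TIMEOUT_RETRY` here is an illustrative value, not necessarily the constant the syncer uses.
```rust
use std::{future::Future, time::Duration};

use tokio::time::sleep;

const GENESIS_TIMEOUT_RETRY: Duration = Duration::from_secs(5);

/// Keep retrying the genesis download, but wait between attempts so failures
/// don't turn into a flood of requests to peers.
async fn request_genesis_with_retry<F, Fut>(mut try_download: F)
where
    F: FnMut() -> Fut,
    Fut: Future<Output = bool>,
{
    while !try_download().await {
        sleep(GENESIS_TIMEOUT_RETRY).await;
    }
}
```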
* Suppress expected warning logs in zebrad tests
Co-authored-by: teor <teor@riseup.net>
* Standardise lints across Zebra crates, and add missing docs
The only remaining module with missing docs is `zebra_test::command`
* Todo -> TODO
* Clarify what a transcript ErrorChecker does
Also change `Error` -> `BoxError`
* TransError -> ExpectedTranscriptError
* Output Descriptions -> Output descriptions
When peers ask for peer addresses, add our local listener address to the
set of addresses, sanitize, then truncate. Sanitize shuffles addresses,
so if there are lots of addresses in the address book, our address will
only be sent to some peers.
* Use the git version + new commit count + hash for the app version
This helps diagnose bugs in versions of Zebra built from git branches,
rather than git version tags.
* Fill in assert
* Also log semver string
* Fix syntax
* Handle vergen using the cargo package version or raw git tag
* s/Semver/SemVer/
Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
Enable builds where:
* there is no google cloud git commit env var, and
* there is no `.git` directory.
By making all `vergen` env vars optional, and skipping any env vars that
don't exist.
* build(deps): bump vergen from 3.2.0 to 5.1.1
* fix hardcoded version for Tracing struct
* add additional metadata
* remove extra allocations for metadata
* Remove zebrad code version from release checklist
The zebrad code automatically uses the crate version now.
* Sort panic metadata into rough categories
Co-authored-by: teor <teor@riseup.net>
Zebra's latest alpha checkpoints on Canopy activation, continues our work on NU5, and fixes a security issue.
Some notable changes include:
## Added
- Log address book metrics when PeerSet or CandidateSet don't have many peers (#1906)
- Document test coverage workflow (#1919)
- Add a final job to CI, so we can easily require all the CI jobs to pass (#1927)
## Changed
- Zebra has moved its mandatory checkpoint from Sapling to Canopy (#1898, #1926)
- This is a breaking change for users that depend on the exact height of the mandatory checkpoint.
## Fixed
- tower-batch: wake waiting workers on close to avoid hangs (#1908)
- Assert that pre-Canopy blocks use checkpointing (#1909)
- Fix CI disk space usage by disabling incremental compilation in coverage builds (#1923)
## Security
- Stop relying on unchecked length fields when preallocating vectors (#1925)
`node2.is_running()` can return `true` on Windows, even though `node2`
has logged a panic. This cleanup code only runs if `node2` fails to panic
and exit as expected. So it's ok for us to skip it.
See #1781 for details.
On Windows, if a process is killed after it is dead, it returns `true`
for `was_killed`. Instead, check if the process is running before killing
it.
Also make the section where processes are running as short as possible,
and include context for both processes in every error.
Log a "Trying..." message before each listener opens, to see if the
delay is inside Zebra, or in the test harness or OS.
Also report the configured and actual ports where possible, for better
diagnostics.
* Increase the conflict acceptance test launch delay
Also rename the tests - the listener is for the Zcash protocol,
but the state, metrics, and tracing are Zebra-specific.
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
This change encodes a bunch of invariants in the type system,
and adds explicit failure states for:
* a closed oneshot,
* bugs in the initialization code.
Use `ServiceExt::oneshot` to perform state requests.
Explain that `ServiceExt::call_all` calls `poll_ready` internally.
Document a state service invariant imposed by `ServiceExt::call_all`.
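A small example of the `ServiceExt::oneshot` pattern described above; `state_service` and `request` stand in for any tower service and its request type.
```rust
use tower::{Service, ServiceExt};

/// `oneshot` drives `poll_ready` and then makes exactly one call, so the
/// readiness step can't be forgotten.
async fn query_state<S, Req>(state_service: S, request: Req) -> Result<S::Response, S::Error>
where
    S: Service<Req>,
{
    state_service.oneshot(request).await
}
```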
* add hint for port error
* add issue filter for port panic
* add lock file hint
* add metrics endpoint port conflict hint
* add hint for tracing endpoint port conflict
* add acceptance test for resource conflicts
* Split out common conflict test code into a function
* Add state, metrics, and tracing conflict tests
* Add a full set of stderr acceptance test functions
This change makes the stdout and stderr acceptance test interfaces
identical.
* move Zcash listener opening
* add todo about hint for disk full
* add constant for lock file
* match path in state cache
* don't match windows cache path
* Use Display for state path logs
Avoids weird escaping on Windows when using Debug
* Add Windows conflict error messages
* Turn PORT_IN_USE_ERROR into a regex
And add another alternative Windows-specific port error
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Jane Lusby <jane@zfnd.org>
* Bump versions where appropriate
Tested with cargo install --locked --path etc
* Remove fixed panics from 'Known Issues'
* Change to alpha release series in the README
Co-authored-by: teor <teor@riseup.net>
The clippy unknown lints attribute was deprecated in
nightly in rust-lang/rust#80524. The old lint name now produces a
warning.
Since we're using `allow(unknown_lints)` to suppress warnings, we need to
add the canonical name, so we can continue to build without warnings on
nightly.
But we also need to keep the old name, so we can continue to build
without warnings on stable.
And therefore, we also need to disable the "removed lints" warning,
otherwise we'll get warnings about the old name on nightly.
We'll need to keep this transitional clippy config until rustc 1.51 is
stable.
* Stop failing acceptance tests if their directories already exist
* Add an immutable config writing helper
and use it in the cached sapling acceptance tests.
Also:
* consistently create missing config and state directories
* refactor the common config writing code into a separate function
* only ignore NotFound errors in replace_config
* enforce config immutability using the type system
This timeout stops the sync service hanging when it is missing required
blocks, but the lookahead queue is full of dependent verify tasks, so the
missing blocks never get downloaded.
Check misconfigured ephemeral doesn't create a state dir
Add extra misconfigured `zebrad` ephemeral mode checks:
* doesn't create a state directory
* doesn't create unexpected files or directories in the working directory
Check ephemeral doesn't delete an existing state directory
Refactor all the ephemeral configs and checks into a single test
function.
Also:
* cleanup acceptance tests using utility functions
* make some checks consistent between tests
* make error messages consistent
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* Add the configured network to error reports
* Log the configured network at error level
* Create the global span immediately after activating tracing
And leak the span guard, so the span is always active.
* Include panic metadata in the report and URL
* Use `Main` and `Test` in the global span
`net=Mainnet` is a bit redundant
When `cargo run` is run in the workspace directory, it can see two
executables:
- `zebrad`
- `zebra_checkpoints`
Adding `default-run = "zebrad"` to `zebrad/Cargo.toml` makes the
workspace run `zebrad` by default. (Even though it's redundant for the
`zebrad` crate itself.)
* Rewrite GetData handling to match the zcashd implementation
`zcashd` silently ignores missing blocks, but sends found transactions
followed by a `NotFound` message:
e7b425298f/src/main.cpp (L5497)
This is significantly different to the behaviour expected by the old
Zebra connection state machine, which expected `NotFound` for blocks.
Also change Zebra's GetData responses to peer request so they ignore
missing blocks.
* Stop hanging on incomplete transaction or block responses
Instead, if the peer sends an unexpected block, unexpected transaction,
or NotFound message:
1. end the request, and return a partial response containing any items
that were successfully received
2. if none of the expected blocks or transactions were received, return
an error, and close the connection
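A sketch of the resulting `GetData` flow on the responding side, with placeholder inventory and message types rather than Zebra's real network types:
```rust
enum InventoryHash {
    Block([u8; 32]),
    Tx([u8; 32]),
}

enum Message {
    Block(Vec<u8>),
    Tx(Vec<u8>),
    NotFound(Vec<InventoryHash>),
}

/// Respond to a `GetData` request: send every item we can find, silently
/// skip missing blocks, and finish with a single `NotFound` listing the
/// missing transactions, matching the zcashd behaviour described above.
fn respond_to_get_data(
    items: Vec<InventoryHash>,
    lookup: impl Fn(&InventoryHash) -> Option<Vec<u8>>,
) -> Vec<Message> {
    let mut responses = Vec::new();
    let mut missing = Vec::new();

    for item in items {
        match (lookup(&item), item) {
            (Some(bytes), InventoryHash::Block(_)) => responses.push(Message::Block(bytes)),
            (Some(bytes), InventoryHash::Tx(_)) => responses.push(Message::Tx(bytes)),
            // Missing blocks are silently ignored.
            (None, InventoryHash::Block(_)) => {}
            // Missing transactions are reported in a trailing `NotFound`.
            (None, tx @ InventoryHash::Tx(_)) => missing.push(tx),
        }
    }

    if !missing.is_empty() {
        responses.push(Message::NotFound(missing));
    }
    responses
}
```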
In our README, we tell users to ignore these errors, so we should also
disable the issue URL.
Also include the hash in the error. (We don't want the span active for
all messages, we just want the hash in the error.)
This change avoids errors when tests are cancelled and re-run within a
short period of time, for example, using `cargo watch`.
It introduces a slight risk of port conflicts between the endpoint tests,
and with (ephemeral) ports used by other services. The risk of conflicts
across 2 tests is very low, and tests should be run in an isolated
environment on busy servers.
vergen's implementation of REBUILD_ON_HEAD_CHANGE assumes that the .git
directory is in the crate root, but Zebra uses a workspace.
Temporary fix for rustyhorde/vergen#21.
Because the new version of the prometheus exporter launches its own
single-threaded runtime on a dedicated worker thread, there's no need
for the tokio and hyper versions it uses internally to align with the
versions used in other crates. So we don't need to use our fork with
tokio 0.3, and can just use the published alpha. Advancing to a later
alpha may fix the missing-metrics issues.
As we approach our alpha release we've decided we want to plan ahead for the user bug reports we will eventually receive. One of the bigger issues we foresee is determining exactly what version of the software users are running, and particularly how easy it may or may not be for users to accidentally discard this information when reporting bugs.
To defend against this, we've decided to include the exact git sha for any given build in the compiled artifact. This information will then be re-exported as a span early in the application startup process, so that all logs and error messages should include the sha as their very first span. We've also added this sha as issue metadata for `color-eyre`'s github issue url auto generation feature, which should make sure that the sha is easily available in bug reports we receive, even in the absence of logs.
Co-authored-by: teor <teor@riseup.net>
* implement inbound `FindBlocks`
* Handle inbound peer FindHeaders requests
* handle request before having any chain tip
* Split `find_chain_hashes` into smaller functions
Add a `max_len` argument to support `FindHeaders` requests.
Rewrite the hash collection code to use heights, so we can handle the
`stop` hash and "no intersection" cases correctly.
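A sketch of the height-based hash collection, with placeholder hashes; `max_len` is what lets the same code serve both `FindBlocks` and `FindHeaders`, which allow different response sizes.
```rust
type BlockHash = [u8; 32];

/// Collect response hashes by walking the best chain by height, starting
/// after the intersection with the peer's block locator, and stopping at the
/// `stop` hash or after `max_len` hashes, whichever comes first.
fn collect_chain_hashes(
    best_chain: &[BlockHash], // best chain hashes, indexed by height
    intersection_height: Option<usize>,
    stop: Option<BlockHash>,
    max_len: usize,
) -> Vec<BlockHash> {
    // With no intersection, start from the genesis block.
    let start = intersection_height.map_or(0, |height| height + 1);

    let mut hashes = Vec::new();
    for hash in best_chain.iter().skip(start).take(max_len) {
        hashes.push(*hash);
        if Some(*hash) == stop {
            break;
        }
    }
    hashes
}
```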
* Split state height functions into "any chain" and "best chain"
* Rename the best chain block method to `best_block`
* Move fmt utilities to zebra_chain::fmt
* Summarise Debug for some Message variants
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
This provides useful and not too noisy output at INFO level. We do an
info-level message on every block commit instead of trying to do one
message every N blocks, because this is useful both for initial block
sync as well as continuous state updates on new blocks.
The metrics code becomes much simpler because the current version of the
metrics crate builds its own single-threaded runtime on a dedicated worker
thread, so no dependency on the main Zebra Tokio runtime is required.
This change is mostly mechanical, with the exception of the changes to the
`tower-batch` middleware. This middleware was adapted from `tower::buffer`,
and the `tower::buffer` code was changed to implement its own bounded queue,
because Tokio 0.3 removed the `mpsc::Sender::poll_send` method. See
ddc64e8d4d
for more context on the Tower changes. To match Tower as closely as possible
in order to be able to upstream `tower-batch`, those changes are copied from
`tower::Buffer` to `tower-batch`.
## Motivation
Prior to this PR we've been using `sled` as our database for storing persistent chain data on the disk between boots. We picked sled over rocksdb to minimize our C++ dependencies despite it being a less mature codebase. The theory was that if it worked well enough we'd prefer to have a pure Rust codebase, but if we ever ran into problems we knew we could easily swap it out with rocksdb.
Well, we ran into problems. Sled's memory usage was particularly high, and it seemed to be leaking memory. On top of all that, the performance for writes was pretty poor, causing us to become bottle-necked on sled instead of the network.
## Solution
This PR replaces `sled` with `rocksdb`. We've seen a 10x improvement in memory usage out of the box, no more leaking, and much better write performance. With this change writing chain data to disk is no longer a limiting factor in how quickly we can sync the chain.
The code in this pull request has:
- [x] Documentation Comments
- [x] Unit Tests and Property Tests
## Review
@hdevalence
This helps prevent overloading the network with too many concurrent
block requests. On a fast network, we're likely to still have enough
room to saturate our bandwidth. In the worst case, with 2MB blocks,
downloading 50 blocks concurrently is 100MB of queued downloads. If we
need to download this in 20 seconds to avoid peer connection timeouts,
the implied worst-case minimum speed is 5MB/s. In practice, this
minimum speed will likely be much lower.
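As a back-of-the-envelope check of that estimate (the constant names below are illustrative only, not Zebra's actual configuration items):

```rust
// Rough sanity check of the worst-case bandwidth estimate above.
fn main() {
    let max_block_bytes: u64 = 2 * 1024 * 1024; // ~2 MB worst-case block
    let concurrent_blocks: u64 = 50; // the concurrency limit
    let timeout_secs: u64 = 20; // rough peer connection timeout budget

    let queued_bytes = max_block_bytes * concurrent_blocks; // ~100 MB queued
    let min_bytes_per_sec = queued_bytes / timeout_secs; // ~5 MB/s

    println!(
        "queued: {} MB, implied minimum speed: {} MB/s",
        queued_bytes / (1024 * 1024),
        min_bytes_per_sec / (1024 * 1024),
    );
}
```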
This reverts commit 656bd24ba7.
The Hedge middleware keeps a pair of histograms, writing into one in the
current time interval and reading from the previous time interval's
data. This means that the reverted change resulted in doubling all
block downloads until after at least the second measurement interval
(which means that the time measurements are also incorrect, as they're
operating under double the network load...)
* Use the default memory limit in the acceptance tests
PR #1233 changed the default `memory_cache_bytes`, but left the
acceptance tests with their old value.
Sets the default value to the previous lookahead limit. My testing on
mainnet suggested that the newly lower value (changed when the
checkpoint frequency was decreased) is low enough to cause stalls, even
when using hedged requests.
Remove the minimum data points from the syncer hedge configuration.
When there are no data points, hedge sends the second request
immediately.
When there are fewer than 1/(1-latency_percentile) data points (20),
hedge delays the second request by the highest recent download time.
This change should improve genesis and post-restart sync latency.
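The threshold of 20 data points follows directly from the latency percentile. The sketch below assumes a percentile of 0.95, which is the value consistent with the number quoted above, not necessarily the syncer's exact setting:

```rust
// Shows where the "20 data points" threshold comes from.
fn main() {
    let latency_percentile: f64 = 0.95; // assumed value, see note above
    let min_data_points = (1.0 / (1.0 - latency_percentile)).round() as u32;
    assert_eq!(min_data_points, 20);
    println!(
        "below {} samples, hedge delays by the highest recent download time",
        min_data_points
    );
}
```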
* Run large checkpoint sync tests in CI
* Improve test child output match error context
* Add a debug_stop_at_height config
* Use stop at height in acceptance tests
And add some restart acceptance tests, to make sure the stop at
height feature works correctly.
We should error if we notice that we're attempting to download the same
blocks multiple times, because that indicates that peers reported bad
information to us, or we got confused trying to interpret their
responses.
The original sync algorithm split the sync process into two phases, one
that obtained prospective chain tips, and another that attempted to
extend those chain tips as far as possible until encountering an error
(at which point the prospective state is discarded and the process
restarts).
Because a previous implementation of this algorithm didn't properly
enforce linkage between segments of the chain while extending tips,
sometimes it would get confused and fail to discard responses that did
not extend a tip. To mitigate this, a check against the state was
added. However, this check can cause stalls while checkpointing,
because when a checkpoint is reached we may suddenly need to commit
thousands of blocks to the state. Because the sync algorithm now has a
`CheckedTip` structure that ensures that a new segment of hashes
actually extends an existing one, we don't need to check against the
state while extending a tip, because we don't get confused while
interpreting responses.
This change results in significantly smoother progress on mainnet.
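A minimal sketch of the `CheckedTip` idea, using illustrative names rather than the syncer's exact fields:

```rust
// Illustrative sketch: a tip records the hash it expects the next
// segment to start with, so a response only counts as extending the
// tip if its first hash links onto that expectation.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Hash([u8; 32]);

struct CheckedTip {
    /// The last hash we committed to downloading for this tip.
    tip: Hash,
    /// The hash the next segment must start with to extend this tip.
    expected_next: Hash,
}

impl CheckedTip {
    /// Returns true if `segment` actually extends this tip, so we can
    /// skip the (potentially stalled) check against the state.
    fn extends(&self, segment: &[Hash]) -> bool {
        segment.first() == Some(&self.expected_next)
    }
}
```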
There's no reason to return a pre-Buffer'd service (there's no need for
internal access to the state service, as in zebra-network), but wrapping
it internally removes control of the buffer size from the caller.
The timeout behavior in zebra-network is an implementation detail, not a
feature of the public API. So it shouldn't be mentioned in the doc
comments -- if we want timeout behavior, we have to layer it ourselves.
Using the cancel_handles, we can deduplicate requests. This is
important to do, because otherwise when we insert the second cancel
handle, we'd drop the first one, cancelling an existing task for no
reason.
The hedge middleware implements hedged requests, as described in _The
Tail At Scale_. The idea is that we auto-tune our retry logic according
to the actual network conditions, pre-emptively retrying requests that
exceed some latency percentile. This would hopefully solve the problem
where our timeouts are too long on mainnet and too slow on testnet.
Try to use the better cancellation logic to revert to previous sync
algorithm. As designed, the sync algorithm is supposed to proceed by
downloading state prospectively and handle errors by flushing the
pipeline and starting over. This hasn't worked well, because we didn't
previously cancel tasks properly. Now that we can, try to use something
in the spirit of the original sync algorithm.
This makes two changes relative to the existing download code:
1. It uses a oneshot to attempt to cancel the download task after it
has started;
2. It encapsulates the download creation and cancellation logic into a
Downloads struct (sketched below).
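A sketch of both changes, using illustrative names rather than Zebra's actual `Downloads` internals: each spawned download races the real work against a oneshot cancellation signal, and the struct keeps the cancel handles so it can abort everything on reset.

```rust
// Illustrative sketch, not Zebra's actual Downloads type.
use std::collections::HashMap;

use tokio::sync::oneshot;
use tokio::task::JoinHandle;

type Hash = [u8; 32];

#[derive(Default)]
struct Downloads {
    cancel_handles: HashMap<Hash, oneshot::Sender<()>>,
}

impl Downloads {
    /// Spawn a download for `hash` that can be cancelled after it starts.
    fn download(&mut self, hash: Hash) -> JoinHandle<()> {
        let (cancel_tx, mut cancel_rx) = oneshot::channel();
        self.cancel_handles.insert(hash, cancel_tx);

        tokio::spawn(async move {
            tokio::select! {
                // Sending on (or dropping) the handle cancels the task.
                _ = &mut cancel_rx => {}
                // Stand-in for the real "fetch block, then verify" future.
                _ = async {} => {}
            }
        })
    }

    /// Cancel every in-flight download, e.g. when the syncer restarts.
    fn cancel_all(&mut self) {
        for (_hash, cancel_tx) in self.cancel_handles.drain() {
            let _ = cancel_tx.send(());
        }
    }
}
```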
* Reverse displayed endianness of transaction and block hashes
* fix zebra-checkpoints utility for new hash order
* Stop using "zebrad revhex" in zebrad-hash-lookup
* Rebuild checkpoint lists in new hash order
This change also adds additional checkpoints to the end of each list.
* Replace TransactionHash with transaction::Hash
This change should have been made in #905, but we missed Debug impls
and some docs.
Co-authored-by: Ramana Venkata <vramana@users.noreply.github.com>
Co-authored-by: teor <teor@riseup.net>
This reduces the API surface to the minimum required for functionality,
and cleans up module documentation. The stub mempool module is deleted
entirely, since it will need to be redone later anyways.
* implement most of the chain functions
* implement fork
* fix outpoint handling in Chain struct
* update expect for work
* split utxo into two sets
* update the Chain definition
* remove allow attribute in zebra-state/lib.rs
* merge ChainSet type into MemoryState
* Add error messages to asserts
* export proptest impls for use in downstream crates
* add testjob for disabled feature in zebra-chain
* try to fix github actions syntax
* add module doc comment
* update RFC for utxos
* add missing header
* working proptest for Chain
* propagate back results over channel
* Start updating RFC to match changes
* implement queued block pruning
* and now it syncs wooo!
* remove empty modules
* setup config for proptests
* re-enable missing_docs lint
* update RFC to match changes in impl
* add documentation
* use more explicit variable names
The Inbound service only needs the network setup for some requests, but
it can service other requests without it. Making it return
Poll::Pending until the network setup finishes means that initial
network connections may view the Inbound service as overloaded and
attempt to load-shed.
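A sketch of the intended behaviour, with illustrative names: `poll_ready` picks up the setup data if it has arrived but always reports ready, so only the requests that actually need the network have to check whether setup has finished.

```rust
// Illustrative sketch, not the real Inbound service: never return
// Pending from poll_ready just because network setup is incomplete.
use std::task::{Context, Poll};

use tokio::sync::oneshot;

struct NetworkSetup; // stand-in for the real setup data (peer set, etc.)

struct Inbound {
    setup: Option<NetworkSetup>,
    setup_rx: oneshot::Receiver<NetworkSetup>,
}

impl Inbound {
    fn poll_ready(&mut self, _cx: &mut Context<'_>) -> Poll<Result<(), String>> {
        // Opportunistically pick up the setup data if it has arrived.
        if self.setup.is_none() {
            if let Ok(setup) = self.setup_rx.try_recv() {
                self.setup = Some(setup);
            }
        }
        // Always ready: requests that need the network check `setup`
        // themselves; everything else can be answered immediately.
        Poll::Ready(Ok(()))
    }
}
```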
The original version of this commit ran into
https://github.com/rust-lang/rust/issues/64552
again. Thanks to @yaahc for suggesting a workaround (using futures combinators
to avoid writing an async block).
Remove the seed command entirely, and make the behavior it provided
(responding to `Request::Peers`) part of the ordinary functioning of the
start command.
The new `Inbound` service should be expanded to handle all request
types.
* add test for first checkpoint sync
Prior to this change we had no tests that verify our sync /
network logic is well behaved. This PR cleans up the test helper code to
make error reports more consistent and uses this cleaned up API to
implement a checkpoint sync test which runs zebrad until it reads the
first checkpoint event from stdout.
Co-authored-by: teor <teor@riseup.net>
* move include out of unix cfg
Co-authored-by: teor <teor@riseup.net>
* increase the EWMA default and decay
* increase the block download retries
* increase the request and block download timeouts
* increase the sync timeout
* add metrics and tracking endpoint tests
* test endpoints more
* add change filter test for tracing
* add await to post
* separate metrics and tracing tests
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: teor <teor@riseup.net>
* Ignore sync errors when the block is already verified
If we get an error for a block that is already in our state, we don't
need to restart the sync. It was probably a duplicate download.
Also:
Process any ready tasks before reset, so the logs and metrics are
up to date. (But ignore the errors, because we're about to reset.)
Improve sync logging and metrics during the download and verify task.
* Remove duplicate hashes in logs
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* Log the sync hash span at warn level
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* Remove in-memory state service
* make the config compatible with toml again
* checkpoint commit to see how much I still have to revert
* back to the starting point...
* remove unused dependency
* reorganize error handling a bit
* need to make a new color-eyre release now
* reorder again because I have problems
* remove unnecessary helpers
* revert changes to config loading
* add back missing space
* Switch to released color-eyre version
* add back missing newline again...
* improve error message on unix when terminated by signal
* add context to last few asserts in acceptance tests
* instrument some of the helpers
* remove accidental extra space
* try to make this compile on windows
* reorg platform specific code
* hide on_disk module and fix broken link
* add CheckpointList::new_up_to(limit: NetworkUpgrade)
* if checkpoint_sync is false, limit checkpoints to Sapling
* update tests for CheckpointList and chain::init
This is the first in a sequence of changes that change the block:: items
to not include Block as a prefix in their name, in accordance with the
Rust API guidelines.
* add valid generated config test
* change to pathbuf
* use -c to make sure we are using the generated file
* add and use a ZebraTestDir type
* change approach to generate tempdir in top of each test
* pass tempdir to test_cmd and set current dir to it
* add and use a `generated_config_path` variable in tests
* checkpoint: reject older of duplicate verification requests.
If we get a duplicate block verification request, we should drop the older one
in favor of the newer one, because the older request is likely to have been
canceled. Previously, this code would accept up to four duplicate verification
requests, then fail all subsequent ones.
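A minimal sketch of the "keep the newest duplicate" behaviour, using illustrative names rather than the CheckpointVerifier's actual fields:

```rust
// Re-inserting the same hash replaces, and therefore drops, the older
// request's channel; the older caller sees a closed channel, which it
// already treats as a cancellation.
use std::collections::HashMap;

use tokio::sync::oneshot;

type Hash = [u8; 32];

#[derive(Default)]
struct QueuedAtHeight {
    /// hash -> channel used to answer that verification request
    queued: HashMap<Hash, oneshot::Sender<Result<Hash, String>>>,
}

impl QueuedAtHeight {
    fn queue(&mut self, hash: Hash, tx: oneshot::Sender<Result<Hash, String>>) {
        // Any older duplicate for this hash is dropped here.
        let _older = self.queued.insert(hash, tx);
    }
}
```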
* sync: add a timeout layer to block requests.
Note that if this timeout is too short, we'll bring down the peer set in a
retry storm.
* sync: restart syncing on error
Restart the syncing process when an error occurs, rather than ignoring it.
Restarting means we discard all tips and start over with a new block locator,
so we can have another chance to "unstuck" ourselves.
* sync: additional debug info
* sync: handle lookahead limit correctly.
Instead of extracting all the completed task results, the previous code pulled
results out until there were fewer tasks than the lookahead limit, then
stopped. This meant that completed tasks could be left until the limit was
exceeded again. Instead, extract all completed results, and use the number of
pending tasks to decide whether to extend the tip or wait for blocks to finish.
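A sketch of the corrected draining logic, with a placeholder limit and futures' `now_or_never` standing in for a proper poll loop (in a real async context the draining would use an actual waker):

```rust
// Take *every* result that is already complete, then use the number of
// tasks still pending to decide what to do next.
use futures::future::FutureExt;
use futures::stream::{FuturesUnordered, StreamExt};
use tokio::task::JoinHandle;

const LOOKAHEAD_LIMIT: usize = 2_000; // placeholder value

fn drain_and_decide(
    pending: &mut FuturesUnordered<JoinHandle<Result<(), String>>>,
) -> Result<bool, String> {
    // Extract all completed task results, not just enough to get back
    // under the lookahead limit.
    while let Some(Some(joined)) = pending.next().now_or_never() {
        joined.map_err(|e| e.to_string())??;
    }

    // If we're below the limit, extend the tips; otherwise wait for
    // more of the pending downloads to finish.
    Ok(pending.len() < LOOKAHEAD_LIMIT)
}
```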
* network: add debug instrumentation to retry policy
* sync: instrument the spawned task
* sync: streamline ObtainTips/ExtendTips logic & tracing
This change does three things:
1. It aligns the implementation of ObtainTips and ExtendTips so that they use
the same deduplication method. This means that when debugging we only have one
deduplication algorithm to focus on.
2. It streamlines the tracing output to not include information already
included in spans. Both obtain_tips and extend_tips have their own spans
attached to the events, so it's not necessary to add Scope: prefixes in
messages.
3. It changes the messages to be focused on reporting the actual
events rather than the interpretation of the events (e.g., "got genesis hash in
response" rather than "peer could not extend tip"). The motivation for this
change is that when debugging, the interpretation of events is already known to
be incorrect, in the sense that the mental model of the code (no bug) does not
match its behavior (has bug), so presenting minimally-interpreted events forces
interpretation relative to the actual code.
* sync: hack to work around zcashd behavior
* sync: localize debug statement in extend_tips
* sync: change algorithm to define tips as pairs of hashes.
This is different enough from the existing description that its comments no
longer apply, so I removed them. A further chunk of work is to change the sync
RFC to document this algorithm.
* sync: reduce block timeout
* state: add resource limits for sled
Closes #888
* sync: add a restart timeout constant
* sync: de-pub constants
* network: move gossiped peer selection logic into address book.
* network: return BoxService from init.
* zebrad: add note on why we truncate the gossiped peer list
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* Remove unused .rustfmt.toml
Many of these options are never actually loaded by our CI because of a channel
mismatch, where they're not applied on stable but only on nightly (see the logs
from a rustfmt job). This means that we can get different settings when
running `cargo fmt` on the nightly and stable channels, which was causing a CI
failure on this PR. Reverting back to the default rustfmt settings avoids this
problem and keeps us in line with upstream rustfmt. There's no loss to us
since we were using the defaults anyways.
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* fix: Handle known ObtainTips correctly
enumerate never returns a value beyond the end of the vector.
* fix: Ignore known tips in ExtendTips
Some peers send us known tips when we try to extend.
* fix: Ignore known hashes when downloading
Despite all our other checks, we still end up downloading some hashes
multiple times.
* fix: Increase the number of retries
The old sync code relied on duplicate block fetches to make progress,
but the last few commits have removed some of those duplicates.
Instead, just retry the fetches that fail.
* fix: Tweak comments
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* fix: Cleanup the state_contains interface in Sync
* Fix brackets
Oops
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* make sure no info is printed in non server tests
* check exact full output for validity instead of log msgs
* add end of output character to version regex
* use coercions, use equality operator
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
We have a counter for pending "download and verify" futures. But these
futures are spawned, so they can complete in any order. They can also
complete before we receive their results.
* Split tracing component code into modules.
* Repatriate Tracing and simplify config handling.
We upstreamed our Tracing component, expecting not to have to exert fine
control over the tracing settings. But this turned out not to be the case, and
now that we want to do other things (flamegraphs, journalctl, opentelemetry,
etc), we end up with really awkward code (as in the current flamegraph
handling).
This also makes use of the changes to `init()` to load the config early to pass
configuration data into the components, which avoids the need for the
refactoring in #775.
Finally, we restore support for the `-v` flag when the filter is unset. Closes #831.
* Disable tracing and metrics endpoints by default.
Closes #660.
* Switch back to upstream Abscissa.
* Integrate flamegraph support into the new Tracing component.
* Pass -v in acceptance tests to get info-level output.
* Clean up acceptance test code.
* Setup tracing-flame for use profiling zebrad
* start work on conditional flamegraph generation
* review time!
* update comments
* Update Cargo.toml
* disable default features for inferno
* reorganize
* missing one trait
* Apply suggestions from code review
* graceful shutdown!
* remove special case handling on ctrlc for cleanup
* rename signal fn to better represent its responsibility
* remove unused global hook for flushing flamegraph
* move tracing logic to the right file
* just copy linkerd's signal handling logic
* update book
* make zebrad app drop on shutdown normally
* Update zebrad/src/components/tokio.rs
Co-authored-by: teor <teor@riseup.net>
* Update zebrad/src/application.rs
Co-authored-by: teor <teor@riseup.net>
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
* cleanup a little
* ooh yea there's an API for that
* setup env-filter for backup subscriber
* document env filter
* document return codes
* forgot to save
* Update book/src/applications/zebrad.md
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: teor <teor@riseup.net>
* Load tracing filter only from config and simplify logic.
* Configure the state storage in the config, not an environment variable.
This also changes the config so that the path is always set rather than being
optional, because Zebra always needs a place to store its state.
* add zebrad acceptance tests
* add custom command test helpers that work with kill
* add and use info event for start and seed commands
* combine conflicting tests into one test case
Co-authored-by: Jane Lusby <jane@zfnd.org>
We get the injected TokioComponent dependency before the config is
loaded, so we can't use it to open the endpoints.
And we can't define after_config, because we use derive(Component).
So we work around these issues by opening the endpoints manually,
from the application's after_config.
* make zebra-checkpoints
* fix LOOKAHEAD_LIMIT scope
* add a default cli path
* change doc usage text
* add tracing
* move MAX_CHECKPOINT_HEIGHT_GAP to zebra-consensus
* do byte_reverse_hex in a map
This seems like a better design in principle, but it also appears to give a much
nicer sawtooth pattern of queued blocks in the checkpointer and a much smoother
pattern of block requests.
We had a brief discussion on discord and it seemed like we had consensus on the
following versioning policy:
* zebrad: match major version to NU version, so we will start by releasing
zebrad 3.0.0;
* zebra-* libraries: start by matching zebrad's version, then increment major
versions of each library as we need to make breaking changes (potentially
faster than the zebrad version, always respecting semver but making no
guarantees about the longevity of major releases).
This commit sets all of the crate versions to 3.0.0-alpha.0 -- the -alpha.0
marks it as a prerelease not subject to perfect adherence to compatibility
guarantees.
Closes #697.
per https://github.com/ZcashFoundation/zebra/issues/697#issuecomment-662742971
The response to a getblocks message is an inv message with the hashes of the
following blocks. However, inv messages are also sent unsolicited to gossip new
blocks across the network. Normally, this wouldn't be a problem, because for
every other request we filter only for the messages that are relevant to us.
But because the response to a getblocks message is an inv, the network layer
doesn't (and can't) distinguish between the response inv and the unsolicited
inv.
But there is a mitigation we can do. In our sync algorithm we have two phases:
(1) "ObtainTips" to get a set of tips to chase down, (2) repeatedly call
"ExtendTips" to extend those as far as possible. The unsolicited inv messages
have length 1, but when extending tips we expect to get more than one hash. So
we could reject responses in ExtendTips that have length 1 in order to ignore
these messages. This way we automatically ignore gossip messages during initial
block sync (while we're extending a tip) but we don't ignore length-1 responses
while trying to obtain tips (while querying the network for new tips).
Closes #617.
Closes #698.
The remaining work on the syncer is alluded to in a new comment:
1. Correctly constructing a block locator object
2. Detecting when we've stopped making progress syncing and restarting obtain_tips.
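Returning to the length-1 mitigation described above, the check itself is tiny; a sketch with illustrative names:

```rust
// During ExtendTips, a response containing a single hash is treated as
// unsolicited block gossip rather than an answer to our getblocks.
type Hash = [u8; 32];

fn looks_like_unsolicited_gossip(extend_tips_response: &[Hash]) -> bool {
    extend_tips_response.len() == 1
}
```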
* fix: Resist CheckpointVerifier memory DoS attacks
Allow a maximum of 2 queued blocks at each height, as a tradeoff between
efficient bad block rejection, and memory usage.
Closes #628.
* fix: Make max queued blocks at height equal to fanout
* fix: Just allocate all the capacity upfront
* fix: Use with_capacity(1) and reserve_exact(1)
* Log at warn level for commands that use stdout
* Let zebrad revhex read from stdin
Most unix tools support reading from stdin, so they can be used in
pipelines.
Part of #564.
* Flatten consensus::verify::* to consensus::*
* Move consensus::*::tests into their own files
* Move CheckpointList into its own file
* Move Progress and Target into a types module
QueuedBlock and QueuedBlockList can stay in checkpoint.rs, because
they are tightly coupled to CheckpointVerifier.
* also download tips and filter tips
* dispatch all block downloads together
* tweak to match henry's changes
* switch to more intuitive match
Co-authored-by: Jane Lusby <jane@zfnd.org>
Each subsection has to have `serde(default)` to get the behaviour we want
(delete all fields except the ones that have been changed); otherwise, we can
delete only entire sections.
Prior to this change, the service returned by `zebra_network::init` would spawn background tasks that could silently fail, causing unexpected errors in the zebra_network service.
This change modifies the `PeerSet` that backs `zebra_network::init` to store all of the `JoinHandle`s for each background task it depends on. The `PeerSet` then checks this set of futures to see if any of them have exited with an error or a panic, and if they have it returns the error as part of `poll_ready`.
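A sketch of the background-task check, with illustrative names rather than the real `PeerSet` internals:

```rust
// Keep the JoinHandles of the background tasks and surface any panic,
// error, or unexpected exit from them in poll_ready.
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::stream::{FuturesUnordered, Stream};
use tokio::task::JoinHandle;

type BoxError = Box<dyn std::error::Error + Send + Sync + 'static>;

struct PeerSet {
    guards: FuturesUnordered<JoinHandle<Result<(), BoxError>>>,
    // ... the peer services, demand signal, etc.
}

impl PeerSet {
    /// Called from `poll_ready`: if any background task has finished,
    /// the peer set is broken and the error is reported to the caller.
    fn check_background_errors(&mut self, cx: &mut Context<'_>) -> Result<(), BoxError> {
        match Pin::new(&mut self.guards).poll_next(cx) {
            Poll::Ready(Some(Ok(Ok(())))) => Err("background task exited unexpectedly".into()),
            Poll::Ready(Some(Ok(Err(e)))) => Err(e),
            Poll::Ready(Some(Err(join_error))) => Err(join_error.into()),
            // No tasks have finished (or none are registered yet).
            Poll::Ready(None) | Poll::Pending => Ok(()),
        }
    }
}
```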
* rename zebra-storage to zebra-state
* Setup initial skeleton for zebra-state
* add test
* Apply suggestions from code review
Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>
* move shared test vectors to a common crate
Co-authored-by: Jane Lusby <jane@zfnd.org>
Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>
Co-authored-by: Jane Lusby <jane@zfnd.org>
Prior to this change, the seed subcommand would consistently encounter a panic in one of the background tasks, but would continue running after the panic. This is indicative of two bugs.
First, zebrad was not configured to treat panics as unrecoverable, and instead fell back to tokio's defaults, which are to catch panics in tasks and return them via the join handle if available, or to print them if the join handle has been discarded. This is likely a poor fit for zebrad as an application: we do not need to maximize uptime or minimize the extent of an outage if one of our tasks or services starts panicking. Ignoring a panic increases our risk of observing invalid state, causing all sorts of wild and bad bugs. To deal with this, we've switched the default panic behavior from `unwind` to `abort`. This makes panics fail immediately and take down the entire application, regardless of where they occur, which is consistent with our treatment of misbehaving connections.
The second bug is the panic itself. This was triggered by a duplicate entry in the initial_peers set. To fix this we've switched the storage for the peers from a `Vec` to a `HashSet`, which has similar properties but guarantees uniqueness of its keys.
Bitcoin does this either with `getblocks` (returns up to 500 following block
hashes) or `getheaders` (returns up to 2000 following block headers, not
just hashes). However, Bitcoin headers are much smaller than Zcash
headers, which contain a giant Equihash solution block, and many Zcash
blocks don't have many transactions in them, so the block header is
often similarly sized to the block itself. Because we're
aiming to have a highly parallel network layer, it seems better to use
`getblocks` to implement `FindBlocks` (which is necessarily sequential)
and parallelize the processing of the block downloads.
Attempting to implement requests for block data revealed a problem with
the previous connection logic. Block data is requested by sending a
`getdata` message with hashes of the requested blocks; the peer responds
with a sequence of `block` messages with the blocks themselves.
However, this wasn't possible to handle with the previous connection
logic, which could only convert a single Bitcoin message into a
Response. Instead, we factor out the message handling logic into a
Handler, which can statefully accumulate arbitrary data into a Response
and signal completion. This is still pretty ugly but it does work.
As a side effect, the HeartbeatNonceMismatch error is removed; because
the Handler now tries to process messages until it comes to a Response,
it just ignores mismatched nonces (and will eventually time out).
The previous Mempool and Transaction requests were removed but could be
re-added in a different form later. Also, the `Get` prefixes are
removed from `Request` to tidy the name.
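A simplified sketch of the stateful accumulation idea behind the `Handler` (the real handler tracks much more state):

```rust
// A handler for an outstanding block request keeps the set of hashes
// it is still waiting on, accumulates `block` messages, and signals
// completion by switching to `Finished`.
use std::collections::HashSet;

type Hash = [u8; 32];

struct Block {
    hash: Hash,
    // ... header, transactions, etc.
}

enum Response {
    Blocks(Vec<Block>),
}

enum Handler {
    BlocksByHash {
        pending: HashSet<Hash>,
        blocks: Vec<Block>,
    },
    Finished(Response),
}

impl Handler {
    /// Feed one incoming `block` message into the handler; once the
    /// last outstanding block arrives, the accumulated data becomes a
    /// `Response`.
    fn process_block(self, block: Block) -> Handler {
        match self {
            Handler::BlocksByHash { mut pending, mut blocks } => {
                if pending.remove(&block.hash) {
                    blocks.push(block);
                }
                if pending.is_empty() {
                    Handler::Finished(Response::Blocks(blocks))
                } else {
                    Handler::BlocksByHash { pending, blocks }
                }
            }
            finished => finished,
        }
    }
}
```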
The components are accessed by a lock on application state. When some command
calls block_on to enter an async context, it obtained a write lock on the
entire application state. This meant that if the application state were
accessed later in an async context, a deadlock would occur. Instead the
TokioComponent holds an Option<Runtime> now, so that before calling block_on,
the caller can .take() the runtime and release the lock. Since we only ever
enter an async context once, it's not a problem that the component is then
missing its runtime, as once we are inside of a task we can access the runtime.
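A sketch of that workaround, with illustrative names and the surrounding locking omitted:

```rust
// The component stores Option<Runtime>, so a command can take the
// runtime out (and release the application-state lock) before
// blocking on the async context.
use tokio::runtime::Runtime;

struct TokioComponent {
    rt: Option<Runtime>,
}

fn enter_async(component: &mut TokioComponent) {
    let rt = component
        .rt
        .take()
        .expect("block_on is only called once, at startup");

    rt.block_on(async {
        // Once we're inside a task, the runtime is reachable via the
        // ambient handle, so the component no longer needs to hold it.
    });
}
```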
With a 'Transactions' response that gets turned into an 'Inv(Vec<InventoryHash::Tx>)' message.
We don't yet handle a response from our peer for a 'mempool', which will have to be
a more generic 'Inv' type because we might receive transaction hashes we don't know about yet.
Pertains to #26
Moved SeedService out of the command closure. The command currently spawns
a tokio task to DOS the seed service with `Request::GetPeers` every
second.
Pertains to #54
This splits out the connection handling code into a try_connect closure, which
could be refactored into a Service of its own.
On creation, when we are likely to have very few peers, launch many concurrent
connections to the first few candidates in the initial candidate set, before
continuing to grow the peer set according to demand signals.
The previous implementation failed when timestamps were duplicated between
peers, because there was not a 1-1 relationship between timestamps and peers.
The disconnected_peers() function allows us to prevent duplicate
connections without maintaining shared state between the peerset and the
dial-additional-peers task.
Previously, the TimestampCollector was intended to own the address book
data, so it was intended to be cloneable and hold shared state among all
of its handles. This is now modeled more directly by an
`Arc<Mutex<AddressBook>>`, so the only functionality left in the
`TimestampCollector` is setting up the initial worker, which is better
called `spawn` than `new`.
This also fixes a problem introduced in the previous commit where the
`TimestampCollector` was dropped, causing the worker task to shut down
early.
This allows us to hide the `TimestampCollector` and to expose only the
address book data required by the inbound request service. It also lets
us have a common data structure (the `AddressBook`) for collecting peer
information that can be used to manage information that other peers
report to us.
This was commented out because making the PeerConnector take a TcpStream
meant that the PeerConnector futures couldn't be constructed in the same
way as before, but now that the PeerConnector is Buffer'able, we can
just clone a buffered copy.
* Don't expose submodules of zebra_network::peer.
* PeerSet, PeerDiscover stubs.
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
* Initial work on PeerSet.
This is adapted from the MIT-licensed tower-balance implementation.
* Use PeerSet in the connect stub.
* Fix authorship, license information.
I *thought* I had done a sed pass over the Cargo defaults when doing
repository initialization, but I guess I missed it or something.
Anyways, fixed now.
Add a tower-based peer implementation.
Tower provides middleware for request-response oriented protocols, while Bitcoin/Zcash just send messages which could be interpreted either as requests or responses, depending on context. To bridge this mismatch we define our own internal request/response protocol, and implement a per-peer event loop that scans incoming messages and interprets them either as requests from the remote peer to our node, or as responses to requests we made previously. This is performed by the `PeerService` task, and a corresponding `PeerClient: tower::Service` can send it requests. These tasks are themselves created by a `PeerConnector: tower::Service` which dials a remote peer and performs a handshake.
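A heavily simplified sketch of such an internal protocol; the real `Request` and `Response` enums in `zebra-network` have many more variants and richer types:

```rust
// The per-peer event loop interprets each incoming wire message as
// either a response to a request we sent, a request from the remote
// peer, or something handled internally.
use std::net::SocketAddr;

type Hash = [u8; 32];

/// Requests that either side of the connection can make.
enum Request {
    Peers,
    BlocksByHash(Vec<Hash>),
}

/// Responses that complete a `Request`.
enum Response {
    Nil,
    Peers(Vec<SocketAddr>),
    Blocks(Vec<Vec<u8>>), // serialized blocks, as a stand-in
}

/// What the per-peer event loop decides about each incoming message.
enum Interpretation {
    /// The message answers a request we sent earlier.
    ResponseToUs(Response),
    /// The message is a request from the remote peer to our node.
    RequestFromPeer(Request),
    /// The message is handled internally (e.g. heartbeats).
    Internal,
}
```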
This field is called `services` in Bitcoin and Zcash, but because we use
that word internally for other purposes, calling it `PeerServices`
disambiguates the meaning to "the services advertised by the peer",
rather than, e.g., a `tower::Service`.
Prior to this commit, the tracing endpoint would attempt to bind the
given address or panic; now, if it is unable to bind the given address
it displays an error but continues running the rest of the application.
This means that we can spin up multiple Zebra instances for load
testing.
This provides a significantly cleaner API to consumers, because it
allows using adaptors that convert a TCP stream to a stream of messages,
and potentially allows more efficient message handling.
This avoids some crate selection conflicts, but makes some futures
extension traits fall out of order? This seems to be an issue with
`pin-project` resolved in the git branch of `hyper` (but not yet
released).
An updated tracing-subscriber version changed one of the public types;
because we hardcode the type instead of being generic over S:
Subscriber, this was actually a breaking change. As noted in the
comment adjacent to this line, we would rather be generic over S, but
this requires fixing a bug in abscissa's proc-macros, so in the meantime
we hardcode the type.
* Add a TracingConfig and some components
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
* Restructure, use dependency injection, initialize tracing
* Start a placeholder loop in start command
* Add hyper alpha.1, bump tokio to alpha.4
* Hello world endpoint using async/await from hyper 0.13 alpha
Also cleaned up some linter messages.
Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>
* Update to tracing_subscriber 0.1
* fmt
* add rust-toolchain
* Remove hyper::Version import
* wip: start filter_handler impl
* Add .rustfmt.toml
* rustfmt
* Tidy up .rustfmt.toml
* Add filter reloading handling.
* bump toolchain
* Remove generated hello world acceptance tests.
These test the behaviour of the autogenerated binary and work as examples of
how to test the behaviour of abscissa binaries. Since we don't print "Hello
World" any more, they fail, but we don't yet have replacement behaviour to add
tests for, so they're removed for now.
* Clean up config file handling with Option::and_then.