Cancellation Shields and Scoped Lifestyle Resource Management
Unlike unstructured concurrency, structured concurrency has a cancellation mechanism to stop the work of the task and its children. On cancellation, it synchronously stops the entire task tree down through a shared mechanism for signalling cancellation. This mechanism is cooperative in the sense that the actual cancellation effect and report must be triggered by a check.
Library and Application authors have the freedom to choose how their types react and report on cancellation. Depending on the type of work done, the author can decide where to put the cancellation check and whether to return or discard the partial result. The Foundation
standard library has automatic checks for cancellation in some of its async APIs, like for example in Task.sleep(nanoseconds:)
. We'll be using this in our examples below.
The approach to handling cancellation varies based on the type of work being done. In most cases, we simply want to stop the request or commands. However, in certain situations, you may need to perform clean-up on a resource regardless of whether the task completes successfully or is canceled due to an external or internal failure. But how would you handle this if a parent node cancels the entire task tree while a child node is still performing the necessary clean-up?
The solution is to protect the asynchronous task from being canceled. This is known as a "cancellation shield" or "async shield." In Swift, this involves wrapping the asynchronous function in a Task
and immediately requesting its value:
Task {
await work()
}.value
A shielded task can be grouped with other cancellable tasks inside a parent task. When the parent task is canceled, the cancellation will successfully stop the cancellable child tasks, but the shielded async task will continue to run uninterrupted.
I'd love to dive into an example right away, but bear with me a bit longer so I can present it more clearly. To set things up, let's first discuss scoped resource lifestyle management.
Scoped lifestyle management and with-style methods
When defining "lifestyles" for a resource, a scoped lifestyle refers to creating a single instance of a resource within a clearly defined scope. When the scope ends, so does the resource's lifetime. In Swift, this scoped lifecycle often appears in the form of “with-style” methods.
"with-style" functions provide access to one instance of a resource and do the "cleaning" of the resource when the underlying scope ends. There are many example of these methods in the standard library like the with(Throwing/Discarding)TaskGroup
family, and array.withUnsafeBytes
to name a few.
// Before destroying the TaskGroup, all tasks inside the group must have been awaited.
try await withTaskGroup(...) { group in
// Add task to the group
}
try await withThrowingTaskGroup(...) { group in
// Add task to the group
}
try await withDiscardingTaskGroup(...) { group in
// Add task to the group
}
[1,2,3].withUnsafeBytes { pointer in
// The pointer argument is valid only for the duration of the function's execution.
}
In the server-side ecosystem we find postgresClient.withConnection
from PostgresNIO
.
postgresClient.withConnection { connection in
// Leases a connection for the provided closure's lifetime
}
In all of these methods is wrong to store the resource (pointer, connection or TaskGroup
) for later use.
The with-style methods allow to hide the implementation details of how the clean-up of the resource is done. For instance, imagine a resource, which is not going to cross isolation domains and has a synchronous clean-up method we need to call. Then we can write a simple with-style method as
struct Resource {
func interact() {
// ...
}
func shutdown() {
// ...
}
}
func withResource(
_ body: (Resource) -> Void
) {
let resource = Resource()
defer { resource.shutdown() }
body(resource)
}
withResource { resource in
resource.interact()
}
Things become more interesting when the shutdown process takes a significant amount of time. In such cases, we can use an asynchronous clean-up method. At the moment of writing this, the defer
function only works with synchronous functions. Additionally, we want to ensure that the shutdown always completes successfully, which brings us back to the topic of cancellation shields. Let's explore how this would work in a simple example.
Example
Consider the following Scanner
, which can perform scanning and shutdown operations concurrently
final class Scanner {
func shutdown() async {
var iterator = Array(1...5).reversed().async.makeAsyncIterator()
while let i = await iterator.next() {
try? await Task.sleep(nanoseconds: 1_000_000_000)
print("Shutting down in \(i) seconds...")
}
}
func scan() async {
var iterator = (1...3).async.makeAsyncIterator()
while let _ = await iterator.next() {
try? await Task.sleep(nanoseconds: 1_000_000_000)
print("scanning...")
}
print("finished scanning")
}
}
The scanner scans during 3 seconds before finishing and it takes 5 seconds to shutdown. We introduce a with-style function withScanner
that let us interact with the Scanner
resource and which takes care of the shutdown. This helps us to always call the shutdown after using the scanner without having to write it explicitly each time. Furthermore, we allow this scoped function to be called on arbitrary actor isolated contexts.
func withScanner(
isolation: isolated (any Actor)? = #isolation,
_ execute: (inout sending Scanner) async -> Void
) async throws {
var scanner = Scanner()
await execute(&scanner)
// Cancellation shield
await Task {
await scanner.shutdown()
}.value
}
Notice how we've added a cancellation shield to the asynchronous shutdown to prevent resource leaks. With this in place, imagine starting a scan and then canceling the operation midway through
let task = Task {
try? await withScanner { scanner in
await scanner.scan()
}
}
try? await Task.sleep(nanoseconds: 1_000_000_000)
task.cancel()
print("task canceled")
try? await Task.sleep(nanoseconds: 6_000_000_000)
We get as expected
scanning...
task canceled
finished scanning
Shutting down in 5 seconds...
Shutting down in 4 seconds...
Shutting down in 3 seconds...
Shutting down in 2 seconds...
Shutting down in 1 seconds...
What if we cancel the parent work in the middle of the shutdown?
let task = Task {
try? await withScanner { scanner in
await scanner.scan()
}
}
try? await Task.sleep(nanoseconds: 5_000_000_000)
task.cancel()
print("task canceled")
// Need a barrier so that Task doesn't get out of scope before the test ends
try? await Task.sleep(nanoseconds: 5_000_000_000)
The shutdown will still complete fully until the end!
scanning...
scanning...
scanning...
finished scanning
Shutting down in 5 seconds...
task canceled
Shutting down in 4 seconds...
Shutting down in 3 seconds...
Shutting down in 2 seconds...
Shutting down in 1 seconds...
This can be helpful for services that use the async-http-client, when the shutdown must be triggered manually.
The following approach could be useful when dealing with interactions that may throw errors while accessing the resource.
func withThrowingScanner<ReturnType: Sendable>(
isolation: isolated (any Actor)? = #isolation,
_ execute: (inout sending ThrowingScanner) async throws -> ReturnType
) async throws -> ReturnType {
var scanner = ThrowingScanner()
var result: ReturnType
do {
result = try await execute(&scanner)
} catch {
// Cancellation shield
try await Task {
try await scanner.shutdown()
}.value
throw error
}
try await Task {
try await scanner.shutdown()
}.value
return result
}
For a further look at the isolation aspect of with-style
functions, I'd like to refer you to Franz Busch's talk Leveraging structured concurrency in your applications at Server Side Swift Conference 2024. This talk inspired the writing of this post and the exploration of cancellation shields.
Other Options
The standard library provides a similar function with a cancellation handler, withTaskCancellationHandler(operation:onCancel:isolation:), but its semantics is different. In this case, you're expected to pass an atomic flag (or any other safe concurrent-access value) to the operation
and check it during execution. This differs from the cooperation mechanism with checkCancellation
, as the onCancel
closure is always called immediately when the task is canceled. The purpose of the onCancel
closure is to reverse the flag and stop the operation from running. If cancellation occurs while the operation is in progress, the cancellation handler runs concurrently with the operation.
In this case, we need to modify the scan()
method and introduce a driving flag.
import Atomics
final class AtomicScanner {
func shutdown() async {
var iterator = Array(1...5).reversed().async.makeAsyncIterator()
while let i = await iterator.next() {
try? await Task.sleep(nanoseconds: 1_000_000_000)
print("Shutting down in \(i) seconds...")
}
}
func scan() async {
try? await Task.sleep(nanoseconds: 1_000_000_000)
print("scanning...")
}
}
func withScannerHandler(
isolation: isolated (any Actor)? = #isolation,
_ execute: (AtomicScanner) async -> Void
) async throws -> Void {
let condition = ManagedAtomic<Bool>(true)
await withTaskCancellationHandler {
let scanner = AtomicScanner()
while condition.load(ordering: .relaxed) {
await execute(scanner)
}
print("finished scanning")
await scanner.shutdown()
print("exit")
} onCancel: {
condition.store(false, ordering: .relaxed)
}
}
On cancellation the operation
async function runs with the inverted flag, but it does not spawn new asynchronous work. We can test that with
let task = Task {
try await withScannerHandler { scanner in
await scanner.scan()
}
}
try? await Task.sleep(nanoseconds: 2_000_000_000)
task.cancel()
print("task canceled")
try? await Task.sleep(nanoseconds: 6_000_000_000)
where we get
scanning...
task canceled
scanning...
finished scanning
exit
By adding the cancellation shield to scanner.shutdown
, we can observe the shutdown countdown ocurring as expected.
Conclusion
- The effect of cancellation on task is synchronous and cooperative. Cooperative means that actual cancellation effect and report has to be triggered by a check. We can opt out on reacting to cancellation and continue running the work.
- We've seen how cancellation shields are useful for protecting asynchronous tasks that need to complete even when canceled, such as resource cleanup or proper shutdown procedures.
with-style
methods are effective for managing the lifecycle of resources within a defined scope.