Conversation
Adds support for Batch DML statements. Normally, a Batch DML statement returns an array of update counts containing the exact update count of each statement. A Batch DML statement can be partly executed by Spanner and fail after N statements. In that case, the client returns a specific error that contains the update counts of the first N statements, and an error for the N+1th statement. It is also possible that a Batch DML statement is partly executed, and that the read/write transaction is aborted by Spanner after N statements. This is propagated as a 'normal' error to trigger a transaction retry.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5139 +/- ##
==========================================
+ Coverage 97.90% 97.92% +0.02%
==========================================
Files 214 215 +1
Lines 42809 43228 +419
==========================================
+ Hits 41914 42333 +419
Misses 895 895 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
dbolduc
left a comment
There was a problem hiding this comment.
Mostly nits, but I want you to think more about the return type of execute_batch_update because I am not convinced by it (yet).
| let result = process_response(response); | ||
| let err = result.expect_err("should return error"); | ||
| let batch_err = BatchUpdateError::extract(&err); | ||
| assert!(batch_err.is_none()); |
There was a problem hiding this comment.
nit:
| assert!(batch_err.is_none()); | |
| assert!(batch_err.is_none(), "{batch_err:?}"); |
| assert!(batch_err.is_none()); | ||
| assert_eq!( | ||
| err.status().expect("status").code, | ||
| google_cloud_gax::error::rpc::Code::Aborted |
There was a problem hiding this comment.
nit:
| google_cloud_gax::error::rpc::Code::Aborted | |
| Code::Aborted |
| /// # Ok(()) | ||
| /// # } | ||
| /// ``` | ||
| pub async fn execute_batch_update(&self, batch: BatchDml) -> crate::Result<Vec<i64>> { |
There was a problem hiding this comment.
It's a bit weird to me that there is a Vec<i64> normally, and in the error case.
If update_counts should always be returned, did you consider something like:
pub struct BatchUpdateResponse {
pub update_counts: Vec<i64>,
pub result: crate::Result<()>,
}It's a little clunky, but worth considering. Maybe you are afraid it does not force the application to look at the result field?
Q: So the application can learn something from the update_counts in the success case, and that is why we return it. Is that right?
If not, we could return a std::result::Result<(), BatchUpdateError>. Where the source of the BatchUpdateError might be a crate::Error, with an empty update_counts if there is some gRPC error.
There was a problem hiding this comment.
Q: So the application can learn something from the update_counts in the success case, and that is why we return it. Is that right?
The application can learn something from it in both the success and failure case:
- Success: Verify that the number of rows affected is what you expect. This is often used by query generators and other frameworks (e.g. ORMs) to verify that the update they generated actually updated the number of rows that they expected.
- Failure: In theory, you might want to verify the same as in the success case. But the primary use case is to understand which statement caused the failure.
The way that applications then can determine which statement failed is a bit clunky, but this is actually a relatively standard way of handling this type of errors across many programming languages. This is the standard in Java, .NET, and Python for all database drivers. And to a certain degree also Go, but as Go has multiple return arguments, it is there returned as both a result and an error, instead of putting the update counts in the error.
Maybe you are afraid it does not force the application to look at the result field?
Yes, this is the primary reason why I would prefer not to return a BatchUpdateResponse with a result or status field. I am absolutely sure that there will be applications that get that wrong and think that their batch succeeded when it didn't.
If not, we could return a std::result::Result<(), BatchUpdateError>. Where the source of the BatchUpdateError might be a crate::Error, with an empty update_counts if there is some gRPC error.
I actually considered that. The primary reason I did not choose that was to keep the signature of this method as equal as possible to the other execute_... methods that we have (in this case it would be std::result::Result<Vec<i64>, BatchUpdateError>, as we definitely want to return the update counts in case of success).
One (tiny) additional drawback of that approach is that it is harder for an application to distinguish between a gRPC error and an actual BatchUpdateError that failed for the first statement. Although we could consider making the list of update_counts optional and return None in that case.
There was a problem hiding this comment.
Yes, this is the primary reason why I would prefer not to return a BatchUpdateResponse with a result or status field. I am absolutely sure that there will be applications that get that wrong and think that their batch succeeded when it didn't.
Alright, makes sense to me. I suspected something like this.
| let tx = ReadWriteTransactionBuilder::new(db_client) | ||
| .begin_transaction() | ||
| .await | ||
| .expect("Failed to build transaction"); |
There was a problem hiding this comment.
style nit: we prefer to use ? and have our tests return anyhow::Result<()>
| .add_statement("UPDATE Users SET Name = 'Alice' WHERE Id = 1") | ||
| .add_statement("UPDATE Users SET Name = 'Bob' WHERE Id = 2"); | ||
|
|
||
| let counts = tx.execute_batch_update(batch.build()).await.unwrap(); |
| attempt += 1; | ||
| if attempt == 1 { |
There was a problem hiding this comment.
please use mockall::Sequence with separate mock.expect_foo() calls.
Here is an example:
There was a problem hiding this comment.
Done (missed this one yesterday)
| /// match transaction.execute_batch_update(batch).await { | ||
| /// Ok(update_counts) => { | ||
| /// println!("All statements succeeded. Update counts: {:?}", update_counts); | ||
| /// } | ||
| /// Err(e) => { | ||
| /// if let Some(batch_error) = BatchUpdateError::extract(&e) { | ||
| /// println!( | ||
| /// "Batch failed halfway. Successful update counts before failure: {:?}", | ||
| /// batch_error.update_counts | ||
| /// ); | ||
| /// } else { | ||
| /// println!("The entire batch failed, or an internal error occurred: {}", e); | ||
| /// } | ||
| /// } | ||
| /// } |
There was a problem hiding this comment.
I think this would look like:
let resp = transaction.execute_batch_update(batch).await;
match resp.result {
Ok(()) => println!("All statements succeeded. Update counts: {:?}", resp.update_counts),
Err(e) => println!("Batch failed. Successful update counts before failure: {:?}", resp.update_counts),
}(The case where update_counts.is_empty() does not seem like a special case to me 🤷)
| /// # Ok(()) | ||
| /// # } | ||
| /// ``` | ||
| pub async fn execute_batch_update(&self, batch: BatchDml) -> crate::Result<Vec<i64>> { |
There was a problem hiding this comment.
Yes, this is the primary reason why I would prefer not to return a BatchUpdateResponse with a result or status field. I am absolutely sure that there will be applications that get that wrong and think that their batch succeeded when it didn't.
Alright, makes sense to me. I suspected something like this.
Adds support for Batch DML statements. Normally, a Batch DML statement returns an array of update counts containing the exact update count of each statement.
A Batch DML statement can be partly executed by Spanner and fail after N statements. In that case, the client returns a specific error that contains the update counts of the first N statements, and an error for the N+1th statement.
It is also possible that a Batch DML statement is partly executed, and that the read/write transaction is aborted by Spanner after N statements. This is propagated as a 'normal' error to trigger a transaction retry.