Fix uploading large amounts of data #85
Following AWS S3's increased single-object size limit, allow the upload of data > 5 GB
This fixes the following error encountered when uploading large files: ERROR: ErrorMalformedXML Message: The XML you provided was not well-formed or did not validate against our published schema. The error appears after a previous patch increased the size limit for a single object, which can cause more than one growbuffer to be allocated to hold the XML.
Increasing the chunk size avoids an issue where too many parts are uploaded, which triggers an error when calling the S3 CompleteMultipartUpload API command. The current limit is 10,000 parts per upload according to the S3 documentation (https://docs.aws.amazon.com/AmazonS3/latest/dev/qfacts.html).
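As a rough illustration of the arithmetic (this is not code from the patch; the constant and function names here are hypothetical), a part size can be chosen so that even a 5 TB object stays within the 10,000-part limit:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical constants for illustration only; the names and exact values
 * used by s3.c may differ. */
#define MAX_OBJECT_SIZE_BYTES (5ULL * 1024 * 1024 * 1024 * 1024)  /* 5 TB object limit */
#define MAX_UPLOAD_PARTS      10000ULL                            /* S3 parts-per-upload limit */
#define MIN_PART_SIZE_BYTES   (5ULL * 1024 * 1024)                /* S3 minimum part size */

/* Smallest part (chunk) size that keeps contentLength within
 * MAX_UPLOAD_PARTS parts. */
static uint64_t choose_part_size(uint64_t contentLength)
{
    /* Ceiling division: bytes per part needed to stay at or below the limit. */
    uint64_t needed = (contentLength + MAX_UPLOAD_PARTS - 1) / MAX_UPLOAD_PARTS;
    return (needed > MIN_PART_SIZE_BYTES) ? needed : MIN_PART_SIZE_BYTES;
}

int main(void)
{
    /* For a 5 TB upload this prints 549755814, i.e. each part must be
     * roughly 525 MiB or larger. */
    printf("%llu\n", (unsigned long long) choose_part_size(MAX_OBJECT_SIZE_BYTES));
    return 0;
}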
Rejecting due to no response to my question.

@bji There is no question to respond to, unless it is visible only to your account.

OK sorry, not sure why the question I asked is not visible. I will try to make it so.
static int growbuffer_append(growbuffer **gb, const char *data, int dataLen)
{
    int toCopy = 0;
    int origDataLen = dataLen;
I don't quite understand what this fixes. Unless the function comment is wrong, this function only needs to return nonzero if it completed the copy successfully; it doesn't need to return the number of bytes actually copied.
Before your change, each iteration of the while loop ran while dataLen was larger than 0, which meant toCopy was always assigned a value greater than 0 on line 451, so when the loop terminated the function always returned the nonzero value that toCopy most recently held.
Your change doesn't materially change this fact; it just changes what the actual value is (instead of being the size of the most recent copy, it's the original size of the data to be copied). But according to the function comments, it shouldn't matter which exact value this function returns.
So what is the point of this change?
The return value for this function is used by s3.c::put_object() when completing a multipart upload. Perhaps the function comment should be updated.
The value that was being returned before was too small, causing the XML string to be truncated when it was sent to AWS. This fixes that issue.
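For context, here is a minimal sketch of the idea behind the fix, assuming a simplified growbuffer; the real struct and bookkeeping in s3.c differ, so this is an illustration rather than the patched code:

#include <stdlib.h>
#include <string.h>

/* Simplified growbuffer assumed for this sketch; the real struct in s3.c
 * uses different fields and maintains the chain differently. */
typedef struct growbuffer
{
    char data[64 * 1024];
    int size;                       /* bytes used in data[] */
    struct growbuffer *next;
} growbuffer;

static int growbuffer_append(growbuffer **gb, const char *data, int dataLen)
{
    int origDataLen = dataLen;

    while (dataLen > 0) {
        /* Walk to the last buffer in the chain (a linear walk keeps the
         * sketch simple). */
        growbuffer *buf = *gb;
        while (buf && buf->next) {
            buf = buf->next;
        }
        if (!buf || buf->size == (int) sizeof(buf->data)) {
            growbuffer *fresh = (growbuffer *) calloc(1, sizeof(growbuffer));
            if (!fresh) {
                return 0;           /* allocation failure */
            }
            if (buf) {
                buf->next = fresh;
            }
            else {
                *gb = fresh;
            }
            buf = fresh;
        }
        int toCopy = (int) sizeof(buf->data) - buf->size;
        if (toCopy > dataLen) {
            toCopy = dataLen;
        }
        memcpy(buf->data + buf->size, data, toCopy);
        buf->size += toCopy;
        data += toCopy;
        dataLen -= toCopy;
    }

    /* Returning origDataLen reports the total number of bytes appended;
     * returning only the last toCopy would report just the final partial
     * copy. */
    return origDataLen;
}

Either way the return value is nonzero on success, but only the total length lets a caller such as put_object() know how much XML it has actually accumulated.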
I clicked the 'begin review' button, which I guess I neglected to click before. Can you see my question now?

Yes, thank you. I have replied to your question above.

Yes, multipart upload was added by another contributor some time after that function was written, and the comment is out of date. Thanks for your change.

Oops, wrong account.
When uploading large amounts of data (> 5 GB), libs3 refuses the upload with an error. This pull request increases the single-object limit to 5 TB, which brings it in line with S3's object limit.
After that limit was increased, a new bug was discovered in growbuffer_append(): the XML produced by put_object() was truncated whenever more than one growbuffer was allocated.
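As a usage note, the caller pattern the return value supports looks roughly like this (hypothetical code reusing the growbuffer_append() sketch above, not the actual put_object() implementation):

/* Hypothetical caller: track the full length of the XML being built. */
growbuffer *xml = 0;
int xmlLen = 0;

const char *tag = "<CompleteMultipartUpload>";
int appended = growbuffer_append(&xml, tag, (int) strlen(tag));
if (!appended) {
    /* handle allocation failure */
}
xmlLen += appended;   /* with the fix, this is the total appended length */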
If there are any questions about this pull request, please let me know.