Skip to content

Conversation

@meinemitternacht
Copy link
Contributor

When uploading large amounts of data (> 5 GB), libs3 refuses the upload with an error. This pull request increases the single-object limit to 5 TB, which brings it in line with S3's object limit.

After increasing that limit, a new bug was discovered in growbuffer_append(). The XML produced by put_object() was being truncated if more than one growbuffer was allocated.

If there are any questions about this pull request, please let me know.

Following AWS S3's increased single-object size limit, allow the upload of data > 5 GB
This fixes the following error encountered when uploading large files:

ERROR: ErrorMalformedXML
  Message: The XML you provided was not well-formed or did not
  validate against our published schema.

This becomes a problem after a previous patch increases the size limit for a single object, causing more than one growbuffer to be allocated for holding the XML.
Increasing the chunk size avoids an issue where too many parts are uploaded, triggering an error when calling the S3 CompleteMultipartUpload API command.  The current limit on the number of parts per upload is 10,000 according to the S3 documentation (https://docs.aws.amazon.com/AmazonS3/latest/dev/qfacts.html)
@bji
Copy link
Owner

bji commented Apr 7, 2019

Rejecting due to no response to my question.

@bji bji closed this Apr 7, 2019
@meinemitternacht
Copy link
Contributor Author

meinemitternacht commented Apr 7, 2019

@bji There is no question to respond to, unless it is visible only to your account.

@bji bji reopened this Apr 8, 2019
@bji
Copy link
Owner

bji commented Apr 8, 2019

OK sorry, not sure why the question I asked is not visible. I will try to make it so.

static int growbuffer_append(growbuffer **gb, const char *data, int dataLen)
{
int toCopy = 0 ;
int origDataLen = dataLen;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand what this fixes. Unless the function comment is wrong, this function only needs to return nonzero if it completed the copy successfully, it doesn't need to return the number of bytes actually copied.

Before your change, it would always run each iteration of the while loop while dataLen was larger than 0, which meant that toCopy would always be assigned to a value greater than 0 on line 451, which means that when the loop terminated, it would always return the nonzero value that toCopy most recently held.

Your change doesn't materially change this fact, it just changes what the actual value is (instead of being the size of the most recent copy, it's the original size of the data to be copied). But it shouldn't matter which exact value was returned from this function according to the function comments.

So what is the point of this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value for this function is used by s3.c::put_object() when completing a multipart upload. Perhaps the function comment should be updated.

Copy link
Contributor Author

@meinemitternacht meinemitternacht Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value that was being returned before was too small, causing the XML string to be truncated when it was sent to AWS. This fixes that issue.

@bji
Copy link
Owner

bji commented Apr 8, 2019

I clicked the 'begin review' button which I guess I neglected to click before. Can you see my question now?

@meinemitternacht
Copy link
Contributor Author

Yes, thank you. I have replied to your question above.

@bjitivo
Copy link

bjitivo commented Apr 9, 2019

Yes, multipart upload was added by another contributor some time after that function was made and the comment is out of date. Thanks for your change.

@bjitivo
Copy link

bjitivo commented Apr 9, 2019

Oops wrong account.

@bji bji merged commit 287e4be into bji:master Apr 9, 2019
@meinemitternacht meinemitternacht deleted the meinemitternacht-patch-1 branch April 10, 2019 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants