
Content-Type for files uploaded via S3 automatically set to application/xml #1840

Open
@westonpace

Description


Describe the bug

When I upload a file to S3 (using a multipart upload request), the content-type of the file is set to application/xml unless I specify otherwise. This seems incorrect: a content-type should be omitted if it is unknown or, at worst, default to application/octet-stream. Per RFC 7231, Section 3.1.1.5:

A sender that generates a message containing a payload body SHOULD
generate a Content-Type header field in that message unless the
intended media type of the enclosed representation is unknown to the
sender. If a Content-Type header field is not present, the recipient
MAY either assume a media type of "application/octet-stream"
([RFC2046], Section 4.5.1) or examine the data to determine its type.

This ended up causing some confusion in apache/arrow#11934. An S3 client was trying to be intelligent by inspecting the XML data whenever a file was reported as an XML file, and because of this issue the client inspected files it shouldn't have.

Expected behavior

If the content type of a file is not set, the file should either have no content-type or have its content-type set to application/octet-stream.

Current behavior

The file's content-type is set to application/xml.

Steps to Reproduce

Reproducible Gist: https://gist.github.com/westonpace/9c3a0baa48083f33aa4880c0cb6a602b

Possible Solution

When the user does not specify a content-type, either leave it unset or default to application/octet-stream.

AWS CPP SDK version used

1.8.185

Compiler and Version used

GCC 9.3.0

Operating System and version

Ubuntu 20.04.3

Metadata


Assignees

No one assigned

    Labels

    documentation (This is a problem with documentation.)
    p3 (This is a minor priority issue)
