Chris and I had a short discussion last week about what the default content type for Confluence attachments should be. On occasion, we have data uploaded by a user with no content type information provided. Chris had already implemented the fix to serve this data as application/octet-stream
. I was worried that the default should be text/plain
, based on something I could remember reading but couldn't put my finger on. I decided to have a look further into the situation tonight.
My first stop was the MIME Wikipedia article, because I knew the HTTP Content-Type header was derived from the same header used in email messages. It led me to the MIME RFC which states the default content type for MIME messages is text/plain
:
Default RFC 822 messages without a MIME Content-Type header are taken by this protocol to be plain text in the US-ASCII character set, which can be explicitly specified as:
Content-type: text/plain; charset=us-ascii
This default is assumed if no Content-Type header field is specified.
Of course, that doesn't mean the default for HTTP would be the same. HTTP is usually used for transmission of HTML data — it is the Hyper Text Transfer Protocol after all — so a different default might make sense. The HTTP spec actually does have quite a different standard. Strangely, it appears as almost a footnote, completely separate to the discussion of the Content-Type header itself:
Any HTTP/1.1 message containing an entity-body SHOULD include a Content-Type header field defining the media type of that body. If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource. If the media type remains unknown, the recipient SHOULD treat it as type "application/octet-stream".
Since we should provide a content type and the media type is also unknown to us, it turns out that application/octet-stream
is exactly the right thing for Confluence to do. We definitely don't want to delve into the perils of Content-Type sniffing.