19 March 2008

Default Content-Type in HTTP

Chris and I had a short discussion last week about what the default content type for Confluence attachments should be. On occasion, a user uploads data with no content type information provided. Chris had already implemented a fix that serves this data as application/octet-stream. I was worried that the default should be text/plain, based on something I remembered reading but couldn't put my finger on. I decided to look further into it tonight.

My first stop was the MIME Wikipedia article, because I knew the HTTP Content-Type header was derived from the same header used in email messages. It led me to the MIME RFC, which states that the default content type for MIME messages is text/plain:

Default RFC 822 messages without a MIME Content-Type header are taken by this protocol to be plain text in the US-ASCII character set, which can be explicitly specified as:

Content-type: text/plain; charset=us-ascii

This default is assumed if no Content-Type header field is specified.
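
As a quick illustration of that rule, Python's standard email package applies the same default when it parses a message that has no Content-Type header. This is only a sketch: the message text and the address in it are made up for the example.

```python
from email import message_from_string

# A minimal RFC 822-style message with no Content-Type header.
# The address and subject are invented for this example.
raw_message = (
    "From: someone@example.com\n"
    "Subject: No content type here\n"
    "\n"
    "Just some text.\n"
)

message = message_from_string(raw_message)

# Header lookup returns None when the header is absent.
has_header = message["Content-Type"] is not None

# get_content_type() falls back to the MIME default when the header is missing.
effective_type = message.get_content_type()

print(has_header, effective_type)
```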

Of course, that doesn't mean the default for HTTP would be the same. HTTP is usually used for transmission of HTML data (it is the Hypertext Transfer Protocol, after all), so a different default might make sense. The HTTP spec does in fact specify quite a different default. Strangely, it appears almost as a footnote, completely separate from the discussion of the Content-Type header itself:

Any HTTP/1.1 message containing an entity-body SHOULD include a Content-Type header field defining the media type of that body. If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource. If the media type remains unknown, the recipient SHOULD treat it as type "application/octet-stream".

Since we should always provide a content type, and the media type is unknown to us as well, serving application/octet-stream turns out to be exactly the right thing for Confluence to do. We definitely don't want to delve into the perils of Content-Type sniffing.
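
The server-side rule is simple enough to sketch in a few lines. This is not Confluence's actual code (which I haven't reproduced here); the function name and the idea of a stored, possibly missing content type per attachment are assumptions made for illustration.

```python
from typing import Optional

# Fallback recommended by the HTTP/1.1 spec when the media type is unknown.
DEFAULT_CONTENT_TYPE = "application/octet-stream"


def resolve_content_type(stored_type: Optional[str]) -> str:
    """Return the Content-Type to send for an attachment.

    `stored_type` is a hypothetical value recorded at upload time; it may be
    None or blank when the uploader supplied no content type.
    """
    if stored_type is None:
        return DEFAULT_CONTENT_TYPE
    cleaned = stored_type.strip()
    # Treat an empty or whitespace-only value the same as a missing one.
    # Deliberately no sniffing of the content or the file name here.
    return cleaned if cleaned else DEFAULT_CONTENT_TYPE
```

With this, resolve_content_type(None) and resolve_content_type("") both give application/octet-stream, while a supplied value such as image/png is passed through unchanged.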