Network Working Group N. Freed
Request for Comments: 2046 Innosoft
Obsoletes: 1521, 1522, 1590 N. Borenstein
Category: Standards Track First Virtual
November 1996


Multipurpose Internet Mail Extensions
(MIME) Part Two:
Media Types

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Abstract

STD 11, RFC 822 defines a message representation protocol specifying
considerable detail about US-ASCII message headers, but which leaves
the message content, or message body, as flat US-ASCII text. This
set of documents, collectively called the Multipurpose Internet Mail
Extensions, or MIME, redefines the format of messages to allow for

(1) textual message bodies in character sets other than
US-ASCII,

(2) an extensible set of different formats for non-textual
message bodies,

(3) multi-part message bodies, and

(4) textual header information in character sets other than
US-ASCII.

These documents are based on earlier work documented in RFC 934, STD
11, and RFC 1049, but extends and revises them. Because RFC 822 said
so little about message bodies, these documents are largely
orthogonal to (rather than a revision of) RFC 822.

The initial document in this set, RFC 2045, specifies the various
headers used to describe the structure of MIME messages. This second
document defines the general structure of the MIME media typing
system and defines an initial set of media types. The third document,
RFC 2047, describes extensions to RFC 822 to allow non-US-ASCII text

Freed & Borenstein Standards Track [Page 1]



RFC 2046 Media Types November 1996


data in Internet mail header fields. The fourth document, RFC 2048,
specifies various IANA registration procedures for MIME-related
facilities. The fifth and final document, RFC 2049, describes MIME
conformance criteria as well as providing some illustrative examples
of MIME message formats, acknowledgements, and the bibliography.

These documents are revisions of RFCs 1521 and 1522, which themselves
were revisions of RFCs 1341 and 1342. An appendix in RFC 2049
describes differences and changes from previous versions.

Table of Contents

1. Introduction ......................................... 3
2. Definition of a Top-Level Media Type ................. 4
3. Overview Of The Initial Top-Level Media Types ........ 4
4. Discrete Media Type Values ........................... 6
4.1 Text Media Type ..................................... 6
4.1.1 Representation of Line Breaks ..................... 7
4.1.2 Charset Parameter ................................. 7
4.1.3 Plain Subtype ..................................... 11
4.1.4 Unrecognized Subtypes ............................. 11
4.2 Image Media Type .................................... 11
4.3 Audio Media Type .................................... 11
4.4 Video Media Type .................................... 12
4.5 Application Media Type .............................. 12
4.5.1 Octet-Stream Subtype .............................. 13
4.5.2 PostScript Subtype ................................ 14
4.5.3 Other Application Subtypes ........................ 17
5. Composite Media Type Values .......................... 17
5.1 Multipart Media Type ................................ 17
5.1.1 Common Syntax ..................................... 19
5.1.2 Handling Nested Messages and Multiparts ........... 24
5.1.3 Mixed Subtype ..................................... 24
5.1.4 Alternative Subtype ............................... 24
5.1.5 Digest Subtype .................................... 26
5.1.6 Parallel Subtype .................................. 27
5.1.7 Other Multipart Subtypes .......................... 28
5.2 Message Media Type .................................. 28
5.2.1 RFC822 Subtype .................................... 28
5.2.2 Partial Subtype ................................... 29
5.2.2.1 Message Fragmentation and Reassembly ............ 30
5.2.2.2 Fragmentation and Reassembly Example ............ 31
5.2.3 External-Body Subtype ............................. 33
5.2.4 Other Message Subtypes ............................ 40
6. Experimental Media Type Values ....................... 40
7. Summary .............................................. 41
8. Security Considerations .............................. 41
9. Authors' Addresses ................................... 42

Freed & Borenstein Standards Track [Page 2]



RFC 2046 Media Types November 1996


A. Collected Grammar .................................... 43

1. Introduction

The first document in this set, RFC 2045, defines a number of header
fields, including Content-Type. The Content-Type field is used to
specify the nature of the data in the body of a MIME entity, by
giving media type and subtype identifiers, and by providing auxiliary
information that may be required for certain media types. After the
type and subtype names, the remainder of the header field is simply a
set of parameters, specified in an attribute/value notation. The
ordering of parameters is not significant.

In general, the top-level media type is used to declare the general
type of data, while the subtype specifies a specific format for that
type of data. Thus, a media type of "image/xyz" is enough to tell a
user agent that the data is an image, even if the user agent has no
knowledge of the specific image format "xyz". Such information can
be used, for example, to decide whether or not to show a user the raw
data from an unrecognized subtype -- such an action might be
reasonable for unrecognized subtypes of "text", but not for
unrecognized subtypes of "image" or "audio". For this reason,
registered subtypes of "text", "image", "audio", and "video" should
not contain embedded information that is really of a different type.
Such compound formats should be represented using the "multipart" or
"application" types.

Parameters are modifiers of the media subtype, and as such do not
fundamentally affect the nature of the content. The set of
meaningful parameters depends on the media type and subtype. Most
parameters are associated with a single specific subtype. However, a
given top-level media type may define parameters which are applicable
to any subtype of that type. Parameters may be required by their
defining media type or subtype or they may be optional. MIME
implementations must also ignore any parameters whose names they do
not recognize.

MIME's Content-Type header field and media type mechanism has been
carefully designed to be extensible, and it is expected that the set
of media type/subtype pairs and their associated parameters will grow
significantly over time. Several other MIME facilities, such as
transfer encodings and "message/external-body" access types, are
likely to have new values defined over time. In order to ensure that
the set of such values is developed in an orderly, well-specified,
and public manner, MIME sets up a registration process which uses the
Internet Assigned Numbers Authority (IANA) as a central registry for
MIME's various areas of extensibility. The registration process for
these areas is described in a companion document, RFC 2048.

Freed & Borenstein Standards Track [Page 3]



RFC 2046 Media Types November 1996


The initial seven standard top-level media type are defined and
described in the remainder of this document.

2. Definition of a Top-Level Media Type

The definition of a top-level media type consists of:

(1) a name and a description of the type, including
criteria for whether a particular type would qualify
under that type,

(2) the names and definitions of parameters, if any, which
are defined for all subtypes of that type (including
whether such parameters are required or optional),

(3) how a user agent and/or gateway should handle unknown
subtypes of this type,

(4) general considerations on gatewaying entities of this
top-level type, if any, and

(5) any restrictions on content-transfer-encodings for
entities of this top-level type.

3. Overview Of The Initial Top-Level Media Types

The five discrete top-level media types are:

(1) text -- textual information. The subtype "plain" in
particular indicates plain text containing no
formatting commands or directives of any sort. Plain
text is intended to be displayed "as-is". No special
software is required to get the full meaning of the
text, aside from support for the indicated character
set. Other subtypes are to be used for enriched text in
forms where application software may enhance the
appearance of the text, but such software must not be
required in order to get the general idea of the
content. Possible subtypes of "text" thus include any
word processor format that can be read without
resorting to software that understands the format. In
particular, formats that employ embeddded binary
formatting information are not considered directly
readable. A very simple and portable subtype,
"richtext", was defined in RFC 1341, with a further
revision in RFC 1896 under the name "enriched".

Freed & Borenstein Standards Track [Page 4]



RFC 2046 Media Types November 1996


(2) image -- image data. "Image" requires a display device
(such as a graphical display, a graphics printer, or a
FAX machine) to view the information. An initial
subtype is defined for the widely-used image format
JPEG. . subtypes are defined for two widely-used image
formats, jpeg and gif.

(3) audio -- audio data. "Audio" requires an audio output
device (such as a speaker or a telephone) to "display"
the contents. An initial subtype "basic" is defined in
this document.

(4) video -- video data. "Video" requires the capability
to display moving images, typically including
specialized hardware and software. An initial subtype
"mpeg" is defined in this document.

(5) application -- some other kind of data, typically
either uninterpreted binary data or information to be
processed by an application. The subtype "octet-
stream" is to be used in the case of uninterpreted
binary data, in which case the simplest recommended
action is to offer to write the information into a file
for the user. The "PostScript" subtype is also defined
for the transport of PostScript material. Other
expected uses for "application" include spreadsheets,
data for mail-based scheduling systems, and languages
for "active" (computational) messaging, and word
processing formats that are not directly readable.
Note that security considerations may exist for some
types of application data, most notably
"application/PostScript" and any form of active
messaging. These issues are discussed later in this
document.

The two composite top-level media types are:

(1) multipart -- data consisting of multiple entities of
independent data types. Four subtypes are initially
defined, including the basic "mixed" subtype specifying
a generic mixed set of parts, "alternative" for
representing the same data in multiple formats,
"parallel" for parts intended to be viewed
simultaneously, and "digest" for multipart entities in
which each part has a default type of "message/rfc822".


Freed & Borenstein Standards Track [Page 5]



RFC 2046 Media Types November 1996


(2) message -- an encapsulated message. A body of media
type "message" is itself all or a portion of some kind
of message object. Such objects may or may not in turn
contain other entities. The "rfc822" subtype is used
when the encapsulated content is itself an RFC 822
message. The "partial" subtype is defined for partial
RFC 822 messages, to permit the fragmented transmission
of bodies that are thought to be too large to be passed
through transport facilities in one piece. Another
subtype, "external-body", is defined for specifying
large bodies by reference to an external data source.

It should be noted that the list of media type values given here may
be augmented in time, via the mechanisms described above, and that
the set of subtypes is expected to grow substantially.

4. Discrete Media Type Values

Five of the seven initial media type values refer to discrete bodies.
The content of these types must be handled by non-MIME mechanisms;
they are opaque to MIME processors.

4.1. Text Media Type

The "text" media type is intended for sending material which is
principally textual in form. A "charset" parameter may be used to
indicate the character set of the body text for "text" subtypes,
notably including the subtype "text/plain", which is a generic
subtype for plain text. Plain text does not provide for or allow
formatting commands, font attribute specifications, processing
instructions, interpretation directives, or content markup. Plain
text is seen simply as a linear sequence of characters, possibly
interrupted by line breaks or page breaks. Plain text may allow the
stacking of several characters in the same position in the text.
Plain text in scripts like Arabic and Hebrew may also include
facilitites that allow the arbitrary mixing of text segments with
opposite writing directions.

Beyond plain text, there are many formats for representing what might
be known as "rich text". An interesting characteristic of many such
representations is that they are to some extent readable even without
the software that interprets them. It is useful, then, to
distinguish them, at the highest level, from such unreadable data as
images, audio, or text represented in an unreadable form. In the
absence of appropriate interpretation software, it is reasonable to
show subtypes of "text" to the user, while it is not reasonable to do
so with most nontextual data. Such formatted textual data should be
represented using subtypes of "text".

Freed & Borenstein Standards Track [Page 6]



RFC 2046 Media Types November 1996


4.1.1. Representation of Line Breaks

The canonical form of any MIME "text" subtype MUST always represent a
line break as a CRLF sequence. Similarly, any occurrence of CRLF in
MIME "text" MUST represent a line break. Use of CR and LF outside of
line break sequences is also forbidden.

This rule applies regardless of format or character set or sets
involved.

NOTE: The proper interpretation of line breaks when a body is
displayed depends on the media type. In particular, while it is
appropriate to treat a line break as a transition to a new line when
displaying a "text/plain" body, this treatment is actually incorrect
for other subtypes of "text" like "text/enriched" [RFC-1896].
Similarly, whether or not line breaks should be added during display
operations is also a function of the media type. It should not be
necessary to add any line breaks to display "text/plain" correctly,
whereas proper display of "text/enriched" requires the appropriate
addition of line breaks.

NOTE: Some protocols defines a maximum line length. E.g. SMTP [RFC-
821] allows a maximum of 998 octets before the next CRLF sequence.
To be transported by such protocols, data which includes too long
segments without CRLF sequences must be encoded with a suitable
content-transfer-encoding.

4.1.2. Charset Parameter

A critical parameter that may be specified in the Content-Type field
for "text/plain" data is the character set. This is specified with a
"charset" parameter, as in:

Content-type: text/plain; charset=iso-8859-1

Unlike some other parameter values, the values of the charset
parameter are NOT case sensitive. The default character set, which
must be assumed in the absence of a charset parameter, is US-ASCII.

The specification for any future subtypes of "text" must specify
whether or not they will also utilize a "charset" parameter, and may
possibly restrict its values as well. For other subtypes of "text"
than "text/plain", the semantics of the "charset" parameter should be
defined to be identical to those specified here for "text/plain",
i.e., the body consists entirely of characters in the given charset.
In particular, definers of future "text" subtypes should pay close
attention to the implications of multioctet character sets for their
subtype definitions.

Freed & Borenstein Standards Track [Page 7]



RFC 2046 Media Types November 1996


The charset parameter for subtypes of "text" gives a name of a
character set, as "character set" is defined in RFC 2045. The rules
regarding line breaks detailed in the previous section must also be
observed -- a character set whose definition does not conform to
these rules cannot be used in a MIME "text" subtype.

An initial list of predefined character set names can be found at the
end of this section. Additional character sets may be registered
with IANA.

Other media types than subtypes of "text" might choose to employ the
charset parameter as defined here, but with the CRLF/line break
restriction removed. Therefore, all character sets that conform to
the general definition of "character set" in RFC 2045 can be
registered for MIME use.

Note that if the specified character set includes 8-bit characters
and such characters are used in the body, a Content-Transfer-Encoding
header field and a corresponding encoding on the data are required in
order to transmit the body via some mail transfer protocols, such as
SMTP [RFC-821].

The default character set, US-ASCII, has been the subject of some
confusion and ambiguity in the past. Not only were there some
ambiguities in the definition, there have been wide variations in
practice. In order to eliminate such ambiguity and variations in the
future, it is strongly recommended that new user agents explicitly
specify a character set as a media type parameter in the Content-Type
header field. "US-ASCII" does not indicate an arbitrary 7-bit
character set, but specifies that all octets in the body must be
interpreted as characters according to the US-ASCII character set.
National and application-oriented versions of ISO 646 [ISO-646] are
usually NOT identical to US-ASCII, and in that case their use in
Internet mail is explicitly discouraged. The omission of the ISO 646
character set from this document is deliberate in this regard. The
character set name of "US-ASCII" explicitly refers to the character
set defined in ANSI X3.4-1986 [US- ASCII]. The new international
reference version (IRV) of the 1991 edition of ISO 646 is identical
to US-ASCII. The character set name "ASCII" is reserved and must not
be used for any purpose.

NOTE: RFC 821 explicitly specifies "ASCII", and references an earlier
version of the American Standard. Insofar as one of the purposes of
specifying a media type and character set is to permit the receiver
to unambiguously determine how the sender intended the coded message
to be interpreted, assuming anything other than "strict ASCII" as the
default would risk unintentional and incompatible changes to the
semantics of messages now being transmitted. This also implies that

Freed & Borenstein Standards Track [Page 8]



RFC 2046 Media Types November 1996


messages containing characters coded according to other versions of
ISO 646 than US-ASCII and the 1991 IRV, or using code-switching
procedures (e.g., those of ISO 2022), as well as 8bit or multiple
octet character encodings MUST use an appropriate character set
specification to be consistent with MIME.

The complete US-ASCII character set is listed in ANSI X3.4- 1986.
Note that the control characters including DEL (0-31, 127) have no
defined meaning in apart from the combination CRLF (US-ASCII values
13 and 10) indicating a new line. Two of the characters have de
facto meanings in wide use: FF (12) often means "start subsequent
text on the beginning of a new page"; and TAB or HT (9) often (though
not always) means "move the cursor to the next available column after
the current position where the column number is a multiple of 8
(counting the first column as column 0)." Aside from these
conventions, any use of the control characters or DEL in a body must
either occur

(1) because a subtype of text other than "plain"
specifically assigns some additional meaning, or

(2) within the context of a private agreement between the
sender and recipient. Such private agreements are
discouraged and should be replaced by the other
capabilities of this document.

NOTE: An enormous proliferation of character sets exist beyond US-
ASCII. A large number of partially or totally overlapping character
sets is NOT a good thing. A SINGLE character set that can be used
universally for representing all of the world's languages in Internet
mail would be preferrable. Unfortunately, existing practice in
several communities seems to point to the continued use of multiple
character sets in the near future. A small number of standard
character sets are, therefore, defined for Internet use in this
document.

The defined charset values are:

(1) US-ASCII -- as defined in ANSI X3.4-1986 [US-ASCII].

(2) ISO-8859-X -- where "X" is to be replaced, as
necessary, for the parts of ISO-8859 [ISO-8859]. Note
that the ISO 646 character sets have deliberately been
omitted in favor of their 8859 replacements, which are
the designated character sets for Internet mail. As of
the publication of this document, the legitimate values
for "X" are the digits 1 through 10.


Freed & Borenstein Standards Track [Page 9]



RFC 2046 Media Types November 1996


Characters in the range 128-159 has no assigned meaning in ISO-8859-
X. Characters with values below 128 in ISO-8859-X have the same
assigned meaning as they do in US-ASCII.

Part 6 of ISO 8859 (Latin/Arabic alphabet) and part 8 (Latin/Hebrew
alphabet) includes both characters for which the normal writing
direction is right to left and characters for which it is left to
right, but do not define a canonical ordering method for representing
bi-directional text. The charset values "ISO-8859-6" and "ISO-8859-
8", however, specify that the visual method is used [RFC-1556].

All of these character sets are used as pure 7bit or 8bit sets
without any shift or escape functions. The meaning of shift and
escape sequences in these character sets is not defined.

The character sets specified above are the ones that were relatively
uncontroversial during the drafting of MIME. This document does not
endorse the use of any particular character set other than US-ASCII,
and recognizes that the future evolution of world character sets
remains unclear.

Note that the character set used, if anything other than US- ASCII,
must always be explicitly specified in the Content-Type field.

No character set name other than those defined above may be used in
Internet mail without the publication of a formal specification and
its registration with IANA, or by private agreement, in which case
the character set name must begin with "X-".

Implementors are discouraged from defining new character sets unless
absolutely necessary.

The "charset" parameter has been defined primarily for the purpose of
textual data, and is described in this section for that reason.
However, it is conceivable that non-textual data might also wish to
specify a charset value for some purpose, in which case the same
syntax and values should be used.

In general, composition software should always use the "lowest common
denominator" character set possible. For example, if a body contains
only US-ASCII characters, it SHOULD be marked as being in the US-
ASCII character set, not ISO-8859-1, which, like all the ISO-8859
family of character sets, is a superset of US-ASCII. More generally,
if a widely-used character set is a subset of another character set,
and a body contains only characters in the widely-used subset, it
should be labelled as being in that subset. This will increase the
chances that the recipient will be able to view the resulting entity
correctly.

Freed & Borenstein Standards Track [Page 10]



RFC 2046 Media Types November 1996


4.1.3. Plain Subtype

The simplest and most important subtype of "text" is "plain". This
indicates plain text that does not contain any formatting commands or
directives. Plain text is intended to be displayed "as-is", that is,
no interpretation of embedded formatting commands, font attribute
specifications, processing instructions, interpretation directives,
or content markup should be necessary for proper display. The
default media type of "text/plain; charset=us-ascii" for Internet
mail describes existing Internet practice. That is, it is the type
of body defined by RFC 822.

No other "text" subtype is defined by this document.

4.1.4. Unrecognized Subtypes

Unrecognized subtypes of "text" should be treated as subtype "plain"
as long as the MIME implementation knows how to handle the charset.
Unrecognized subtypes which also specify an unrecognized charset
should be treated as "application/octet- stream".

4.2. Image Media Type

A media type of "image" indicates that the body contains an image.
The subtype names the specific image format. These names are not
case sensitive. An initial subtype is "jpeg" for the JPEG format
using JFIF encoding [JPEG].

The list of "image" subtypes given here is neither exclusive nor
exhaustive, and is expected to grow as more types are registered with
IANA, as described in RFC 2048.

Unrecognized subtypes of "image" should at a miniumum be treated as
"application/octet-stream". Implementations may optionally elect to
pass subtypes of "image" that they do not specifically recognize to a
secure and robust general-purpose image viewing application, if such
an application is available.

NOTE: Using of a generic-purpose image viewing application this way
inherits the security problems of the most dangerous type supported
by the application.

4.3. Audio Media Type

A media type of "audio" indicates that the body contains audio data.
Although there is not yet a consensus on an "ideal" audio format for
use with computers, there is a pressing need for a format capable of
providing interoperable behavior.

Freed & Borenstein Standards Track [Page 11]



RFC 2046 Media Types November 1996


The initial subtype of "basic" is specified to meet this requirement
by providing an absolutely minimal lowest common denominator audio
format. It is expected that richer formats for higher quality and/or
lower bandwidth audio will be defined by a later document.

The content of the "audio/basic" subtype is single channel audio
encoded using 8bit ISDN mu-law [PCM] at a sample rate of 8000 Hz.

Unrecognized subtypes of "audio" should at a miniumum be treated as
"application/octet-stream". Implementations may optionally elect to
pass subtypes of "audio" that they do not specifically recognize to a
robust general-purpose audio playing application, if such an
application is available.

4.4. Video Media Type

A media type of "video" indicates that the body contains a time-
varying-picture image, possibly with color and coordinated sound.
The term 'video' is used in its most generic sense, rather than with
reference to any particular technology or format, and is not meant to
preclude subtypes such as animated drawings encoded compactly. The
subtype "mpeg" refers to video coded according to the MPEG standard
[MPEG].

Note that although in general this document strongly discourages the
mixing of multiple media in a single body, it is recognized that many
so-called video formats include a representation for synchronized
audio, and this is explicitly permitted for subtypes of "video".

Unrecognized subtypes of "video" should at a minumum be treated as
"application/octet-stream". Implementations may optionally elect to
pass subtypes of "video" that they do not specifically recognize to a
robust general-purpose video display application, if such an
application is available.

4.5. Application Media Type

The "application" media type is to be used for discrete data which do
not fit in any of the other categories, and particularly for data to
be processed by some type of application program. This is
information which must be processed by an application before it is
viewable or usable by a user. Expected uses for the "application"
media type include file transfer, spreadsheets, data for mail-based
scheduling systems, and languages for "active" (computational)
material. (The latter, in particular, can pose security problems
which must be understood by implementors, and are considered in
detail in the discussion of the "application/PostScript" media type.)


Freed & Borenstein Standards Track [Page 12]



RFC 2046 Media Types November 1996


For example, a meeting scheduler might define a standard
representation for information about proposed meeting dates. An
intelligent user agent would use this information to conduct a dialog
with the user, and might then send additional material based on that
dialog. More generally, there have been several "active" messaging
languages developed in which programs in a suitably specialized
language are transported to a remote location and automatically run
in the recipient's environment.

Such applications may be defined as subtypes of the "application"
media type. This document defines two subtypes:

octet-stream, and PostScript.

The subtype of "application" will often be either the name or include
part of the name of the application for which the data are intended.
This does not mean, however, that any application program name may be
used freely as a subtype of "application".

4.5.1. Octet-Stream Subtype

The "octet-stream" subtype is used to indicate that a body contains
arbitrary binary data. The set of currently defined parameters is:

(1) TYPE -- the general type or category of binary data.
This is intended as information for the human recipient
rather than for any automatic processing.

(2) PADDING -- the number of bits of padding that were
appended to the bit-stream comprising the actual
contents to produce the enclosed 8bit byte-oriented
data. This is useful for enclosing a bit-stream in a
body when the total number of bits is not a multiple of
8.

Both of these parameters are optional.

An additional parameter, "CONVERSIONS", was defined in RFC 1341 but
has since been removed. RFC 1341 also defined the use of a "NAME"
parameter which gave a suggested file name to be used if the data
were to be written to a file. This has been deprecated in
anticipation of a separate Content-Disposition header field, to be
defined in a subsequent RFC.

The recommended action for an implementation that receives an
"application/octet-stream" entity is to simply offer to put the data
in a file, with any Content-Transfer-Encoding undone, or perhaps to
use it as input to a user-specified process.

Freed & Borenstein Standards Track [Page 13]



RFC 2046 Media Types November 1996


To reduce the danger of transmitting rogue programs, it is strongly
recommended that implementations NOT implement a path-search
mechanism whereby an arbitrary program named in the Content-Type
parameter (e.g., an "interpreter=" parameter) is found and executed
using the message body as input.

4.5.2. PostScript Subtype

A media type of "application/postscript" indicates a PostScript
program. Currently two variants of the PostScript language are
allowed; the original level 1 variant is described in [POSTSCRIPT]
and the more recent level 2 variant is described in [POSTSCRIPT2].

PostScript is a registered trademark of Adobe Systems, Inc. Use of
the MIME media type "application/postscript" implies recognition of
that trademark and all the rights it entails.

The PostScript language definition provides facilities for internal
labelling of the specific language features a given program uses.
This labelling, called the PostScript document structuring
conventions, or DSC, is very general and provides substantially more
information than just the language level. The use of document
structuring conventions, while not required, is strongly recommended
as an aid to interoperability. Documents which lack proper
structuring conventions cannot be tested to see whether or not they
will work in a given environment. As such, some systems may assume
the worst and refuse to process unstructured documents.

The execution of general-purpose PostScript interpreters entails
serious security risks, and implementors are discouraged from simply
sending PostScript bodies to "off- the-shelf" interpreters. While it
is usually safe to send PostScript to a printer, where the potential
for harm is greatly constrained by typical printer environments,
implementors should consider all of the following before they add
interactive display of PostScript bodies to their MIME readers.

The remainder of this section outlines some, though probably not all,
of the possible problems with the transport of PostScript entities.

(1) Dangerous operations in the PostScript language
include, but may not be limited to, the PostScript
operators "deletefile", "renamefile", "filenameforall",
and "file". "File" is only dangerous when applied to
something other than standard input or output.
Implementations may also define additional nonstandard
file operators; these may also pose a threat to
security. "Filenameforall", the wildcard file search
operator, may appear at first glance to be harmless.

Freed & Borenstein Standards Track [Page 14]



RFC 2046 Media Types November 1996


Note, however, that this operator has the potential to
reveal information about what files the recipient has
access to, and this information may itself be
sensitive. Message senders should avoid the use of
potentially dangerous file operators, since these
operators are quite likely to be unavailable in secure
PostScript implementations. Message receiving and
displaying software should either completely disable
all potentially dangerous file operators or take
special care not to delegate any special authority to
their operation. These operators should be viewed as
being done by an outside agency when interpreting
PostScript documents. Such disabling and/or checking
should be done completely outside of the reach of the
PostScript language itself; care should be taken to
insure that no method exists for re-enabling full-
function versions of these operators.

(2) The PostScript language provides facilities for exiting
the normal interpreter, or server, loop. Changes made
in this "outer" environment are customarily retained
across documents, and may in some cases be retained
semipermanently in nonvolatile memory. The operators
associated with exiting the interpreter loop have the
potential to interfere with subsequent document
processing. As such, their unrestrained use
constitutes a threat of service denial. PostScript
operators that exit the interpreter loop include, but
may not be limited to, the exitserver and startjob
operators. Message sending software should not
generate PostScript that depends on exiting the
interpreter loop to operate, since the ability to exit
will probably be unavailable in secure PostScript
implementations. Message receiving and displaying
software should completely disable the ability to make
retained changes to the PostScript environment by
eliminating or disabling the "startjob" and
"exitserver" operations. If these operations cannot be
eliminated or completely disabled the password
associated with them should at least be set to a hard-
to-guess value.

(3) PostScript provides operators for setting system-wide
and device-specific parameters. These parameter
settings may be retained across jobs and may
potentially pose a threat to the correct operation of
the interpreter. The PostScript operators that set
system and device parameters include, but may not be

Freed & Borenstein Standards Track [Page 15]



RFC 2046 Media Types November 1996


limited to, the "setsystemparams" and "setdevparams"
operators. Message sending software should not
generate PostScript that depends on the setting of
system or device parameters to operate correctly. The
ability to set these parameters will probably be
unavailable in secure PostScript implementations.
Message receiving and displaying software should
disable the ability to change system and device
parameters. If these operators cannot be completely
disabled the password associated with them should at
least be set to a hard-to-guess value.

(4) Some PostScript implementations provide nonstandard
facilities for the direct loading and execution of
machine code. Such facilities are quite obviously open
to substantial abuse. Message sending software should
not make use of such features. Besides being totally
hardware-specific, they are also likely to be
unavailable in secure implementations of PostScript.
Message receiving and displaying software should not
allow such operators to be used if they exist.

(5) PostScript is an extensible language, and many, if not
most, implementations of it provide a number of their
own extensions. This document does not deal with such
extensions explicitly since they constitute an unknown
factor. Message sending software should not make use
of nonstandard extensions; they are likely to be
missing from some implementations. Message receiving
and displaying software should make sure that any
nonstandard PostScript operators are secure and don't
present any kind of threat.

(6) It is possible to write PostScript that consumes huge
amounts of various system resources. It is also
possible to write PostScript programs that loop
indefinitely. Both types of programs have the
potential to cause damage if sent to unsuspecting
recipients. Message-sending software should avoid the
construction and dissemination of such programs, which
is antisocial. Message receiving and displaying
software should provide appropriate mechanisms to abort
processing after a reasonable amount of time has
elapsed. In addition, PostScript interpreters should be
limited to the consumption of only a reasonable amount
of any given system resource.

Freed & Borenstein Standards Track [Page 16]



RFC 2046 Media Types November 1996


(7) It is possible to include raw binary information inside
PostScript in various forms. This is not recommended
for use in Internet mail, both because it is not
supported by all PostScript interpreters and because it
significantly complicates the use of a MIME Content-
Transfer-Encoding. (Without such binary, PostScript
may typically be viewed as line-oriented data. The
treatment of CRLF sequences becomes extremely
problematic if binary and line-oriented data are mixed
in a single Postscript data stream.)

(8) Finally, bugs may exist in some PostScript interpreters
which could possibly be exploited to gain unauthorized
access to a recipient's system. Apart from noting this
possibility, there is no specific action to take to
prevent this, apart from the timely correction of such
bugs if any are found.

4.5.3. Other Application Subtypes

It is expected that many other subtypes of "application" will be
defined in the future. MIME implementations must at a minimum treat
any unrecognized subtypes as being equivalent to "application/octet-
stream".

5. Composite Media Type Values

The remaining two of the seven initial Content-Type values refer to
composite entities. Composite entities are handled using MIME
mechanisms -- a MIME processor typically handles the body directly.

5.1. Multipart Media Type

In the case of multipart entities, in which one or more different
sets of data are combined in a single body, a "multipart" media type
field must appear in the entity's header. The body must then contain
one or more body parts, each preceded by a boundary delimiter line,
and the last one followed by a closing boundary delimiter line.
After its boundary delimiter line, each body part then consists of a
header area, a blank line, and a body area. Thus a body part is
similar to an RFC 822 message in syntax, but different in meaning.

A body part is an entity and hence is NOT to be interpreted as
actually being an RFC 822 message. To begin with, NO header fields
are actually required in body parts. A body part that starts with a
blank line, therefore, is allowed and is a body part for which all
default values are to be assumed. In such a case, the absence of a
Content-Type header usually indicates that the corresponding body has

Freed & Borenstein Standards Track [Page 17]



RFC 2046 Media Types November 1996


a content-type of "text/plain; charset=US-ASCII".

The only header fields that have defined meaning for body parts are
those the names of which begin with "Content-". All other header
fields may be ignored in body parts. Although they should generally
be retained if at all possible, they may be discarded by gateways if
necessary. Such other fields are permitted to appear in body parts
but must not be depended on. "X-" fields may be created for
experimental or private purposes, with the recognition that the
information they contain may be lost at some gateways.

NOTE: The distinction between an RFC 822 message and a body part is
subtle, but important. A gateway between Internet and X.400 mail,
for example, must be able to tell the difference between a body part
that contains an image and a body part that contains an encapsulated
message, the body of which is a JPEG image. In order to represent
the latter, the body part must have "Content-Type: message/rfc822",
and its body (after the blank line) must be the encapsulated message,
with its own "Content-Type: image/jpeg" header field. The use of
similar syntax facilitates the conversion of messages to body parts,
and vice versa, but the distinction between the two must be
understood by implementors. (For the special case in which parts
actually are messages, a "digest" subtype is also defined.)

As stated previously, each body part is preceded by a boundary
delimiter line that contains the boundary delimiter. The boundary
delimiter MUST NOT appear inside any of the encapsulated parts, on a
line by itself or as the prefix of any line. This implies that it is
crucial that the composing agent be able to choose and specify a
unique boundary parameter value that does not contain the boundary
parameter value of an enclosing multipart as a prefix.

All present and future subtypes of the "multipart" type must use an
identical syntax. Subtypes may differ in their semantics, and may
impose additional restrictions on syntax, but must conform to the
required syntax for the "multipart" type. This requirement ensures
that all conformant user agents will at least be able to recognize
and separate the parts of any multipart entity, even those of an
unrecognized subtype.

As stated in the definition of the Content-Transfer-Encoding field
[RFC 2045], no encoding other than "7bit", "8bit", or "binary" is
permitted for entities of type "multipart". The "multipart" boundary
delimiters and header fields are always represented as 7bit US-ASCII
in any case (though the header fields may encode non-US-ASCII header
text as per RFC 2047) and data within the body parts can be encoded
on a part-by-part basis, with Content-Transfer-Encoding fields for
each appropriate body part.

Freed & Borenstein Standards Track [Page 18]



RFC 2046 Media Types November 1996


5.1.1. Common Syntax

This section defines a common syntax for subtypes of "multipart".
All subtypes of "multipart" must use this syntax. A simple example
of a multipart message also appears in this section. An example of a
more complex multipart message is given in RFC 2049.

The Content-Type field for multipart entities requires one parameter,
"boundary". The boundary delimiter line is then defined as a line
consisting entirely of two hyphen characters ("-", decimal value 45)
followed by the boundary parameter value from the Content-Type header
field, optional linear whitespace, and a terminating CRLF.

NOTE: The hyphens are for rough compatibility with the earlier RFC
934 method of message encapsulation, and for ease of searching for
the boundaries in some implementations. However, it should be noted
that multipart messages are NOT completely compatible with RFC 934
encapsulations; in particular, they do not obey RFC 934 quoting
conventions for embedded lines that begin with hyphens. This
mechanism was chosen over the RFC 934 mechanism because the latter
causes lines to grow with each level of quoting. The combination of
this growth with the fact that SMTP implementations sometimes wrap
long lines made the RFC 934 mechanism unsuitable for use in the event
that deeply-nested multipart structuring is ever desired.

WARNING TO IMPLEMENTORS: The grammar for parameters on the Content-
type field is such that it is often necessary to enclose the boundary
parameter values in quotes on the Content-type line. This is not
always necessary, but never hurts. Implementors should be sure to
study the grammar carefully in order to avoid producing invalid
Content-type fields. Thus, a typical "multipart" Content-Type header
field might look like this:

Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p

But the following is not valid:

Content-Type: multipart/mixed; boundary=gc0pJq0M:08jU534c0p

(because of the colon) and must instead be represented as

Content-Type: multipart/mixed; boundary="gc0pJq0M:08jU534c0p"

This Content-Type value indicates that the content consists of one or
more parts, each with a structure that is syntactically identical to
an RFC 822 message, except that the header area is allowed to be
completely empty, and that the parts are each preceded by the line


Freed & Borenstein Standards Track [Page 19]



RFC 2046 Media Types November 1996


--gc0pJq0M:08jU534c0p

The boundary delimiter MUST occur at the beginning of a line, i.e.,
following a CRLF, and the initial CRLF is considered to be attached
to the boundary delimiter line rather than part of the preceding
part. The boundary may be followed by zero or more characters of
linear whitespace. It is then terminated by either another CRLF and
the header fields for the next part, or by two CRLFs, in which case
there are no header fields for the next part. If no Content-Type
field is present it is assumed to be "message/rfc822" in a
"multipart/digest" and "text/plain" otherwise.

NOTE: The CRLF preceding the boundary delimiter line is conceptually
attached to the boundary so that it is possible to have a part that
does not end with a CRLF (line break). Body parts that must be
considered to end with line breaks, therefore, must have two CRLFs
preceding the boundary delimiter line, the first of which is part of
the preceding body part, and the second of which is part of the
encapsulation boundary.

Boundary delimiters must not appear within the encapsulated material,
and must be no longer than 70 characters, not counting the two
leading hyphens.

The boundary delimiter line following the last body part is a
distinguished delimiter that indicates that no further body parts
will follow. Such a delimiter line is identical to the previous
delimiter lines, with the addition of two more hyphens after the
boundary parameter value.

--gc0pJq0M:08jU534c0p--

NOTE TO IMPLEMENTORS: Boundary string comparisons must compare the
boundary value with the beginning of each candidate line. An exact
match of the entire candidate line is not required; it is sufficient
that the boundary appear in its entirety following the CRLF.

There appears to be room for additional information prior to the
first boundary delimiter line and following the final boundary
delimiter line. These areas should generally be left blank, and
implementations must ignore anything that appears before the first
boundary delimiter line or after the last one.

NOTE: These "preamble" and "epilogue" areas are generally not used
because of the lack of proper typing of these parts and the lack of
clear semantics for handling these areas at gateways, particularly
X.400 gateways. However, rather than leaving the preamble area
blank, many MIME implementations have found this to be a convenient

Freed & Borenstein Standards Track [Page 20]



RFC 2046 Media Types November 1996


place to insert an explanatory note for recipients who read the
message with pre-MIME software, since such notes will be ignored by
MIME-compliant software.

NOTE: Because boundary delimiters must not appear in the body parts
being encapsulated, a user agent must exercise care to choose a
unique boundary parameter value. The boundary parameter value in the
example above could have been the result of an algorithm designed to
produce boundary delimiters with a very low probability of already
existing in the data to be encapsulated without having to prescan the
data. Alternate algorithms might result in more "readable" boundary
delimiters for a recipient with an old user agent, but would require
more attention to the possibility that the boundary delimiter might
appear at the beginning of some line in the encapsulated part. The
simplest boundary delimiter line possible is something like "---",
with a closing boundary delimiter line of "-----".

As a very simple example, the following multipart message has two
parts, both of them plain text, one of them explicitly typed and one
of them implicitly typed:

From: Nathaniel Borenstein <nsb@bellcore.com>
To: Ned Freed <ned@innosoft.com>
Date: Sun, 21 Mar 1993 23:56:48 -0800 (PST)
Subject: Sample message
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="simple boundary"

This is the preamble. It is to be ignored, though it
is a handy place for composition agents to include an
explanatory note to non-MIME conformant readers.

--simple boundary

This is implicitly typed plain US-ASCII text.
It does NOT end with a linebreak.
--simple boundary
Content-type: text/plain; charset=us-ascii

This is explicitly typed plain US-ASCII text.
It DOES end with a linebreak.

--simple boundary--

This is the epilogue. It is also to be ignored.


Freed & Borenstein Standards Track [Page 21]



RFC 2046 Media Types November 1996


The use of a media type of "multipart" in a body part within another
"multipart" entity is explicitly allowed. In such cases, for obvious
reasons, care must be taken to ensure that each nested "multipart"
entity uses a different boundary delimiter. See RFC 2049 for an
example of nested "multipart" entities.

The use of the "multipart" media type with only a single body part
may be useful in certain contexts, and is explicitly permitted.

NOTE: Experience has shown that a "multipart" media type with a
single body part is useful for sending non-text media types. It has
the advantage of providing the preamble as a place to include
decoding instructions. In addition, a number of SMTP gateways move
or remove the MIME headers, and a clever MIME decoder can take a good
guess at multipart boundaries even in the absence of the Content-Type
header and thereby successfully decode the message.

The only mandatory global parameter for the "multipart" media type is
the boundary parameter, which consists of 1 to 70 characters from a
set of characters known to be very robust through mail gateways, and
NOT ending with white space. (If a boundary delimiter line appears to
end with white space, the white space must be presumed to have been
added by a gateway, and must be deleted.) It is formally specified
by the following BNF:

boundary := 0*69<bchars> bcharsnospace

bchars := bcharsnospace / " "

bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
"+" / "_" / "," / "-" / "." /
"/" / ":" / "=" / "?"

Overall, the body of a "multipart" entity may be specified as
follows:

dash-boundary := "--" boundary
; boundary taken from the value of
; boundary parameter of the
; Content-Type field.

multipart-body := [preamble CRLF]
dash-boundary transport-padding CRLF
body-part *encapsulation
close-delimiter transport-padding
[CRLF epilogue]

Freed & Borenstein Standards Track [Page 22]



RFC 2046 Media Types November 1996


transport-padding := *LWSP-char
; Composers MUST NOT generate
; non-zero length transport
; padding, but receivers MUST
; be able to handle padding
; added by message transports.

encapsulation := delimiter transport-padding
CRLF body-part

delimiter := CRLF dash-boundary

close-delimiter := delimiter "--"

preamble := discard-text

epilogue := discard-text

discard-text := *(*text CRLF) *text
; May be ignored or discarded.

body-part := MIME-part-headers [CRLF *OCTET]
; Lines in a body-part must not start
; with the specified dash-boundary and
; the delimiter must not appear anywhere
; in the body part. Note that the
; semantics of a body-part differ from
; the semantics of a message, as
; described in the text.

OCTET := <any 0-255 octet value>

IMPORTANT: The free insertion of linear-white-space and RFC 822
comments between the elements shown in this BNF is NOT allowed since
this BNF does not specify a structured header field.

NOTE: In certain transport enclaves, RFC 822 restrictions such as
the one that limits bodies to printable US-ASCII characters may not
be in force. (That is, the transport domains may exist that resemble
standard Internet mail transport as specified in RFC 821 and assumed
by RFC 822, but without certain restrictions.) The relaxation of
these restrictions should be construed as locally extending the
definition of bodies, for example to include octets outside of the
US-ASCII range, as long as these extensions are supported by the
transport and adequately documented in the Content- Transfer-Encoding
header field. However, in no event are headers (either message
headers or body part headers) allowed to contain anything other than
US-ASCII characters.

Freed & Borenstein Standards Track [Page 23]



RFC 2046 Media Types November 1996


NOTE: Conspicuously missing from the "multipart" type is a notion of
structured, related body parts. It is recommended that those wishing
to provide more structured or integrated multipart messaging
facilities should define subtypes of multipart that are syntactically
identical but define relationships between the various parts. For
example, subtypes of multipart could be defined that include a
distinguished part which in turn is used to specify the relationships
between the other parts, probably referring to them by their
Content-ID field. Old implementations will not recognize the new
subtype if this approach is used, but will treat it as
multipart/mixed and will thus be able to show the user the parts that
are recognized.

5.1.2. Handling Nested Messages and Multiparts

The "message/rfc822" subtype defined in a subsequent section of this
document has no terminating condition other than running out of data.
Similarly, an improperly truncated "multipart" entity may not have
any terminating boundary marker, and can turn up operationally due to
mail system malfunctions.

It is essential that such entities be handled correctly when they are
themselves imbedded inside of another "multipart" structure. MIME
implementations are therefore required to recognize outer level
boundary markers at ANY level of inner nesting. It is not sufficient
to only check for the next expected marker or other terminating
condition.

5.1.3. Mixed Subtype

The "mixed" subtype of "multipart" is intended for use when the body
parts are independent and need to be bundled in a particular order.
Any "multipart" subtypes that an implementation does not recognize
must be treated as being of subtype "mixed".

5.1.4. Alternative Subtype

The "multipart/alternative" type is syntactically identical to
"multipart/mixed", but the semantics are different. In particular,
each of the body parts is an "alternative" version of the same
information.

Systems should recognize that the content of the various parts are
interchangeable. Systems should choose the "best" type based on the
local environment and references, in some cases even through user
interaction. As with "multipart/mixed", the order of body parts is
significant. In this case, the alternatives appear in an order of
increasing faithfulness to the original content. In general, the

Freed & Borenstein Standards Track [Page 24]



RFC 2046 Media Types November 1996


best choice is the LAST part of a type supported by the recipient
system's local environment.

"Multipart/alternative" may be used, for example, to send a message
in a fancy text format in such a way that it can easily be displayed
anywhere:

From: Nathaniel Borenstein <nsb@bellcore.com>
To: Ned Freed <ned@innosoft.com>
Date: Mon, 22 Mar 1993 09:41:09 -0800 (PST)
Subject: Formatted text mail
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=boundary42

--boundary42
Content-Type: text/plain; charset=us-ascii

... plain text version

Log in or register to write something here or to contact authors.