MIME::Tools::traps - pitfalls and gotchas for users of MIME-tools
SYNOPSIS
This is part of the MIME-tools documentation.
See MIME::Tools for the full table of contents.
DESCRIPTION
Things in MIME-tools to beware of...
Fuzzing of CRLF and newline on input
RFC-1521 dictates that MIME streams have lines terminated by CRLF
("\r\n"). However, it is extremely likely that folks will want to
parse MIME streams where each line ends in the local newline
character "\n" instead.
An attempt has been made to allow the parser to handle both CRLF
and newline-terminated input.
Fuzzing of CRLF and newline when decoding
The "7bit" and "8bit" decoders will decode both
a "\n" and a "\r\n" end-of-line sequence into a "\n".
The "binary" decoder (default if no encoding specified)
still outputs stuff verbatim... so a MIME message with CRLFs
and no explicit encoding will be output as a text file
that, on many systems, will have an annoying ^M at the end of
each line... but this is as it should be.
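If the ^M characters bother you, you can always strip them yourself once the data has been written out. Here is a minimal sketch, assuming the decoded data really is text; the helper name strip_crs is made up for this example and is not part of MIME-tools:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MIME-tools): turn CRLF line endings
# into the local newline in text that was output verbatim.
sub strip_crs {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;    # only touch a CR that is followed by a newline
    return $text;
}

print strip_crs("line one\r\nline two\r\n");
```

Don't do this to data that is genuinely binary: there, a CR followed by a newline may be part of the payload.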
Fuzzing of CRLF and newline when encoding/composing
All encoders currently output the end-of-line sequence as a "\n",
with the assumption that the local mail agent will perform
the conversion from newline to CRLF when sending the mail.
However, there probably should be an option to output CRLF as per RFC-1521.
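Until such an option exists, a caller who talks to the wire directly can do the conversion on the encoded output. A minimal sketch, where to_crlf is a name invented for this example and not a MIME-tools function:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MIME-tools): make every line end in
# CRLF, whether it currently ends in "\n" or already in "\r\n".
sub to_crlf {
    my ($text) = @_;
    $text =~ s/\r?\n/\r\n/g;
    return $text;
}

my $wire = to_crlf("Subject: Hello\n\nJust a test.\n");
```

Lines that already end in CRLF are left as they are, so running the helper twice is harmless.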
Inability to handle multipart boundaries with embedded newlines
Let's get something straight: this is an evil, EVIL practice.
If your mailer creates multipart boundary strings that contain
newlines, give it two weeks' notice and find another one. If your
mail robot receives MIME mail like this, regard it as syntactically
incorrect, which it is.
Ignoring non-header headers
People like to hand the parser raw messages straight from
POP3 or from a mailbox. There is often predictable non-header
information in front of the real headers; e.g., the initial
"From" line in the following message:
From - Wed Mar 22 02:13:18 2000
Return-Path: <eryq@zeegee.com>
Subject: Hello
The parser simply ignores such stuff quietly. Perhaps it
shouldn't, but most people seem to want that behavior.
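If you would rather see that line than have it vanish, you can peel it off before handing the message to the parser. A minimal sketch, assuming the only junk is a single mbox-style envelope line at the very top; split_envelope is a made-up name, not part of MIME-tools:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MIME-tools): separate a leading
# mbox-style "From " envelope line from the rest of the message.
sub split_envelope {
    my ($raw) = @_;
    my $envelope;
    # An envelope line starts with "From " (no colon), unlike a From: header.
    if ($raw =~ s/\A(From [^\r\n]*)\r?\n//) {
        $envelope = $1;
    }
    return ($envelope, $raw);
}

my ($envelope, $message) = split_envelope(
    "From - Wed Mar 22 02:13:18 2000\nSubject: Hello\n\nhi\n"
);
```

$envelope is undefined when the message starts directly with real headers.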
Fuzzing of empty multipart preambles
Please note that there is currently an ambiguity in the way
preambles are parsed. Both of the following message fragments
are regarded as having an empty preamble (where "\n" indicates a
newline character):
Content-type: multipart/mixed; boundary="xyz"\n
Subject: This message (#1) has an empty preamble\n
\n
--xyz\n
...
Content-type: multipart/mixed; boundary="xyz"\n
Subject: This message (#2) also has an empty preamble\n
\n
\n
--xyz\n
...
In both cases, the first completely-empty line (after the "Subject")
marks the end of the header.
But we should clearly ignore the second empty line in message #2,
since it fills the role of "the newline which is only there to make
sure that the boundary is at the beginning of a line".
Such newlines are never part of the content preceding the boundary;
thus, there is no preamble "content" in message #2.
However, it seems clear that message #1 also has no preamble
"content", and is in fact merely a compact representation of an
empty preamble.
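The rule described above can be sketched in a few lines. This is an illustration only, not the code MIME::Parser actually uses; the function name preamble_lines is invented, and it assumes newline-terminated input with no headers after the first empty line:

```perl
use strict;
use warnings;

# Illustration only (not the MIME::Parser implementation): return the
# preamble lines of a multipart message, given its boundary string.
sub preamble_lines {
    my ($message, $boundary) = @_;
    # The first completely-empty line ends the header.
    my (undef, $body) = split /\n\n/, $message, 2;
    $body = '' unless defined $body;
    my @lines;
    for my $line (split /\n/, $body, -1) {
        last if $line eq "--$boundary";
        push @lines, $line;
    }
    # An empty line right before the boundary only puts the boundary at
    # the beginning of a line; it is not preamble content.
    pop @lines if @lines && $lines[-1] eq '';
    return @lines;
}

my $header = qq{Content-type: multipart/mixed; boundary="xyz"\nSubject: test\n};
my @p1 = preamble_lines($header . "\n--xyz\n", 'xyz');      # like message #1
my @p2 = preamble_lines($header . "\n\n--xyz\n", 'xyz');    # like message #2
```

Both @p1 and @p2 come back empty, which is the ambiguity: once parsed, the two inputs cannot be told apart.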
Use of a temp file during parsing
Why not do everything in core?
Although the amount of core available on even a modest home
system continues to grow, the size of attachments continues
to grow with it. I wanted to make sure that even users with small
systems could deal with decoding multi-megabyte sounds and movie files.
That means not being core-bound.
As of the 5.3xx releases, MIME::Parser gets by with only
one temp file open per parser. This temp file provides
a sort of infinite scratch space for dealing with the current
message part. It's fast and lightweight, but you should know
about it anyway.
Why do I assume that MIME objects are email objects?
Achim Bohnet once pointed out that MIME headers do nothing more than
store a collection of attributes, and thus could be represented as
objects which don't inherit from Mail::Header.
I agree in principle, but RFC-1521 says otherwise.
RFC-1521 [MIME] headers are a syntactic subset of RFC-822 [email] headers.
Perhaps a better name for these modules would have been RFC1521::
instead of MIME::, but we're a little beyond that stage now.
When I originally wrote these modules for the CPAN, I agonized for a long
time about whether or not they really should subclass from Mail::Internet
(then at version 1.17). Thanks to Graham Barr, who graciously evolved
MailTools 1.06 to be more MIME-friendly, unification was achieved
at MIME-tools release 2.0.
The benefits in reuse alone have been substantial.
You can't print exactly what you parsed!
Parsing is a (slightly) lossy operation.
Because of things like ambiguities in base64-encoding, the following
is not going to spit out its input unchanged in all cases:
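Here is a minimal sketch of such a round trip: parse a message, then print the resulting entity. It assumes the whole message is already in a scalar and keeps that original alongside the re-serialized copy; the helper name round_trip is made up for this example:

```perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical helper: parse a raw message held in memory and return
# both the untouched input and the re-serialized entity.
sub round_trip {
    my ($raw) = @_;
    my $parser = MIME::Parser->new;
    $parser->output_to_core(1);    # keep decoded bodies in memory, not in files on disk
    my $entity = $parser->parse_data($raw);
    return ($raw, $entity->stringify);
}

my ($saved, $reprinted) = round_trip("Subject: Hello\n\nJust a test.\n");
print "output differs from input\n" if $saved ne $reprinted;
```

Whether the two strings match depends on the message. If you need to pass the mail on byte-for-byte, send $saved, not $reprinted.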
If you're using MIME::Tools to process email, remember to save
the data you parse if you want to send it on unchanged.
This is vital for things like PGP-signed email.
(Sing it with me, kids: you can't / always print / what you paaaarsed...)
SEE ALSO
See "SYNOPSIS" in MIME::Tools for the full table of contents.