Reading Outlook .msg file using Python
Tim Golden
2010-10-11 10:56:01 UTC
I have a need to read .msg files exported from Outlook. Google search
came out with a few very old posts about the topic but nothing really
useful. The email module in Python is no help - everything comes back
blank and it can't even see if there are attachments. Did find a Java
library to do the job and I suppose when push comes to shove, I would
have to learn Jython and see if it can invoke the Java library. But
before I do that, can somebody point me to a Python/COM solution?
I don't need to gain access to Exchange or Outlook. I just need to
read the .msg file and extract information + attachments from it.

.msg files are Compound Documents -- a file format which obviously
seemed like a jolly good idea at the time, but which frustrates
me every time I have to do anything with it :)
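
Before reaching for any API, it can be worth confirming that a given file really is a compound document. A minimal sketch, assuming the standard eight-byte Compound File Binary signature; the helper names are made up for illustration and are not part of pywin32 or any other library:

```python
# Signature that opens every OLE2 / Compound File Binary document
# (.msg files, and also old-style .doc and .xls files).
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_compound_file(data):
    """Return True if `data` (the leading bytes of a file) starts with the signature."""
    return data[:len(CFB_SIGNATURE)] == CFB_SIGNATURE


def file_is_compound_document(filepath):
    """Read only the first eight bytes of `filepath` and check them."""
    with open(filepath, "rb") as f:
        return looks_like_compound_file(f.read(len(CFB_SIGNATURE)))
```

A positive result only says the container is a compound document. It does not prove the file is an Outlook message, since Word and Excel used the same container.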

Hopefully this code snippet will get you going. The idea is to open
the compound document using the Structured Storage API. That gives
you an IStorage-ish object which you can then convert to an IMessage-ish
object with the convenience function OpenIMsgOnIStg. At that point you
enter the marvellous world of Extended MAPI. The get_body_from_stream
function does a Q&D job of pulling the body text out. You can get
attachments as well: look at the PyIMessage docs, but come back if
you need help with that:

<code>
import sys

from win32com.mapi import mapi, mapitags
from win32com.storagecon import *
import pythoncom

def get_body_from_stream (message):
    # Read the PR_BODY property through an IStream, one chunk at a time
    CHUNK_SIZE = 10000
    stream = message.OpenProperty (mapitags.PR_BODY,
                                   pythoncom.IID_IStream, 0, 0)
    text = b""
    while True:
        chunk = stream.read (CHUNK_SIZE)
        if chunk:
            text += chunk
        else:
            break
    return text.decode ("utf16")

def main (filepath):
    mapi.MAPIInitialize ((mapi.MAPI_INIT_VERSION, 0))
    storage_flags = STGM_DIRECT | STGM_READ | STGM_SHARE_EXCLUSIVE
    # Open the .msg file as a structured storage (IStorage) ...
    storage = pythoncom.StgOpenStorage (filepath, None, storage_flags,
                                        None, 0)
    # ... and wrap that storage in an IMessage
    mapi_session = mapi.OpenIMsgSession ()
    message = mapi.OpenIMsgOnIStg (mapi_session, None, storage, None, 0,
                                   mapi.MAPI_UNICODE)
    print (get_body_from_stream (message))

if __name__ == '__main__':
    main (*sys.argv[1:])

</code>

TJG
Tim Golden
2010-10-11 15:54:53 UTC
Post by Tim Golden
mapi.MAPIInitialize ((mapi.MAPI_INIT_VERSION, 0))
Either there is no default mail client or the current mail client
cannot fulfill the messaging request. Please run Microsoft
Outlook ... client.
I have Outlook (not Express - Outlook 2002) running and I did set it
to be the default mail client. Does MAPI work with Exchange only?

No. I was running it with Outlook 2003 installed (not running,
in fact, although it is the default mail client on that machine).

(And why do I need MAPI to read the file?)

Basically because the MAPI subsystem already contains all
the code to interpret that particular format of structured
storage. If you can find some source of info which tells
you how to parse the format directly, then you can sidestep
MAPI. Presumably this is what is done by the Java code you
mentioned.
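
To give a flavour of what parsing the format directly involves, here is a rough sketch of only the first step: reading a few fixed-offset fields from the compound file header. The offsets are taken from Microsoft's published compound file layout; the function name and the returned dictionary are made up for this example. A real reader would still have to walk the sector allocation tables and the directory, and then decode the MAPI property streams stored inside, none of which is shown here.

```python
import struct

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def parse_cfb_header(header):
    """Pull a few fixed-offset fields out of the start of a compound file."""
    if len(header) < 34 or header[:8] != CFB_SIGNATURE:
        raise ValueError("not a compound file header")
    # Bytes 8..23 hold a CLSID (normally all zero). Bytes 24..33 hold five
    # little-endian uint16 values: minor version, major version,
    # byte-order mark, sector shift and mini-sector shift.
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", header, 24)
    return {
        "major_version": major,
        "minor_version": minor,
        "byte_order": byte_order,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
    }
```

Calling it on the first 512 bytes of a file (`parse_cfb_header(open(path, "rb").read(512))`) is enough, since all of these fields sit in the first sector.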

I'm afraid I'm not at work at the moment, and I don't run
Outlook on this machine. (So I can't even save an .msg file
to test). FWIW the code did run successfully on my work
machine and produced the plain text of the email, so it
is just a configuration sort of issue. If no-one chips
in with a suggestion in a few hours, might be worth
posting to python-win32; there might be people there who
don't watch this (higher-traffic) list.

I have a vague memory that when I set this kind of thing
up to run on our Helpdesk server, where I use this to
ingest incoming emails, I did have to install a sort
of server-only alternative to Outlook. I'll try to remote
into the server later to see if I can spot it. But that
still wouldn't explain what the problem was if you were
actually running Outlook in any case.

TJG
Tim Golden
2010-10-12 17:31:10 UTC
http://support.microsoft.com/kb/813745
I need to reset my Outlook registry keys. Unfortunately, I don't have
my Office Install CD with me. This would have to wait.
Thanks for the information; I'm keen to see if you're able
to use the solution I posted once this fix is in place.

TJG
Tim Golden
2010-10-17 11:45:19 UTC
Okay, after fixing the Outlook reg entries as described above, I am
now getting this error:
message = mapi.OpenIMsgOnIStg (mapi_session, None, storage, None, 0,
mapi.MAPI_UNICODE)
pywintypes.com_error: (-2147221242, 'OLE error 0x80040106', None,
None)

Strange. That's MAPI_E_UNKNOWN_FLAGS. Try the call without MAPI_UNICODE,
i.e. make the last param zero. Maybe there's something with Outlook 2002...
I've never tried it myself.
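
For anyone matching these numbers up by hand: pywin32 reports the HRESULT as a signed 32-bit integer, while the MAPI headers list it in unsigned hex. A small sketch of the conversion follows. The helper names and the one-entry lookup table are made up for illustration, and only the code discussed in this thread is included.

```python
def hresult_to_hex(code):
    """Convert the signed 32-bit value from a com_error into the usual 0x... form."""
    return "0x%08X" % (code & 0xFFFFFFFF)


# Only the single code that came up in this thread; extend as needed.
KNOWN_MAPI_ERRORS = {
    0x80040106: "MAPI_E_UNKNOWN_FLAGS",
}


def describe_hresult(code):
    """Return the hex form plus a symbolic name if it is in the table above."""
    name = KNOWN_MAPI_ERRORS.get(code & 0xFFFFFFFF, "unknown")
    return "%s (%s)" % (hresult_to_hex(code), name)


print(describe_hresult(-2147221242))
```

The value to pass in is the first element of the tuple shown in the `pywintypes.com_error` traceback.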

TJG
Tim Golden
2010-10-20 08:41:09 UTC
Looks like this flag is valid only if you are getting messages
directly from Outlook. When reading the msg file, the flag is
invalid.
Same issue when accessing attachments. In addition, the MAPITable
method does not seem to work at all when trying to get attachments out
of the msg file (works when dealing with a message in an Outlook
mailbox). Either way, the display_name doesn't work when trying to
display the filename of the attachment.
I was able to get the date by using the PR_TRANSPORT_MESSAGE_HEADERS
tag from mapitags.

Ah, thanks. As you will have realised, my code is basically geared
to reading an Outlook/Exchange message box. I hadn't really tried
it on individual message files, except my original excerpt. If it
were opportune, I'd be interested in seeing your working code.
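
On the date trick mentioned above: once the transport headers are in hand as a string, the standard email package can do the parsing. A minimal sketch, assuming the value of PR_TRANSPORT_MESSAGE_HEADERS has already been read into a Python string (how that is done is not shown here). The function name and the sample headers are made up for illustration.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def date_from_transport_headers(headers):
    """Return the Date header as a datetime, or None if there is no Date header."""
    msg = message_from_string(headers)
    raw_date = msg["Date"]
    if raw_date is None:
        return None
    return parsedate_to_datetime(raw_date)


# Made-up header block, for illustration only.
sample_headers = (
    "Received: from mail.example.com by mx.example.org;"
    " Mon, 11 Oct 2010 10:55:58 +0000\r\n"
    "From: Someone <someone@example.com>\r\n"
    "Subject: Test\r\n"
    "Date: Mon, 11 Oct 2010 10:56:01 +0000\r\n"
    "\r\n"
)
print(date_from_transport_headers(sample_headers))
```

Messages that never went through a mail transport (drafts, for instance) may not carry these headers at all, so the `None` case is worth handling.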

TJG
Tim Golden
2010-10-21 08:48:54 UTC
Only just noticed this thread, and had something similar. I took the
following approach:-
(I'm thinking this might be relevant as you mentioned checking whether
your client's Outlook could export .EML directly, which indicates (to
me at least) that you have some control over that...)
- Set up an IMAP email server on a machine (in this case linux and
dovecot)
- Got client to set up a new account in Outlook for the new server
- Got client to use the Outlook interface to copy relevant emails (or
the whole lot) to new server
- Used the standard imaplib and related modules to do what was needed
Nice lateral approach. It would also be possible to do this same
kind of thing via the native Microsoft toolset alone if the OP
has access to the appropriate Outlook / Exchange accounts. (Indeed,
Exchange itself can act as an IMAP server which might be another
approach). I confess I was starting from the original "Can I read an
.msg file?" question.
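
For reference, the last step of the IMAP route described above might look roughly like this with the standard library. It is a sketch only: the function names are made up, the host and credentials in the usage comment are placeholders, and it assumes the messages have already been copied to a mailbox on the IMAP server.

```python
import email
import imaplib


def fetch_all_messages(conn, mailbox="INBOX"):
    """Yield parsed messages from `mailbox`.

    `conn` is an already logged-in imaplib.IMAP4 (or IMAP4_SSL) instance.
    """
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError("cannot select %r" % mailbox)
    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        if status != "OK":
            continue
        # For a normal reply, parts[0] is an (envelope, raw_bytes) tuple.
        yield email.message_from_bytes(parts[0][1])


def attachment_names(msg):
    """List the filenames of all parts of `msg` that declare one."""
    return [part.get_filename() for part in msg.walk() if part.get_filename()]


# Usage (placeholder host and credentials):
# conn = imaplib.IMAP4_SSL("imap.example.com")
# conn.login("user", "password")
# for msg in fetch_all_messages(conn):
#     print(msg["Subject"], attachment_names(msg))
```

The point of the detour is that everything after the copy is plain RFC 822 handling, so the email package sees bodies and attachments without any MAPI involvement.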

TJG
