Tim Golden
2010-10-11 10:56:01 UTC
I need to read .msg files exported from Outlook. A Google search
turned up a few very old posts about the topic but nothing really
useful. The email module in Python is no help - everything comes back
blank and it can't even see if there are attachments. I did find a Java
library to do the job and I suppose if push comes to shove, I would
have to learn Jython and see if it can invoke the Java library. But
before I do that, can somebody point me to a Python/COM solution?
I don't need to gain access to Exchange or Outlook. I just need to
read the .msg file and extract information + attachments from it.
.msg files are Compound Documents -- a file format which obviously
seemed like a jolly good idea at the time, but which frustrates
me every time I have to do anything with it :)
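As a rough illustration of the "Compound Document" point: files in that container format start with a fixed 8-byte signature, so a quick check along these lines can tell whether a file is a compound document at all before any COM calls are made. This is only a sketch; the helper name looks_like_compound_document is made up for illustration, and a match only says the file is a compound document of some kind (old-style .doc and .xls files match too), not that it is a valid .msg.

```python
import io

# Header signature shared by all Compound File Binary (OLE2) documents.
CFB_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"

def looks_like_compound_document(fileobj):
    """Return True if the binary file object starts with the CFB signature."""
    return fileobj.read(len(CFB_SIGNATURE)) == CFB_SIGNATURE

# In-memory stand-ins; for a real file, pass open(path, "rb") instead.
compound = io.BytesIO(CFB_SIGNATURE + b"\x00" * 8)
plain_text = io.BytesIO(b"From: someone@example.com\r\n")

print(looks_like_compound_document(compound))
print(looks_like_compound_document(plain_text))
```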
Hopefully this code snippet will get you going. The idea is to open
the compound document using the Structured Storage API. That gives
you an IStorage-ish object which you can then convert to an IMessage-ish
object with the convenience function OpenIMsgOnIStg. At that point you
enter the marvellous world of Extended MAPI. The get_body_from_stream
function does a quick-and-dirty job of pulling the body text out. You
can get attachments as well: look at the PyIMessage docs, but come back
if you need help with that:
<code>
import sys
from win32com.mapi import mapi, mapitags
from win32com.storagecon import *
import pythoncom

def get_body_from_stream (message):
    # Open the message body as an IStream and read it in chunks
    CHUNK_SIZE = 10000
    stream = message.OpenProperty (mapitags.PR_BODY,
                                   pythoncom.IID_IStream, 0, 0)
    text = b""
    while True:
        chunk = stream.read (CHUNK_SIZE)
        if chunk:
            text += chunk
        else:
            break
    # The raw bytes are decoded as UTF-16 once the whole body is read
    return text.decode ("utf16")

def main (filepath):
    mapi.MAPIInitialize ((mapi.MAPI_INIT_VERSION, 0))
    # Open the .msg file as a structured storage (compound document)
    storage_flags = STGM_DIRECT | STGM_READ | STGM_SHARE_EXCLUSIVE
    storage = pythoncom.StgOpenStorage (filepath, None, storage_flags,
                                        None, 0)
    # Wrap the IStorage in an IMessage so MAPI properties can be read
    mapi_session = mapi.OpenIMsgSession ()
    message = mapi.OpenIMsgOnIStg (mapi_session, None, storage, None, 0,
                                   mapi.MAPI_UNICODE)
    print (get_body_from_stream (message))

if __name__ == '__main__':
    main (*sys.argv[1:])
</code>
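The read loop in get_body_from_stream only needs an object with a read(size) method that returns an empty result at end of stream, so the pattern can be tried without Outlook or MAPI. Below is a minimal sketch of that idea using io.BytesIO as a stand-in for the IStream; the helper name read_utf16_stream and the sample text are assumptions made for illustration, not part of pywin32.

```python
import io

def read_utf16_stream(stream, chunk_size=10000):
    """Read a byte stream to the end in chunks and decode it as UTF-16."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    # Decode only after joining, so a character that straddles two
    # chunks cannot cause a decoding error.
    return b"".join(chunks).decode("utf-16")

# Stand-in for the body stream: UTF-16 text held in memory.
fake_stream = io.BytesIO("Hello from a .msg body".encode("utf-16"))
print(read_utf16_stream(fake_stream, chunk_size=7))
```

Collecting the chunks and decoding once at the end is deliberate: decoding each chunk separately could fail whenever a chunk boundary falls in the middle of a two-byte code unit, as it does with the odd chunk size used here.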
TJG