Get-Email - Inconsistent results (number of files downloaded vs LogFile.txt)
Opened this issue · 3 comments
Good morning again!
Writing this issue for a second time, since when I clicked "Submit" the first time, it redirected me to the GitHub login screen and erased everything.
So I used the Get-Email cmdlet against a list (text file) of 146 Internet Message Ids. I made sure the list was unique by throwing it in Excel and using "Remove Duplicates".
Luckily for me, and my client, MES was able to download 145 of the 146 emails associated with the message ids! Woohoo!
However, when I started looking through LogFile.txt to see which Internet Message Id could not be found, I realized that there are actually 146 lines for downloaded files (emails). When I extracted the filenames and threw them in Excel + "Remove Duplicates", there was indeed a duplicate: a file/email that shows up twice. So it was "GET" twice by Get-Email, but there was only one copy in the output folder (since the first was overwritten).
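For anyone who wants to repeat that check without Excel, here is a minimal sketch. It assumes the filenames have already been pulled out of LogFile.txt into a list (the log format itself is not shown here, so no parsing is attempted), and the sample names are made up.

```python
from collections import Counter


def find_duplicates(names):
    """Return the names that appear more than once, with their counts."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical filenames, as if extracted from the log by hand.
downloaded = [
    "2024-01-05-Invoice.eml",
    "2024-01-06-Report.eml",
    "2024-01-05-Invoice.eml",
]
duplicates = find_duplicates(downloaded)
print(duplicates)
```

On a case-insensitive filesystem it may be worth lowercasing the names before counting, since names that differ only in case would also overwrite each other.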
Now, I'm wondering what's happening here. So I'll try to find the line of code used to set the filename and see if I can append the Internet Message Id to it. Am I dealing with a double download of the same Internet Message Id? Two different Internet Message Ids that produced a file with the same name? No idea. I'll report back with my findings and we can see if there's a way to address this scenario?
I ran the Get-Email command twice to make sure that the first time wasn't just a "fluke". It wasn't: it still downloaded 145 emails, with the same email being downloaded twice.
Update!
So I wasn't able to modify the script to make it append the Internet Message Id to the output filename, for some reason. Even though I reloaded the module in a clean pwsh.exe process, it just kept using the "old" version. Oh well.
Despite this, I still managed to confirm that two unique Internet Message Ids produced a file with the same name, which led to this issue. I'm guessing the easiest way to solve this is to add the Internet Message Id to the output filename, which should make all resulting output filenames unique, even in this kind of situation.
Let me know if you want me to test anything else!
Hi, I think the issue here is that two different Internet Message IDs are pointing to the same email. The current filename is based only on $ReceivedDateTime-$subject, without including the unique Message ID. We previously used the Message ID, but received feedback that it made it hard to identify which email was which, so we switched to using the subject.
In the next update, I'll improve the script by creating a processedMessages hashtable to keep track of which messages have been processed, along with an array to handle cases where the same email is referenced by different Internet Message IDs.
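To make the idea concrete, here is a rough sketch of that kind of tracking. It is only an illustration in Python, not the module's code: the function name, the dictionary fields, and the choice to key on the (received, subject) pair (because that pair is what determines the filename) are all assumptions.

```python
def track_processed(messages):
    """Group Internet Message IDs by the (received, subject) pair.

    Keying on that pair is an assumption for this sketch; it mirrors
    what the current filename is built from.
    """
    processed = {}
    for msg in messages:
        key = (msg["received"], msg["subject"])
        ids = processed.setdefault(key, [])
        # Skip an ID that was already recorded for this email.
        if msg["internet_message_id"] not in ids:
            ids.append(msg["internet_message_id"])
    return processed


# Hypothetical messages; the field names are made up for illustration.
messages = [
    {"internet_message_id": "<a1@example.com>", "received": "2024-01-05T101500", "subject": "Invoice"},
    {"internet_message_id": "<b2@example.com>", "received": "2024-01-05T101500", "subject": "Invoice"},
    {"internet_message_id": "<c3@example.com>", "received": "2024-01-06T090000", "subject": "Report"},
]
processed = track_processed(messages)

# Entries with more than one ID are the ones that would share a filename.
shared = {key: ids for key, ids in processed.items() if len(ids) > 1}
print(shared)
```

Anything that ends up in `shared` is a candidate for either skipping the second download or giving it a distinct filename.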
I need to think about how to handle filenames, since using the subject + Internet Message ID might be too long, especially with lengthy subjects. It might be possible to use only part of the subject or the first 8 characters of the ID. Or keep the current setup and just add a prefix.
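One possible shape for such a filename, sketched in Python purely for illustration. Note that this swaps in a variation on the idea: instead of the first 8 characters of the ID itself, it uses the first 8 hex characters of a SHA-256 digest of the ID, which sidesteps characters like `<` and `>` that Windows does not allow in filenames. The function name, the 60-character subject limit, and the timestamp format are assumptions.

```python
import hashlib
import re


def build_filename(received, subject, internet_message_id, max_subject=60):
    """Build '<received>-<truncated subject>-<short id hash>.eml'."""
    # Replace characters that are invalid in Windows filenames.
    safe_subject = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", subject).strip()
    safe_subject = safe_subject[:max_subject]
    # Short, filename-safe fingerprint of the full Internet Message ID.
    digest = hashlib.sha256(internet_message_id.encode("utf-8")).hexdigest()
    return f"{received}-{safe_subject}-{digest[:8]}.eml"


# Hypothetical values: same timestamp and subject, different IDs.
name_a = build_filename("2024-01-05T101500", "RE: Invoice", "<a1@example.com>")
name_b = build_filename("2024-01-05T101500", "RE: Invoice", "<b2@example.com>")
print(name_a)
print(name_b)
```

The subject stays at the front so the file is still easy to recognize, and the short suffix only matters when two names would otherwise be identical. An 8-character suffix can still collide in principle, so it reduces the problem rather than eliminating it.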
I don't know how frequent that situation is, but just checking whether a file with that name already exists in the output directory, and appending an incrementing number (e.g. 2) or any other identifier if it does, could work.
I think that as long as this behavior is documented (i.e. why do I have $SUBJECT.eml and $SUBJECT-2.eml?), it should be easily understandable.
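The check-and-increment idea could look roughly like this. It is a Python sketch rather than a patch for the module, and the helper name `unique_path` is made up; the demo writes into a temporary directory so nothing real gets touched.

```python
import tempfile
from pathlib import Path


def unique_path(directory, filename):
    """Return a path in `directory` that does not exist yet.

    If 'Subject.eml' is taken, try 'Subject-2.eml', 'Subject-3.eml', ...
    """
    directory = Path(directory)
    candidate = directory / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 2
    while candidate.exists():
        candidate = directory / f"{stem}-{counter}{suffix}"
        counter += 1
    return candidate


# Demo: "download" the same filename three times.
with tempfile.TemporaryDirectory() as out_dir:
    saved = []
    for _ in range(3):
        path = unique_path(out_dir, "Subject.eml")
        path.write_text("placeholder")  # stands in for the real email content
        saved.append(path.name)

print(saved)
```

The existence check and the write are two separate steps, so this is only safe when a single process is writing to the output folder.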
I would have to check if in my case, the first 8 characters of the IDs are the same or not. If they are, it would create yet another filename collision.
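That check is quick to script. A small Python sketch, with made-up IDs and a hypothetical helper name, that groups the IDs by their first 8 characters and reports any prefix shared by more than one ID:

```python
from collections import defaultdict


def prefix_collisions(message_ids, length=8):
    """Return {prefix: [ids]} for prefixes shared by two or more distinct IDs."""
    groups = defaultdict(list)
    # dict.fromkeys drops exact duplicates while keeping the original order.
    for message_id in dict.fromkeys(message_ids):
        groups[message_id[:length]].append(message_id)
    return {prefix: ids for prefix, ids in groups.items() if len(ids) > 1}


# Hypothetical Internet Message IDs.
ids = [
    "<ABCD1234.001@example.com>",
    "<ABCD1234.002@example.com>",
    "<ZZZZ9999.001@example.com>",
]
collisions = prefix_collisions(ids)
print(collisions)
```

An empty result would mean the 8-character prefix is enough to tell the IDs in that list apart; it says nothing about other lists.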
Edit: Sorry for the reply delay ... saw the email, wanted to reply later on and then poof, goldfish memory hit.