Assignment Task
Email continues to be an essential component of conducting business. This course-long project will introduce you to email forensics by having you write a tool that can analyze the Enron email data set. For instance, given a search term, it will return the sender’s email address and the date and time the message was sent by using the From: and Date: headers, respectively. For this assignment, you will be allowed to work in teams of two or three students.
1. Write a program that satisfies the description above and conforms to the following usage specifications:
A word to search for in the data set. The search will be case-insensitive, but exact, meaning neither fuzzy matching nor partial matching is performed. When more than one term is given, only emails with ALL terms in the body will be returned.
Your program should ignore duplicate terms and term order, so that the following are equivalent:
The exclusion of fuzzy matching means that the term "cash" will not match the string "money", although they are semantically similar. Exact matching (no partial matching) means that the term "the" will not match the string "them".
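The matching rules above can be sketched in Python as follows. This is only an illustration, not a required design: the function names normalize_terms and body_matches are hypothetical, and the tokenization rule (a word is a run of letters and digits) is an assumption that the assignment does not specify.

```python
import re

def normalize_terms(terms):
    """Lowercase the terms and collapse duplicates; a set has no order."""
    return frozenset(term.lower() for term in terms)

def body_matches(body, terms):
    """Return True if every term occurs in body as a whole word, ignoring case."""
    # Assumption: a "word" is a maximal run of letters and digits.
    words = set(re.findall(r"[a-z0-9]+", body.lower()))
    # Subset test: ALL terms must be present among the body's words.
    return normalize_terms(terms) <= words
```

Because the terms are reduced to a set, duplicate terms and term order cannot affect the result, and comparing against whole words rules out partial matches such as "the" inside "them".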
For each email with a message body (payload) that matches all the terms given by the user, you should capture and output the sender (using the From: header field) and the date the email was sent (using the Date: header field). Your program should number the results and display the total number of results found when the search completes. It is acceptable for your program to output both the sender’s email address and the date/time sent in the same formats as they are stored in the email headers.
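One possible way to capture and report the headers is sketched below with Python's standard email package. The names search_messages and format_results are hypothetical, the output layout is only an assumption, and the sketch assumes each message is plain text rather than multipart.

```python
from email import message_from_string

def search_messages(raw_messages, predicate):
    """Collect (From, Date) for every message whose body satisfies predicate."""
    results = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        body = msg.get_payload()
        # Assumption: non-multipart messages, so the payload is a string.
        if isinstance(body, str) and predicate(body):
            # Header values are kept exactly as stored in the message.
            results.append((msg["From"], msg["Date"]))
    return results

def format_results(results):
    """Number the results and append the total count (hypothetical layout)."""
    lines = [f"{i}. {sender} {date}"
             for i, (sender, date) in enumerate(results, start=1)]
    lines.append(f"Results found: {len(results)}")
    return lines
```

The predicate argument stands in for whatever term-matching function your program uses, which keeps header extraction separate from the search logic.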
2. Implement a new command to obtain all the emails sent and received by a given person.
Here, last_name and first_name identify the person whose email addresses you are trying to obtain. List the information for each address obtained as follows.
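A rough sketch of the address lookup is given below. It rests on a strong assumption that you should verify against the data set: that a person's addresses can be recognized because the part before the @ contains both the first and last name. The function name addresses_for_person is hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses

def addresses_for_person(raw_messages, last_name, first_name):
    """Return the sorted, de-duplicated addresses that appear to belong to a person."""
    last, first = last_name.lower(), first_name.lower()
    found = set()
    for raw in raw_messages:
        msg = message_from_string(raw)
        headers = (msg.get_all("From", []) + msg.get_all("To", [])
                   + msg.get_all("Cc", []))
        for _display_name, addr in getaddresses(headers):
            local_part = addr.lower().split("@")[0]
            # Assumption: both names occur in the local part of the address.
            if last in local_part and first in local_part:
                found.add(addr.lower())
    return sorted(found)
```

This heuristic will miss addresses that use initials or nicknames, so treat it as a starting point rather than a complete solution.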
3. Implement a new command to obtain all the emails exchanged by two people, regardless of who initiated the communication in the first place.
Here, the two address arguments identify the two people interacting with each other.
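The direction-independent check can be sketched as follows. The function name exchanged_between is hypothetical, and treating To:, Cc:, and Bcc: all as recipient fields is an assumption; the assignment does not say which recipient headers count.

```python
from email import message_from_string
from email.utils import getaddresses

def exchanged_between(raw_messages, address_a, address_b):
    """Return (From, Date) for messages sent by either person to the other."""
    a, b = address_a.lower(), address_b.lower()
    matches = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = {addr.lower()
                   for _n, addr in getaddresses(msg.get_all("From", []))}
        # Assumption: To, Cc, and Bcc all count as recipient fields.
        recipient_headers = (msg.get_all("To", []) + msg.get_all("Cc", [])
                             + msg.get_all("Bcc", []))
        recipients = {addr.lower()
                      for _n, addr in getaddresses(recipient_headers)}
        # Accept either direction, regardless of who initiated the exchange.
        if (a in senders and b in recipients) or (b in senders and a in recipients):
            matches.append((msg["From"], msg["Date"]))
    return matches
```

Comparing lowercased addresses avoids missing messages where the same address appears with different capitalization.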