|
| 1 | +# ReadEmail Module Documentation |
| 2 | + |
| 3 | +This document describes the ReadEmail module, which reads and parses email files in the `.eml` format. The module consists of three source files: `__init__.py`, `ReadEmail.py`, and `typed.py`. The sections below describe each file, along with the module's inputs and outputs. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## 1. Overview |
| 8 | + |
| 9 | +The ReadEmail module reads `.eml` files, parses the email headers, body, and attachments, and returns the result in a structured form. It is intended for applications that process or analyze email, such as email clients, document processing systems, and automated data extraction tasks. |
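
To make the read, parse, and return flow concrete, here is a minimal sketch that uses Python's standard-library `email` package instead of the module's own code. The function name `parse_eml`, the sample message, and the shape of the returned dictionary are assumptions for illustration only; they are not taken from the ReadEmail source.

```python
from email import message_from_bytes, policy

# Hypothetical sample message, built inline so the sketch is self-contained.
RAW_EML = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Quarterly report\r\n"
    b"Message-ID: <1234@example.com>\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="BOUNDARY"\r\n'
    b"\r\n"
    b"--BOUNDARY\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Please find the report attached.\r\n"
    b"--BOUNDARY\r\n"
    b"Content-Type: text/plain\r\n"
    b'Content-Disposition: attachment; filename="report.txt"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"cmVwb3J0IGRhdGE=\r\n"
    b"--BOUNDARY--\r\n"
)


def parse_eml(raw: bytes) -> dict:
    """Split a raw .eml message into headers, body text, and attachments."""
    msg = message_from_bytes(raw, policy=policy.default)

    # Prefer a plain-text body and fall back to HTML if none exists.
    body_part = msg.get_body(preferencelist=("plain", "html"))

    # get_payload(decode=True) undoes the Content-Transfer-Encoding.
    attachments = [
        {"filename": part.get_filename(), "raw": part.get_payload(decode=True)}
        for part in msg.iter_attachments()
    ]

    return {
        "subject": str(msg["Subject"]),
        "from_": str(msg["From"]),
        "message_id": str(msg["Message-ID"]),
        "body": body_part.get_content() if body_part is not None else "",
        "attachments": attachments,
    }
```

Calling `parse_eml(RAW_EML)` returns a dictionary whose keys mirror some of the output names listed in section 2.3.2; the real module may structure its result differently.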
| 10 | + |
| 11 | +--- |
| 12 | + |
| 13 | +## 2. Files |
| 14 | + |
| 15 | +### 2.1. `__init__.py` |
| 16 | + |
| 17 | +This empty file marks the directory as a Python package. It contains no code or logic. |
| 18 | + |
| 19 | +### 2.2. `ReadEmail.py` |
| 20 | + |
| 21 | +This file contains the main logic of the ReadEmail module. Below is a breakdown of its components. |
| 22 | + |
| 23 | +#### 2.2.1. Classes |
| 24 | + |
| 25 | +- **ParsedHeader**: Represents the parsed email header with fields such as `subject`, `from`, `to`, and `date`. |
| 26 | +- **ParsedBody**: Represents the content and content type of the email body. |
| 27 | +- **AttachmentHeader**: Contains fields related to the attachment's header such as content disposition and encoding. |
| 28 | +- **ParsedAttachment**: Represents an attachment with fields including `filename`, `raw` content, and `content_header`. |
| 29 | +- **ParsedEmail**: Combines all parsed data (header, body, attachment). |
| 30 | +- **ReadEmail**: Inherits from the `Step` class and contains the core functionality for running the email parsing task. |
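
The data classes listed above could be modeled roughly as follows. This is a sketch under stated assumptions: the source does not say whether these are dataclasses, and apart from the names given in the list (`subject`, `from`, `to`, `date`, `filename`, `raw`, `content_header`) every field name and type here is hypothetical. The `ReadEmail` step itself is omitted because the `Step` base class is not described.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParsedHeader:
    subject: str
    from_: str  # `from` is a reserved word in Python, hence the underscore
    to: List[str]
    date: str


@dataclass
class ParsedBody:
    content: str
    content_type: str  # hypothetical field name, e.g. "text/plain"


@dataclass
class AttachmentHeader:
    content_disposition: str  # hypothetical field name
    content_transfer_encoding: str  # hypothetical field name, e.g. "base64"


@dataclass
class ParsedAttachment:
    filename: str
    raw: bytes
    content_header: AttachmentHeader


@dataclass
class ParsedEmail:
    header: ParsedHeader
    body: List[ParsedBody]
    # default_factory avoids sharing one list across instances.
    attachment: List[ParsedAttachment] = field(default_factory=list)
```

Nesting `AttachmentHeader` inside `ParsedAttachment` keeps the transfer encoding next to the raw bytes it applies to, which is what a decoding step would need.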
| 31 | + |
| 32 | +#### 2.2.2. Key Methods |
| 33 | + |
| 34 | +- **`__decode`**: Decodes the email content based on its encoding, e.g., `base64` or `quoted-printable`. |
| 35 | +- **`run`**: Executes the parsing process using `EmlParser`, processes email data, attaches parsed data, and decodes attachments. |
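
Decoding by transfer encoding, as described for `__decode`, can be sketched with the standard-library `base64` and `quopri` modules. The function name `decode_payload`, its signature, and the pass-through behavior for other encodings are assumptions; the actual method may differ.

```python
import base64
import quopri


def decode_payload(payload: bytes, encoding: str) -> bytes:
    """Decode raw content according to its Content-Transfer-Encoding value."""
    normalized = encoding.strip().lower()
    if normalized == "base64":
        return base64.b64decode(payload)
    if normalized == "quoted-printable":
        return quopri.decodestring(payload)
    # Assumption: any other encoding (7bit, 8bit, binary) is returned unchanged.
    return payload
```

For example, `decode_payload(b"aGVsbG8=", "base64")` yields `b"hello"`. Normalizing the encoding name first matters because header values are case-insensitive and may carry surrounding whitespace.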
| 36 | + |
| 37 | +### 2.3. `typed.py` |
| 38 | + |
| 39 | +This file defines the input and output types used by the `ReadEmail` class. |
| 40 | + |
| 41 | +#### 2.3.1. Inputs |
| 42 | + |
| 43 | +- `eml_file_path`: Path to the `.eml` email file to be processed. |
| 44 | +- `base_path`: (Optional) Base directory path where attachments should be saved. |
| 45 | + |
| 46 | +#### 2.3.2. Outputs |
| 47 | + |
| 48 | +- `subject`: Email subject. |
| 49 | +- `datetime`: Date and time the email was sent. |
| 50 | +- `from_`: Sender's email. |
| 51 | +- `body`: Full body text of the email. |
| 52 | +- `message_id`: Unique message identifier. |
| 53 | +- `attachments`: List of attachments with file paths. |
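
One way to express these inputs and outputs as types is with `TypedDict`. This is only a sketch: the class names and the value types are assumptions, since the contents of `typed.py` are not shown. The `path` key on each attachment follows the usage example in section 3.

```python
from typing import List, TypedDict


class _RequiredInputs(TypedDict):
    eml_file_path: str


class ReadEmailInputs(_RequiredInputs, total=False):
    # total=False makes the keys declared in this class optional.
    base_path: str


class AttachmentOutput(TypedDict):
    path: str


class ReadEmailOutputs(TypedDict):
    subject: str
    datetime: str  # assumed to be a string; the real type is not documented
    from_: str
    body: str
    message_id: str
    attachments: List[AttachmentOutput]
```

Splitting the inputs into a required base class and an optional subclass keeps `base_path` optional without relying on `NotRequired`, which is only available in newer Python versions.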
| 54 | + |
| 55 | +--- |
| 56 | + |
| 57 | +## 3. Usage |
| 58 | + |
| 59 | +To use the module, import the `ReadEmail` class, instantiate it with the required inputs, and call its `run` method. The `run` method parses the email and returns a dictionary containing the parsed data. |
| 60 | + |
| 61 | +### Example |
| 62 | + |
| 63 | +```python |
| 64 | +inputs = { |
| 65 | +    "eml_file_path": "path/to/email.eml",  # .eml file to parse |
| 66 | +    "base_path": "path/to/save/attachments"  # optional: where attachments are saved |
| 67 | +} |
| 68 | + |
| 69 | +read_email = ReadEmail(inputs)  # ReadEmail must be imported first; the import path depends on your package layout |
| 70 | +parsed_data = read_email.run()  # returns a dictionary of parsed fields |
| 71 | + |
| 72 | +print(parsed_data["subject"]) |
| 73 | +print(parsed_data["body"]) |
| 74 | +for attachment in parsed_data["attachments"]: |
| 75 | +    print(attachment["path"])  # path of each saved attachment |
| 76 | +``` |
| 77 | + |
| 78 | +This example prints the email subject, the body, and the paths of any attachments extracted from the specified `.eml` file. |
| 79 | + |
| 80 | +--- |
| 81 | + |
| 82 | +By following this guide, users can effectively integrate and utilize the ReadEmail module for processing and extracting information from `.eml` files in their applications. |