
Commit 4387ea1

patched-admin and patched.codes[bot] authored
Patched patchwork/steps/ReadEmail/README.md (#1330)
Co-authored-by: patched.codes[bot] <298395+patched.codes[bot]@users.noreply.github.com>
1 parent 3d24984 commit 4387ea1

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
# ReadEmail Module Documentation

This document describes the ReadEmail module, which reads and parses email files in the `.eml` format. The module spans three source files: `__init__.py`, `ReadEmail.py`, and `typed.py`. The sections below describe each file, including its inputs and outputs.

---

## 1. Overview

The ReadEmail module reads an `.eml` file, parses the email's headers, body, and attachments, and returns the result in a structured form. It is intended for applications that need to process or analyze email, such as email clients, document processing systems, and automated data extraction tasks.

---

## 2. Files

### 2.1. `__init__.py`

This empty file marks the directory as a Python package. It contains no code or logic.

### 2.2. `ReadEmail.py`

This file contains the main logic of the ReadEmail module. Its components are described below.

#### 2.2.1. Classes

- **ParsedHeader**: Represents the parsed email header, with fields such as `subject`, `from`, `to`, and `date`.
- **ParsedBody**: Represents the content and content type of the email body.
- **AttachmentHeader**: Contains fields from the attachment's header, such as content disposition and encoding.
- **ParsedAttachment**: Represents an attachment, with fields including `filename`, `raw` content, and `content_header`.
- **ParsedEmail**: Combines all parsed data (header, body, and attachments).
- **ReadEmail**: Inherits from the `Step` class and contains the core functionality for running the email parsing task.

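The shape of these classes can be sketched with `TypedDict`. This is only an illustration of how the pieces might nest: the field types, the `AttachmentHeader` field names, and the use of lists for `body` and `attachment` are assumptions, not the definitions in `ReadEmail.py`.

```python
from typing import List, TypedDict

# "from" is a reserved word in Python, so a type with that key needs the
# functional TypedDict syntax rather than a class body.
ParsedHeader = TypedDict(
    "ParsedHeader",
    {"subject": str, "from": str, "to": List[str], "date": str},
)


class ParsedBody(TypedDict):
    content: str
    content_type: str


class AttachmentHeader(TypedDict):
    # Field names assumed for illustration.
    content_disposition: str
    content_transfer_encoding: str


class ParsedAttachment(TypedDict):
    filename: str
    raw: str  # still encoded, e.g. base64 text
    content_header: AttachmentHeader


class ParsedEmail(TypedDict):
    header: ParsedHeader
    body: List[ParsedBody]
    attachment: List[ParsedAttachment]


# A hand-written value that satisfies the sketch above.
example: ParsedEmail = {
    "header": {
        "subject": "Quarterly report",
        "from": "alice@example.com",
        "to": ["bob@example.com"],
        "date": "2024-01-15T09:30:00+00:00",
    },
    "body": [{"content": "See attached.", "content_type": "text/plain"}],
    "attachment": [
        {
            "filename": "report.csv",
            "raw": "YSxiCjEsMgo=",
            "content_header": {
                "content_disposition": "attachment",
                "content_transfer_encoding": "base64",
            },
        }
    ],
}
```

The `from` key is the one detail worth noting: it cannot be written as a class attribute, which may be why the outputs in `typed.py` use `from_` instead.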
#### 2.2.2. Key Methods

- **`__decode`**: Decodes content according to its transfer encoding, e.g., `base64` or `quoted-printable`.
- **`run`**: Parses the email with `EmlParser`, assembles the parsed header, body, and attachment data, and decodes the attachments.

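A decoding step of this kind can be sketched with the standard library alone. The function name `decode_payload` and its signature are hypothetical; the actual `__decode` method may differ in both interface and behavior.

```python
import base64
import quopri


def decode_payload(raw: str, encoding: str) -> bytes:
    """Decode a payload according to its Content-Transfer-Encoding.

    Hypothetical helper for illustration; not the module's own method.
    """
    encoding = encoding.strip().lower()
    data = raw.encode("utf-8")
    if encoding == "base64":
        return base64.b64decode(data)
    if encoding == "quoted-printable":
        return quopri.decodestring(data)
    # Encodings such as 7bit, 8bit, and binary need no transformation.
    return data
```

Falling through to the untouched bytes for unrecognized encodings is one possible design choice; raising an error instead would surface malformed headers earlier.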
### 2.3. `typed.py`

This file defines the input and output types used by the `ReadEmail` class.

#### 2.3.1. Inputs

- `eml_file_path`: Path to the `.eml` file to process.
- `base_path`: (Optional) Base directory where attachments are saved.

#### 2.3.2. Outputs

- `subject`: Email subject.
- `datetime`: Date and time the email was sent.
- `from_`: Sender's email address.
- `body`: Full body text of the email.
- `message_id`: Unique message identifier.
- `attachments`: List of attachments with file paths.

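As a rough sketch, the inputs and outputs listed above could be typed as follows. The class names, the value types, and the `missing_inputs` helper are assumptions made for illustration; the `path` key on each attachment follows the usage example in this README.

```python
from typing import Dict, List, TypedDict


class _RequiredInputs(TypedDict):
    eml_file_path: str


class ReadEmailInputs(_RequiredInputs, total=False):
    # total=False makes the keys declared here optional.
    base_path: str


class ReadEmailOutputs(TypedDict):
    subject: str
    datetime: str
    from_: str
    body: str
    message_id: str
    attachments: List[Dict[str, str]]  # each entry assumed to carry a "path"


def missing_inputs(inputs: dict) -> List[str]:
    """Return the required input keys that are absent (hypothetical helper)."""
    return sorted(ReadEmailInputs.__required_keys__ - set(inputs))
```

Splitting required and optional keys across two classes keeps the sketch compatible with Python versions that predate `typing.NotRequired`.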
---

## 3. Usage

To use the ReadEmail module, instantiate the `ReadEmail` class with the required inputs and call its `run` method. `run` parses the email and returns a dictionary containing the parsed data.

### Example

```python
# Import path inferred from the file layout (patchwork/steps/ReadEmail/ReadEmail.py).
from patchwork.steps.ReadEmail.ReadEmail import ReadEmail

inputs = {
    "eml_file_path": "path/to/email.eml",
    "base_path": "path/to/save/attachments",  # optional
}

# Run the step; the result is a dictionary of parsed fields.
read_email = ReadEmail(inputs)
parsed_data = read_email.run()

print(parsed_data["subject"])
print(parsed_data["body"])
for attachment in parsed_data["attachments"]:
    print(attachment["path"])
```

This example prints the email subject, the body, and the path of each attachment extracted from the specified `.eml` file.

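To try the example without a real mailbox export, a small `.eml` file can be generated with Python's standard `email` package. The helper name `write_sample_eml` and the message contents below are made up for illustration and are not part of the ReadEmail module.

```python
from email.message import EmailMessage
from pathlib import Path


def write_sample_eml(path) -> Path:
    """Write a minimal email with one attachment to `path` (hypothetical helper)."""
    msg = EmailMessage()
    msg["Subject"] = "Quarterly report"
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Message-ID"] = "<sample-1@example.com>"
    msg.set_content("Please find the report attached.")
    # Bytes payloads are base64-encoded by default when serialized.
    msg.add_attachment(
        b"col_a,col_b\n1,2\n",
        maintype="application",
        subtype="octet-stream",
        filename="report.csv",
    )
    target = Path(path)
    target.write_bytes(msg.as_bytes())
    return target
```

The returned path can then be passed as `eml_file_path` in the inputs dictionary.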
---

With the inputs and outputs described above, the ReadEmail module can be integrated into applications that need to extract information from `.eml` files.
