Add Support for Ingesting Unstructured Data (Hacktoberfest 2025)

## 🧩 Add Support for Ingesting Unstructured Data (Hacktoberfest 2025)

### 🎯 Goal
Enable ingestion from **unstructured sources** such as raw text, documents (PDF, DOCX), and URLs.  
This will allow users to feed varied data formats directly into Memori’s memory system.

---

### 📋 Description
Currently, Memori supports structured data ingestion only.  
To make the memory engine more versatile, we need a module that handles **unstructured inputs**,  
including plain text, document files, and website content.

Contributors can help by implementing or improving one or more of the following tasks:

1. Add a function to extract text from URLs (e.g., fetch the page with `requests` and parse the HTML with `BeautifulSoup`).
2. Add a parser for text-based documents (PDF/DOCX/TXT).
3. Normalize extracted text and feed it into the existing memory ingestion pipeline.
4. Write unit tests to validate ingestion and parsing results.
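
As a starting point for tasks 1 and 3, here is a minimal sketch of URL text extraction and normalization. All function names (`html_to_text`, `normalize_text`, `extract_text_from_url`) are hypothetical and not part of Memori's current API. To stay dependency-free, the HTML-to-text step uses the standard library's `html.parser` rather than BeautifulSoup; a BeautifulSoup-based version could replace `html_to_text` without changing the rest. How the resulting string is handed to Memori's ingestion pipeline is left open here, since that depends on the existing code.

```python
import re
import unicodedata
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    _SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Joining with spaces keeps block elements apart; it can also split
        # words that span inline tags, which is acceptable for a sketch.
        return " ".join(self._parts)


def normalize_text(raw: str) -> str:
    """Apply Unicode NFKC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    return re.sub(r"\s+", " ", text).strip()


def html_to_text(html: str) -> str:
    """Return the normalized text content of an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return normalize_text(parser.text())


def extract_text_from_url(url: str, timeout: float = 10.0) -> str:
    """Fetch a page and return its normalized text (hypothetical helper)."""
    import requests  # third-party; imported lazily so the parser works without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return html_to_text(response.text)
```

Keeping the network call separate from the parsing step means `html_to_text` and `normalize_text` can be unit-tested with plain strings, without network access.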

---

### ✅ Acceptance Criteria
- [ ] Support for at least **two** unstructured data sources (e.g., PDF and URL).  
- [ ] Code is clean, modular, and follows project structure and linting rules.  
- [ ] Includes unit tests with valid input/output examples.  
- [ ] Documentation updated (README or `/docs` section).  
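
To illustrate what "unit tests with valid input/output examples" might look like, here is a small sketch using the standard library's `unittest`. The function under test, `read_text_file`, is a hypothetical TXT parser invented for this example; real tests should target whatever functions the contribution actually adds and follow the project's existing test layout.

```python
import tempfile
import unittest
from pathlib import Path


def read_text_file(path) -> str:
    """Hypothetical TXT parser: decode as UTF-8 and collapse whitespace."""
    raw = Path(path).read_text(encoding="utf-8")
    return " ".join(raw.split())


class ReadTextFileTests(unittest.TestCase):
    def test_collapses_whitespace(self):
        # A known input paired with the exact output we expect.
        with tempfile.TemporaryDirectory() as tmp:
            sample = Path(tmp) / "note.txt"
            sample.write_text("Memori  stores\n\nmemories.\n", encoding="utf-8")
            self.assertEqual(read_text_file(sample), "Memori stores memories.")

    def test_missing_file_raises(self):
        # Failure modes deserve a test as well as the happy path.
        with tempfile.TemporaryDirectory() as tmp:
            with self.assertRaises(FileNotFoundError):
                read_text_file(Path(tmp) / "absent.txt")
```

Saved in a test module, these cases can be run with `python -m unittest`.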

---

### 💡 Tech Notes
- **Language:** Python  
- **Recommended libraries:** `requests`, `beautifulsoup4`, `pypdf`  
- **Design goal:** Keep the ingestion process modular for future extensions (e.g., images, audio).  

---

### 🤝 Hacktoberfest Details

- Labels: `hacktoberfest`, `bug`, `help wanted`
- This issue is part of Hacktoberfest 2025. Valid pull requests will be merged or labeled `hacktoberfest-accepted`.
- Please review the [CONTRIBUTING.md](https://github.com/GibsonAI/memori/blob/main/CONTRIBUTING.md) for contribution guidelines.
- Follow our [Code of Conduct](https://github.com/GibsonAI/memori/blob/main/CODE_OF_CONDUCT.md) to maintain a positive and inclusive environment.

⭐ Don’t forget to star the repo on GitHub. It really helps our community grow!


Add Support for Ingesting Unstructured Data (Hacktoberfest 2025) #107
