Add Support for Ingesting Unstructured Data (Hacktoberfest 2025) #107

@Boburmirzo

🧩 Add Support for Ingesting Unstructured Data (Hacktoberfest 2025)

🎯 Goal

Enable ingestion from unstructured sources such as raw text, documents (PDF, DOCX), and URLs.
This will allow users to feed varied data formats directly into Memori’s memory system.


📋 Description

Currently, Memori supports only structured data ingestion.
To make the memory engine more versatile, we need to add a module that can handle unstructured inputs,
including plain text, document files, and website content.

Contributors can help by implementing or improving one or more of the following tasks:

  1. Add a function to extract text from URLs (using requests to fetch the page and BeautifulSoup to parse the HTML).
  2. Add a parser for text-based documents (PDF/DOCX/TXT).
  3. Normalize extracted text and feed it into the existing memory ingestion pipeline.
  4. Write unit tests to validate ingestion and parsing results.
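
As a starting point, tasks 1 and 3 could be sketched roughly as below. This is only an illustration, not Memori's actual API: the names `html_to_text`, `normalize_text`, and `fetch_url_text` are hypothetical, and the sketch uses the standard-library `html.parser` instead of BeautifulSoup so it runs without extra dependencies (a real contribution would likely use beautifulsoup4, as recommended in the tech notes). The hand-off to the existing ingestion pipeline is left out because its interface is not described in this issue.

```python
import re
import unicodedata
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script>, <style>, and <noscript> contents."""

    _SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def normalize_text(text: str) -> str:
    """Apply Unicode NFKC normalization and collapse whitespace runs."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def html_to_text(html: str) -> str:
    """Strip markup from an HTML string and return normalized text.

    Text chunks are joined with spaces, so words split by inline tags
    (e.g. "<b>bo</b>ld") end up separated; acceptable for a sketch.
    """
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return normalize_text(" ".join(parser.chunks))


def fetch_url_text(url: str, timeout: float = 10.0) -> str:
    """Download a page with requests and return its normalized text."""
    import requests  # imported lazily so the helpers above work without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return html_to_text(response.text)
```

Keeping `html_to_text` separate from `fetch_url_text` means the parsing logic can be unit tested with inline HTML strings, without network access.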

✅ Acceptance Criteria

  • Support for at least two unstructured data sources (e.g., PDF and URL).
  • Code is clean, modular, and follows project structure and linting rules.
  • Includes unit tests with valid input/output examples.
  • Documentation updated (README or /docs section).

💡 Tech Notes

  • Language: Python
  • Recommended libraries: requests, beautifulsoup4, pypdf
  • Design goal: Keep the ingestion process modular for future extensions (e.g., images, audio).

🤝 Hacktoberfest Details

  • Labels: hacktoberfest, bug, help wanted
  • This issue is part of Hacktoberfest 2025 — valid pull requests will be merged or labeled hacktoberfest-accepted.
  • Please review CONTRIBUTING.md for contribution guidelines.
  • Follow our Code of Conduct to maintain a positive and inclusive environment.

⭐ Don’t forget to star the repo on GitHub. It really helps our community grow!
