Feat: Add PDF Decryption Support for Password-Protected Files #2296
+67
−0
Summary
This PR adds support for processing password-protected PDF files in the document processing pipeline. Users can now decrypt and process encrypted PDFs by setting a password in the environment configuration.
Motivation
Previously, the system would fail to process encrypted PDF files without any clear indication of what went wrong. This enhancement allows users to work with password-protected documents, which are common in enterprise and academic environments where sensitive documents are often encrypted.
Changes Made
1. Configuration Management (`lightrag/api/config.py`)
- `pdf_decrypt_password` parameter added to `global_args`
- Read from the `PDF_DECRYPT_PASSWORD` environment variable
- `None` if not set

2. Document Processing (`lightrag/api/routers/document_routes.py`)
- `pipeline_enqueue_file` function updated to detect encrypted PDFs
- Decryption via the `decrypt()` method
- Password taken from `PDF_DECRYPT_PASSWORD`

3. Documentation (`env.example`)
- `PDF_DECRYPT_PASSWORD` configuration example

Usage
Configuration
Set `PDF_DECRYPT_PASSWORD` in your `.env` file.

Behavior
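The decision flow implied by the error messages in this description can be sketched as follows. This is a minimal illustration, not the code in `document_routes.py`: the helper name `try_decrypt` is hypothetical, and the reader is assumed to expose an `is_encrypted` attribute and a `decrypt(password)` method that returns a falsy value for a wrong password, as pypdf's `PdfReader` does.

```python
from typing import Optional


def try_decrypt(reader, password: Optional[str]) -> Optional[str]:
    """Return None if the PDF is readable, otherwise an error message.

    `reader` is any object with `is_encrypted` and `decrypt(password)`
    (an assumption for this sketch; pypdf's PdfReader fits this shape).
    """
    # Unencrypted files need no further handling.
    if not reader.is_encrypted:
        return None

    # Encrypted file but nothing configured.
    if not password:
        return (
            "PDF is encrypted but no password provided - "
            "Please set PDF_DECRYPT_PASSWORD environment variable"
        )

    # Any unexpected failure inside the library is reported with its details.
    try:
        result = reader.decrypt(password)
    except Exception as exc:
        return f"PDF decryption failed - Error during PDF decryption: {exc}"

    # A falsy result means the password did not unlock the file.
    if not result:
        return (
            "Failed to decrypt PDF - incorrect password - "
            "The provided PDF_DECRYPT_PASSWORD is incorrect for this file"
        )
    return None
```

Returning a message instead of raising keeps the sketch easy to exercise with a stub reader; the actual pipeline may report errors differently.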
Error Messages
All error messages are user-friendly and appear in English:
- "PDF is encrypted but no password provided - Please set PDF_DECRYPT_PASSWORD environment variable"
- "Failed to decrypt PDF - incorrect password - The provided PDF_DECRYPT_PASSWORD is incorrect for this file"
- "PDF decryption failed - Error during PDF decryption: [details]"

Technical Notes
- Uses `global_args.pdf_decrypt_password` for consistency with other configuration

Testing Recommendations
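One inexpensive check is that the setting resolves to `None` when `PDF_DECRYPT_PASSWORD` is unset. The sketch below assumes the value comes from a plain environment lookup; the helper `read_pdf_decrypt_password` is hypothetical and stands in for however `config.py` actually populates `global_args.pdf_decrypt_password`.

```python
import os
from typing import Mapping, Optional


def read_pdf_decrypt_password(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured password, or None when the variable is absent."""
    # Hypothetical helper: the mapping parameter lets tests pass a plain dict.
    return env.get("PDF_DECRYPT_PASSWORD")


# Unset variable: no password is configured.
assert read_pdf_decrypt_password({}) is None
# Set variable: the value is passed through unchanged.
assert read_pdf_decrypt_password({"PDF_DECRYPT_PASSWORD": "s3cret"}) == "s3cret"
```

Passing the environment as a mapping avoids mutating `os.environ` inside a test run.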
Checklist
- `env.example` updated
- Follows the `global_args` pattern