Skip to content

Conversation

@tmccaughey
Copy link
Contributor

Hi Falak. I have fixed the chunking size, it is not exactly identical to the one in the manual upload but a lot better. Thanks for your time! :)

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an S3 knowledge connector feature to the AWS extension, enabling extraction and processing of documents from S3 buckets into knowledge sources. It also updates deprecated APIs and dependencies.

Key changes:

  • Implements S3 connector with support for multiple file types (txt, pdf, docx, csv, json, jsonl, md, pptx)
  • Integrates LangChain for document loading and text chunking with configurable chunk sizes
  • Updates deprecated Buffer constructor and context API methods

Reviewed changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
extensions/aws/tsconfig.json Added compiler options for better module compatibility
extensions/aws/src/nodes/lambdaInvoke.ts Fixed deprecated Buffer constructor and updated context API call
extensions/aws/src/module.ts Registered S3 connector and commented out existing nodes
extensions/aws/src/knowledge-connectors/s3Connector.ts Main connector implementation for processing S3 files into knowledge sources
extensions/aws/src/knowledge-connectors/helpers/text_extractor.ts Document loader using LangChain for various file types
extensions/aws/src/knowledge-connectors/helpers/text_chunker.ts Text splitting logic with configurable chunk sizes
extensions/aws/src/knowledge-connectors/helpers/new_utils.ts S3 file download and chunk extraction utilities
extensions/aws/src/knowledge-connectors/helpers/list_files.ts S3 bucket listing functionality
extensions/aws/src/knowledge-connectors/helpers/utils/*.ts Utility functions for text processing and configuration
extensions/aws/src/knowledge-connectors/helpers/creds.env Environment variable template for AWS credentials
extensions/aws/package.json Updated dependencies to support LangChain and newer AWS SDK
extensions/aws/.npmrc Added legacy peer deps flag for dependency resolution

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +13 to +16
// sayPollyNode,
// lambdaInvokeNode,
// s3GetObjectNode,
// s3PutObjectNode
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All existing nodes have been commented out, leaving the nodes array empty. This breaks existing functionality for users of this extension. If these nodes are being temporarily disabled for testing, consider using a feature flag instead, or create a separate branch for the S3 connector work that doesn't disable production features.

Suggested change
// sayPollyNode,
// lambdaInvokeNode,
// s3GetObjectNode,
// s3PutObjectNode
sayPollyNode,
lambdaInvokeNode,
s3GetObjectNode,
s3PutObjectNode

Copilot uses AI. Check for mistakes.
};

/**
* Custom PPTXLoader class to handle pptx files. Impelementation adapted
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'Impelementation' to 'Implementation'.

Suggested change
* Custom PPTXLoader class to handle pptx files. Impelementation adapted
* Custom PPTXLoader class to handle pptx files. Implementation adapted

Copilot uses AI. Check for mistakes.
return chunkSize;
};

const getRecursiveCharacterTextSplitter = () => {
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect indentation - this function definition has an extra tab character at the beginning of the line. The indentation should align with other top-level function definitions in the file.

Suggested change
const getRecursiveCharacterTextSplitter = () => {
const getRecursiveCharacterTextSplitter = () => {

Copilot uses AI. Check for mistakes.
});
return splitter;
};
export const splitTextUsingRecursiveCharacterTextSplitter = (text: string) => {
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect indentation - this exported function has an extra tab character at the beginning of the line. The indentation should align with other top-level exports in the file.

Suggested change
export const splitTextUsingRecursiveCharacterTextSplitter = (text: string) => {
export const splitTextUsingRecursiveCharacterTextSplitter = (text: string) => {

Copilot uses AI. Check for mistakes.
Comment on lines +2 to +4
import { get } from "http";


Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'get' import from 'http' is unused in this file and should be removed.

Suggested change
import { get } from "http";

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,67 @@
import { S3Client, ListObjectsV2Command, S3 } from "@aws-sdk/client-s3";
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'S3' import from '@aws-sdk/client-s3' is unused in this file and should be removed.

Suggested change
import { S3Client, ListObjectsV2Command, S3 } from "@aws-sdk/client-s3";
import { S3Client, ListObjectsV2Command } from "@aws-sdk/client-s3";

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +4
AWS_ACCESS_KEY_ID=your_access_key_here
AWS_SECRET_ACCESS_KEY=your_secret_key_here
AWS_REGION=eu-central-1
BUCKET_NAME=your_bucket_name_here No newline at end of file
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Environment credential files should not be committed to the repository. This file should be added to .gitignore and replaced with a .env.example or .env.template file to document required variables without exposing placeholder values that could be mistaken for real credentials.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants