Text Processing and Tokenization with BKit

This project provides a Python script (main.py) that reads text from a CSV file and applies a series of BKit text transformations to it: tokenization, lemmatization, named entity recognition (NER), part-of-speech (POS) tagging, and digit cleaning. The results are written to a new CSV file.
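The general shape of such a pipeline is "read the text column, derive new columns from it row by row, keep the original column alongside." The sketch below illustrates only that structure. It does not call BKit: the two functions are hypothetical pure-Python stand-ins (a regex digit cleaner covering ASCII and Bengali digits, and a simple whitespace-and-punctuation tokenizer), and the output column names clean_text and tokens are assumptions rather than what main.py actually produces.

```python
import re

import pandas as pd

# Hypothetical stand-in for BKit's digit cleaning: removes ASCII digits and
# Bengali digits (U+09E6..U+09EF), then collapses leftover whitespace.
DIGIT_RE = re.compile(r"[0-9\u09E6-\u09EF]")

# Hypothetical stand-in for BKit's tokenizer: runs of non-space, non-punctuation
# characters form tokens; the danda (U+0964) and , . ! ? become separate tokens.
TOKEN_RE = re.compile(r"[^\s\u0964,.!?]+|[\u0964,.!?]")


def clean_digits(text: str) -> str:
    """Drop digits and normalize spacing."""
    return " ".join(DIGIT_RE.sub("", text).split())


def tokenize(text: str) -> list:
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def process(df: pd.DataFrame) -> pd.DataFrame:
    """Add one derived column per transformation, keeping the original text."""
    out = df.copy()
    out["clean_text"] = out["text"].astype(str).map(clean_digits)
    out["tokens"] = out["clean_text"].map(tokenize)
    return out


if __name__ == "__main__":
    sample = pd.DataFrame({"text": ["abc 123 def.", "আমি ২০২৪ সালে।"]})
    print(process(sample))
```

When a frame like this is saved with to_csv, list-valued columns such as tokens are written as their string representation, so a downstream reader has to parse them back (for example with ast.literal_eval) if it needs real lists.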

Requirements

pip install pandas "bkit[all]"

The quotes keep shells such as zsh from treating the square brackets as a glob pattern.

Usage

python main.py input_csv output_csv

Arguments:

input_csv: Path to the input CSV file. The file must contain a column named text holding the text to process.
output_csv: Path to the output CSV file where the processed data will be saved.
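A command-line interface with these two positional arguments can be sketched with argparse as below. This is an illustrative skeleton, not the contents of main.py: the check for the text column and the n_chars column (a placeholder where the BKit transformations would go) are assumptions made for the example.

```python
import argparse
import sys

import pandas as pd


def parse_args(argv=None):
    """Parse the two positional arguments described in the Usage section."""
    parser = argparse.ArgumentParser(
        description="Process the text column of a CSV file and save the results."
    )
    parser.add_argument("input_csv", help="Path to the input CSV (must have a 'text' column).")
    parser.add_argument("output_csv", help="Path where the processed CSV is written.")
    return parser.parse_args(argv)


def main(argv=None) -> int:
    args = parse_args(argv)
    df = pd.read_csv(args.input_csv)

    # Fail early with a clear message if the required column is missing.
    if "text" not in df.columns:
        print(f"error: {args.input_csv} has no 'text' column", file=sys.stderr)
        return 1

    # Hypothetical placeholder: the real script applies the BKit
    # transformations at this point; here we only add a character count.
    df["n_chars"] = df["text"].astype(str).str.len()

    df.to_csv(args.output_csv, index=False)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Accepting an optional argv list in main() keeps the script usable from the shell (python main.py input.csv output.csv) while also letting tests call it directly.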
