COMMENTATOR ✍️: A Code-mixed Multilingual Text Annotation Framework.

COMMENTATOR is a code-mixed annotation tool designed to enhance the quality and efficiency of annotating multilingual, code-mixed text. It reduces annotation time and operational overheads by providing advanced features tailored for code-mixed data. The tool offers intuitive interfaces, automated suggestions, and robust error-checking mechanisms.

🌟 Features

Modular Workflows: Supports Admin Workflow (user management, system configuration, progress monitoring) and Annotation Workflow (text annotation, history review).
User-Friendly Interface: Intuitive UI for annotators and admins.
Scalable Architecture: Built with ReactJS, Flask, and MongoDB for robust performance.
Extensibility: Easily extended to new language pairs; refer to the Configuration Changes file in the Documents folder.
Inter-Annotator Agreement Calculation: Supports agreement metrics such as Cohen’s Kappa (for two annotators) and Fleiss’ Kappa (for three annotators) to assess annotation consistency.
For more details, please refer to our paper (EMNLP 2024:demo).
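
As an illustration of the agreement metrics listed above, here is a minimal pure-Python sketch of Cohen's Kappa for two annotators. It is not COMMENTATOR's own implementation; the function name `cohen_kappa` and the sample language tags are assumptions made for this example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's own label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical token-level language tags from two annotators.
annotator_1 = ["hi", "en", "en", "hi"]
annotator_2 = ["hi", "en", "hi", "hi"]
print(cohen_kappa(annotator_1, annotator_2))
```

Here the annotators agree on 3 of 4 tokens (observed agreement 0.75) while chance agreement is 0.5, so the score is 0.5.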

Architecture: COMMENTATOR features two primary user workflows: Admin workflow and Annotation workflow. The Admin workflow manages user access, system configurations, and monitoring of annotation progress, while the Annotation workflow allows annotators to log in, annotate text, and review their annotation history. Both workflows are integrated into the tool’s modular architecture for efficient management and processing.

📁 Folder Structure

COMMENTATOR/
├── backend/
│   ├── app.py                # Flask application entry point
│   ├── requirements.txt      # Python dependencies
│   ├── Dockerfile            # Docker configuration
│   └── LID_tool/             # Language identification modules
├── frontend/
│   ├── src/
│   │   ├── Admin/            # Admin dashboard components
│   │   ├── Auth/             # Authentication components
│   │   ├── Components/       # Reusable UI components
│   │   ├── Edit/             # Annotation editing interface
│   │   ├── Home/             # LID page interface
│   │   ├── Matrix/           # Matrix analysis interface
│   │   ├── NER/              # NER interface 
│   │   ├── POS/              # POS tagging interface
│   │   ├── Translate/        # Translation interface 
│   │   ├── User/             # User management
│   │   ├── utils/            # Utility functions
│   │   └── Router.js         # Application routing
│   ├── public/               # Static assets
│   └── package.json          # Node.js dependencies
└── README.md

⚡ Quick Start

Prerequisites

Python 3.9.x or 3.10.x

Node.js 12+

MongoDB

Docker (optional)

To get started with the project, follow these steps:

Clone the Repository:

git clone https://github.com/lingo-iitgn/commentator.git
cd commentator

Backend [ Local Server ]

Steps to Follow

a. Navigate to the backend folder

cd backend

b. Installing Dependencies

pip install -r requirements.txt

c. Updating Frontend URL

Open app.py in a code or text editor (Visual Studio Code, Sublime Text, Notepad, etc.) and set the frontend variable to the URL where the frontend is served:

frontend = "YOUR_FRONTEND_HOST_URL"
OR
frontend = "http://localhost:3000"

d. Set up MongoDB connection in app.py

Choose one of the following options depending on your setup:

For Local Setup: Ensure MongoDB is running locally, and set:

conn_str = "YOUR_MONGODB_URL"
OR
conn_str = "mongodb://127.0.0.1:27017/"

For Cloud MongoDB (Atlas): Create an account: 👉 https://cloud.mongodb.com/

Set up your Cluster, Database, and User.

Copy your connection string and replace credentials:

conn_str = "mongodb+srv://<username>:<password>@cluster0.stlpmgf.mongodb.net/?retryWrites=true&w=majority"
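
If the username or password contains characters such as `@`, `:` or `/`, they need to be percent-encoded before being placed in the connection string. The sketch below shows one way to do that with the standard library; the helper name `build_atlas_uri`, the sample credentials, and the host `cluster0.example.mongodb.net` are hypothetical.

```python
from urllib.parse import quote_plus


def build_atlas_uri(username: str, password: str, host: str) -> str:
    """Build a mongodb+srv connection string with percent-encoded credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{host}/?retryWrites=true&w=majority"


# Hypothetical credentials and cluster host, for illustration only.
conn_str = build_atlas_uri("annotator", "p@ss:word/1", "cluster0.example.mongodb.net")
print(conn_str)
```

The resulting string can be assigned to conn_str in app.py in place of the hand-written one.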

e. Set the database name:

To use a different database, update this line in app.py:

database = client['sentences_EMNLP24']

Replace 'sentences_EMNLP24' with your preferred database name as needed.

f. API Configuration for the Translation task:

The backend supports three different AI providers for translation tasks. Configure the API keys based on your preferred provider:

Groq API Setup: Get your Groq API key from https://console.groq.com/keys and add it to the .env file in the backend directory:

GROQ_API_KEY="YOUR_GROQ_API_KEY"

Model used: llama-3.3-70b-versatile (you can change this model in the code as needed)

OpenAI API Setup (Optional): Get your OpenAI API key from https://platform.openai.com/api-keys. Then, uncomment the OpenAI API integration code in app.py and set your API key:

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

Model used: gpt-4 (you can change this model in the code as needed)

Anthropic (Claude) API Setup (Optional): Get your Anthropic API key from https://docs.anthropic.com/en/api/admin-api/apikeys/get-api-key. Then, uncomment the Anthropic API integration code in app.py and set your API key:

os.environ["ANTHROPIC_API_KEY"] = "YOUR_ANTHROPIC_API_KEY"

Model used: claude-3-5-sonnet-20241022 (you can change this model in the code as needed)

Note: You can configure multiple API providers, but ensure at least one is properly set up for the translation functionality to work correctly.
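
Before starting the server, it can help to check which provider keys are actually visible to the process. The following is a small sketch, not part of the COMMENTATOR codebase: the helper `configured_providers` is hypothetical, and it assumes the keys are exposed under the environment variable names shown above.

```python
import os

# Environment variable names taken from the setup steps above.
PROVIDER_KEYS = {
    "groq": "GROQ_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key is set to a non-placeholder value."""
    env = os.environ if env is None else env
    found = []
    for name, var in PROVIDER_KEYS.items():
        value = env.get(var, "").strip()
        # Skip empty values and untouched placeholders such as YOUR_GROQ_API_KEY.
        if value and not value.startswith("YOUR_"):
            found.append(name)
    return found


providers = configured_providers()
if providers:
    print("Translation providers configured:", ", ".join(providers))
else:
    print("No translation provider key found; set at least one API key.")
```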

g. Running the local server

python app.py
OR
python3 app.py

Frontend [ Local Server ]

Steps to Follow

a. Update frontend/src/.env with your backend URL:

REACT_APP_BACKEND_URL=http://<YOUR_BACKEND_IP_ADDRESS>:5000
OR
REACT_APP_BACKEND_URL=http://localhost:5000

b. Navigate to the frontend folder

cd frontend

c. Install the frontend dependencies (needed once, after first downloading the application).

npm install 
OR
npm install --legacy-peer-deps

Note: For the Translation task, install the react-transliterate library with:

npm install --save react-transliterate --legacy-peer-deps

d. Start the frontend local server.

npm start

🔐 Administrative Configuration

Steps to Follow

Start Frontend and Backend Servers
- Refer to the Frontend [ Local Server ] section for frontend setup.
- Refer to the Backend [ Local Server ] section for backend setup.
Create an Admin Account
- Register a new account through the application’s interface.
Set Admin Privileges in MongoDB
- Access your MongoDB database.
- Locate the user document in the relevant collection.
- Update the user document to set admin: true to grant admin privileges for data management.
Log in to the Admin Dashboard
- Use the admin account credentials to access the dashboard.
Upload Sentences to the Database
- Use the admin dashboard to upload sentences via a .csv or .txt file.
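
The "Set Admin Privileges in MongoDB" step can also be done from Python instead of a database GUI. The sketch below is an assumption-laden illustration: the helper `grant_admin` is hypothetical, and it assumes user documents live in the users collection and are identified by a `username` field, which should be checked against your actual documents. Only the `admin: true` flag comes from the steps above.

```python
def grant_admin(users, username: str) -> bool:
    """Set admin: true on the matching user document.

    `users` is expected to be a PyMongo collection (or any object exposing a
    compatible update_one method). The "username" field name is an assumption.
    Returns True if a matching user document was found.
    """
    result = users.update_one(
        {"username": username},      # filter: which user to promote
        {"$set": {"admin": True}},   # update: grant admin privileges
    )
    return result.matched_count == 1
```

With PyMongo installed, `users` would typically be obtained as `MongoClient(conn_str)["sentences_EMNLP24"]["users"]`, using the same connection string and database name configured in app.py.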

🐳 Containerization of Backend using Docker

Steps to Follow

A. Creating a Docker Hub Account and a public repository

Visit https://hub.docker.com/

B. Updating Dockerfile

FROM python:3.9-slim-buster
WORKDIR /commentator
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
COPY . .
ENV FLASK_APP=app.py
CMD [ "python3", "-m" , "flask", "run", "--host=0.0.0.0"]
EXPOSE 5000/tcp

C. Push Image to Docker Hub

docker build . -t python-docker
docker tag python-docker <DOCKER_USERNAME>/<REPOSITORY_NAME>
docker push <DOCKER_USERNAME>/<REPOSITORY_NAME>

D. Run Docker server on port 5000

docker run -dp 5000:5000 <DOCKER_USERNAME>/<REPOSITORY_NAME>

E. List active Docker containers

docker ps

F. Stop Docker Container by Container ID.

docker stop <CONTAINER_ID>

⚙️ Troubleshooting (Common Issues)

Port Already in Use

# Kill process on port 5000
lsof -ti:5000 | xargs kill -9
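
Before killing anything, you may want to confirm that the port is actually occupied. The following standard-library sketch checks whether something is accepting TCP connections on a port; the helper name `port_in_use` is hypothetical, and the ports shown are the defaults mentioned in this README (5000 for the backend, 27017 for MongoDB).

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a process is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


for name, port in (("backend", 5000), ("MongoDB", 27017)):
    state = "in use" if port_in_use(port) else "free"
    print(f"{name} port {port}: {state}")
```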

MongoDB Connection Error

Ensure MongoDB is running and reachable at the configured connection string.
  1. Start MongoDB:
    - Mac (Homebrew):
      brew services start mongodb-community
    - Linux:
      sudo systemctl start mongod
    - Windows:
      net start MongoDB
  2. Verify it's running (default port 27017):
```
netstat -an | grep 27017
```
  3. Check the connection URI and credentials. The URI should follow this format:
```
mongodb://<username>:<password>@<host>:<port>/<dbname>
```

Docker Build Failed

Check that the Docker daemon is running and that the Dockerfile syntax is correct.

Frontend Build Error

Delete the node_modules folder and reinstall:
```
rm -rf node_modules && npm install --legacy-peer-deps
```

Backend Build Error

Ensure all dependencies in requirements.txt are correctly installed.

👩‍💻 Interfaces

COMMENTATOR provides both an annotator interface for fast, efficient annotation and an admin interface for result export and analysis. Once setup is complete, follow these steps:

Log in using the Demo Credentials

🔐 Admin Access

username: admin
password: admin

👤 Annotator

username: commentator
password: commentator
Sign up to create an account

📦 Database Schemas

Collection	Description
lid	Language Identification at Token level
matrix	Matrix-based Identification of Sentences
pos	POS-tag-based Identification of Tokens
ner	Named Entity Recognition of Tokens
translate	Sentence-level Translation to Target Language
sentences	Sentences to be annotated
users	Admin & Annotator Accounts

🔗 Relevant Links

Paper Link: EMNLP 2024 Demo Paper (https://aclanthology.org/2024.emnlp-demo.11)

Explore the Project: Project Website

👥 Contributors

Meet the talented team behind the project!

- Rajvee Sheth
- Shubh Nisar
- Heenaben Prajapati
- Himanshu Beniwal
- Mayank Singh

👀 Mentions

📄 License

This project is licensed under the Apache-2.0 license - see the LICENSE file for details.

Citation

If you use COMMENTATOR in your research or work, please cite it as follows:

@inproceedings{sheth-etal-2024-commentator,
    title = "Commentator: A Code-mixed Multilingual Text Annotation Framework",
    author = "Sheth, Rajvee  and
      Nisar, Shubh  and
      Prajapati, Heenaben  and
      Beniwal, Himanshu  and
      Singh, Mayank",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-demo.11",
    pages = "101--109",
}
