COMMENTATOR is an annotation tool designed to improve the quality and efficiency of annotating multilingual, code-mixed text. It reduces annotation time and operational overhead through advanced features tailored to code-mixed data, offering intuitive interfaces, automated suggestions, and robust error checking.
- Modular Workflows: Supports an Admin workflow (user management, system configuration, progress monitoring) and an Annotation workflow (text annotation, history review).
- User-Friendly Interface: Intuitive UI for annotators and admins.
- Scalable Architecture: Built with ReactJS, Flask, and MongoDB for robust performance.
- Extensibility: Easily extends to new language pairs; refer to the Configuration Changes file in the Documents folder.
- Inter-Annotator Agreement Calculation: Supports agreement metrics such as Cohen's Kappa (for two annotators) and Fleiss' Kappa (for three annotators) to assess annotation consistency (a minimal computation sketch follows this list).

For more details, please refer to our paper (EMNLP 2024 demo).
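Below is a minimal sketch of how these two agreement metrics can be computed with scikit-learn and statsmodels. It is illustrative only: the label values are hypothetical, and COMMENTATOR's internal implementation may differ.

```python
# A minimal, illustrative computation of the agreement metrics named above;
# not necessarily how COMMENTATOR computes them internally.
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical token-level LID labels from two annotators.
annotator_1 = ["en", "hi", "hi", "en", "other", "hi"]
annotator_2 = ["en", "hi", "en", "en", "other", "hi"]
print("Cohen's kappa:", cohen_kappa_score(annotator_1, annotator_2))

# With three annotators, arrange labels as one row per token and use Fleiss' kappa.
annotator_3 = ["en", "hi", "hi", "en", "hi", "hi"]
rows = list(zip(annotator_1, annotator_2, annotator_3))   # tokens x raters
counts, _ = aggregate_raters(rows)                        # tokens x categories
print("Fleiss' kappa:", fleiss_kappa(counts))
```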

Architecture: COMMENTATOR features two primary user workflows: the Admin workflow and the Annotation workflow. The Admin workflow manages user access, system configuration, and monitoring of annotation progress, while the Annotation workflow allows annotators to log in, annotate text, and review their annotation history. Both workflows are integrated into the tool's modular architecture for efficient management and processing.
COMMENTATOR/
├── backend/
│   ├── app.py              # Flask application entry point
│   ├── requirements.txt    # Python dependencies
│   ├── Dockerfile          # Docker configuration
│   └── LID_tool/           # Language identification modules
├── frontend/
│   ├── src/
│   │   ├── Admin/          # Admin dashboard components
│   │   ├── Auth/           # Authentication components
│   │   ├── Components/     # Reusable UI components
│   │   ├── Edit/           # Annotation editing interface
│   │   ├── Home/           # LID page interface
│   │   ├── Matrix/         # Matrix analysis interface
│   │   ├── NER/            # NER interface
│   │   ├── POS/            # POS tagging interface
│   │   ├── Translate/      # Translation interface
│   │   ├── User/           # User management
│   │   ├── utils/          # Utility functions
│   │   └── Router.js       # Application routing
│   ├── public/             # Static assets
│   └── package.json        # Node.js dependencies
└── README.md
Prerequisites
- Python 3.9.x or 3.10.x
- Node.js 12+
- MongoDB
- Docker (optional)
To get started with the project, follow these steps:
Clone the Repository:
git clone https://github.com/lingo-iitgn/commentator.git
cd commentator
a. Navigate into the backend folder
cd backend
b. Install dependencies
pip install -r requirements.txt
c. Update the frontend URL
Open app.py in a code/text editor (Visual Studio Code, Sublime Text, Notepad, etc.) and set:
frontend = "YOUR_FRONTEND_HOST_URL"
OR
frontend = "http://localhost:3000"
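The frontend value is typically used to whitelist the UI origin for cross-origin requests. The sketch below shows one common way such a setting is wired up with flask-cors; it is illustrative and not a copy of the tool's app.py.

```python
# Illustrative only: using a frontend origin setting with Flask and flask-cors.
from flask import Flask
from flask_cors import CORS

app = Flask(__name__)
frontend = "http://localhost:3000"   # YOUR_FRONTEND_HOST_URL
CORS(app, origins=[frontend])        # allow the React UI to call this API
```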
d. Set up MongoDB connection in app.py
Choose one of the following options depending on your setup:
For Local Setup: Ensure MongoDB is running locally, and set:
conn_str = "YOUR_MONGODB_URL"
OR
conn_str = "mongodb://127.0.0.1:27017/"
For Cloud MongoDB (Atlas): Create an account at https://cloud.mongodb.com/
Set up your Cluster, Database, and User.
Copy your connection string and replace credentials:
conn_str = "mongodb+srv://<username>:<password>@cluster0.stlpmgf.mongodb.net/?retryWrites=true&w=majority"
e. Set the database name:
To use a different database, update this line in app.py:
database = client['sentences_EMNLP24']
Replace 'sentences_EMNLP24' with your preferred database name as needed.
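Steps d and e together correspond to the following pattern (a minimal pymongo sketch using the placeholder values above; adjust the connection string and database name to your setup):

```python
# Minimal sketch of the MongoDB wiring from steps d and e.
from pymongo import MongoClient

conn_str = "mongodb://127.0.0.1:27017/"   # or your Atlas connection string
client = MongoClient(conn_str)
database = client['sentences_EMNLP24']    # replace with your database name

print(database.list_collection_names())   # quick sanity check of the connection
```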
f. API Configuration for the Translation task:
The backend supports three different AI providers for translation tasks. Configure the API keys based on your preferred provider:
Groq API Setup
Get your Groq API key from https://console.groq.com/keys and add it to the .env file in the backend directory:
GROQ_API_KEY="YOUR_GROQ_API_KEY"
Model used: llama-3.3-70b-versatile (you can change this model in the code as needed)
OpenAI API Setup (Optional)
Get your OpenAI API key from https://platform.openai.com/api-keys. Then, uncomment the OpenAI API integration code in app.py and set your API key:
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
Model used: gpt-4 (you can change this model in the code as needed)
Anthropic (Claude) API Setup (Optional)
Get your Anthropic API key from https://docs.anthropic.com/en/api/admin-api/apikeys/get-api-key. Then, uncomment the Anthropic API integration code in app.py and set your API key:
os.environ["ANTHROPIC_API_KEY"] = "YOUR_ANTHROPIC_API_KEY"
Model used: claude-3-5-sonnet-20241022 (you can change this model in the code as needed)
Note: You can configure multiple API providers, but ensure at least one is properly set up for the translation functionality to work correctly.
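For orientation, a translation request through the Groq provider generally looks like the sketch below (using the groq Python SDK and the model named above). The prompt wording is hypothetical and this is not a copy of the tool's translation code:

```python
# Illustrative Groq chat-completion call for a translation request.
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])   # key loaded from the .env file
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user",
               "content": "Translate this code-mixed sentence to English: <sentence>"}],
)
print(response.choices[0].message.content)
```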
g. Running the local server
python app.py
OR
python3 app.py
a. Update frontend/src/.env with your backend URL:
REACT_APP_BACKEND_URL=http://<YOUR_BACKEND_IP_ADDRESS>:5000
OR
REACT_APP_BACKEND_URL=http://localhost:5000
b. Navigate inside frontend folder
cd frontend
c. Install all frontend dependencies (after downloading the application for the first time):
npm install
OR
npm install --legacy-peer-deps
Note: For the Translation task, install the react-transliterate library with:
npm install --save react-transliterate --legacy-peer-deps
d. Start the frontend local server.
npm start
- Start Frontend and Backend Servers
  - Refer to the Frontend Setup section for frontend instructions.
  - Refer to the Backend Setup section for backend instructions.
- Create an Admin Account
  - Register a new account through the application's interface.
- Set Admin Privileges in MongoDB
  - Access your MongoDB database.
  - Locate the user document in the relevant collection.
  - Update the user document to set admin: true to grant admin privileges for data management (see the pymongo sketch after this list).
- Log in to the Admin Dashboard
  - Use the admin account credentials to access the dashboard.
- Upload Sentences to the Database
  - Use the admin dashboard to upload sentences via a .csv or .txt file.
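A minimal sketch of the admin-flag update from the "Set Admin Privileges in MongoDB" step, using pymongo and the users collection described in the database overview below; the username query field is an assumption, so match it to how your account is actually stored:

```python
# Illustrative: grant admin privileges to an existing account in the users collection.
from pymongo import MongoClient

client = MongoClient("mongodb://127.0.0.1:27017/")
db = client['sentences_EMNLP24']            # your database name

result = db.users.update_one(
    {"username": "your_account_name"},      # hypothetical query field; adjust to your schema
    {"$set": {"admin": True}},              # the admin: true flag mentioned above
)
print("modified:", result.modified_count)
```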
A. Creating a Docker Hub Account and a public repository
Visit https://hub.docker.com/
B. Updating Dockerfile
FROM python:3.9-slim-buster
WORKDIR /commentator
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
COPY . .
ENV FLASK_APP=app.py
CMD [ "python3", "-m" , "flask", "run", "--host=0.0.0.0"]
EXPOSE 5000/tcp
C. Push Image to Docker Hub
docker build . -t python-docker
docker tag python-docker <DOCKER_USERNAME>/<REPOSITORY_NAME>
docker push <DOCKER_USERNAME>/<REPOSITORY_NAME>
D. Run Docker server on port 5000
docker run -dp 5000:5000 <DOCKER_USERNAME>/<REPOSITORY_NAME>
E. List of active docker containers
docker ps
F. Stop Docker Container by Container ID.
docker stop <CONTAINER_ID>
- Port Already in Use
  Kill the process on port 5000:
  lsof -ti:5000 | xargs kill -9
- MongoDB Connection Error
  Ensure MongoDB is running on the specified connection string.
  Start MongoDB:
  - Mac (Homebrew): brew services start mongodb-community
  - Linux: sudo systemctl start mongod
  - Windows: net start MongoDB
  Verify it's running (default port 27017):
  netstat -an | grep 27017
  Check the connection URI and credentials; ensure it follows:
  mongodb://<username>:<password>@<host>:<port>/<dbname>
- Docker Build Failed
  Check that the Docker daemon is running and the Dockerfile syntax is correct.
- Frontend Build Error
  Delete the node_modules folder and reinstall:
  rm -rf node_modules && npm install --legacy-peer-deps
- Backend Build Error
  Ensure all dependencies in requirements.txt are correctly installed.
COMMENTATOR provides both an annotator interface for efficient, faster annotation and an admin interface for result export and analysis. Once setup is complete, follow these steps:
- Log in using the demo credentials:
  - username: admin, password: admin
  - username: commentator, password: commentator
- Or sign up to create a new account.
| Collection | Description |
|---|---|
| lid | Language Identification at Token level |
| matrix | Matrix based Identification of Sentences |
| pos | POS tags based Identification of Tokens |
| ner | Named Entity Recognition of Tokens |
| translate | Sentence-level Translation to Target Language |
| sentences | Sentences to be annotated |
| users | Admin & Annotator Accounts |
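For offline analysis or export outside the admin dashboard, a collection such as lid can be dumped with a short pymongo script like the one below; only the collection names come from the table above, and the document fields are whatever your annotations contain:

```python
# Illustrative export of an annotation collection to JSON for offline analysis.
import json
from pymongo import MongoClient

client = MongoClient("mongodb://127.0.0.1:27017/")
db = client['sentences_EMNLP24']              # your database name

docs = list(db.lid.find({}, {"_id": 0}))      # drop Mongo's internal _id field
with open("lid_annotations.json", "w", encoding="utf-8") as f:
    json.dump(docs, f, ensure_ascii=False, indent=2)

print(f"Exported {len(docs)} documents from the lid collection")
```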
Paper Link: EMNLP 2024 Demo Paper (https://aclanthology.org/2024.emnlp-demo.11)
Explore the Project: Project Website
Meet the talented team behind the project!
- Rajvee Sheth
- Shubh Nisar
- Heenaben Prajapati
- Himanshu Beniwal
- Mayank Singh
- https://github.com/microsoft/LID-tool
- https://github.com/sagorbrur/codeswitch
- https://github.com/jiesutd/YEDDA
- https://getmarkup.com/dashboard
- https://inception-project.github.io/
- https://UBIAI.tools/
- https://gate.ac.uk/download/
This project is licensed under the Apache-2.0 license - see the LICENSE file for details.
If you use COMMENTATOR in your research or work, please cite it as follows:
@inproceedings{sheth-etal-2024-commentator,
title = "Commentator: A Code-mixed Multilingual Text Annotation Framework",
author = "Sheth, Rajvee and
Nisar, Shubh and
Prajapati, Heenaben and
Beniwal, Himanshu and
Singh, Mayank",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-demo.11",
pages = "101--109",
}