lingo-iitgn/commentator

COMMENTATOR ✍️: A Code-mixed Multilingual Text Annotation Framework.


COMMENTATOR is a code-mixed annotation tool designed to enhance the quality and efficiency of annotating multilingual, code-mixed text. It reduces annotation time and operational overheads by providing advanced features tailored for code-mixed data. The tool offers intuitive interfaces, automated suggestions, and robust error-checking mechanisms.

🌟 Features

  • Modular Workflows: Supports Admin Workflow (user management, system configuration, progress monitoring) and Annotation Workflow (text annotation, history review).

  • User-Friendly Interface: Intuitive UI for annotators and admins.

  • Scalable Architecture: Built with ReactJS, Flask, and MongoDB for robust performance.

  • Extensibility: Easily extended to new language pairs; see the Configuration Changes file in the Documents folder.

  • Inter-Annotator Agreement Calculation: Supports agreement metrics such as Cohen’s Kappa (for two annotators) and Fleiss’ Kappa (for three annotators) to assess annotation consistency.

  • For more details, please refer to our paper (EMNLP 2024:demo).
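The agreement metrics named above can be illustrated with a short, dependency-free sketch. This is not the tool's own implementation; the function names and the sample language tags are hypothetical, and the formulas are the standard textbook definitions of Cohen's and Fleiss' kappa.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings[i] lists the labels all annotators gave item i."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    counts = [Counter(r) for r in ratings]
    # Per-item agreement, averaged over all items.
    p_bar = sum(
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ) / n_items
    # Chance agreement from the pooled label proportions.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    if p_e == 1.0:
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical token-level language tags from two annotators.
annotator_1 = ["hi", "en", "en", "hi"]
annotator_2 = ["hi", "en", "hi", "hi"]
print(cohen_kappa(annotator_1, annotator_2))  # 0.5
```

A value of 1 means perfect agreement and 0 means agreement no better than chance.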


Architecture: COMMENTATOR features two primary user workflows: Admin workflow and Annotation workflow. The Admin workflow manages user access, system configurations, and monitoring of annotation progress, while the Annotation workflow allows annotators to log in, annotate text, and review their annotation history. Both workflows are integrated into the tool’s modular architecture for efficient management and processing.


📁 Folder Structure

COMMENTATOR/
├── backend/
│   ├── app.py                # Flask application entry point
│   ├── requirements.txt      # Python dependencies
│   ├── Dockerfile            # Docker configuration
│   └── LID_tool/             # Language identification modules
├── frontend/
│   ├── src/
│   │   ├── Admin/            # Admin dashboard components
│   │   ├── Auth/             # Authentication components
│   │   ├── Components/       # Reusable UI components
│   │   ├── Edit/             # Annotation editing interface
│   │   ├── Home/             # LID page interface
│   │   ├── Matrix/           # Matrix analysis interface
│   │   ├── NER/              # NER interface
│   │   ├── POS/              # POS tagging interface
│   │   ├── Translate/        # Translation interface
│   │   ├── User/             # User management
│   │   ├── utils/            # Utility functions
│   │   └── Router.js         # Application routing
│   ├── public/               # Static assets
│   └── package.json          # Node.js dependencies
└── README.md

⚑ Quick Start

Prerequisites

  • Python 3.9.x or 3.10.x

  • Node.js 12 or later

  • MongoDB

  • Docker (optional)

To get started with the project, follow these steps:

Clone the Repository:

git clone https://github.com/lingo-iitgn/commentator.git
cd commentator

Backend [ Local Server ]

Steps to Follow

a. Navigate inside backend folder

cd backend

b. Installing Dependencies

pip install -r requirements.txt

c. Updating Frontend URL

Open app.py in a code/text editor (Visual Studio Code, Sublime Text, Notepad, etc.) and set the frontend variable to the URL where the frontend is served:

frontend = "YOUR_FRONTEND_HOST_URL"
OR
frontend = "http://localhost:3000"
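If the frontend value is used as an allowed CORS origin (an assumption; the README does not say how app.py consumes it), it should contain only the scheme, host, and port, because browsers send the Origin header without a path or trailing slash. The helper below is a hypothetical sketch for normalizing the value before assigning it.

```python
from urllib.parse import urlsplit


def normalize_origin(url):
    """Return scheme://host[:port], dropping any path or trailing slash."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a valid http(s) URL: {url!r}")
    return f"{parts.scheme}://{parts.netloc}"


# Hypothetical usage: a trailing slash is stripped before the value is used.
frontend = normalize_origin("http://localhost:3000/")
print(frontend)  # http://localhost:3000
```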

d. Set up MongoDB connection in app.py

Choose one of the following options depending on your setup:

For Local Setup: Ensure MongoDB is running locally, and set:

conn_str = "YOUR_MONGODB_URL"
OR
conn_str = "mongodb://127.0.0.1:27017/"

For Cloud MongoDB (Atlas): Create an account: 👉 https://cloud.mongodb.com/

Set up your Cluster, Database, and User.

Copy your connection string and replace credentials:

conn_str = "mongodb+srv://<username>:<password>@cluster0.stlpmgf.mongodb.net/?retryWrites=true&w=majority"
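When replacing the credentials, note that usernames and passwords containing characters such as @, :, or / must be percent-encoded before they go into a MongoDB URI. A minimal sketch, assuming a hypothetical helper name and placeholder cluster host:

```python
from urllib.parse import quote_plus


def build_atlas_uri(username, password, host, options="retryWrites=true&w=majority"):
    """Build a mongodb+srv URI with percent-encoded credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{host}/?{options}"


# Placeholder credentials and host, for illustration only.
conn_str = build_atlas_uri("annotator", "p@ss/word", "cluster0.example.mongodb.net")
print(conn_str)
# mongodb+srv://annotator:p%40ss%2Fword@cluster0.example.mongodb.net/?retryWrites=true&w=majority
```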

e. Set the database name:

To use a different database, update this line in app.py:

database = client['sentences_EMNLP24']

Replace 'sentences_EMNLP24' with your preferred database name as needed.

f. API Configuration for the Translation task:

The backend supports three different AI providers for translation tasks. Configure the API keys based on your preferred provider:

Groq API Setup

Get your Groq API key from https://console.groq.com/keys and add it to the .env file in the backend directory:

GROQ_API_KEY="YOUR_GROQ_API_KEY"

Model used: llama-3.3-70b-versatile (you can change this model in the code as needed)

OpenAI API Setup (Optional)

Get your OpenAI API key from https://platform.openai.com/api-keys. Then uncomment the OpenAI API integration code in app.py and set your API key:

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

Model used: gpt-4 (you can change this model in the code as needed)

Anthropic (Claude) API Setup (Optional)

Get your Anthropic API key from https://docs.anthropic.com/en/api/admin-api/apikeys/get-api-key. Then uncomment the Anthropic API integration code in app.py and set your API key:

os.environ["ANTHROPIC_API_KEY"] = "YOUR_ANTHROPIC_API_KEY"

Model used: claude-3-5-sonnet-20241022 (you can change this model in the code as needed)

Note: You can configure multiple API providers, but ensure at least one is properly set up for the translation functionality to work correctly.
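To confirm that at least one provider is configured before starting the server, a check along these lines may help. It is a sketch, not part of app.py: the function name is hypothetical, and it reads os.environ, so keys stored in the .env file must already have been loaded into the environment by whatever mechanism the backend uses.

```python
import os

# Environment variable names taken from the setup steps above.
PROVIDER_KEYS = {
    "groq": "GROQ_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key is set and is not a placeholder."""
    env = os.environ if env is None else env
    found = []
    for name, var in PROVIDER_KEYS.items():
        value = env.get(var, "").strip()
        # Skip empty values and untouched placeholders such as YOUR_GROQ_API_KEY.
        if value and not value.startswith("YOUR_"):
            found.append(name)
    return found


if not configured_providers():
    print("Warning: no translation provider API key is configured.")
```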

g. Running the local server

python app.py
OR
python3 app.py
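Once the server has started, a quick way to confirm it is accepting connections is to open a TCP connection to its port. The sketch below assumes the backend listens on port 5000 (the port used in the frontend configuration and the Docker steps); the function name is hypothetical.

```python
import socket


def is_port_open(host="127.0.0.1", port=5000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable"
    print(f"Backend on port 5000 is {state}.")
```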

Frontend [ Local Server ]

Steps to Follow

a. Update frontend/src/.env with your backend URL:

REACT_APP_BACKEND_URL=http://<YOUR_BACKEND_IP_ADDRESS>:5000
OR
REACT_APP_BACKEND_URL=http://localhost:5000

b. Navigate inside frontend folder

cd frontend

c. Install the frontend dependencies (needed once, after first downloading the application).

npm install 
OR
npm install --legacy-peer-deps

Note: For the Translation task, install the react-transliterate library with:

npm install --save react-transliterate --legacy-peer-deps

d. Start the frontend local server.

npm start

🔐 Administrative Configuration

Steps to Follow

  1. Start Frontend and Backend Servers

    • Refer to the Frontend [ Local Server ] section for frontend instructions.
    • Refer to the Backend [ Local Server ] section for backend setup.
  2. Create an Admin Account

    • Register a new account through the application’s interface.
  3. Set Admin Privileges in MongoDB

    • Access your MongoDB database.
    • Locate the user document in the relevant collection.
    • Update the user document to set admin: true to grant admin privileges for data management.
  4. Log in to the Admin Dashboard

    • Use the admin account credentials to access the dashboard.
  5. Upload Sentences to the Database

    • Use the admin dashboard to upload sentences via a .csv or .txt file.
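Step 3 above can also be done from Python instead of a MongoDB shell or GUI. The sketch below assumes the account lives in the users collection (listed under Database Schemas) and is identified by a username field; the field name and both function names are assumptions, so check an actual user document first. PyMongo is imported lazily and is only needed when grant_admin is called.

```python
def build_admin_update(username):
    """Return the (filter, update) pair that flags one user as admin."""
    if not username:
        raise ValueError("username must be non-empty")
    # "username" is an assumed field name; "admin" comes from step 3 above.
    return {"username": username}, {"$set": {"admin": True}}


def grant_admin(conn_str, db_name, username):
    """Set admin: true on the matching user; return True if a user matched."""
    from pymongo import MongoClient  # requires the pymongo package

    query, update = build_admin_update(username)
    client = MongoClient(conn_str)
    try:
        result = client[db_name]["users"].update_one(query, update)
        return result.matched_count == 1
    finally:
        client.close()
```

For a local setup this might be called as grant_admin("mongodb://127.0.0.1:27017/", "sentences_EMNLP24", "<your username>").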

🐳 Containerization of Backend using Docker

Steps to Follow

A. Creating a Docker Hub Account and a public repository

Visit https://hub.docker.com/

B. Updating Dockerfile

FROM python:3.9-slim-buster
WORKDIR /commentator
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
COPY . .
ENV FLASK_APP=app.py
CMD [ "python3", "-m" , "flask", "run", "--host=0.0.0.0"]
EXPOSE 5000/tcp

C. Push Image to Docker Hub

docker build . -t python-docker
docker tag python-docker <DOCKER_USERNAME>/<REPOSITORY_NAME>
docker push <DOCKER_USERNAME>/<REPOSITORY_NAME>

D. Run Docker server on port 5000

docker run -dp 5000:5000 <DOCKER_USERNAME>/<REPOSITORY_NAME>

E. List of active docker containers

docker ps

F. Stop Docker Container by Container ID.

docker stop <CONTAINER_ID>

⚙️ Troubleshooting (Common Issues)

    1. Port Already in Use

      Kill the process listening on port 5000 (macOS/Linux):

        lsof -ti:5000 | xargs kill -9

    2. MongoDB Connection Error

      Ensure MongoDB is running and reachable at the configured connection string.

      1. Start MongoDB:

        • Mac (Homebrew):

          brew services start mongodb-community
        • Linux:

          sudo systemctl start mongod
        • Windows:

          net start MongoDB
      2. Verify it's running (default port 27017; on Windows, use findstr in place of grep):

        netstat -an | grep 27017
      3. Check the connection URI and credentials. The URI should follow this pattern:

        mongodb://<username>:<password>@<host>:<port>/<dbname>

    3. Docker Build Failed

      Check that the Docker daemon is running and that the Dockerfile syntax is correct.

    4. Frontend Build Error

      Delete the node_modules folder and reinstall:

        rm -rf node_modules && npm install --legacy-peer-deps

    5. Backend Build Error

      Ensure all dependencies in requirements.txt are correctly installed.


👩‍💻 Interfaces

COMMENTATOR provides an annotator interface for fast, efficient annotation and an admin interface for exporting and analyzing results. Once setup is complete, follow these steps:

  1. Log in using the Demo Credentials

    🔐 Admin Access

    username: admin
    password: admin

    👤 Annotator

    username: commentator
    password: commentator

  2. Sign up to create an account


📦 Database Schemas

Collection   Description
lid          Language identification at the token level
matrix       Matrix-based identification of sentences
pos          POS-tag-based identification of tokens
ner          Named entity recognition of tokens
translate    Sentence-level translation to the target language
sentences    Sentences to be annotated
users        Admin and annotator accounts

🔗 Relevant Links

Paper Link: EMNLP 2024 Demo Paper

Explore the Project: Project Website


👥 Contributors

Meet the talented team behind the project!

Contributor Name      Links
Rajvee Sheth          LinkedIn
Shubh Nisar           Website
Heenaben Prajapati    LinkedIn
Himanshu Beniwal      Website
Mayank Singh          Website

👀 Mentions


📄 License

This project is licensed under the Apache-2.0 license - see the LICENSE file for details.

Citation

If you use COMMENTATOR in your research or work, please cite it as follows:

@inproceedings{sheth-etal-2024-commentator,
    title = "Commentator: A Code-mixed Multilingual Text Annotation Framework",
    author = "Sheth, Rajvee  and
      Nisar, Shubh  and
      Prajapati, Heenaben  and
      Beniwal, Himanshu  and
      Singh, Mayank",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-demo.11",
    pages = "101--109",
}
