This README describes a pipeline for annotating images with codes, marking the position of each character in the code, and analyzing character position data from the annotated images. The pipeline has four main steps: three are run in Jupyter notebooks and one uses the LabelImg annotation tool. The steps are outlined below.
Before starting, ensure you have the following dependencies installed:
- Python (version X.X)
- LabelImg (for bounding box annotation)
- A Jupyter environment for running Python notebooks (`.ipynb` files), such as an Anaconda installation
- Additional Python packages (check each notebook's imports for the specific packages)
Additionally, organize your folder structure as follows:
- `images/`: directory containing the original images to be annotated.
- `xml_annotations/`: directory where LabelImg saves the XML files containing bounding box annotations.
- `annotations.csv`: file storing image names and their associated codes (created in Step 1).
- `final_images/`: directory where the final labeled images are saved after processing.
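If you are setting up the project from scratch, the folder layout above can be created with a few lines of Python. This is a minimal sketch that assumes all directories live under a single project root; `create_pipeline_dirs` is a hypothetical helper, not part of the notebooks. `annotations.csv` is not created here because Step 1 writes it.

```python
from __future__ import annotations

from pathlib import Path

# Directory names taken from the folder structure above.
PIPELINE_DIRS = ("images", "xml_annotations", "final_images")


def create_pipeline_dirs(root: Path) -> list[Path]:
    """Create the pipeline directories under root; existing ones are left untouched."""
    created = []
    for name in PIPELINE_DIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

Calling `create_pipeline_dirs(Path("."))` from the project root creates any missing directories and can safely be run again.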
Step 1: Annotate images with codes (`data_annotating.ipynb`)
- Purpose: This notebook loads the images and lets you annotate each one with a unique code.
- Preparation: Ensure your images are saved in the `images/` folder.
- Procedure:
  - Open `data_annotating.ipynb`.
  - Update any path variables in the notebook so that it reads images from `images/` and saves `annotations.csv` in the main directory or another specified path.
  - The notebook presents each image in turn.
  - For each image, enter the corresponding code (e.g., an alphanumeric string or ID).
  - The code you enter is saved alongside the image name in `annotations.csv`.
- Output: An `annotations.csv` file containing each image filename and its associated code.
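The annotation loop in Step 1 can be sketched as follows. This is not the code in `data_annotating.ipynb`: the helper names, the accepted image extensions, and the `image_name`/`code` column names are assumptions, and displaying each image (which the notebook does before you enter a code) is left out. The prompt function is passed in as `ask` so the loop can be exercised without typing.

```python
import csv
from pathlib import Path

# Hypothetical column names; check data_annotating.ipynb for the ones it actually writes.
FIELDNAMES = ["image_name", "code"]
# Assumed set of image extensions to pick up from images/.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def annotate_images(image_dir, csv_path, ask=input):
    """Ask for a code for each image in image_dir and write the results to csv_path."""
    images = sorted(
        p for p in Path(image_dir).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    rows = []
    for image in images:
        code = ask(f"Code for {image.name}: ").strip()
        rows.append({"image_name": image.name, "code": code})
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return rows


def load_annotations(csv_path):
    """Return an {image_name: code} mapping read back from the CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row["image_name"]: row["code"] for row in csv.DictReader(f)}
```

Interactively, `annotate_images("images", "annotations.csv")` prompts on the console with Python's built-in `input`.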
Example:
Below is an example of an original image that will be annotated in the following steps.
Step 2: Mark character positions (LabelImg)
- Purpose: Use LabelImg to draw a bounding box around each character of the code in each image and label the boxes in ascending order.
- Preparation:
  - Open LabelImg and set the input folder to `images/` to load the images.
  - Set the output folder to `xml_annotations/` to save the XML files with bounding box annotations.
- Procedure:
  - For each character in the code on the image:
    - Create a bounding box around the character.
    - Label each character in ascending order (e.g., first character = "1st", second character = "2nd", etc.).
  - Save each annotation as an XML file in `xml_annotations/`, with the same name as the image file.
- Output:
  - One XML file per image, containing the bounding boxes and labels of each character, is saved in `xml_annotations/`.
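To use the LabelImg output in later steps, the ordinal labels have to be turned back into character positions. A minimal sketch, assuming LabelImg saved the files in its default Pascal VOC XML format (one `<object>` per box, with `<name>` holding the label and `<bndbox>` holding `xmin`, `ymin`, `xmax`, `ymax`) and that every label follows the "1st", "2nd", ... convention above. `CharBox`, `ordinal_to_int`, and `read_char_boxes` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Matches labels such as "1st", "2nd", "3rd", "11th" (case-insensitive).
ORDINAL_RE = re.compile(r"^(\d+)(?:st|nd|rd|th)$", re.IGNORECASE)


@dataclass(frozen=True)
class CharBox:
    position: int  # 1-based position parsed from the ordinal label
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def ordinal_to_int(label):
    """Convert an ordinal label such as '1st' or '12th' to an int."""
    match = ORDINAL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not an ordinal label: {label!r}")
    return int(match.group(1))


def read_char_boxes(xml_source):
    """Parse a Pascal VOC XML file (path or file object) into CharBoxes sorted by position."""
    root = ET.parse(xml_source).getroot()
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # int(float(...)) tolerates coordinates written as "12" or "12.0".
        coords = [int(float(bndbox.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(CharBox(ordinal_to_int(obj.findtext("name")), *coords))
    boxes.sort(key=lambda box: box.position)
    # Catch skipped or duplicated labels before they reach Step 3.
    positions = [box.position for box in boxes]
    if positions != list(range(1, len(boxes) + 1)):
        raise ValueError(f"expected positions 1..{len(boxes)}, got {positions}")
    return boxes
```

For example, `read_char_boxes("xml_annotations/img1.xml")` returns the boxes in reading order, regardless of the order in which they were drawn.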
Step 3: Generate labeled images (`add_coord_img.ipynb`)
- Purpose: Combine data from the original images, `annotations.csv`, and the XML files to generate the final labeled images.
- Preparation:
  - Ensure `annotations.csv` is in the main directory, or specify its location in the notebook.
  - Make sure the XML files are stored in `xml_annotations/` and the original images are in `images/`.
  - In the notebook, specify `final_images/` as the output directory for the final labeled images.
- Procedure:
  - Open `add_coord_img.ipynb`.
  - Update the path variables to point to `images/`, `annotations.csv`, `xml_annotations/`, and `final_images/`.
  - Run the notebook to process each image, adding coordinate labels and visual markers based on the bounding boxes in the corresponding XML file.
- Output:
  - Labeled images with visible character coordinates are saved in `final_images/`.
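The core of Step 3 is matching each bounding box to the character it encloses: the box labeled "1st" belongs to the first character of the image's code, and so on. The sketch below shows only that pairing plus a center coordinate per box; it is an assumption about what `add_coord_img.ipynb` computes, and drawing the markers on the image (for example with Pillow's `ImageDraw`) is omitted. Boxes are plain `(position, xmin, ymin, xmax, ymax)` tuples so the block stands alone; `CharCoord` and `label_characters` are hypothetical names.

```python
from typing import NamedTuple


class CharCoord(NamedTuple):
    char: str
    position: int  # 1-based position in the code
    center_x: float
    center_y: float


def label_characters(code, boxes):
    """Pair each character of `code` with the center of its bounding box.

    `boxes` holds (position, xmin, ymin, xmax, ymax) tuples with 1-based
    positions, e.g. as read from the LabelImg XML file for the same image.
    """
    positions = sorted(box[0] for box in boxes)
    if positions != list(range(1, len(code) + 1)):
        raise ValueError(f"box positions {positions} do not match code {code!r}")
    coords = []
    for position, xmin, ymin, xmax, ymax in sorted(boxes):
        coords.append(CharCoord(
            char=code[position - 1],
            position=position,
            center_x=(xmin + xmax) / 2,
            center_y=(ymin + ymax) / 2,
        ))
    return coords
```

Validating the positions against the code length up front turns a missing or extra box into a clear error instead of a mislabeled image.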
Example:
Below is an example of a final labeled image with annotated character coordinates.
Step 4: Analyze character position frequency (`char_position_frequency.ipynb`)
- Purpose: Analyze character position frequency data from the annotated images.
- Preparation:
  - Ensure `annotations.csv` and the XML files in `xml_annotations/` are accessible, along with the images in `final_images/`.
  - Update the path variables in `char_position_frequency.ipynb` to reference `annotations.csv` and any other necessary files.
- Procedure:
  - Open `char_position_frequency.ipynb`.
  - Run the notebook to load the labeled data and compute statistics about character positions, frequency, and distribution.
- Output:
  - Summary statistics or visualizations (e.g., histograms) of character frequency and positional data are generated and displayed.
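One way to compute per-position character frequencies from the codes in `annotations.csv` is shown below. It is a sketch built on `collections.Counter`; the statistics produced by `char_position_frequency.ipynb` may differ, and `position_frequency` and `most_common_per_position` are hypothetical names.

```python
from collections import Counter, defaultdict


def position_frequency(codes):
    """Count how often each character appears at each 1-based position."""
    freq = defaultdict(Counter)
    for code in codes:
        for position, char in enumerate(code, start=1):
            freq[position][char] += 1
    return dict(freq)


def most_common_per_position(freq):
    """Return {position: (char, count)} for the most frequent character at each position."""
    return {pos: counter.most_common(1)[0] for pos, counter in sorted(freq.items())}
```

Each per-position `Counter` can be passed to a plotting library (for instance, one bar chart per position) to produce histograms like those described above.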
Running the pipeline:
- Clone this repository, or make sure all necessary files are organized as described in the folder structure above.
- Run the steps in order: `data_annotating.ipynb`, LabelImg for the XML annotations, `add_coord_img.ipynb`, and `char_position_frequency.ipynb`.
- Save and organize the outputs for further analysis.
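Because each step consumes the previous step's outputs, it can help to check for gaps before running `add_coord_img.ipynb`. This hypothetical helper assumes the folder layout above, an `image_name` column in `annotations.csv` (the actual column name may differ), and XML files named after their images; it lists images that still lack a code or an XML file.

```python
import csv
from pathlib import Path

# Assumed set of image extensions to look for in images/.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def check_pipeline_inputs(root, csv_name="annotations.csv", image_col="image_name"):
    """Report images missing a code in the CSV file or a LabelImg XML file."""
    root = Path(root)
    images = sorted(
        p for p in (root / "images").iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    coded = set()
    csv_path = root / csv_name
    if csv_path.exists():
        with open(csv_path, newline="", encoding="utf-8") as f:
            coded = {row[image_col] for row in csv.DictReader(f)}
    xml_dir = root / "xml_annotations"
    xml_stems = {p.stem for p in xml_dir.glob("*.xml")} if xml_dir.is_dir() else set()
    return {
        "missing_code": [p.name for p in images if p.name not in coded],
        "missing_xml": [p.name for p in images if p.stem not in xml_stems],
    }
```

Running `check_pipeline_inputs(".")` from the project root returns two lists; both should be empty before Step 3.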
Summary of outputs:
- `annotations.csv`: image names and their respective codes.
- `.xml` files: LabelImg annotations containing a bounding box for each character.
- Final labeled images: images with annotated character coordinates in `final_images/`.
- Data analysis outputs: frequency statistics or charts from `char_position_frequency.ipynb`.
This pipeline allows you to efficiently annotate images, label character positions, and extract frequency data for analysis.