Resume Metadata Standard

An ATS-compliant standard for all resumes
View Demo · Report Bug · Request Feature

Purpose

PDF resumes are not inherently machine-parsable.
Only resumes in a standard format are compatible with an ATS (Applicant Tracking System).
As a result, creative formatting is sacrificed for the sake of parsability.

This standard aims to change that by allowing full creativity in the resume format while carrying structured data in the metadata.

Using an XMP (Extensible Metadata Platform) structure common to all resumes makes every resume parsable.

When to Use:

Use it as a first pass when a user imports a resume: if standardized structured metadata exists, it can be read directly for 100% accuracy and fast output.
The check can be done in less than 5 ms.
Each platform decides which information to extract and use from the metadata.

Who is Currently Using It:

Rezi, Resumatic, Resumai, ResumeBuild, and others (Add your company here)
Over 4 million resumes already generated with this metadata on the Rezi platform.
The standard can be implemented in any resume builder, ATS, or HR platform (see section How to Implement).

Comparison between Programmatic Parser, NLP/LLM Parser, and Metadata Parser

Criterion	Programmatic Parser	NLP/LLM Parser	Metadata Parser
Speed	Fast	Slow	Very fast
Cost	Low	High	Practically zero
Maintenance	High (requires manual adjustments for different formats)	Low (easier to maintain with pre-trained models)	None (standardized and requires no maintenance)
Output Quality on Simple Resumes	Moderate	High (accurate extraction of raw data)	100% accuracy (structured data is directly transferred)
Output Quality on Complex Resumes	Low	High	100% accuracy (even with unstructured resumes)
Accuracy	Moderate	Can hallucinate or modify data	100% accuracy (data is faithfully extracted as metadata)
Adaptability	Very low (limited to fixed formats)	Fairly high (adapts to many formats)	Nearly zero (focused on structured data, doesn’t adapt to various CV formats)
Ideal Use Case	Bulk data analysis, requires manual post processing adjustments	Handles a variety of formats, but can be costly for large scale	Best as a first pass to check for structured data in resumes, reduce cost at scale

Getting Started

Setting Up the Parser

Clone the repository:

git clone https://github.com/rezi-io/resume-standard.git
cd resume-standard

Build the parser (tested with Docker 27.5.1):

For local use:

g++ -std=c++17 -o ./parser/lib/rms-parser ./parser/src/parser.cpp

For ARM architecture:

docker run --rm -v "$PWD":/src -w /src ubuntu:20.04 bash -c \
  "apt update && apt install -y g++ && g++ -o ./parser/lib/rms-parser ./parser/src/parser.cpp"

For x86 architecture (e.g., Cloud Functions, AWS Lambda):

docker run --rm --platform linux/amd64 -v "$PWD":/src -w /src ubuntu:20.04 bash -c \
  "apt update && apt install -y g++ && g++ -o ./parser/lib/rms-parser ./parser/src/parser.cpp"

Using the Parser

Run the parser against a PDF file to extract the Resume Metadata Standard (RMS) information. It can be used with a Buffer or a file.

./parser/lib/rms-parser ./path/to/your/resume.pdf

Example:

./parser/lib/rms-parser ./pdf-examples/Software\ Engineer.pdf

The parser will first check for the Producer metadata. If it contains "rms_v2", the parser will extract and return all available metadata that conforms to the Resume Metadata Standard.
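
If you call the parser from another program, a thin wrapper can run the binary and decode its output. The following is a minimal Python sketch, not part of the project. It assumes the binary prints its JSON result to standard output and that anything other than `"status": "success"` means no usable RMS metadata. The names `PARSER_BIN`, `parse_parser_output`, and `extract_rms` are hypothetical.

```python
import json
import subprocess

# Path produced by the build step; adjust it to your checkout (assumption).
PARSER_BIN = "./parser/lib/rms-parser"


def parse_parser_output(stdout_text):
    """Return the RMS metadata dict, or None if the output is not a usable RMS result."""
    try:
        result = json.loads(stdout_text)
    except json.JSONDecodeError:
        return None
    # Assumption: any status other than "success" means "no RMS metadata".
    if not isinstance(result, dict) or result.get("status") != "success":
        return None
    data = result.get("data")
    if not isinstance(data, dict):
        return None
    # The standard is detected through a Producer value containing "rms_v2".
    if "rms_v2" not in str(data.get("Producer", "")):
        return None
    return data


def extract_rms(pdf_path, parser_bin=PARSER_BIN):
    """Run the parser binary on one PDF and return its metadata dict, or None."""
    completed = subprocess.run(
        [parser_bin, pdf_path], capture_output=True, text=True, check=False
    )
    return parse_parser_output(completed.stdout)
```

A caller can treat a `None` result as the signal to fall back to a programmatic or NLP/LLM parser.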

Example Output

When you run the parser on a PDF with RMS metadata, you'll get JSON output similar to this:

{
  "status": "success",
  "data": {
    "Description rdf:about": "",
    "Producer": "rms_v2.0.1",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
    "producer": "rms_v2.0.1",
    "rms": "https://github.com/rezi-io/resume-standard",
    
    "rms_contact_city": "New York City",
    "rms_contact_country": "United States",
    "rms_contact_email": "[email protected]",
    "rms_contact_fullName": "Charles Bloomberg",
    "rms_contact_github": "n/a",
    "rms_contact_givenNames": "Charles",
    "rms_contact_lastName": "Bloomberg",
    "rms_contact_linkedin": "in/bloomberg",
    "rms_contact_phone": "(621) 799-5548",
    "rms_contact_state": "New York",
    "rms_contact_website": "n/a",
    
    "rms_education_0_date": "2021",
    "rms_education_0_dateFormat": "YYYY",
    "rms_education_0_dateTS": "1609459200000",
    "rms_education_0_description": "n/a",
    "rms_education_0_institution": "New York University",
    "rms_education_0_isGraduate": "true",
    "rms_education_0_location": "New York, NY",
    "rms_education_0_minor": "Computer Science",
    "rms_education_0_qualification": "Bachelor of Science in Biochemistry",
    "rms_education_0_score": "n/a",
    "rms_education_0_scoreType": "n/a",
    "rms_education_count": "1",
    
    "rms_experience_0_company": "Company B",
    "rms_experience_0_dateBegin": "June 2020",
    "rms_experience_0_dateBeginFormat": "MMMM YYYY",
    "rms_experience_0_dateBeginTS": "1590969600000",
    "rms_experience_0_dateEnd": "June 2021",
    "rms_experience_0_dateEndFormat": "MMMM YYYY",
    "rms_experience_0_dateEndTS": "1622505600000",
    "rms_experience_0_description": "• Created and maintained cloud-based service endpoints with Python, Flask, &amp; Django, increasing service uptime to 99% for an early-stage machine vision startup.\n• Designed &amp; developed intuitive UIs for internal users to train &amp; deploy new machine learning models, decreasing manual set-up time by 90%.\n• Developed embedded software for an ML-powered camera product, achieving 10X system throughput using concurrent programming.\n• Generated &amp; curated visual AI training data and tested deep learning systems to ensure 90% accuracy with &lt;10% training data compared with competitors.",
    "rms_experience_0_isCurrent": "false",
    "rms_experience_0_location": "New York, NY",
    "rms_experience_0_role": "Software Engineer Intern",
    "rms_experience_1_company": "Company C",
    "rms_experience_1_dateBegin": "June 2019",
    "rms_experience_1_dateBeginFormat": "MMMM YYYY",
    "rms_experience_1_dateBeginTS": "1559347200000",
    "rms_experience_1_dateEnd": "June 2020",
    "rms_experience_1_dateEndFormat": "MMMM YYYY",
    "rms_experience_1_dateEndTS": "1590969600000",
    "rms_experience_1_description": "• Bootstrapped project to automate reporting while part of a team of hazardous waste management specialists, eliminating 25% of reporting labor using custom UIs and 3rd-party API calls.\n• Wrote one-off scripts to automate revisions of thousands of legacy reports, cutting manual editing hours by 80%.",
    "rms_experience_1_isCurrent": "false",
    "rms_experience_1_location": "New York, NY",
    "rms_experience_1_role": "Script Programmer",
    "rms_experience_2_company": "Company A",
    "rms_experience_2_dateBegin": "June 2021",
    "rms_experience_2_dateBeginFormat": "MMMM YYYY",
    "rms_experience_2_dateBeginTS": "1622505600000",
    "rms_experience_2_dateEnd": "Present",
    "rms_experience_2_dateEndTS": "n/a",
    "rms_experience_2_description": "• Bootstrapped &amp; led a team of 4 developers to modernize fulfillment automation using a novel human-in-the-loop approach, unlocking an 80% increase in operational scale for a 40-person team.\n• Increased software development velocity by mentoring 4 developers and facilitating user interviews, leading to consistent resolution of 90% of bug reports within 24 hours.\n• Built and maintained distributed systems that served 1M daily requests, coordinating with operations, business, technical, and customer teams to serve cross-functional priorities.\n• Built CI/CD pipelines to lint, build, test, review, and deploy containerized applications to production &lt;30 minutes after development with Docker, GitHub Actions, and Heroku.\n• Led initiative in a team of 8 engineers to improve engineering on-call system, reducing the number of alerts per shift by 50% without dropping mission-critical information.\n• Streamlined critical operational processes to achieve a 6% boost in productivity for a team of 40 operators.",
    "rms_experience_2_isCurrent": "true",
    "rms_experience_2_location": "New York, NY",
    "rms_experience_2_role": "Software Engineer",
    "rms_experience_count": "3",
    
    "rms_project_0_description": "• Prototyped and deployed a music-based web app in 2 days using React, Flask, and AWS EC2.\n• Hosted the code on GitHub.",
    "rms_project_0_organization": "n/a",
    "rms_project_0_title": "Web-Based Musical Instrument",
    "rms_project_count": "1",
    
    "rms_schema_details": "https://github.com/rezi-io/resume-standard",
    
    "rms_skill_0_category": "Languages",
    "rms_skill_0_keywords": "Python, JavaScript, SQL",
    "rms_skill_1_category": "Frameworks",
    "rms_skill_1_keywords": "Flask, Django, Node,  Web extensions",
    "rms_skill_2_category": "Tools",
    "rms_skill_2_keywords": "Unix, Bash, GitHub, GitHub Actions, Docker, Cloud, Heroku, CI/CD, PostgreSQL",
    "rms_skill_count": "3",
    
    "rms_summary": "n/a",
    "xmpmeta:x:xmpmeta xmlns:x": "adobe:ns:meta/",
    "xmpmeta:x:xmptk": "Image::ExifTool 10.68"
  }
}

Parser Logic

The basic logic of the parser includes:

Read a PDF file
Extract the XMP metadata
Check if the Producer field contains "rms_v2"
If found, extract all metadata fields that match the Resume Metadata Standard schema
Return the results as JSON
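
The steps above can be sketched in a few lines of Python. This is an illustration only, not the C++ parser shipped in this repository. It assumes the XMP packet is stored uncompressed in the PDF and that each RMS field is serialized as an XML attribute whose name starts with `rms_`; real files may differ on both points. The function names and the `"error"` status value are hypothetical.

```python
import json
import re

XMP_START = b"<x:xmpmeta"
XMP_END = b"</x:xmpmeta>"

# Matches Producer written either as an attribute or as an element (assumption).
PRODUCER = re.compile(r'Producer(?:="|>)([^"<]*)')
# Assumption: each field is an attribute such as rms:rms_contact_city="...".
RMS_ATTRIBUTE = re.compile(r'\b(?:[\w.-]+:)?(rms_\w+)="([^"]*)"')


def find_xmp_packet(pdf_bytes):
    """Steps 1-2: locate the XMP packet inside the raw PDF bytes."""
    start = pdf_bytes.find(XMP_START)
    if start == -1:
        return None
    end = pdf_bytes.find(XMP_END, start)
    if end == -1:
        return None
    return pdf_bytes[start:end + len(XMP_END)].decode("utf-8", errors="replace")


def parse_rms(pdf_bytes):
    """Steps 3-5: check the Producer field, collect rms_ fields, return JSON."""
    xmp = find_xmp_packet(pdf_bytes)
    producer = PRODUCER.search(xmp) if xmp else None
    if producer is None or "rms_v2" not in producer.group(1):
        # Hypothetical failure shape; the real parser's output may differ.
        return json.dumps({"status": "error", "data": {}})
    fields = dict(RMS_ATTRIBUTE.findall(xmp))
    return json.dumps({"status": "success", "data": fields})
```

Because the sketch never decodes XML entities, values come back still escaped (for example `&amp;`), so callers should unescape them before display.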

Validating and Using the Data

After extracting the metadata, you can validate it against the schema and use it in your application. Common use cases include:

Automatically populating application forms
Pre-filling resume builder templates
Creating standardized resume databases
Improving ATS compatibility
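
Before using the extracted values, it can help to normalize them. The Python sketch below makes three assumptions drawn from the example output rather than from a formal specification: `"n/a"` marks a missing value, text values may contain XML entities such as `&amp;`, and fields ending in `TS` hold epoch milliseconds in UTC. The helper names are hypothetical.

```python
import html
from datetime import datetime, timezone

MISSING = "n/a"  # placeholder seen in the example output (assumption)


def clean_value(value):
    """Map the "n/a" placeholder to None and decode XML entities such as &amp;."""
    if value == MISSING:
        return None
    return html.unescape(value)


def timestamp_to_date(value):
    """Convert an epoch-milliseconds string (a *TS field) to a UTC date."""
    if value is None or value == MISSING:
        return None
    return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc).date()
```

For instance, `timestamp_to_date("1609459200000")` gives the date 2021-01-01, which matches the `"2021"` education date in the example output.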

Schema

This standardized key structure makes the metadata easy to parse programmatically and provides a clear organization for resume data. It also ensures compatibility across different implementations of the Resume Metadata Standard.

Schema Structure

All metadata keys in the Resume Metadata Standard follow a consistent naming structure to ensure clarity and organization of data. The structure follows this pattern:

rms_{section}_{index}_{field}

Where:

rms: Prefix that stands for "Resume Metadata Standard", identifying all keys as part of this standard
{section}: The section of the resume (e.g., experience, education, skill)
{index}: A zero-based numerical index for items within a section (e.g., 0, 1, 2)
{field}: The specific attribute of the item (e.g., title, description, date)

Examples:

rms_experience_0_company - The company name for the first experience entry
rms_education_1_institution - The institution name for the second education entry
rms_skill_2_keywords - The keywords for the third skill category
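
The indexed key pattern can be split with a regular expression. This is a minimal Python sketch; `INDEXED_KEY` and `parse_indexed_key` are hypothetical names, and the pattern assumes section names contain only letters.

```python
import re

# rms_{section}_{index}_{field}
INDEXED_KEY = re.compile(r"^rms_([A-Za-z]+)_(\d+)_([A-Za-z0-9]+)$")


def parse_indexed_key(key):
    """Return (section, index, field) for an indexed key, or None otherwise."""
    match = INDEXED_KEY.match(key)
    if match is None:
        return None
    section, index, field = match.groups()
    return section, int(index), field
```

With this pattern, `parse_indexed_key("rms_experience_0_company")` returns `("experience", 0, "company")`, while keys without an index return `None`.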

Special Cases:

Count fields: These keys follow a slightly different pattern and do not include an index:
```
rms_{section}_count
```
For example: rms_experience_count indicates the total number of experience entries.
Contact information: Since contact details have only one instance, they don't require an index:
```
rms_contact_{field}
```
For example: rms_contact_email contains the person's email address.
Summary field: Similarly, the summary is a single entry without an index:
```
rms_summary
```
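
Taken together, the indexed pattern and the special cases allow the flat key/value map to be regrouped into a nested structure. The Python sketch below is one possible approach under the naming rules described in this section. The function name `to_nested` and the shape of the returned dictionary are hypothetical choices, not something the standard defines.

```python
import re

INDEXED_KEY = re.compile(r"^rms_([A-Za-z]+)_(\d+)_(\w+)$")
COUNT_KEY = re.compile(r"^rms_([A-Za-z]+)_count$")
CONTACT_PREFIX = "rms_contact_"


def to_nested(data):
    """Group flat RMS keys into contact, summary, and per-section lists.

    Returns (resume, mismatches), where mismatches lists the sections whose
    rms_{section}_count differs from the number of indexed entries found.
    """
    grouped = {}   # section -> {index -> {field -> value}}
    counts = {}    # section -> declared count
    contact = {}
    for key, value in data.items():
        indexed = INDEXED_KEY.match(key)
        count = COUNT_KEY.match(key)
        if indexed:
            section, index, field = indexed.groups()
            grouped.setdefault(section, {}).setdefault(int(index), {})[field] = value
        elif count:
            counts[count.group(1)] = int(value)
        elif key.startswith(CONTACT_PREFIX):
            contact[key[len(CONTACT_PREFIX):]] = value
    # Turn each {index: item} map into a list ordered by index.
    sections = {s: [items[i] for i in sorted(items)] for s, items in grouped.items()}
    resume = {"contact": contact, "summary": data.get("rms_summary"), "sections": sections}
    mismatches = sorted(s for s, n in counts.items() if len(sections.get(s, [])) != n)
    return resume, mismatches
```

Keys that match none of the patterns, such as `Producer`, are ignored by this sketch.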

Essential fields

Namespace	Type	Example
Producer	Text	rms_v2.0.1
rms_schema_details	Text	https://github.com/rezi-io/resume-standard

The Producer field records the version of the standard used to generate the metadata. It is essential for the parser to understand the structure of the metadata. See the changelog for the version history.

Contact

Namespace	Type	Example
rms_contact_fullName	Text	Charles Bloomberg
rms_contact_givenNames	Text	Charles
rms_contact_lastName	Text	Bloomberg
rms_contact_email	Text	[email protected]
rms_contact_phone	Text	(621) 799-5548
rms_contact_linkedin	Text	in/bloomberg
rms_contact_github	Text	github.com/charlesbloomberg
rms_contact_behance	Text	behance.net/charlesbloomberg
rms_contact_dribble	Text	dribbble.com/charlesbloomberg
rms_contact_website	Text	www.charlesbloomberg.com
rms_contact_country	Text	United States
rms_contact_countryCode	Text (ISO 3166 A-2)	US
rms_contact_city	Text	Madison
rms_contact_state	Text	Wisconsin