Resume Metadata Standard

An ATS compliant standard for all resumes
View Demo · Report Bug · Request Feature

Table of Contents

  1. Purpose
  2. Comparison
  3. Getting Started
  4. Schema
  5. Additional Documentation
  6. License
  7. Contributor
  8. Acknowledgements

Purpose

  • PDF resumes are not machine-parsable by nature.
  • Only resumes in a standard format are compatible with an Applicant Tracking System (ATS).
  • Creative formatting is sacrificed in favor of parsability.

This standard aims to change that by allowing full creativity in the resume's visual format while carrying structured data in its metadata.

A common XMP (Extensible Metadata Platform) structure, shared by all resumes, makes every resume parsable.

When to Use:

  • As a first pass when a user imports a resume: If structured standardized metadata exists, it can be used for 100% accuracy and fast output.
  • Checking can be done in less than 5ms.
  • It is up to the platform to decide which information they want to extract and utilize from the metadata.
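
The first-pass idea can be sketched as a small dispatcher: try the metadata route, and pay for a heavier parser only when no RMS metadata is found. Both parser arguments below are hypothetical callables supplied by the platform; neither name comes from this repository.

```python
from typing import Callable, Optional

# Hypothetical signature: a parser takes raw PDF bytes and returns a dict,
# or None when it cannot produce anything.
ParserFn = Callable[[bytes], Optional[dict]]


def parse_resume(pdf_bytes: bytes,
                 metadata_parser: ParserFn,
                 fallback_parser: ParserFn) -> dict:
    """Try the cheap metadata pass first; fall back only when it finds nothing."""
    data = metadata_parser(pdf_bytes)
    if data is not None:
        return {"source": "metadata", "data": data}
    # No RMS metadata: hand the file to the programmatic or NLP/LLM parser.
    return {"source": "fallback", "data": fallback_parser(pdf_bytes)}
```

Recording the source alongside the data lets the platform decide later how much to trust each result.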

Who is Currently Using It:

Comparison between Programmatic Parser, NLP/LLM Parser, and Metadata Parser

| Criterion | Programmatic Parser | NLP/LLM Parser | Metadata Parser |
| --- | --- | --- | --- |
| Speed | High | Slow | Very fast |
| Cost | Low | High | Practically zero |
| Maintenance | High (requires manual adjustments for different formats) | Low (easier to maintain with pre-trained models) | None (standardized and requires no maintenance) |
| Output Quality on Simple Resumes | Moderate | High (accurate extraction of raw data) | 100% accuracy (structured data is directly transferred) |
| Output Quality on Complex Resumes | Low | High | 100% accuracy (even with unstructured resumes) |
| Accuracy | Moderate | Can hallucinate or modify data | 100% accuracy (data is faithfully extracted as metadata) |
| Adaptability | Very low (limited to fixed formats) | Fairly high (adapts to many formats) | Nearly zero (focused on structured data, doesn’t adapt to various CV formats) |
| Ideal Use Case | Bulk data analysis, requires manual post-processing adjustments | Handles a variety of formats, but can be costly at large scale | Best as a first pass to check for structured data in resumes, reducing cost at scale |

Getting Started

Setting Up the Parser

  1. Clone the repository:
git clone https://github.com/rezi-io/resume-standard.git
cd resume-standard
  2. Build the parser (tested with Docker 27.5.1):
  • for local use:
g++ -std=c++17 -o ./parser/lib/rms-parser ./parser/src/parser.cpp
  • for ARM architecture:
docker run --rm -v "$PWD":/src -w /src ubuntu:20.04 bash -c \
  "apt update && apt install -y g++ && g++ -o ./parser/lib/rms-parser ./parser/src/parser.cpp"
  • for x86 architecture (such as Cloud Functions, AWS Lambda, etc.):
docker run --rm --platform linux/amd64 -v "$PWD":/src -w /src ubuntu:20.04 bash -c \
  "apt update && apt install -y g++ && g++ -o ./parser/lib/rms-parser ./parser/src/parser.cpp"

Using the Parser

Run the parser against a PDF file to extract the Resume Metadata Standard (RMS) information. It can be used with a Buffer or a file.

./parser/lib/rms-parser ./path/to/your/resume.pdf

Example:

./parser/lib/rms-parser ./pdf-examples/Software\ Engineer.pdf

The parser will first check for the Producer metadata. If it contains "rms_v2", the parser will extract and return all available metadata that conforms to the Resume Metadata Standard.

Example Output

When you run the parser on a PDF with RMS metadata, you'll get JSON output similar to this:

{
  "status": "success",
  "data": {
    "Description rdf:about": "",
    "Producer": "rms_v2.0.1",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
    "producer": "rms_v2.0.1",
    "rms": "https://github.com/rezi-io/resume-standard",
    
    "rms_contact_city": "New York City",
    "rms_contact_country": "United States",
    "rms_contact_email": "[email protected]",
    "rms_contact_fullName": "Charles Bloomberg",
    "rms_contact_github": "n/a",
    "rms_contact_givenNames": "Charles",
    "rms_contact_lastName": "Bloomberg",
    "rms_contact_linkedin": "in/bloomberg",
    "rms_contact_phone": "(621) 799-5548",
    "rms_contact_state": "New York",
    "rms_contact_website": "n/a",
    
    "rms_education_0_date": "2021",
    "rms_education_0_dateFormat": "YYYY",
    "rms_education_0_dateTS": "1609459200000",
    "rms_education_0_description": "n/a",
    "rms_education_0_institution": "New York University",
    "rms_education_0_isGraduate": "true",
    "rms_education_0_location": "New York, NY",
    "rms_education_0_minor": "Computer Science",
    "rms_education_0_qualification": "Bachelor of Science in Biochemistry",
    "rms_education_0_score": "n/a",
    "rms_education_0_scoreType": "n/a",
    "rms_education_count": "1",
    
    "rms_experience_0_company": "Company B",
    "rms_experience_0_dateBegin": "June 2020",
    "rms_experience_0_dateBeginFormat": "MMMM YYYY",
    "rms_experience_0_dateBeginTS": "1590969600000",
    "rms_experience_0_dateEnd": "June 2021",
    "rms_experience_0_dateEndFormat": "MMMM YYYY",
    "rms_experience_0_dateEndTS": "1622505600000",
    "rms_experience_0_description": "• Created and maintained cloud-based service endpoints with Python, Flask, & Django, increasing service uptime to 99% for an early-stage machine vision startup.\n• Designed & developed intuitive UIs for internal users to train & deploy new machine learning models, decreasing manual set-up time by 90%.\n• Developed embedded software for an ML-powered camera product, achieving 10X system throughput using concurrent programming.\n• Generated & curated visual AI training data and tested deep learning systems to ensure 90% accuracy with <10% training data compared with competitors.",
    "rms_experience_0_isCurrent": "false",
    "rms_experience_0_location": "New York, NY",
    "rms_experience_0_role": "Software Engineer Intern",
    "rms_experience_1_company": "Company C",
    "rms_experience_1_dateBegin": "June 2019",
    "rms_experience_1_dateBeginFormat": "MMMM YYYY",
    "rms_experience_1_dateBeginTS": "1559347200000",
    "rms_experience_1_dateEnd": "June 2020",
    "rms_experience_1_dateEndFormat": "MMMM YYYY",
    "rms_experience_1_dateEndTS": "1590969600000",
    "rms_experience_1_description": "• Bootstrapped project to automate reporting while part of a team of hazardous waste management specialists, eliminating 25% of reporting labor using custom UIs and 3rd-party API calls.\n• Wrote one-off scripts to automate revisions of thousands of legacy reports, cutting manual editing hours by 80%.",
    "rms_experience_1_isCurrent": "false",
    "rms_experience_1_location": "New York, NY",
    "rms_experience_1_role": "Script Programmer",
    "rms_experience_2_company": "Company A",
    "rms_experience_2_dateBegin": "June 2021",
    "rms_experience_2_dateBeginFormat": "MMMM YYYY",
    "rms_experience_2_dateBeginTS": "1622505600000",
    "rms_experience_2_dateEnd": "Present",
    "rms_experience_2_dateEndTS": "n/a",
    "rms_experience_2_description": "• Bootstrapped & led a team of 4 developers to modernize fulfillment automation using a novel human-in-the-loop approach, unlocking an 80% increase in operational scale for a 40-person team.\n• Increased software development velocity by mentoring 4 developers and facilitating user interviews, leading to consistent resolution of 90% of bug reports within 24 hours.\n• Built and maintained distributed systems that served 1M daily requests, coordinating with operations, business, technical, and customer teams to serve cross-functional priorities.\n• Built CI/CD pipelines to lint, build, test, review, and deploy containerized applications to production <30 minutes after development with Docker, GitHub Actions, and Heroku.\n• Led initiative in a team of 8 engineers to improve engineering on-call system, reducing the number of alerts per shift by 50% without dropping mission-critical information.\n• Streamlined critical operational processes to achieve a 6% boost in productivity for a team of 40 operators.",
    "rms_experience_2_isCurrent": "true",
    "rms_experience_2_location": "New York, NY",
    "rms_experience_2_role": "Software Engineer",
    "rms_experience_count": "3",
    
    "rms_project_0_description": "• Prototyped and deployed a music-based web app in 2 days using React, Flask, and AWS EC2.\n• Hosted the code on GitHub.",
    "rms_project_0_organization": "n/a",
    "rms_project_0_title": "Web-Based Musical Instrument",
    "rms_project_count": "1",
    
    "rms_schema_details": "https://github.com/rezi-io/resume-standard",
    
    "rms_skill_0_category": "Languages",
    "rms_skill_0_keywords": "Python, JavaScript, SQL",
    "rms_skill_1_category": "Frameworks",
    "rms_skill_1_keywords": "Flask, Django, Node,  Web extensions",
    "rms_skill_2_category": "Tools",
    "rms_skill_2_keywords": "Unix, Bash, GitHub, GitHub Actions, Docker, Cloud, Heroku, CI/CD, PostgreSQL",
    "rms_skill_count": "3",
    
    "rms_summary": "n/a",
    "xmpmeta:x:xmpmeta xmlns:x": "adobe:ns:meta/",
    "xmpmeta:x:xmptk": "Image::ExifTool 10.68"
  }
}
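
One way to consume this output from another program is to run the compiled binary and decode what it prints. The sketch below assumes the parser writes its JSON to standard output and that any status other than "success" means no usable RMS metadata; the failure shape is not documented here, so treat that branch as a guess. The function names are illustrative.

```python
import json
import subprocess
from typing import Optional


def parse_parser_output(stdout: str) -> Optional[dict]:
    """Return the "data" object from the parser's JSON output, or None.

    Assumption: a status other than "success" means no RMS metadata.
    """
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or payload.get("status") != "success":
        return None
    data = payload.get("data")
    return data if isinstance(data, dict) else None


def run_rms_parser(pdf_path: str,
                   parser_bin: str = "./parser/lib/rms-parser") -> Optional[dict]:
    """Invoke the compiled parser on one PDF and decode its output."""
    result = subprocess.run([parser_bin, pdf_path],
                            capture_output=True, text=True, check=False)
    return parse_parser_output(result.stdout)
```

Calling `run_rms_parser("./pdf-examples/Software Engineer.pdf")` would then return the `data` dictionary shown above, with every value as a string.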

Parser Logic

The basic logic of the parser includes:

  1. Read a PDF file
  2. Extract the XMP metadata
  3. Check if the Producer field contains "rms_v2"
  4. If found, extract all metadata fields that match the Resume Metadata Standard schema
  5. Return the results as JSON
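
Steps 3 and 4 can be sketched in a few lines, assuming steps 1 and 2 have already produced a flat mapping of XMP property names to strings. This is not the C++ parser's code, and it is stricter than the example output above: it keeps only keys starting with rms_, whereas the real output also carries entries such as "pdf" and "Producer".

```python
from typing import Optional

RMS_MARKER = "rms_v2"  # substring expected in the Producer field
RMS_PREFIX = "rms_"    # prefix shared by all keys of the standard


def extract_rms_fields(properties: dict) -> Optional[dict]:
    """Gate on Producer, then keep only the rms_ keys.

    `properties` is assumed to be a flat dict of XMP property names to
    string values, produced by whatever reader handled steps 1 and 2.
    """
    # The example output carries both "Producer" and "producer"; accept either.
    producer = properties.get("Producer") or properties.get("producer") or ""
    if RMS_MARKER not in producer:
        return None
    return {k: v for k, v in properties.items() if k.startswith(RMS_PREFIX)}
```

Returning None rather than an empty dict keeps "not an RMS file" distinct from "RMS file with no fields".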

Validating and Using the Data

After extracting the metadata, you can validate it against the schema and use it in your application. Common use cases include:

  • Automatically populating application forms
  • Pre-filling resume builder templates
  • Creating standardized resume databases
  • Improving ATS compatibility
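
Before populating a form or a database, it can help to regroup the flat keys into one record per entry and check them against the declared count. The following is a minimal sketch of such a check, not an official validator; the function name and error messages are illustrative.

```python
def collect_section(data: dict, section: str) -> list:
    """Gather rms_{section}_{index}_{field} keys into one dict per entry.

    Raises ValueError when rms_{section}_count is non-numeric or does not
    match the indexed entries actually present. A missing count is read as 0.
    """
    raw_count = data.get(f"rms_{section}_count", "0")
    if not str(raw_count).isdigit():
        raise ValueError(f"rms_{section}_count is not a number: {raw_count!r}")
    count = int(raw_count)

    entries = []
    for index in range(count):
        prefix = f"rms_{section}_{index}_"
        fields = {k[len(prefix):]: v for k, v in data.items() if k.startswith(prefix)}
        if not fields:
            raise ValueError(f"{section} entry {index} is declared but has no fields")
        entries.append(fields)

    # An entry at index == count means the declared count is too low.
    if any(k.startswith(f"rms_{section}_{count}_") for k in data):
        raise ValueError(f"rms_{section}_count is {count} but more entries exist")
    return entries
```

For the example output above, `collect_section(data, "experience")` would yield three dictionaries with keys such as `company`, `role`, and `dateBegin`.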

Schema

This standardized key structure makes the metadata easy to parse programmatically and provides a clear organization for resume data. It also ensures compatibility across different implementations of the Resume Metadata Standard.

Schema Structure

All metadata keys in the Resume Metadata Standard follow a consistent naming structure to ensure clarity and organization of data. The structure follows this pattern:

rms_{section}_{index}_{field}

Where:

  • rms: Prefix that stands for "Resume Metadata Standard", identifying all keys as part of this standard
  • {section}: The section of the resume (e.g., experience, education, skill)
  • {index}: A zero-based numerical index for items within a section (e.g., 0, 1, 2)
  • {field}: The specific attribute of the item (e.g., title, description, date)

Examples:

  • rms_experience_0_company - The company name for the first experience entry
  • rms_education_1_institution - The institution name for the second education entry
  • rms_skill_2_keywords - The keywords for the third skill category

Special Cases:

  1. Count fields: These keys follow a slightly different pattern and do not include an index:

    rms_{section}_count
    

    For example: rms_experience_count indicates the total number of experience entries.

  2. Contact information: Since contact details have only one instance, they don't require an index:

    rms_contact_{field}
    

    For example: rms_contact_email contains the person's email address.

  3. Summary field: Similarly, the summary is a single entry without an index:

    rms_summary
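
The naming pattern and its special cases can be recognized with two regular expressions. This is a sketch under the assumption that section and field names contain only ASCII letters, as they do in every key listed in this document; `RmsKey` and `parse_rms_key` are illustrative names.

```python
import re
from typing import NamedTuple, Optional


class RmsKey(NamedTuple):
    section: str
    index: Optional[int]  # None for count, contact, and summary keys
    field: Optional[str]  # None for rms_summary


# rms_{section}_{index}_{field}
_INDEXED = re.compile(r"^rms_([A-Za-z]+)_(\d+)_([A-Za-z]+)$")
# rms_{section}_count and rms_contact_{field}
_SINGLE = re.compile(r"^rms_([A-Za-z]+)_([A-Za-z]+)$")


def parse_rms_key(key: str) -> Optional[RmsKey]:
    """Split an RMS key into its parts, or return None for non-RMS keys."""
    if key == "rms_summary":
        return RmsKey("summary", None, None)
    match = _INDEXED.match(key)
    if match:
        return RmsKey(match.group(1), int(match.group(2)), match.group(3))
    match = _SINGLE.match(key)
    if match:
        return RmsKey(match.group(1), None, match.group(2))
    return None
```

Count keys come back with `field == "count"` and no index, so callers can tell them apart from indexed entries.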
    

Essential fields

| Namespace | Type | Example |
| --- | --- | --- |
| Producer | Text | rms_v2.0.1 |
| rms_schema_details | Text | https://github.com/rezi-io/resume-standard |

The Producer field holds the version of the standard used to generate the metadata. This field is essential for the parser to understand the structure of the metadata. You can review the changelog here

Contact

| Namespace | Type | Example |
| --- | --- | --- |
| rms_contact_fullName | Text | Charles Bloomberg |
| rms_contact_givenNames | Text | Charles |
| rms_contact_lastName | Text | Bloomberg |
| rms_contact_email | Text | [email protected] |
| rms_contact_phone | Text | (621) 799-5548 |
| rms_contact_linkedin | Text | in/bloomberg |
| rms_contact_github | Text | github.com/charlesbloomberg |
| rms_contact_behance | Text | behance.net/charlesbloomberg |
| rms_contact_dribble | Text | dribbble.com/charlesbloomberg |
| rms_contact_website | Text | www.charlesbloomberg.com |
| rms_contact_country | Text | United States |
| rms_contact_countryCode | Text (ISO 3166 A-2) | US |
| rms_contact_city | Text | Madison |
| rms_contact_state | Text | Wisconsin |

Summary

| Namespace | Type | Example |
| --- | --- | --- |
| rms_summary | Text | Software Engineer with years of experience.... |

Experience

| Namespace | Type | Example |
| --- | --- | --- |
| rms_experience_count | Integer | 3 [1] |
| rms_experience_0_company, rms_experience_1_company, rms_experience_2_company, ..., rms_experience_N_company | Text | Sony |
| rms_experience_N_description | Text | • Negotiated best pricing for hardware and software quotes and procurement. • Realigned and managed Sprint Wireless and XO Communication vendor accounts to fit business needs... |
| rms_experience_N_dateBegin | Text | June 2019 |
| rms_experience_N_dateBeginTS | Integer | 1559347200000 |
| rms_experience_N_dateEnd | Text | June 2020 |
| rms_experience_N_dateEndTS | Integer | 1590969600000 |
| rms_experience_N_isCurrent | Bool | false |
| rms_experience_N_role | Text | IT Support Specialist |
| rms_experience_N_location | Text | Palo Alto, CA |

[1] Integer from 0 to 15 giving the total number of items in this specific section.

Education

| Namespace | Type | Example |
| --- | --- | --- |
| rms_education_count | Text | 3 [1] |
| rms_education_0_institution, rms_education_1_institution, rms_education_2_institution, ..., rms_education_N_institution | Text | University of Wisconsin - Madison |
| rms_education_N_qualification | Text | Bachelors of Science in Computer Science |
| rms_education_N_location | Text | Madison, Wisconsin |
| rms_education_N_date | Text | 2014 |
| rms_education_N_dateTS | Text | 1388534400000 |
| rms_education_N_dateFormat | Text | YYYY |
| rms_education_N_isGraduate | Text | true |
| rms_education_N_minor | Text | Mathematics |
| rms_education_N_score | Text | 3.81 |
| rms_education_N_scoreType | Text | GPA |

[1] Integer from 0 to 15 giving the total number of items in this specific section.

Certification

| Namespace | Type | Example |
| --- | --- | --- |
| rms_certification_count | Text | 3 [1] |
| rms_certification_0_name, rms_certification_1_name, rms_certification_2_name, ..., rms_certification_N_name | Text | Project Management Professional (PMP) |
| rms_certification_N_department | Text | Project Management Institute |
| rms_certification_N_date | Text | 2014 |
| rms_certification_N_dateFormat | Text | YYYY |
| rms_certification_N_description | Text | • Certified in a standardized and evolving set of project management principles. |

[1] Integer from 0 to 15 giving the total number of items in this specific section.

Coursework

| Namespace | Type | Example |
| --- | --- | --- |
| rms_coursework_count | Text | 3 [1] |
| rms_coursework_0_name, rms_coursework_1_name, rms_coursework_2_name, ..., rms_coursework_N_name | Text | Introduction to Computer Systems |
| rms_coursework_N_department | Text | University of Wisconsin, Madison |
| rms_coursework_N_date | Text | 2017 |
| rms_coursework_N_dateTS | Text | 1483228800000 |
| rms_coursework_N_dateFormat | Text | YYYY |
| rms_coursework_N_description | Text | • Coordinating on code with a small group of people. |
| rms_coursework_N_skill | Text | Teamwork |

[1] Integer from 0 to 15 giving the total number of items in this specific section.

Involvement

| Namespace | Type | Example |
| --- | --- | --- |
| rms_involvement_count | Text | 3 [1] |
| rms_involvement_0_organization, rms_involvement_1_organization, rms_involvement_2_organization, ..., rms_involvement_N_organization | Text | Economics Student Association |
| rms_involvement_N_location | Text | University of Wisconsin, Madison |
| rms_involvement_N_dateBegin | Text | June 2014 |
| rms_involvement_N_dateBeginTS | Text | 1401580800000 |
| rms_involvement_N_dateBeginFormat | Text | MMMM YYYY |
| rms_involvement_N_dateEnd | Text | September 2016 |
| rms_involvement_N_dateEndTS | Text | 1472688000000 |
| rms_involvement_N_dateEndFormat | Text | MMMM YYYY |
| rms_involvement_N_role | Text | Selected Member |
| rms_involvement_N_description | Text | • Participated in forums and discussions presented by key economic thinkers and companies associated with the university. |

[1] Integer from 0 to 15 giving the total number of items in this specific section.

Project

| Namespace | Type | Example |
| --- | --- | --- |
| rms_project_count | Text | 3 [1] |
| rms_project_0_title, rms_project_1_title, rms_project_2_title, ..., rms_project_N_title | Text | Volunteer |
| rms_project_N_organization | Text | Habitat for Humanity |
| rms_project_N_role | Text | Administrator |
| rms_project_N_dateBegin | Text | June 2014 |
| rms_project_N_dateBeginTS | Text | 1401580800000 |
| rms_project_N_dateBeginFormat | Text | MMMM YYYY |
| rms_project_N_dateEnd | Text | September 2016 |
| rms_project_N_dateEndTS | Text | 1472688000000 |
| rms_project_N_dateEndFormat | Text | MMMM YYYY |
| rms_project_N_description | Text | • Volunteered to help renovate a house and managed a team of 6. |
| rms_project_N_url | Text | https://github.com/username/project-repository |

[1] Integer from 0 to 15 giving the total number of items in this specific section.

Skill

| Namespace | Type | Example |
| --- | --- | --- |
| rms_skill_count | Text | 3 [1] |
| rms_skill_0_category, rms_skill_1_category, rms_skill_2_category, ..., rms_skill_N_category | Text | Language |
| rms_skill_N_keywords | Text | French, English, Korean |

[1] Integer from 0 to 15 giving the total number of items in this specific section.

Publication

| Namespace | Type | Example |
| --- | --- | --- |
| rms_publication_count | Text | 3 [1] |
| rms_publication_0_title, rms_publication_1_title, rms_publication_2_title, ..., rms_publication_N_title | Text | Machine Learning for Natural Language Processing |
| rms_publication_N_organization | Text | IEEE Transactions on Neural Networks |
| rms_publication_N_role | Text | Lead Author |
| rms_publication_N_date | Text | June 2021 |
| rms_publication_N_dateTS | Text | 1622505600000 |
| rms_publication_N_dateFormat | Text | MMMM YYYY |
| rms_publication_N_description | Text | • Researched and published a paper detailing new approaches to natural language processing using advanced neural network architectures. |
| rms_publication_N_type | Text | Academic Journal |

[1] Integer from 0 to 15 giving the total number of items in this specific section.

Award

| Namespace | Type | Example |
| --- | --- | --- |
| rms_award_count | Text | 3 [1] |
| rms_award_0_title, rms_award_1_title, rms_award_2_title, ..., rms_award_N_title | Text | Excellence in Research Award |
| rms_award_N_organization | Text | Computer Science Department |
| rms_award_N_date | Text | May 2020 |
| rms_award_N_dateTS | Text | 1588291200000 |
| rms_award_N_dateFormat | Text | MMMM YYYY |
| rms_award_N_description | Text | • Awarded for outstanding contributions to the field of Computer Science Research during graduate studies. |

[1] Integer from 0 to 15 giving the total number of items in this specific section.

Reference

| Namespace | Type | Example |
| --- | --- | --- |
| rms_reference_count | Text | 3 [1] |
| rms_reference_0_name, rms_reference_1_name, rms_reference_2_name, ..., rms_reference_N_name | Text | Dr. Jane Smith |
| rms_reference_N_phone | Text | (555) 123-4567 |
| rms_reference_N_email | Text | [email protected] |
| rms_reference_N_type | Text | Professional |
| rms_reference_N_organization | Text | University of California, Berkeley |
| rms_reference_N_role | Text | Department Chair |

[1] Integer from 0 to 15 giving the total number of items in this specific section.

Additional Documentation

Data Formats

Date and Timestamp Formats

  • Date Display Format: Different fields use different date display formats based on their context:

    • YYYY (e.g., "2021") - Used for education graduation dates and similar annual milestones
    • MMMM YYYY (e.g., "June 2021") - Used for most work experiences, projects, etc.
    • Other formats should be specified in the corresponding dateFormat field
  • Timestamp Fields: All dateTS fields store Unix timestamps in milliseconds since epoch (January 1, 1970).

    • This provides a standardized way to store and compare dates programmatically
    • Example: 1622505600000 represents June 1, 2021, 00:00:00 UTC
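
Reading a dateTS value back into a date is a matter of dividing by 1000 and interpreting the result as UTC. The sketch below also renders the two display formats named above. The month table is spelled out so the result does not depend on the system locale, and the function names are illustrative.

```python
from datetime import datetime, timezone

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def ts_to_datetime(date_ts: str) -> datetime:
    """Convert a dateTS value (milliseconds since the Unix epoch) to UTC."""
    return datetime.fromtimestamp(int(date_ts) / 1000, tz=timezone.utc)


def render_date(date_ts: str, date_format: str) -> str:
    """Render a timestamp using the YYYY or MMMM YYYY display format."""
    moment = ts_to_datetime(date_ts)
    if date_format == "YYYY":
        return str(moment.year)
    if date_format == "MMMM YYYY":
        return f"{MONTHS[moment.month - 1]} {moment.year}"
    # Other formats are allowed by the standard but not handled in this sketch.
    raise ValueError(f"unsupported dateFormat: {date_format!r}")
```

With the example above, `render_date("1622505600000", "MMMM YYYY")` gives "June 2021".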

URL and Social Media Fields

  • URL Fields:
    • Standard URL fields (website, GitHub, etc.) should include full URLs when possible
    • For social media profiles, the username or handle format is acceptable:
      • LinkedIn: in/username instead of full https://linkedin.com/in/username
      • GitHub: username or github.com/username
      • Dribbble: username or dribbble.com/username
      • Behance: username or behance.net/username
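
Because both short and full forms are accepted, a consumer may want to normalize them before display. The following sketch expands the short forms listed above into full URLs. The base URLs and the function name are assumptions for illustration; the standard itself only lists the accepted short forms. Note that the contact key is spelled rms_contact_dribble, while the site is dribbble.com.

```python
# Assumed base URLs, keyed by network name (not by RMS key).
BASES = {
    "linkedin": "https://linkedin.com/",  # values look like "in/username"
    "github": "https://github.com/",
    "dribbble": "https://dribbble.com/",
    "behance": "https://behance.net/",
}


def to_profile_url(network: str, value: str) -> str:
    """Expand a handle or host/handle form into a full https URL."""
    value = value.strip()
    if value.startswith(("http://", "https://")):
        return value  # already a full URL
    base = BASES[network]
    host = base[len("https://"):]  # e.g. "github.com/"
    if value.startswith("www."):
        value = value[len("www."):]
    if value.startswith(host):
        value = value[len(host):]
    return base + value.lstrip("/")
```

Values equal to n/a should be filtered out before calling this helper, since it would otherwise turn them into a URL.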

Handling Missing Data

  • Empty Fields: Use n/a for fields that are intentionally left blank rather than omitting them
    • This indicates that the field was considered but has no value
    • Example: "rms_contact_github": "n/a"
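  • Unknown Dates: For current positions where the end date is unknown, use Present in the dateEnd field and omit the dateEndTS field (the example output above sets dateEndTS to n/a instead, so consumers should accept both)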
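
On the consuming side, these conventions reduce to two small helpers: one that maps n/a (or an absent key) to None, and one that reads an end timestamp while tolerating Present. This is a sketch with illustrative names, covering both an omitted dateEndTS and one set to n/a as in the example output.

```python
from typing import Optional

MISSING = "n/a"         # placeholder for intentionally blank fields
OPEN_ENDED = "Present"  # marker for an ongoing position


def value_or_none(data: dict, key: str) -> Optional[str]:
    """Return the field's value, or None when it is absent or "n/a"."""
    value = data.get(key)
    return None if value is None or value == MISSING else value


def end_timestamp_ms(data: dict, section: str, index: int) -> Optional[int]:
    """Return dateEndTS as an int, or None for ongoing or undated entries."""
    prefix = f"rms_{section}_{index}_"
    if data.get(prefix + "dateEnd") == OPEN_ENDED:
        return None
    raw = value_or_none(data, prefix + "dateEndTS")
    return int(raw) if raw is not None else None
```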

Metadata Structure

  • XMP Structure: The RMS uses Adobe's Extensible Metadata Platform (XMP) embedded in the PDF
    • XMP allows for structured metadata to be stored in the PDF file without altering its appearance
    • The metadata is organized into namespaces, with the RMS namespace identified by the "rms_" prefix
    • Multiple tools can read XMP data, not just ExifTool
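
As an illustration of reading XMP without ExifTool, the packet can often be located by scanning the raw file bytes for the x:xmpmeta element. This sketch assumes the metadata stream is stored unfiltered and unencrypted, which is common but not guaranteed. It is not how the C++ parser in this repository works, and its marker check is looser than a real Producer check because it looks for "rms_v2" anywhere in the packet.

```python
from typing import Optional

XMP_OPEN = b"<x:xmpmeta"
XMP_CLOSE = b"</x:xmpmeta>"


def find_xmp_packet(pdf_bytes: bytes) -> Optional[str]:
    """Return the first XMP packet found in the raw bytes, or None.

    Assumes the packet is stored as plain text inside the PDF.
    """
    start = pdf_bytes.find(XMP_OPEN)
    if start == -1:
        return None
    end = pdf_bytes.find(XMP_CLOSE, start)
    if end == -1:
        return None
    packet = pdf_bytes[start:end + len(XMP_CLOSE)]
    return packet.decode("utf-8", errors="replace")


def looks_like_rms(pdf_bytes: bytes) -> bool:
    """Quick pre-check: does the XMP packet mention the rms_v2 marker?"""
    packet = find_xmp_packet(pdf_bytes)
    return packet is not None and "rms_v2" in packet
```

A byte scan like this avoids a full PDF parse, which suits a fast first-pass check; files that fail it can still be handed to a complete XMP reader.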

License

Distributed under the MIT License.

Contributor

Rezi Development Team - [email protected]

Acknowledgements
