Dataset
Jaxter2017 edited this page Feb 2, 2023 · 28 revisions
The raw data source comprises 350 maths papers sat as a mock exam by a Year 11 cohort. The paper is the 2022 Edexcel GCSE Maths Paper 1, originally sat in the summer. I chose to focus on the question I judged most difficult for a model to mark accurately and provide feedback on, based on the number of marks it was worth, the variability of possible answers, and its requirement that all working out be shown.
I have taken a series of steps to prepare this raw handwritten data for model usage. See the examples below for Q3 of the Higher Paper:

- Manually aligned student working so that it reads vertically, and rewrote certain working-out methods in a simpler form to aid the handwriting conversion model (Notability Conversion)
- Converted this to LaTeX using Mathpix (Rendered + Raw LaTeX)
- Programmatically cleaned the LaTeX into simple mathematical symbols (Cleaned Text)
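
To illustrate the kind of transformation the cleaning step involves, here is a minimal Python sketch. The rules actually used for this dataset are not listed on this page, so the function name `clean_latex`, the replacement table, and the choice of output symbols are all assumptions made for illustration only.

```python
import re

# Hypothetical replacement table: the real cleaning rules are not given
# in the source, so these mappings are illustrative assumptions.
SYMBOLS = {
    r"\times": "×",
    r"\div": "÷",
    r"\cdot": "·",
    r"\leq": "≤",
    r"\geq": "≥",
    r"\neq": "≠",
    r"\pm": "±",
}

# Match only innermost groups (no nested braces) so that nested
# fractions and roots can be rewritten from the inside out.
FRAC = re.compile(r"\\frac\{([^{}]*)\}\{([^{}]*)\}")
SQRT = re.compile(r"\\sqrt\{([^{}]*)\}")
EXPONENT = re.compile(r"\^\{([^{}]*)\}")


def clean_latex(raw: str) -> str:
    """Convert a small subset of LaTeX maths into plain-text symbols."""
    # Remove math-mode delimiters: \( \) \[ \] and $.
    text = re.sub(r"\\\(|\\\)|\\\[|\\\]|\$", "", raw)
    # Remove \left and \right sizing commands but keep the brackets.
    text = re.sub(r"\\(?:left|right)(?![A-Za-z])", "", text)

    # Repeat until nothing changes so nested structures are fully rewritten.
    previous = None
    while previous != text:
        previous = text
        text = FRAC.sub(r"(\1)/(\2)", text)
        text = SQRT.sub(r"√(\1)", text)

    # Swap operator commands for single characters, avoiding longer
    # commands that merely start with the same letters.
    for command, symbol in SYMBOLS.items():
        text = re.sub(re.escape(command) + r"(?![A-Za-z])", symbol, text)

    # Turn braced exponents such as x^{2} into x^(2).
    text = EXPONENT.sub(r"^(\1)", text)

    # Collapse runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


print(clean_latex(r"\( \frac{3}{4} \times 8 = 6 \)"))
```

Fractions are kept fully bracketed, as in `(3)/(4)`, so that the numerator and denominator stay unambiguous once the LaTeX structure is gone. A real pipeline would probably need more rules than this, for example for aligned environments or text commands in the Mathpix output.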