Excel_PRIME 🌟

Excel_Performant Reader via Interfaces for Memory Efficiency.
Without using any external libraries.
Optimised for Range extraction.

What does that mean?

Yet another Excel reader?
- Starting with .NET 8 as the performant runtime (see Benchmarks)
- .NET 9 gives an extra 5% boost
- .NET 10 gives another 5% ;-)

Let's take each of the above elements and explain:

Excel 📈

Opens large XLSX files (Excel 2007 onwards). Binary formats may follow later.

Performant 🚀

Try to be as fast as possible, i.e.
- Forward-only lazy loading
- Only a "quick" decipher / convert of the cell types, to ease GC pressure
- No attempt at creating or using DataTables with headers etc.
- Use IEnumerables with initial offset starts (row / column)
- Accept CancellationTokens so that a read can be cancelled on page transitions (more on this later)
Now the fastest in real-world usage (2025-11-19)
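
The points above (forward-only, lazy, offset starts, cancellation) can be illustrated with a plain C# iterator. This is only a minimal sketch of the idea, not the library's code: `LazyRows`, `ReadRows` and the `source` parameter are hypothetical names, and `source` stands in for whatever forward-only sheet reader sits underneath.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper for illustration only; not part of the Excel_PRIME API.
public static class LazyRows
{
    // Yields rows one at a time, starting at startRow (0-based) and keeping
    // only the cells from startColumn (0-based) onwards.
    public static IEnumerable<string[]> ReadRows(
        IEnumerable<string[]> source,
        int startRow = 0,
        int startColumn = 0,
        CancellationToken ct = default)
    {
        int rowIndex = 0;
        foreach (string[] row in source)
        {
            // Stop promptly if the caller has moved on (e.g. a page transition).
            ct.ThrowIfCancellationRequested();

            if (rowIndex++ < startRow)
            {
                continue; // rows before the offset are never copied or returned
            }

            if (startColumn >= row.Length)
            {
                yield return Array.Empty<string>(); // row is shorter than the offset
                continue;
            }

            yield return row[startColumn..];
        }
    }
}
```

Because nothing is read until the caller enumerates, stopping the enumeration early (or cancelling the token) means the remaining rows are never touched.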

Q & A's

Q: There are others that are faster
A: Agreed, but they do not:
- Have range extraction
- Optionally allow the use of the OS's temp file system to store massive sheets
- Re-use already extracted (massive) sheets
- Allow multiple sheets to be read at the same time
  - because they use global memory to represent the current row
  - or have a single access point into the zipped Excel file

Reader 📋

Read only
- Therefore no calculations or updates to formula calls

Interfaces 🏗️

Uses the core .NET functionality by default.
But if your target deployment allows the use of native performant binaries, these are pluggable via interfaces:
- e.g. Zlib.Net for getting the data streams out of the compressed Excel file faster (or SharpZipLib / PowerPlayZipper)
- A faster / slimmer implementation for XML stream reading (e.g. TurboXml)
Interfaces also allow the implementation of different source file formats (e.g. XLSB)
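
To make the pluggable idea concrete, here is a minimal sketch of what such a seam could look like. The names `IZipEntrySource` and `ZipArchiveEntrySource` are assumptions made up for this example and are not the interfaces that Excel_PRIME actually exposes; only `System.IO.Compression.ZipArchive` is real.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical interface, named for illustration only.
public interface IZipEntrySource : IDisposable
{
    // Opens a decompressed, forward-only stream over one entry of the package.
    Stream OpenEntry(string entryName);
}

// Default implementation built on the in-box ZipArchive.
public sealed class ZipArchiveEntrySource : IZipEntrySource
{
    private readonly ZipArchive _archive;

    public ZipArchiveEntrySource(Stream xlsxStream)
    {
        _archive = new ZipArchive(xlsxStream, ZipArchiveMode.Read, leaveOpen: false);
    }

    public Stream OpenEntry(string entryName)
    {
        ZipArchiveEntry entry = _archive.GetEntry(entryName)
            ?? throw new FileNotFoundException($"Entry '{entryName}' not found in archive.");
        return entry.Open();
    }

    public void Dispose() => _archive.Dispose();
}
```

A deployment that is allowed to use a third-party zip library would supply its own class implementing the same interface, and the reading code would not need to change.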

Q & A's

Q: Why?
A: As mentioned above, this allows a developer to swap in external NuGet packages that might perform better (XML speed etc.).

Memory 🌐

The reason for this project is to handle very large XLSX files (i.e. > 500K rows with > 180 columns per sheet, with multiple sheets of this size).
It is aimed at ETL validation scenarios, i.e. making sure that user-modified data that has been transferred has the interaction rules applied before moving on to the T and L stages.
Try not to hit / store in the LOH (Large Object Heap).
No internal .NET memory is held for previously loaded sheets / rows.

Q & A's

Q: It appears that this uses more memory than other implementations
A: Currently yes, but it is being optimised for range extraction,
- AND for allowing multiple rows (with cell data) to be stored in memory at the same time (i.e. via a ToList() call)
- AND there is work in place to allow multiple sheets to be read at the same time (unlike some of the others that use global memory to represent a row)
- AND it appears that the current benchmarks do not actually extract the data unless a ToString and a check on the result is used (otherwise the JIT removes the unassigned dead code)

Efficiency 📦

As hinted by the above statements, this is targeted at memory-restricted environments (e.g. ASP.NET VMs).
Use the OS's temp file caching, so that if memory is tight the owner app does not have to worry about OOM exceptions or drop to swap disk speeds.
Only unzip the sheet(s) when they are asked for.
Only load the shared strings up to the current request number.
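
As a rough sketch of the temp file idea, the snippet below unzips a single entry into an OS temp file only when asked, and lets the OS delete the file when the stream is disposed. `TempSheetCache` and `ExtractToTempFile` are hypothetical names for this example; the real library may organise this differently.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration only; not part of the Excel_PRIME API.
public static class TempSheetCache
{
    // Copies one zip entry to an OS temp file and returns a stream over it.
    // FileOptions.DeleteOnClose removes the file when the stream is disposed.
    public static FileStream ExtractToTempFile(ZipArchive archive, string entryName)
    {
        ZipArchiveEntry entry = archive.GetEntry(entryName)
            ?? throw new FileNotFoundException($"Entry '{entryName}' not found.");

        string tempPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        var temp = new FileStream(
            tempPath, FileMode.CreateNew, FileAccess.ReadWrite, FileShare.None,
            81920, FileOptions.DeleteOnClose | FileOptions.SequentialScan);
        try
        {
            using (Stream source = entry.Open())
            {
                source.CopyTo(temp); // decompress straight to disk, not into memory
            }
            temp.Position = 0; // rewind so the caller reads from the start
            return temp;
        }
        catch
        {
            temp.Dispose(); // also deletes the partially written temp file
            throw;
        }
    }
}
```

The decompressed sheet then lives in the OS file cache rather than on the managed heap, which is the trade-off described above.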

Q & A's

Q: Sometimes the async / awaits add too much overhead
A: True, that is why there are also equivalent base interfaces that perform the same functionality without the async / await overhead.

Etc. 🔧

`CancellationToken`s

This is to allow the reading of large files to be aborted.
Make "most" of the ".NET Core" APIs asynchronous Tasks.

IDisposable

Got to tidy up those temp files and release the FileStreams.

It will not ⛔:

Be: Same sheet Thread safe 📊

A single sheet instance will not be thread safe, because the XML reader is locked (forward only) to the sheet in use.
- But you can open the sheet more than once and have different threads running over it,
- And you can have parallel threads access the Excel file
- Just remember to set Options{ AccessExcelFileInForwardOnlyMode = false}

Do: Dynamic Ranges ⚠️

i.e. Ones that contain formulas:
- <definedName name="Prices">OFFSET(Sheet1!$A$1,0,0,COUNTA(Sheet1!$A:$A),1)</definedName>
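
For context, a reader can still list such names without evaluating them. The sketch below scans workbook XML for definedName elements and flags the ones that look dynamic. `DefinedName`, `DefinedNameScanner` and the "contains a function call outside a quoted sheet name" rule are all assumptions made for this example, not the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical types for illustration only; not part of the Excel_PRIME API.
public sealed record DefinedName(string Name, string Formula, int? LocalSheetId, bool IsDynamic);

public static class DefinedNameScanner
{
    public static List<DefinedName> Read(Stream workbookXml)
    {
        var result = new List<DefinedName>();
        using XmlReader reader = XmlReader.Create(workbookXml);
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "definedName")
            {
                string name = reader.GetAttribute("name") ?? string.Empty;
                var sheetIdText = reader.GetAttribute("localSheetId");
                int? sheetId = int.TryParse(sheetIdText, out int id) ? id : null;
                // Leaves the reader on the node after </definedName>, so no extra Read() here.
                string formula = reader.ReadElementContentAsString();
                result.Add(new DefinedName(name, formula, sheetId, LooksDynamic(formula)));
            }
            else
            {
                reader.Read();
            }
        }
        return result;
    }

    // Assumed heuristic: an opening bracket outside a quoted sheet name means a
    // function call such as OFFSET(...), so the range cannot be read statically.
    private static bool LooksDynamic(string formula)
    {
        bool inQuotes = false;
        foreach (char c in formula)
        {
            if (c == '\'') inQuotes = !inQuotes;
            else if (c == '(' && !inQuotes) return true;
        }
        return false;
    }
}
```

With this approach a dynamic name is reported but never resolved, which matches the "do not fall over" goal rather than attempting any formula evaluation.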

Do: Poco 🤖

A POCO / Type populator (Extensions can be written for that later)

Be a: Writer / Modifier 📚

Totally beyond the remit of this project

Badge 🔄	Area
	Release build and tests

Targets 🎯

Phase 0

✅ Setup this github
✅ Create the main project
✅ Add Unit Test project
✅ Add simple Test Data

Phase Alpha

✅ Use Net Core Interface(s)
- ✅ Use ZipArchive
- ✅ Use XDocument
✅ Implement Open / Dispose (Async)
- ✅ Sheet Names
- ✅ Shared Strings
✅ Implement Sheet loading (unzip and be ready for use)
- ✅ Use XDocument as POC only
✅ Implement Row extraction
- ✅ Skip
- ✅ Delayed read - until a cell is actually needed
- ✅ Deal with Null / Empty cells (Utilise sparse array?)
- ✅ Keep last used offset (i.e. no need to reload sheet if the next range API startRow call is later)

Phase Beta - Benchmarks ⏱️

✅ Benchmarks
- ✅ Add Other "Excel readers" to the Benchmark project(s)
- 🎉 Now With Sylvan.Data.Excel
- 🎉 Now With XlsxHelper
✅ More UnitTests

Phase 1 - MVP 🔍

✅ Add IEnumerables and benchmark
- ⚠️ Still not convinced whether to implement "all the way down"
✅ Implement XmlReader.Create for
- ✅ Loading sharedStrings
- ✅ Sheet loading
- ✅ Some Profiling Enhancements
  - ✅ Performance 2025-10-18-pm
✅ More Benchmarks
- Now With FastExcel
- ✅ Some Profiling Enhancements
  - 🚀 Big Performance improvements 2025-10-19-pm
✅ Better Storage of the SharedStrings
- ✅ Use of LazyLoading Class
  - ⚠️ Performance 2025-10-14
- ✅ Use of Derived XmlNamedTable implementations
- ✅ Locking for separate sheet thread reading
  - ⚠️ Performance 2025-10-25
- ✅ Restricted storage (i.e. do not return things that are not relevant)
  - 🚀 Big Performance improvements 2025-10-26
✅ Cell object type 📅 - 🚀 Big Performance improvements 2025-11-01
- ✅ Cell converted when read (i.e. you will know the type that you want, and you can convert it.)
  - 🚀 Big Performance improvements 2025-11-04
✅ Use internal ZipEntry rented buffer
- ✅ Add and explain usage in options
  - 🚀 Big Performance improvements 2025-11-07
✅ Investigation into the smallest function 💪
- 🚀 More Performance improvements 2025-11-08
✅ Optimise for CellConversion.None 💪
- 🚀 More Performance improvements 2025-11-12
✅ Parallel Sheet threads Access
- ✅ Multiple times (with locking)
✅ Nuget
- ✅ Beta etc.
- 🎊 Released as Nuget V1.yyMM.dd -> 1.2511.14

Phase 2 - RC

✅ Add IEnumerables All the way down ⤵️
- i.e. remove the need for Asynchronous awaits
- 🚀 Yielding More Performance improvements 2025-11-19
- ⛓️‍💥 Breaking Change 🔩
  - The Async classes now have Async appended to be distinct from the non async versions
  - But the Async classes inherit from the non-async ones, so they are interchangeable
✅ Nuget
- ✅ Manual workflow deploy Release
- ✅ Manual workflow deploy Beta
✅ Read definedNames (Ranges / Cell / Value / Dynamic) 📇
- ✅ Read from global
- ✅ Handle Dynamics (i.e. do not fall over! 🤷)
✅ Deal with blank rows in a sheet 🗋
- ✅ Return a null cell row
✅ Deal with Empty cells in a row 🗅
- ✅ Return a null cell (e.g. <c r="F12" s="8"/>)
✅ Implement Sheet scoping of definedNames
- ✅ i.e. <definedName name="OrderSize" localSheetId="0">'Try it Yourself'!$C$12:$E$12</definedName>
- Note: The above will be referenced as OrderSize (Try it Yourself) as shown in LibreOffice.
✅ Implement Row extraction 📟
- ✅ Allow ColumnHeader addressing (i.e. start -> end columns)
✅ Implement RangeExtraction 📲
- ✅ Global rangeNames
- ✅ Make DefinedNames work with localSheetId definitions
- ✅ User defined, using the "A1:B10" or "$A$1:$B$10" syntax
  - Range Performance on 2025-12-10
✅ Add Benchmarks for "Excel readers" That perform Range Extraction
- ✅ ClosedXML Version="0.105.0"
  - Performance on 2025-11-25
- ✅ EPPlus_LPGL Version="4.5.3.13"
  - Performance on 2025-11-25
- ⚠️ FastExcel Version="3.0.13" -> Fails on Range Extraction
- ✅ FreeSpire.XLS Version="14.2.0"
  - Performance on 2025-11-27
- ✅ Aspose.Cells Version="25.11.0"
  - Performance on 2025-11-28
- ⚠️ Extend benchmarks to cover the other large file types
  - It appears that most of the others do not like the pivot-tables file!! 🤯
✅ Investigate memory usage(s) 🧑‍💻
- ✅ Some performance improvements 🏃‍➡️ Performance on 2025-12-01
- ✅ More performance improvements 🏃‍➡️ Performance on 2025-12-04
- ✅ Sacrificed a little speed ➡️ Performance on 2025-12-07
✅ Release as Nuget V2.2512-10 💨
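
The user-defined range syntax mentioned above ("A1:B10" or "$A$1:$B$10") maps to row and column numbers in a simple way. The following is a minimal sketch under stated assumptions: `CellRange` and `A1Notation` are hypothetical names, indices are 1-based, and sheet-qualified references such as Sheet1!A1:B10 are deliberately not handled.

```csharp
using System;

// Hypothetical types for illustration only; not part of the Excel_PRIME API.
// All indices are 1-based, as in Excel itself.
public readonly record struct CellRange(int FirstRow, int FirstColumn, int LastRow, int LastColumn);

public static class A1Notation
{
    // Parses "A1:B10" or "$A$1:$B$10" into a normalised range.
    public static CellRange ParseRange(string text)
    {
        string[] parts = text.Split(':');
        if (parts.Length != 2) throw new FormatException($"'{text}' is not a two-cell range.");
        (int r1, int c1) = ParseCell(parts[0]);
        (int r2, int c2) = ParseCell(parts[1]);
        return new CellRange(Math.Min(r1, r2), Math.Min(c1, c2), Math.Max(r1, r2), Math.Max(c1, c2));
    }

    // Parses a single cell reference such as "B10" or "$B$10".
    public static (int Row, int Column) ParseCell(string cell)
    {
        int i = 0, column = 0, row = 0;
        if (i < cell.Length && cell[i] == '$') i++; // optional absolute marker

        int lettersStart = i;
        while (i < cell.Length)
        {
            char c = char.ToUpperInvariant(cell[i]);
            if (c < 'A' || c > 'Z') break;
            column = column * 26 + (c - 'A' + 1); // base-26 with A=1 ... Z=26
            i++;
        }
        if (i == lettersStart) throw new FormatException($"'{cell}' has no column letters.");

        if (i < cell.Length && cell[i] == '$') i++; // optional absolute marker

        int digitsStart = i;
        while (i < cell.Length && cell[i] >= '0' && cell[i] <= '9')
        {
            row = row * 10 + (cell[i] - '0');
            i++;
        }
        if (i == digitsStart || i != cell.Length || row == 0)
            throw new FormatException($"'{cell}' has no valid row number.");

        return (row, column);
    }
}
```

Once a range is reduced to four integers like this, the start row and start column can act as the initial offsets for a forward-only read, and the end row tells the reader when to stop.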

V2 Changes

2025-12-14

Implement GetUserRange(...)
- Range Performance on 2025-12-14

Phase 3 - XLSB 💾 - Alpha V3

Phase 4 - Specific Cell value type(s) #️⃣

Cell object type 📅
- Deal with DateOnly / TimeOnly fields -> CellConversion.NumberAndDates 💹
- Use of user defined column schema (Excel Number Format nuget?)
- Formatter applied -> CellConversion.ForceStyles
- Operator based conversion
- Investigate if the XmlConvert classes are efficient
Benchmarks

Phase 5 - Third Party Nugets 📦

Exercise the implementation of interfaces for other libs (XML / Zip)
- Separate Nuget(s) ?
Benchmarks

Smurf-IV/Excel_PRIME

Folders and files

Latest commit

History

Repository files navigation

Excel_PRIME 🌟

What does that mean?

Excel 📈

Performant 🚀

Q & A's

Reader 📋

Interfaces 🏗️

Q & A's

Memory 🌐

Q & A's

Efficiency 📦

Q & A's

Etc. 🔧

CancellationTokens

IDisposable

It will not ⛔:

Be: Same sheet Thread safe 📊

Do: Dynamic Ranges ⚠️

Do: Poco 🤖

Be a: Writer / Modifier 📚

Targets 🎯

Phase 0

Phase Alpha

Phase Beta - Benchmarks ⏱️

Phase 1 - MVP 🔍

Phase 2 - RC

V2 Changes

2025-12-14

Phase 3 - XLSB 💾 - Alpha V3

Phase 4 - Specific Cell value type(s) #️⃣

Phase 5 - Third Party Nugets 📦

Phase 6 - ideas 💡

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Languages

`CancellationToken`s

Packages