- Excel_Performant Reader via Interfaces for Memory Efficiency.
- Without using any external libraries.
- Optimised for Range extraction.

- Yet another Excel reader?
- Starting with .Net 8 as the performant Runtime (See Benchmarks)
- V9 gives an extra 5% boost,
- V10 Another 5% ;-)
Let's take each of the above elements and explain:
- Open Large 2007 (Onwards) XLSX file formats (Binary later, maybe)
- Try to be as fast as possible, i.e.
- Forward only Lazy loading
- Only "Quick" decipher / convert of the cell(s) types to ease GC pressure
- No attempt at "creating / using" datatables with headers etc.
- Use `IEnumerable`s with initial offset starts (Row / Column)
- Allow `CancellationToken`s to be used to allow page transitioning cancellation (More on this later)
- Now the fastest in Real world usage 2025-11-19
- Q: There are others that are faster
- A: Agreed, but then
- They do not have range extraction.
- Or optionally allow the use of the OS's TempFile System to store massive sheets
- Or re-use of already extracted (massive) sheets
- Or allow multiple sheets to be read at the same time
- because others use global memory to represent the current row
- Or have a single access into the Zip Excel file
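
The forward-only lazy loading, start offsets and cancellation described above can be sketched roughly as follows. This is a minimal illustration only: `LazyRowsSketch` and `ReadRows` are hypothetical names, and the `IEnumerable<string[]>` source stands in for whatever forward-only reader actually produces the rows.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class LazyRowsSketch
{
    // Hypothetical helper: lazily yields rows from a forward-only source.
    // Rows before startRow are skipped, and the token is checked before each row.
    public static IEnumerable<string[]> ReadRows(
        IEnumerable<string[]> forwardOnlySource,
        int startRow,
        CancellationToken token = default)
    {
        int index = 0;
        foreach (string[] row in forwardOnlySource)
        {
            // Stop promptly, e.g. when the user has moved on to another page.
            token.ThrowIfCancellationRequested();

            // Rows before the offset are consumed but never returned.
            if (index++ < startRow)
            {
                continue;
            }

            yield return row;
        }
    }
}
```

Because the method is an iterator, nothing is read until the caller starts enumerating, and a caller that stops early never pays for the remaining rows.
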
- Read only
- Therefore no calculations or updates to formula calls
- Will use the DotNet core functionality by default
- But, if your target deployment allows for the use of native performant binaries, then via the use of interfaces these will be pluggable
- i.e. Using `Zlib.Net` for getting the data streams out of the compressed Excel file faster. (Or `SharpZipLib` / `PowerPlayZipper`)
- A faster / slimmer implementation for xml stream reading (i.e. TurboXml)
- Allow the implementation of different source files (i.e. XLSB)
- Q: Why?
- A: As mentioned above, this is to allow a developer to swap in external NuGet packages that might perform better (XML speed etc.)
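
As an illustration of the pluggable idea above, here is a minimal sketch of what such an abstraction might look like. The names `IZipEntryProvider` and `ZipArchiveEntryProvider` are hypothetical and are not taken from this project; the default implementation uses only `System.IO.Compression`.

```csharp
#nullable enable
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical abstraction: a faster native zip library could implement this instead.
public interface IZipEntryProvider : IDisposable
{
    // Returns a readable stream for the entry, or null when the entry is missing.
    Stream? OpenEntry(string entryName);
}

// Default implementation that relies only on the built-in ZipArchive.
public sealed class ZipArchiveEntryProvider : IZipEntryProvider
{
    private readonly ZipArchive _archive;

    public ZipArchiveEntryProvider(Stream xlsxStream)
    {
        // leaveOpen: false, so disposing the provider also closes the file stream.
        _archive = new ZipArchive(xlsxStream, ZipArchiveMode.Read, false);
    }

    public Stream? OpenEntry(string entryName)
    {
        return _archive.GetEntry(entryName)?.Open();
    }

    public void Dispose()
    {
        _archive.Dispose();
    }
}
```

A reader written against the interface would not need to change when a different zip implementation is swapped in.
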
- The reason for this project is to handle very large XLSX files (i.e. > 500K rows with > 180 columns per sheet, with multiple sheets of this size)
- For `ETL` validation scenarios, i.e. make sure that the user-modified data that has been transferred has interaction rules applied before moving on to the `T` and `L` stages
- Try not to hit / store in the LOH
- No internal .Net memory of previously loaded sheets / rows.
- Q: It appears that this uses more memory than other implementations
- A: Currently yes, but it is being optimised for `Range Extraction`,
- AND for allowing multiple rows (with cell data) to be stored in memory at the same time (i.e. via a `ToList()` call);
- AND there is work in place to allow multiple sheets to be read at the same time (unlike some of the others that use global memory to represent a row)
- And it appears that the current benchmarks do not extract unless a `ToString` and a check on the result is used (otherwise the JIT removes the unassigned dead code)
- As hinted by the above statements, this is targeted at memory-restricted environments (i.e. ASP.NET VMs)
- Use the OS's `Temp File` caching, so if memory is tight the owner app will not have to worry about OOM exceptions or about running at swap-disk speeds.
- Only unzip the sheet(s) when they are asked for
- Only load the shared strings up to the current request number
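
One way to picture the temp-file idea above is the sketch below. It is an assumption about the general technique, not this project's code: `TempFileSheetSketch` and `ExtractToTempFile` are hypothetical names, and only `System.IO` / `System.IO.Compression` calls are used.

```csharp
using System.IO;
using System.IO.Compression;

public static class TempFileSheetSketch
{
    // Copies a single zip entry to an OS temp file, only when it is asked for,
    // and returns a stream positioned at the start of that file.
    public static FileStream ExtractToTempFile(ZipArchive archive, string entryName)
    {
        ZipArchiveEntry entry = archive.GetEntry(entryName)
            ?? throw new FileNotFoundException("Entry not found in archive.", entryName);

        string tempPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());

        // DeleteOnClose asks the OS to remove the file once the stream is disposed.
        var tempStream = new FileStream(
            tempPath, FileMode.CreateNew, FileAccess.ReadWrite, FileShare.None,
            81920, FileOptions.DeleteOnClose | FileOptions.SequentialScan);

        using (Stream source = entry.Open())
        {
            source.CopyTo(tempStream);
        }

        tempStream.Position = 0; // rewind so the caller reads from the start
        return tempStream;
    }
}
```

The uncompressed sheet then lives on disk rather than in managed memory, and disposing the returned stream cleans up the temp file.
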
- Q: Sometimes the `Async` `await`s add too much overhead
- A: True, that is why there are also the equivalent base interfaces that perform the same functionality without the `async await` overheads.
- This is to allow the Large files to be Aborted
- Make "Most" of the "Net Core" APIs Asynchronous `Tasks`
- Got to tidy up those `Temp Files`, and release the `FileStream`s
- It will not be thread safe for the same sheet instance, because the XML reader will be locked (forward only) to the sheet in use.
- but you can Open the sheet more than once, and have different threads running over it,
- And you can have Parallel threads access the Excel file
- Just remember to set `Options{ AccessExcelFileInForwardOnlyMode = false}`
- i.e. Ones that contain formulas: `<definedName name="Prices">OFFSET(Sheet1!$A$1,0,0,COUNTA(Sheet1!$A:$A),1)</definedName>`
- A POCO / Type populator (Extensions can be written for that later)
- Totally beyond the scope of this project remit
| Badge | Area |
|---|---|
| Release build and tests |
- ✅ Setup this github
- ✅ Create the main project
- ✅ Add Unit Test project
- ✅ Add simple Test Data
- ✅ Use Net Core Interface(s)
- ✅ Use `ZipArchive`
- ✅ Use `XDocument`
- ✅ Implement Open / Dispose (Async)
- ✅ Sheet Names
- ✅ Shared Strings
- ✅ Implement Sheet loading (unzip and be ready for use)
- ✅ Use `XDocument` as POC only
- ✅ Implement Row extraction
- ✅ Skip
- ✅ Delayed read - until a cell is actually needed
- ✅ Deal with Null / Empty cells (Utilise sparse array?)
- ✅ Keep last used offset (i.e. no need to reload sheet if the next range API `startRow` call is later)
- ✅ Benchmarks
- ✅ Add Other "Excel readers" to the Benchmark project(s)
- Now With `Sylvan.Data.Excel`
- Now With `XlsxHelper`
- ✅ More UnitTests
- ✅ Add `IEnumerable`s and benchmark ⚠️ Still not convinced whether to implement "all the way down"
- ✅ Implement `XmlReader.Create` for
- ✅ Loading sharedStrings
- ✅ Sheet loading
- ✅ Some Profiling Enhancements
- ✅ Performance 2025-10-18-pm
- ✅ More Benchmarks
- Now With `FastExcel`
- ✅ Some Profiling Enhancements
- Big Performance improvements 2025-10-19-pm
- ✅ Better `Storage` of the SharedStrings
- ✅ Use of LazyLoading Class ⚠️ Performance 2025-10-14
- ✅ Use of Derived `XmlNamedTable` implementations
- ✅ Locking for separate sheet thread reading ⚠️ Performance 2025-10-25
- ✅ Restricted storage (i.e. do not return things that are not relevant)
- Big Performance improvements 2025-10-26
- ✅ Cell object type
- Big Performance improvements 2025-11-01
- ✅ Cell converted when read (i.e. you will know the type that you want, and you can convert it.)
- Big Performance improvements 2025-11-04
- ✅ Use internal `ZipEntry` rented buffer
- ✅ Add and explain usage in options
- Big Performance improvements 2025-11-07
- ✅ Investigation into the smallest function 💪
- More Performance improvements 2025-11-08
- ✅ Optimise for `CellConversion.None` 💪
- More Performance improvements 2025-11-12
- ✅ Parallel Sheet threads Access
- ✅ Multiple times (with locking)
- ✅ Nuget
- ✅ Beta etc.
- Released as Nuget V1.yyMM.dd -> `1.2511.14`
- ✅ Add `IEnumerable`s All the way down ⤵️
- i.e. remove the need for Asynchronous awaits
- Yielding More Performance improvements 2025-11-19
- ⛓️‍💥 Breaking Change
- The Async classes now have `Async` appended to be distinct from the non async versions
- But, the `Async` classes inherit from the non async ones, so they are interchangeable
- ✅ Nuget
- ✅ Manual workflow deploy Release
- ✅ Manual workflow deploy Beta
- ✅ Read `definedName`s (Ranges / Cell / Value / Dynamic)
- ✅ Read from global
- ✅ Handle Dynamics (i.e. do not fall over! 🤷)
- ✅ Deal with blank rows in a sheet
- ✅ Return a `null` cell row
- ✅ Deal with Empty cells in a row
- ✅ Return a `null` cell (e.g. `<c r="F12" s="8"/>`)
- ✅ Implement Sheet scoping of `definedName`s
- ✅ i.e. `<definedName name="OrderSize" localSheetId="0">'Try it Yourself'!$C$12:$E$12</definedName>`
- Note: The above will be referenced as `OrderSize (Try it Yourself)` as shown in LibreOffice.
- ✅ Implement Row extraction
- ✅ Allow ColumnHeader addressing (i.e. start -> end columns)
- ✅ Implement RangeExtraction
- ✅ Global rangeNames
- ✅ Make `DefinedName`s work with `localSheetId` definitions
- ✅ User defined, using the `"A1:B10"` or `"$A$1:$B$10"` syntax
- ✅ Add Benchmarks for "Excel readers" That perform Range Extraction
- ✅ `ClosedXML` Version="0.105.0"
- ✅ `EPPlus_LPGL` Version="4.5.3.13"
- ⚠️ `FastExcel` Version="3.0.13" -> Fails on Range Extraction
- ✅ `FreeSpire.XLS` Version="14.2.0"
- ✅ `Aspose.Cells` Version="25.11.0"
- ⚠️ Extend benchmarks to cover the other large file types
- It appears that most of the others do not like the `pivot-tables` file.!! 🤯
- ✅ Investigate memory usage(s) 🧑‍💻
- ✅ Some performance improvements 🏃‍➡️ Performance on 2025-12-01
- ✅ More performance improvements 🏃‍➡️ Performance on 2025-12-04
- ✅ Sacrificed a little speed ➡️ Performance on 2025-12-07
- ✅ Release as Nuget V2.2512-10
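
The user defined range syntax listed above (`"A1:B10"` or `"$A$1:$B$10"`) can be turned into row / column bounds along these lines. This is a minimal sketch, not this project's parser: `RangeSyntaxSketch`, `ParseRange` and `ParseCell` are hypothetical names, and sheet-qualified references such as `'Try it Yourself'!$C$12` are deliberately not handled.

```csharp
using System;

public static class RangeSyntaxSketch
{
    // Parses "A1:B10" or "$A$1:$B$10" into 1-based (row, column) corners.
    public static (int StartRow, int StartCol, int EndRow, int EndCol) ParseRange(string range)
    {
        string[] parts = range.Split(':');
        if (parts.Length != 2)
        {
            throw new FormatException($"Expected 'start:end', got '{range}'.");
        }

        (int r1, int c1) = ParseCell(parts[0]);
        (int r2, int c2) = ParseCell(parts[1]);
        return (r1, c1, r2, c2);
    }

    // Parses a single cell reference such as "C12" or "$C$12".
    public static (int Row, int Col) ParseCell(string cell)
    {
        int i = 0, col = 0, row = 0;

        if (i < cell.Length && cell[i] == '$') i++; // optional absolute marker

        // Column letters are base-26 with A=1 ... Z=26, so "AA" becomes 27.
        while (i < cell.Length)
        {
            char c = char.ToUpperInvariant(cell[i]);
            if (c < 'A' || c > 'Z') break;
            col = col * 26 + (c - 'A' + 1);
            i++;
        }

        if (i < cell.Length && cell[i] == '$') i++; // optional absolute marker

        // Row digits are plain decimal.
        while (i < cell.Length && cell[i] >= '0' && cell[i] <= '9')
        {
            row = row * 10 + (cell[i] - '0');
            i++;
        }

        if (col == 0 || row == 0 || i != cell.Length)
        {
            throw new FormatException($"Invalid cell reference '{cell}'.");
        }

        return (row, col);
    }
}
```

With bounds in this form, a forward-only reader can skip to the start row and stop as soon as the end row has been passed.
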
- Implement Open / Dispose (Async)
- Sheet Names
- Shared Strings
- Implement Sheet loading
- Implement Row extraction
- Skip
- Delayed read - until a cell is actually needed
- Deal with Null / Empty cells (Utilise sparse array?)
- Cell object type
- Parallel Sheet threads Access
- Multiple times (with locking)
- Read `definedName`s (Ranges / Cell / Value / Dynamic)
- Read from global
- Benchmarks
- Add "Excel readers" That support XLSB Extraction
- Release as Nuget V3.yyMM.dd
- Cell object type
- Deal with `DateOnly` / `TimeOnly` fields -> `CellConversion.NumberAndDates`
- Use of user defined column schema (Excel Number Format nuget?)
- Formatter applied -> `CellConversion.ForceStyles`
- `Operator` based conversion
- Investigate if the `XmlConvert` classes are efficient
- Benchmarks
- Exercise the Implementation of Interfaces for other Libs (Xml / Zip)
- Separate Nuget(s) ?
- Benchmarks
- Investigate a different way of storing the Shared strings to the Filesystem, when they are in the MB's
- Investigate possibility of using "Pipelining" to get data for Next row / cell population after yield?
- Locking
- How to deal with rows that are completely blank
- fibres?
- Indicate that things may be `Hidden`
- Sheet
- Row
- Column
- Cell ?
- Indicate that things may be `Readonly`
- Sheet
- Row
- Column
- Cell ?
- More ideas to be added later, Please suggest... ;-)