@tmds
Member

@tmds tmds commented Jun 30, 2025

(By default) Linux allows 128 inotify instances per user. By sharing the inotify instance between the FileSystemWatchers we reduce contention with other applications.

Fixes #62869.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jun 30, 2025
@tmds
Member Author

tmds commented Jun 30, 2025

@dotnet/area-system-io @stephentoub ptal.

Member

@adamsitnik adamsitnik left a comment

@tmds big thanks for your contribution! For now I've reviewed 25% of this PR (I need to wrap my head around all the locks and it's going to take me a while), PTAL at my comments.

@tmds
Member Author

tmds commented Jul 2, 2025

For the locking, this overview may be helpful:

  • watchersLock: guards the list of Watcher
  • lock on Watcher: guards its tree of WatchedDirectory
  • lock on Watch: guards its list of WatchedDirectory
  • addLock: this reader writer lock enables concurrent adding of watches ("readers"), but no watches may be added while an inotify watch is removed ("writer")

To prevent deadlocks, the locks (as needed) are taken in this order: watchersLock, addLock, lock on Watcher, lock on Watch.
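The ordering rule above can be illustrated with a minimal sketch. It is not the PR's code: the lock objects, the helper names (`RegisterWatcher`, `AddWatch`, `RemoveWatch`), and the string log are hypothetical stand-ins, and only the acquisition order (watchersLock, addLock, lock on Watcher, lock on Watch) comes from the overview. Each operation takes only the locks it needs, always in that order.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-ins for the four locks; the names mirror the overview, not the PR's code.
object watchersLock = new object();                    // guards the list of watchers
ReaderWriterLockSlim addLock = new ReaderWriterLockSlim(); // readers add watches, the writer removes one
object watcherLock = new object();                     // "lock on Watcher"
object watchLock = new object();                       // "lock on Watch"
List<string> watchers = new List<string>();
List<string> log = new List<string>();

// Registering a watcher needs only the outermost lock.
void RegisterWatcher(string name)
{
    lock (watchersLock) { watchers.Add(name); }
}

// Adding a watch: addLock (read), then the Watcher lock, then the Watch lock.
void AddWatch(string path)
{
    addLock.EnterReadLock();
    try
    {
        lock (watcherLock)
        {
            lock (watchLock) { log.Add($"add {path}"); }
        }
    }
    finally { addLock.ExitReadLock(); }
}

// Removing a watch: addLock (write) excludes all concurrent adds, then the Watch lock.
void RemoveWatch(string path)
{
    addLock.EnterWriteLock();
    try
    {
        lock (watchLock) { log.Add($"remove {path}"); }
    }
    finally { addLock.ExitWriteLock(); }
}

RegisterWatcher("w1");
AddWatch("/tmp/a");
RemoveWatch("/tmp/a");
Console.WriteLine(string.Join("; ", log));
```

Because no code path ever acquires an earlier lock while holding a later one, two threads cannot end up waiting on each other in a cycle.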

@tmds
Member Author

tmds commented Jul 7, 2025

@adamsitnik this is a challenge to review so feel free to ask any questions you have while looking at the code.

I'm also going to make some time this week to look at the PR with my reviewer's hat on.

@tmds
Member Author

tmds commented Jul 10, 2025

@adamsitnik I'll be on an extended break starting next week. I wonder if you have any additional feedback/questions that I can still look into tomorrow.

If we'd like to address #62869, I think this is the way to go.
There's a cost in code complexity as we can no longer consider each inotify watch to be owned by a single FileSystemWatcher.

I don't think we need to rush this in. It would be good to have some target date in mind so this doesn't get postponed indefinitely.

@tmds
Member Author

tmds commented Aug 18, 2025

@adamsitnik @stephentoub where do you want to go with this? Do you want to target .NET 10? Or perhaps defer to early .NET 11?

@jeffhandley
Member

We will target .NET 11 for this, @tmds. Thanks for your patience. It might take some more time for @adamsitnik to get back to reviewing it. I'm also adding @jozkee as a reviewer to help and load-balance with Adam (I'll let them coordinate if/how to divide the reviewing).

@jeffhandley jeffhandley added this to the 11.0.0 milestone Sep 2, 2025
@jeffhandley jeffhandley requested a review from jozkee September 2, 2025 02:45
@tmds
Member Author

tmds commented Nov 5, 2025

@adamsitnik @jozkee can we target .NET 11 preview 1 for this PR?

@tmds
Member Author

tmds commented Nov 18, 2025

@jeffhandley @adamsitnik @jozkee it would be nice if we can work towards getting this merged.

Contributor

Copilot AI left a comment

Pull request overview

This PR refactors the Linux implementation of FileSystemWatcher to use a single shared inotify instance across all FileSystemWatcher objects, addressing the low per-user limit of 128 inotify instances on Linux. The refactoring introduces a new architecture with improved watch management and event processing.

Key changes:

  • Introduces a shared inotify instance pattern to reduce contention with other applications
  • Implements a dedicated thread for reading inotify events with ThreadPool-based event handler invocation to prevent blocking
  • Adds new internal classes (INotify, Watcher, WatchedDirectory, Watch) to track watch state and manage the relationship between paths and watch descriptors
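The second bullet (a dedicated reader thread that hands events to the ThreadPool through a queue) can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: events are plain strings, the event list is hard-coded instead of being read from inotify, and the variable names are made up. Only the pattern of a producer thread feeding a `Channel<T>` that a pool task drains is taken from the description.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Unbounded queue standing in for a per-watcher event queue (string payloads are a simplification).
Channel<string> queue = Channel.CreateUnbounded<string>(
    new UnboundedChannelOptions { SingleReader = true });
List<string> handled = new List<string>();

// Consumer: runs on the thread pool and is where (possibly slow) user handlers would be invoked.
Task consumer = Task.Run(async () =>
{
    await foreach (string evt in queue.Reader.ReadAllAsync())
    {
        handled.Add(evt);
    }
});

// Producer: a dedicated thread that only enqueues, so slow handlers never block it.
Thread reader = new Thread(() =>
{
    foreach (string evt in new[] { "created a.txt", "changed a.txt", "deleted a.txt" })
    {
        queue.Writer.TryWrite(evt); // always succeeds on an unbounded channel that is not completed
    }
    queue.Writer.Complete();        // lets ReadAllAsync finish once the queue is drained
});
reader.IsBackground = true;
reader.Start();

reader.Join();
await consumer;
Console.WriteLine(string.Join(", ", handled));
```

Keeping the reading thread free of user callbacks matters because a stalled reader would let the kernel's inotify queue fill up and overflow.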

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| System/IO/FileSystemWatcher.cs | Extracts CreateBufferOverflowException method for reuse in Linux implementation |
| System/IO/FileSystemWatcher.Linux.cs | Major refactoring: implements shared inotify instance, new watch tracking structures, improved event processing with Channel-based queuing |
| System.IO.FileSystem.Watcher.csproj | Adds required dependencies for System.Collections.Concurrent and System.Threading.Channels on non-Windows platforms |
| Interop.INotify.cs | Adds IN_MASK_ADD flag support needed for sharing inotify instances across multiple watchers |

public NotifyFilters NotifyFilters { get; }
public Interop.Sys.NotifyEvents WatchFilters { get; }
public bool IncludeSubdirectories { get; }
public bool IsStopped { get; set; }

Copilot AI Dec 10, 2025

The IsStopped property is accessed from multiple threads without synchronization (e.g., set on line 1073 within a lock, but read on line 231 without holding the same lock). This should be marked as volatile or use Interlocked operations to ensure proper memory visibility across threads.

Suggested change
public bool IsStopped { get; set; }
private volatile bool _isStopped;
public bool IsStopped
{
    get => _isStopped;
    set => _isStopped = value;
}

Member Author

Synchronization happens with other locks, like the one on the watcher instance, and IsStopped being set before signalling the writer completion.

private readonly WeakReference<FileSystemWatcher> _weakFsw;
private readonly Channel<WatcherEvent> _eventQueue;
private INotify? _inotify;
private bool _emitEvents;

Copilot AI Dec 10, 2025

The _emitEvents field is accessed from multiple threads without synchronization (set on lines 1058 and 1074, read on line 1207). This should be marked as volatile to ensure proper memory visibility across threads.

Suggested change
private bool _emitEvents;
private volatile bool _emitEvents;

Member Author

Comment on lines +1368 to 1383
void Append(Span<char> pathBuffer, ReadOnlySpan<char> path)
{
    if (path.Length == 0)
    {
        return;
    }

    if (length != 0 && pathBuffer[length - 1] != '/')
    {
        builder.Append(System.IO.Path.DirectorySeparatorChar);
        pathBuffer[length] = '/';
        length++;
    }

    path.CopyTo(pathBuffer.Slice(length));
    length += path.Length;
}

Copilot AI Dec 10, 2025

The Append local function in GetPath does not perform bounds checking before writing to pathBuffer. If the total path length exceeds the buffer size (PATH_MAX = 4096), the write will throw (an IndexOutOfRangeException from the span indexer, or an argument exception from Slice or CopyTo). Consider adding a check to ensure 'length + path.Length' (plus 1 for separator) does not exceed pathBuffer.Length before copying, or handle potential exceptions gracefully.

Member Author

Yes, the code assumes path length does not exceed PATH_MAX.
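For reference, the check suggested in the review comment could look roughly like the sketch below. It is not part of the PR, which relies on the PATH_MAX assumption instead. `TryAppend` is a hypothetical name, `length` is passed by `ref` rather than captured as in the original local function, and the 16-character buffer is chosen only to make the overflow case easy to trigger.

```csharp
using System;

// Appends `path` to `buffer` at position `length`, inserting '/' when needed.
// Returns false (leaving `length` untouched) instead of throwing when the result would not fit.
bool TryAppend(Span<char> buffer, ref int length, ReadOnlySpan<char> path)
{
    if (path.Length == 0)
    {
        return true;
    }

    bool needsSeparator = length != 0 && buffer[length - 1] != '/';
    int required = length + (needsSeparator ? 1 : 0) + path.Length;
    if (required > buffer.Length)
    {
        return false;
    }

    if (needsSeparator)
    {
        buffer[length] = '/';
        length++;
    }

    path.CopyTo(buffer.Slice(length));
    length += path.Length;
    return true;
}

// A deliberately small buffer (hypothetical size) so the third append does not fit.
Span<char> buffer = stackalloc char[16];
int length = 0;
bool first = TryAppend(buffer, ref length, "/tmp");
bool second = TryAppend(buffer, ref length, "watched");
bool third = TryAppend(buffer, ref length, "a-very-long-file-name");
string result = buffer.Slice(0, length).ToString();
Console.WriteLine($"{first} {second} {third} {result}");
```

Whether the extra branch is worth it depends on whether paths longer than PATH_MAX can actually reach this code.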

Member

@adamsitnik adamsitnik left a comment

I've reviewed 30-40% of the code. Since I am not very familiar with this code and I know that historically we have disabled plenty of FSW tests (they were flaky) my next step will be to implement stress tests (and when the implementation passes them review the rest).

Big thanks for your contribution @tmds !

Member

@adamsitnik adamsitnik left a comment

@tmds Could you please run these stress tests with just .NET 10 and with your new implementation and report back the results?

For the latter, you should be able to do that with corerun: $pathToCoreRun FswStressTests.dll

@tmds
Member Author

tmds commented Dec 12, 2025

@adamsitnik, thanks for looking at this! I'll run the stress tests on Monday and report how it went.

@tmds
Member Author

tmds commented Dec 15, 2025

I built vmr's 10.0 branch with this implementation and ran the stress tests:

$ dotnet run -c Release
=== FileSystemWatcher Stress Tests ===
Started at: 2025-12-15 13:48:48

[1/15] Running: Basic File Creation
    ✓ PASSED in 0.02s

[2/15] Running: Multiple File Operations
    ✓ PASSED in 0.31s

[3/15] Running: Directory Creation
    ✓ PASSED in 0.00s

[4/15] Running: Recursive Directory Monitoring
    ✓ PASSED in 0.15s

[5/15] Running: High-Frequency File Operations
    ✓ PASSED in 2.01s

[6/15] Running: Large File Monitoring
    ✓ PASSED in 0.01s

[7/15] Running: Concurrent Watchers
    ✓ PASSED in 1.00s

[8/15] Running: Rename Operations
    ✓ PASSED in 0.41s

[9/15] Running: Rapid Create/Delete Cycles
    ✓ PASSED in 3.02s

[10/15] Running: Nested Directory Operations
    ✓ PASSED in 2.37s

[11/15] Running: Parallel Watchers and File Creation
    ✓ PASSED in 2.01s

[12/15] Running: Parallel Create/Delete Cycles
    ✓ PASSED in 8.01s

[13/15] Running: Rapid Watcher Start/Stop with Producer Thread
    ✓ PASSED in 10.02s

[14/15] Running: Aggressive Watcher Lifecycle Management
    ✓ PASSED in 0.78s

[15/15] Running: Global Watcher Contention Test
    Created 6492 watchers, 4037 received events, total 8336495 events
    ✓ PASSED in 8.55s

=== Summary ===
Total scenarios: 15
Passed: 15
Failed: 0
Total execution time: 38.67s

All tests passed!

To verify I used this implementation for the test run, I checked the inotify_add_watch syscalls, and all of them were using the IN_MASK_ADD flag (which is added in this PR).

@adamsitnik
Member

Excellent! Our plan is that @jozkee is going to review the PR and I am going to merge it in the 2nd week of January.

Labels

area-System.IO community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FileSystemWatcher on Linux uses an excessive amount of resources

4 participants