Directory.EnumerateFiles regressed in .NET 8 #110754

Open
Tjianke opened this issue Dec 16, 2024 · 3 comments
Labels: area-System.IO; tenet-performance (Performance related issue); untriaged (New issue has not been triaged by the area owner)

Comments

Tjianke commented Dec 16, 2024

Description

My team observed a performance regression in Directory.EnumerateFiles in .NET 8. We store a large number of database redo logs in a folder; each file is named with a prefix and a log generation number (e.g., E10ABCD1234.log). Our goal is to determine the maximum log generation number efficiently.

To achieve this, we developed a fast search algorithm that looks for the highest-generation log file hierarchically: it probes the patterns E10F*******.log down to E100*******.log to fix the first digit, then repeats the same probing for each following digit until the last one. We use Directory.EnumerateFiles(directory, filter) to detect matching files.
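The issue does not include the source of the search routine, so the following is only a minimal sketch of a digit-by-digit probe of the kind described above. The class and method names, the default prefix `E10`, the width of 8 hexadecimal digits, and the use of a single trailing `*` in the pattern are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Linq;

public static class LogGenerationSearch
{
    // Candidate digits are tried from highest to lowest.
    private const string HexDigitsDescending = "FEDCBA9876543210";

    // Hypothetical helper: returns the highest generation number in `directory`,
    // or null when no file matches. Assumes every log file is named
    // <prefix><digitCount uppercase hex digits>.log, as in E10ABCD1234.log.
    public static long? FindHighestGeneration(string directory, string prefix = "E10", int digitCount = 8)
    {
        string known = string.Empty;
        for (int position = 0; position < digitCount; position++)
        {
            char? found = null;
            foreach (char candidate in HexDigitsDescending)
            {
                string pattern = prefix + known + candidate + "*.log";
                // Any() stops at the first match; only existence matters here.
                if (Directory.EnumerateFiles(directory, pattern).Any())
                {
                    found = candidate;
                    break;
                }
            }
            if (found is null)
            {
                return null; // no file continues the digits chosen so far
            }
            known += found.Value;
        }
        return Convert.ToInt64(known, 16);
    }
}
```

Under these assumptions the routine issues at most 16 probes per digit position, so at most 128 calls to Directory.EnumerateFiles in total. Its cost therefore depends almost entirely on how cheaply a single filtered probe can be answered.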

We compared the performance of our search algorithm against a direct enumeration of all files to get the maximum generation number.

  1. The direct enumeration takes approximately 0.5 seconds in both .NET Framework and .NET 8.
  2. Our search algorithm is significantly faster in .NET Framework, with a baseline of about 4 milliseconds for 400,000 log files, but it degrades severely in .NET 8, where the elapsed time grows with the number of files. Below are the detailed performance metrics.

| Runtime | # of logs | Elapsed time | Enum times |
| --- | --- | --- | --- |
| .NET Framework 4.8.9282.0 | 1000 | 00:00:00.0040757 | 99 |
| .NET Framework 4.8.9282.0 | 10000 | 00:00:00.0051796 | 115 |
| .NET Framework 4.8.9282.0 | 100000 | 00:00:00.0039643 | 101 |
| .NET Framework 4.8.9282.0 | 400000 | 00:00:00.0042406 | 101 |
| .NET 8.0.11 | 1000 | 00:00:00.0341829 | 99 |
| .NET 8.0.11 | 10000 | 00:00:00.3642297 | 115 |
| .NET 8.0.11 | 100000 | 00:00:02.7944484 | 101 |
| .NET 8.0.11 | 400000 | 00:00:11.4074592 | 101 |
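
For comparison, the direct-enumeration baseline mentioned above could look roughly like the sketch below: one pass over the directory, parsing the generation number out of every file name. The class and method names and the `E10` prefix are assumptions, and the issue does not show the code that was actually measured.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class LogGenerationScan
{
    // Hypothetical baseline: enumerate every <prefix>*.log file once and keep
    // the largest hexadecimal generation number seen. Returns null if none parse.
    public static long? FindHighestGenerationByScan(string directory, string prefix = "E10")
    {
        long? max = null;
        foreach (string path in Directory.EnumerateFiles(directory, prefix + "*.log"))
        {
            // Strip the directory, the extension, and then the prefix.
            string name = Path.GetFileNameWithoutExtension(path);
            string digits = name.Substring(prefix.Length);
            if (long.TryParse(digits, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, out long generation)
                && (max is null || generation > max))
            {
                max = generation;
            }
        }
        return max;
    }
}
```

This version visits every entry exactly once, which fits the roughly constant 0.5 seconds reported for both runtimes.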

Previously, in .NET Framework, Directory.EnumerateFiles used FindFirstFile, which takes the search filter and uses it to find the first and next matching files.

The implementation changed in .NET Core / .NET 6 / .NET 8:

  1.  Directory.EnumerateFiles enumerates all files and checks whether each one matches the prefix. This means FindHighestGenerationLogFileFastV2 scans the whole directory many times.
    

runtime/src/libraries/System.Private.CoreLib/src/System/IO/Enumeration/FileSystemEnumerator.Windows.cs at 6362f242fc6e3065948b3bf922406509cb721a73 · dotnet/runtime

  2.  Directory.EnumerateFiles uses NtQueryDirectoryFile to retrieve information for all files, which may itself be slow (?).
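
To make the first point concrete, here is a small sketch built on the public System.IO.Enumeration types (`FileSystemEnumerable<T>`, `FileSystemEntry`, `FileSystemName.MatchesSimpleExpression`). It is not the runtime's actual code path; it only imitates the behaviour described above, where the pattern is applied in managed code to each returned entry, and it counts how many entries were inspected. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Enumeration;

public static class ManagedFilterDemo
{
    // Hypothetical helper: returns the files in `directory` whose names match
    // `pattern`, and reports how many entries the predicate had to inspect.
    public static List<string> EnumerateWithManagedFilter(string directory, string pattern, out int entriesVisited)
    {
        int visited = 0;
        var enumerable = new FileSystemEnumerable<string>(
            directory,
            (ref FileSystemEntry entry) => entry.ToFullPath(),
            new EnumerationOptions())
        {
            ShouldIncludePredicate = (ref FileSystemEntry entry) =>
            {
                // Runs for each entry that passes the default attribute filters,
                // whether or not its name matches the pattern.
                visited++;
                return !entry.IsDirectory
                    && FileSystemName.MatchesSimpleExpression(pattern, entry.FileName, ignoreCase: true);
            }
        };
        var matches = new List<string>(enumerable); // forces the full enumeration
        entriesVisited = visited;
        return matches;
    }
}
```

With a pattern that matches nothing, such as `E10F*.log` in a directory holding only lower generations, the visit count should still equal the number of visible entries. Repeating that roughly a hundred times per search would fit the linear growth in the .NET 8 timings.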
    

Configuration

Regression?

Data

Analysis


Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

@stephentoub

cc: @JeremyKuhne

@KalleOlaviNiemitalo
There is previous discussion in #31214.
