aspnet/FileSystem

IFileProvider use case and abstraction issues

dazinator opened this issue · 19 comments

Questioning whether the IFileProvider abstraction needs improving (or its implementations) - due to following issues. I am finding usage of the abstraction needs to be different for different providers, so in essence you have to cater for all specific implementations and the IFileProvider abstraction isn't helpful.

Example use case..

I want an RC2 application to list all available CSS files that are themes for the application, so the user can select one and apply a theme to the site.

In order to distinguish a theme file from other files, the rule I am using is, it must live in a Css/Themes directory.
However, in addition to supporting just physical files (from the web applications wwwroot/css/themes directory), the application should also support the ability for theme files to live within referenced projects within the solution.

Here is an example solution directory structure:

WebApplication/wwwroot/css/themes/theme1.css
WebApplication/wwwroot/css/themes/theme2.css
ModuleA/wwwroot/css/themes/moduleA.css
ModuleB/wwwroot/css/themes/moduleB.css

In order to support files located in referenced projects, it means actually embedding the files into the assembly and using an EmbeddedFileProvider over those assemblies. CopyToOutput is useless because its non transitive

So to fulfil the use case, the application starts and creates an "EmbeddedFileProvider" for the referenced project assemblies, pointing to their "wwwroot" folders, as well as a PhysicalFileProvider for the main web applications "wwwroot" directory, and then wraps those in a CompositeFileProvider to give one IFileProvider instance that we can work with to query for Theme files..

Now given the IFileProvider instance, which is CompositeFileProvider we want to list the themes css files. If things were aligned it would be as simple as:

var allThemesFiles = fileProvider.GetDirectoryContents("css/themes")

And that would give you all files within that directory. However because EmbeddedFileProvider doesn't support / preserve directory information, GetDirectoryContents doesn't actually work - so we only get the PhysicalFiles when doing this. Well GetDirectoryContents does work for EmbeddedFileProvider but only for the root because it's just a single directory, i.e fileProvider.GetDirectoryContents("") returns all files embedded in the assembly. So the application now has to worry about what Providers are in the CompositeProvider, and cater for specific ones like the EmbeddedFileProvider. For instance, for EmbeddedFileProvider it needs to loop through all files in the root (single directory) and try to determine if the originated from a css/themes directory based on the name - where "/" was replaced with ".", i.e:

          // all embedded files actually just live in the root directory.
           var allFiles = fileProvider.GetDirectoryContents("");
            foreach (var file in allFiles)
            {
                if (file.Name.StartsWith("css.themes"))
                {
                    // theme?
                }
            }

Ofcourse, doing a fileProvider.GetDirectoryContents(""); on an IFileProvider (which is a composite in my example) could yield hundreds of files depending upon your application, and we don;t want to check all of those files with this logic which it is particular to Embedded files only.
So in reality, this means we actually need to check the IFileProvider instance, cast it to a composite, check the providers it is wrapping, loop through them, upcast the Embedded ones, and then add our fileProvider.GetDirectoryContents(""); check to find the themes files differently for Embedded providers only.

This is just all yucky imho, is there not a better way? Finding someway to preserve directory information on compile (similar to preserveCompilationContext for dependencies information perhaps) and then restore from that at runtime, such that EmbeddedFileProvider could support directories would solve this imho.

For example, could we perhaps, at build / compile time, generate and embed a txt file, containing the preserved directory information for embedded files, (a manifest of sorts), and then use this special file at runtime to restore directory information to allow EmbeddedFileProvider to support directories again.

To summarize: EmbeddedFileProvider should support GetDirectoryContents and sub-directories.
This is the same issue as reported here: aspnet/StaticFiles#141 (comment)

Related: #184

Not sure what kind of design would actually work in either of these issues.

@davidfowl Any thoughts on either? PhysicalFileProvider has similar issues with canonicalization.

Not sure what kind of design would actually work in either of these issues.

I've suggested one solution on #184, feel free to tear it apart :)

I propose a solution that can fix the problem of GetDirectoryContents() that supports only flat structure.
Because it will always be impossible to know where is the directory and where is the file name in a resource name "A.B.C.D", I propose... to let the user do the job.

Imagine a function Func<string, Tuple<string, string>>) (or maybe a structure SplitResourceName instead of the Tuple, but Tuple will be a first class citizen in the C# version). It would split the resource name into a directory and a file name. If function is null, we manage the files as flat. And we can give an intermediate implementation that manage the optional file extensions (see below). The user custom function should be used for very advanced scenarios (as workaround of the '-' char problem for instance).

So the EmbeddedFileProvider class constructors would be:

EmbeddedFileProvider(Assembly assembly)   // Already exists
EmbeddedFileProvider(Assembly assembly, string baseNamespace)   // Already exists
EmbeddedFileProvider(Assembly assembly, IEnumerable<string> fileNameExtensions)   // New
EmbeddedFileProvider(Assembly assembly, string baseNamespace, IEnumerable<string> fileNameExtensions)   // New
EmbeddedFileProvider(Assembly assembly, Func<string, Tuple<string, string>> resourceNameSplitter)   // New
EmbeddedFileProvider(Assembly assembly, string baseNamespace, Func<string, Tuple<string, string>> resourceNameSplitter)   // New

Note that directories are not the only issue. All file providers will have some discrepancies between the characters allowed in url paths and those allowed in file paths. For instance the Mac and Linux file systems are case sensitive (#16).

We were already considering a customizable Func on the provider to override the default matching of urls to files that lets devs work around the oddities of any given provider. You make a good point that this will need to accommodate both directories and files, so we'll probobly need two Funcs.

Are you sure that the two problems are similar?

The code new PhysicalFileSystem("/foo/bar").GetFileInfo("/FOO/BAR") returns different result on Unix and Windows. But who is in charge of this problem?

  • You want to keep this difference, and it is the user responsability to find a solution.
  • It is the provider responsability, so the code should be new PhysicalFileSystem("/foo/bar", ignoreCase: true).GetFileInfo("/FOO/BAR").
  • It is the caller responsability, so the code should be new PhysicalFileSystem("/foo/bar").GetFileInfo("/FOO/BAR", ignoreCase: true).

I think I prefer the third solution. But in any case it seems to be a problem common to all providers.

The function I propose seems to be specific to the Embedded provider because it is unable to split the directory and the file name. The PhysicalFileSystem has not this problem for instance.

I propose my help for a PR with the following design:

A new structure:

public struct SplitFilePath
{
    public string Directory { get; set; }
    public string File { get; set; }
}

Two new constructors:

EmbeddedFileProvider(Assembly assembly, Func<string, SplitFilePath> resourceNameSplitter)
EmbeddedFileProvider(Assembly assembly, string baseNamespace, Func<string, SplitFilePath> resourceNameSplitter) 

A new static method:

public static class ResourceNameSplitter
{
    public static Func<string, SplitFilePath> CreateFromFileExtensions(IEnumerable<string> fileExtensions);
}

Is it OK for you?

For Func<string, SplitFilePath>, is the input string supposed to be the embedded resource name? Since that doesn't change then this doesn't seem like the kind of operation to run per request. Alternatively would you build a big table in the constructor?

Yes, the string is the resource name.
Yes I will "cache" the result (there is already a comment in the current code!). I can build the cache in the constructor, or request by request. The first is easier to develop and faster to execute (no concurrency to manage). I propose to start with the cache built in constructor in a first time.
Another thing for the CreateFromFileExtensions(). It will make impossible to solve the case of a file name that start directly with the extension (like ".foo") Inside a directory ("A.B.foo" will be split into "A" and "B.foo", not "A.B" and ".foo"). But I think it is acceptable: the user can use the "full custom" function if it has this case to manage.
Last thing: if the "splitter" is null, I consider that the tree is flat (like the current behavior).

Sounds like it's worth a try. Send a PR and we'll see how it works.

@davidfowl please have a look at the proposed design above

I like the idea of a Function that can be set to provide path information for a given embedded resource name.
However, the resource name itself is not enough to get back the original directory / path information, otherwise we could already do that reliably and wouldn't need this Func.

Which basically means, this is passing the buck to the Func - and inside that Func what are people supposed to do?

  1. Have a hard coded mapping between resource names and paths (yuck for obvious reasons).
  2. Come up with some implementations, where you will always reliably get the path for any given embedded resource name. Yes please!

Now, how to implement number 2?
When resource names are produced, they loose path information. This means there is no way to reliably get back the path from just a resource name. So a solution only seems possible if we can find away to preserve the path for the file at the point it gets embedded as a resource (build time) AND we can get access to this path information for embedded resources, at runtime, so that we can do a lookup for the path by a resource name.

I proposed a mechanism for that see: #184

imho the framework should solve this once, and comprehensively. There should be no need for people to provide a custom Func to get stuff working!

Wouldn't fixing the copyToOutput transitivity bug in #2902 eliminate the need for this? I realize the issue is probably superseded by the move to msbuild, but it isn't clear how far off that is. A fix for that particular issue would provide a straightforward way to deal with static website assets and a number of other use cases.

@masaeedu it would alleviate one need to use the embedded file provider so it would certainly help. However this issue still remains. Its like saying all cars are broken but if we make the trains run on time who needs cars? How do you think the train drivers get to work? :-)

In the end I managed to solve this issue.
Steps were:

  1. Created an InMemoryFileProvider -here:
  2. I add embedded resources (themes etc) to the InMemoryFileProvider - giving them a proper directory / path. (test here)
  3. I include the InMemoryFileProvider in the composite, I don't use EmbeddedFileProvider.

Now GetDirectoryContents() works as expected, and I can take control over the subpaths for embedded files.

I'd just like to chip in and add that I've also been having trouble with this abstraction. There's a number of inconsistencies between the behavior of PhysicalFileProvider and EmbeddedFileProvider that breaks the abstraction.

Our use case is that we've implemented a modular system that allows content to be defined in your web project or in plugin packages that referenced via nuget. For some features of the application we need to be able to scan certain directories for things like templates so we can extract metadata, or for razor files we use for mail templates. Previous to asp.net core we did this by cobbling together a VirtualPathProvider that works with embedded resources. It wasn't perfect but it got the job done. I thought that this would be easier in asp.net core but I haven't found that to be the case. My (work in progress) solution is to wrap various classes and provide a new EmbeddedResourceProvider.

The major pain points were:

  • EmbeddedFileProvider: GetDirectoryContents() does not support sub-directories
  • EmbeddedFileProvider: When using GetDirectoryContents(string.Empty) , the Name property has the full embedded resource path rather than just the file name.
  • IFileProvider: The lack of a VirtualPath property is inconvenient and makes it more cumbersome to retrieve a child directory or save the path for later use.

The limitations of the embedded resource system do make this area difficult but the presence of a specific EmbeddedFileProvider makes it feel like this stuff should just work.

I've just hit this issue with route for area's where the area views are embedded in an external assembly; the area name was low case, but the folder in the assembly was CamelCase; the controller would fire but not find the views, if the same is performed with physical files everything works; please consider making this consistent

This issue was moved to aspnet/Home#2553