droundy/display-as

typed return value from format_as!

Opened this issue · 21 comments

Hi. I just found this crate. I have a need for something very like it. However, there is one design decision in display_as that I want done differently: format_as! returns a String. This doesn't seem right to me. In particular, the return value from format_as! can't be fed as a parameter to another format_as! because the type is wrong and it will be double-escaped.

I have a vague idea of how this might be improved. Below is a sketch of what I thought of. Please let me know what you think. If you like these ideas (or something like this) I can probably do the implementation.

I think this would be most easily done as part of the display_as crate, particularly because of the desire to reuse most of the format_as macro. The only thing that needs to be done there is to provide a version with the Is return type but I think that's probably best done within the macro rather than (say) as a proc_macro wrapping up the existing format_as!.

If you like the basic shape there are of course some bikesheds about names. I think Is needs to be short. The module name could be anything really. Maybe you have opinions about the fns in Is.

mod typed {

#[derive(Copy,Clone,Debug,Hash,Eq,PartialEq,Ord,PartialOrd)]
pub struct Is<F,T> { t: T, marker: PhantomData<T> };

// not real rust syntax but hopefully you get the idea
macro format_as!<F:Format>(f:F, template: &str, ...) -> Is<F,String>;

impl DisplayAs<F> for Is<F, String> { just copy literally }
impl DisplayAs<F> for Is<F, &str> { just copy literally }

impl<F,T> Is<F,T> {
  fn is(t: T) -> Is<F,T> { Is { t, marker: PhantomData::new() } }
  fn raw(&self) -> T { &self.0 }
  fn raw_mut(&mut self) -> T { &mut self.0 }
  fn into_raw(self) -> T { self.0 }
}

In the future there could be additional cases for impl DisplayAs.. for Is...

Sounds like it could be reasonable, but I'd like to hear more about your use case. I've found that in my own use of this crate, format_as! never actually gets used. Instead everything ends up being with_template. I end up only using format_as for the test suite.

I'd like to hear enough to see that you wouldn't be better off with a different use pattern entirely before giving you the okay to implement this.

If we go ahead with your proposal, I'd like to consider a big change, which would be instead of turning a pub struct, to return an impl Trait. I'm not sure that would be superior, but if possible it's nice to avoid making public wrapper types. (I did do so when the As type, which is a bit of a wart in my opinion.) Not sure what traits would be implemented, clearly DisplayAs, but probably also Into<String> and maybe AsRef<str>?

I'd we went with your struct approach I'd also want to use these standard traits rather than your ad hoc access methods.

But first I'd like to hear how you plan to use format_as.

Okay. The reason that I don't use format_as in that way is that rather than storing strings, I store whatever I need to generate strings (until I serve a page). With normal web serving, that is a great strategy because it retains flexibility to change the presentation easily. But I can see in a case with more dynamic and varied content (e.g. an error message field) you might have too many different sources of content, and want to just store the string after generating it wherever the error came from. So I'm at least able to imagine the use case. Also for constant strings, which have always been embedded in templates for me, but I can imagine wanting to hold separately.

So now the question is API. For Into and AsRef and even AsRefMut, I don't see a problem in terms of safety, in the sense of

impl<F> Into<T> for Is<F,T> { ... }

impl<F> AsRef<T> for Is<F,T> { ... }

Those traits are explicitly designed for accessing the thing wrapped in this sort of type.

My trait proposal was not to create a new trait for the user to define (not sure how that would work), but to use impl Trait to hide the concrete type that we are returning from format_as!. If we used the impl Trait approach, we could have something like:

fn format_one<F: ..., T: DisplayAs<F>>(data: &T) -> impl DisplayAs<F> + Into<String> + AsRef<str> {
  format_as!("{}", data)
}

In this case I'd probably leave off any mutable trait. But now I don't see how you'll store this result in your data structures. So it sounds like a non-option for your use case, even though it's appealing for mine. You could store this result in a local variable, but
not in a data structure, unless said data structure is generic on the string type.

So I think your original implementation with Into and AsRef would make most sense to me. Although I do wonder whether there is benefit in making it generic on the String type. I think I'd rather see:

#[derive(Copy,Clone,Debug,Hash,Eq,PartialEq,Ord,PartialOrd)]
pub struct StringAs<F> { inner: String, marker: PhantomData<F> };

Annoyingly we'd also need a serde feature and to derive serialize and deserialize for these. Did you have a use case for storing other data types than String in one of these? I suppose you might want to store a &'static str or even a &str. But display-as never generates them, so presumably that's where you were thinking it could as easily be defined outside this crate.

It almost feels like what you want is As, except for two issues. One is that As is always a reference (does not own its content), and the other is that it escapes strings, so it doesn't serve your need at all. I think it's worth pondering whether a change to As might be sufficient. Its implementation is hidden, so we could convert it to an enum that would allow for alternatively holding a pre-formatted String. The remaining challenge is the reference nature. Perhaps we could change from

impl<F: Format, T: DisplayAs<F> + ?Sized> Display for As<'a, F, T>

to

impl<F: Format, T: DisplayAs<F>, P: Borrow<T>> Display for As<F, T, P>

which might allow As to hold either references or owned values. Then we'd maybe just need a new constructor to make a formatted string:

fn formatted_as<P: Borrow<str>>(value: P) -> As<F, str, P> { ... }

And then we'd also need (to get the strings out) something like

impl<F,T,P> Into<P> As<F,T,P> { ... }

I think it's worth exploring if this can work. I can't quite see what the concrete enum type would look like, though.

impl Into for Is<F,T> { ... }

impl AsRef for Is<F,T> { ... }

Let me produce a concrete example for why this might be a problem:

struct Report {
  message: Is<Html, String>,
  ..
}

struct User {
  name: String,
  ...
}

// naive and junior maintenance programmer adding new feature
fn do_new_thing(user: &user, ...) -> Report {
  ... do the thing ...
  Report { message: format!("{} did the new thing", &user.name) }
}

The programmer ought to have used format_as! so that the name gets properly escaped. This doesn't compile because format! returns String which is not Is<Html,>. The right fix is to change to format_as! (my proposed typed::format_as!, that is). Type inference will select the right implementation and the escaping is automatic.

But our maintenance programmer hasn't figured out that format_as! exists and doesn't really know what Is is for. So they just write:

  Report { message: format!("{} did the new thing", &user.name).into() }

Hey presto! It compiles! Ship it! But it's an XSS vulnerability.

.into() is generally a thing you can just sprinkle in your code whenever you like, with at worst some slighty suboptimal performance. But declaring that some raw untagged un-escaped string is actually Html is not something you should do without thinking about it. This is much less of a hazard with Is-specific conversion functions.

With AsRef and Display, the hazard is double-escaping, which is less of a problem but still undesirable.

It seems to me that it would be best for the project to be able to decide how strict to be.

Hence my idea (prompted by your mention of traits) that format_as! would use a user-supplied type, with the minimal needed conversions in a specific trait implemented by the user. This also does away with problems like what to do about serde. The project gets to choose whether their Is types implement Serialize and if so whether they're serde(transparent). One project might even have several different Is types if they wanted to get super-fancy.

Okay, I think we're getting closer to a solid solution. I think if we want a new macro (or version thereof) to generate what you're looking for, it should be write_as!, which would be analogous to the write! macro in the standard library. This would also have applicability beyond strings, in formatting directly to a file or socket. It also meshes well with the internal implementation of this crate. The challenge, as I recall, is that there is a bit of complexity that I can't quite recall in making the same code with with strings and std::fmt:: Write.

Another option I could see would be to create an Is with standard conversions like I would prefer, and then let security uses like yourself wrap it in a newtype that has unsafe_from etc methods? It seems kind of silly to require users to duplicate functionality, even if it could be implemented in a separate crate. And now that I think about this a bit more, I wonder if your simple wrapper would make the most sense, but expressed as a separate macro than format_as!. And for your consideration types, include "unsafe" on their name and in the docs make clear why they're not safe (i.e. you can't guarantee that it is properly escaped "Format").

But I'm now thinking if you want the type to be a security boundary, then I'd not want to parameterize it on the string type.

Maybe StringAs<HTML>? And one could give it the same kind of safety properties that ensure that a String is valid utf8.

I suppose there could be a reference type also such as StrAs<HTML>? If there are only going to be the two types ever used, it feels weird to use generics to

Oops, finger slipped...

What is the advantage of Box<str> over String?

I don't much care for providing a complicated generic type when we only ever use one version of it, which I suppose is the argument for something like write_as, i.e. a way to write into some type that is created by the user. Oh, I see write_as is already implemented. So you could use that to fill in any String-like or File-like type... basically any time that you can use write! on.

So the issue is then specifically that it's hard to create a variable-argument macro that calls a variable-argument macro?

I've just implemented the simple version of this, with a FormattedString<F> type (which maybe should be called StringAs<F>. If you have another type you want to use instead, you can implement From<FormattedString<F>> for your own type, and insert .into() after the macro, which doesn't seem like such a huge pain.

I know this isn't what you're asking for, but I don't see a nice way to get what you're asking for, which is a fully generic format_as! macro that can construct any plausible type the users might want. It sounds like a recipe for making the library very hard to use.

Hi. Thanks for looking at this. I had a quick look at the docs from your current trunk and it seems like this is in the direction I was hoping for :-). I think I should try this out in my project to see what it is like to work with.

Do you have any preference between FormattedString<HTML> or StringAs<HTML> or StringFormattedAs<HTML>? I'm torn on the name, which I why I haven't published a release yet.

I spent a while playing with this. I found that I had to make my own html string type because I needed it to be Serialize, Deserialize, and #[serde(transparent)]. I was able to get this mostly working but I did have to write my own macro around format_as anyway, which more or less worked. The lack of an equivalent to str was awkward too. (I have found on a previous occasion that the obvious newtype wrapper struct MyStr(str) cannot be constructed without unsafe { transmute }.) I'm not sure I am going to keep the new code, though. :-/.

I'm not sure how this could be improved in display_as without allowing for user-defined types. Different users may well want different serializations?

Do you have any preference between FormattedString or StringAs or StringFormattedAs? I'm torn on the name, which I why I haven't published a release yet.

I think if display_as is going to define the type, FormattedString<HTML> is the best name.

I hope this is helpful. Thanks for your attention and work and sorry not to be more encouraging.

Adding serde support behind a feature flag is easy and would make sense.

I thought about a FormattedStr, but want sure of the use case. Can you share more about what you were wanting to do with a str version? The only scenario I could think of was for &'static string constants, which is kind of orthogonal to FormattedString.

Okay, I thought of one other use for FormattedStr, which is for zero copy parsing. Which is different in terms of required features.

Adding serde support behind a feature flag is easy and would make sense.

Well, yes. But should it be #[serde(transparent)] or something else? What if in some application someone wants to syntax check it on input? It all seems like display_as has to make a lot of assumptions. I guess if people want to do Funky Stuff they can make their own wrapper type and a macro wrapper...

I thought about a FormattedStr, but want sure of the use case. Can you share more about what you were wanting to do with a str version? The only scenario I could think of was for &'static string constants, which is kind of orthogonal to FormattedString.

FormattedStr is useful if you have anything in your program which generates a str which is in fact known to be HTML. At that point you want to be able to convert it to a FormattedStr. An obvious example is literals in the program text - although that's an even worse problem because you can't const initialise &MyStr (where struct MyStr(str)) so you need a third type struct MyStaticStrRef('static str).

Okay your FormattedStr uses don't involve FormattedString, so I think I can treat that as a separate feature to be added.

Regarding the representation on parsing, I think pick something that is useful at least sometimes, and users can use it or not. Users need not use the feature flag, and also need not include FormattedStrings in types that they serialize. I don't see how to generate a user specified type without a horrendous level of complexity.