w3c/epub-specs

"sideloading" and "untrustworthy" references are unclear given lack of authentication and integrity

Closed this issue · 7 comments

The reading system spec refers to "untrustworthy" EPUB files as files that are not signed and were sideloaded. But assurances of authenticity and integrity are known to be lacking for EPUB 3.3 generally; package-wide signatures are not possible.

I don't think we should suggest that "sideloaded" files (not defined in the spec, but I think the implication is files that weren't accessed through an integrated store) are likely to be less trustworthy. Indeed, books purchased through stores seem more likely to come with DRM, which has substantial known violations of user privacy. And signatures on individual files within an EPUB package are not a good signal to use in determining whether content is trustworthy or not.

It might be that this paragraph should just be removed until #2265 is addressed. But if reading systems should be recommended to prompt for access about untrusted content, that recommendation should apply to content from any source (perhaps on the first use of that source).

I am not sure if I agree, @npdoty. More exactly, you may be right for the privacy aspect, but the "trustworthiness" may be a security issue, too. "Sideloaded" files indeed refer to EPUB entries that nobody has necessarily checked, and may therefore include malicious scripts that would not pass through the integrated store (which may disallow scripts altogether). In this respect, sideloading does indeed require additional care.

I.e., I would prefer to keep that paragraph in the spec. It does not do any harm, but draws attention to an extra danger, which is the goal here.

Like @iherman, I think it's helpful to have the recommendation, but like @npdoty, when I look at how the paragraph is phrased, it's a bit ambiguous about what we mean. Since you can tie reading systems into other sources, like public libraries, what if we generalize the requirement like this:

Reading systems SHOULD treat content from any new source, including from an integrated bookstore, as insecure (e.g., prompt users to allow scripting and network access the first time the source is accessed). If a reading system allows users to load their own content (e.g., through the process of "sideloading"), each instance SHOULD be treated as insecure.
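To make the proposed wording concrete, here is a minimal sketch of how a reading system might apply it. Everything in it is an assumption for illustration: `TrustPolicy`, `prompt`, and the use of `None` to mark a sideloaded publication are hypothetical names, not anything defined by the spec or by an existing reading system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class TrustPolicy:
    """Hypothetical policy object; none of these names come from the EPUB spec."""

    # Asks the user about a source; returns True if scripting/network access is allowed.
    prompt: Callable[[str], bool]
    # Remembered answer per named source (assumption: one answer per source).
    decisions: Dict[str, bool] = field(default_factory=dict)

    def allow_active_content(self, source: Optional[str]) -> bool:
        # source=None models a sideloaded publication: each instance is
        # treated as insecure, so the user is asked every time.
        if source is None:
            return self.prompt("sideloaded publication")
        # Any named source, an integrated bookstore included, is treated as
        # insecure on first access only; the answer is reused afterwards.
        if source not in self.decisions:
            self.decisions[source] = self.prompt(source)
        return self.decisions[source]


# Usage: record which prompts the user would actually see.
asked = []

def always_allow(label: str) -> bool:
    asked.append(label)
    return True

policy = TrustPolicy(prompt=always_allow)
policy.allow_active_content("integrated-bookstore")  # prompts
policy.allow_active_content("integrated-bookstore")  # reuses the stored answer
policy.allow_active_content(None)                    # prompts
policy.allow_active_content(None)                    # prompts again
```

A reading system that accepts a blanket answer for sideloaded content could store that decision the same way it stores per-source answers; the sketch only shows the stricter per-instance reading of the second sentence.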

@mattgarrish I understand (and agree with) the direction, but I am not sure what "integrated bookstore" means. If I use the Apple Books app (which does allow what we call 'sideloading'), the Apple Bookstore is an integrated bookstore, right? But I would not expect the Apple Books app to prompt a user to allow anything for a book coming from its own bookstore (even when I buy my first Apple Books title...). (Maybe this falls into the SHOULD but not MUST interval?)

But maybe this is nitpicking. I am o.k. changing the text in this direction.

The Web model has tended not to standardize trust differences based on stores operated by a central party, but to allow for valuable content to be posted without any prior permission, and mitigate against the risk of dangerous content, from any source on the Web.

> But I would not expect the Apple Books app to prompt a user to allow anything for a book coming from its own bookstore

Right, but I don't believe the wording would require a separate prompt. When you first install the app it could be part of the user agreement that you accept scripting, network activity, certain data collection, and the other things we say users should be alerted to. I don't think making it a general requirement adds too much of a burden, and it is only a recommendation.

If the user then links an OverDrive account to the app, for example, it would be helpful to let them know that content may be insecure and ask whether to enable scripting by default.

We could probably throw out the last sentence, looking at it again. You could blanket accept that you want to allow scripting on all sideloaded content, too. We shouldn't be overly prescriptive and say every publication needs accepting. Some reading systems may choose to be that restrictive anyway.

I am o.k. with your points, @mattgarrish, and I would actually keep the explicit reference to sideloading, simply because people did ask us about it. So +1 to change the paragraph to what you proposed in #2542 (comment).

> The Web model has tended not to standardize trust differences based on stores operated by a central party, but to allow for valuable content to be posted without any prior permission, and mitigate against the risk of dangerous content, from any source on the Web.

@npdoty I do not think the proposal of @mattgarrish in #2542 (comment) contradicts that. That being said, sideloading is a very special thing in the EPUB world, which does not have any real counterpart in the browser Web, so I believe it is worth referring to it (again, the question did arise in discussions, hence the usefulness of having it there imho).

I will put @mattgarrish's proposal into a PR so that we can wordsmith in specific terms.