jgm/pandoc

Image alt tag and figcaption should be differentiated

matthewlehew opened this issue · 14 comments

As discussed in this thread on the discussion group, the behavior of captioning images is less than ideal for maintaining accessibility.

Currently, the bracketed text becomes both the alt tag and the figcaption:

![alt text](lalune.jpg) 

  -> 

<figure> 
<img src="lalune.jpg" alt="alt text" /><figcaption>alt text</figcaption> 
</figure> 

When wanting to create accessible documents, it is very often necessary to provide an alt tag that describes the image itself with a caption that provides extra explanatory content. Having them be the same doesn't align with their intended use.

Adding an additional descriptor creates a title tag, which is unhelpful in this instance:

![alt text](lalune.jpg "title") 

  -> 

<figure> 
<img src="lalune.jpg" title="title" alt="alt text" /><figcaption>alt text</figcaption> 
</figure> 

I propose amending the behavior so that the additional descriptor becomes the alt text and the bracketed text becomes the figure caption, with some other method used to derive the image title.

![caption text](lalune.jpg "alt text") 

  -> 

<figure> 
<img src="lalune.jpg" alt="alt text" /><figcaption>caption text</figcaption> 
</figure> 

The additional benefit is that this would no longer require the unofficial method of escaping a space to suppress a caption and have the text persist solely as an alt tag. Instead, you would simply write:

![](lalune.jpg "alt text") 

  -> 

<figure> 
<img src="lalune.jpg" alt="alt text" /> 
</figure> 
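To make the proposed mapping concrete, here is a minimal Python sketch. It is not pandoc's implementation, and the names `IMAGE_RE` and `render_proposed` are hypothetical. It assumes a paragraph containing a single image with an optional double-quoted descriptor, and it treats the bracketed text as the caption and the descriptor as the alt text.

```python
import re
from html import escape

# Hypothetical pattern for a standalone image: ![text](src "descriptor")
# Only handles simple cases: no nested brackets, no spaces in the source.
IMAGE_RE = re.compile(
    r'!\[(?P<text>[^\]]*)\]\((?P<src>\S+?)(?:\s+"(?P<descriptor>[^"]*)")?\)'
)

def render_proposed(markdown_image: str) -> str:
    """Render one image using the mapping proposed in this issue."""
    m = IMAGE_RE.fullmatch(markdown_image.strip())
    if m is None:
        raise ValueError("expected a single standalone image")
    caption = m.group("text")              # bracketed text -> figcaption
    alt = m.group("descriptor") or ""      # quoted descriptor -> alt
    src = m.group("src")
    parts = [
        "<figure>",
        f'<img src="{escape(src, quote=True)}" alt="{escape(alt, quote=True)}" />',
    ]
    if caption:                            # empty brackets suppress the caption
        parts.append(f"<figcaption>{escape(caption)}</figcaption>")
    parts.append("</figure>")
    return "\n".join(parts)

print(render_proposed('![caption text](lalune.jpg "alt text")'))
print(render_proposed('![](lalune.jpg "alt text")'))
```

The second call shows the empty-bracket case: the image keeps its alt text and no figcaption is emitted.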

I don't think changing the meaning/syntax of the input markdown is likely to happen. This would be a big break in backwards compatibility.

Having the title text or a caption attribute become the target caption seems more likely.

Fair enough! I am not a programmer so I only have a user’s perspective. I’ll accept whatever syntax is necessary to differentiate between caption and alt.

mb21 commented

Should this be closed in favour of #3177 ?

@mb21 I don't think so. I think you'd have to say more as to how it would fit...?

mb21 commented

It’s just that the current figure handling is somewhat of a hack and a proper figure element in the AST, i.e. #3177, would handle caption and alt separately.

@mb21 I agree. I missed #3177 when searching on this topic. It appears my issue is a subset of the issue raised there.

My only additional comment is that #3177 should be considered high priority. The current way Pandoc handles figures is a serious hindrance for accessibility in academic publishing.

I don't see it as a subset. #3177 could be done and the request of #4737 would be unresolved. If the present concern was added to #3177, that would be great.

I just added it to be sure, but my understanding is that creating an explicit figure object in AST implies that the caption would be separate from the alt text.

I think this bug should be left open. The syntax shouldn't be changed, but the documentation should deprecate the usage. Alt text and caption text should never be the same. This never makes sense. It means that blind readers hear the same sentence repeated twice. The documentation should explain that.

jgm commented

What I'm saying is that there are two separate problems:

  1. That you can't create separate alt and caption. This is bug 3177.

  2. That there exists a feature that sets the alt text and the caption to be the same.

That feature is a bug. It can't be removed, I know, but it can and should be deprecated. The documentation should explain why it was a mistake, and pandoc should output a warning when it is used.

Even after 3177 is closed and support is added for the separate figure element, the misfeature will need to be deprecated. (And there is no reason to wait IMO.)

mb21 commented
  2. That there exists a feature that sets the alt text and the caption to be the same.

Pandoc currently simply generates the caption from the alt text, and there is no place in pandoc's internal document representation to store the caption separately. So we cannot fix 2) without fixing 1) first.

However, we could change pandoc's HTML writer to only output a caption, and an empty alt attribute if a caption is output. That way, screen readers wouldn't read it twice.

echo '![foo](bar.png)' | pandoc

<figure>
  <img src="bar.png" alt="" />
  <figcaption>foo</figcaption>
</figure>
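
Until the writer itself changes, the same effect can be approximated by post-processing pandoc's HTML output. The following Python sketch is only an illustration under stated assumptions: the function name `blank_duplicate_alt` is hypothetical, figures are assumed to appear as plain `<figure>...</figure>` blocks as in the examples in this thread, and the caption is assumed to contain no inline markup. It empties the alt attribute only when it exactly duplicates the figcaption.

```python
import re

# Assumes bare <figure> tags and double-quoted attributes, as in the
# examples in this thread; real output may carry extra attributes.
FIGURE_RE = re.compile(r"<figure>.*?</figure>", re.DOTALL)
ALT_RE = re.compile(r'\salt="([^"]*)"')
CAPTION_RE = re.compile(r"<figcaption>(.*?)</figcaption>", re.DOTALL)

def blank_duplicate_alt(html: str) -> str:
    """Empty the alt attribute in figures whose alt repeats the caption."""
    def fix(match: re.Match) -> str:
        figure = match.group(0)
        alt = ALT_RE.search(figure)
        caption = CAPTION_RE.search(figure)
        if alt and caption and alt.group(1).strip() == caption.group(1).strip():
            # Remove only the attribute value, keeping alt="" in place.
            figure = figure[:alt.start(1)] + figure[alt.end(1):]
        return figure
    return FIGURE_RE.sub(fix, html)

before = '<figure>\n<img src="bar.png" alt="foo" /><figcaption>foo</figcaption>\n</figure>'
print(blank_duplicate_alt(before))
```

Figures whose alt text differs from the caption are left untouched, so a document that already supplies distinct alt text by other means is not affected.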

jgm commented

However, we could change pandoc's HTML writer to only output a caption, and no alt text if a caption is output. That way, screen readers wouldn't read it twice.

This would certainly be an improvement on current behavior. However, I still stress that with the current explosion of open educational resources in higher ed and the absolute necessity of accessibility, my personal belief is that it should be a top priority for Pandoc to be able to output separate alt text and captions without having to manually add them after conversion.