(src-js) Markdown support
omarhurani opened this issue · 2 comments
Is markdown support (doc.markdown()) planned to come any time soon?
Thanks for asking, it certainly helps!
I had some drafts towards this, but took them out to accelerate #172 over the line... Realistically I think getting it implemented would still need quite a bit of work.
The biggest challenge I saw with Markdown is tables:
- As far as I'm aware (and agreeing with this markdownguide.org page), standard Markdown tables are missing important features like merged cells and multi-line text content which can easily come up in Textract results.
- The typical ways of dealing with this (e.g. here) use HTML to some extent or other... So it might be that an implementation ends up inserting HTML into the result quite often anyway.
- The format is quite open-ended (e.g. how much do we prioritize visual appearance of the MD table versus reducing line width or output token count?)
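To make the table trade-off above concrete, here is a minimal sketch (not the TRP.js API, and not a planned implementation) of one possible fallback strategy: emit a pipe table when every cell is simple, flatten multi-line text with `<br>`, and switch the whole table to HTML as soon as a merged cell appears. The `Cell` shape and all function names are hypothetical, and the pipe-table output assumes a GFM-style dialect that accepts `\|` escapes and inline `<br>`.

```typescript
// Hypothetical cell shape for illustration only; not the TRP.js data model.
interface Cell {
  text: string;
  rowSpan?: number;
  colSpan?: number;
}

/** Escape pipes and flatten newlines so the text fits on one pipe-table row. */
function escapeCell(text: string): string {
  return text.replace(/\|/g, "\\|").replace(/\r?\n/g, "<br>");
}

/** Minimal HTML escaping for cell text. */
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

/** True if any cell spans several rows or columns (pipe tables cannot express this). */
function hasMergedCells(rows: Cell[][]): boolean {
  return rows.some((row) =>
    row.some((c) => (c.rowSpan ?? 1) > 1 || (c.colSpan ?? 1) > 1)
  );
}

/** Fallback renderer: a plain HTML table that keeps rowspan/colspan. */
function renderHtmlTable(rows: Cell[][]): string {
  const trs = rows.map((row) => {
    const tds = row.map((c) => {
      const attrs =
        ((c.rowSpan ?? 1) > 1 ? ` rowspan="${c.rowSpan}"` : "") +
        ((c.colSpan ?? 1) > 1 ? ` colspan="${c.colSpan}"` : "");
      return `<td${attrs}>${escapeHtml(c.text).replace(/\r?\n/g, "<br>")}</td>`;
    });
    return `<tr>${tds.join("")}</tr>`;
  });
  return `<table>\n${trs.join("\n")}\n</table>`;
}

/** Render a pipe table when possible, otherwise fall back to HTML. */
function renderTable(rows: Cell[][]): string {
  if (rows.length === 0) return "";
  if (hasMergedCells(rows)) return renderHtmlTable(rows);
  const [header, ...body] = rows;
  const line = (cells: Cell[]) =>
    `| ${cells.map((c) => escapeCell(c.text)).join(" | ")} |`;
  const separator = `| ${header.map(() => "---").join(" | ")} |`;
  return [line(header), separator, ...body.map(line)].join("\n");
}
```

Even in this small sketch HTML leaks into the "Markdown" output in two places (`<br>` and the full-table fallback), which illustrates the concern raised in the second bullet.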
I'm sure there are libraries out there in the NPM ecosystem that could make this easier for us than building from scratch, but today TRP.js supports usage via simple HTML <script> tags (via IIFE), for the sake of custom human review task UIs in SageMaker Ground Truth and Augmented AI. In those contexts there's no one/obvious module management system - so as far as I understand, taking on dependencies would mean either enforcing some assumptions, or bundling dependencies into the TRP.js IIFE itself and figuring out what we'd need to do to ensure the bundled packages' IP was respected 😖
So the result/tldr is:
- Probably not for at least a month or two, as things are at the moment
- If anybody has feedback on what you'd like to use .markdown() for which could help inform the trade-offs, it'd be super useful to hear. For e.g:
  - Just feeding the result to LLMs (in which case I guess compact markdown tables that don't waste too many ---------s would be best)
    - Any data points that md should perform better than HTML for LLM use-cases would also be really interesting to collect
  - Optimizing for human-readable .md files (in which case maybe aligning tables is more important)
  - Feeding the result to any specific downstream markdown viewers or other tools? (which could affect what dialect(s) would be best)
- If anybody would like to suggest tried-and-tested external libraries to help with the markdown rendering it'd be useful to hear, but rolling them in to TRP might not be straightforward 😞
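As a rough illustration of the compact-versus-aligned trade-off mentioned above, the sketch below renders the same rows both ways. It is only an assumption about what such an option could look like: `renderMarkdownTable` and the `TableStyle` values are hypothetical names, not part of TRP.js, and cell escaping is left out to keep the example short.

```typescript
// Hypothetical option names for illustration only.
type TableStyle = "compact" | "aligned";

/** Pad a string with trailing spaces up to the given width. */
function padRight(text: string, width: number): string {
  return text + " ".repeat(Math.max(0, width - text.length));
}

/**
 * Render rows of plain strings as a pipe table; the first row is the header.
 * "compact" minimizes characters (useful when output size or token count matters).
 * "aligned" pads columns so the raw .md text is easier for people to read.
 * Widths use string length, which is only an approximation of display width.
 */
function renderMarkdownTable(rows: string[][], style: TableStyle): string {
  if (rows.length === 0) return "";
  const columnCount = Math.max(...rows.map((r) => r.length));
  const columns = Array.from({ length: columnCount }, (_, col) => col);

  if (style === "compact") {
    const line = (cells: string[]) =>
      `|${columns.map((col) => cells[col] ?? "").join("|")}|`;
    const delimiter = `|${columns.map(() => "---").join("|")}|`;
    const [header, ...body] = rows;
    return [line(header), delimiter, ...body.map(line)].join("\n");
  }

  // Aligned: every column is as wide as its widest cell (at least 3 for "---").
  const widths = columns.map((col) =>
    Math.max(3, ...rows.map((r) => (r[col] ?? "").length))
  );
  const line = (cells: string[]) =>
    `| ${columns.map((col) => padRight(cells[col] ?? "", widths[col])).join(" | ")} |`;
  const delimiter = `| ${widths.map((w) => "-".repeat(w)).join(" | ")} |`;
  const [header, ...body] = rows;
  return [line(header), delimiter, ...body.map(line)].join("\n");
}

const sampleRows = [
  ["Key", "Value"],
  ["Invoice number", "42"],
];
console.log(renderMarkdownTable(sampleRows, "compact"));
console.log(renderMarkdownTable(sampleRows, "aligned"));
```

The aligned form costs extra padding and hyphens on every row, which is the overhead the LLM-oriented use case would probably want to avoid.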
Any updates on this? I'm currently stuck due to the lack of this feature. I might end up having a Python script run in the background just for this, but that's not ideal.
More interested in this for the tables specifically