lukesmurray/markdown-anki-decks

Only include referenced media

Closed this issue · 10 comments

When the input directory has more than one deck and media files that are not used by all decks, each deck will incorporate all media files. This makes them unnecessarily large. Please only add the media in each deck that is actually referenced.

Ahh, how bad is this for you? I'll have to think a bit to determine how we'll solve this. Are your decks unusable? We'll also have to be careful to preserve backwards compatibility.

Workaround is to make separate input directories, but this is indeed a problem when using only one input directory. How would this be a backwards compatibility issue? Media not used will not be missed.

I haven't had time to think about this yet, but here are my open questions.
If you can help answer these, that would be very helpful 🙏.
Also, feel free to open a PR exploring potential solutions.
Otherwise I'll try to resolve this as soon as possible.

  1. How do we detect referenced media files? For images, we may be able to use Beautiful Soup to identify image tags. But included sounds may be harder to identify since they don't map to HTML elements. We could always use regexes, but while that works in the short term, it creates issues because people may have code elements that include sound syntax. If possible, I would like to find a safe and straightforward approach for detecting included media files.
  2. How do we resolve this problem for people who have previously imported decks using markdown-anki-decks? Is it possible for us to delete media files included in previously generated decks? If we do delete previously included media files, are there edge cases we aren't thinking about where people could lose data?
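
To make question 1 concrete, here is a rough sketch of one possible approach, not what the project actually does. It walks the rendered card HTML with the standard library's `html.parser`, collects `<img>` sources, and only looks for Anki's `[sound:...]` syntax in text that is outside `<code>` and `<pre>` elements. `MediaRefParser` and `find_media_refs` are made-up names for illustration.

```python
import re
from html.parser import HTMLParser

# Anki's sound syntax looks like [sound:filename.mp3].
SOUND_RE = re.compile(r"\[sound:([^\]]+)\]")


class MediaRefParser(HTMLParser):
    """Hypothetical helper: collects media file names referenced in card HTML."""

    def __init__(self):
        super().__init__()
        self.refs = set()
        self._code_depth = 0  # > 0 while inside <code> or <pre>

    def handle_starttag(self, tag, attrs):
        if tag in ("code", "pre"):
            self._code_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.refs.add(src)

    def handle_endtag(self, tag):
        if tag in ("code", "pre") and self._code_depth > 0:
            self._code_depth -= 1

    def handle_data(self, data):
        # Ignore sound syntax that only appears inside code samples.
        if self._code_depth == 0:
            self.refs.update(SOUND_RE.findall(data))


def find_media_refs(html):
    """Return the set of media names referenced by one card's HTML."""
    parser = MediaRefParser()
    parser.feed(html)
    parser.close()
    return parser.refs
```

This still relies on a regex for sounds, but restricting it to non-code text avoids the false positives mentioned above.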

For me, it concerns only audio media. Whenever an audio include is processed, its file name could be added to a set, and that set could then be checked to decide which media files to include.
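
Something like the following is what I have in mind. It is only a sketch, since I don't know how the source is structured; `process_audio_include` and `select_deck_media` are hypothetical names, and I'm assuming media files are matched by their base file name.

```python
from pathlib import Path


def process_audio_include(filename, referenced):
    """Hypothetical hook: record the file while emitting Anki's sound tag."""
    referenced.add(filename)
    return f"[sound:{filename}]"


def select_deck_media(media_paths, referenced):
    """Keep only the media paths whose file name was referenced by this deck."""
    return [p for p in media_paths if Path(p).name in referenced]


# One set per deck, filled while that deck's markdown is processed.
referenced = set()
process_audio_include("hello.mp3", referenced)
deck_media = select_deck_media(["media/hello.mp3", "media/unused.mp3"], referenced)
```

With one set per deck, a file used by only one deck never ends up in the others.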

Apologies for the delay in solving this. As always, I am open to pull requests, but I will try to get to this when I can.

Hope you can fix this, as you are more comfortable with the source code. My files are getting very large at the moment so really looking forward to the implementation for this.

ok will take a stab today

Any update? My apkg files are each nearing 10 MB.

The new release only includes referenced media files. Please let me know if it works for you.

Yes it works, thanks!