machine-learning-exchange/mlx

`make check_doc_links` fails with UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 522: invalid start byte

ckadner opened this issue · 0 comments

Describe the bug

There are two problems when running `make check_doc_links`:

  1. Too many third-party Markdown files from project dependencies are scanned for hyperlinks, which makes the check take far too long.
  2. If any of the Markdown files is not UTF-8 encoded, the check errors out.
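One possible direction for both problems, as a minimal sketch only: prune dependency folders while walking the tree, and fall back to a lossy decode when a file is not valid UTF-8. The names `EXCLUDED_DIRS`, `find_md_files`, and `read_md_text` are hypothetical and are not taken from `verify_doc_links.py`; the list of excluded folders is likewise an assumption.

```python
import os

# Hypothetical exclusion list; adjust to the folders the project actually generates.
EXCLUDED_DIRS = {"node_modules", "venv", ".venv", ".git"}


def find_md_files(root="."):
    """Return sorted paths of *.md files under root, skipping excluded folders."""
    md_files = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them,
        # so third-party trees are never traversed at all.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            if name.endswith(".md"):
                md_files.append(os.path.join(dirpath, name))
    return sorted(md_files)


def read_md_text(path):
    """Read a Markdown file as text without raising on non-UTF-8 bytes."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Undecodable bytes become U+FFFD; URLs are normally ASCII and survive.
        return raw.decode("utf-8", errors="replace")
```

Pruning during the walk, rather than filtering a `**/*.md` glob result afterwards, avoids spending time inside `node_modules` in the first place.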

To Reproduce

Build the MLX UI (see `dashboard/origin-mlx/README.md`), which creates the `node_modules` folder. Then, at the project root, run `make check_doc_links`.

[mlx] $ make check_doc_links

Checking for Markdown files here:

  **/*.md
  bootstrapper/catalog_upload.json

Traceback (most recent call last):
  File "/Users/ckadner/PycharmProjects/mlx_ckadner/tools/python/verify_doc_links.py", line 188, in <module>
    verify_doc_links()
  File "/Users/ckadner/PycharmProjects/mlx_ckadner/tools/python/verify_doc_links.py", line 151, in verify_doc_links
    file_line_text_url = [
  File "/Users/ckadner/PycharmProjects/mlx_ckadner/tools/python/verify_doc_links.py", line 154, in <listcomp>
    for (line, text, url) in get_links_from_md_file(file)
  File "/Users/ckadner/PycharmProjects/mlx_ckadner/tools/python/verify_doc_links.py", line 52, in get_links_from_md_file
    md_file_content = f.read()
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 522: invalid start byte
make: *** [check_doc_links] Error 1
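For context on the error itself: byte 0x96 can never start a UTF-8 sequence, while in Windows-1252 it encodes an en dash. That the offending file is Windows-1252 encoded is only an assumption here; the traceback does not name the file. The sample bytes below are made up for illustration.

```python
# Made-up sample: "pages 1", then byte 0x96, then "2".
raw = b"pages 1\x962"

try:
    raw.decode("utf-8")
    decoded_as_utf8 = True
    reason = None
except UnicodeDecodeError as err:
    decoded_as_utf8 = False
    reason = err.reason  # same message as in the traceback above

# Decoding the same bytes as Windows-1252 succeeds and yields an en dash.
as_cp1252 = raw.decode("cp1252")
```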