ciselab/CPS_repo_mining

UnicodeEncodeError when writing a commit message to the output file.

IvDinten opened this issue

Issue:

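Encoding error: a commit message cannot be written to the output file. According to the traceback, the file is encoded with cp1252, which has no mapping for the character U+FF1A (FULLWIDTH COLON) that appears in one of the commit messages.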

Project causing issues:

"Arduino": {"local": None, "remote": "https://github.com/esp8266/Arduino"}
Date fetched: Jun 21, 2021
Commit hash: b4774edbfb60a969eb89ec5ce8c8938bead4f829

Reproduce:

  1. Run the repository_commits_mining.py script with the argument l7 to select the local checkout of the Arduino project.
  2. The script crashes with the error shown in the terminal output below.
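The same exception can likely be reproduced without running the mining script. The sketch below assumes only what the traceback shows: the output file is encoded as cp1252 and the text contains U+FF1A. The commit message itself is made up for illustration, and `encoding="cp1252"` is passed explicitly so the sketch behaves the same on non-Windows machines.

```python
import os
import tempfile

# Hypothetical commit message; only U+FF1A (FULLWIDTH COLON) comes from the traceback.
message = "fix\uff1a example"

path = os.path.join(tempfile.mkdtemp(), "commits.txt")

# Explicit cp1252 mimics what open() falls back to on a Windows machine
# whose locale code page is cp1252 (the codec named in the traceback).
with open(path, "w", encoding="cp1252") as sourcefile:
    try:
        print(f"message: {message}", file=sourcefile)
        error = None
    except UnicodeEncodeError as exc:
        # Same exception type as in the issue: cp1252 has no byte for U+FF1A.
        error = exc

print(f"Reproduced: {error}")
```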

Terminal output:

C:\Users\Imara\PycharmProjects\CPS_repo_mining\env\Scripts\python.exe C:/Users/Imara/PycharmProjects/CPS_repo_mining/pd/repository_commits_mining.py l7
Input: ['l7']
Keywords: ['performance', 'memory', 'runtime', 'slow', 'slower', 'slowing', 'fast', 'faster', 'increase', 'decrease', 'memory-heap', 'memory-leak', 'bottleneck', 'overhead', 'deadlock', 'livelock', 'infinite', 'impasse', 'hang']
Traceback (most recent call last):
  File "C:\Users\Imara\PycharmProjects\CPS_repo_mining\pd\repository_commits_mining.py", line 238, in <module>
    main()
  File "C:\Users\Imara\PycharmProjects\CPS_repo_mining\pd\repository_commits_mining.py", line 223, in main
    dig(project, projects[project])
  File "C:\Users\Imara\PycharmProjects\CPS_repo_mining\pd\repository_commits_mining.py", line 126, in dig
    print_commit_header(commit)
  File "C:\Users\Imara\PycharmProjects\CPS_repo_mining\pd\repository_commits_mining.py", line 48, in print_commit_header
    print(f"\nhash: {commit.hash}\ndate: {commit.committer_date}\nmessage: {commit.msg}", file=sourcefile)
  File "C:\Users\Imara\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\uff1a' in position 1170: character maps to <undefined>

Process finished with exit code 1
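One possible fix, sketched under assumptions: the issue does not show how `sourcefile` is opened, so this assumes it is created with `open()` and no explicit encoding. Opening it with `encoding="utf-8"` lets any character in a commit message be written. `FakeCommit` is a hypothetical stand-in that models only the three attributes used on line 48, the hash and date values are placeholders, and here `print_commit_header` receives the file as a parameter (in the script it takes only `commit`) so the example is self-contained.

```python
import os
import tempfile
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FakeCommit:
    """Hypothetical stand-in for the script's commit objects."""
    hash: str
    committer_date: datetime
    msg: str


def print_commit_header(commit, sourcefile):
    # Same format string as line 48 of repository_commits_mining.py.
    print(f"\nhash: {commit.hash}\ndate: {commit.committer_date}\nmessage: {commit.msg}", file=sourcefile)


commit = FakeCommit(
    hash="0123abc",                        # placeholder, not a real commit
    committer_date=datetime(2021, 6, 21),  # placeholder date
    msg="example message with a fullwidth colon\uff1a",
)

path = os.path.join(tempfile.mkdtemp(), "commits.txt")

# Assumed fix: explicit UTF-8 instead of the platform default (cp1252 here).
with open(path, "w", encoding="utf-8") as sourcefile:
    print_commit_header(commit, sourcefile)

# Read the file back with the same encoding to check the message survived.
with open(path, encoding="utf-8") as f:
    content = f.read()
```

If the output file has to stay in cp1252, passing `errors="replace"` to `open()` instead would write `?` for unmappable characters rather than crash, at the cost of losing those characters.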
