kochkinaelena/branchLSTM

Tidy up depth analysis script

Closed this issue · 3 comments

depth_analysis.py is used to generate the results in Tables 3-5 of the paper.

Names of some files have changed since the script was prepared, and the output could do with some editing so that the format more closely resembles that of the paper.

Initial tasks were:

  1. Update filenames.
  2. Make output more closely resemble the format of tables in the paper.
  3. Remove hardcoded best trial number.
  4. Add further functionality to examine the hyperparameter optimisation in more detail.

These have been completed in the following commits:

  1. db94f78
  2. Table 3: 2d42dfb, 6be5109; Table 4: 8d765b4, edd717f; Table 5: abdcaf3
  3. 47767b4
  4. f8eecf0, 33c5f2e, cdc937f

And I've found another couple of things I'd like to edit before closing this issue:

  • Reuse functions from preprocessing where possible (tweet2branches(), file loading functions).
  • If the trials file isn't available, still output the top half of Table 3 (at the moment, a lot more detail is provided than is listed in the paper).

One other point to investigate: in the example I was testing, several sets of hyperparameters achieved the same optimal value. To do: check whether the "best" hyperparameter set is chosen consistently on different machines.
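
One way to make that choice reproducible would be an explicit tie-break. This is a minimal sketch, not what depth_analysis.py currently does. It assumes each trial has been reduced to a plain dict with hypothetical keys `tid`, `loss` and `params`; the parameter names and values are made up for illustration.

```python
def pick_best_trial(trials):
    """Return the trial with the lowest loss; ties go to the lowest trial id.

    Sorting on the (loss, tid) pair makes the result independent of the
    order in which the trials happen to be stored or loaded.
    """
    return min(trials, key=lambda t: (t["loss"], t["tid"]))


# Hypothetical trials: two of them share the same optimal loss.
trials = [
    {"tid": 2, "loss": 0.25, "params": {"units": 300}},
    {"tid": 0, "loss": 0.25, "params": {"units": 100}},
    {"tid": 1, "loss": 0.40, "params": {"units": 200}},
]

best = pick_best_trial(trials)
best_reversed = pick_best_trial(list(reversed(trials)))
```

With this rule, `best` and `best_reversed` refer to the same trial regardless of list order, which is the property that would need checking across machines.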

I've finished the other points I added to the previous comment, so I think this issue is ready to be closed after the pull request unless there are any other suggestions for improvements to the output.

I've made a couple of edits to preprocessing.py to combine the functionality of load_dataset() and the similar stage from the depth analysis script. tree2branches() is now called in the loading stage rather than when the tweets are processed later in the script.
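
For readers unfamiliar with the branch representation, the idea of splitting a conversation tree into branches can be sketched as below. This is an illustration only, not the repository's tree2branches() implementation. The function name `tree_to_branches` and the nested-dict input format (tweet ID mapped to its replies, with an empty dict or list for a leaf) are assumptions.

```python
def tree_to_branches(tree):
    """Return every root-to-leaf path in a nested dict of tweet IDs."""
    branches = []

    def walk(node_id, children, path):
        path = path + [node_id]
        if not children:  # leaf: an empty dict or empty list of replies
            branches.append(path)
            return
        for child_id, grandchildren in children.items():
            walk(child_id, grandchildren, path)

    for root_id, children in tree.items():
        walk(root_id, children, [])
    return branches


# Hypothetical conversation: tweet 1 has replies 2 and 3; tweet 2 has reply 4.
structure = {"1": {"2": {"4": {}}, "3": {}}}
branches = tree_to_branches(structure)
# branches == [['1', '2', '4'], ['1', '3']]
```

Doing this once at load time means every later stage can work with the same list of branches instead of rebuilding it from the tree.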

Closed by 7e0c80e.