PyDarshan job summary cleanup

Question

PyDarshan job summary cleanup

shanedsnyder opened this issue 2 years ago · 4 comments

Answer 1 · 2023-02-28T16:56:36.000Z

There is some duplication with i.e., gh-804 here as well.

Answer 2 · 2023-02-28T20:19:58.000Z

I embarrassingly can't figure this one out:

Erroneous "type" row in File Count Summary table

Seems like it ought to be a one-liner to move the "type" index up to a column like other values, but I can't quite seem to pull it off.

Most of the other issues I've already addressed in #907

Answer 3 · 2023-02-28T20:53:01.000Z

It wasn't an error--it was an intentional index label. You can remove it, but then you'll need to shim the test suite as well (which is pretty good to do even for aesthetic changes anyway). This is pretty straightforward, but it wasn't a bug (it was intentional) so I'd prefer not using this approach:

--- a/darshan-util/pydarshan/darshan/lib/accum.py
+++ b/darshan-util/pydarshan/darshan/lib/accum.py
@@ -71,7 +71,6 @@ def log_file_count_summary_table(derived_metrics,
                              "read/write files":3}
     df = pd.DataFrame.from_dict(darshan_file_category, orient="index")
     df.rename(columns={0:"index"}, inplace=True)
-    df.index.rename('type', inplace=True)
     df["number of files"] = np.zeros(4, dtype=int)
     df["avg. size"] = np.zeros(4, dtype=str)
     df["max size"] = np.zeros(4, dtype=str)
diff --git a/darshan-util/pydarshan/darshan/tests/test_lib_accum.py b/darshan-util/pydarshan/darshan/tests/test_lib_accum.py
index 080264d1..3d23fb76 100644
--- a/darshan-util/pydarshan/darshan/tests/test_lib_accum.py
+++ b/darshan-util/pydarshan/darshan/tests/test_lib_accum.py
@@ -203,7 +203,6 @@ def test_file_count_summary_table(log_name,
                          "read-only files",
                          "write-only files",
                          "read/write files"]
-    expected_df.index.rename('type', inplace=True)
 
     log_path = get_log_path(log_name)
     with darshan.DarshanReport(log_path, read_all=True) as report:

I'd prefer if you didn't mutate the data structures for the sole purpose of removing a row in the report, and instead do something like this (a small test may be warranted) so that end users get what they want from to_html but no need to inconvenience developers who like index labels:

--- a/darshan-util/pydarshan/darshan/lib/accum.py
+++ b/darshan-util/pydarshan/darshan/lib/accum.py
@@ -100,5 +100,6 @@ def log_file_count_summary_table(derived_metrics,
     df.drop(columns="index", inplace=True)
     ret = plot_common_access_table.DarshanReportTable(df,
                                                       col_space=200,
-                                                      justify="center")
+                                                      justify="center",
+                                                      index_names=False)
     return ret

Answer 4 · 2023-05-31T20:58:40.000Z

Last bits of this closed via #927.