TheAlgorithms/R

Duplicate files in documentation directory, only differing in case

pschneider1968 opened this issue · 9 comments

When cloning the repo to my Windows machine, I received the following warning message:

Cloning into 'Algorithms-in-R'...
remote: Enumerating objects: 984, done.
remote: Counting objects: 100% (984/984), done.
remote: Compressing objects: 100% (424/424), done.
remote: Total 984 (delta 532), reused 984 (delta 532), pack-reused 0
Receiving objects: 100% (984/984), 927.20 KiB | 27.27 MiB/s, done.
Resolving deltas: 100% (532/532), done.
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'documentation/ANN.md'
  'documentation/ann.md'
  'documentation/K_Folds.md'
  'documentation/k_folds.md'
  'documentation/kmeans_raw_R.md'
  'documentation/kmeans_raw_r.md'
  'documentation/KNN.md'
  'documentation/knn.md'
  'documentation/linearRegressionRawR.md'
  'documentation/linearregressionrawr.md'
  'documentation/SVM.md'
  'documentation/svm.md'

Peter@bolide MINGW64 /c/repos/Learn/Algorithms
$

So it seems to me that these files are duplicates that differ only in case. However, in each pair, one of the files has a newer commit than the other. So you might want to check whether those changes were applied to the right files, or correct the case of the more recent file and delete the older one.
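For illustration, collisions like the ones in the warning above can be found from a plain list of tracked paths, without needing a case-insensitive filesystem. This is a minimal sketch in Python, assuming the paths have already been collected (for example from the output of `git ls-files`). The function name `find_case_collisions` and the non-colliding sample file are made up for the example, and `casefold()` only approximates how a real filesystem compares names.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Group paths that are identical except for letter case."""
    groups = defaultdict(list)
    for path in paths:
        # casefold() serves as an approximate case-insensitive key
        groups[path.casefold()].append(path)
    # Keep only groups that contain more than one distinct spelling
    return [sorted(set(g)) for g in groups.values() if len(set(g)) > 1]


if __name__ == "__main__":
    tracked = [
        "documentation/ANN.md",
        "documentation/ann.md",
        "documentation/KNN.md",
        "documentation/knn.md",
        "documentation/some_other_file.md",  # hypothetical, no collision
    ]
    for group in find_case_collisions(tracked):
        print(group)
```

Each printed group is one set of paths that would collapse into a single file on a case-insensitive filesystem.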

These changes are mostly from #88 it seems.

Hi, thanks for reporting that! Would you like to make a PR to fix it, or should somebody else pick up this issue? I think it should be safe to just remove the older files.

Sure I can do that! What case would you prefer for the correct version of the affected files? Should they be in mixed/upper case, as they were before, or would you like them to be all lower case? The latter would be inconsistent with the remainder of the documentation directory, however...

I'm thinking of having them all lowercase. This doesn't seem to be too inconsistent, though. There are other files that are lowercase and don't have PascalCase counterparts. Maybe you could rename other files to match the naming scheme?

At first glance, I thought that all files in the documentation directory were just normal, manually written and edited documentation files.

But now I have dug a little deeper to understand the issue, and learnt that these files are autogenerated by a GitHub workflow. What specifically changed in #88 is that a new R script, .github/scripts/doc_builder.r, was added to do this job, and it does it differently than before, when this functionality was still in .github/workflows/documentation_workflow.yml

So I've now come to think that manually changing anything in the documentation directory is pointless. I guess that when a maintainer commits to the master branch and the GitHub workflows are triggered, all documentation files are regenerated.

However: this won't delete the old ones. Maybe this could be added to the workflow? I'm not sure about this, as I'm not very familiar with GH workflows...
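To picture the missing cleanup step: the output directory could be emptied before regeneration, so that nothing from an earlier run survives. The following is only a rough Python sketch of that idea, not the actual workflow. The helper name `reset_output_dir` is hypothetical, and the demo works on a temporary directory rather than the real documentation directory.

```python
import shutil
import tempfile
from pathlib import Path


def reset_output_dir(out_dir):
    """Delete out_dir with everything in it, then recreate it empty."""
    out_dir = Path(out_dir)
    # ignore_errors=True so that a missing directory is not a failure
    shutil.rmtree(out_dir, ignore_errors=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        docs = Path(tmp) / "documentation"
        docs.mkdir()
        # Simulate a leftover file from an earlier run
        (docs / "KNN.md").write_text("stale output\n")
        reset_output_dir(docs)
        print(sorted(p.name for p in docs.iterdir()))
```

The trade-off is that any file the generator no longer produces disappears, including files that were never generated in the first place.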

I've added removal of old files in #121, but it seems that not all of them are autogenerated (just based on the number of removed and added files). HTML files are not generated either. Unfortunately, I don't have time to dig in at the moment. Maybe you could check what's going on? Where do the HTML files come from, and is there any manually written documentation?

In the meantime, I dug in deeper, and on my fork I made a change similar to yours: in .github/workflows/documentation_workflow.yml I delete the documentation directory via rm -rf so that everything gets regenerated.

See my change here

Then, when the workflow runs, it turns out, as you also noticed, that a lot of files in the documentation directory are missing afterwards. This is because many of the R scripts fail to run properly. See the workflow output of my commit here, in the section Documentation compilation. There are lots of error messages there, and so no new output file for each affected script 🤔😔

So I think the R scripts ran properly at some point in the past, but over time some of them started to fail, for reasons beyond my knowledge. However, as the documentation directory was never deleted before, the old generated output files from earlier successful runs just piled up, and nobody noticed that the R scripts had started to have problems.
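A simple check could have surfaced this earlier: compare the list of source scripts with the output files a run actually produced, and report every script that has no output. Below is a minimal Python sketch of that comparison. It assumes that a source named foo.Rmd (or foo.R) is expected to yield foo.md, which matches the "output file" lines in the workflow log but is otherwise a guess. The function name `missing_outputs` is made up for the example.

```python
from pathlib import PurePosixPath


def missing_outputs(source_paths, output_names):
    """Return the sources whose expected .md output is absent."""
    produced = set(output_names)
    missing = []
    for src in source_paths:
        # Assumption: foo.R or foo.Rmd is expected to yield foo.md
        expected = PurePosixPath(src).stem + ".md"
        if expected not in produced:
            missing.append(src)
    return missing


if __name__ == "__main__":
    # Illustrative input only, not the real repository listing
    sources = [
        "classification_algorithms/decision_tree.Rmd",
        "regression_algorithms/ann.Rmd",
    ]
    outputs = ["ann.md"]
    for src in missing_outputs(sources, outputs):
        print("no output for", src)
```

A workflow step could fail the build whenever this list is non-empty, instead of letting old output files hide the problem.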

So I think the reasons for the errors should be investigated and fixed. However, as I am just starting to learn R, this is far beyond my knowledge.

With the original problem in my issue, the clashing file names, I think I just scratched the surface of a deeper and more serious problem. One of the core maintainers should look into this and fix it.

I think I might have fixed it. In #121 all but 2 files compiled successfully. At least it's not a systemic error now. Please let me know if it looks good to you.

I tried your changes in my fork, and can confirm that all but 2 scripts now run successfully and generate output files in the documentation directory.

The two remaining errors I see are these:


processing file: ../classification_algorithms/decision_tree.Rmd
1/3                  
2/3 [unnamed-chunk-1]
3/3                  
output file: decision_tree.md
Error compiling: Error in parse(text = x, keep.source = TRUE): <text>:59:45: unexpected symbol
58: 
59: x <- data[,!(names(data) %in% drop_columns)]y
                                                ^

processing file: ../regression_algorithms/ann.Rmd
1/3                  
2/3 [unnamed-chunk-1]
3/3                  
output file: ann.md
Error compiling: Error in parse(text = x, keep.source = TRUE): <text>:59:45: unexpected symbol
58: 
59: x <- data[,!(names(data) %in% drop_columns)]y
                                                ^

But I have no idea what causes them or how to fix them 😔

But anyway, nice to see some progress!

Now that #121 has been merged, I think this issue can be closed.