Code organization
Closed this issue · 5 comments
I was thinking about a better way to organize the growing main file and tests. It's getting bigger and bigger after each of your great contributions, which is great. But it's also getting harder and harder to read and follow up.
I was thinking about splitting each file in multiples one but I don't know yet the splitting logic. Do you have any idea ?
I agree that this is a must. Currently I'm not aware of any way to split the class function definitions into several files (I tried a basic test but that didn't work out). I've posted on Gitter but not gotten any replies, I'll perhaps try StackOverflow once I'm done with the categories.
My current thought is that we should move all helper functions to a utils-file, e.g. the entire utils
section, the _to_html
and my new __to_string
could probably be moved. There is also some double documentation at the moment - I'll post a new issue regarding this.
How to split a class into several files
I got a suggestion from @andreaskoepf to simply use the return value. Here's an example of an approach we could use:
main.lua
test = require("test")
assert(loadfile("test2.lua"))(test)
local a = test.new()
a:a()
a:b()
print("end")
The test.lua file
Here we create the class and return it to the main for passing onto the subsections:
test = torch.class('test')
function test:__init()
self.data = {}
end
function test:a()
print("a")
end
return test
The test2.lua
As we pass unnamed parameters to the file we need to extract the first and set the test class:
local params = {...}
local test = params[1]
function test:b()
print("b")
end
This gives the desired output:
$ th main.lua
a
b
end
Possible structure
I haven't thought that thoroughly about this but the main themes seem to be:
- Core
- Load data (csv + table)
- Save data
- Print to screen
- Print to itorch
- Subset data (head/tail/sub/where)
- Manipulate data
- Utilities
Another thing that may be worth considering is if we should imitate Pandas' Series. This would be nice but I'm not sure its worth the effort.
There was some additional tweaking necessary in the init.lua/rockspec for everything to work (see my misc-branch for how I've implemented this). I've summarized everything in a SO-question as this information was surprisingly hard to get by.
The current subfiles that I have are:
- load_data
- save_data
- load_batch
- missing_data
- output
- select_set_update
- statistics
I'm thinking of adding a categorical
file with the specific categorical functions. I've tried to keep main to core functions but we should probably split it a little further now that the infrastructure is in place.
Mind blown
I'm thinking of adding a categorical file with the specific categorical functions.
I agree, categorical
functions would be better in a specific file.
I've tried to keep main to core functions but we should probably split it a little further now that the infrastructure is in place.
Yes we should split again by creating sub-directories in Extensions
to keep it clean but for now it's great enough to try the new architecture you did ^^.
What you could do, is PR the new functionalities, then PR the new architecture (or vis versa). What do you think ?
Glad you like it. We will unfortunately have to do the file split last as that is how I've organized the branches. I'll add the next PR tonight