This repository contains the code and instructions for the tasks described in the Ganga project assignment for Google Summer of Code 2019.
The assignment consists of a Task Statement and a Memory Management Statement; each is discussed separately below.
The first task was to run a simple Hello World job in the Ganga shell; its output can be found in Ganga_Hello_World.ipynb. The notebook can be opened in Google Colab through the link at the top of the notebook.
In the next task, the given PDF file has to be split into individual pages. The Ganga job should then count the occurrences of the specified word in the PDF file, with the count for each page performed in a separate subjob. Finally, a merger needs to be written that takes the count from each subjob, adds up the values, and writes the total to a file.
For this, two helper files, execute.sh and adder.py, were written; they are explained below:
execute.sh contains the bash commands that convert each PDF page into a text file and count the occurrences of the specified word in that file.
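The counting step can be sketched in Python as follows. This is not the content of execute.sh (which uses bash): the function name `count_word` is hypothetical, the whole-word, case-sensitive matching rule is an assumption, and the word "the" in the usage line is only an example.

```python
import re


def count_word(text: str, word: str) -> int:
    """Count whole-word, case-sensitive occurrences of `word` in `text`."""
    # \b anchors stop "the" from matching inside "theory" or "other".
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return len(pattern.findall(text))


if __name__ == "__main__":
    page_text = "It is the end of the page; theory and other words are skipped."
    print(count_word(page_text, "the"))  # 2
```

In the Ganga setup, each subjob would print such a per-page count to its stdout, which the merger then reads.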
adder.py contains the merge function used with Ganga's CustomMerger; it adds up all the counts and writes the total to an output file.
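A minimal sketch of such an adder module follows; it is not the actual adder.py. To our understanding, Ganga's CustomMerger calls a function named `mergefiles(file_list, output_file)` in the supplied Python file, but treat that name, the `True` return value, and the assumption that each subjob's stdout holds a single integer as points to verify against the Ganga documentation.

```python
import os
import tempfile


def mergefiles(file_list, output_file):
    """Sum the integer counts found in the subjob output files.

    Assumes each file holds one integer (the count printed by one subjob);
    empty files are treated as a count of zero.
    """
    total = 0
    for path in file_list:
        with open(path) as handle:
            content = handle.read().strip()
        if content:
            total += int(content)
    with open(output_file, "w") as out:
        out.write(f"{total}\n")
    return True  # assumed success flag; check the convention CustomMerger expects


if __name__ == "__main__":
    # Simulate three subjob stdout files and merge them.
    with tempfile.TemporaryDirectory() as workdir:
        paths = []
        for index, count in enumerate([3, 5, 0]):
            path = os.path.join(workdir, f"stdout_{index}")
            with open(path, "w") as handle:
                handle.write(f"{count}\n")
            paths.append(path)
        merged = os.path.join(workdir, "stdout")
        mergefiles(paths, merged)
        with open(merged) as handle:
            print(handle.read().strip())  # 8
```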
The Ganga_File_Split.ipynb notebook contains the commands and code for:
- Installing and importing the required modules
- Downloading the required files
- Splitting the PDF file into single-page PDF files
- The commands to run in the Ganga shell
Note: I tried putting all the code in a single Python file, but during execution the merger failed because the job was still in the submitted state. Adding a time delay did not help, so the commands have to be entered manually in the Ganga shell.
The file stdout in the current directory then contains the required total.
For the Memory Management statement, four tasks were given; three of them were completed with all requirements fulfilled. The experiments are described below:
- There are two folders: Deep Copy and Shallow Copy.
- The Deep Copy folder contains two Python files:
  - deepcopy_delay-1.py performs the first task: it deep-copies the previously created simple objects and monitors the memory usage.
  - deep-release_reference-2.py performs the second task: it releases the references to the created objects one by one and observes the memory usage.
- The Shallow Copy folder contains one Python file:
  - shallow-release_reference-3.py performs the same tasks as in the deep-copy case, but with shallow copies.
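The deep-copy and release-reference experiments can be illustrated with a small, self-contained sketch using the standard `tracemalloc` module. It is not the code in the repository: `SimpleObject`, its payload size, and the number of objects are hypothetical, and `tracemalloc` only sees memory allocated through Python's allocator, not the total memory of the process.

```python
import copy
import gc
import tracemalloc


class SimpleObject:
    """Hypothetical stand-in for the simple objects created earlier."""

    def __init__(self, value):
        self.value = value
        self.payload = [value] * 1000  # list of 1000 references, roughly 8 KB


def traced_bytes():
    """Bytes currently traced by tracemalloc (Python-level allocations)."""
    gc.collect()  # drop unreachable garbage so the reading reflects live objects
    return tracemalloc.get_traced_memory()[0]


def deep_copy_all(objects):
    # Every SimpleObject and its payload list are duplicated.
    return copy.deepcopy(objects)


def shallow_copy_all(objects):
    # New SimpleObject instances that still share the original payload lists.
    return [copy.copy(obj) for obj in objects]


def run_experiment(copier, n=100):
    """Copy n objects, then release the copies one by one.

    Returns (bytes before copying, bytes after copying, list of readings
    taken after each release).
    """
    tracemalloc.start()
    try:
        originals = [SimpleObject(i) for i in range(n)]
        before = traced_bytes()
        copies = copier(originals)
        after = traced_bytes()
        readings = []
        while copies:
            copies.pop()  # drop the only remaining reference to one copy
            readings.append(traced_bytes())
        return before, after, readings
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    for name, copier in (("deep", deep_copy_all), ("shallow", shallow_copy_all)):
        before, after, readings = run_experiment(copier)
        print(f"{name}: +{after - before} bytes after copying, "
              f"{readings[-1] - before:+} bytes after releasing all copies")
```

With deep copies, traced memory should grow roughly in proportion to the number of duplicated payload lists and fall back as each copy is released; with shallow copies, only the new instances are allocated, because the payload lists are shared with the originals.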
I also looked into an algorithm that uses shallow copies to mimic a deep copy (as Ulrik described in the email), and came up with the idea described below:
A shallow copy creates a new object, but for the sub-objects inside it, it only holds references to those of the original object. This is illustrated below. To make a shallow copy mimic a deep copy, the sub-objects must be shallow-copied as well, level by level for nested structures.
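The idea can be sketched as a recursive function that applies `copy.copy` at every level of a nested structure. The name `shallow_deepcopy` is hypothetical and this is not necessarily the algorithm from the email; unlike `copy.deepcopy`, it keeps no memo dictionary, so it assumes there are no reference cycles and it duplicates any sub-object shared between two parents.

```python
import copy


def shallow_deepcopy(obj):
    """Mimic copy.deepcopy using only shallow copies, applied level by level.

    Handles lists, dicts, plain tuples and objects with a __dict__.
    Assumes no reference cycles (there is no memo, unlike copy.deepcopy).
    """
    if isinstance(obj, list):
        new = copy.copy(obj)  # new outer list that still holds the old items
        for i, item in enumerate(new):
            new[i] = shallow_deepcopy(item)  # swap each item for its own copy
        return new
    if isinstance(obj, dict):
        new = copy.copy(obj)
        for key, value in list(new.items()):
            new[key] = shallow_deepcopy(value)
        return new
    if type(obj) is tuple:
        # Tuples are immutable, so build a new one from copied items.
        return tuple(shallow_deepcopy(item) for item in obj)
    if hasattr(obj, "__dict__") and not isinstance(obj, type):
        new = copy.copy(obj)  # new instance sharing the old attribute values
        for name, value in list(vars(new).items()):
            setattr(new, name, shallow_deepcopy(value))
        return new
    return obj  # ints, strings and other immutables can be shared safely


class Node:
    """Small example class used in the demo below."""

    def __init__(self, values):
        self.values = values


if __name__ == "__main__":
    original = {"a": [1, [2, 3]], "b": Node([4, 5]), "c": (6, [7])}
    shallow = copy.copy(original)
    clone = shallow_deepcopy(original)
    print(shallow["a"] is original["a"])  # True: a plain shallow copy shares sub-objects
    print(clone["a"] is original["a"])    # False
    clone["a"][1].append(99)
    print(original["a"])                  # [1, [2, 3]]
```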
- Deep-Copy of Objects
- Release Reference - Deep Copy
- Shallow Copy