Ganga Project - CERN-HSF - GSoC 2019

This repository contains the code and instructions for executing the tasks described in the given file, as part of the Ganga Project assignment for Google Summer of Code 2019.

Required Modules

The assignment consisted of a Task Statement and a Memory Management Statement; both are discussed separately in detail below.

Task

The first task was to execute a simple Hello World job in the Ganga Shell; the output can be found in Ganga_Hello_World.ipynb. The Jupyter notebook can be opened in Colab using the link available at the top of the notebook.

In the next task, the given PDF file needs to be separated into individual pages. A Ganga job should then count the number of occurrences of the word "the" in the PDF file, with the count for each individual page performed in a subjob. Finally, a merger needs to be written which takes the count from each subjob, adds up the values and writes the sum to a file.

For this, two helper files, execute.sh and adder.py, were written and are explained below:

execute.sh

This file contains bash commands which convert an individual PDF page into a text file and count the number of occurrences of the word "the" in that file.
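The counting step can be illustrated in Python. This is only a sketch of the idea, not the contents of execute.sh: the function name `count_the` is hypothetical, and it assumes the page has already been converted to text and that matches are whole-word and case-insensitive, which may differ from what the bash commands actually do.

```python
import re


def count_the(text):
    """Count whole-word, case-insensitive occurrences of "the" in text.

    Assumption: "other", "bathe" and "there" should not be counted,
    so word boundaries are used around the pattern.
    """
    return len(re.findall(r"\bthe\b", text, flags=re.IGNORECASE))


sample = "The cat and the other cat bathe there."
print(count_the(sample))  # prints 2
```

Each subjob would produce one such number for its page.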

adder.py

This file contains a CustomMerger function which adds up all the counts and writes the sum to an output file.
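A minimal sketch of such a merge step is given below. It is not the actual adder.py: the function name `merge_counts` and its two parameters are hypothetical, and it assumes that each subjob output file holds a single integer.

```python
import os
import tempfile


def merge_counts(file_list, output_file):
    """Sum the integer stored in each input file and write the total.

    Assumption: every file in file_list contains exactly one integer.
    """
    total = 0
    for path in file_list:
        with open(path) as handle:
            total += int(handle.read().strip())
    with open(output_file, "w") as out:
        out.write(f"{total}\n")
    return total


# Demonstration with three made-up subjob outputs.
with tempfile.TemporaryDirectory() as workdir:
    paths = []
    for index, count in enumerate([3, 0, 7]):
        path = os.path.join(workdir, f"subjob_{index}.txt")
        with open(path, "w") as handle:
            handle.write(f"{count}\n")
        paths.append(path)
    merged = os.path.join(workdir, "merged.txt")
    print(merge_counts(paths, merged))  # prints 10
```

The real merger must follow whatever function name and return convention Ganga's CustomMerger expects, so the sketch should be adapted rather than used directly.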

The Ganga_File_Split.ipynb notebook contains the commands and code for:

- Installing and importing the needed modules
- Getting the required files
- Splitting the PDF file into individual PDF pages
- The commands to execute in the Ganga Shell

Note: I tried placing the code in a single Python file, but during execution the merger failed because the job was still in the submitted state. Adding a time delay did not help either. Hence, the commands need to be entered manually in the Ganga Shell.

The file stdout in the current directory will contain the needed sum.

Memory Management

For Memory Management, four tasks were given, of which three were completed with all the requirements fulfilled. The experiments performed are described below:

There are two folders: Deep Copy and Shallow Copy.
In the Deep Copy folder, there are two Python files:
- deepcopy_delay-1.py executes the first task of performing deep copies of the previous simple objects and monitors the memory usage.
- deep-release_reference-2.py executes the second task of releasing the references to the created objects one by one and observes the memory usage.
In the Shallow Copy folder, there is one Python file:
- shallow-release_reference-3.py executes the same tasks as in the deep-copy case but using shallow copy.
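The kind of experiment these scripts run can be sketched as follows. This is not the code from the three files: the helper name `measure_copies`, the test object and the use of the standard-library `tracemalloc` module are all assumptions made for illustration, and the actual scripts may monitor memory with a different tool.

```python
import copy
import tracemalloc


def measure_copies(n_copies=5, copier=copy.deepcopy):
    """Return traced memory (bytes above baseline) after each copy
    and after each released reference."""
    tracemalloc.start()
    # Hypothetical test object: a list of ten lists of floats.
    original = [[float(i) for i in range(10_000)] for _ in range(10)]
    baseline = tracemalloc.get_traced_memory()[0]

    copies = []
    after_copy = []
    for _ in range(n_copies):
        copies.append(copier(original))
        after_copy.append(tracemalloc.get_traced_memory()[0] - baseline)

    after_release = []
    while copies:
        copies.pop()  # drop the only reference to one copy
        after_release.append(tracemalloc.get_traced_memory()[0] - baseline)

    tracemalloc.stop()
    return after_copy, after_release


deep_copy, deep_release = measure_copies(copier=copy.deepcopy)
shallow_copy, shallow_release = measure_copies(copier=copy.copy)
print("deep copy, bytes after each copy:      ", deep_copy)
print("deep copy, bytes after each release:   ", deep_release)
print("shallow copy, bytes after each copy:   ", shallow_copy)
print("shallow copy, bytes after each release:", shallow_release)
```

With deep copies the traced memory should grow with every copy and fall again as references are released, while shallow copies only add a new outer list each time.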

Note

I looked into implementing the algorithm for using shallow copy to mimic deep copy (as described by Ulrik in the email), and arrived at the idea described below:

A shallow copy creates a new object, but for the sub-objects within it, it only holds references to those of the original object. This can be shown below. To make shallow copy mimic deep copy, we have to make shallow copies of the available sub-objects as well.
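The idea can be sketched in a few lines. This is an illustration only, not a finished implementation: the function name `deep_via_shallow` is hypothetical, only lists and dicts are handled, and everything else is assumed to be an immutable leaf that can safely be shared.

```python
import copy


def deep_via_shallow(obj, memo=None):
    """Mimic a deep copy by shallow-copying every nested list and dict.

    Assumption: any object that is not a list or dict is treated as an
    immutable leaf and shared between the original and the copy.
    """
    if memo is None:
        memo = {}  # maps id(original) -> clone, so cycles are handled
    if id(obj) in memo:
        return memo[id(obj)]
    if isinstance(obj, list):
        clone = copy.copy(obj)
        memo[id(obj)] = clone
        for index, item in enumerate(clone):
            clone[index] = deep_via_shallow(item, memo)
        return clone
    if isinstance(obj, dict):
        clone = copy.copy(obj)
        memo[id(obj)] = clone
        for key, value in list(clone.items()):
            clone[key] = deep_via_shallow(value, memo)
        return clone
    return obj


original = {"a": [1, 2, [3, 4]]}
shallow = copy.copy(original)
mimic = deep_via_shallow(original)

original["a"][2].append(5)
print(shallow["a"][2])  # prints [3, 4, 5]: the nested list is shared
print(mimic["a"][2])    # prints [3, 4]: the nested list was copied
```

Changing a nested list in the original shows up in the plain shallow copy but not in the recursive one, which is the behaviour expected of a deep copy.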

Results

Deep-Copy of Objects

Release Reference - Deep Copy

Shallow Copy