Repository for a university-based project on galaxy clustering, for which the code was developed by Pawel Janas with the help of Prof. Peter Coles at Maynooth University, Ireland, 2022.

Welcome to `janas-gcluster`! The programs found in this repository simulate galaxy distributions in a simple cubic geometry and run specific analysis algorithms on them, such as the two-point correlation function. I created this repository to hold the code and files for a project I completed with Maynooth University as part of their SPUR program. In this README, I outline the basics of using my code, specific function/variable names, and ways to improve my code for future use.
This code is not a finished product and should not be treated as such. Bugs and glitches may occur and minor mistakes may exist in the algorithms. Misuse of the program (e.g. allowing it to run for 50,000 Monte Carlo runs) may crash your machine.
This program runs on Python 3.8 and above (not tested with earlier versions). It requires the following packages (plus the standard-library `math` module):

- `numpy`
- `matplotlib`
- `pandas`
- `scipy`
The programs were developed using the Spyder IDE.
To 'install' this repository, simply download the repository or just the program files and place them in your current working directory (mine was a specific folder named `Programs`, from which all Python programs ran). If you require an IDE or a custom working environment, I highly recommend installing through Anaconda. In the future I might add a `setup.py` file to allow easy installation with pip, but for now please use the crude installation as listed.
Feel free to use the simulated data created using `power_law_model.py`, `pairwise_cluster_model`, etc., or use your own. NB: only 3-column x, y, z data is compatible with the programs, and each `.txt` data file follows this format:

```
x y z    (headings to be read in separately)
(data lies here)
```
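For instance, such a file can be read back in with `pandas` (a minimal sketch; the file name here is illustrative):

```python
import pandas as pd

# Read a 3-column x, y, z catalogue: whitespace-separated values with a
# single "x y z" header line, as described above.
table = pd.read_csv("power_law_data.txt", sep=r"\s+")
coords = table[["x", "y", "z"]].to_numpy()   # (N, 3) array of positions
```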
`funcs.py` is simply a placeholder for all the main functions used in the analysis of cluster patterns; each main function has its own documentation string.
In the code, `xi_r_calc` has two definitions, one basic and one 'alt'. The basic one uses the Peebles & Hauser method of calculating the correlation function, xi(r) = n_DD/n_RR - 1. The 'alt' method uses the slightly more complex Hamilton estimator, xi(r) = n_DD * n_RR / n_DR^2 - 1. Here n_DD, n_RR, etc. are the binned frequency values of pairwise-distance catalogues. For example, n_DD is the 'data-data' set, where the pairwise distances are calculated within the data set, and n_DR is the 'data-random' set of cross-correlated distances.
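As a rough illustration (not the repository's exact `xi_r_calc` code), both estimators can be computed from raw binned pair counts like so, assuming equal-sized data and random catalogues so the counts compare directly, as in the formulas above:

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

def xi_estimators(data, random_cat, bins):
    """Peebles & Hauser and Hamilton estimates of xi(r) from raw
    binned pair counts, following the formulas above."""
    n_dd, _ = np.histogram(pdist(data), bins=bins)              # data-data
    n_rr, _ = np.histogram(pdist(random_cat), bins=bins)        # random-random
    n_dr, _ = np.histogram(cdist(data, random_cat), bins=bins)  # data-random

    xi_basic = n_dd / n_rr - 1            # Peebles & Hauser
    xi_alt = n_dd * n_rr / n_dr**2 - 1    # Hamilton
    return xi_basic, xi_alt
```

With the catalogue read in as above, `xi_estimators(coords, random_cat, bins)` returns both estimates at once.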
`cluster_analysis.py` is the recommended first program to use, as it gives the user a good overview of what the key programs do. The user decides which functions to run and which data files to examine. The random catalogue for the n_RR and n_DR calculations is simulated at each runtime (this is not the case for `monte_carlo_sim.py`). Note: the `factor` variable in the code is hard-coded and needs to be changed before running the program if desired.
`monte_carlo_sim.py` is the heart of the `janas-gcluster` API. This program applies similar analysis methods to `cluster_analysis.py` but implements a Monte Carlo approach to calculating xi(r), done to see whether any trend arises in xi(r). See the pictures below for how the two-point correlation function xi(r) changes with increasing MC run count for two random catalogues.
The variables in this program, such as `MCRUNS`, `file_name`, and `Nbins`, need to be changed via the editor, so please make sure to check them. You can plot xi(r) at powers of 10, but I don't recommend this for higher run counts as it eats up memory, so comment that block out if necessary.
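For orientation, the Monte Carlo loop boils down to something like the sketch below, which averages xi(r) over freshly generated random catalogues. `MCRUNS` and `Nbins` follow the variables above; `xi_estimators`, `coords`, and the box size are illustrative carry-overs from the earlier sketches, not the program's exact code.

```python
import numpy as np

MCRUNS, Nbins, box_size = 100, 20, 100.0    # illustrative values
bins = np.linspace(0.0, box_size, Nbins + 1)

xi_sum = np.zeros(Nbins)
for run in range(MCRUNS):
    # Fresh random catalogue every run, matching the data catalogue's size.
    random_cat = np.random.uniform(0.0, box_size, size=coords.shape)
    xi_basic, _ = xi_estimators(coords, random_cat, bins)
    xi_sum += xi_basic

xi_mean = xi_sum / MCRUNS                   # averaged xi(r) over all runs
```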
This program is mainly deprecated and was used for early testing, but it has one capability that the other programs lack: you can define minimum and maximum bounds for the total galaxy number N and run analysis tools similar to those in `cluster_analysis.py`. The user defines these bounds and also chooses how many steps to take from `nmin` to `nmax`. It also calculates the mean and variance of xi(r) for increasing N. If using Spyder (or another IDE with a variable explorer), the variables will be stored for checking. `galaxy_count` needs to be edited, as the arrays generated are too big. Note: this program contains a lot of functions that are not in `funcs.py`, and there may be some duplicate-like appearances, so be careful.
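In outline, the N-sweep amounts to something like this (a sketch under the same illustrative assumptions as the earlier snippets, reusing the hypothetical `xi_estimators` helper; `nmin`, `nmax`, and the step count follow the variables above):

```python
import numpy as np

nmin, nmax, steps = 100, 1000, 10             # illustrative bounds
box_size = 100.0
bins = np.linspace(0.0, box_size, 21)         # radial bin edges

for N in np.linspace(nmin, nmax, steps).astype(int):
    data_n = np.random.uniform(0.0, box_size, size=(N, 3))
    random_cat = np.random.uniform(0.0, box_size, size=(N, 3))
    xi_basic, _ = xi_estimators(data_n, random_cat, bins)
    print(N, xi_basic.mean(), xi_basic.var())  # mean and variance of xi(r)
```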
The following programs were used to simulate 3D data where each point represents a galaxy. Please uncomment the `file_write` lines to save a custom data file (make sure to choose a name for the file).
`power_law_model.py` simulates clusters around 10% of the initial random catalogue with a density profile proportional to r^-1.8. This was done by creating a new probability density function (pdf) and then sampling r values from it in the cluster-generation algorithm, achieved by subclassing `scipy`'s continuous-distribution class for a custom pdf. The direction in which to place each galaxy is also chosen at random from 3 normal distributions (`sample_spherical`) to avoid corner densities at the poles of each cluster.
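In outline, those two ingredients might look like the following sketch. `sample_spherical` is named in the code, but the class name, radii, and sample sizes here are assumptions:

```python
import numpy as np
from scipy import stats

class PowerLawPDF(stats.rv_continuous):
    """Radial profile with pdf proportional to r^-1.8 on [a, b]."""
    def _pdf(self, r):
        norm = (self.a ** -0.8 - self.b ** -0.8) / 0.8   # normalising constant
        return r ** -1.8 / norm

def sample_spherical(npoints):
    """Isotropic unit vectors built from 3 normal distributions,
    avoiding the pole/corner densities of naive angle sampling."""
    vec = np.random.normal(size=(3, npoints))
    vec /= np.linalg.norm(vec, axis=0)
    return vec

profile = PowerLawPDF(a=0.1, b=10.0)       # illustrative inner/outer radii
r = profile.rvs(size=50)                   # radial distances for one cluster
offsets = (sample_spherical(50) * r).T     # (50, 3) offsets about a centre
```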
`pairwise_cluster_model` gives 10% of the total catalogue a 'pair' at a fixed distance r. This data set takes many Monte Carlo runs to reach a steady state of the correlation function xi(r). I would recommend changing the variable `N` to place more points (galaxies) at the fixed distance r for each cluster.
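A minimal sketch of the pairing step, assuming the `sample_spherical` helper above (all names here are illustrative except `N`):

```python
import numpy as np

r_fixed, N = 2.0, 1                        # pair separation; partners per galaxy
coords = np.random.uniform(0.0, 100.0, size=(1000, 3))  # illustrative catalogue

idx = np.random.choice(len(coords), size=len(coords) // 10, replace=False)
partners = []
for point in coords[idx]:                  # 10% of the catalogue gets a 'pair'
    for _ in range(N):
        direction = sample_spherical(1)[:, 0]        # random unit vector
        partners.append(point + r_fixed * direction)

catalogue = np.vstack([coords, np.array(partners)])   # originals + partners
```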
The final simulation program is the same as `power_law_model`, except that the r values are chosen at random instead of from a custom density function as above.