/Photometric_Redshift

Estimate the Redshift of galaxies using Decision Tree model

Primary LanguageJupyter Notebook

Photometric_Redshift

Estimate the Redshift of galaxies using Decision Tree model. The redshift of galaxies can be predicted by using the light intensity values emitted by them in the following wavelength ranges :

  1. Ultraviolet (u)
  2. Green (g)
  3. Red (r)
  4. Near Infrared (i)
  5. Infrared (z)

A decision tree regressor is trained on the difference of the intensity values through each filter , to obtain a decent estimate of the Redshift.

The decision tree can be linearized into decision rules, where the outcome is the contents of the leaf node, and the conditions along the path form a conjunction in the if clause. In general, the rules have the form:

Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal, but are also a popular tool in machine learning.

if condition1 and condition2 and condition3 then outcome. Decision rules can be generated by constructing association rules with the target variable on the right. They can also denote temporal or causal relations.

Decision trees:

Are simple to understand and interpret. People are able to understand decision tree models after a brief explanation. Have value even with little hard data. Important insights can be generated based on experts describing a situation (its alternatives, probabilities, and costs) and their preferences for outcomes. Help determine worst, best and expected values for different scenarios. Use a white box model. If a given result is provided by a model. Can be combined with other decision techniques.

Disadvantages of decision trees:

They are unstable, meaning that a small change in the data can lead to a large change in the structure of the optimal decision tree. They are often relatively inaccurate. Many other predictors perform better with similar data. This can be remedied by replacing a single decision tree with a random forest of decision trees, but a random forest is not as easy to interpret as a single decision tree. For data including categorical variables with different number of levels, information gain in decision trees is biased in favor of those attributes with more levels. Calculations can get very complex, particularly if many values are uncertain and/or if many outcomes are linked.