MacarocoFonseca
Data Scientist striving for excellence and continuous learning, aiming to change career path to Quantitative Finance.
Schuberg Philis, Amsterdam
Pinned Repositories
algo-trading-models-practice
Repository for practicing and experimenting with new algorithmic trading models: sharpen your skills, explore innovative strategies, and contribute to the world of algorithmic trading.
Algorithmic-Trading-Market-Micro-structure
The aim of this project is to apply two algorithmic trading strategies, the Three Moving Average Crossover and the On-Balance Volume (OBV) stock trading strategy, to determine when to buy and sell stock, and to analyze the performance of these strategies in a simulation using one year of price data for 10 stocks.
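A minimal sketch of the two signals named above, assuming daily close and volume series; the window lengths (10/30/60) and the long/flat encoding are illustrative assumptions, not taken from the repository:

```python
import pandas as pd

def three_ma_signal(close: pd.Series, fast: int = 10, medium: int = 30,
                    slow: int = 60) -> pd.Series:
    """Return 1 (hold the stock) when fast MA > medium MA > slow MA, else 0."""
    ma_fast = close.rolling(fast).mean()
    ma_medium = close.rolling(medium).mean()
    ma_slow = close.rolling(slow).mean()
    return ((ma_fast > ma_medium) & (ma_medium > ma_slow)).astype(int)

def on_balance_volume(close: pd.Series, volume: pd.Series) -> pd.Series:
    """Classic OBV: add volume on up days, subtract it on down days."""
    direction = close.diff().fillna(0).apply(
        lambda d: 1 if d > 0 else (-1 if d < 0 else 0))
    return (direction * volume).cumsum()
```

A backtest would then compare each strategy's simulated P&L against buy-and-hold over the same year of data.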
binomial_options
Set of examples for pricing options
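The kind of example such a repository collects can be sketched as a Cox-Ross-Rubinstein binomial tree for a European call; the parameter names are standard notation, and nothing here is taken from the repo itself:

```python
import math

def crr_european_call(S0, K, r, sigma, T, n):
    """Price a European call on a recombining CRR binomial tree with n steps."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))      # up factor
    d = 1 / u                                # down factor
    p = (math.exp(r * dt) - d) / (u - d)     # risk-neutral up probability
    disc = math.exp(-r * dt)
    # payoffs at maturity for each number of up moves j
    values = [max(S0 * u**j * d**(n - j) - K, 0.0) for j in range(n + 1)]
    # backward induction to time 0
    for _ in range(n):
        values = [disc * (p * values[j + 1] + (1 - p) * values[j])
                  for j in range(len(values) - 1)]
    return values[0]
```

With many steps the tree price converges to the Black-Scholes value for the same inputs.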
Bond_Prices_Trade_Prediction
PySpark project for bond trade valuation
Brazilian-Index-Deep_Learning
The purpose is to create a model that predicts, in a binary way, whether it is a good moment to buy a particular stock. For this work, I analyze companies in the Brazilian Bovespa index, focusing on company fundamentals rather than day-to-day information such as stock prices and volatility. In our view, a buy-and-hold strategy based on the concepts of investors like Benjamin Graham and Warren Buffett, rather than day trading on chart patterns, delivers a higher return in the long run.
Chess_Board-Pieces_Detection
Project developed in C++ and C# for a Computer Vision course. The objective was to develop a program that detects the chessboard and locates each piece.
Credit_Risk_Walmart
Credit risk project computing the Distance to Default and Probability of Default of Walmart. The KMV model was used, taking Löffler and Posch as the starting point, with equations adapted from Bharath and Shumway (2008, Equations 6 and 7).
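A hedged sketch of the "naive" distance-to-default of Bharath and Shumway (2008, Equations 6 and 7) that this project builds on: `E` is market equity, `F` the face value of debt (default point), `mu` the prior-year equity return, and `sigma_e` the equity volatility; the `0.05 + 0.25 * sigma_e` debt-volatility proxy follows the paper. The example inputs below are illustrative, not Walmart's actual figures:

```python
import math
from statistics import NormalDist

def naive_distance_to_default(E, F, sigma_e, mu, T=1.0):
    """Naive DD and PD per Bharath & Shumway (2008), Eqs. 6-7."""
    sigma_d = 0.05 + 0.25 * sigma_e                  # naive debt volatility
    sigma_v = E / (E + F) * sigma_e + F / (E + F) * sigma_d
    dd = (math.log((E + F) / F) + (mu - 0.5 * sigma_v**2) * T) \
         / (sigma_v * math.sqrt(T))
    prob_default = NormalDist().cdf(-dd)             # PD = N(-DD)
    return dd, prob_default

dd, pd_ = naive_distance_to_default(E=300.0, F=60.0, sigma_e=0.25, mu=0.05)
```

The full KMV approach instead solves iteratively for unobserved asset value and volatility; the naive version approximates both directly from market data.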
deploy_machine_learning_model
Repository focused on how to deploy Machine Learning models to production
Loan_Default_Prediction
1. Business Understanding

When applying Machine Learning techniques to business problems, it is necessary to follow a project lifecycle to ensure that the implemented model is aligned with the objective and takes into consideration all aspects of the business issue at hand. This lifecycle is typically the following:
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment

However, this lifecycle is not a linear process: depending on the outcomes of each stage, analysts may need to go back to a previous stage and adapt their analysis until optimal results are obtained. For this reason, the business understanding part of the analysis is crucial, as it sets the objectives of the study and enables us to understand, from a business perspective, how each variable at hand can influence these objectives, how real-life events and phenomena might affect the outcome, and how to account for the variability and unpredictability of financial and business data.

In this study, we attempt to solve the issue faced by financial institutions in assessing the likelihood of default of potential borrowers, in order to make the right decision in approving or rejecting loan applications. Our business understanding objective is to determine which of their clients will default on a loan. This should be done through the analysis of data provided by clients or collected from historical performance. Variables such as the borrower's income, the number of years they have been employed, and their home ownership can inform firms about the financial strength of borrowers and whether they hold collateral as insurance against default; these are important determinants of whether a borrower will default. Historical credit information is also crucial to determine the current likelihood of default, as patterns are likely to repeat.

Hence, banks can use previous credit ratings and the number of loans already outstanding to determine whether the borrower has been reliable in the past or is currently overleveraged, and therefore likely to default on new loans. Finally, the nature, amount, and term of the loan a borrower is applying for can also determine their likelihood of default, as loans with a longer maturity, a higher principal, or certain purposes can lead to higher default probabilities. All these variables should be taken into consideration and related to the default outcome of a borrower.

Applying a supervised learning algorithm will allow us to use existing datasets containing this information and observe if and how it relates to the status of existing loans during the same timeframe as we would consider for future predictions. Therefore, we could train a model on historical data containing the features above. This data would need to be collected at least one period prior to our prediction date, with the period equal to or longer than the time needed for the variables to affect our target. The variables selected should always be available to us at the time predictions have to be made (hence, at the time a loan is requested). As an example, the outstanding amount on a loan or the number of loans already paid could not be used as predictors of default. Finally, the trained model would be evaluated on a test dataset for which the outcome is already known, to verify the accuracy of our predictions, before being deployed to predict default for future loans.

Our data mining objectives are to predict, with high accuracy and F1-score, the binary outcome of the model and the probability of default. Accuracy and F1-score are most appropriate because both false positives and false negatives should be minimized as much as possible: a bank would lose money by approving a loan to a defaulting customer, or by losing the business of a non-defaulting customer falsely flagged as a defaulter. We will also attempt to maximize our ROC-AUC score, an important metric that represents the ability of a model to distinguish between classes. The results of the evaluation and deployment stages might lead us to reassess the data understanding, preparation, and modeling stages in order to fine-tune our results. Our analysis will start with the description and analysis of our dataset, as developed in the next part.
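The train/evaluate step described above can be sketched with scikit-learn; the synthetic imbalanced dataset and the choice of logistic regression are illustrative assumptions, not the project's actual data or model:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan dataset: ~20% positives, like defaults.
X, y = make_classification(n_samples=2000, n_features=8,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = model.predict(X_te)                 # binary default / no-default
proba = model.predict_proba(X_te)[:, 1]    # probability of default

acc = accuracy_score(y_te, pred)
f1 = f1_score(y_te, pred)
auc = roc_auc_score(y_te, proba)           # class-separation ability
```

Note that F1 and ROC-AUC, unlike raw accuracy, stay informative on an imbalanced target such as loan default.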
portfolio-optimization
A journey to efficient investing, quantum leaps in portfolio strategy.
MacarocoFonseca's Repositories
MacarocoFonseca/algo-trading-models-practice
Repository for practicing and experimenting with new algorithmic trading models: sharpen your skills, explore innovative strategies, and contribute to the world of algorithmic trading.
MacarocoFonseca/Algorithmic-Trading-Market-Micro-structure
The aim of this project is to apply two algorithmic trading strategies, the Three Moving Average Crossover and the On-Balance Volume (OBV) stock trading strategy, to determine when to buy and sell stock, and to analyze the performance of these strategies in a simulation using one year of price data for 10 stocks.
MacarocoFonseca/Loan_Default_Prediction
Loan default prediction project: a supervised learning study following the standard project lifecycle (Business Understanding through Deployment) that predicts whether a borrower will default on a loan from variables such as income, years employed, home ownership, credit history, and the nature, amount, and term of the loan, evaluated with accuracy, F1-score, and ROC-AUC.
MacarocoFonseca/Credit_Risk_Walmart
Credit risk project computing the Distance to Default and Probability of Default of Walmart. The KMV model was used, taking Löffler and Posch as the starting point, with equations adapted from Bharath and Shumway (2008, Equations 6 and 7).
MacarocoFonseca/deploy_machine_learning_model
Repository focused on how to deploy Machine Learning models to production
MacarocoFonseca/portfolio-optimization
A journey to efficient investing, quantum leaps in portfolio strategy.
MacarocoFonseca/binomial_options
Set of examples for pricing options
MacarocoFonseca/Bond_Prices_Trade_Prediction
PySpark project for bond trade valuation
MacarocoFonseca/Brazilian-Index-Deep_Learning
The purpose is to create a model that predicts, in a binary way, whether it is a good moment to buy a particular stock. For this work, I analyze companies in the Brazilian Bovespa index, focusing on company fundamentals rather than day-to-day information such as stock prices and volatility. In our view, a buy-and-hold strategy based on the concepts of investors like Benjamin Graham and Warren Buffett, rather than day trading on chart patterns, delivers a higher return in the long run.
MacarocoFonseca/chapelHillExpertSurvey
MacarocoFonseca/Chess_Board-Pieces_Detection
Project developed in C++ and C# for a Computer Vision course. The objective was to develop a program that detects the chessboard and locates each piece.
MacarocoFonseca/Codility_Exercises_Solutions
My proposed solutions, in Python, to Codility's challenges
MacarocoFonseca/FixedIncomeSecurities
MacarocoFonseca/FloatingRate
Today's date is January 15th, 2007, and, to finance an investment project, a company entered into a floating rate loan agreement with the following details:
MacarocoFonseca/machine-learning-ops
Learning how to put machine learning projects into production
MacarocoFonseca/mv-trading
MacarocoFonseca/Sudoku_BackTrack_Solver
Solver for Sudoku game developed in Python.
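A minimal backtracking solver of the kind this repository implements; the grid encoding (a 9x9 list of lists with 0 for empty cells) is an assumption about the interface, not taken from the repo:

```python
def valid(grid, r, c, v):
    """Check that value v can be placed at (r, c) without a conflict."""
    if any(grid[r][j] == v for j in range(9)):      # row conflict
        return False
    if any(grid[i][c] == v for i in range(9)):      # column conflict
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)             # top-left of 3x3 box
    return all(grid[br + i][bc + j] != v
               for i in range(3) for j in range(3))

def solve(grid):
    """Fill the grid in place via depth-first backtracking."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for v in range(1, 10):
                    if valid(grid, r, c, v):
                        grid[r][c] = v
                        if solve(grid):
                            return True
                        grid[r][c] = 0              # undo and backtrack
                return False                        # no value fits: backtrack
    return True                                     # no empty cells left
```

Trying values in order and undoing dead ends is the classic backtracking pattern; heuristics such as picking the most-constrained cell first speed it up considerably on hard puzzles.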
MacarocoFonseca/tsibbledata
Example datasets for tsibble