yuchenyuu's Stars
AshenOneme/OpenSeespy-common-problems-and-cases
Answers to common OpenSees problems, with case analyses.
JobinJohan/classification_task_ML
Example of an animal classification task in which KNN, Decision Trees, SVC, Naive Bayes, and a Multilayer Perceptron are compared.
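A minimal sketch of that kind of comparison, using scikit-learn's bundled iris data as a stand-in for the repo's animal dataset (the dataset and hyperparameters here are assumptions, not the repo's code):

```python
# Compare the five classifier families with 5-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the animal dataset
models = {
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVC": SVC(),
    "Naive Bayes": GaussianNB(),
    "MLP": MLPClassifier(max_iter=2000, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```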
ericdhitchens/Predicting_Concrete_Compressive_Strength
How would you predict the compressive strength of concrete as a function of its constituent materials and curing time? In this portfolio project, I optimize a model for determining concrete compressive strength using a deep neural network in TensorFlow 2.0 and compare its performance to linear models.
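A minimal sketch of that comparison, with synthetic placeholder data standing in for the real concrete dataset (the layer sizes and training settings are illustrative, not the project's tuned model):

```python
# Linear baseline vs. a small dense network in TensorFlow 2.x.
import numpy as np
import tensorflow as tf
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                      # placeholder: mix proportions + age
y = X @ rng.normal(size=8) + rng.normal(size=1000)  # placeholder strength target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lin = LinearRegression().fit(X_tr, y_tr)
print("linear R^2:", lin.score(X_te, y_te))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_tr, y_tr, epochs=50, verbose=0)
print("DNN test MSE:", model.evaluate(X_te, y_te, verbose=0))
```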
dcfeng-87/Ensemble-learning-deep-beam
Implementing ensemble learning methods for shear strength prediction of RC deep beams with/without web reinforcements
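A minimal sketch of the ensemble-learning idea on synthetic tabular data; the features and models below are illustrative stand-ins for the repo's beam data and methods:

```python
# Score two common ensemble regressors with cross-validated R^2.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 6))  # placeholder: geometry, concrete strength, reinforcement
y = 2 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)  # placeholder shear strength

for model in (RandomForestRegressor(random_state=0),
              GradientBoostingRegressor(random_state=0)):
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(type(model).__name__, f"mean R^2 = {r2:.3f}")
```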
LaxmiChaudhary/Modeling-of-strength-of-high-performance-concrete-using-Machine-Learning
bigdog3626/Concerete_Crack_Detection
kaushik1580/Compressive_Strength_Prediction_For_Concerete
mansouri-83/Shear-strength-CFST
Developing an interpretable machine learning model for estimating the shear strength of concrete-filled steel tubes.
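One common route to interpretability is permutation importance; the sketch below uses scikit-learn's implementation on synthetic data and is only an assumption about the kind of analysis involved (the repo may well use other tools, e.g. SHAP):

```python
# Rank features by how much shuffling each one hurts held-out predictions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 5))  # placeholder CFST section/material features
y = 3 * X[:, 0] + X[:, 2] + rng.normal(scale=0.05, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
print(result.importances_mean)  # larger value = feature matters more
```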
arjunrch/CFST_Strength_Prediction
Comparative study of data-driven models for strength prediction of CFST columns using a Machine Learning approach
Actor12/data_analysis
Machine learning data preprocessing: includes plotting data distributions, feature selection, and hyperparameter-tuning tips.
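A minimal sketch of those three steps (distribution plot, feature selection, tuning), using a bundled scikit-learn dataset as a placeholder:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# 1. Plot one feature's distribution.
plt.hist(X[:, 0], bins=30)
plt.xlabel("feature 0")
plt.ylabel("count")
plt.show()

# 2. Keep the 10 features most associated with the target.
X_sel = SelectKBest(f_classif, k=10).fit_transform(X, y)

# 3. Grid-search one hyperparameter.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5).fit(X_sel, y)
print(grid.best_params_, grid.best_score_)
```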
MacarocoFonseca/Loan_Default_Prediction
1. Business Understanding
In applying Machine Learning techniques to solve business problems, it is necessary to follow a project lifecycle in order to ensure that the implemented model is aligned with the objective and takes into consideration all aspects of the business issue at hand. This lifecycle is typically the following:
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment
However, this lifecycle is not a linear process: depending on the outcomes of each stage, analysts may have to go back to a previous stage and adapt their analysis until optimal results are obtained. For this reason, the business understanding part of the analysis is crucial, as it sets the objectives of the study and enables us to understand, from a business perspective, how each variable at hand can influence these objectives, how real-life events and phenomena might affect the outcome, and how to account for the variability and unpredictability of financial and business data.
In this study, we attempt to solve the issue faced by financial institutions in assessing the likelihood of default of potential borrowers, in order to make the right decision when approving or rejecting loan applications. Our business understanding objective is to determine which of their clients will default on a loan. This should be done through the analysis of data provided by clients or collected from historical performance. Variables such as the borrower's income, the number of years they have been employed, and their home ownership can tell firms about the financial strength of borrowers and whether they have collateral as insurance against default, and these are important determinants of whether a borrower will default. Historical credit information is also crucial for determining the current likelihood of default, as past patterns are likely to be repeated. Hence, banks can use previous credit ratings and the number of loans already outstanding to determine whether the borrower has been reliable in the past or is currently overleveraged, and therefore likely to default on new loans. Finally, the nature, amount, and term of the loan a borrower is applying for can also determine their likelihood of default, as loans with a longer maturity, a higher principal, or certain purposes can lead to higher default probabilities.
All these variables should be taken into consideration, and all have a relationship with the default outcome of a borrower. Applying a supervised learning algorithm allows us to use existing datasets containing this information and to observe if and how these variables relate to the status of existing loans over the same timeframe as we would consider for future predictions. We can therefore train a model on historical data containing the features above. This data would need to be collected from at least one period prior to our prediction date, a period equal to or longer than the time needed for the variables to affect our target. The variables selected should always be available to us at the time predictions have to be made (that is, at the time a loan is requested). For example, the outstanding amounts on a loan or the number of loans already paid off could not be used as predictors of default.
Finally, the trained model would be evaluated on a test dataset for which the outcome is already known, in order to verify the accuracy of our predictions, before being deployed to predict default for future loans. Our data mining objectives are to predict, with high accuracy and F1-score, the binary outcome of the model and the probability of default. Accuracy and F1-score are the most appropriate metrics because both false positives and false negatives should be minimized as much as possible: a bank would lose money by approving a loan to a defaulting customer, or by losing the business of a non-defaulting customer falsely flagged as a defaulter. We will also attempt to maximize our ROC-AUC score, an important metric that represents a model's ability to distinguish between classes. The results of the evaluation and deployment stages might lead us to reassess the data understanding, preparation, and modeling stages in order to fine-tune our results. Our analysis starts with the description and analysis of our dataset, developed in the next part.
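A minimal sketch of the evaluation step just described, on an imbalanced placeholder dataset (the model and data are illustrative, not the study's own):

```python
# Train on one split, then report accuracy, F1, and ROC-AUC on held-out data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]  # predicted probability of default

print("accuracy:", accuracy_score(y_te, pred))
print("F1:      ", f1_score(y_te, pred))
print("ROC-AUC: ", roc_auc_score(y_te, proba))
```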
OmarHammemi/-MBTI-Myers-Briggs-Personality-Type-Dataset
The Myers Briggs Type Indicator (or MBTI for short) is a personality type system that divides everyone into 16 distinct personality types across 4 axes:
- Introversion (I) – Extroversion (E)
- Intuition (N) – Sensing (S)
- Thinking (T) – Feeling (F)
- Judging (J) – Perceiving (P)
So, for example, someone who prefers introversion, intuition, thinking and perceiving would be labelled INTP in the MBTI system, and there are many personality-based components that would model or describe this person's preferences or behaviour based on the label. It is one of the most popular personality tests in the world, if not the most popular. It is used in businesses, online, for fun, for research and much more. A simple Google search reveals all the different ways the test has been used over time; it is safe to say the test is still very relevant.
From a scientific or psychological perspective, it is based on Carl Jung's work on cognitive functions, i.e. Jungian Typology: a model of 8 distinct functions, thought processes or ways of thinking suggested to be present in the mind. This work was later transformed into several different personality systems to make it more accessible, the most popular of which is of course the MBTI. Recently, its use and validity have come into question because of unreliability in experiments surrounding it, among other reasons. But it is still regarded as a very useful tool in many areas, and the purpose of this dataset is to help see whether any patterns can be detected in specific types and their style of writing, which overall explores the validity of the test in analysing, predicting or categorising behaviour.
Content: This dataset contains over 8600 rows of data; each row holds a person's:
- Type (the person's 4-letter MBTI code/type)
- A section of each of the last 50 things they have posted (entries separated by "|||", i.e. 3 pipe characters)
Acknowledgements: This data was collected through the PersonalityCafe forum, as it provides a large selection of people together with their MBTI personality type and samples of what they have written.
Inspiration: Some basic uses could include:
- Using machine learning to evaluate the MBTI's validity and its ability to predict language styles and behaviour online.
- Producing a machine learning algorithm that can attempt to determine a person's personality type based on some text they have written.
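A minimal sketch of the second suggested use, predicting type from text. The file name mbti_1.csv and the 'type'/'posts' column names are assumptions inferred from the dataset description:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("mbti_1.csv")  # hypothetical file name
df["text"] = df["posts"].str.replace("|||", " ", regex=False)  # join the 50 posts

X_tr, X_te, y_tr, y_te = train_test_split(df["text"], df["type"], random_state=0)
model = make_pipeline(TfidfVectorizer(max_features=20000),
                      LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```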
sumanramesha/-Churn-of-Telecom-Users-Prediction-by-using-Machine-Learning-Classification-Algorithms
This project aims to predict the churn of telecom customers, which will help the company react in time and try to retain existing users who want to switch to a different network. We use three machine learning techniques for classification, Support Vector Machines, K-Nearest Neighbour, and Random Forest, and determine the best model for classification. The data consists of information about almost six thousand users, including the services they use, their demographic characteristics, the duration of the operator's services, the amount of payment, and the method of payment. The dataset has 20 variables, some numerical and most categorical, and it contains some missing values, so data pre-processing is required before implementing any model.
Data pre-processing: Let's first remove the null values from the dataset. There are only 10 missing values in total, all in the TotalCharges variable. The customers with NA values all have a tenure of 0: they are new clients who have yet to pay their bills, so their total charge should be zero. We also drop unwanted columns such as 'gender', 'MultipleLines', 'PhoneServices', and 'differences'.
Exploratory data analysis: Why are clients more inclined to leave the company, and on what factors does it depend? 'Phone services' were present in 91% of cases; 88% had a month-to-month contract; 82% had no dependents; 78% had no online security; 77% had no tech support; 75% had paperless billing; 75% were older citizens; 68% had fibre-optic internet; 65% had no online backup or device protection; 64% had no partner; 57% paid with an electronic check; 50% did not have streaming TV, were male, and did not have streaming movies; and 45% had multiple lines.
Hypothesis formulation: Based on these observations, we believe a client is more likely to leave if they have a high MonthlyCharge, especially if they are new (tenure under 15 months); if they lack particular services such as online security, tech support, online backup, and/or device protection; if the decision to quit is simple, i.e. there is no firm commitment (a month-to-month contract) and no other person is involved in the decision (no dependents and/or spouse); and if everything can be done via the internet or over the phone (paperless billing and phone services are available).
Class imbalance: There is clearly a large difference between the two classes (customers who stayed and customers who left the company): one is the majority class and the other the minority. The challenge with such imbalanced data is that most classification techniques will neglect the minority class (customers who left) and in turn give poor predictions. Here we address the problem with SMOTE (Synthetic Minority Oversampling Technique), an oversampling technique that creates synthetic samples for the minority class instead of creating copies; we use it via from imblearn.over_sampling import SMOTE. The method chooses two or more comparable examples (through a distance measure) and perturbs one characteristic at a time by a random amount within the difference to the neighbouring examples.
The last step is to split and scale the dataset. We split the data randomly into training and testing samples; for scaling, we normalise only the continuous variables and leave the dummy variables alone, applying the min-max scaler so the continuous variables share the same minimum of zero, maximum of one, and range of one.
Correlation heatmap: The correlation heatmap shown in the figure depicts the relations between the different variables; histograms and scatter plots show each variable's relation to the target. We can observe that, in general, clients who want to quit (churn = 'Yes') are new clients (tenure under 15 months, hence low TotalCharges) with high MonthlyCharges (over $65/month). Because there is no linear relationship between tenure and TotalCharges, additional fees must be at play.
Machine learning classification techniques: Support Vector Machine, K-Nearest Neighbour, and Random Forest are applied; Random Forest is the most accurate model at 83.2%, so the random forest classification method gives the best prediction of customers leaving the telecom company.
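A minimal sketch of the pipeline described (min-max scaling, SMOTE on the training split only, then Random Forest), with synthetic data standing in for the telecom dataset:

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=5000, weights=[0.73, 0.27], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

scaler = MinMaxScaler().fit(X_tr)  # fit on training data only
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # balance the classes
clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
print(classification_report(y_te, clf.predict(X_te)))
```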
jason13nn/Machine-Learning-in-Python
Introduces the processes of exploring, visualizing, and classifying large amounts of data. This course provides an introduction to classic and contemporary learning techniques in classification and regression, using the Python programming language for simple APIs and rapid prototyping. We explore linear classification algorithms and their non-linear counterparts via kernel tricks. We also explore neural networks and deep learning architectures, with emphasis on GPU-accelerated training and autoencoding procedures. Class projects focus on using Kaggle competitions as example datasets. All material covered is reinforced through hands-on experience using state-of-the-art tools to design and execute data learning algorithms.
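As a small illustration of the kernel-trick topic mentioned above (illustrative only, not course material), a linear SVC versus an RBF SVC on data that is not linearly separable:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Concentric circles: no straight line separates the classes.
X, y = make_circles(n_samples=500, noise=0.1, factor=0.3, random_state=0)
for kernel in ("linear", "rbf"):
    acc = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
    print(f"{kernel}: {acc:.3f}")  # the RBF kernel should win by a wide margin
```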
EndOfTheGlory/training-examples-Machine-learning
All the training .ipynb notebooks I made while learning the main ML algorithms. Although it's not much, it still reflects my learning process.
Davisy/Machine-Learning-Project-with-Scikit-Plot
Scikit-plot is a humble attempt to provide aesthetically-challenged programmers (such as myself) the opportunity to generate quick and beautiful graphs and plots with as little boilerplate as possible.
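A minimal example of the package's documented one-plot-per-call style, shown on a toy model (the plot choices here are illustrative):

```python
import matplotlib.pyplot as plt
import scikitplot as skplt
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)

# One call per figure: confusion matrix and per-class ROC curves.
skplt.metrics.plot_confusion_matrix(y_te, clf.predict(X_te), normalize=True)
skplt.metrics.plot_roc(y_te, clf.predict_proba(X_te))
plt.show()
```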
monte-flora/scikit-explain
A user-friendly python package for computing and plotting machine learning explainability output.
yonycherkos/Applied-Data-Science-with-Python-Specialization
The 5 courses in this University of Michigan specialization introduce learners to data science through the python programming language. This skills-based specialization is intended for learners who have a basic python or programming background, and want to apply statistical, machine learning, information visualization, text analysis, and social network analysis techniques through popular python toolkits such as pandas, matplotlib, scikit-learn, nltk, and networkx to gain insight into their data. Introduction to Data Science in Python (course 1), Applied Plotting, Charting & Data Representation in Python (course 2), and Applied Machine Learning in Python (course 3) should be taken in order and prior to any other course in the specialization. After completing those, courses 4 and 5 can be taken in any order. All 5 are required to earn a certificate.
abhiwalia15/Python-for-Data-Science-and-Machine-Learning-Bootcamp
Learn how to program with Python, how to create amazing data visualizations, and how to use Machine Learning with Python! Here are just a few of the topics we will be learning:
- Programming with Python
- NumPy with Python
- Using pandas DataFrames to solve complex tasks
- Using pandas to handle Excel files
- Web scraping with Python
- Connecting Python to SQL
- Using matplotlib and seaborn for data visualizations
- Using plotly for interactive visualizations
- Machine Learning with SciKit Learn, including: Linear Regression, K Nearest Neighbors, K Means Clustering, Decision Trees, Random Forests, Natural Language Processing, Neural Nets and Deep Learning, Support Vector Machines, and much, much more!
anubhavanand12qw/STOCK-PRICE-PREDICTION-USING-TWITTER-SENTIMENT-ANALYSIS
The coding has been done in Python 3.6.5 using Jupyter Notebook. The program fetches live data from Twitter using Tweepy, then cleans the tweets (e.g. removing special characters). After that we perform sentiment analysis on the Twitter data and plot it for better visualization. Then we fetch the stock price from Yahoo Finance and add it to the dataset to perform prediction. We apply several machine learning algorithms (random forest, MLPClassifier, logistic regression) and train them on our dataset. Finally, we predict on held-out data, plot the predictions against the real data, and assess the accuracy.
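A heavily hedged sketch of the pipeline's overall shape: TextBlob stands in for the sentiment step and the price labels are hard-coded placeholders, since live Tweepy and Yahoo Finance access needs API credentials and date alignment that are omitted here:

```python
import numpy as np
from textblob import TextBlob
from sklearn.ensemble import RandomForestClassifier

tweets = ["Great earnings, very bullish!",
          "Terrible guidance, selling everything.",
          "Flat quarter, nothing new."]
X = np.array([[TextBlob(t).sentiment.polarity] for t in tweets])  # sentiment score per tweet

# Placeholder labels: 1 if the stock closed up that day, 0 otherwise.
y = np.array([1, 0, 0])

clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.predict(X))  # toy demo; real use aligns tweets and prices by date
```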