Pinned Repositories
kPCA-in-high-frequency-trading
• Partition trading time series data into 30 minutes intervals by picking the mean transaction price and volumes in each interval and compute the log-return (aka ’U sequence’) and write it into a corresponding csv file: JNJ_1004_1015_2010_HFT_30min_.csv • Visualize the high frequency data with PCA by using 2 or 3 PCs: you need to calculate the variance explained ratios for your visualization. • Identify outliers in your PCA analysis • Visualize it by using KPCA and compare its results with those of PCA (you need to at least try two kernels)
Twitter-analytics-for-University-endowment-analysis
use python twitter API to collect the number of followers for the 232 rich universities in the file endowments_2018.csv and updated the file as endow- ments_2018_2.csv to include the the number of followers in twitter. • Verify the following hypotheses via your analytics: – The richer a university, the more followers in its twitter – Private universities have more followers than its public peers • Visualize the data via PCA and t-SNE (or SOM) and analyze your findings • Label entries as – ’universities with a large number of followers – universities with a medium number of followers – universities with a small number of followers – then partition data as training and test by using extra-Trees, MLP and CNN to do classification by showing all classification measures including d-index.
Basic-PCA
bsktball-analysis
NCAA Dataset
Classification-on-credit-risk-data
Classification model on the credit risk data
Credit-Risk-Analysis
matching-sys
Orphan designations matching
PCA--Application
Suppose k PCs are selected in your PCA for the SP data. It means ith stock will have coordinates pi1, pi2 · · · pik in the subspace spanned by PCs. – 1. find the top 20 stocks with largest PC1 values – 2. find the top 20 stocks with largest PC2 values – 3. Find the top 20 stocks with largest PCk values – 4. RankeachstockbytheirPCArankingscoreρ= p2i1 +p2i2 +···+p2ik, list the top-ranked 20 stocks, and describe their characteristics. – 5. Discuss the relationships between the variables according to your PCA analysis
petAdoptionLogisticRegression
Logistic regression model developed to predict whether a pet gets adopted or not
review-text-analysis
fashion review analysis and wine review analysis
cryaa's Repositories
cryaa/kPCA-in-high-frequency-trading
• Partition trading time series data into 30 minutes intervals by picking the mean transaction price and volumes in each interval and compute the log-return (aka ’U sequence’) and write it into a corresponding csv file: JNJ_1004_1015_2010_HFT_30min_.csv • Visualize the high frequency data with PCA by using 2 or 3 PCs: you need to calculate the variance explained ratios for your visualization. • Identify outliers in your PCA analysis • Visualize it by using KPCA and compare its results with those of PCA (you need to at least try two kernels)
cryaa/Twitter-analytics-for-University-endowment-analysis
use python twitter API to collect the number of followers for the 232 rich universities in the file endowments_2018.csv and updated the file as endow- ments_2018_2.csv to include the the number of followers in twitter. • Verify the following hypotheses via your analytics: – The richer a university, the more followers in its twitter – Private universities have more followers than its public peers • Visualize the data via PCA and t-SNE (or SOM) and analyze your findings • Label entries as – ’universities with a large number of followers – universities with a medium number of followers – universities with a small number of followers – then partition data as training and test by using extra-Trees, MLP and CNN to do classification by showing all classification measures including d-index.
cryaa/Basic-PCA
cryaa/bsktball-analysis
NCAA Dataset
cryaa/Classification-on-credit-risk-data
Classification model on the credit risk data
cryaa/Credit-Risk-Analysis
cryaa/matching-sys
Orphan designations matching
cryaa/PCA--Application
Suppose k PCs are selected in your PCA for the SP data. It means ith stock will have coordinates pi1, pi2 · · · pik in the subspace spanned by PCs. – 1. find the top 20 stocks with largest PC1 values – 2. find the top 20 stocks with largest PC2 values – 3. Find the top 20 stocks with largest PCk values – 4. RankeachstockbytheirPCArankingscoreρ= p2i1 +p2i2 +···+p2ik, list the top-ranked 20 stocks, and describe their characteristics. – 5. Discuss the relationships between the variables according to your PCA analysis
cryaa/petAdoptionLogisticRegression
Logistic regression model developed to predict whether a pet gets adopted or not
cryaa/review-text-analysis
fashion review analysis and wine review analysis
cryaa/reviewtext-analysis
fashion review analysis and wine review analysis
cryaa/risk-evaluation-in-p2p
cryaa/Vacun.github.io
cryaa/wineReview