Pinned Repositories
100days
100 days of algorithms
2018-MachineLearning-Lectures-ESA
Machine Learning Lectures at the European Space Agency (ESA) in 2018
Air_Tranportation_Statistics_Data_Inteview_Case_Study
Analysis of Air Transportation Statistics Data: case study solutions for a Lead Data Engineering position
Axa-Insurance-Telematics-Kaggle
I developed this case study in only 7 days with PySpark (Spark 1.6.0) SQL and MLlib, using a Databricks cluster and AWS. Random Forest achieves 90% AUC without the Trip Matching (Repeated Trips) feature. Ensembles of RF, GBT, and Logistic Regression, together with outlier elimination, could be used to improve this result. There are two versions of my code: test execution and full execution. Because AWS costs exceeded my budget, I stopped training my model(s) on the full dataset. There is also a PowerPoint presentation of my outputs from the test execution. The full-data execution code is a slightly different, more production-ready version; in it I had to use Databricks table caching on the TRAIN and TEST data tables to obtain acceptable performance.
IEDATACHALLANGE-DJANGO-ANALYTICS-PROJECT
IE second-term project: a prototype application based on Telefonica Mobility and BBVA Credit Card Payments data. Provider data is strictly undisclosed, but you can use the code for any purpose you desire. MVC stack built on the Python Django framework. API integrations with Expedia and the Twitter Streaming API. Substantial work on TripAdvisor web scraping. NLP (NLTK) for topic-based sentiment analysis (TripAdvisor reviews), time series forecasting, a recommendation engine, Leaflet data visualization, and NetworkX SNA (Python and JS). BBVA data is left out because it lacks data integrity and the necessary categories. I hope this work is helpful to practitioners of the Django framework and analytics. The application was developed in a very short time with Agile methodology, so problems and inconsistencies in code quality are to be expected. For example, we tried to use mongoengine and Django document models as a common data source, but we ran into difficulties from time to time because accurate documentation was hard to find on the web. Wherever we resolved them, we followed the correct coding practice. Please follow the model usage practice in the last view in views.py to comply with MVC; do not use pymongo directly. Mongoengine provides features such as DB connection pooling that facilitate a scalable architecture.
mortgagebalanceforecastingengine
Mortgage Balance Forecasting Engine built with PySpark (Spark 1.3.0), Django, SimPy, and Python
python-NLTK-exercise---sentiwordnet-scoring
python-NLP-Simple Sentiment Analysis
semiGridSearchCV
Scikit-learn compliant Semi-supervised learning Grid Search with Cross Validation
semiKmeans
scikit-learn compliant Semi-Supervised Kmeans (seeded Kmeans) with probability estimates
tivi
Currently under development! Tivi is a PySpark Streaming and Django project to build a recommender system for TV channel audiences. Entity resolution and two recommendation algorithms (content-based/collaborative filtering on show genres/topics, and Alternating Least Squares collaborative filtering) will be used, with a drifting principle based on comparison against the training-set average threshold during validation.
AnilSener's Repositories
AnilSener/2018-MachineLearning-Lectures-ESA
Machine Learning Lectures at the European Space Agency (ESA) in 2018
AnilSener/amazon-emr-management-guide
The open source version of the Amazon EMR Management Guide. You can submit feedback & requests for changes by submitting issues in this repo or by making proposed changes & submitting a pull request.
AnilSener/Apache-Spark-Deep-Learning-Cookbook
Apache Spark Deep Learning Cookbook, published by Packt
AnilSener/awesome-machine-learning
A curated list of awesome Machine Learning frameworks, libraries and software.
AnilSener/aws-devops-essential
In a few hours, quickly learn how to effectively leverage various AWS services to improve developer productivity and reduce the overall time to market for new product capabilities.
AnilSener/bayeslite
BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.
AnilSener/boto3
AWS SDK for Python
AnilSener/brick-tutorial-buildsys2017
AnilSener/Data_Structures_Algorithms_In_Python
My implementation of 80+ popular data structures and algorithms and interview questions in Python 3
AnilSener/DIVE-backend
Codebase for DIVE backend (server, worker, and ORM)
AnilSener/DIVE-frontend
Codebase for DIVE SPA using React and Redux
AnilSener/GPUEnabler
Provides GPU awareness to Spark, Contact: @kmadhugit and @kiszk
AnilSener/jupyterlab-hub
JupyterLab extension for running JupyterLab with JupyterHub
AnilSener/kepler.gl
AnilSener/kernel_gateway
Jupyter Kernel Gateway
AnilSener/kinesis-sql
Kinesis Connector for Structured Streaming
AnilSener/machine_learning_examples
A collection of machine learning examples and tutorials.
AnilSener/mleap
MLeap: Deploy Spark Pipelines to Production
AnilSener/mlflow
Open source platform for the machine learning lifecycle
AnilSener/Optimus
Agile Data Science Workflows made easy with Python and Spark.
AnilSener/pyspark_dist_explore
Data Exploration in PySpark made easy - Pyspark_dist_explore provides methods to get fast insights in your Spark DataFrames.
AnilSener/Python
Python code for YouTube videos.
AnilSener/python-sortedcontainers
Python Sorted Container Types: Sorted List, Sorted Dict, and Sorted Set
AnilSener/QUALIFIER
Quality control for gated flow cytometry data
AnilSener/sagemaker-spark
A Spark library for Amazon SageMaker.
AnilSener/scio
A Scala API for Apache Beam and Google Cloud Dataflow.
AnilSener/spark-notes
AnilSener/SparkInternals
Notes talking about the design and implementation of Apache Spark
AnilSener/training-data-analyst
Labs and demos for courses for GCP Training (http://cloud.google.com/training).
AnilSener/VBYO2018
Veri Bilimi Yaz Okulu