e181337

Data Scientist

İstanbul

Pinned Repositories

anagram_check
An anagram is a word or phrase formed by rearranging the letters of a different word or phrase. In other words, both strings must contain exactly the same letters with exactly the same frequencies. Write a Python script that reads two strings from the command line and determines whether they are anagrams. If they are not, the script should find and print the minimum number of character deletions required to make the two strings anagrams; otherwise, it should just print that they are anagrams.
**Input Format**
- The first line contains a single string, **a**.
- The second line contains a single string, **b**.
Expected input and output:
```
$ python3 solution.py
a: Tom Marvolo Riddle
b: I Am Lord Voldemort
remove 7 characters from 'Tom Marvolo Riddle' and 8 characters from 'I Am Lord Voldemort'
$ python3 solution.py
a: tom marvolo riddle
b: i am lord voldemort
remove 0 characters from 'tom marvolo riddle' and 1 characters from 'i am lord voldemort'
$ python3 solution.py
a: tom marvolo riddle
b: i am lordvoldemort
they are anagrams
$ python3 solution.py
a: tom riddle
b: voldemort
remove 3 characters from 'tom riddle' and 2 characters from 'voldemort'
```
Language: Python
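A minimal sketch of `solution.py`, assuming comparisons are case-sensitive and that spaces count as characters. This is what the expected outputs imply: "Tom Marvolo Riddle" vs "I Am Lord Voldemort" needs 7 + 8 deletions, while the lowercase pair needs 0 + 1 only because of the extra space. The function name `deletions_needed` is hypothetical.

```python
from collections import Counter


def deletions_needed(a: str, b: str) -> tuple[int, int]:
    """Return how many characters must be removed from a and from b.

    Characters are compared case-sensitively and spaces count,
    which matches the expected outputs in the task description.
    """
    ca, cb = Counter(a), Counter(b)
    # Counter subtraction keeps only positive counts, i.e. each string's surplus characters
    return sum((ca - cb).values()), sum((cb - ca).values())


def main() -> None:
    a = input("a: ")
    b = input("b: ")
    from_a, from_b = deletions_needed(a, b)
    if from_a == 0 and from_b == 0:
        print("they are anagrams")
    else:
        print(f"remove {from_a} characters from '{a}' and {from_b} characters from '{b}'")


if __name__ == "__main__":
    main()
```

Counting both surpluses separately gives the minimum deletions directly: every surplus character must go, and nothing else needs to.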
bigquery_mysql_connect
Create an ETL job in Python. The script must retrieve data from BigQuery in chunks (e.g. 10k or 100k rows at a time). The data can be stored in any relational database (e.g. MySQL) on the local machine.
- The script supports two date parameters, 'batch' and 'realtime'. The 'batch' parameter should fetch the historical data and write it to the database as fast as possible; measure its runtime and improve the performance (hint: parallel processing). The 'realtime' parameter should fetch the last day's data.
- The script must be robust, with proper logging and try-except handling (database connections, etc.).
Language: Jupyter Notebook
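A hedged sketch of the job's shape, runnable offline: `fetch_chunk` is a hypothetical stand-in for one paged BigQuery query (a real job would run it through the google-cloud-bigquery client), and `sqlite3` stands in for MySQL so no server is needed. The 30-day history window for 'batch', the table layout and all function names are assumptions.

```python
import logging
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("etl")


def date_window(mode: str, today: date, history_days: int = 30) -> tuple[date, date]:
    """Half-open [start, end) window: 'realtime' is the last day, 'batch' the days before it."""
    yesterday = today - timedelta(days=1)
    if mode == "realtime":
        return yesterday, today
    if mode == "batch":
        return yesterday - timedelta(days=history_days), yesterday
    raise ValueError(f"unknown mode: {mode!r}")


def fetch_chunk(start: date, end: date, offset: int, size: int, total: int) -> list[tuple]:
    """Hypothetical stand-in for one BigQuery page, e.g.
    SELECT ... WHERE day >= start AND day < end ORDER BY id LIMIT size OFFSET offset.
    Rows are generated here so the sketch runs without credentials."""
    stop = min(offset + size, total)
    return [(i, start.isoformat(), f"row-{i}") for i in range(offset, stop)]


def run_etl(conn, mode: str, today: date, total_rows: int,
            chunk_size: int = 10_000, workers: int = 4) -> int:
    start, end = date_window(mode, today)
    conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, day TEXT, payload TEXT)")
    t0 = time.perf_counter()
    written = 0
    # Chunks are fetched in parallel threads (network-bound work); writes happen in
    # this thread because a sqlite3 connection should not be shared across threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch_chunk, start, end, off, chunk_size, total_rows)
                   for off in range(0, total_rows, chunk_size)]
        for fut in futures:
            try:
                rows = fut.result()
                with conn:  # commits on success, rolls back this chunk on error
                    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
                written += len(rows)
            except Exception:
                log.exception("chunk failed; continuing with the next one")
    log.info("%s load of %d rows took %.2fs", mode, written, time.perf_counter() - t0)
    return written
```

With MySQL, each worker could instead own its own connection and write its chunk directly, which parallelises the inserts as well as the fetches.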
clustering_categorical_data
Discover different segments of sessions that differ from each other in their navigational patterns before a product is added to the basket. You are free to differentiate your segments based on the category id or domain name of the products, if you feel it is necessary. Dimension reduction is also applied.
Language: Jupyter Notebook
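One way to sketch this, assuming each session is available as the list of category ids viewed before the first add-to-basket event: build a row-normalised session × category matrix, reduce it with PCA (computed via SVD), then cluster with k-means. The helper names and the deterministic farthest-first initialisation are illustrative choices, not necessarily what the notebook does.

```python
import numpy as np


def session_matrix(sessions, vocab):
    """Rows = sessions, columns = categories, cells = share of the session's pre-basket views."""
    index = {cat: j for j, cat in enumerate(vocab)}
    X = np.zeros((len(sessions), len(vocab)))
    for i, events in enumerate(sessions):
        for cat in events:
            X[i, index[cat]] += 1
    # Row-normalise so long and short sessions are comparable
    return X / np.maximum(X.sum(axis=1, keepdims=True), 1)


def pca(X, n_components):
    """Project centred data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(Z, k, n_iter=100):
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [Z[0]]
    for _ in range(1, k):
        dist = np.min([((Z - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(Z[np.argmax(dist)])
    C = np.array(centroids)
    for _ in range(n_iter):
        d = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Keep the old centroid if a cluster ends up empty
        new_C = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels
```

Swapping category ids for domain names only changes what goes into `vocab`.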
construct_sentence_with_string
It is used to test whether a given sentence can be constructed from the available strings.
Language: Python
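The description does not say how the strings may be used, so this sketch assumes the "ransom note" reading: the sentence is split on whitespace and each available string can be used at most once. `can_construct` is a hypothetical name.

```python
from collections import Counter


def can_construct(sentence: str, available: list[str]) -> bool:
    """True if every word of the sentence can be taken from `available`,
    using each available string at most once (assumed rule)."""
    needed = Counter(sentence.split())
    have = Counter(available)
    # Counter subtraction drops non-positive counts, so an empty result means nothing is missing
    return not (needed - have)
```

If strings could be reused, or concatenated without spaces, a word-break style dynamic program would be needed instead.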
credit_fraud_catboost
A CatBoost model is applied to an imbalanced dataset.
Language: Jupyter Notebook
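A hedged sketch of two steps that typically matter for an imbalanced fraud set: a stratified split that keeps the fraud rate equal in train and test, and a positive-class weight of #negatives / #positives passed to CatBoost's `scale_pos_weight` parameter. Whether the notebook uses this weighting, `auto_class_weights` or resampling is not stated; the helper names are hypothetical, labels are assumed to be 0/1, and `catboost` is imported lazily so the helpers run without it.

```python
import random
from collections import Counter


def stratified_split(labels, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) with the same class ratio in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test += idx[:n_test]
        train += idx[n_test:]
    return sorted(train), sorted(test)


def positive_class_weight(labels):
    """#negatives / #positives, so both classes carry equal total weight."""
    counts = Counter(labels)
    return counts[0] / counts[1]


def make_model(train_labels):
    from catboost import CatBoostClassifier  # lazy import: optional dependency
    return CatBoostClassifier(
        scale_pos_weight=positive_class_weight(train_labels),
        eval_metric="AUC",
        verbose=False,
    )
```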
data_analysis
In this notebook, I applied statistical methods to analyse an imbalanced dataset. It starts with the basics: null checks, data description and handling of missing values. The numerical columns are right-skewed; Shapiro-Wilk and Anderson-Darling tests are applied to show that the data are not normally distributed. Outlier detection with the IQR (interquartile range) method is applied to the numerical columns. A chi-square test is applied to the categorical columns to test whether their distributions differ across the target classes. Correlation analysis on the imbalanced dataset is performed using undersampling.
Language: Jupyter Notebook
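Two of these steps are easy to sketch with NumPy: the IQR rule for numerical outliers (values outside [Q1 - 1.5·IQR, Q3 + 1.5·IQR]) and random undersampling of the majority class before computing correlations. The 1.5 multiplier is the conventional Tukey choice and an assumption here, as are the function names. For the normality and chi-square tests, SciPy provides `scipy.stats.shapiro`, `scipy.stats.anderson` and `scipy.stats.chi2_contingency`.

```python
import numpy as np


def iqr_outliers(values, k=1.5):
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def undersample(y, seed=0):
    """Indices keeping every minority row and an equal-sized random subset of each other class."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n = counts.min()
    rng = np.random.default_rng(seed)
    keep = [rng.choice(np.flatnonzero(y == c), size=n, replace=False) for c in classes]
    return np.sort(np.concatenate(keep))
```

Correlations (e.g. with `np.corrcoef`) can then be computed on the rows selected by `undersample`, so the majority class does not dominate them.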
linkfire_data_analysis
Our goal is to understand this traffic better, in particular the volume and distribution of events, and to develop ideas for how to increase the links' click rates.
Language: Jupyter Notebook
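A small sketch of the click-rate metric, assuming a hypothetical event log in which each event is a `(link_id, event_type)` pair and the relevant types are `"pageview"` and `"click"`; the real dataset's column names and event types may differ.

```python
from collections import Counter, defaultdict


def click_rates(events):
    """Return {link_id: clicks / pageviews} for links with at least one pageview."""
    counts = defaultdict(Counter)
    for link_id, event_type in events:
        counts[link_id][event_type] += 1
    return {
        link: c["click"] / c["pageview"]
        for link, c in counts.items()
        if c["pageview"] > 0
    }
```

The same per-link `Counter` also gives the event volume per type, which covers the "volume and distribution of events" part of the question.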
navigation_pattern_estimation
Come up with a prescriptive model that can give directions on how to maximize the “Purchase Completed” probability of a session. For example: at which state of a session what kind of directions could be given to customers, and which patterns contribute most to the “Purchase Completed” probability.
Language: Jupyter Notebook
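One possible prescriptive approach, not necessarily the one in the notebook, treats sessions as an absorbing Markov chain: estimate transition probabilities between page states, compute each state's probability of eventually reaching "Purchase Completed", and recommend the observed next step with the highest such probability. It assumes every session ends in either "Purchase Completed" or a hypothetical "Exit" state; the other state names are made up.

```python
from collections import Counter, defaultdict

PURCHASE = "Purchase Completed"
EXIT = "Exit"  # assumed terminal state for sessions that end without a purchase


def transition_probs(sessions):
    """Estimate P(next state | state) from observed session paths."""
    counts = defaultdict(Counter)
    for path in sessions:
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()} for a, c in counts.items()}


def purchase_probability(P, n_iter=500):
    """Probability of eventually reaching PURCHASE from each state."""
    states = set(P) | {b for nxt in P.values() for b in nxt}
    p = {s: 1.0 if s == PURCHASE else 0.0 for s in states}
    # Fixed-point iteration; converges when every state can reach an absorbing state
    for _ in range(n_iter):
        for s in states:
            if s not in (PURCHASE, EXIT) and s in P:
                p[s] = sum(prob * p[t] for t, prob in P[s].items())
    return p


def best_next_step(state, P, p):
    """Among next states observed after `state`, the one most likely to lead to a purchase."""
    return max(P[state], key=lambda t: p[t])
```

The gap between a state's own probability and that of its best next step is one way to rank which patterns contribute most to completed purchases.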
python_hive_connection
Writing a pandas DataFrame to a Hive database using the PyHive library. Kerberos authentication is used to access the cluster.
Language: Jupyter Notebook
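A hedged sketch of the same idea with PyHive: open a Kerberos-authenticated HiveServer2 connection (a valid ticket from `kinit` is assumed) and insert DataFrame rows in batches with multi-row `INSERT INTO TABLE ... VALUES` statements, letting PyHive fill the `%s` placeholders. Host, port, table and batch size are placeholders, the target table is assumed to exist, and only the SQL-building helper can run without a cluster.

```python
def build_insert(table, rows):
    """Build one multi-row INSERT with %s placeholders and its flat parameter list."""
    rows = [tuple(r) for r in rows]
    row_sql = "(" + ", ".join(["%s"] * len(rows[0])) + ")"
    sql = f"INSERT INTO TABLE {table} VALUES " + ", ".join([row_sql] * len(rows))
    params = [v for r in rows for v in r]
    return sql, params


def write_df(df, table, host, port=10000, database="default", batch_size=1000):
    """Write a pandas DataFrame to an existing Hive table over a Kerberized connection."""
    from pyhive import hive  # requires PyHive with its SASL/Kerberos extras installed

    conn = hive.Connection(
        host=host,
        port=port,
        database=database,
        auth="KERBEROS",
        kerberos_service_name="hive",
    )
    try:
        cursor = conn.cursor()
        # astype(object) turns NumPy scalars into plain Python values before escaping
        records = list(df.astype(object).itertuples(index=False, name=None))
        for i in range(0, len(records), batch_size):
            sql, params = build_insert(table, records[i:i + batch_size])
            cursor.execute(sql, params)
    finally:
        conn.close()
```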
python_hive_sqlalchemy_connection
I will show how to connect to a Kerberized Hadoop cluster using the SQLAlchemy library. A connection engine is created and used to write a DataFrame to the database.
Language: Jupyter Notebook
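A hedged sketch with SQLAlchemy, relying on the `hive://` dialect that PyHive registers: the Kerberos options travel through `connect_args` to the underlying PyHive connection, and pandas' `DataFrame.to_sql` writes through the engine. Host, port, database and table names are placeholders, a valid Kerberos ticket is assumed, and the target table is assumed to exist (`if_exists="append"`).

```python
def hive_url(host, port=10000, database="default"):
    """SQLAlchemy URL for the PyHive 'hive' dialect."""
    return f"hive://{host}:{port}/{database}"


def write_df(df, table, host, port=10000, database="default", chunksize=1000):
    from sqlalchemy import create_engine  # needs sqlalchemy and pyhive installed

    engine = create_engine(
        hive_url(host, port, database),
        connect_args={"auth": "KERBEROS", "kerberos_service_name": "hive"},
    )
    # method="multi" packs each chunk into one multi-row INSERT, which is usually
    # much faster on Hive than pandas' default per-row inserts
    df.to_sql(table, engine, if_exists="append", index=False,
              chunksize=chunksize, method="multi")
    engine.dispose()
```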

e181337's Repositories

e181337/python_hive_connection
Language: Jupyter Notebook
e181337/anagram_check
Language: Python
e181337/bigquery_mysql_connect
Language: Jupyter Notebook
e181337/clustering_categorical_data
Language: Jupyter Notebook
e181337/construct_sentence_with_string
Language: Python
e181337/credit_fraud_catboost
Language: Jupyter Notebook
e181337/data_analysis
Language: Jupyter Notebook
e181337/linkfire_data_analysis
Language: Jupyter Notebook
e181337/navigation_pattern_estimation
Language: Jupyter Notebook
e181337/python_hive_sqlalchemy_connection
Language: Jupyter Notebook
e181337/data_enhancement
Data quality: How would you improve the quality of this dataset, and what are your main conclusions about its data quality? What interventions did you make on the dataset before analysing it further? What did you learn?
Language: Jupyter Notebook
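A first data-quality pass can be sketched with pandas: per-column dtype, missing share and distinct counts, plus the number of fully duplicated rows. The function names and the chosen checks are illustrative; the interventions actually made in the notebook are not described here.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, share of missing values and number of distinct non-null values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_share": df.isna().mean(),
        "n_unique": df.nunique(dropna=True),
    })


def duplicate_rows(df: pd.DataFrame) -> int:
    """Count rows that exactly repeat an earlier row."""
    return int(df.duplicated().sum())
```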
e181337/e181337
Config files for my GitHub profile.
e181337/LSTM_binary_classification
Example LSTM structure for binary classification.
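To make the "LSTM structure" concrete, here is a NumPy forward pass of a single LSTM layer followed by a sigmoid output unit, the usual shape of a binary classifier. The weights are random placeholders and nothing is trained; the repository presumably uses a deep-learning framework (for example Keras `LSTM` followed by `Dense(1, activation="sigmoid")` with binary cross-entropy), which is an assumption.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_binary_forward(x, params):
    """x: (timesteps, features). Returns P(class = 1) from the last hidden state."""
    W, U, b, w_out, b_out = params
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in x:
        z = W @ x_t + U @ h + b          # all four gates at once, shape (4 * hidden,)
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                # cell state: keep part of the old, add the new candidate
        h = o * np.tanh(c)               # hidden state passed to the next step
    return sigmoid(w_out @ h + b_out)


def init_params(features, hidden, seed=0):
    """Random placeholder weights for the input, recurrent and output layers."""
    rng = np.random.default_rng(seed)
    return (
        rng.normal(0, 0.1, (4 * hidden, features)),
        rng.normal(0, 0.1, (4 * hidden, hidden)),
        np.zeros(4 * hidden),
        rng.normal(0, 0.1, hidden),
        0.0,
    )
```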
e181337/top_seller_class
Write a Python class using pandas that finds and prints, for a given date range:
- the top n best-selling products (product name & quantity)
- the top n best-selling stores (store name & quantity)
- the top n best-selling brands (brand & quantity)
- the top n best-selling cities (city & quantity)
Language: Python
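A minimal sketch of such a class, assuming a sales table with the hypothetical columns `date`, `product`, `store`, `brand`, `city` and `quantity`; the class and method names are illustrative. All four rankings share one helper that filters by date, sums quantities per key and keeps the n largest.

```python
import pandas as pd


class TopSeller:
    """Top-n rankings by quantity sold within a date range (assumed column names)."""

    def __init__(self, sales: pd.DataFrame):
        self.sales = sales.assign(date=pd.to_datetime(sales["date"]))

    def _top(self, key: str, n: int, start, end) -> pd.Series:
        # between() is inclusive on both ends
        in_range = self.sales["date"].between(pd.Timestamp(start), pd.Timestamp(end))
        return self.sales.loc[in_range].groupby(key)["quantity"].sum().nlargest(n)

    def top_products(self, n, start, end):
        return self._top("product", n, start, end)

    def top_stores(self, n, start, end):
        return self._top("store", n, start, end)

    def top_brands(self, n, start, end):
        return self._top("brand", n, start, end)

    def top_cities(self, n, start, end):
        return self._top("city", n, start, end)

    def print_all(self, n, start, end):
        for label, key in [("products", "product"), ("stores", "store"),
                           ("brands", "brand"), ("cities", "city")]:
            print(f"Top {n} {label} from {start} to {end}:")
            print(self._top(key, n, start, end).to_string())
```

If `date` carries times of day, an end date such as "2021-01-31" means midnight, so later sales on that day fall outside the range.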