Addicition-Prediction: A Python repository from aviban12

Problem statement: Develop a machine learning model that predicts the probability of drug addiction from a set of features.

Approach:
	Data Preprocessing:
		1 - Sort the entire dataset by year.
		2 - Create the following lists:
			 List1 - In the LocationDesc column, assign a unique index to every new value.
			 List2 - Manually extract the intoxication-related words from the Greater_Risk_Question and Description columns.
			 List3 - In the Sex column, mark "Male" as 1, "Female" as 2, and invalid or NaN values as 0.
			 List4 - In the SatisfactionType column, assign a unique index to every value.
		3 - In the GeoLocation column, separate longitude and latitude into two lists:
			 List5 - Longitude
			 List6 - Latitude
		4 - In the Question Code column, remove the alphabetic characters from each code.
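
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the code in Run.py: the sample rows, the "Year" and "QuestionCode" keys, the helper function names, and the "(latitude, longitude)" string layout of GeoLocation are all assumptions made for the example. It uses plain Python and re so that it runs without extra dependencies, although the repository itself lists Pandas.

```python
import re

# Hypothetical sample rows; the real dataset's values and keys may differ.
rows = [
    {"Year": 2017, "LocationDesc": "Arizona", "Sex": "Female",
     "GeoLocation": "(34.86, -111.76)", "QuestionCode": "H42"},
    {"Year": 2015, "LocationDesc": "Alabama", "Sex": "Male",
     "GeoLocation": "(32.84, -86.63)", "QuestionCode": "H45"},
    {"Year": 2015, "LocationDesc": "Arizona", "Sex": None,
     "GeoLocation": "", "QuestionCode": "QN49"},
]


def encode_unique(values):
    """Give each distinct value an integer index, in order of first appearance."""
    index = {}
    return [index.setdefault(value, len(index)) for value in values]


def encode_sex(value):
    """Male -> 1, Female -> 2, anything else (missing or invalid) -> 0."""
    return {"Male": 1, "Female": 2}.get(value, 0)


def split_geolocation(text):
    """Return (longitude, latitude), assuming a '(latitude, longitude)' string."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text or "")
    if len(numbers) != 2:
        return None, None
    latitude, longitude = map(float, numbers)
    return longitude, latitude


def strip_letters(code):
    """Remove alphabetic characters from a question code."""
    return re.sub(r"[A-Za-z]", "", code)


# Step 1: sort by year (list.sort is stable, so ties keep their order).
rows.sort(key=lambda row: row["Year"])

# Step 2: encoded lists.
list1 = encode_unique(row["LocationDesc"] for row in rows)
list3 = [encode_sex(row["Sex"]) for row in rows]

# Step 3: separate longitude and latitude.
coordinates = [split_geolocation(row["GeoLocation"]) for row in rows]
list5 = [longitude for longitude, _ in coordinates]
list6 = [latitude for _, latitude in coordinates]

# Step 4: numeric part of each question code.
question_numbers = [strip_letters(row["QuestionCode"]) for row in rows]
```

Rows with a missing or malformed GeoLocation get None for both coordinates here; how Run.py treats such rows is not stated in this README.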

	Feature Extraction:
		1 -  Count the number of intoxication-related words in the Greater_Risk_Question and Description columns.
		     Assumption - If a person consumes several different types of drugs, the probability of becoming addicted may increase.
		2 -  Save all the preprocessed and extracted data to a CSV file.
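
As an illustration of the word-count feature, the sketch below counts keyword occurrences across the two text columns. The keyword list and the function name count_intoxication_terms are hypothetical; the README only says the real list was compiled manually, and the repository may use TextBlob rather than a regular expression for this step.

```python
import re

# Hypothetical keyword list, for illustration only.
INTOXICATION_TERMS = ["alcohol", "marijuana", "cocaine", "heroin", "cigarette"]

# Match any keyword as a whole word, with an optional plural "s".
TERM_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, INTOXICATION_TERMS)) + r")s?\b",
    re.IGNORECASE,
)


def count_intoxication_terms(*texts):
    """Count keyword occurrences across all given text fields."""
    return sum(len(TERM_PATTERN.findall(text or "")) for text in texts)


# Example with made-up question and description strings.
count = count_intoxication_terms(
    "Ever used marijuana",
    "Used cocaine or heroin one or more times",
)
print(count)  # 3
```
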
	Algorithm Applied:
		1 - Linear Regression gives an accuracy of 68.53 on the provided test cases.
		2 - RandomForestRegressor gives an accuracy of 93.58 on the provided test cases.
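
A comparison of the two models could look roughly like the sketch below. It assumes scikit-learn is installed (it is not in the tools list of this README) and uses synthetic data, so the printed scores will not match the figures above. It also assumes that "accuracy" refers to the regressor's score() value, which in scikit-learn is the R^2 coefficient; the README does not confirm how the figures were computed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed feature table (not the real data).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(400, 3))
# Target with a step in the first feature, which a linear model fits poorly.
y = (X[:, 0] > 0.5).astype(float) + 0.1 * X[:, 1]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

linear = LinearRegression().fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# score() returns R^2 on the held-out data for both regressors.
linear_score = linear.score(X_test, y_test)
forest_score = forest.score(X_test, y_test)
print(f"LinearRegression R^2: {linear_score:.4f}")
print(f"RandomForestRegressor R^2: {forest_score:.4f}")
```

On data like this, the random forest scores higher because it can model the non-linear step, which is one plausible reason for a gap between the two models.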

Feature Engineering:
	Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work.
	The given dataset contains two types of data:
		1 - Text Data
		2 - Numeric Data
	From the text data we determine how many drugs a person is consuming and in what amount. The greater the drug count and the sample size, the greater the likelihood of addiction.

IDE Used:
	PyCharm Edu
Operating System:
	Linux (Ubuntu 18.04)
Tools Used:
	NumPy
	Pandas
	TextBlob
	re
Execution: Run the Run.py file on Linux with the command python3 Run.py.
