SDV-Project---F.E.-Process-Helper

Sycain Vizerfur (continuously updated)

BRIEF INTRODUCTION:

This package supports the feature engineering process by making it easier to inspect and process raw data. (v2: adds an NLP package named `lp`, which provides functions for processing natural language text.)

COMPONENTS:

`fea_eng` : Main body of the SDV package, containing all feature engineering functions.

`lp` : Support functions for natural language processing.

`support` : Support functions for `fea_eng`.

Functions:

Function list:

`fea_eng`: ['del_unique_col','del_none_col','find_mul_class_col','translate','none_values_description','one_hot_encoder','data_info_desc']

`lp`: ['del_punctuation','stmmerized','del_stopwords','text_fully_process']

Demonstration Sample: example.ipynb

Functions Description:

Fea_eng:

| No. | Function Name | Description |
| --- | --- | --- |
| 1 | `del_unique_col(df)` | Deletes columns that contain only a single value. |
| 2 | `del_none_col(df,threshold = 0.5)` | Deletes columns in which the number of none values is bigger than the threshold. |
| 3 | `find_mul_class_col(df,threshold_of_category_num = 100,rand_num = 500)` | Filters out the object features that contain too many categories. Also analyzes the information in each object feature using regular expressions, dividing it into three types: numbers, letters, and others. Returns a dataframe with 5 columns. The composition of the data is measured as a percentage. |
| 4 | `translate(s)` | Translation function. If the input `s` is a string, returns the translated string. If `s` is a dataframe, returns a dataframe with the translations of `s.columns`. |
| 5 | `none_values_description(df,filter_ = False)` | Counts the `np.nan` values of the dataframe and returns them as percentages. |
| 6 | `one_hot_encoder(df,encode_list = [])` | Returns a processed dataframe. If `encode_list` (default: empty list) is not specified, the function processes all object features of the input dataframe. Otherwise, it processes only the features given in `encode_list`. |
| 7 | `data_info_desc(df)` | An all-round description of the dataframe, built as a collection of `SDV.fea_eng` functions. |
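
The column-cleaning steps described in the table can be sketched with plain pandas. This is a minimal illustration, not the package's implementation: the helper names (`drop_single_value_columns`, `missing_ratio`, `drop_sparse_columns`) are hypothetical, and it assumes the threshold is the fraction of missing values in a column, which the description above does not state explicitly.

```python
import numpy as np
import pandas as pd


def drop_single_value_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only columns with more than one distinct value (NaN counts as a value)."""
    keep = [col for col in df.columns if df[col].nunique(dropna=False) > 1]
    return df[keep]


def missing_ratio(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, between 0.0 and 1.0."""
    return df.isna().mean()


def drop_sparse_columns(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Drop columns whose missing fraction exceeds the threshold (assumed meaning)."""
    ratio = missing_ratio(df)
    return df.loc[:, ratio <= threshold]


# Small made-up dataframe, for illustration only.
df = pd.DataFrame({
    "constant": [1, 1, 1, 1],
    "mostly_missing": [np.nan, np.nan, np.nan, 2.0],
    "city": ["Paris", "Rome", "Paris", "Oslo"],
})

cleaned = drop_sparse_columns(drop_single_value_columns(df), threshold=0.5)

# One-hot encode the remaining object column with pandas' built-in helper.
encoded = pd.get_dummies(cleaned, columns=["city"])
print(list(encoded.columns))
```

Here `constant` is dropped because it holds a single value, and `mostly_missing` is dropped because three of its four values are missing, leaving only `city` to be encoded.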

lp:

| No. | Function Name | Description |
| --- | --- | --- |
| 1 | `del_punctuation(text,language = 'english')` | [input: string, output: list] Takes a long string and returns a list of words without punctuation and `\n` (newline). Works for both Chinese and English. Args: `language`: 'english' (default) or 'chinese'. |
| 2 | `stmmerized(word_list,language = 'english')` | [input: list, output: list] A stemmer extracts the root form of a given word; in short, it reduces an inflected word to its simplest form. Args: `language`: 'english' (default) or 'chinese'. If `language == 'chinese'`, the original `word_list` is returned unchanged. |
| 3 | `del_stopwords(word_list)` | Deletes the stopwords from the word list and returns a new list. |
| 4 | `text_fully_process(text,language)` | [input: string, output: list] Integration of the language processing steps. Args: `language`: 'english' (default) or 'chinese'. |
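
The punctuation and stopword steps can be sketched with the Python standard library alone. This is only an illustration of the idea for English text: the function names, the tiny `STOPWORDS` set, and the ASCII-only punctuation handling are assumptions, not the package's code, and stemming and Chinese text are left out.

```python
import string

# Tiny illustrative stopword set; an assumption, not the package's list.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def strip_punctuation(text):
    """Replace ASCII punctuation with spaces, then split on whitespace.

    Splitting on whitespace also removes newline characters.
    """
    table = str.maketrans({ch: " " for ch in string.punctuation})
    return text.translate(table).split()


def remove_stopwords(words):
    """Return a new list without stopwords (case-insensitive match)."""
    return [w for w in words if w.lower() not in STOPWORDS]


def process_text(text):
    """Chain the two steps: punctuation removal, then stopword removal."""
    return remove_stopwords(strip_punctuation(text))


tokens = process_text("The quick, brown fox\njumps over the lazy dog!")
print(tokens)
# → ['quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog']
```

A stemming step, as described for `stmmerized`, would sit between the two functions and typically relies on an external NLP library.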
