Rental-Listing-Inquiries

Kaggle Two Sigma Connect: Rental Listing Inquiries

Primary Language: Python

After studying the solutions shared by others, especially Little Boat's: simple but effective, amazing!

I spent a lot of time on feature engineering (FE), and submitted more times than Faron, Little Boat, et al.

Many thanks to KazAnova, who made the leak public; otherwise I couldn't have done this well. You are my idol.

My English is poor, so if anything I write is confusing, please tell me. Many thanks! haha..

FE:

I split the base features into two classes:

manager: created, description, price, etc.

building: bathrooms, bedrooms, latitude, longitude, display_address, features, photos, etc.

Then I link and compare them one by one, as in the sketch below.
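A minimal sketch of this link-and-compare idea, assuming a pandas DataFrame loaded from train.json; the derived feature names here are illustrative, not the exact ones produced by feature_tt.py:

```python
import pandas as pd

df = pd.read_json("train.json")

# Link: statistics of a "building"-class feature grouped by a "manager"-class key.
df["manager_price_median"] = df.groupby("manager_id")["price"].transform("median")

# Compare: how a listing's price sits relative to its manager's typical price.
df["price_vs_manager"] = df["price"] / df["manager_price_median"]

# Another link: how many distinct buildings each manager handles.
df["manager_building_count"] = df.groupby("manager_id")["building_id"].transform("nunique")
```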

My best single model is at: https://github.com/plantsgo/Rental-Listing-Inquiries

Score: Public 0.50379, Private 0.50500

You should put train.json and test.json in the folder.

jpgs.json contains the shape of each photo.
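jpgs.json could be rebuilt with something like the sketch below; the images folder layout and the JSON key choice are assumptions, not the exact script used:

```python
import json
from pathlib import Path

from PIL import Image

shapes = {}
for path in Path("images").rglob("*.jpg"):  # assumed image folder layout
    with Image.open(path) as img:
        shapes[path.name] = img.size  # (width, height)

with open("jpgs.json", "w") as f:
    json.dump(shapes, f)
```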

listing_image_time.csv is the leak @KazAnova mentioned.

1. Run sigma.py to create the CSV file.

2. Run script.py to create the features that @gdy5 showed.

3. Run feature_tt.py to create the base features.

4. Run feature_tt_long.py to create four features that take a long time (about four hours). I have already provided the output as timeout.csv, so you can skip this step.

5. Run xgb.py to create the final result.
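The five steps can be chained with a small driver script; a sketch (script names are the ones listed above, and step 4 is skipped when timeout.csv already exists):

```python
import os
import subprocess

subprocess.run(["python", "sigma.py"], check=True)        # 1. create the CSV file
subprocess.run(["python", "script.py"], check=True)       # 2. @gdy5's features
subprocess.run(["python", "feature_tt.py"], check=True)   # 3. base features
if not os.path.exists("timeout.csv"):                     # 4. skip if precomputed
    subprocess.run(["python", "feature_tt_long.py"], check=True)
subprocess.run(["python", "xgb.py"], check=True)          # 5. final result
```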

Transforms:

1. X (raw)

2. log10(X+1)

My best NN model used log10(X+1) and scored 0.535 on the LB before adding the magic feature.
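As a sketch, the second transform is simply (assuming non-negative feature values):

```python
import numpy as np

def log10_plus1(X):
    # Transform 2: log10(X + 1) compresses heavy-tailed features such as price.
    # Assumes non-negative inputs; transform 1 is just the raw X.
    return np.log10(X + 1)
```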

Ensemble:

Level 2:

I have 4 datasets:

1. My best single model.

2. Some features that did not improve my best model, but did improve a model built on the base features.

3. @gdy5's kernel with some of my features.

4. @Branden Murray's kernel plus some of my features.

①: On each dataset I used [xgb, nn, gb, rf, et, lr, xgb_reg, lgb_reg, nn_reg], CV folds = 5 (see the out-of-fold sketch after this list).

The regression models carry high importance in my ensemble.

②: I merged the high and medium interest levels, then used [lgb, nn, lgb_reg, nn_reg, et, rf] on my best dataset, CV folds = 5.

③: [nn, nn_reg, xgb, gb, rf, et, lr, xgb_reg] on the last three datasets, CV folds = 5.

④: [nn, nn_reg, xgb, gb, rf, et, lr, xgb_reg] with the magic feature added, on the last three datasets, CV folds = 5.

⑤: [nn, nn_reg, xgb, knn, gb, rf, et, lr, ada_reg, rf_reg, gb_reg, et_reg, xgb_reg] on the last three datasets, CV folds = 10.
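A minimal sketch of the out-of-fold procedure behind each of these runs, assuming numpy arrays X, y, X_test; the RandomForestClassifier here just stands in for any one member of the model lists above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

def oof_predictions(model, X, y, X_test, n_folds=5, n_classes=3):
    """Out-of-fold train predictions plus fold-averaged test predictions."""
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    oof = np.zeros((len(X), n_classes))
    test_pred = np.zeros((len(X_test), n_classes))
    for train_idx, valid_idx in skf.split(X, y):
        model.fit(X[train_idx], y[train_idx])
        oof[valid_idx] = model.predict_proba(X[valid_idx])  # meta-feature rows
        test_pred += model.predict_proba(X_test) / n_folds  # average over folds
    return oof, test_pred

# e.g. the rf member of one meta-feature set, CV folds = 5:
# oof_rf, test_rf = oof_predictions(RandomForestClassifier(), X, y, X_test)
```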

Level 3:

1. Use ①, ②, ③, ④ as meta-features with xgb, nn, et.

Plus one feature derived from the description: classify the listing source by what the description begins with:

`" "` (a single space)

`CooperCooper.com`

`<p><a website_redacted`

`<li> </ul> </p>`

It improved the public score but hurt the private score. Maybe it should be removed.
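A hedged sketch of that description-source feature; the prefix strings below are reconstructed from the list above and are hypothetical, not the exact patterns:

```python
def description_source(description):
    # Hypothetical prefix rules; the real patterns include a redacted website.
    d = description or ""
    if d.startswith(" "):
        return 1  # begins with a plain space
    if d.startswith("CooperCooper.com"):
        return 2
    if d.startswith("<p><a"):
        return 3  # e.g. "<p><a website_redacted ..."
    if d.startswith("<li>"):
        return 4  # HTML-list style descriptions
    return 0  # everything else
```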

pred = ((xgb^0.65) * (nn^0.35)) * 0.85 + et * 0.15, then I used @weiwei's prior correction, but it only improved the score by 0.00001-0.00002.

2. Use ①, ②, ⑤ as meta-features with xgb, nn, et. pred = ((xgb^0.65) * (nn^0.35)) * 0.85 + et * 0.15 (a sketch of this blend follows).
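A sketch of that blend, where xgb, nn, and et are class-probability arrays; renormalizing the geometric-mean rows so they stay valid probabilities is my assumption:

```python
import numpy as np

def blend(xgb, nn, et):
    # Weighted geometric mean of xgb and nn, then a linear mix with et.
    geo = (xgb ** 0.65) * (nn ** 0.35)
    geo /= geo.sum(axis=1, keepdims=True)  # rows back to probabilities
    return geo * 0.85 + et * 0.15
```

@weiwei's prior correction is applied after this blend; it is not shown here.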

Level 4: a 50/50 average of the two Level 3 predictions.

Lastly, thanks to everyone who shared; I learned a lot from the kernels and discussions.