Laptop price prediction with machine learning

Java for Data Science: Topic 4

Author: Fabian Kuhn

Date: 2020-06-07

This project shows how data can be scraped using Selenium. The aim is to collect information about laptops from Galaxus and use it to predict prices with machine learning. After scraping, the data is prepared and then analysed with RapidMiner.

Documentation

  • Presentation documentation: TI-Präsentation.pdf
  • SQL Results in H2
    • Install Dependencies: $ mvn install
    • Run Server: $ mvn spring-boot:run
    • Open H2 console: H2-Console
    • JDBC URL: jdbc:h2:./db_scrape
    • Username: test
    • Password: <empty>
  • SQL Raw Data Export: PRODUCTRAW.sql
  • SQL Processed Data Export: PRODUCT.sql

Project structure

├── _docs
│   ├── TI-Präsentation.pdf
│   └── model.png
├── db_scrape.mv.db
├── pom.xml
└── src
    └── main
        ├── java
        │   └── ch
        │       └── zhaw
        │           └── ti
        │               ├── TiApplication.java
        │               ├── product
        │               │   ├── Product.java
        │               │   ├── ProductConverter.java
        │               │   └── ProductRepository.java
        │               ├── productRaw
        │               │   ├── ProductRaw.java
        │               │   ├── ProductRawRepository.java
        │               │   └── ProductRawService.java
        │               └── scrape
        │                   ├── ProductScraper.java
        │                   ├── UrlScraper.java
        │                   └── WebDriverConfig.java
        └── resources
            ├── PRODUCT.sql
            ├── PRODUCTRAW.sql
            ├── application.properties
            └── geckodriver

Gradient Boost Model

[Figure: Gradient Boost Model]

H2 Queries Used

Write products to CSV

call CSVWRITE ( '/Users/fabiankuhn/Desktop/products.txt', 'SELECT * FROM Product' ) 

Export SQL Statements

script simple columns to '/Users/fabiankuhn/Desktop/Product.sql' table "Product";

Number of features

SELECT count(*) FROM INFORMATION_SCHEMA.Columns where TABLE_NAME = 'PRODUCTRAW'

See progress while scraping (percentage of rows with a display resolution)

select (100.0*count(displayResolution))/count(*) from productraw
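
The query above relies on SQL's count(column) counting only non-NULL values, while count(*) counts every row. The same calculation can be sketched in plain Java over an in-memory list. The class ScrapeProgress and the method percentScraped are hypothetical names used for illustration and are not part of the project.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical helper, not part of the project. It mirrors the query
//   select (100.0*count(displayResolution))/count(*) from productraw
// for a list holding one displayResolution value per row.
final class ScrapeProgress {

    // A null entry stands for a row whose displayResolution is still NULL.
    static double percentScraped(List<String> displayResolutions) {
        if (displayResolutions.isEmpty()) {
            return 0.0; // avoid dividing by zero for an empty table
        }
        // count(displayResolution): only the non-null values
        long nonNull = displayResolutions.stream().filter(Objects::nonNull).count();
        // count(*): all rows
        return 100.0 * nonNull / displayResolutions.size();
    }
}
```

For a list with two filled and two null entries, percentScraped returns 50.0.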

See progress as finished rows vs. total rows

SELECT 
    (select count(*) 
    from productraw
    where displayresolution is not null or deviceweight is not null) AS finished,
    count(*) as sum
FROM
    productraw

ModelMapper

ModelMapper is used for data conversion. Specifically, a Converter maps the attributes of a ProductRaw to a new Product entity.
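
To make the idea concrete, here is a minimal plain-Java sketch of such a conversion. It deliberately does not use the ModelMapper API. The field names deviceWeight and gewicht are borrowed from the queries and getters in this README, but the class layout, the helper parseWeightKg and the raw text format ("1.25 kg", "980 g") are assumptions made for illustration only.

```java
import java.util.Locale;
import java.util.function.Function;

// Plain-Java sketch of what a converter does; it does not use the ModelMapper API.
// Class layout and the raw text format ("1.25 kg", "980 g") are assumptions.
final class ConverterSketch {

    // Raw entity: values exactly as scraped, all text.
    static final class ProductRaw {
        final String deviceWeight;
        ProductRaw(String deviceWeight) { this.deviceWeight = deviceWeight; }
    }

    // Processed entity: typed values ready for analysis.
    static final class Product {
        final Double gewicht; // weight in kilograms, null if unknown
        Product(Double gewicht) { this.gewicht = gewicht; }
    }

    // Parses "1.25 kg" or "980 g" into kilograms.
    // Returns null for missing or unparseable input.
    static Double parseWeightKg(String raw) {
        if (raw == null) {
            return null;
        }
        String text = raw.trim().toLowerCase(Locale.ROOT);
        try {
            if (text.endsWith("kg")) {
                return Double.parseDouble(text.substring(0, text.length() - 2).trim());
            }
            if (text.endsWith("g")) {
                return Double.parseDouble(text.substring(0, text.length() - 1).trim()) / 1000.0;
            }
        } catch (NumberFormatException e) {
            return null;
        }
        return null;
    }

    // The conversion itself: read from the source, build the destination.
    static final Function<ProductRaw, Product> CONVERTER =
            raw -> new Product(parseWeightKg(raw.deviceWeight));
}
```

Calling ConverterSketch.CONVERTER.apply(new ConverterSketch.ProductRaw("980 g")) yields a Product whose gewicht is 0.98. A ModelMapper Converter follows the same pattern, reading from context.getSource() and returning the destination object.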

Converter

https://www.gitmemory.com/issue/modelmapper/modelmapper/464/490810070

Converter as a lambda expression

// Converter written as a lambda expression
Converter<ProductRaw, Product> converter = context -> {

    // Get source and destination
    ProductRaw productRaw = context.getSource();
    Product product = context.getDestination();

    // Copy or transform the required attributes here, then return the destination
    return product;
};

modelMapper.emptyTypeMap(ProductRaw.class, Product.class).setConverter(converter);

Converter as a full implementation of the Converter interface

Converter<ProductRaw, Product> converter2 = new Converter<>() {

    @Override
    public Product convert(MappingContext<ProductRaw, Product> context) {
        // Get source and destination
        ProductRaw productRaw = context.getSource();
        Product product = context.getDestination();

        // Copy or transform the required attributes here, then return the destination
        return product;
    }
};

modelMapper.addConverter(converter2);

Skip properties

PropertyMap<ProductRaw, Product> skipMap = new PropertyMap<ProductRaw, Product>(){
    @Override
    protected void configure() {
        skip(destination.getPreis());
        skip(destination.getGewicht());
    }
};

Add skipMap to an existing TypeMap

typeMap.addMappings(skipMap);

Create a TypeMap and add a Converter in addition to its mappings

TypeMap<ProductRaw, Product> typeMap = modelMapper.createTypeMap(ProductRaw.class, Product.class);
typeMap.setPreConverter(converter);