
GitScraper

An app that scrapes user data from GitHub.


Overview

The challenge

Users should be able to:

  • Retrieve API data about a GitHub user by pinging a REST endpoint.
  • See the user data displayed as JSON.

UML Diagram

Architecture Diagram

Screenshot

My process

Built With

  • Spring Boot v2.7.7
    • Spring Web
    • Spring Cache
    • Spring Retry v1.3.2
    • Spring AOP (required dependency for Retry)
  • Java 11
  • Project Lombok
  • Testing
    • JUnit 5
    • AssertJ v3.24.1
  • Springfox API Documentation v3.0.0
  • GitHub REST API

How to Scrape Data - Native

  1. Start the app with the Gradle wrapper: ./gradlew bootRun
    • On Windows, run: gradlew.bat bootRun
  2. Call the REST endpoint with: curl -v localhost:8080/scraper/api/v1/git/${username} | json_pp, or use Postman.
    • Replace ${username} with a valid GitHub username.
  3. The endpoint returns the requested user data as JSON.
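To call the endpoint from Java instead of curl, here is a minimal sketch using the JDK 11 java.net.http.HttpClient. It assumes the app is running locally on port 8080 with the path shown above; the ScraperClient class, its helper methods, and the loose username check are hypothetical illustrations, not part of this project.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class ScraperClient {

    // Assumes the app runs locally on port 8080, as in the steps above.
    static final String BASE_URL = "http://localhost:8080/scraper/api/v1/git/";

    // Builds the endpoint URI for a username. The pattern is a loose, hypothetical
    // check (letters, digits, hyphens) that keeps the path segment URL-safe.
    static URI userUri(String username) {
        if (username == null || !username.matches("[A-Za-z0-9-]+")) {
            throw new IllegalArgumentException("Invalid GitHub username: " + username);
        }
        return URI.create(BASE_URL + username);
    }

    // Sends a GET request and returns the JSON body, failing on non-200 responses.
    static String fetchUser(HttpClient client, String username) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(userUri(username))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected status: " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Falls back to GitHub's sample account when no username is given.
        String username = args.length > 0 ? args[0] : "octocat";
        System.out.println(fetchUser(HttpClient.newHttpClient(), username));
    }
}
```

Run it with a username as the first argument while the app is up; the output is the same JSON the curl command prints.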

How to Scrape Data - Docker (Optional)

  1. Ensure Docker is installed; if it isn't, follow Docker's official installation guide.
  2. Pull the image from my Docker Hub repository: docker pull belum/spring-git-scraper:latest
  3. Check that the image was downloaded successfully: docker images
  4. Run the container with: docker run -it -p 8080:8080 belum/spring-git-scraper:latest
  5. Call the endpoint as described in step 2 of the Native instructions.

What I Learned

I learned how to get Jackson to serialize Java 8 Date/Time (java.time) types by adding the jackson-datatype-jsr310 module and registering its JavaTimeModule.

    implementation 'com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.13.4'
    testRuntimeOnly 'com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.13.4'
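
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ApplicationConfig {
  // Declaring this bean replaces Spring Boot's auto-configured ObjectMapper.
  @Bean
  public ObjectMapper objectMapper() {
    ObjectMapper mapper = new ObjectMapper();
    // Adds serializers/deserializers for java.time types (LocalDate, Instant, ...).
    // Unless SerializationFeature.WRITE_DATES_AS_TIMESTAMPS is disabled, they are
    // written as numeric timestamps or arrays rather than ISO-8601 strings.
    mapper.registerModule(new JavaTimeModule());
    return mapper;
  }
}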
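For context, GitHub's REST API returns timestamps such as a user's created_at as ISO-8601 strings in UTC. The JDK-only sketch below shows the equivalent java.time conversion for such a value, assuming the model stores the field as an Instant; the GitTimestamps class and the sample timestamp are illustrative only.

```java
import java.time.Instant;
import java.time.ZoneOffset;

class GitTimestamps {

    // Parses a GitHub-style ISO-8601 UTC timestamp (e.g. a created_at value).
    static Instant parse(String isoTimestamp) {
        return Instant.parse(isoTimestamp);
    }

    public static void main(String[] args) {
        Instant createdAt = parse("2011-01-25T18:44:36Z"); // illustrative value
        // Instant.toString() formats back to the same ISO-8601 string.
        System.out.println(createdAt);                                    // 2011-01-25T18:44:36Z
        System.out.println(createdAt.atOffset(ZoneOffset.UTC).getYear()); // 2011
    }
}
```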

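I learned how to cache the data in memory using Spring Cache, so repeated requests for the same username are served from the cache instead of calling the GitHub API again.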

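    implementation 'org.springframework.boot:spring-boot-starter-cache'

import java.util.List;
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.concurrent.ConcurrentMapCache;
import org.springframework.cache.support.SimpleCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableCaching
public class ApplicationConfig {
  // Registers two in-memory caches; ConcurrentMapCache entries never expire on their own.
  @Bean
  public CacheManager cacheManager() {
    SimpleCacheManager cacheManager = new SimpleCacheManager();
    cacheManager.setCaches(List.of(
            new ConcurrentMapCache("users"),
            new ConcurrentMapCache("repos")
    ));
    return cacheManager;
  }
}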
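
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.stereotype.Component;

@Component
public class GithubClientImpl implements GithubClient {

  // Builds the request entity (headers only, no body) used for calls to the GitHub API.
  private HttpEntity<String> httpEntity() {
    HttpHeaders headers = new HttpHeaders();
    // Note: "public" and "s-maxage" are HTTP response directives, so on an outgoing
    // request they have little effect; the in-app caching comes from @Cacheable.
    headers.set("Cache-Control", "public, max-age=60, s-maxage=60");
    return new HttpEntity<>(headers);
  }
}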
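
import java.util.List;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class GithubServiceImpl implements GithubService {

  // The result is stored in the "users" cache, keyed by username; later calls
  // with the same username return the cached value without running the method.
  @Override
  @Cacheable(value = "users")
  public GitUser getUserData(String username) {
    // implementation omitted for brevity
  }

  // Same pattern for repository data, cached separately under "repos".
  @Override
  @Cacheable(value = "repos")
  public List<GitRepo> getRepoData(String username) {
    // implementation omitted for brevity
  }
}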
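Conceptually, @Cacheable follows a cache-aside pattern: on a miss the method runs and its result is stored under the cache key; on a hit the stored value is returned without running the method. The plain-JDK sketch below illustrates that idea with ConcurrentHashMap.computeIfAbsent. It is not how Spring implements caching, and the MemoCache and CacheDemo names are hypothetical. One difference worth noting: computeIfAbsent runs the loader at most once per key even under concurrent misses, which is closer to @Cacheable(sync = true) than to the default.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache-aside helper, for illustration only (not Spring's implementation).
class MemoCache<K, V> {
    private final ConcurrentMap<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    MemoCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value on a hit; on a miss, runs the loader and stores the result.
    // A null result is not stored, so it would be recomputed on the next call.
    V get(K key) {
        return store.computeIfAbsent(key, loader);
    }

    // Entries never expire on their own (like ConcurrentMapCache); evict manually.
    void evict(K key) {
        store.remove(key);
    }
}

class CacheDemo {
    public static void main(String[] args) {
        AtomicInteger apiCalls = new AtomicInteger();
        // The loader stands in for a call to the GitHub API.
        MemoCache<String, String> users = new MemoCache<>(username -> {
            apiCalls.incrementAndGet();
            return "{\"login\":\"" + username + "\"}";
        });

        users.get("octocat");
        users.get("octocat");                // served from the cache
        System.out.println(apiCalls.get()); // 1
    }
}
```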

Useful resources

Author