No API, no problem. Jsouper helps you parse HTML into Java objects. It borrows from Square's Moshi and is powered by jsoup:
Document document = Jsoup.connect("https://play.google.com/store").get();
Jsouper jsouper = new Jsouper.Builder().build();
ElementAdapter<Movie> elementAdapter = jsouper.adapter(Movie.class);
Movie movie = elementAdapter.fromElement(document);
System.out.println(movie);
or with Retrofit:
Retrofit retrofit = new Retrofit.Builder().baseUrl(PlayStoreApi.BASE_URL)
.addConverterFactory(JsoupConverterFactory.create())
.build();
PlayStoreApi playStoreApi = retrofit.create(PlayStoreApi.class);
Call<List<Movie>> movies = playStoreApi.getMovies();
See the sample module for a Java and Android example which uses Jsouper to build a Google Play Movies UI from nothing but the website's HTML.
Unlike with Moshi or Gson, mapping HTML to your Java objects takes more work, because HTML tags and attributes rarely correspond directly to how you want to structure your objects.
At minimum, you will need to define a custom ElementAdapter for your most primitive classes. query() returns the selector that identifies the top-most Element mapping to your Java object; Jsouper passes it to jsoup's Element.select(query). You will probably need to familiarize yourself with how to extract attributes with jsoup before proceeding.
For each valid Element matched by this query, define how the object is constructed in fromElement:
public class CoverAdapter extends ElementAdapter<Cover> {
@Override
public String query() {
return "cover";
}
@Override
public Cover fromElement(Element element) throws IOException {
final String imageUrl =
element.select("div.cover-inner-align").select("img").first().attr("data-cover-large");
final String targetUrl = element.select("a.card-click-target").attr("href");
return new Cover(imageUrl, targetUrl);
}
}
Objects composed of objects you have already defined ElementAdapters for can be generated by Jsouper. You still need to define the query, but this can be done with the @SoupQuery annotation on your model class declaration:
@SoupQuery("div.card.no-rationale.tall-cover.movies.small")
public class Movie {
public final Cover cover;
public final Detail detail;
public final Rating rating;
public Movie(Cover cover, Detail detail, Rating rating) {
this.cover = cover;
this.detail = detail;
this.rating = rating;
}
}
...
ElementAdapter<Movie> movieAdapter = jsouper.adapter(Movie.class);
Movie movie = movieAdapter.fromElement(document);
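To make the idea of composing objects from existing adapters more concrete, here is a minimal sketch of one way it could work: look up a hand-written adapter first, and otherwise fill in the constructor parameters recursively. This is not Jsouper's actual implementation. Title, Score, Film, and ComposingRegistry are hypothetical names, and plain strings stand in for jsoup Elements so the example runs on the JDK alone.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical leaf model classes (assumptions, not part of Jsouper).
final class Title {
    final String text;
    Title(String text) { this.text = text; }
}

final class Score {
    final int value;
    Score(int value) { this.value = value; }
}

// Hypothetical composite model: built only from types that have adapters.
final class Film {
    final Title title;
    final Score score;
    Film(Title title, Score score) {
        this.title = title;
        this.score = score;
    }
}

final class ComposingRegistry {
    // Hand-written adapters, keyed by the type they produce.
    private final Map<Class<?>, Function<String, ?>> leafAdapters = new HashMap<>();

    <T> ComposingRegistry add(Class<T> type, Function<String, T> adapter) {
        leafAdapters.put(type, adapter);
        return this;
    }

    // Uses a registered adapter when one exists; otherwise resolves every
    // parameter of the first declared constructor recursively.
    <T> T create(Class<T> type, String source) {
        Function<String, ?> adapter = leafAdapters.get(type);
        if (adapter != null) {
            return type.cast(adapter.apply(source));
        }
        if (type.isPrimitive() || type.getName().startsWith("java.")) {
            throw new IllegalArgumentException("No adapter for " + type.getName());
        }
        Constructor<?> constructor = type.getDeclaredConstructors()[0];
        Class<?>[] parameterTypes = constructor.getParameterTypes();
        Object[] arguments = new Object[parameterTypes.length];
        for (int i = 0; i < parameterTypes.length; i++) {
            arguments[i] = create(parameterTypes[i], source);
        }
        try {
            constructor.setAccessible(true);
            return type.cast(constructor.newInstance(arguments));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot construct " + type.getName(), e);
        }
    }
}
```

With adapters registered for Title and Score, calling create(Film.class, "Arrival|8") builds a Film without a Film-specific adapter, which mirrors how Movie above needs only a query once Cover, Detail, and Rating have adapters.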
There is no serialization support at the moment.
Jsouper supports two ways to register an adapter. Add it explicitly when you build Jsouper:
Jsouper jsouper = new Jsouper.Builder()
.add(Cover.class, new CoverAdapter())
.build();
Or annotate it in your model class:
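@SoupAdapter(CoverAdapter.class)
public class Cover {
public final String imageUrl;
public final String targetUrl;
// Final fields must be assigned; CoverAdapter calls this constructor.
public Cover(String imageUrl, String targetUrl) {
this.imageUrl = imageUrl;
this.targetUrl = targetUrl;
}
}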
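The two registration styles can be pictured as a single lookup that checks explicit registrations and then falls back to an annotation on the model class. The sketch below illustrates that pattern only. The precedence shown (explicit first) is an assumption rather than documented Jsouper behavior, and UsesAdapter, Link, LinkAdapter, and AdapterLookup are hypothetical names, with strings standing in for jsoup Elements.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical annotation playing the role of @SoupAdapter.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface UsesAdapter {
    Class<? extends Function<String, ?>> value();
}

// Hypothetical adapter: turns a raw href string into a Link.
final class LinkAdapter implements Function<String, Link> {
    @Override
    public Link apply(String href) {
        return new Link(href.trim());
    }
}

// The model class names its adapter through the annotation.
@UsesAdapter(LinkAdapter.class)
final class Link {
    final String href;
    Link(String href) { this.href = href; }
}

final class AdapterLookup {
    private final Map<Class<?>, Function<String, ?>> explicit = new HashMap<>();

    <T> AdapterLookup add(Class<T> type, Function<String, T> adapter) {
        explicit.put(type, adapter);
        return this;
    }

    // Explicit registrations win (an assumption); otherwise the adapter class
    // named by the annotation is instantiated through its no-arg constructor.
    Function<String, ?> adapterFor(Class<?> type) {
        Function<String, ?> registered = explicit.get(type);
        if (registered != null) {
            return registered;
        }
        UsesAdapter annotation = type.getAnnotation(UsesAdapter.class);
        if (annotation == null) {
            throw new IllegalArgumentException("No adapter for " + type.getName());
        }
        try {
            Constructor<? extends Function<String, ?>> constructor =
                annotation.value().getDeclaredConstructor();
            constructor.setAccessible(true);
            return constructor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create adapter for " + type.getName(), e);
        }
    }
}
```

The annotation must be retained at runtime for the reflective lookup to see it, and the adapter class needs a no-argument constructor so it can be instantiated without extra configuration.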
Jsouper currently has built-in support for:
- Collections, Lists, Sets
Contributions to add support for Maps and Arrays (and anything else that is currently missing) are welcome.
Collections do require specifying the parameterized type (this is handled automatically with the Retrofit converter):
Type listOfMoviesType = Types.newParameterizedType(List.class, Movie.class);
ElementAdapter<List<Movie>> moviesAdapter = jsouper.adapter(listOfMoviesType);
List<Movie> movies = moviesAdapter.fromElement(Jsoup.connect("https://play.google.com/store").get());
movies.forEach(System.out::println);
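The reason a parameterized type is needed is type erasure: List.class alone does not say what the list contains, whereas a java.lang.reflect.ParameterizedType does. The sketch below uses only the JDK to show what such a type carries, and how it can be read from a method's generic return type, which is presumably how a Retrofit converter can obtain it without extra work from you. CatalogApi and GenericTypeInspector are hypothetical names introduced for illustration.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

// Hypothetical service interface; only its generic signature matters here.
interface CatalogApi {
    List<String> titles();
}

final class GenericTypeInspector {
    // Renders a type as "raw<argument>" when it is parameterized,
    // and as the plain type name otherwise.
    static String describe(Type type) {
        if (type instanceof ParameterizedType) {
            ParameterizedType parameterized = (ParameterizedType) type;
            Type raw = parameterized.getRawType();
            Type argument = parameterized.getActualTypeArguments()[0];
            return raw.getTypeName() + "<" + argument.getTypeName() + ">";
        }
        return type.getTypeName();
    }

    // Reads the generic return type of a no-argument method by name.
    static Type returnTypeOf(Class<?> api, String methodName) {
        try {
            return api.getMethod(methodName).getGenericReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + methodName, e);
        }
    }
}
```

Describing List.class yields only the raw type name, while describing the return type of CatalogApi.titles() also exposes the element type, which is the information an adapter for a collection needs.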
Snapshot builds are currently available in Sonatype's snapshots repository. Get the latest JAR or depend via Maven:
<dependency>
<groupId>com.ekchang.jsouper</groupId>
<artifactId>jsouper</artifactId>
<version>0.0.3-SNAPSHOT</version>
</dependency>
or Gradle:
compile 'com.ekchang.jsouper:jsouper:0.0.3-SNAPSHOT'
compile 'org.jsoup:jsoup:1.9.1'
A Retrofit 2 converter is also available:
compile 'com.ekchang.jsouper:retrofit-converter-jsouper:0.0.3-SNAPSHOT'
compile 'com.squareup.retrofit2:retrofit:2.0.2'
Copyright 2016 Erick Chang
Copyright 2015 Square, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.