
Java PCRE Library

Primary language: Java. License: Apache License 2.0 (Apache-2.0).

jpr_01

The library retrieves the offsets of the parts of an input string that match a given regular expression, and it can also retrieve multiple named groups in a single matching operation. This information can be used to extract data from the input string or to replace parts of it.

Building

Tested on Fedora 36. Install the PCRE2 development package, then build with Maven:

dnf install pcre2-devel
mvn clean package

Adding the library to your project

The library can be added to a project as a Maven dependency:

<dependency>
      <groupId>com.teragrep</groupId>
      <artifactId>jpr_01</artifactId>
      <version>3.0.1</version>
</dependency>

Once Maven has downloaded the package, import it and instantiate a JavaPcre object:

import com.teragrep.jpr_01.JavaPcre;

JavaPcre pcre = new JavaPcre();

Pattern compilation

Compile a pattern, written in standard PCRE2 syntax, with compile_java():

final String pattern = "^Hello.*$";
pcre.compile_java(pattern);

To free the compiled pattern once it is no longer needed, call jcompile_free():

pcre.jcompile_free();

Matching

To match the compiled pattern against an input, call singlematch_java(). The second argument is the offset in the string at which matching starts; use 0 (zero) to start from the beginning.

final String input = "Hello this is an input";
pcre.singlematch_java(input, 0);
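The effect of the offset argument can be illustrated without the native library by using the JDK's java.util.regex package as a stand-in. This is a different regex engine, and the sketch assumes that singlematch_java(input, offset) behaves like Matcher.find(offset), i.e. it searches for the first match at or after the given offset. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for jpr_01 to show how a start offset works.
final class OffsetDemo {

    // Returns {start, end} of the first match at or after `offset`, or null if there is none.
    static int[] firstMatchFrom(String regex, String input, int offset) {
        final Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find(offset)) {
            return null;
        }
        return new int[] {m.start(), m.end()};
    }

    public static void main(String[] args) {
        final String input = "a1b22c333";
        final int[] first = firstMatchFrom("\\d+", input, 0); // "1" at [1, 2)
        final int[] later = firstMatchFrom("\\d+", input, 4); // "2" at [4, 5): search began mid-run
        System.out.println(first[0] + "," + first[1]);
        System.out.println(later[0] + "," + later[1]);
    }
}
```

Starting at offset 4 skips the first digit of "22", so the returned match begins in the middle of that run of digits.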

Extraction

When the pattern uses named capture groups, the results are available through the name table and the match table. The match table maps each capture group's numeric ID to the text it matched, and the name table maps each group name to its ID.

Map<String, Integer> nameTable = pcre.get_name_table(); // e.g. NameOfTeacher=1, NameOfAssistant=2
Map<Integer, String> matchTable = pcre.get_match_table(); // e.g. 1="Anna", 2="Robert"

For example, the two maps can be combined into a single map from group name to matched text:

import java.util.HashMap;
import java.util.Map;

final Map<String, String> results = new HashMap<>();
for (final Map.Entry<String, Integer> me : nameTable.entrySet()) {
    final String name = me.getKey();
    final String value = matchTable.get(me.getValue());
    results.put(name, value); // e.g. NameOfTeacher=Anna
}
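The loop above can be wrapped in a reusable method and exercised with the example values from the comments, without the native library. The class name GroupJoin and the third group NameOfSubstitute are hypothetical, and the null check rests on an assumption: that a group which did not take part in the match has no entry in the match table, so get() returns null for it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of jpr_01: joins a name table (group name -> ID)
// with a match table (ID -> matched text) into group name -> matched text.
final class GroupJoin {

    static Map<String, String> namedGroups(Map<String, Integer> nameTable,
                                           Map<Integer, String> matchTable) {
        final Map<String, String> results = new HashMap<>();
        for (final Map.Entry<String, Integer> me : nameTable.entrySet()) {
            final String value = matchTable.get(me.getValue());
            // Assumption: a group without captured text is absent from the match table; skip it.
            if (value != null) {
                results.put(me.getKey(), value);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        final Map<String, Integer> nameTable = new HashMap<>();
        nameTable.put("NameOfTeacher", 1);
        nameTable.put("NameOfAssistant", 2);
        nameTable.put("NameOfSubstitute", 3); // hypothetical optional group that did not match
        final Map<Integer, String> matchTable = new HashMap<>();
        matchTable.put(1, "Anna");
        matchTable.put(2, "Robert");
        System.out.println(namedGroups(nameTable, matchTable));
    }
}
```

Skipping absent groups keeps null values out of the result, so callers can use containsKey() to tell whether a group captured anything.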

Match checking and location of matches

To check whether the last matching operation found a match, call get_matchfound():

boolean hasMatch = pcre.get_matchfound();

To find multiple matches, combine get_matchfound() with the offset argument of singlematch_java(): after each match, start the next search at the end of the previous match.

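int offset = 0;
pcre.singlematch_java(input, offset);
while (pcre.get_matchfound()) {
    final int start = pcre.get_ovector0();
    final int end = pcre.get_ovector1();
    // continue after this match; step past an empty match so the loop cannot stall
    offset = (end == start) ? end + 1 : end;
    if (offset > input.length()) {
        break;
    }
    pcre.singlematch_java(input, offset);
}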
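The loop above can be mirrored with java.util.regex as a stand-in engine to see which matches it visits, including the case of a pattern that can match the empty string. This is only an analogy: it assumes the jpr_01 offsets behave like Matcher.start()/end(), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch using java.util.regex instead of jpr_01: collect every match by
// restarting the search at the end offset of the previous match.
final class AllMatches {

    static List<String> findAll(String regex, String input) {
        final Matcher m = Pattern.compile(regex).matcher(input);
        final List<String> matches = new ArrayList<>();
        int offset = 0;
        // find(int) rejects offsets past the end of the input, so stop there
        while (offset <= input.length() && m.find(offset)) {
            matches.add(m.group());
            // Resume at the end of this match; step one position further after an
            // empty match, otherwise the same empty match would be found forever.
            offset = (m.end() == m.start()) ? m.end() + 1 : m.end();
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "a1b22c333")); // [1, 22, 333]
        System.out.println(findAll("x*", "ab").size());   // 3 empty matches, at 0, 1 and 2
    }
}
```

Without the empty-match step, a pattern such as x* would return the same zero-length match at offset 0 on every iteration.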

Offsets

The start and end offsets of the current match are returned by get_ovector0() and get_ovector1(), respectively:

int start = pcre.get_ovector0();
int end = pcre.get_ovector1();
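Since the offsets can be used to replace parts of the input, here is a minimal splicing sketch in plain Java. It assumes the end offset is exclusive (as PCRE2's ovector reports the position just after the match) and that the offsets are Java char indices; for non-ASCII input they may instead be UTF-8 byte offsets, which this sketch does not handle. The class and method names are hypothetical.

```java
// Hypothetical helper: replaces input[start, end) with a replacement string, where
// start and end are offsets like those returned by get_ovector0()/get_ovector1().
// Assumes end is exclusive and both offsets are char indices (e.g. ASCII input).
final class SpliceDemo {

    static String replaceRange(String input, int start, int end, String replacement) {
        if (start < 0 || end < start || end > input.length()) {
            throw new IllegalArgumentException("invalid range [" + start + ", " + end + ")");
        }
        return input.substring(0, start) + replacement + input.substring(end);
    }

    public static void main(String[] args) {
        final String input = "Hello this is an input";
        // Suppose the match covered "this", i.e. start = 6 and end = 10.
        System.out.println(replaceRange(input, 6, 10, "that")); // Hello that is an input
    }
}
```

The same offsets extract the matched text with input.substring(start, end).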

Contributing

You can get involved with our project by opening an issue or submitting a pull request.

Contribution requirements:

  1. All changes must be accompanied by a new or changed test. If you think testing is not required in your pull request, include a sufficient explanation as to why you think so.

  2. Security checks must pass.

  3. Pull requests must align with the principles and values of extreme programming.

  4. Pull requests must follow the principles of Object Thinking and Elegant Objects (EO).

Read more in our Contributing Guideline.

Contributor License Agreement

Contributors must sign the Teragrep Contributor License Agreement before a pull request is accepted into the organization’s repositories.

You need to submit the CLA only once. After submitting the CLA you can contribute to all Teragrep’s repositories.