mrabarnett/mrab-regex

Enhance API of captures() to enable retrieval of ALL groups at once, as a dictionary

mrabarnett opened this issue

Original report by Marcin Wojnarski (Bitbucket: mwojnars, GitHub: mwojnars).


Hi,

For non-repeated groups, one can use match.groupdict() to retrieve a dictionary of ALL groups and their values, including unmatched groups. But there is no equivalent for repeated groups: match.captures() only returns values for the groups given explicitly as arguments, while groupdict() keeps only a single (the last) value per group.
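For illustration, a minimal Python sketch of the gap described above, using the third-party `regex` module. The pattern and input string are made up for this example:

```python
import regex

# A named group inside a repeated construct: each repetition captures a new value.
m = regex.match(r"(?:(?P<word>\w+)\s*)+", "foo bar baz")

# groupdict() covers every named group but keeps only the last capture of each.
print(m.groupdict())       # {'word': 'baz'}

# captures() keeps every capture, but only for the groups passed as arguments.
print(m.captures("word"))  # ['foo', 'bar', 'baz']

# With no arguments, captures() reports group 0, i.e. the whole match.
print(m.captures())        # ['foo bar baz']
```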

I suggest one of the following:

  1. Change the API of captures() so that captures() with no arguments returns a dictionary of ALL groups, not just group 0. This would be the most convenient and intuitive option, but it would break existing code that relies on the current no-argument behaviour.

  2. Add a boolean argument to captures(), say "all", defaulting to False, to let the client indicate that a full dictionary is expected.

  3. Add a new method, say capturesdict(), that returns a dict of all groups.
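Any of these options can be prototyped on top of the existing methods. The sketch below is a hypothetical helper, `all_captures()` (not part of `regex`), approximating option 1: it maps every group in the pattern, named or numbered, to its list of captures. It assumes, as with `re`, that the match object's `.re` attribute exposes `groups` and `groupindex`:

```python
import regex

def all_captures(m):
    """Hypothetical helper: map every capture group to the list of all its captures.

    Named groups are keyed by name, unnamed groups by number.
    Group 0 (the whole match) is left out.
    """
    names = {index: name for name, index in m.re.groupindex.items()}
    return {names.get(i, i): m.captures(i) for i in range(1, m.re.groups + 1)}

# Made-up input: one named and one unnamed group, both inside a repeated construct.
m = regex.match(r"(?:(?P<key>\w+)=(\d+);)+", "a=1;b=2;")
print(all_captures(m))  # {'key': ['a', 'b'], 2: ['1', '2']}
```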

Thanks
Marcin

What version of the product are you using? On what operating system?

0.1.20130120
Linux, Python 2.7.2

Original comment by Anonymous.


Should the dict behave like this?

capturesdict = {}
for name in m.groupdict().keys():
    capturesdict[name] = m.captures(name)
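A runnable version of that loop, with a hypothetical pattern containing one repeated group and one optional group that does not take part in the match. It shows that the unmatched group maps to an empty list, whereas groupdict() reports `None` for it:

```python
import regex

# Hypothetical pattern: "digit" repeats, "suffix" is optional and unmatched here.
m = regex.match(r"(?P<digit>\d)+(?P<suffix>[a-z])?", "123")

# The loop from the comment above: one entry per named group.
capturesdict = {}
for name in m.groupdict().keys():
    capturesdict[name] = m.captures(name)

print(m.groupdict())             # {'digit': '3', 'suffix': None}
print(capturesdict["digit"])     # ['1', '2', '3']
print(capturesdict["suffix"])    # []
```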

What's your use case? Could you provide some examples of the suggested feature?

Original comment by Marcin Wojnarski (Bitbucket: mwojnars, GitHub: mwojnars).


Yes, it should behave in this way.

Use case: web scraping, i.e. extracting many different values from a complex HTML page in one go (for example, a product's profile page, with different properties listed in a fixed layout). After applying a regex, the next step is to take *all* the extracted data as a dict, not one value at a time.
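The scraping use case might look like the sketch below, written against the `capturesdict()` method that current releases of `regex` provide. The HTML fragment and group names are invented for illustration, and the pattern relies on the fixed page layout described above:

```python
import regex

# Made-up product-page fragment with a fixed layout.
html = """
<h1>Widget</h1>
<li>Colour: red</li>
<li>Colour: blue</li>
<span class="price">9.99</span>
"""

# One pattern extracts every property; "colour" repeats once per <li>.
pattern = regex.compile(
    r"<h1>(?P<name>[^<]+)</h1>\s*"
    r"(?:<li>Colour: (?P<colour>[^<]+)</li>\s*)+"
    r'<span class="price">(?P<price>[^<]+)</span>'
)

m = pattern.search(html)
product = m.capturesdict()  # all named groups at once, each as a list
print(product["name"])      # ['Widget']
print(product["colour"])    # ['red', 'blue']
print(product["price"])     # ['9.99']
```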

Original comment by Anonymous.


Could you provide some simple test cases?

I think it'll be called 'capturesdict'.

Original comment by Anonymous.


I've added a 'capturesdict' method to match objects in regex 0.1.20130124.
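A quick check, assuming a current release of `regex`, that `capturesdict()` returns the same dictionary as the loop proposed earlier in the thread. The pattern is made up for illustration:

```python
import regex

m = regex.match(r"(?P<digit>\d)+(?P<suffix>[a-z])?", "123")

# capturesdict() maps each named group to the list of all its captures.
print(m.capturesdict()["digit"])   # ['1', '2', '3']

# Same result as building the dict group by group with captures().
loop_result = {name: m.captures(name) for name in m.groupdict()}
print(m.capturesdict() == loop_result)  # True
```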

Original comment by Marcin Wojnarski (Bitbucket: mwojnars, GitHub: mwojnars).


Great, thanks for all the changes and for a very useful library.