Enhance API of captures() to enable retrieval of ALL groups at once, as a dictionary
GoogleCodeExporter opened this issue · 5 comments
GoogleCodeExporter commented
Hi,
For non-repeated groups, one can use match.groupdict() to retrieve a dictionary
of ALL groups and their values, including un-matched groups. But there is no
equivalent for repeated groups: match.captures() only returns values for groups
given explicitly in arguments, while groupdict() doesn't include multiple
values.
I suggest either:
1. Change the API of captures() so that captures() (no args) returns a dictionary
of ALL groups, not just group 0 - this would be the most convenient and
intuitive, but would break existing code that relies on the current behavior.
2. Add a boolean argument to captures(), say "all", False by default, to
let the client indicate that a full dictionary is expected.
3. Add a new method, say capturesdict(), to return a dict of all groups.
Thanks
Marcin
What version of the product are you using? On what operating system?
0.1.20130120
Linux, Python 2.7.2
Original issue reported on code.google.com by mwojn...@gmail.com
on 23 Jan 2013 at 6:16
GoogleCodeExporter commented
Should the dict behave like this?
capturesdict = {}
for name in m.groupdict().keys():
    capturesdict[name] = m.captures(name)
What's your use case? Could you provide some examples of the suggested feature?
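For illustration, the loop above can be wrapped in a small helper. This is only a sketch: `captures_dict` is a hypothetical name, not part of the regex module, and it assumes a match object that offers `groupdict()` and `captures(name)`, as regex match objects do. The demo pattern and input are made up, and the demo runs only if the third-party regex module is installed.

```python
def captures_dict(match):
    """Map every named group to the list of all its captures."""
    # Hypothetical helper: same logic as the loop above.
    return {name: match.captures(name) for name in match.groupdict()}


try:
    import regex  # third-party module; not in the standard library
except ImportError:
    regex = None

if regex is not None:
    # Both named groups repeat, so each collects several captures.
    m = regex.match(r"(?:(?P<key>\w+)=(?P<value>\d+);)+", "a=1;b=2;")
    print(m.groupdict())     # only the last capture of each group
    print(captures_dict(m))  # every capture of each group
```

On this input, groupdict() should keep only the final "key" and "value", while the helper should return a list per group holding every capture in order.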
Original comment by re...@mrabarnett.plus.com
on 23 Jan 2013 at 6:57
GoogleCodeExporter commented
Yes, it should behave in this way.
Use case: web scraping, extracting many different values from a complex HTML
page in one go (for example, the profile page of a product, with different
properties listed in a fixed layout). After applying a regex, the next step is
to take *all* extracted data as a dict, not one group at a time.
Original comment by mwojn...@gmail.com
on 23 Jan 2013 at 11:40
GoogleCodeExporter commented
I've added a 'capturesdict' method to match objects in regex 0.1.20130124.
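A minimal sketch of how calling code might use the new method while staying compatible with older releases. `all_captures` is a hypothetical helper name, not part of the library. It assumes only that `capturesdict()` exists on match objects from 0.1.20130124 onward, as stated above, and it falls back to the `groupdict()`/`captures(name)` loop from earlier in this thread. The demo pattern and input are invented for illustration.

```python
def all_captures(match):
    """Return {group name: list of captures} for a regex match object."""
    # Prefer the built-in method when this regex version provides it.
    if hasattr(match, "capturesdict"):
        return match.capturesdict()
    # Older versions: build the dict one named group at a time.
    return {name: match.captures(name) for name in match.groupdict()}


try:
    import regex  # third-party module; not in the standard library
except ImportError:
    regex = None

if regex is not None:
    # The named group "word" repeats once per word in the input.
    m = regex.match(r"(?:(?P<word>\w+)\s*)+", "one two three")
    print(all_captures(m))
```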
Original comment by re...@mrabarnett.plus.com
on 24 Jan 2013 at 8:30
- Changed state: Fixed
GoogleCodeExporter commented
Could you provide some simple test cases?
I think it'll be called 'capturesdict'.
Original comment by re...@mrabarnett.plus.com
on 24 Jan 2013 at 2:01
- Changed state: Accepted
GoogleCodeExporter commented
Great, thanks for all the changes and for a very useful library.
Original comment by mwojn...@gmail.com
on 25 Jan 2013 at 10:59