sagemath/sage

RealSet: Faster operations by scan-line (merging) techniques

mkoeppe opened this issue · 73 comments

Adapt from:

  1. Extend Union and Intersection from pairwise computation to multiple realsets with scan-line algorithm. In order to compare the performance of the scan-line and orginial methods, we use different dataset to test the CPU time the random generator and timing test code can find in this page:
Function Dataset Scan-line original
union 1000 closed bounded disjoint interval 0.005281 0.00749
union 1000 open bounded disjoint interval 0.004339 0.006646
union 1000 mixed bounded disjoint interval 0.005025 0.006646
intersection 300 closed bounded disjoint interval 0.001182 4.376048
intersection 300 open bounded disjoint interval 0.001225 4.359314
intersection 300 mixed bounded interval 0.001537 4.407478
  1. We used scan-line methods to replace the original normalize and also make normalize optionally. After replacing it, we test it with 500,000 random intervals, the original methods take 151 second to construct it; scan-line methods only took 18 seconds.
  2. Replace difference,symmetric_difference and are_pairwise_disjoint with scan-line methods. We use 1000 realsets each realsets containing 1000 random intervals as the testing dataset. The original methods found the difference between two real sets took 11 seconds on average, and the new methods only took 0.021 seconds on average; symmetric_differnce also has a similar result.
  3. Implement convex_hull and is_connected in RealSet.

Depends on #34252

CC: @yuan-zhou

Component: geometry

Author: Yueqi Li, Yuan Zhou

Branch/Commit: 1ef7dd1

Reviewer: Matthias Koeppe

Issue created by migration from https://trac.sagemath.org/ticket/32181

Branch pushed to git repo; I updated commit sha1. New commits:

Description changed:

--- 
+++ 
@@ -2,3 +2,7 @@
 - https://github.com/mkoeppe/cutgeneratingfunctionology/blob/master/cutgeneratingfunctionology/igp/intervals.py
 - https://github.com/mkoeppe/cutgeneratingfunctionology/blob/master/cutgeneratingfunctionology/igp/fast_piecewise.py
 
+1. Extend `Union` and `Intersection` from pairwise computation to multiple realsets with scan-line algorithm. We use 1000 realsets each realsets containing 1000 random intervals as the testing dataset. The original methods found the intersection of two realsets took 10.9 seconds on average, and new methods only took 0.018 seconds on average, union of five realsets, the original method took 0.052 seconds, and new methods only took 0.041 seconds.
+2. We used scan-line methods to replace the original `normalize` and also make `normalize` optionally. After replacing it, we test it with 500,000 random intervals, the original methods take 151 second to construct it; scan-line methods only took 18 seconds.
+3. Replace `difference`,`symmetric_difference` and `are_pairwise_disjoint` with scan-line methods. We use 1000 realsets each realsets containing 1000 random intervals as the testing dataset. The original methods found the difference between two real sets took 11 seconds on average, and the new methods only took 0.021 seconds on average; symmetric_differnce also has a similar result.
+4. Implement `convex_hull` and `is_connected` in `RealSet`. 

Changed commit from 625ac58 to 33385c8

Branch pushed to git repo; I updated commit sha1. New commits:

31613391. fixed pycode test fail
33385c8Adding examples for helper funtion `_scan_line_union`, `_scan_line_intersection`, `_scan_line_difference`, `_scan_line_symmetric_difference`
comment:8

If you merge the most current beta (9.7.beta6), you can use https://trac.sagemath.org/wiki/ReleaseTours/sage-9.7#Validatingdocstringwith.sage-tox-erst

Branch pushed to git repo; I updated commit sha1. New commits:

61862a9Change scan-line methods to tree structure, improve inner function's documentation

Changed commit from 33385c8 to 61862a9

comment:10

tox -e relint will tell you that EXAMPLE:: should be EXAMPLES:: according to our style guide (whether there is a single or there are multiple examples)

comment:11

The current code does not pass the doc-tests.
Also, please follow https://doc.sagemath.org/html/en/developer/coding_basics.html. For example, in the text part, convex_hull should be fully-referenced so that it is clickable after build, i.e. :meth:~sage.sets.real_set.RealSet.convex_hull

comment:12

I took a look through the code, but didn't run the code.
I have some first comments.

  • The patchbot currently shows 31 doctests failures and 4 lint errors. To be fixed!
  • In the docstrings of _scan_left_endpoint and _scan_right_endpoint: What does an event mean. Reformat the text after output. Describe the value of delta. Include example(s) where tag is given.
  • RealSet constructor with option normalized=True. What does this assume? Describe it in the docstring of __classcall__ or that of RealSet. Mention that is does not apply to the manifold case. Include doctests/tests where normalized=True is given, and examples where arguments are of type InternalRealInterval.
  • I have doubts about where if not normalized is treated. (But of course, this depends on how you define normalized=True). I thought if normalized=True and it's not of the manifold case (lines 1267-1283), one can just return UniqueRepresentation.__classcall__(cls, *intervals) before line 1199 intervals = [], which skips all treatments on args.
  • When normalized is not True, may simplify and insert the code in def normalize here, then remove the normalize function as it is only used once here. Alternatively, keep def normalize and move all treatment of args from __classcall__ to normalize.
  • RealSet.__classcall__: Line 1188-1190 change to just: normalized = kwds.pop('normalized', False). L1194: "... take no keyword arguments other than 'normalized'.""
  • RealSet._prep: L1710 elif. L1717 elif
  • RealSet doesn't need methods _scan_left_endpoint and _scan_right_endpoint. These are methods of the class InternalRealInterval.
  • RealSet._scan_interval Rename the function, to something like _scan, or _scan_intervals, or _generate_events. Rewrite the function: for i in self._intervals: yield i._scan_left_endpoint(tag); yield i._scan_right_endpoint(tag). No more merge/sorting needed.
  • Ideally, all scan methods are generators. Currently, some are generators, some return lists.
  • In _scan_line_union, lines 2104-2122, the two while loops look very suspicious to me. I think they do not belong here. Moreover, the lines 2118-2119 in the second while loop might be wrong. You claimed that these two while loops take care of the special cases RealSet(x=-infinity) and RealSet(x<=-infinity), but the input scan to this function should never be in such form. Such empty (nonsense) special cases should already been eliminated in around line 1209 in __classcall__
  • In union_of_realsets and other similar functions: Lines 2175 to 2178 can be simplified to scan = merge(RealSet(s)._scan() for s in real_set_collection)
  • _scan_line_intersection. Add more special examples to doctest to cover the cases where intersection is just one point, or just empty. For example: [1,2] ∩ [2,3], [2,3] ∩ (3,4), (3,4) ∩ [4,5), [4,5) ∩ (5,6). Lines 2252-2261 (and other similar places) can be simplified to
if now_on:
  (on_x, on_epsilon) = (x, epsilon) 
else: 
  if was_on and (on_x, on_epsilon) < (x, epsilon): 
    yield InternalRealInterval(...)
  (on_x, on_epsilon) = (None, None)
  • Questions about 'tag'. It is said to be a real number in the docstrings of many function, but is used as True or False for the difference function only. I think tag can be eliminated throughout the code (and do some special treatment for the difference.) Also, why is it called index at some places? That's not used either, so can be removed as well. On a related note, given a collection of real sets, after generating and merging the their scan, you can no longer match the list of events back to the original real sets, because index was not passed to tag in the events. I do not like the current design.
  • _scan_line_union, _scan_line_intersection, _scan_line_symmetric_difference (and part of are_pairwise_disjoint) all look redundant. They all can be replaced by a new method RealSet._scan_to_intervals(scan, indicator) that takes scan as a generator of sorted events and indicator as a number 1 or n (or perhaps as a lambda function for are_pairwise_disjoint), and outputs a tuple intervals of type InternalRealInterval.
  • According to your ticket description, the old union method is slower than the new one. Is this true for the union of two real sets? If so, why is the old one kept in the code? If not, please update the description of the ticket, and include comments in the code. In general, are there examples where the original method is faster? Please specify the examples, and change the code to choose strategically between the two versions according to the input.
  • def convex_hull: correct but wasteful. One does not to sort all intervals, only the left most and right most intervals of each real set matter.
  • def is_connected docstring: Return whether self is a connected set. OUTPUT: Boolean. True if self is a single interval.
  • def are_pairwise_disjoint: Revise.
  • Throughout the code, (for example in _scan_line_union and _scan_line_intersection and many other functions,) the docstrings need significant improvements! The current ones make little sense. They are often copy-pasted from other places which do not correspond to the function. Input and output descriptions are not exact, variable undefined, variable mistaken (eg. realsets v.s real_set_collection) type not specified, code-style incorrect. Often bad and incomplete examples. EXAMPLES:: plural. At the beginning of the file, "Yueqi Li (2022-06-27): ..." Rephrase to one or two lines. Some places in the docstrings ":meth:~sage.sets.real_set.RealSet.convex_hull"
  • Please make sure that your docstrings follow https://doc.sagemath.org/html/en/developer/coding_basics.html.
  • Throughout the code: clean up the commented-out code.
  • Throughout the code: :class:RealInterval should be :class:InternalRealInterval
  • New method: relative closure in a superset?
comment:14

Merge-in SageMath version to 9.7.beta6.


New commits:

ed5c560Merge branch 'develop' into t/32181/realset__faster_operations_by_scan_line__merging__techniques

Changed commit from 61862a9 to ed5c560

Branch pushed to git repo; I updated commit sha1. New commits:

cd9cc4ctrivial typo fix
f6b334fremove outdated patch for the broken comparison between infinity and RLF (fixed in ticket 11506)
58aa1efadd bug example regarding empty InternalRealInterval. Will not fix for this internal class
7a65f08fix bug regarding empty real sets

Changed commit from ed5c560 to 7a65f08

Changed commit from 7a65f08 to 1bbb04f

Branch pushed to git repo; I updated commit sha1. New commits:

1bbb04facknowledge authors, reference

Branch pushed to git repo; I updated commit sha1. New commits:

dda8f6brewrite implementation of scan-line algo for real set methods, by adapting the code from https://github.com/mkoeppe/cutgeneratingfunctionology

Changed commit from 1bbb04f to dda8f6b

comment:18

Replying to @NicoleYueqiLi:
I rewrote your code according to the remarks in comment:12. Yet to update the docstrings.

Branch pushed to git repo; I updated commit sha1. New commits:

75ec519pass kwd normalize=True to UniqueRepresentation classcall
ecf22e1rename left and right endpoints to lower and upper
7967c81add comments
2bfdfb2improve docstring and add docstests

Changed commit from dda8f6b to 2bfdfb2

Author: Yueqi Li, Yuan Zhou

Branch pushed to git repo; I updated commit sha1. New commits:

9d1e07aRevive furo
f19b597Add link to the logo
09d5f5cFix the logo link for reference
ae75d53Fix a subtle reference problem for build_options
3519bedRun docbuild workflow with single thread
1274718Fix a suspicious part of categories doc
e5b1f7eAdd search.html
bc6a87fMerge branch 'u/klee/34252' of git://trac.sagemath.org/sage into t/32181/realset__faster_operations_by_scan_line__merging__techniques

Changed commit from 2bfdfb2 to bc6a87f

comment:22

Try merging #34252 to fix the docbuild errors.

Branch pushed to git repo; I updated commit sha1. New commits:

1194904minor fix for pycodestyle

Changed commit from bc6a87f to 1194904

Changed commit from 1194904 to 44926c8

Branch pushed to git repo; I updated commit sha1. New commits:

44926c8better pyflakes, better handle of RealSet(x != oo)
comment:26

As you merged #34252 here, add it to Dependencies in the ticket box

comment:27

I think you can unify RealSet.intersection and RealSet.intersection_of_realsets to just one (non-static) method RealSet.intersection.

Changed commit from 44926c8 to 523b49a

Branch pushed to git repo; I updated commit sha1. New commits:

c9d4746update RealSet.union and RealSet.intersection to handle several real sets
ebd3b77Better pygments style
73e5aa3Make white logo transparent to match with furo
523b49aMerge branch 'u/klee/34252' of git://trac.sagemath.org/sage into t/32181/realset__faster_operations_by_scan_line__merging__techniques

Dependencies: 34252

Changed dependencies from 34252 to #34252

comment:31
-    @staticmethod
-    def normalize(intervals):
-        r"""

This function is part of the public API, can't just remove

comment:32
+            # Fastpass: The input is already normalized: Args is a list of

That should be "Fast path"

comment:33
+    def is_connected(self):
+        """
+        Return whether ``self`` is a connected set.
+
+        OUTPUT:
+
+        Boolean. Whether the normalized form of ``self``
+        has a single connected component.
+

In this description, you are mixing talking about self as a set and its representation as a list of intervals

comment:34
             sage: s1.symmetric_difference(s2)
             (0, 1] ∪ [2, +oo)
         """
-        other = RealSet(*other)
-        return self.difference(other).union(other.difference(self))
+        scan = merge(self._scan(), RealSet(*other)._scan())
+        intervals = tuple(RealSet._scan_to_intervals(scan, lambda i: bool(i == 1)))
+        return RealSet(*intervals, normalized=True)
 
     def contains(self, x):

Is the explicit conversion to bool necessary here?

comment:35

Would it make sense to add the benchmarks mentioned in the ticket description as doctests (can be marked # not tested so it does not run automatically)?

The benchmarks are for complicated real sets with many intervals.

Does the new implementation have an overhead over the previous implementation that becomes noticeable for simple real sets with 0, 1, 2, or 5 intervals? That's probably the most common case how this class is used. It would be good to include a benchmark for this.

comment:36

Thanks for reviewing the ticket! I don't think that the benchmark in the current ticket description makes much sense. I was hoping that Yueqi would be able to provide an updated version and more details soon.
Replying to @mkoeppe:

Would it make sense to add the benchmarks mentioned in the ticket description as doctests (can be marked # not tested so it does not run automatically)?

The benchmarks are for complicated real sets with many intervals.

Does the new implementation have an overhead over the previous implementation that becomes noticeable for simple real sets with 0, 1, 2, or 5 intervals? That's probably the most common case how this class is used. It would be good to include a benchmark for this.

comment:37

I don't quite get it. I thought that a real set is connected iff its normalized form has only one connected component (interval or singleton).

Replying to @mkoeppe:

+    def is_connected(self):
+        """
+        Return whether ``self`` is a connected set.
+
+        OUTPUT:
+
+        Boolean. Whether the normalized form of ``self``
+        has a single connected component.
+

In this description, you are mixing talking about self as a set and its representation as a list of intervals

comment:38

Do you mean bool(i==1)? I don't know if it's always needed, but I thought there is nothing wrong to add it.
Replying to @mkoeppe:

             sage: s1.symmetric_difference(s2)
             (0, 1] ∪ [2, +oo)
         """
-        other = RealSet(*other)
-        return self.difference(other).union(other.difference(self))
+        scan = merge(self._scan(), RealSet(*other)._scan())
+        intervals = tuple(RealSet._scan_to_intervals(scan, lambda i: bool(i == 1)))
+        return RealSet(*intervals, normalized=True)
 
     def contains(self, x):

Is the explicit conversion to bool necessary here?

comment:39

Replying to @yuan-zhou:

I don't quite get it. I thought that a real set is connected iff its normalized form has only one connected component (interval or singleton).

"Connected component" is a property of the set (topological space).

"Normalized form" is a list of intervals. There's no such thing as a connected component of a list.

Branch pushed to git repo; I updated commit sha1. New commits:

a864fdcfix typo
3f6a0acrevive RealSet.normalize

Changed commit from 523b49a to 3f6a0ac

comment:41

Replying to @yuan-zhou:

Do you mean bool(i==1)? I don't know if it's always needed, but I thought there is nothing wrong to add it.

i is just an integer, right? Converting to bool is only needed in comparison when you are comparing SR elements, which have overloaded comparison operators for forming relational expressions.

comment:42

If you want, you can add a deprecation for the normalize method

Changed commit from 3f6a0ac to 79f2014

Branch pushed to git repo; I updated commit sha1. New commits:

79f2014revise the docstring of RealSet.is_connected

Branch pushed to git repo; I updated commit sha1. New commits:

1ef7dd1remove unnecessary bool()

Changed commit from 79f2014 to 1ef7dd1

comment:45

Here's a microbenchmark:

Before:

sage: from itertools import count
sage: c = count()
sage: %timeit RealSet((0, next(c)))
82.9 µs ± 3.75 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
sage: %timeit RealSet((0, next(c))).union([next(c), next(c)])
210 µs ± 7.86 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

With this ticket:

sage: from itertools import count
sage: c = count()
sage: %timeit RealSet((0, next(c)))
86.2 µs ± 3.43 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
sage: %timeit RealSet((0, next(c))).union([next(c), next(c)])
257 µs ± 18.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
comment:46

Does the overhead look acceptable on the above examples?

comment:47
     def __init__(self, lower, lower_closed, upper, upper_closed, check=True):
         """
         Initialize ``self``.
[...]
             if (upper_closed and upper == infinity):
                 raise ValueError('interval cannot be closed at +oo')
+            # TODO: take care of the empty set case.

What do you have in mind there?

Description changed:

--- 
+++ 
@@ -2,7 +2,12 @@
 - https://github.com/mkoeppe/cutgeneratingfunctionology/blob/master/cutgeneratingfunctionology/igp/intervals.py
 - https://github.com/mkoeppe/cutgeneratingfunctionology/blob/master/cutgeneratingfunctionology/igp/fast_piecewise.py
 
-1. Extend `Union` and `Intersection` from pairwise computation to multiple realsets with scan-line algorithm. We use 1000 realsets each realsets containing 1000 random intervals as the testing dataset. The original methods found the intersection of two realsets took 10.9 seconds on average, and new methods only took 0.018 seconds on average, union of five realsets, the original method took 0.052 seconds, and new methods only took 0.041 seconds.
+1. Extend `Union` and `Intersection` from pairwise computation to multiple realsets with scan-line algorithm. In order to compare the performance of the scan-line and orginial methods, we use different dataset to test the CPU time:
+|        |       |         |        |
+|--------|-------|---------|--------|
+|Function|Dataset|Scan-line|original|
+|union|[1000 closed bounded disjoint interval ](https://github.com/NicoleYueqiLi/sagemath/blob/main/testing_file/test_1000_1000_cc_ff.txt)|[0.005281](https://github.com/NicoleYueqiLi/sagemath/blob/main/Scanline_res/disjoint_unsorted_cloesed/res_union.txt)|[0.00749](https://github.com/NicoleYueqiLi/sagemath/blob/main/Original_res/disjoint_unsorted_cloesed/res_union.txt)|
+|intersection|[300 closed bounded disjoint interval](https://github.com/NicoleYueqiLi/sagemath/blob/main/testing_file/test_100_300_cc_ff.txt)|[0.001182](https://github.com/NicoleYueqiLi/sagemath/blob/main/Scanline_res/disjoint_unsorted_cloesed/res_intersection.txt)|[4.376048](https://github.com/NicoleYueqiLi/sagemath/blob/main/Original_res/disjoint_unsorted_cloesed/res_intersection.txt)|
 2. We used scan-line methods to replace the original `normalize` and also make `normalize` optionally. After replacing it, we test it with 500,000 random intervals, the original methods take 151 second to construct it; scan-line methods only took 18 seconds.
 3. Replace `difference`,`symmetric_difference` and `are_pairwise_disjoint` with scan-line methods. We use 1000 realsets each realsets containing 1000 random intervals as the testing dataset. The original methods found the difference between two real sets took 11 seconds on average, and the new methods only took 0.021 seconds on average; symmetric_differnce also has a similar result.
 4. Implement `convex_hull` and `is_connected` in `RealSet`. 
comment:49

Somewhere in the original code (mul ?), the empty interval is set to (0, 0)
Replying to @mkoeppe:

     def __init__(self, lower, lower_closed, upper, upper_closed, check=True):
         """
         Initialize ``self``.
[...]
             if (upper_closed and upper == infinity):
                 raise ValueError('interval cannot be closed at +oo')
+            # TODO: take care of the empty set case.

What do you have in mind there?

Description changed:

--- 
+++ 
@@ -2,12 +2,18 @@
 - https://github.com/mkoeppe/cutgeneratingfunctionology/blob/master/cutgeneratingfunctionology/igp/intervals.py
 - https://github.com/mkoeppe/cutgeneratingfunctionology/blob/master/cutgeneratingfunctionology/igp/fast_piecewise.py
 
-1. Extend `Union` and `Intersection` from pairwise computation to multiple realsets with scan-line algorithm. In order to compare the performance of the scan-line and orginial methods, we use different dataset to test the CPU time:
+1. Extend `Union` and `Intersection` from pairwise computation to multiple realsets with scan-line algorithm. In order to compare the performance of the scan-line and orginial methods, we use different dataset to test the CPU time the random generator and timing test code can find in this [page](https://github.com/NicoleYueqiLi/sagemath/blob/main/RealsetTestTool.py):
+
 |        |       |         |        |
 |--------|-------|---------|--------|
 |Function|Dataset|Scan-line|original|
 |union|[1000 closed bounded disjoint interval ](https://github.com/NicoleYueqiLi/sagemath/blob/main/testing_file/test_1000_1000_cc_ff.txt)|[0.005281](https://github.com/NicoleYueqiLi/sagemath/blob/main/Scanline_res/disjoint_unsorted_cloesed/res_union.txt)|[0.00749](https://github.com/NicoleYueqiLi/sagemath/blob/main/Original_res/disjoint_unsorted_cloesed/res_union.txt)|
+|union|[1000 open bounded disjoint interval](https://github.com/NicoleYueqiLi/sagemath/blob/main/testing_file/test_1000_1000_oo_ff.txt)|[0.004339](https://github.com/NicoleYueqiLi/sagemath/blob/main/Scanline_res/disjoint_unsorted_open/res_union.txt)|[0.006646](https://github.com/NicoleYueqiLi/sagemath/blob/main/Original_res/disjoint_unsorted_open/res_union.txt)|
+|union|[1000 mixed bounded disjoint interval](https://github.com/NicoleYueqiLi/sagemath/blob/main/testing_file/test_1000_1000_co_ff.txt)|[0.005025](https://github.com/NicoleYueqiLi/sagemath/blob/main/Scanline_res/disjoint_unsorted_cloesed_open/res_union.txt)|[0.006646](https://github.com/NicoleYueqiLi/sagemath/blob/main/Original_res/disjoint_unsorted_cloesed_open/res_union.txt)|
 |intersection|[300 closed bounded disjoint interval](https://github.com/NicoleYueqiLi/sagemath/blob/main/testing_file/test_100_300_cc_ff.txt)|[0.001182](https://github.com/NicoleYueqiLi/sagemath/blob/main/Scanline_res/disjoint_unsorted_cloesed/res_intersection.txt)|[4.376048](https://github.com/NicoleYueqiLi/sagemath/blob/main/Original_res/disjoint_unsorted_cloesed/res_intersection.txt)|
+|intersection|[300 open bounded disjoint interval](https://github.com/NicoleYueqiLi/sagemath/blob/main/testing_file/test_100_300_oo_ff.txt)|[0.001225](https://github.com/NicoleYueqiLi/sagemath/blob/main/Scanline_res/disjoint_unsorted_open/res_intersection.txt)|[4.359314](https://github.com/NicoleYueqiLi/sagemath/blob/main/Original_res/disjoint_unsorted_open/res_intersection.txt)|
+|intersection|[300 mixed bounded interval](https://github.com/NicoleYueqiLi/sagemath/blob/main/testing_file/test_100_300_co_ff.txt)|[0.001537](https://github.com/NicoleYueqiLi/sagemath/blob/main/Scanline_res/disjoint_unsorted_closed_open/res_intersection.txt)|[4.407478](https://github.com/NicoleYueqiLi/sagemath/blob/main/Original_res/disjoint_unsorted_closed_open/res_intersection.txt)|
+
 2. We used scan-line methods to replace the original `normalize` and also make `normalize` optionally. After replacing it, we test it with 500,000 random intervals, the original methods take 151 second to construct it; scan-line methods only took 18 seconds.
 3. Replace `difference`,`symmetric_difference` and `are_pairwise_disjoint` with scan-line methods. We use 1000 realsets each realsets containing 1000 random intervals as the testing dataset. The original methods found the difference between two real sets took 11 seconds on average, and the new methods only took 0.021 seconds on average; symmetric_differnce also has a similar result.
 4. Implement `convex_hull` and `is_connected` in `RealSet`. 
comment:51

Is this syntax self.intersection(0,1) new? Is it really needed?

+        elif len(real_set_collection) == 2:
+            a, b = real_set_collection
+            # allow self.intersection(0,1) syntax
+            try:
+                a.n()
+                b.n()
+                sets.append(RealSet(a, b))
+            except (AttributeError, ValueError, TypeError):
+                sets.append(RealSet(a))
+                sets.append(RealSet(b))
+        else:
comment:52

I didn't like it either, since the new code spends a lot of effect to take care of that. I wanted to eliminate that. However, the original code allowed for it (contained doctests of that kind).
Replying to @mkoeppe:

Is this syntax self.intersection(0,1) new? Is it really needed?

+        elif len(real_set_collection) == 2:
+            a, b = real_set_collection
+            # allow self.intersection(0,1) syntax
+            try:
+                a.n()
+                b.n()
+                sets.append(RealSet(a, b))
+            except (AttributeError, ValueError, TypeError):
+                sets.append(RealSet(a))
+                sets.append(RealSet(b))
+        else:
comment:53

OK, you are right, the old code did allow it. So we have to keep it.

comment:54
-    def intersection(self, *other):
+    def intersection(self, *real_set_collection):
         """

I think you can simplify it further by making it

     def intersection(*real_set_collection)

When used as a bound method, RealSet(1, 3).intersection(RealSet(2, 4)), the self would just be the first element of real_set_collection.

But it could also be used as an unbound method RealSet.intersection(RealSet(1, 3), RealSet(2,4)), and you could also handle the case RealSet.intersection(), which should give the real line as the intersection over an empty family

comment:55

(Just a suggestion, it's fine as it)

comment:56

Replying to @yuan-zhou:

Does the overhead look acceptable on the above examples?

Yes, I think it's fine.

My guess is that it could be improved by a custom version of heapq with a fast path for near-trivial cases, or even cythonized.

Let's keep this for a possible follow-up ticket.

Reviewer: Matthias Koeppe

comment:58

That's a good suggestion, but I incline to leave it as is on this ticket. The reason is the treatments for special cases such as RealSet.union(RealSet(0,1)), RealSet.union((0,1)), RealSet.union(0, 1), and RealSet(0,1).union(1, 3)==RealSet.union(RealSet(0,1), 1, 3) could be too complicated at the moment. Will discuss the deprecation of the use of RealSet(0,1).union(1, 3) on another ticket later.
Replying to @mkoeppe:

-    def intersection(self, *other):
+    def intersection(self, *real_set_collection):
         """

I think you can simplify it further by making it

     def intersection(*real_set_collection)

When used as a bound method, RealSet(1, 3).intersection(RealSet(2, 4)), the self would just be the first element of real_set_collection.

But it could also be used as an unbound method RealSet.intersection(RealSet(1, 3), RealSet(2,4)), and you could also handle the case RealSet.intersection(), which should give the real line as the intersection over an empty family

comment:59

Thank you! Replying to @mkoeppe:

comment:60

Should def normalize(intervals) be @staticmethod

comment:61

Never mind. Without @staticmethod, the following can apply.

sage: r = RealSet(RealSet(2, 6)[0], RealSet(4, 10)[0], normalized=True)
sage: r
(2, 6) ∪ (4, 10)
sage: r.normalize()
((2, 10),)

Replying to @yuan-zhou:

Should def normalize(intervals) be @staticmethod