RealSet: Faster operations by scan-line (merging) techniques

Question

RealSet: Faster operations by scan-line (merging) techniques

mkoeppe opened this issue 4 years ago · 73 comments

Adapt from:

Extend Union and Intersection from pairwise computation to multiple realsets with scan-line algorithm. In order to compare the performance of the scan-line and orginial methods, we use different dataset to test the CPU time the random generator and timing test code can find in this page:


Function	Dataset	Scan-line	original
union	1000 closed bounded disjoint interval	0.005281	0.00749
union	1000 open bounded disjoint interval	0.004339	0.006646
union	1000 mixed bounded disjoint interval	0.005025	0.006646
intersection	300 closed bounded disjoint interval	0.001182	4.376048
intersection	300 open bounded disjoint interval	0.001225	4.359314
intersection	300 mixed bounded interval	0.001537	4.407478

We used scan-line methods to replace the original normalize and also make normalize optionally. After replacing it, we test it with 500,000 random intervals, the original methods take 151 second to construct it; scan-line methods only took 18 seconds.
Replace difference,symmetric_difference and are_pairwise_disjoint with scan-line methods. We use 1000 realsets each realsets containing 1000 random intervals as the testing dataset. The original methods found the difference between two real sets took 11 seconds on average, and the new methods only took 0.021 seconds on average; symmetric_differnce also has a similar result.
Implement convex_hull and is_connected in RealSet.

Depends on #34252

CC: @yuan-zhou

Component: geometry

Author: Yueqi Li, Yuan Zhou

Branch/Commit: 1ef7dd1

Reviewer: Matthias Koeppe

Issue created by migration from https://trac.sagemath.org/ticket/32181

Answer 1 · 2022-05-19T03:51:12.000Z

Branch: u/gh-NicoleYueqiLi/realset__faster_operations_by_scan_line__merging__techniques

Answer 2 · 2022-07-18T21:42:13.000Z

Commit: 625ac58

Answer 3 · 2022-07-18T21:42:13.000Z

Branch pushed to git repo; I updated commit sha1. New commits:

Answer 4 · 2022-07-26T19:47:25.000Z

Description changed:

--- 
+++ 
@@ -2,3 +2,7 @@
 - https://github.com/mkoeppe/cutgeneratingfunctionology/blob/master/cutgeneratingfunctionology/igp/intervals.py
 - https://github.com/mkoeppe/cutgeneratingfunctionology/blob/master/cutgeneratingfunctionology/igp/fast_piecewise.py
 
+1. Extend `Union` and `Intersection` from pairwise computation to multiple realsets with scan-line algorithm. We use 1000 realsets each realsets containing 1000 random intervals as the testing dataset. The original methods found the intersection of two realsets took 10.9 seconds on average, and new methods only took 0.018 seconds on average, union of five realsets, the original method took 0.052 seconds, and new methods only took 0.041 seconds.
+2. We used scan-line methods to replace the original `normalize` and also make `normalize` optionally. After replacing it, we test it with 500,000 random intervals, the original methods take 151 second to construct it; scan-line methods only took 18 seconds.
+3. Replace `difference`,`symmetric_difference` and `are_pairwise_disjoint` with scan-line methods. We use 1000 realsets each realsets containing 1000 random intervals as the testing dataset. The original methods found the difference between two real sets took 11 seconds on average, and the new methods only took 0.021 seconds on average; symmetric_differnce also has a similar result.
+4. Implement `convex_hull` and `is_connected` in `RealSet`.

Answer 5 · 2022-07-26T19:56:29.000Z

Changed commit from 625ac58 to 33385c8

Answer 6 · 2022-07-26T19:56:29.000Z

Branch pushed to git repo; I updated commit sha1. New commits:

`3161339`	`1. fixed pycode test fail`
`33385c8`	Adding examples for helper funtion `_scan_line_union`, `_scan_line_intersection`, `_scan_line_difference`, `_scan_line_symmetric_difference`

Answer 7 · 2022-07-27T07:46:40.000Z

comment:8

If you merge the most current beta (9.7.beta6), you can use https://trac.sagemath.org/wiki/ReleaseTours/sage-9.7#Validatingdocstringwith.sage-tox-erst

Answer 8 · 2022-07-28T19:54:35.000Z

Branch pushed to git repo; I updated commit sha1. New commits:

`61862a9`	`Change scan-line methods to tree structure, improve inner function's documentation`

Answer 9 · 2022-07-28T19:54:35.000Z

Changed commit from 33385c8 to 61862a9

Answer 10 · 2022-07-28T20:17:56.000Z

comment:10

tox -e relint will tell you that EXAMPLE:: should be EXAMPLES:: according to our style guide (whether there is a single or there are multiple examples)

Answer 11 · 2022-07-29T18:39:24.000Z

comment:11

The current code does not pass the doc-tests.
Also, please follow https://doc.sagemath.org/html/en/developer/coding_basics.html. For example, in the text part, convex_hull should be fully-referenced so that it is clickable after build, i.e. :meth:~sage.sets.real_set.RealSet.convex_hull

Answer 12 · 2022-07-30T03:50:05.000Z

comment:12

I took a look through the code, but didn't run the code.
I have some first comments.

The patchbot currently shows 31 doctests failures and 4 lint errors. To be fixed!
In the docstrings of _scan_left_endpoint and _scan_right_endpoint: What does an event mean. Reformat the text after output. Describe the value of delta. Include example(s) where tag is given.
RealSet constructor with option normalized=True. What does this assume? Describe it in the docstring of __classcall__ or that of RealSet. Mention that is does not apply to the manifold case. Include doctests/tests where normalized=True is given, and examples where arguments are of type InternalRealInterval.
I have doubts about where if not normalized is treated. (But of course, this depends on how you define normalized=True). I thought if normalized=True and it's not of the manifold case (lines 1267-1283), one can just return UniqueRepresentation.__classcall__(cls, *intervals) before line 1199 intervals = [], which skips all treatments on args.
When normalized is not True, may simplify and insert the code in def normalize here, then remove the normalize function as it is only used once here. Alternatively, keep def normalize and move all treatment of args from __classcall__ to normalize.
RealSet.__classcall__: Line 1188-1190 change to just: normalized = kwds.pop('normalized', False). L1194: "... take no keyword arguments other than 'normalized'.""
RealSet._prep: L1710 elif. L1717 elif
RealSet doesn't need methods _scan_left_endpoint and _scan_right_endpoint. These are methods of the class InternalRealInterval.
RealSet._scan_interval Rename the function, to something like _scan, or _scan_intervals, or _generate_events. Rewrite the function: for i in self._intervals: yield i._scan_left_endpoint(tag); yield i._scan_right_endpoint(tag). No more merge/sorting needed.
Ideally, all scan methods are generators. Currently, some are generators, some return lists.
In _scan_line_union, lines 2104-2122, the two while loops look very suspicious to me. I think they do not belong here. Moreover, the lines 2118-2119 in the second while loop might be wrong. You claimed that these two while loops take care of the special cases RealSet(x=-infinity) and RealSet(x<=-infinity), but the input scan to this function should never be in such form. Such empty (nonsense) special cases should already been eliminated in around line 1209 in __classcall__
In union_of_realsets and other similar functions: Lines 2175 to 2178 can be simplified to scan = merge(RealSet(s)._scan() for s in real_set_collection)
_scan_line_intersection. Add more special examples to doctest to cover the cases where intersection is just one point, or just empty. For example: [1,2] ∩ [2,3], [2,3] ∩ (3,4), (3,4) ∩ [4,5), [4,5) ∩ (5,6). Lines 2252-2261 (and other similar places) can be simplified to

if now_on:
  (on_x, on_epsilon) = (x, epsilon) 
else: 
  if was_on and (on_x, on_epsilon) < (x, epsilon): 
    yield InternalRealInterval(...)
  (on_x, on_epsilon) = (None, None)

Questions about 'tag'. It is said to be a real number in the docstrings of many function, but is used as True or False for the difference function only. I think tag can be eliminated throughout the code (and do some special treatment for the difference.) Also, why is it called index at some places? That's not used either, so can be removed as well. On a related note, given a collection of real sets, after generating and merging the their scan, you can no longer match the list of events back to the original real sets, because index was not passed to tag in the events. I do not like the current design.
_scan_line_union, _scan_line_intersection, _scan_line_symmetric_difference (and part of are_pairwise_disjoint) all look redundant. They all can be replaced by a new method RealSet._scan_to_intervals(scan, indicator) that takes scan as a generator of sorted events and indicator as a number 1 or n (or perhaps as a lambda function for are_pairwise_disjoint), and outputs a tuple intervals of type InternalRealInterval.
According to your ticket description, the old union method is slower than the new one. Is this true for the union of two real sets? If so, why is the old one kept in the code? If not, please update the description of the ticket, and include comments in the code. In general, are there examples where the original method is faster? Please specify the examples, and change the code to choose strategically between the two versions according to the input.
def convex_hull: correct but wasteful. One does not to sort all intervals, only the left most and right most intervals of each real set matter.
def is_connected docstring: Return whether self is a connected set. OUTPUT: Boolean. True if self is a single interval.
def are_pairwise_disjoint: Revise.
Throughout the code, (for example in _scan_line_union and _scan_line_intersection and many other functions,) the docstrings need significant improvements! The current ones make little sense. They are often copy-pasted from other places which do not correspond to the function. Input and output descriptions are not exact, variable undefined, variable mistaken (eg. realsets v.s real_set_collection) type not specified, code-style incorrect. Often bad and incomplete examples. EXAMPLES:: plural. At the beginning of the file, "Yueqi Li (2022-06-27): ..." Rephrase to one or two lines. Some places in the docstrings ":meth:~sage.sets.real_set.RealSet.convex_hull"
Please make sure that your docstrings follow https://doc.sagemath.org/html/en/developer/coding_basics.html.
Throughout the code: clean up the commented-out code.
Throughout the code: :class:RealInterval should be :class:InternalRealInterval
New method: relative closure in a superset?

Answer 13 · 2022-07-30T04:40:33.000Z

Changed branch from u/gh-NicoleYueqiLi/realset__faster_operations_by_scan_line__merging__techniques to u/yzh/realset__faster_operations_by_scan_line__merging__techniques

Answer 14 · 2022-07-30T04:42:00.000Z

comment:14

Merge-in SageMath version to 9.7.beta6.

New commits:

`ed5c560`	`Merge branch 'develop' into t/32181/realset__faster_operations_by_scan_line__merging__techniques`

Answer 15 · 2022-07-30T04:42:00.000Z

Changed commit from 61862a9 to ed5c560

Answer 16 · 2022-07-31T06:52:48.000Z

Branch pushed to git repo; I updated commit sha1. New commits:

`cd9cc4c`	`trivial typo fix`
`f6b334f`	`remove outdated patch for the broken comparison between infinity and RLF (fixed in ticket 11506)`
`58aa1ef`	`add bug example regarding empty InternalRealInterval. Will not fix for this internal class`
`7a65f08`	`fix bug regarding empty real sets`

Answer 17 · 2022-07-31T06:52:48.000Z

Changed commit from ed5c560 to 7a65f08

Answer 18 · 2022-07-31T07:07:36.000Z

Changed commit from 7a65f08 to 1bbb04f

Answer 19 · 2022-07-31T07:07:36.000Z

Branch pushed to git repo; I updated commit sha1. New commits:

`1bbb04f`	`acknowledge authors, reference`

Answer 20 · 2022-07-31T13:26:46.000Z

Branch pushed to git repo; I updated commit sha1. New commits:

`dda8f6b`	`rewrite implementation of scan-line algo for real set methods, by adapting the code from https://github.com/mkoeppe/cutgeneratingfunctionology`

Answer 21 · 2022-07-31T13:26:46.000Z

Changed commit from 1bbb04f to dda8f6b

Answer 22 · 2022-07-31T13:46:10.000Z

comment:18

Replying to @NicoleYueqiLi:
I rewrote your code according to the remarks in comment:12. Yet to update the docstrings.

Answer 23 · 2022-08-01T09:58:06.000Z

Branch pushed to git repo; I updated commit sha1. New commits:

`75ec519`	`pass kwd normalize=True to UniqueRepresentation classcall`
`ecf22e1`	`rename left and right endpoints to lower and upper`
`7967c81`	`add comments`
`2bfdfb2`	`improve docstring and add docstests`

Answer 24 · 2022-08-01T09:58:06.000Z

Changed commit from dda8f6b to 2bfdfb2

Answer 25 · 2022-08-01T10:01:22.000Z

Author: Yueqi Li, Yuan Zhou

Answer 26 · 2022-08-01T10:17:11.000Z

Branch pushed to git repo; I updated commit sha1. New commits:

`9d1e07a`	`Revive furo`
`f19b597`	`Add link to the logo`
`09d5f5c`	`Fix the logo link for reference`
`ae75d53`	`Fix a subtle reference problem for build_options`
`3519bed`	`Run docbuild workflow with single thread`
`1274718`	`Fix a suspicious part of categories doc`
`e5b1f7e`	`Add search.html`
`bc6a87f`	`Merge branch 'u/klee/34252' of git://trac.sagemath.org/sage into t/32181/realset__faster_operations_by_scan_line__merging__techniques`

Answer 27 · 2022-08-01T10:17:11.000Z

Changed commit from 2bfdfb2 to bc6a87f

Answer 28 · 2022-08-01T10:20:37.000Z

comment:22

Try merging #34252 to fix the docbuild errors.

Answer 29 · 2022-08-01T10:33:53.000Z

Branch pushed to git repo; I updated commit sha1. New commits:

`1194904`	`minor fix for pycodestyle`

Answer 30 · 2022-08-01T10:33:53.000Z

Changed commit from bc6a87f to 1194904

Answer 31 · 2022-08-01T13:47:50.000Z

Changed commit from 1194904 to 44926c8

Answer 32 · 2022-08-01T13:47:50.000Z

Branch pushed to git repo; I updated commit sha1. New commits:

`44926c8`	`better pyflakes, better handle of RealSet(x != oo)`

Answer 33 · 2022-08-01T17:58:07.000Z

comment:26

As you merged #34252 here, add it to Dependencies in the ticket box

Answer 34 · 2022-08-01T18:06:34.000Z

comment:27

I think you can unify RealSet.intersection and RealSet.intersection_of_realsets to just one (non-static) method RealSet.intersection.

Answer 35 · 2022-08-01T21:31:37.000Z

Changed commit from 44926c8 to 523b49a

Answer 36 · 2022-08-01T21:31:37.000Z

Branch pushed to git repo; I updated commit sha1. New commits:

`c9d4746`	`update RealSet.union and RealSet.intersection to handle several real sets`
`ebd3b77`	`Better pygments style`
`73e5aa3`	`Make white logo transparent to match with furo`
`523b49a`	`Merge branch 'u/klee/34252' of git://trac.sagemath.org/sage into t/32181/realset__faster_operations_by_scan_line__merging__techniques`

Answer 37 · 2022-08-01T21:32:22.000Z

Dependencies: 34252

Answer 38 · 2022-08-01T21:33:22.000Z

Changed dependencies from 34252 to #34252

Answer 39 · 2022-08-04T17:20:24.000Z

comment:31

-    @staticmethod
-    def normalize(intervals):
-        r"""

This function is part of the public API, can't just remove

Answer 40 · 2022-08-04T17:21:19.000Z

comment:32

+            # Fastpass: The input is already normalized: Args is a list of

That should be "Fast path"

Answer 41 · 2022-08-04T17:25:35.000Z

comment:33

+    def is_connected(self):
+        """
+        Return whether ``self`` is a connected set.
+
+        OUTPUT:
+
+        Boolean. Whether the normalized form of ``self``
+        has a single connected component.
+

In this description, you are mixing talking about self as a set and its representation as a list of intervals

Answer 42 · 2022-08-04T17:27:05.000Z

comment:34

             sage: s1.symmetric_difference(s2)
             (0, 1] ∪ [2, +oo)
         """
-        other = RealSet(*other)
-        return self.difference(other).union(other.difference(self))
+        scan = merge(self._scan(), RealSet(*other)._scan())
+        intervals = tuple(RealSet._scan_to_intervals(scan, lambda i: bool(i == 1)))
+        return RealSet(*intervals, normalized=True)
 
     def contains(self, x):

Is the explicit conversion to bool necessary here?

Answer 43 · 2022-08-04T17:34:33.000Z

comment:35

Would it make sense to add the benchmarks mentioned in the ticket description as doctests (can be marked # not tested so it does not run automatically)?

The benchmarks are for complicated real sets with many intervals.

Does the new implementation have an overhead over the previous implementation that becomes noticeable for simple real sets with 0, 1, 2, or 5 intervals? That's probably the most common case how this class is used. It would be good to include a benchmark for this.

Answer 44 · 2022-08-04T17:44:05.000Z

comment:36

Thanks for reviewing the ticket! I don't think that the benchmark in the current ticket description makes much sense. I was hoping that Yueqi would be able to provide an updated version and more details soon.
Replying to @mkoeppe:

Would it make sense to add the benchmarks mentioned in the ticket description as doctests (can be marked # not tested so it does not run automatically)?

The benchmarks are for complicated real sets with many intervals.

Does the new implementation have an overhead over the previous implementation that becomes noticeable for simple real sets with 0, 1, 2, or 5 intervals? That's probably the most common case how this class is used. It would be good to include a benchmark for this.

Answer 45 · 2022-08-04T17:46:09.000Z

comment:37

I don't quite get it. I thought that a real set is connected iff its normalized form has only one connected component (interval or singleton).

Replying to @mkoeppe:

+    def is_connected(self):
+        """
+        Return whether ``self`` is a connected set.
+
+        OUTPUT:
+
+        Boolean. Whether the normalized form of ``self``
+        has a single connected component.
+
In this description, you are mixing talking about self as a set and its representation as a list of intervals

Answer 46 · 2022-08-04T17:47:23.000Z

comment:38

Do you mean bool(i==1)? I don't know if it's always needed, but I thought there is nothing wrong to add it.
Replying to @mkoeppe:

             sage: s1.symmetric_difference(s2)
             (0, 1] ∪ [2, +oo)
         """
-        other = RealSet(*other)
-        return self.difference(other).union(other.difference(self))
+        scan = merge(self._scan(), RealSet(*other)._scan())
+        intervals = tuple(RealSet._scan_to_intervals(scan, lambda i: bool(i == 1)))
+        return RealSet(*intervals, normalized=True)
 
     def contains(self, x):

Is the explicit conversion to bool necessary here?

Answer 47 · 2022-08-04T17:58:31.000Z

comment:39

Replying to @yuan-zhou:

I don't quite get it. I thought that a real set is connected iff its normalized form has only one connected component (interval or singleton).

"Connected component" is a property of the set (topological space).

"Normalized form" is a list of intervals. There's no such thing as a connected component of a list.

Answer 48 · 2022-08-04T17:59:56.000Z

Branch pushed to git repo; I updated commit sha1. New commits:

`a864fdc`	`fix typo`
`3f6a0ac`	`revive RealSet.normalize`

Answer 49 · 2022-08-04T17:59:56.000Z

Changed commit from 523b49a to 3f6a0ac

Answer 50 · 2022-08-04T18:00:49.000Z

comment:41

Replying to @yuan-zhou:

Do you mean bool(i==1)? I don't know if it's always needed, but I thought there is nothing wrong to add it.

i is just an integer, right? Converting to bool is only needed in comparison when you are comparing SR elements, which have overloaded comparison operators for forming relational expressions.

Answer 51 · 2022-08-04T18:01:42.000Z

comment:42

If you want, you can add a deprecation for the normalize method

Answer 52 · 2022-08-04T18:03:38.000Z

Changed commit from 3f6a0ac to 79f2014

Answer 53 · 2022-08-04T18:03:38.000Z

Branch pushed to git repo; I updated commit sha1. New commits:

`79f2014`	`revise the docstring of RealSet.is_connected`

Answer 54 · 2022-08-04T18:08:57.000Z

Branch pushed to git repo; I updated commit sha1. New commits:

`1ef7dd1`	`remove unnecessary bool()`

Answer 55 · 2022-08-04T18:08:57.000Z

Changed commit from 79f2014 to 1ef7dd1

Answer 56 · 2022-08-04T19:04:41.000Z

comment:45

Here's a microbenchmark:

Before:

sage: from itertools import count
sage: c = count()
sage: %timeit RealSet((0, next(c)))
82.9 µs ± 3.75 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
sage: %timeit RealSet((0, next(c))).union([next(c), next(c)])
210 µs ± 7.86 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

With this ticket:

sage: from itertools import count
sage: c = count()
sage: %timeit RealSet((0, next(c)))
86.2 µs ± 3.43 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
sage: %timeit RealSet((0, next(c))).union([next(c), next(c)])
257 µs ± 18.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Answer 57 · 2022-08-04T19:39:31.000Z

comment:46

Does the overhead look acceptable on the above examples?

Answer 58 · 2022-08-04T19:40:01.000Z

comment:47

     def __init__(self, lower, lower_closed, upper, upper_closed, check=True):
         """
         Initialize ``self``.
[...]
             if (upper_closed and upper == infinity):
                 raise ValueError('interval cannot be closed at +oo')
+            # TODO: take care of the empty set case.

What do you have in mind there?

Answer 59 · 2022-08-04T19:40:59.000Z

Description changed:

--- 
+++ 
@@ -2,7 +2,12 @@
 - https://github.com/mkoeppe/cutgeneratingfunctionology/blob/master/cutgeneratingfunctionology/igp/intervals.py
 - https://github.com/mkoeppe/cutgeneratingfunctionology/blob/master/cutgeneratingfunctionology/igp/fast_piecewise.py
 
-1. Extend `Union` and `Intersection` from pairwise computation to multiple realsets with scan-line algorithm. We use 1000 realsets each realsets containing 1000 random intervals as the testing dataset. The original methods found the intersection of two realsets took 10.9 seconds on average, and new methods only took 0.018 seconds on average, union of five realsets, the original method took 0.052 seconds, and new methods only took 0.041 seconds.
+1. Extend `Union` and `Intersection` from pairwise computation to multiple realsets with scan-line algorithm. In order to compare the performance of the scan-line and orginial methods, we use different dataset to test the CPU time:
+|        |       |         |        |
+|--------|-------|---------|--------|
+|Function|Dataset|Scan-line|original|
+|union|[1000 closed bounded disjoint interval ](https://github.com/NicoleYueqiLi/sagemath/blob/main/testing_file/test_1000_1000_cc_ff.txt)|[0.005281](https://github.com/NicoleYueqiLi/sagemath/blob/main/Scanline_res/disjoint_unsorted_cloesed/res_union.txt)|[0.00749](https://github.com/NicoleYueqiLi/sagemath/blob/main/Original_res/disjoint_unsorted_cloesed/res_union.txt)|
+|intersection|[300 closed bounded disjoint interval](https://github.com/NicoleYueqiLi/sagemath/blob/main/testing_file/test_100_300_cc_ff.txt)|[0.001182](https://github.com/NicoleYueqiLi/sagemath/blob/main/Scanline_res/disjoint_unsorted_cloesed/res_intersection.txt)|[4.376048](https://github.com/NicoleYueqiLi/sagemath/blob/main/Original_res/disjoint_unsorted_cloesed/res_intersection.txt)|
 2. We used scan-line methods to replace the original `normalize` and also make `normalize` optionally. After replacing it, we test it with 500,000 random intervals, the original methods take 151 second to construct it; scan-line methods only took 18 seconds.
 3. Replace `difference`,`symmetric_difference` and `are_pairwise_disjoint` with scan-line methods. We use 1000 realsets each realsets containing 1000 random intervals as the testing dataset. The original methods found the difference between two real sets took 11 seconds on average, and the new methods only took 0.021 seconds on average; symmetric_differnce also has a similar result.
 4. Implement `convex_hull` and `is_connected` in `RealSet`.

Answer 60 · 2022-08-04T19:44:31.000Z

comment:49

Somewhere in the original code (mul ?), the empty interval is set to (0, 0)
Replying to @mkoeppe:

     def __init__(self, lower, lower_closed, upper, upper_closed, check=True):
         """
         Initialize ``self``.
[...]
             if (upper_closed and upper == infinity):
                 raise ValueError('interval cannot be closed at +oo')
+            # TODO: take care of the empty set case.

What do you have in mind there?

Answer 61 · 2022-08-04T19:56:24.000Z

Description changed:

--- 
+++ 
@@ -2,12 +2,18 @@
 - https://github.com/mkoeppe/cutgeneratingfunctionology/blob/master/cutgeneratingfunctionology/igp/intervals.py
 - https://github.com/mkoeppe/cutgeneratingfunctionology/blob/master/cutgeneratingfunctionology/igp/fast_piecewise.py
 
-1. Extend `Union` and `Intersection` from pairwise computation to multiple realsets with scan-line algorithm. In order to compare the performance of the scan-line and orginial methods, we use different dataset to test the CPU time:
+1. Extend `Union` and `Intersection` from pairwise computation to multiple realsets with scan-line algorithm. In order to compare the performance of the scan-line and orginial methods, we use different dataset to test the CPU time the random generator and timing test code can find in this [page](https://github.com/NicoleYueqiLi/sagemath/blob/main/RealsetTestTool.py):
+
 |        |       |         |        |
 |--------|-------|---------|--------|
 |Function|Dataset|Scan-line|original|
 |union|[1000 closed bounded disjoint interval ](https://github.com/NicoleYueqiLi/sagemath/blob/main/testing_file/test_1000_1000_cc_ff.txt)|[0.005281](https://github.com/NicoleYueqiLi/sagemath/blob/main/Scanline_res/disjoint_unsorted_cloesed/res_union.txt)|[0.00749](https://github.com/NicoleYueqiLi/sagemath/blob/main/Original_res/disjoint_unsorted_cloesed/res_union.txt)|
+|union|[1000 open bounded disjoint interval](https://github.com/NicoleYueqiLi/sagemath/blob/main/testing_file/test_1000_1000_oo_ff.txt)|[0.004339](https://github.com/NicoleYueqiLi/sagemath/blob/main/Scanline_res/disjoint_unsorted_open/res_union.txt)|[0.006646](https://github.com/NicoleYueqiLi/sagemath/blob/main/Original_res/disjoint_unsorted_open/res_union.txt)|
+|union|[1000 mixed bounded disjoint interval](https://github.com/NicoleYueqiLi/sagemath/blob/main/testing_file/test_1000_1000_co_ff.txt)|[0.005025](https://github.com/NicoleYueqiLi/sagemath/blob/main/Scanline_res/disjoint_unsorted_cloesed_open/res_union.txt)|[0.006646](https://github.com/NicoleYueqiLi/sagemath/blob/main/Original_res/disjoint_unsorted_cloesed_open/res_union.txt)|
 |intersection|[300 closed bounded disjoint interval](https://github.com/NicoleYueqiLi/sagemath/blob/main/testing_file/test_100_300_cc_ff.txt)|[0.001182](https://github.com/NicoleYueqiLi/sagemath/blob/main/Scanline_res/disjoint_unsorted_cloesed/res_intersection.txt)|[4.376048](https://github.com/NicoleYueqiLi/sagemath/blob/main/Original_res/disjoint_unsorted_cloesed/res_intersection.txt)|
+|intersection|[300 open bounded disjoint interval](https://github.com/NicoleYueqiLi/sagemath/blob/main/testing_file/test_100_300_oo_ff.txt)|[0.001225](https://github.com/NicoleYueqiLi/sagemath/blob/main/Scanline_res/disjoint_unsorted_open/res_intersection.txt)|[4.359314](https://github.com/NicoleYueqiLi/sagemath/blob/main/Original_res/disjoint_unsorted_open/res_intersection.txt)|
+|intersection|[300 mixed bounded interval](https://github.com/NicoleYueqiLi/sagemath/blob/main/testing_file/test_100_300_co_ff.txt)|[0.001537](https://github.com/NicoleYueqiLi/sagemath/blob/main/Scanline_res/disjoint_unsorted_closed_open/res_intersection.txt)|[4.407478](https://github.com/NicoleYueqiLi/sagemath/blob/main/Original_res/disjoint_unsorted_closed_open/res_intersection.txt)|
+
 2. We used scan-line methods to replace the original `normalize` and also make `normalize` optionally. After replacing it, we test it with 500,000 random intervals, the original methods take 151 second to construct it; scan-line methods only took 18 seconds.
 3. Replace `difference`,`symmetric_difference` and `are_pairwise_disjoint` with scan-line methods. We use 1000 realsets each realsets containing 1000 random intervals as the testing dataset. The original methods found the difference between two real sets took 11 seconds on average, and the new methods only took 0.021 seconds on average; symmetric_differnce also has a similar result.
 4. Implement `convex_hull` and `is_connected` in `RealSet`.

Answer 62 · 2022-08-04T21:23:49.000Z

comment:51

Is this syntax self.intersection(0,1) new? Is it really needed?

+        elif len(real_set_collection) == 2:
+            a, b = real_set_collection
+            # allow self.intersection(0,1) syntax
+            try:
+                a.n()
+                b.n()
+                sets.append(RealSet(a, b))
+            except (AttributeError, ValueError, TypeError):
+                sets.append(RealSet(a))
+                sets.append(RealSet(b))
+        else:

Answer 63 · 2022-08-04T21:26:19.000Z

comment:52

I didn't like it either, since the new code spends a lot of effect to take care of that. I wanted to eliminate that. However, the original code allowed for it (contained doctests of that kind).
Replying to @mkoeppe:

Is this syntax self.intersection(0,1) new? Is it really needed?

+        elif len(real_set_collection) == 2:
+            a, b = real_set_collection
+            # allow self.intersection(0,1) syntax
+            try:
+                a.n()
+                b.n()
+                sets.append(RealSet(a, b))
+            except (AttributeError, ValueError, TypeError):
+                sets.append(RealSet(a))
+                sets.append(RealSet(b))
+        else:

Answer 64 · 2022-08-04T21:30:22.000Z

comment:53

OK, you are right, the old code did allow it. So we have to keep it.

Answer 65 · 2022-08-04T21:34:57.000Z

comment:54

-    def intersection(self, *other):
+    def intersection(self, *real_set_collection):
         """

I think you can simplify it further by making it

     def intersection(*real_set_collection)

When used as a bound method, RealSet(1, 3).intersection(RealSet(2, 4)), the self would just be the first element of real_set_collection.

But it could also be used as an unbound method RealSet.intersection(RealSet(1, 3), RealSet(2,4)), and you could also handle the case RealSet.intersection(), which should give the real line as the intersection over an empty family

Answer 66 · 2022-08-04T21:44:08.000Z

comment:55

(Just a suggestion, it's fine as it)

Answer 67 · 2022-08-04T21:51:41.000Z

comment:56

Replying to @yuan-zhou:

Does the overhead look acceptable on the above examples?

Yes, I think it's fine.

My guess is that it could be improved by a custom version of heapq with a fast path for near-trivial cases, or even cythonized.

Let's keep this for a possible follow-up ticket.

Answer 68 · 2022-08-04T21:53:22.000Z

Reviewer: Matthias Koeppe

Answer 69 · 2022-08-04T21:58:05.000Z

comment:58

That's a good suggestion, but I incline to leave it as is on this ticket. The reason is the treatments for special cases such as RealSet.union(RealSet(0,1)), RealSet.union((0,1)), RealSet.union(0, 1), and RealSet(0,1).union(1, 3)==RealSet.union(RealSet(0,1), 1, 3) could be too complicated at the moment. Will discuss the deprecation of the use of RealSet(0,1).union(1, 3) on another ticket later.
Replying to @mkoeppe:

-    def intersection(self, *other):
+    def intersection(self, *real_set_collection):
         """
I think you can simplify it further by making it
     def intersection(*real_set_collection)
When used as a bound method, RealSet(1, 3).intersection(RealSet(2, 4)), the self would just be the first element of real_set_collection.

But it could also be used as an unbound method RealSet.intersection(RealSet(1, 3), RealSet(2,4)), and you could also handle the case RealSet.intersection(), which should give the real line as the intersection over an empty family

Answer 70 · 2022-08-04T21:59:43.000Z

comment:59

Thank you! Replying to @mkoeppe:

Answer 71 · 2022-08-05T16:27:22.000Z

comment:60

Should def normalize(intervals) be @staticmethod

Answer 72 · 2022-08-05T16:51:51.000Z

comment:61

Never mind. Without @staticmethod, the following can apply.

sage: r = RealSet(RealSet(2, 6)[0], RealSet(4, 10)[0], normalized=True)
sage: r
(2, 6) ∪ (4, 10)
sage: r.normalize()
((2, 10),)

Replying to @yuan-zhou:

Should def normalize(intervals) be @staticmethod

Answer 73 · 2022-08-06T07:44:13.000Z

Changed branch from u/yzh/realset__faster_operations_by_scan_line__merging__techniques to 1ef7dd1