BR1py/itertree

Missing dependency to numpy

Closed this issue · 10 comments

When itertree is installed via pip, it doesn't install numpy.
But when itertree is used, numpy is required.

C:\Program Files\Python3.10\lib\site-packages\itertree\itree_helpers.py:11: in <module>
    import numpy as np
E   ModuleNotFoundError: No module named 'numpy'
BR1py commented

Thank you for the bug report.

You are right. I will remove the dependency.

The idea is that itertree is independent from other packages.

But in case numpy is available, we have some additional functionality related to numpy objects that is only activated if the import works.
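A minimal sketch of the optional-import pattern described here, for illustration only. The helper name `to_plain` is hypothetical and is not taken from itertree; the point is just that the numpy-specific branch is skipped when the import fails.

```python
# Optional dependency: try the import, fall back to None if numpy is absent.
try:
    import numpy as np
except ImportError:
    np = None


def to_plain(value):
    """Hypothetical helper: convert numpy arrays to lists if numpy is present.

    Any other value (or any value when numpy is missing) is returned unchanged.
    """
    if np is not None and isinstance(value, np.ndarray):
        return value.tolist()
    return value
```

With this layout the package imports cleanly without numpy, and the extra behavior only appears when numpy happens to be installed.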

Btw. I can confirm the very poor performance of anytree, because I also wrote my own tree after looking into that code :). I can currently also see that my implementation is circa 2x faster than yours, but this might be related to the number of supported features. I was thinking about improvements to anytree as it's widely used, but it would require a complete rewrite and changes to the API.

See c0fec0de/anytree#169 (comment)

BR1py commented

Patrick,
I saw your remarks about anytree, and I agree regarding the bad performance and also with the analysis you made.

I had a short look at your Tree implementation, but I had some trouble understanding the methods without any description. I also didn't understand how you can access nodes that have the same ID (or should your tree not contain the same ID multiple times?). Is there a way to access the items by index?

Your code looks quite interesting and I still see room for improvement (e.g. use __slots__ = ...), but anyway your code is already faster than mine. As long as I don't understand the feature set I cannot judge (but I can still sleep well even though my tree is 2x slower).

BR1py commented

The dependency issue will be solved with the next commit; then I will close the issue.

Hi,
It's not fully implemented yet, that's why the documentation is not complete (in code and in Sphinx online).

As a rough summary:

  • IDs are unique, as I consider an ID a (unique) identifier. The exception is when no identifier is given: all nodes with ID None are collected in the same bucket as a list. Maybe I'll move this special case out of the IDs dictionary (key None) into a separate field in Node.
  • The global list of IDs is always kept with a representative of the whole tree; here it's the root.
  • If trees get merged, the ID dictionaries get merged.
  • If you split trees, they get divided (inefficient, but a not so common operation).
  • Each Node has a pointer to the root.
  • A cycle-free check can be done based on the representative (_root).
  • Most methods use generators (some are not correctly typed yet; I need to improve them, e.g. values() of a dict is not a generator).
  • You can attach a direct value to a node, or you can use dictionary syntax to attach any key-value pair to it.
  • You can construct top-down (Node(parent=parentNode)) or bottom-up (parentNode.AddChild(node)).
  • Order is preserved, because internal dictionaries and lists are order preserving.
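The points above can be sketched roughly as follows. This is a simplified illustration, not the actual pyTooling code: the class layout, the attribute names (`ids`, `root`, `children`) and the method `add_child` are assumptions made for the example, and splitting trees is left out.

```python
class Node:
    """Hypothetical sketch of a node whose root holds the ID registry."""

    def __init__(self, node_id=None, parent=None):
        self.node_id = node_id
        self.parent = None
        self.children = []   # lists preserve insertion order
        self.root = self     # a fresh node is the root of its own tree
        # Registry of IDs, only meaningful on the representative (root).
        # Nodes without an ID share one bucket (a list) under the key None.
        self.ids = {None: [self]} if node_id is None else {node_id: self}
        if parent is not None:
            parent.add_child(self)  # top-down construction

    def iter_subtree(self):
        """Yield this node and all descendants (generator)."""
        yield self
        for child in self.children:
            yield from child.iter_subtree()

    def add_child(self, child):
        """Bottom-up construction: merge the child's tree into this one."""
        if child.parent is not None:
            raise ValueError("child already has a parent")
        root = self.root
        # Cycle-free check via the representative: same root means same tree.
        if child.root is root:
            raise ValueError("adding this node would create a cycle")
        duplicates = (child.ids.keys() & root.ids.keys()) - {None}
        if duplicates:
            raise ValueError(f"duplicate IDs: {duplicates}")
        # Merge the ID dictionaries into the surviving root.
        for key, value in child.ids.items():
            if key is None:
                root.ids.setdefault(None, []).extend(value)
            else:
                root.ids[key] = value
        child.ids = {}
        for node in child.iter_subtree():
            node.root = root
        child.parent = self
        self.children.append(child)
```

Keeping the registry on the root makes the duplicate check and the cycle check a dictionary lookup and an identity comparison, at the cost of re-pointing every node when trees are merged.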
BR1py commented

OK, maybe this is not the right place, but you must protect your Node against adding nodes with the same ID.

Here we have a main difference between your Node class, which expects unique IDs, and iTree, where a tag (the counterpart of the ID in your case) does not have to be unique. Items with the same tag are collected in a tag family, and they have a specific order.

I use itertree in a test-automation project, and there we define test sequences in a tree structure (commands and subcommands), and it is required that we can put the same command (tag) multiple times in the tree.

To realize this functionality, the iTree items (children) must be stored in two dictionaries and lists (main list and mapping dict; family dict with list), and these must be maintained in each operation.

Therefore iTree can never be as quick as your code.
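The bookkeeping cost described above can be illustrated with a much simplified sketch. This is not iTree's real implementation: the class `TagChildren` and its methods are hypothetical, and it only keeps a main list plus a family dictionary to show why every operation has to update more than one structure.

```python
class TagChildren:
    """Hypothetical container allowing repeated tags with a stable order."""

    def __init__(self):
        self._items = []     # all children in insertion order
        self._families = {}  # tag -> list of children sharing that tag

    def append(self, tag, value=None):
        item = (tag, value)
        self._items.append(item)                         # main list
        self._families.setdefault(tag, []).append(item)  # tag family
        return item

    def by_index(self, index):
        """Access a child by its absolute position."""
        return self._items[index]

    def by_tag(self, tag, family_index=0):
        """Access a child by tag and its position inside the tag family."""
        return self._families[tag][family_index]

    def remove(self, index):
        """Remove by absolute position; both structures must be updated."""
        item = self._items.pop(index)
        family = self._families[item[0]]
        for i, candidate in enumerate(family):
            if candidate is item:  # compare by identity, not by value
                del family[i]
                break
        if not family:
            del self._families[item[0]]
        return item
```

For a test sequence this allows, for example, the same command tag to appear several times and still be addressed as "the second run command".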

Thanks for the hint, I'll check if it throws an exception in case an ID already exists.

I like the grouping idea you have, and I was thinking of adding an optional tag or maybe label field with similar semantics.

I'll also check __slots__. I never used or needed it in the past years of Python programming, but it's on my list to understand the pros and use cases. (Somehow strange that I have so many Python projects with so many lines of code and I never dug into that aspect of the language.)

For me it's not so important to be faster than you, because we have different feature sets, and I think mine will maybe get more features and slow down a bit. The important part is that both our codes perform well and have optimal scaling.
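For reference, a small generic sketch of what __slots__ changes. The class names are made up for the example and are unrelated to either library: a slotted class has no per-instance __dict__, which saves memory per node and rejects attributes that were not declared.

```python
class PlainNode:
    """Regular class: every instance carries its own __dict__."""

    def __init__(self, value):
        self.value = value


class SlottedNode:
    """Slotted class: attributes are fixed, no per-instance __dict__."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


plain = PlainNode(1)
slotted = SlottedNode(1)

plain.extra = 2  # allowed: stored in plain.__dict__

try:
    slotted.extra = 2  # not declared in __slots__
    slot_rejected = False
except AttributeError:
    slot_rejected = True
```

The trade-off is less flexibility: dynamic attributes on nodes no longer work unless "__dict__" is added to __slots__ explicitly.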

We could also talk about collaborating and bringing both things closer together, if this might be an option for you. I own pyTooling as a namespace with several sub-namespaces. I would like to concentrate more on electronic design automation (EDA), and less on side tools. We made e.g. a big helper for GitHub Actions by providing job templates for Python projects. See pyTooling/Actions. We now have unified pipelines (workflows) in every project with Python code. Some are even checking Python example code from the README in the pipeline :).

https://github.com/pyTooling/pyTooling.CLIAbstraction/actions/runs/1965550288

Btw. we might also need a basic framework for testsuites / testcases in the future. Therefore, we'll create an abstract model and then a real implementation for the EDA tooling background. Maybe we can share such an abstract model.


We can move the discussion to e.g. Gitter.im, which is a chat connected to GitHub.com. There you'll find multiple chat groups for various projects or 1-to-1 private rooms. I checked, but I can't see your name in Gitter yet.
https://gitter.im/Paebbels

Any plans for when the next release including this change will be published to PyPI?

I would love to see it released to PyPI, so I can remove lots of workaround code that gets numpy installed on various platforms for speed testing :).

BR1py commented

I cannot give an estimate at the moment, but I guess it will be within the next week.

You might use the wheel in the development branch for your testing, which includes the change:
https://github.com/BR1py/itertree/blob/br_development/dist/itertree-0.8.0rc1-py3-none-any.whl

Any news on this?

Btw. my package pyTooling now has a meta-class ExtendedType to make a class use __slots__.