CyNetDiff Submission

Question

CyNetDiff Submission

eliotwrobson opened this issue 2 years ago · 54 comments

Answer 1 · 2024-04-26T18:49:30.000Z

Editor in Chief checks

Hi there! Thank you for submitting your package for pyOpenSci review. Below are the basic checks that your package needs to pass to begin our review. If some of these are missing, we will ask you to work on them before the review process begins.

Please check our Python packaging guide for more information on the elements below.

Initial onboarding survey was filled out
We appreciate each maintainer of the package filling out this survey individually. 🙌
Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. 🙌

Editor comments

I think the documentation is a little bare currently to get started with the review. I'd like to see the information you wrote in this submission issue in the project repository itself. For example:

Network diffusion processes aim to model the spread of information through social networks, represented using graphs. Experimental work involving these models usually involves simulating these processes many times over large graphs, which can be computationally very expensive. At the same time, being able to conduct experiments using a high-level language like Python is helpful to researchers, as this gives greater flexibility in developing research software. To address both of these concerns, CyNetDiff is a Cython module implementing the independent cascade and linear threshold models, two of the most popular network diffusion models. Development has been focused on performance, while still giving an intuitive, high-level interface to assist in research tasks.

This is a nice elevator pitch to rework and include in the README.

I'd like to see more content in the documentation as well. The API section is great, but the basic example lacks context and does not really show case what CyNetDiff is great for. I want to share with you this resource: Diátaxis framework.
I am not asking that you follow the full framework to the letter, and it would be a huge task to do so, but I think there is some virtue in thinking about the kind of documentation you aim to showcase.
The documentation is the first thing a potential user will take a look at. As such, it needs to quickly guide the reader to:

what does it do? (= should I try it?)
how does it work? (= ok, I think this is the tool I need, how do I proceed?)

Answer 2 · 2024-04-26T21:31:36.000Z

Thank you for the feedback! I think those changes are reasonable and will significantly enhance the package, I'll try and expand the documentation by Monday so we can get the ball rolling on the review before too long 🚀

Answer 3 · 2024-04-29T06:58:09.000Z

@Batalex I have just published a release with the following changes:

Updated the elevator pitch and example in the README
Added two longer extended examples to the documentation site

Answer 4 · 2024-05-02T16:49:07.000Z

Thanks, this looks great. From what I can tell, we are good to go. I'll get started on finding the perfect editor for this review 🐈‍⬛

Answer 5 · 2024-05-10T23:39:00.000Z

@Batalex I know we don't have an editor yet, but is it too early to begin suggesting reviewers? We've already found someone that might be a good candidate for this.

Answer 6 · 2024-05-11T12:09:34.000Z

@Batalex I know we don't have an editor yet, but is it too early to begin suggesting reviewers? We've already found someone that might be a good candidate for this.

That's fantastic!
Could you please orient them to our signups form? https://forms.gle/GHfxvmS47nQFDcBM6

Answer 7 · 2024-05-11T15:55:00.000Z

For sure! I'll be sure to email this person as well, but this is @jeremyzhangsq

Answer 8 · 2024-05-14T16:46:20.000Z

Shiqi has agreed to review and filled out the form!

Answer 9 · 2024-06-28T18:23:07.000Z

@Batalex just wanted to check back in and see if there was anything we could do to help find a suitable editor for this. We're not in a huge rush or anything, but we want to submit this package to a conference before too long.

Answer 10 · 2024-06-29T17:17:07.000Z

Hello @eliotwrobson,
Unfortunately, I do not have an editor at the moment. There are two reasons for that: first, we are so successful our current editors have their hands full currently (let me brag a little). Second, the domain is more niche than others, so we (or more specifically I) cannot rely on personal connections or previous involvements to find someone with a relevant expertise. What would be your target conference? I'll see what I can do to help accelerate the process.

Answer 11 · 2024-06-29T20:46:32.000Z

@Batalex Thank you for your detailed response. We don't quite have a target conference in mind, but we're starting more work on a longer version of what will be our paper for this. I mainly asked because we may be able to help search for an editor as well, depending on what the preferred qualifications are. For example, Shiqi has published papers related to this, and might be able to serve, depending on how heavy the workload is, and how difficult it would be to find two other reviewers.

Regarding this general area, anyone who has worked on projects related to influence maximization or even just general graph theory might know enough to get the ball rolling.

Answer 12 · 2024-06-30T01:16:45.000Z

I can edit this, although I just was the editor for the automata package and maybe the authors want a break from me ;)

Answer 13 · 2024-06-30T01:29:07.000Z

@sneakers-the-rat we would greatly appreciate you serving as editor 🙂

Answer 14 · 2024-07-02T22:44:29.000Z

Happy to do it, i'll be able to get started searching for reviewers tmrw :). (i'll let the EiC do the formal assignment)

Answer 15 · 2024-07-03T20:27:41.000Z

i'll let the EiC do the formal assignment

Thanks a lot @sneakers-the-rat and thanks to @Batalex who did the hard work as EIC here!
Happy review to all!

Answer 16 · 2024-07-04T17:44:22.000Z

Awesome! @sneakers-the-rat just to fill you in, we've already identified a reviewer (@/jeremyzhangsq) who has papers published in this area. So the other reviewer doesn't necessarily need to be a subject area expert, and I think it would be appropriate to have someone with more knowledge of Cython packaging / best practices, since that was the part I was most unfamiliar with when writing this package.

Answer 17 · 2024-10-01T17:35:41.000Z

hi there team! i'm just checking in on this review to see the status. it looks like there's been no activity for almost 3 months. Is there anything that i can do to support moving things forward?

Answer 18 · 2024-10-01T18:35:38.000Z

Thanks for checking in @lwasser ! We found one reviewer and are looking for another

Answer 19 · 2024-10-01T18:59:25.000Z

can we help @eliotwrobson @sneakers-the-rat with the second reviewer? if you have one domain-specific review, we can pull a second from our signup who knows about packaging and can test for usability. i just updated the label to seeking-reviewers as well. please let me know how we can move things forward. I want to support everyone here!

Answer 20 · 2024-10-01T19:00:04.000Z

Also thank you for the speedy reply and to everyone her that's already invested in this review ✨

Answer 21 · 2024-10-01T19:39:04.000Z

@lwasser your assessment is correct, the first reviewer we have is knowledgeable about the domain, so I think having a reviewer focused on packaging / usability / documentation is the way to go 👍🏽

Answer 22 · 2024-10-14T16:52:27.000Z

hey @eliotwrobson @sneakers-the-rat I think I found a second reviewer for this @Kai-Striega . 🚀 Kai if you are still able to review this package with a focus on the Cython side of things, we'd appreciate it!

Answer 23 · 2024-10-14T19:02:56.000Z

@lwasser I'm happy to. I should have time this weekend. If I haven't replied by Monday please remind me.

Answer 24 · 2024-10-17T21:40:51.000Z

@eliotwrobson the plan is to spend a couple of hours on an informal review tomorrow (informal as I'm still waiting to do the pyOpenSci induction). I've spent some time (~30 mins) this morning building the project from scratch (aka with Cython) in case there are any build problems. Seems to build so far.

Answer 25 · 2024-10-18T23:45:22.000Z

ok let's kick off this review!
Below is the editor template that we need to get started with.

Editor response to review:

Editor comments

👋 Hi @Kai-Striega @jeremyzhangsq ! Thank you for volunteering to review
for pyOpenSci! It's so awesome to have you here supporting our open source / open science community 🚀

To begin there are a few items to complete:

Please fill out our pre-review survey

Before beginning your review, can you both please fill out our pre-review survey. This helps us improve all aspects of our review and better understand our community. No personal data will be shared from this survey--it will only be used in an aggregated format by our Executive Director to improve our processes and programs.

@Kai-Striega survey completed.
@jeremyzhangsq survey completed.

Thank you in advance for doing this!

Review templates

The following resources will help you complete your review:

Here is the reviewers guide. This guide contains all of the steps and information needed to complete your review.
Here is the review template that you will need to fill out and submit
here as a comment once your review is complete.

Let's shoot for a review deadline of 8 November 2024. Please get in touch with any questions that you have!! Have an awesome weekend y'all!! 🚀

Reviewers:
Due date: 8 November 2024.

Answer 26 · 2024-10-19T02:33:22.000Z

YIKES sorry for being AWOL on this one, i couldn't remember if this was one i agreed to do or one that i waved off because i am a little slammed at the moment.

@lwasser You're pretty busy with fall festival stuff atm, would you like me to pick this up again?

Answer 27 · 2024-10-19T16:38:56.000Z

@sneakers-the-rat please don't apologize! It's so understandable!

If you can take this over that would be wonderful!!! Thank you Jonny! 😊

Answer 28 · 2024-10-19T17:56:43.000Z

Thanks all for getting the ball rolling! I'm not sure how much @/jeremyzhangsq checks GitHub, so I'll go ahead and ping him again via email.

Answer 29 · 2024-10-20T02:15:07.000Z

A quick couple of questions @lwasser /@sneakers-the-rat:

as previously discussed, my review has focused primarily on the Cython aspects of the project. Do you require me to review the remaining points on the review template too? I'm happy to do this, but I need to be aware that it's required.
I've look through the package and written plenty of notes but haven't typed them into the review template yet. As this is my first time reviewing, would we be able to organise a mentoring session to ensure I do a great job?

Answer 30 · 2024-10-20T18:55:41.000Z

@Kai-Striega 👋🏻 i just saw your discord dm's! So sorry it took me a few days to respond. I answered you there, please do go through the entire template. I'm happy to either provide a mentor session OR to get someone else in our community to do so as well. You're also in great hands here with @sneakers-the-rat so i don't want to step on any toes more than I already have!! so please just let me know how I can best support everyone here 🚀

Answer 31 · 2024-10-26T06:38:43.000Z

@eliotwrobson please see my review below. I'm happy to expand/talk about any of the points I have made.

@sneakers-the-rat / @lwasser I haven't received a mentor yet, but had the available time to do it today. Please let me know if I am required to change/update anything.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README.
Installation instructions: for the development version of the package and any non-standard dependencies in README.
- Instructions provided, however I had difficulty following them. See comments below.
Vignette(s) demonstrating major functionality that runs successfully locally.
- Vignette in README and examples/ directory
Function Documentation: for all user-facing functions.
Examples for all user-facing functions.
- No examples for user facing documentation page
- No examples in models.pyi file with rest of documentation
Community guidelines including contribution guidelines in the README or CONTRIBUTING.
Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

Package has a README.md file in the root directory.

The README should include, from top to bottom:

The package name
Badges for:
- Continuous integration and test coverage,
- Docs building (if you have a documentation website),
- A repostatus.org badge,
- Python versions supported,
- Current package version (on PyPI / Conda).
  - PyPi supported and badged
  - conda not yet available, however this package seems appropriate for conda

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Short description of package goals.
Package installation instructions
Any additional setup required to use the package (authentication tokens, etc.)
Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
- Link to examples/ directory missing in README
Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
- Vignettes/examples are sufficient
Link to your documentation website.
If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
- Not provided, have not done a lit review to check if this is required.
Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

Package documentation is clear and easy to find and use.
The need for the package is clear
All functions have documentation and associated examples for use
- Functions/methods have documentation, but not examples in the documentation.
The package is easy to install
- See comments in reviewer comments about wheels and required C++ compiler

Functionality

For packages also submitting to JOSS

The package has an obvious research application according to JOSS's definition in their submission requirements.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

A short summary describing the high-level functionality of the software
Authors: A list of authors with their affiliations
A statement of need clearly stating problems the software is designed to solve and its target audience.
References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 8

Review Comments

I have several review comments below. These are ordered by importance.

Requires a C++ compiler to build/lack of wheels

Wheels are provided only CPython 3.11 on Linux. This requires all other possible installers to compile the project from source. This is fairly simple, however it provides an extra step/requirements which is possible to avoid. My recommendation is to provide wheels for all officially supported Python versions on all major platforms.

Cython code generation

The project currently caches the file src/cynetdiff/models.cpp explicitly. I usually recommended not to ship the .c/.cpp files directly. There are several problems with shipping .c/.cpp files:

They are not guaranteed to be forward compatible with future CPython C-API or Cython versions. The CPython C-API seems to be changing fairly frequently at the moment.
It's possible that Cython generated code will not work on certain platforms e.g. arm if built for x86

This is echoed by the official Cython user guide

it is not recommended to include generated files in source distributions. Instead, require Cython at build-time to generate the C/C++ files

Pseudo random number generation

The package currently uses C's rand function. From models.pyx:

from libc.stdlib cimport rand, RAND_MAX
cdef double RAND_SCALE = 1.0 / RAND_MAX

cdef inline double next_rand() nogil:
    return rand() * RAND_SCALE

There are several issues with rand:

There are no guarantees about the algorithm, and therefore the quality of the randomness, used to generate random numbers
It uses global state which can lead to problems with multithreaded applications. While the current project doesn't use any parallelism, this could be significant if CyNetDiff-1 is implemented.

My recommendation is to adopt a more suitable random number generation engine. In the Cython world there are two ways I can recommend of doing this:

C++'s <random> header via the Cython bindings
Adopt NumPy's random number generation. This is officially documented and an example can be seen here

Note that (2) would also allow you to be compliant with SPEC-7.

Minor Cython related improvements

Note these points are minor and should not be considered blocking.

Use of `array.array`

The package uses array.array both for type hints and Cython typing in several places. Consider using Cython's memoryview instead. memoryviews provide the advantage of being available to anything that provides the Buffer Protocol, such as NumPy arrays, which I suspect are used significantly more frequently than array.arrays are by the packages target audience.

Missing potential bounds checks when running cythonize

[1/1] Cythonizing src/cynetdiff/models.pyx
warning: src/cynetdiff/models.pyx:97:51: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:107:35: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:135:39: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:137:38: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:141:34: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:261:39: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:263:45: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:264:34: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:268:38: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:284:52: Use boundscheck(False) for faster access

It may be possible to improve performance by disabling bounds checking in these situations.
I have not verified if this actually improves performance in these particular circumstances

Review resizing `array.array`

    for i in range(m):
        influence_arr.append(1.0 / in_degrees[self.edges[i]])

This computations resizes influence_arr multiple times, leading to multiple memory allocations. It would be more efficient to use .extend rather than .append.

Use of name mangling

Starting a name with __ introduces name mangling in Python. In C/C++ these are reserved for the compilers internal keywords. Name mangling can make life difficult. CyNetDiff uses name mangling at several points (e.g. __activation_suceeds and __advance_model). I'd be careful with these.

Answer 32 · 2024-10-27T06:03:19.000Z

@Kai-Striega thank you for your detailed comments! I have a couple of questions, but I'll try to get started addressing some of these items soon.

Requires a C++ compiler to build/lack of wheels

Do you know of any documentation that has discussion of how to do this? I agree it is much more convenient to not need a compiler installed, but I wasn't sure how to set this up.

Cython code generation

This makes sense to me, I could swear I read the opposite somewhere but I must have been mistaken.

Pseudo random number generation

Thank you for the references! I had actually thought about whether there was a standard way to handle the random number generation. I think I will follow option (2) and add the SPEC-7 compliance.

Use of array.array

Is there a recommended type annotation that expresses what works with the Cython memoryview? It would be nice to add this annotation along with support for numpy arrays.

Answer 33 · 2024-10-27T06:22:53.000Z

@Kai-Striega thank you for your detailed comments! I have a couple of questions, but I'll try to get started addressing some of these items soon.

You're welcome.

Feel free to message if you have any more questions. I'll try answer your questions below.

Do you know of any documentation that has discussion of how to do this? I agree it is much more convenient to not need a compiler installed, but I wasn't sure how to set this up.

AFAIK, there isn't much. You could look at how SciPy builds their wheels for nightly builds. This should include all the info you need. I'm happy to help out with this, but might be a bit slow at the moment.

Note that SciPy's CI is far more complex than I would expect for this project.

This makes sense to me, I could swear I read the opposite somewhere but I must have been mistaken.

This isn't cut and dry. In the past, it was advised to do the opposite. This has changed over the last decade or so. Some people still prefer to include the .c/cpp files. However, this isn't recommended anymore.

Thank you for the references! I had actually thought about whether there was a standard way to handle the random number generation. I think I will follow option (2) and add the SPEC-7 compliance.

You're welcome :)

Is there a recommended type annotation that expresses what works with the Cython memoryview? It would be nice to add this annotation along with support for numpy arrays.

Unfortunately, not that I've come across. The problem is that Cython memoryviews support anything that supports the Buffer Protocol this could include custom objects.

My usual tactic is to use ArrayLike with a Python level wrapper, check if the value is scalar and then convert it to an array. This isn't great.

An alternative option would be to create your own Protocol with all the buffer protocol methods.

Answer 34 · 2024-10-28T00:11:18.000Z

@Kai-Striega I've been able to implement most of the above changes without too many issues. I'm starting to take a look at the spec7 documentation and the examples you linked. Would it be better to provide client code with a wrapper class like the one you linked here that then gets passed to the model class that executes simulations? Or should client code directly pass in a numpy generator? From the timing code in the notebook you linked, it seems like it would be much faster to expose this wrapper class to client code, rather than having a numpy generator passed in.

Answer 35 · 2024-10-28T00:20:33.000Z

I'm at work right now, and can't respond in the length I'd like. I'll edit this over the next few days to expand on my thoughts.

TLDR: Neither. SPEC-7 allows for a couple of different options.

Here is the officially recommended way

from collections.abc import Sequence
import numpy as np


SeedLike = int | np.integer | Sequence[int] | np.random.SeedSequence
RNGLike = np.random.Generator | np.random.BitGenerator


def my_func(*, rng: RNGLike | SeedLike | None = None):
    """My function summary.

    Parameters
    ----------
    rng : `numpy.random.Generator`, optional
        Pseudorandom number generator state. When `rng` is None, a new
        `numpy.random.Generator` is created using entropy from the
        operating system. Types other than `numpy.random.Generator` are
        passed to `numpy.random.default_rng` to instantiate a `Generator`.
    """
    rng = np.random.default_rng(rng)

    ...

I would use this as a wrapper for your Cython functions.

Answer 36 · 2024-10-29T03:40:46.000Z

Short update that I've addressed most of the items from the review comments in a new release (v0.1.14) besides the rng seeding that is still being discussed and adding more builds. Not quite ready for a re-review till we address the above. Also emailed shiqi, and looks like he is traveling now but will be back to review in early November 👍🏽

Answer 37 · 2024-10-29T04:15:13.000Z

Catching up - here are the things i see raised in the review:

making a checklist mostly for tidyiness sake, not trying to list out requirements - tell me which of these are taken care of/still outstanding

docs: examples
statement of need: we don't have a hard statement of need requirement, and i don't like novelty tests so it's not that, but it would be nice to have some statement for the pyos site that puts this package in context with other packages - "similar to x except y, etc." kind of stuff
packaging: wheels for multiple platforms (looks ongoing)
packaging: versioning generated artifacts (looks like this is done?)
function: use of c's rand function (looks like this is ongoing)
function: name mangling

Thanks for the detailed review :) I learned some things about cython here too.

If it would be helpful, you might consider opening up the remaining items as issues that link back to this one just to keep track of what's left todo/use issues as a cheap form of threading.

Re: building for multiple platforms, this sounds like something we would want to have in the packaging guide, so if you figure something out, maybe we put instructions there?

Answer 38 · 2024-10-29T18:14:23.000Z

@sneakers-the-rat thanks for writing that out. I've removed all the instances of name mangling I could find, and you are correct that removing generated artifacts is done. For the statement of need, would you like something written that can go on the pyos site, or that also to be added to the library documentation? A simple description is that this package is a faster version of ndlib.

Answer 39 · 2024-10-29T22:12:56.000Z

Re: building for multiple platforms, this sounds like something we would want to have in the packaging guide, so if you figure something out, maybe we put instructions there?

While this is noble, I think a unified approach to building will be hard, especially when extension modules are involved. It will depend on which languages are used, which platforms you build for, how complicated you want to get, and so on...

In particular, I feel that Poetry doesn't handle extension modules particularly well. There is SPIN which much of the scientific Python ecosystem seems to be standardising on, however this would require a complete rewrite of CyNetDiff's infrastructure.

A simple description is that this package is a faster version of ndlib.

If you're going to make claims that it's faster, it would be great to show it. I think this should be required if you make performance claims.

Answer 40 · 2024-10-30T00:08:40.000Z

While this is noble, I think a unified approach to building will be hard

Fair! i was thinking of it as a starting point for a more general guide, but let's leave that aside so we don't muddy this review :) Moved that over here: pyOpenSci/python-package-guide#422

I think this should be required if you make performance claims.

Agreed that we should be able to show a benchmark to claim this. @eliotwrobson to answer your earlier question, yes this would be for the pyos website, where ideally the goal is that we build out the website so that it's possible to see a package and its 'neighbors,' "the family of tools that do things like x," and much like how open reviews on eLife allow readers to know the context of a paper and how people in that field think of it, it would be great (but not required) to provide that same context for software. "why this and not that? etc." If benchmarks are not in reach at the moment, a simple "see also: ndlib" would be fine

Answer 41 · 2024-10-30T05:07:46.000Z

For the benchmark, some light benchmarks are included in this example notebook: https://github.com/eliotwrobson/CyNetDiff/blob/main/examples/visualizations.ipynb

I'm not sure if there's a more standard way of doing this? We also include some discussion of these benchmarks in our paper: https://www.vldb.org/pvldb/vol17/p4409-umrawal.pdf

Regarding building, although changing / documenting something to work with Poetry is probably not the most productive given what others have said, I think having some example related to this in the packaging guide would be extremely useful. I had a very hard time even figuring out how to get started writing a package that used Cython, and this would be extremely useful to me on another project I want to get started on before too long.

Answer 42 · 2024-10-30T06:15:55.000Z

If you have published benchmarks already, then that's good enough for me for a claim like that - we can just cite them :)

Also am i reading this right, that an identical operation took 14s with ndlib and 0.067s with cynetdiff? that's.... certainly a perf improvement. well done.

Also i forgot to say earlier, i was asking for a context statement for the website but i also generally (i.e. outside of the purposes of the review) think they are good to have in documentation. If you think you've improved on something, I don't think it's disrespectful to compare yourself to it esp on something objective like 'the time it takes to perform an identical operation,' and it's helpful to everyone to have better linking between what's out there. One example i love is how the first two sections of the dask dataframes 'best practices' page are "use pandas" (and then clarifications about when you would want to use dask)

Answer 43 · 2024-11-01T23:00:52.000Z

@eliotwrobson

For the benchmark, some light benchmarks are included in this example notebook: https://github.com/eliotwrobson/CyNetDiff/blob/main/examples/visualizations.ipynb

I'm happy with the benchmarks as-is, for the purpose of a formal review, I think there is room for improvement.

I'm not sure if there's a more standard way of doing this? We also include some discussion of these benchmarks in our paper: https://www.vldb.org/pvldb/vol17/p4409-umrawal.pdf

There are tools out there to do this for you. The one that seems to be favoured by NumPy/SciPy (where I do most of my FOSS stuff) is air speed velocity. There are several other tools also available, but I don't know too much about them.

Regarding building, although changing / documenting something to work with Poetry is probably not the most productive given what others have said, I think having some example related to this in the packaging guide would be extremely useful. I had a very hard time even figuring out how to get started writing a package that used Cython, and this would be extremely useful to me on another project I want to get started on before too long.

I've tried building Cython projects with Poetry and actually gave up. I found it very hard/confusing, so just getting your package to build should be considered an achievement. My go to for building Cython projects is now SPIN. This uses Meson which has an inbuilt Cython Module.

Answer 44 · 2024-11-04T21:38:00.000Z

I've tried building Cython projects with Poetry and actually gave up. I found it very hard/confusing, so just getting your package to build should be considered an achievement. My go to for building Cython projects is now SPIN. This uses Meson which has an inbuilt Cython Module.

@sneakers-the-rat I think that even this simple recommendation alone would be a useful thing to add to the packaging guide 👍🏽

Regarding the remaining review items, I need to refactor another feature for this package to implement something for a paper I'm working on. I think the other reviewer is going to be able to take a look soon, so I'll wait for his comments before attacking the rest of these items.

Answer 45 · 2024-11-04T23:20:47.000Z

Great - if it would be possible to open issues for each of the requested changes linking back to this issue (either of you), then I (or you) can make a list in the OP of completed/outstanding work to do so that the second reviewer can know whats already been raised/is being addressed and they can focus their review elsewhere, or comment on those ongoing issues!

Answer 46 · 2024-11-05T19:22:10.000Z

@sneakers-the-rat just made associated issues and linked them back here (they are the top 4 in the current issues list).

Answer 47 · 2024-11-16T15:27:39.000Z

@eliotwrobson check this out:D

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README.
Installation instructions: for the development version of the package and any non-standard dependencies in README.
Vignette(s) demonstrating major functionality that runs successfully locally.
Function Documentation: for all user-facing functions.
Examples for all user-facing functions.
Community guidelines including contribution guidelines in the README or CONTRIBUTING.
Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

Package has a [README.md](http://readme.md/) file in the root directory.

The README should include, from top to bottom:

The package name
Badges for:
- Continuous integration and test coverage,
- Docs building (if you have a documentation website),
- A [repostatus.org](https://www.repostatus.org/) badge,
- Python versions supported,
- Current package version (on PyPI / Conda).
  - Yes for PyPI

NOTE: If the README has many more badges, you might want to consider using a table for badges: [see this example](https://github.com/ropensci/drake). Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Short description of package goals.
Package installation instructions
Any additional setup required to use the package (authentication tokens, etc.)
Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the [README.md](http://readme.md/) file.
- Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
Link to your documentation website.
If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
- The authors have compared this package with NDlib, a wide-accepted library for network diffusion. This package is more efficient (See the PVLDB paper).
Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

Package documentation is clear and easy to find and use.
The need for the package is clear
All functions have documentation and associated examples for use
The package is easy to install

Functionality

For packages also submitting to JOSS

The package has an obvious research application according to JOSS's definition in their [submission requirements](http://joss.theoj.org/about#submission_requirements).

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching [JOSS's requirements](http://joss.theoj.org/about#paper_structure) with:

A short summary describing the high-level functionality of the software
Authors: A list of authors with their affiliations
A statement of need clearly stating problems the software is designed to solve and its target audience.
References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 6 hours

Review Comments

I have to admit that this project really surprised me. Previously, the IM community mainly used C++ for implementation for the sake of efficiency. However, it may cause inconvenience for real-world development and deployment (from my own experience). Thus, I believe this project will help our community a lot. Thanks to other reviewers’ comments, I can quickly catch up with the progress. Overall, this project meets my standards except two minors to improve.

I suggest showing some key evaluation results in the README, such as how fast CyNetDiff can be compared to NDlib.
To adapt other variants of IC and LT models in the future, I suggest to provide args about initial node thresholds and edge weights in

def linear_threshold(
    G: t.Union[nx.Graph, nx.DiGraph], seeds: t.Iterable[int], steps: int = 0
) -> t.List[t.List[int]]:

An example of the LT variant that requires customized thresholds and weights is this:

Nicola Barbieri and Francesco Bonchi. 2014. Influence maximization with viral
product design. In ICDM.

Answer 48 · 2024-11-21T03:15:02.000Z

Thanks to both reviewers for the feedback! Have a few more urgent things to take care of but will try to get through some of these changes over the upcoming holiday 👍🏽

EDIT: A little behind schedule (whoops), but will get to everything mentioned here before too long.

Answer 49 · 2024-12-14T20:20:37.000Z

Finally had a chance to go over things again, and I managed to get a PR for the most significant item so far (SPEC7 compliance): eliotwrobson/CyNetDiff#28

@Kai-Striega if you have time, could you glance at this PR briefly? No need to spend too much time looking through the internals, I just want to double check that we're doing everything as expected in the spec.

As for some of the other comments, I think that we have a sufficient amount of customization in the model creation for now (edge weights are accepted as arguments by both models, and adding custom thresholds for the linear threshold model won't give a performance boost over pure-python unless we adopt a custom model of randomly generating thresholds in each simulation). So the only remaining items are the following

More examples in the documentation.
ASV / something showing more current benchmarks.
Releasing wheels on multiple platforms (I probably won't do this because I can't figure out how).

Answer 50 · 2024-12-14T21:29:34.000Z

@eliotwrobson I'll take a look, hopefully today.

Releasing wheels on multiple platforms (I probably won't do this because I can't figure out how).

I think having wheels for multiple platforms is important. If you don't supply them, the user has to figure out how to compile your project on their own. i.e. you're moving the burden of having to figure out how to compile it on their platform from yourself to each individual user.

I'm happy to help with this a bit if you need it, basically for NumPy/SciPy we build it as a CI job. You can see how NumPy does it by looking in the .github/workflows/ directory.

Answer 51 · 2024-12-14T21:41:09.000Z

I'm happy to help with this a bit if you need it, basically for NumPy/SciPy we build it as a CI job. You can see how NumPy does it by looking in the .github/workflows/ directory.

@Kai-Striega Thank you! This would be greatly appreciated 🙏🏽 🙏🏽 I'd be open to switching the build system as well if that is necessary, I'm just currently not sure how to set up automated publishing with respect to this.

Answer 52 · 2024-12-30T20:03:41.000Z

Update on the status of things here. I've made a pass and resolved all but one of the outstanding requested changes from this review (the remaining is the distribution of wheels for other platforms, which I'll be handling next). The latest released version (0.1.16) has the following changes:

Examples for all of the user facing functions in the API, in addition to the extended examples present already: https://eliotwrobson.github.io/CyNetDiff/api/
Benchmarks run through pytest-benchmark on GitHub actions for the main branch. The comparison here is done with NDlib and there is a link to the latest benchmarks run from the main branch in the README. Not the most sophisticated benchmarks in the world but appropriate given the size of the project (ASV was ultimately too much overhead when there are only really a couple of tasks we care about benchmarking here).
Support for randomized activation and other utility functions to support additional use cases.

The wheel distribution will take a little more time than everything else, but I'll get to that as soon as I can. However, I think everything else is in a good spot 👍🏽

Code of Conduct & Commitment to Maintain Package

Description

Scope

Domain Specific

Community Partnerships

Technical checks

Publication Options

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

Please fill out our survey

Editor and Review Templates

Review issues

Footnotes

Editor in Chief checks

Editor comments

Editor response to review:

Editor comments

Please fill out our pre-review survey

Review templates

Let's shoot for a review deadline of 8 November 2024. Please get in touch with any questions that you have!! Have an awesome weekend y'all!! 🚀

Package Review

Documentation

Usability

Functionality

For packages also submitting to JOSS

Final approval (post-review)

Review Comments

Requires a C++ compiler to build/lack of wheels

Cython code generation

Pseudo random number generation

Minor Cython related improvements

Use of array.array

Missing potential bounds checks when running cythonize

Review resizing array.array

Use of name mangling

Package Review

Documentation

Usability

Functionality

For packages also submitting to JOSS

Final approval (post-review)

Review Comments

Use of `array.array`

Review resizing `array.array`