taichi-dev/poisson-sampling-homework

Submit your work here

neozhaoliang opened this issue · 17 comments

Please submit your homework in this issue. Recommended submission formats:

Repo link: https://github.com/your-name/your-repo

Description: quickly phrase what you did in your code.

Screenshots: https://raw.githubusercontent.com.../xxxx.png

Repo link:https://github.com/yangyongkang2000/C-Programming/tree/master/poisson_disk_sample/poisson_disk_sample

最近看了微信公众号的太极图形一篇文章《如何在平面上随机均匀撒点?60 行 Python 代码实现 Poisson disk 采样,比 NumPy 快 100 倍!》,根据文章上的描述,用C++对快速泊松取样进行了简单的实现(差不多是对着Python实现照葫芦画瓢)。实现的内容为在二维[0,1]*[0,1]正方形 (正方形是由N * N个同样大小的方格组成的)进行快速泊松取样。

@yangyongkang2000 非常棒!你是否介意我们使用你的代码作为 benchmark

@yangyongkang2000 非常棒!你是否介意我们使用你的代码作为 benchmark

不介意

我写了一个 PR 优化了 Numba 的实现:taichi-dev/taichi_benchmark#11

我写了一个 PR 优化了 Numba 的实现:taichi-dev/taichi_benchmark#11

我觉得还有优化空间。我觉得重点优化区域是生成泊松点的循环内部。我用C++最初写的代码

for(int _=0;_<Max_Retry;_++)
        {
            auto theta=dis(gen)*2*std::numbers::pi_v<T>;
            auto tr=(dis(gen)+1)*r;
            auto x=p[0]+tr*std::cos(theta);
            if(x<0||x>=1) continue;
            auto y=p[1]+tr*std::sin(theta);
            if(y<0||y>=1) continue;
            auto w=static_cast<int>(x*N);
            auto h=static_cast<int>(y*N);
            if(check(x,y,w,h)){
                *(beg+tail)={x,y};
                grid[h*N+w]=tail++;
                if(tail>=len) return end;
            }
        }

在循环内部生成x,y坐标,我把代码稍微改了一下,先在循环外部生成一个x,y坐标组,然后对数组元素进行check,修改之后

std::generate_n(r_theta.begin(), Max_Retry, [&]{return std::pair<T,T>{(dis(gen)+1)*r,dis(gen)*2*std::numbers::pi_v<T>};});
        std::transform(r_theta.begin(), r_theta.end(), xy_l.begin(), [&](auto &rt){return std::pair<T,T>{p[0]+rt.first*std::cos(rt.second),p[1]+rt.first*std::sin(rt.second)};});
        for(auto &[x,y]:xy_l)
            if(0<=x&&x<1&&0<=y&&y<1){
                auto w=static_cast<int>(x*N);
                auto h=static_cast<int>(y*N);
                if(check(x,y,w,h)){
                    *(beg+tail)={x,y};
                    grid[h*N+w]=tail++;
                    if(tail>=len) return end;
                }
            }

最后测试,性能有提升。

我写了一个 PR 优化了 Numba 的实现:taichi-dev/taichi_benchmark#11

我觉得还有优化空间。我觉得重点优化区域是生成泊松点的循环内部。我用C++最初写的代码

for(int _=0;_<Max_Retry;_++)
        {
            auto theta=dis(gen)*2*std::numbers::pi_v<T>;
            auto tr=(dis(gen)+1)*r;
            auto x=p[0]+tr*std::cos(theta);
            if(x<0||x>=1) continue;
            auto y=p[1]+tr*std::sin(theta);
            if(y<0||y>=1) continue;
            auto w=static_cast<int>(x*N);
            auto h=static_cast<int>(y*N);
            if(check(x,y,w,h)){
                *(beg+tail)={x,y};
                grid[h*N+w]=tail++;
                if(tail>=len) return end;
            }
        }

在循环内部生成x,y坐标,我把代码稍微改了一下,先在循环外部生成一个x,y坐标组,然后对数组元素进行check,修改之后

std::generate_n(r_theta.begin(), Max_Retry, [&]{return std::pair<T,T>{(dis(gen)+1)*r,dis(gen)*2*std::numbers::pi_v<T>};});
        std::transform(r_theta.begin(), r_theta.end(), xy_l.begin(), [&](auto &rt){return std::pair<T,T>{p[0]+rt.first*std::cos(rt.second),p[1]+rt.first*std::sin(rt.second)};});
        for(auto &[x,y]:xy_l)
            if(0<=x&&x<1&&0<=y&&y<1){
                auto w=static_cast<int>(x*N);
                auto h=static_cast<int>(y*N);
                if(check(x,y,w,h)){
                    *(beg+tail)={x,y};
                    grid[h*N+w]=tail++;
                    if(tail>=len) return end;
                }
            }

最后测试,性能有提升。

好想法!另外比较好奇,请问你有对比 C++ 实现和 Taichi/Numba 这样的 Python + JIT 的性能差异吗?

我写了一个 PR 优化了 Numba 的实现:taichi-dev/taichi_benchmark#11

我觉得还有优化空间。我觉得重点优化区域是生成泊松点的循环内部。我用C++最初写的代码

for(int _=0;_<Max_Retry;_++)
        {
            auto theta=dis(gen)*2*std::numbers::pi_v<T>;
            auto tr=(dis(gen)+1)*r;
            auto x=p[0]+tr*std::cos(theta);
            if(x<0||x>=1) continue;
            auto y=p[1]+tr*std::sin(theta);
            if(y<0||y>=1) continue;
            auto w=static_cast<int>(x*N);
            auto h=static_cast<int>(y*N);
            if(check(x,y,w,h)){
                *(beg+tail)={x,y};
                grid[h*N+w]=tail++;
                if(tail>=len) return end;
            }
        }

在循环内部生成x,y坐标,我把代码稍微改了一下,先在循环外部生成一个x,y坐标组,然后对数组元素进行check,修改之后

std::generate_n(r_theta.begin(), Max_Retry, [&]{return std::pair<T,T>{(dis(gen)+1)*r,dis(gen)*2*std::numbers::pi_v<T>};});
        std::transform(r_theta.begin(), r_theta.end(), xy_l.begin(), [&](auto &rt){return std::pair<T,T>{p[0]+rt.first*std::cos(rt.second),p[1]+rt.first*std::sin(rt.second)};});
        for(auto &[x,y]:xy_l)
            if(0<=x&&x<1&&0<=y&&y<1){
                auto w=static_cast<int>(x*N);
                auto h=static_cast<int>(y*N);
                if(check(x,y,w,h)){
                    *(beg+tail)={x,y};
                    grid[h*N+w]=tail++;
                    if(tail>=len) return end;
                }
            }

最后测试,性能有提升。

好想法!另外比较好奇,请问你有对比 C++ 实现和 Taichi/Numba 这样的 Python + JIT 的性能差异吗?

我没有对比。

先生成 100 个点然后再遍历确实会快一些。其实还有加速的技巧,比如想想怎么把那个越界检查去掉?

Hi, @Mike-Leo-Smith @yangyongkang2000
Thanks for your great job. We have souvenirs for those who finished this homework. Please fill in the form if you'd like to get it. We will ship the souvenir the first Friday after we receive your information.

Hi, @Mike-Leo-Smith @yangyongkang2000
Thanks for your great job. We have souvenirs for those who finished this homework. Please fill in the form if you'd like to get it. We will ship the souvenir the first Friday after we receive your information.

Thanks!

Hi, @Mike-Leo-Smith @yangyongkang2000

Thanks for your great job. We have souvenirs for those who finished this homework. Please fill in the form if you'd like to get it. We will ship the souvenir the first Friday after we receive your information.

谢谢🙏

Hi, @Mike-Leo-Smith @yangyongkang2000

Thanks for your great job. We have souvenirs for those who finished this homework. Please fill in the form if you'd like to get it. We will ship the souvenir the first Friday after we receive your information.

快递发出去了吗?

Hi, @Mike-Leo-Smith @yangyongkang2000

Thanks for your great job. We have souvenirs for those who finished this homework. Please fill in the form if you'd like to get it. We will ship the souvenir the first Friday after we receive your information.

收到了🫡谢谢image
image

Repo link: https://github.com/lucifer1004/poisson-sampling-julia

Description: Implement the algorithm in Julia and extend it to N-Dimension.

Benchmarks: See this notebook. Generating 50,000 2-D points takes ~250 ms on a single CPU thread.

Screenshots:

image

image

@neozhaoliang 可以申请一份纪念品吗(✧◡✧)

@lucifer1004 可以的呀,后面会有社区的同学联系您。

@lucifer1004 Thanks for your great job. We have souvenirs for those who finished this homework. Please fill in the form if you'd like to get it. We will ship the souvenir the first Friday after we receive your information.