Echopype: Upgrade robustness and scalability of ocean sonar data processing
Project Description
Echosounders, or high-frequency ocean sonar systems, are the workhorse for studying life in the ocean. They provide continuous observations of fish and zooplankton by transmitting sounds and analyzing the echoes bounced off these animals, just as medical ultrasound images the interior of the human body. In recent years echosounders have been widely deployed on ships, autonomous vehicles, and moorings, bringing in significant volumes of data that allow scientists to study rapidly changing marine ecosystems. This project aims to upgrade the robustness and scalability of the Echopype package, which standardizes data from different echosounder instruments into widely accessible netCDF or Zarr files. The project work will focus on making the Echopype testing suite more robust by overhauling its Continuous Integration (CI) mechanisms and on tackling distributed computing bottlenecks in processing irregularly spaced echosounder data across computing agents.
Expected Outcomes
[1] Robust Continuous Integration mechanisms that utilize GitHub release assets for hosting test files
[2] Increased test coverage for foundational data conversion functions
[3] Improved distributed computing performance for major processing functions on large (100s of GB) data sets
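As an illustration of outcome [1], here is a minimal sketch of how a test file hosted as a GitHub release asset could be fetched and verified before a test run. It is not Echopype's actual CI code: the function names, the cache layout, and the use of a SHA-256 checksum are all assumptions made for this example. Only the release download URL pattern is GitHub's own.

```python
import hashlib
from pathlib import Path
from urllib.request import urlretrieve


def release_asset_url(owner: str, repo: str, tag: str, asset: str) -> str:
    """Build the public download URL of a GitHub release asset."""
    return f"https://github.com/{owner}/{repo}/releases/download/{tag}/{asset}"


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_test_file(url: str, dest: Path, expected_sha256: str) -> Path:
    """Download url to dest unless a copy with the expected checksum is cached.

    Hypothetical helper for illustration only.
    """
    if not (dest.exists() and sha256_of(dest) == expected_sha256):
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, dest)
        if sha256_of(dest) != expected_sha256:
            raise ValueError(f"Checksum mismatch for {dest}")
    return dest
```

Pinning a checksum per file means a CI job re-downloads only when the cached copy is missing or has changed, which is one way to keep test runs reproducible.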
Skills required
Python; Libraries: Xarray, Dask, Zarr; Interests in working with oceanographic, acoustic or geospatial data
Mentor(s)
Wu-Jung Lee (@leewujung), Valentina Staneva (@valentina-s)
Expected Project Size
175
What is the difficulty of the project?
Intermediate
@leewujung @valentina-s Could you choose either 175 or 350 hours for the project size and update the description above? We got some feedback from GSoC that 'Project sizes need to be scoped to 90, 175 or 350 hours (you can not have a project that is 200 hours, or some other random number of hours)'.
Not entirely sure if that means an option of two of the official project sizes is allowable or not, but to be safe we should just go with one size. I will update the project ideas list accordingly. Thanks!
@mwengren : Yes! I think we can change it to 175 hours. I'll submit a pull request for that edit. Thanks
Hello @leewujung @valentina-s
My name is Mohamed Nasser. I am a final-year Biomedical Engineering student with experience in Python and its libraries.
Through my academic journey I have dealt with different biological data and visualizations.
I find this idea very interesting and want to learn more about it.
I don't have all the skills required but I am eager to learn, so can you tell me how to start?
I think I will begin by getting familiar with Xarray, Dask, and Zarr and learning more about oceanographic data.
Should I make a contribution to the main repo, or can @leewujung, @valentina-s guide me on how to start?
Hey @MohamedNasser8:
Thanks for reaching out! You can start by checking out our contributor's guide and making sure:
- You have the dev environment ready to go, and
- You can run the notebooks in the echopype-examples repo, which we use to host example notebooks of using the echopype package.
In the next few days we'll start marking relevant issues in the echopype repo, but feel free to start by looking into needs, asking questions, or proposing anything about upgrading the testing framework and improving test coverage.
I created a new label "GSoC24" and will continue to add issues to that, feel free to take a look: https://github.com/OSOceanAcoustics/echopype/labels/GSoC24
Hello @leewujung and @valentina-s,
I hope you guys are doing well! My name is Duong. I'm a 2nd year student majoring in Computer Science at the University of Alberta in Canada. All of my coursework has been taught in Python, so I would say I have a pretty good grasp of the language. I'd love the chance to contribute to this project, as working with different echosounder instruments and oceanographic data sounds intriguing.
I'm currently going over the "Contributing to echopype" section but I'm stuck at the "Running the test" part. After installing docker, I typed the first command in the activated conda env (this one: python .ci_helpers/docker/setup-services.py --deploy) but it gave me a bunch of errors. For example, in step 2, I got "TypeError: kwargs_from_env() got an unexpected keyword argument 'ssl_version'", as well as connection errors. I was wondering if this is on my end and what I should do next.
Regarding the proposal, once I've finished my draft, would it be okay if I emailed it to you to get some feedback on how I can refine it? If yes, please tell me the appropriate email address.
Thanks,
Hey @leewujung
What will be the selection criteria? What weightage will contributions have?
Greetings @leewujung,
I am Kshitij Patil, a third-year electronics engineering student at Veermata Jijabai Institute of Technology, Mumbai, India.
I am interested in contributing to this project, as I find it a very good learning opportunity. I am confident that I will be able to work efficiently, as this project is quite intriguing to me.
I was going through the "Contributing to echopype" document and successfully completed the installation, but now I'm stuck at the "Running the tests" segment. I had Docker installed previously, and I am getting stuck at this specific command:
python .ci_helpers/run-test.py --local --pytest-args="-vv"
This is the error I am receiving -
I first thought the error was caused by pytest missing from my pip list, but that's not the case. I have tried a lot of things, and this is the conclusion I finally got from ChatGPT:
If the issue persists, you might need to check for any custom configurations or specific instructions provided by the Echopype project for running tests locally. Additionally, consider reaching out to the project maintainers or community for further assistance, as they may have insights into project-specific configurations. Remember to consult the project's documentation or README file for any additional requirements or setup instructions related to testing.
Could you please assist me with this error?
I would also like to know what exactly is expected for this project, which resources I should go through, and which issues I can look into.
Thanks and regards,
Kshitij Patil
@skald1311 : Looks like what you ran into is an issue with the latest docker version: https://stackoverflow.com/questions/77641240/getting-docker-compose-typeerror-kwargs-from-env-got-an-unexpected-keyword-ar Try to see if you could downgrade the version and the problem should be resolved -- @ctuguinay who's on our team found this problem last week!
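For anyone hitting the same error, a quick check along these lines may help. It is a minimal sketch, assuming (based on the linked Stack Overflow thread) that the package to downgrade is the Docker SDK for Python, i.e. the `docker` distribution on PyPI, and that releases from 7.0 onward no longer accept `ssl_version`. The helper name `docker_sdk_major` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def docker_sdk_major(dist_name: str = "docker"):
    """Return the major version of an installed distribution, or None if absent."""
    try:
        return int(version(dist_name).split(".")[0])
    except PackageNotFoundError:
        return None


major = docker_sdk_major()
if major is None:
    print("The 'docker' Python package is not installed in this environment.")
elif major >= 7:
    # Assumption: 7.x is the range affected by the 'ssl_version' TypeError.
    print(f"docker SDK major version {major}: consider pinning a 6.x release.")
else:
    print(f"docker SDK major version {major}: likely not affected.")
```

Run it inside the same conda environment used for the setup script, since a different environment may have a different version installed.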
@Kshitijpatil16 : Seems like it is an environment setup problem. Maybe see if you can verify that you are in the right environment where pytest is available.
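One way to verify this is the small check below. It is a minimal sketch using only the standard library, and `module_available` is a helper name made up for this example. It prints which Python interpreter is running and whether pytest can be imported from it.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the named top-level module can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


# The interpreter path should point inside the conda environment created for echopype.
print("Interpreter:", sys.executable)
print("pytest importable:", module_available("pytest"))
```

If the interpreter path points outside the expected environment, or pytest is reported as not importable, activating the right environment and reinstalling the development requirements would be the first thing to try.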
I added an issue template for asking questions and discussing ideas under GSoC24. Feel free to give it a try, as well as asking questions directly under existing issues.
I've also added the "GSoC24" labels to more existing issues that are within the realm of GSoC24. The newly added ones are more related to the scalability component, and the previous ones are more related to testing.
I will put together a GSoC contributor's guide in the next couple days, which should answer some of the above questions, but in general:
- We would like to see some rough prototypes of what you plan to do as PR(s) to go with your proposal
- Your proposals and/or PR(s) should demonstrate that you have taken concrete steps toward understanding any previously unfamiliar libraries, and show that you will be able to improve the testing framework/tests and enhance scalability. Examples include (but are not limited to) benchmarking reports or potential solutions you find on the internet related to existing issues
- We are happy to provide feedback on your proposals. You can find my email on my profile
Hello @leewujung, is there a template for the proposal?
Alright, here's the GSoC24 contributor's guide: https://github.com/OSOceanAcoustics/echopype/blob/main/gsoc_contrib_guide.md
@MohamedNasser8 : Yes, please use the IOOS template. See the contributor's guide linked above for more info.
Greetings @leewujung
I am a 3rd year student pursuing a B.Tech in Computer Science (specialisation in Data Science and AI) at Shri Ramswaroop Memorial University, Lucknow. Through my academic journey I have worked with Core Java, Python, Machine Learning, and visualization, and I have been active on LeetCode, which has greatly improved my coding and analytical skills. This project has piqued my interest in how to maximize its potential, and I want to contribute to it. I am eager to start working on it. Could you please guide me on what the current weaknesses are and how much of an increase in scalability you expect?
Hey @yusuf-khaan : Thanks for your interest! Please see the links in my response above to get started on your contribution/proposed work.