school-brainhack/school-brainhack.github.io

Maintenance of python_data_analysis module

Closed this issue · 6 comments

Module to maintain

Website: https://school-brainhack.github.io/modules/python_data_analysis/
Code: https://github.com/school-brainhack/school-brainhack.github.io/tree/main/content/en/modules/python_data_analysis

Work to be done

Please ensure:

  • links, resources and exercises reflect the state-of-the-art
  • videos are of good quality and reflect the materials.

If any update / re-write of the module is needed, feel free to suggest alternative material in a comment!
If you think major changes are needed, we suggest you open an issue for new modules.

sltou commented

I can take this one!

Great, thank you! I have assigned you.

Not sure if I should post this in Installation or Python, but since this is mainly an issue with the Jupyter kernel, I'm posting it here.

    1. This is a minor question: a new tab is supposed to open automatically when one types jupyter notebook or jupyter lab in the terminal. It does in a macOS terminal and in my Windows PowerShell. But in my Ubuntu 18.04 and 22.04 (installed on top of Windows 11), a browser tab does not open automatically; one has to press Ctrl and click on the URL. I've tried manually editing the configuration file, but the tab still does not open automatically.
    2. Normally Jupyter dies when one closes the terminal to which the kernel is connected. As such, I sometimes type jupyter lab & disown to run Jupyter in the background. However, I've observed something interesting while working on a 20-subject EEG dataset (300 MB per file): when Jupyter is in the background and the terminal is closed, Jupyter's kernel dies at the 4th or 5th subject when it's supposed to loop over all 20 subjects. Initially I thought it had something to do with conflicts between pip- and conda-installed packages, but the kernel behavior persisted after I switched to a virtualenv with no conda involvement. Then I realized it was due to the terminal having been closed.
    3. WSL is slow compared to a pure Linux environment or native Windows; in particular, WSL2 is much slower than WSL1 (I saw a thread on Stack Overflow or Reddit that extensively compared their speed, but I can't find the link now; I can confirm, though, that my 20-subject loop runs twice as fast via PowerShell as via Ubuntu). If students are planning to work with large datasets or memory-intensive computations, I'm wondering whether it would be better for them to work in their native environment, provided that they have Python there?

Hi @amandalin047, thanks so much for this! My thoughts:

i. What would you suggest for this? Perhaps a note in the module letting students know the browser will not pop up automatically? If so, it would be awesome if you could add that.

ii. I don't routinely use this command, but is it the same behaviour if you do jupyter notebook & disown? If it's a jupyter lab issue specifically, I don't think we should bring it into the module since we don't use jupyter lab. Out of interest though, have you tried using tmux instead?

iii. If students are working with large datasets or running intensive computations, we recommend using the HPC servers. This year we are updating that module to include info about the Brainhack Cloud. We can direct them to that module if they run into any issues. Do you think that would suffice?

Hi @sltou, how are you getting on with maintenance of this module? Do you need any help? Just a reminder that we are hoping to have these done by April 21st.

sltou commented

@clarkenj checked and everything ran smoothly :) No edits on my end.
@amandalin047
i. Agreed with @clarkenj, perhaps you can add a note for the students who use Linux? I personally do not use Linux and cannot reproduce the issue.
ii. This is outside the scope of my knowledge, but other people seem to have the same issue: https://stackoverflow.com/questions/47331050/how-to-run-jupyter-notebook-in-the-background-no-need-to-keep-one-terminal-for
I don't think it is a conda environment issue. It is probably a combination of the OS and how the Jupyter kernel is set up to stay alive. For the students, I suggest just reminding them to keep the terminal open.
iii. Also agreed that cloud is a good solution for large datasets!
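
For anyone who wants to experiment with point ii, here is a minimal Python sketch of one way to start Jupyter so that it does not depend on the launching terminal. It is an illustration, not something tested against the kernel deaths reported above. The helper names build_jupyter_command and start_detached and the log file name are hypothetical. The sketch assumes a POSIX system, where start_new_session=True starts the process in its own session, and it sends the server output to a log file so that nothing is written to a closed terminal.

```python
import subprocess


def build_jupyter_command(app="notebook", open_browser=False):
    """Return the argument list for starting Jupyter (hypothetical helper)."""
    cmd = ["jupyter", app]
    if not open_browser:
        # --no-browser stops Jupyter from trying to open a browser tab.
        cmd.append("--no-browser")
    return cmd


def start_detached(cmd, log_path):
    """Start cmd in its own session, appending its output to log_path.

    Hypothetical helper. start_new_session=True only has an effect on
    POSIX systems.
    """
    with open(log_path, "ab") as log_file:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # no input from the terminal
            stdout=log_file,            # server output goes to the log file
            stderr=subprocess.STDOUT,   # errors go to the same log file
            start_new_session=True,     # detach from the terminal's session
        )
    # The child keeps its own copy of the log file handle after ours closes.
    return proc


# Example use (assumes the jupyter command is on PATH):
# proc = start_detached(build_jupyter_command("notebook"), "jupyter.log")
# print("Jupyter started with PID", proc.pid)
```

The URL with the login token ends up in the log file rather than on screen, so it has to be copied from there into the browser. tmux, as suggested above, is a simpler alternative when it is available.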