- House Prices - Advanced Regression Techniques
- Classify Leaves
- CIFAR-10 - Object Recognition in Images
- Dog Breed Identification
- CowBoy Outfits Detection
- Train with full precision: choose `P100`.
- Train with half precision: choose `T4 x2`.
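As a rough illustration of the accelerator choice above, the sketch below maps a GPU name to a Lightning `precision` setting. The helper name `suggest_precision`, the device-name strings, and the `"16-mixed"` / `"32-true"` values (Lightning 2.x precision flags) are assumptions for illustration, not something these notes specify.

```python
def suggest_precision(device_name: str) -> str:
    """Return a Lightning-style precision flag for a GPU name (hypothetical helper)."""
    # Assumption: T4 sessions are used for half (mixed) precision,
    # while anything else (e.g. P100) trains in full precision.
    if "T4" in device_name.upper():
        return "16-mixed"
    return "32-true"


print(suggest_precision("Tesla T4"))              # 16-mixed
print(suggest_precision("Tesla P100-PCIE-16GB"))  # 32-true
```

In a notebook, the device name could be read with `torch.cuda.get_device_name(0)` and the result passed as `precision=` to the trainer.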
- Add `accelerator="gpu", devices=2, strategy="ddp_notebook"` to `pl.Trainer`.
- Use another trainer with `devices=1` for test and predict.
- Add `sync_dist=True` to `log` and `log_dict`.
- `trainer.fit` can only be called once.
- When overriding any `epoch_end` function, make sure that every tensor you log is on the GPU device. If you initialize a new tensor, initialize it with `device=self.device`. See #18803.
- Fix warnings with `permute` and `transpose` when using DDP, reported in #47163, as shown below:
```shell
sed -i 's#\(permute(.*\?)\)#\1.contiguous()#' \
    /opt/conda/lib/python3.10/site-packages/torchvision/models/convnext.py \
    /opt/conda/lib/python3.10/site-packages/torchvision/ops/misc.py \
    /opt/conda/lib/python3.10/site-packages/torchvision/models/detection/roi_heads.py \
    /opt/conda/lib/python3.10/site-packages/torchvision/models/detection/rpn.py \
    ...
```
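The trainer-related DDP tips above could be wired together roughly as follows. This is a minimal sketch: the helper `make_trainers`, the constant names, the `accelerator="gpu"` setting on the second trainer, and the reading of `pl` as `pytorch_lightning` are assumptions for illustration and are not taken from these notes.

```python
# Kwargs for the two-GPU trainer that is used only for fitting.
FIT_KWARGS = {"accelerator": "gpu", "devices": 2, "strategy": "ddp_notebook"}

# Kwargs for a separate single-device trainer used for test and predict.
# Assumption: it also runs on a GPU; the notes only state devices=1.
EVAL_KWARGS = {"accelerator": "gpu", "devices": 1}


def make_trainers(**shared):
    """Return (fit_trainer, eval_trainer); `shared` holds options common to both."""
    # Imported lazily so the kwargs above can be inspected without Lightning installed.
    import pytorch_lightning as pl  # assumption: this is what `pl` refers to

    fit_trainer = pl.Trainer(**FIT_KWARGS, **shared)
    eval_trainer = pl.Trainer(**EVAL_KWARGS, **shared)
    return fit_trainer, eval_trainer
```

A typical call would be `fit_trainer, eval_trainer = make_trainers(max_epochs=10)`, followed by a single `fit_trainer.fit(...)` and then `eval_trainer.test(...)` or `eval_trainer.predict(...)`.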
### Torchvision
- The version of `torchvision` is 0.15.1, so `v2.CutMix` and `v2.MixUp` are not available in `torchvision.transforms`.
- Use `import torchvision.transforms as v1` instead.
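To make the version constraint explicit, one could guard the choice of transforms with a check like the one below. It assumes `v2.CutMix` and `v2.MixUp` first shipped in torchvision 0.16; the helper names are hypothetical.

```python
def major_minor(version: str) -> tuple:
    """Parse a version such as '0.15.1' or '0.15.1+cu118' into (0, 15)."""
    core = version.split("+")[0]  # drop a local build tag such as '+cu118'
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def has_v2_cutmix_mixup(version: str) -> bool:
    """True if this torchvision version is assumed to provide v2.CutMix and v2.MixUp."""
    return major_minor(version) >= (0, 16)


print(has_v2_cutmix_mixup("0.15.1"))  # False
print(has_v2_cutmix_mixup("0.16.0"))  # True
```

In practice the argument would be `torchvision.__version__`.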
```shell
pip install pyngrok
```

```python
# Attach NGROK_AUTHTOKEN in `Add-ons > Secrets` first
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
ngrokToken = user_secrets.get_secret("NGROK_AUTHTOKEN")

from pyngrok import conf, ngrok
conf.get_default().auth_token = ngrokToken
conf.get_default().monitor_thread = False

# Reuse an existing tunnel if there is one; otherwise open one to port 6006
ssh_tunnels = ngrok.get_tunnels(conf.get_default())
if len(ssh_tunnels) == 0:
    ssh_tunnel = ngrok.connect(6006)
    print('address:' + ssh_tunnel.public_url)
else:
    print('address:' + ssh_tunnels[0].public_url)

# Start TensorBoard in the background on the tunnelled port
from subprocess import Popen
Popen("tensorboard --logdir ./lightning_logs/ --host 0.0.0.0 --port 6006", shell=True)
```

```shell
ps aux | grep tensorboard
```