Collection of interesting links and papers on several topics such as robotics, machine learning, AI, and others.
- FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent
- End-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence.
- You Only Look Once: Unified, Real-Time Object Detection
- Single-shot detector approach for real-time object detection.
- Focal Loss for Dense Object Detection
- Loss function focusing on "hard" examples to improve standard cross-entropy loss.
- FaceNet: A Unified Embedding for Face Recognition and Clustering
- Introduces Triplet Loss for training embeddings directly: an intuitive, yet innovative concept. Also touches on other training pipeline components.
- Project DeepSpeech
- Open-source speech-to-text engine by Mozilla.
- Whisper
- OpenAI's automatic speech recognition model for many languages, released under the MIT license.
- CBAM: Convolutional Block Attention Module
- Spatial and channel feature attention blocks added to improve convolutions by focusing on features.
- Mind the Pad -- CNNs can Develop Blind Spots
- A bit hidden, but very interesting paper pointing out overlooked blind spots emerging in training convolutional networks.
- A ConvNet for the 2020s
- Rethinking ResNet architecture with knowledge emerging from transformer architectures.
- Attention Is All You Need
- Novel attention-based approach to replace convolutional and recurrent networks in NLP.
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- Transformers applied to computer vision with the aim of replacing convolution.
- Per-Pixel Classification is Not All You Need for Semantic Segmentation
- Unified model architecture for semantic and panoptic segmentation.
- FlashAttention-2: Faster Attention with Better Parallelism
- Improves on FlashAttention both in the algorithm and in low-level GPU operations (parallelism and work partitioning) to gain performance.
- Training data-efficient image transformers & distillation through attention
- Teacher-student based transformer (DeiT) architecture with many nice tricks to improve training efficiency.
- Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
- Multimodal models with SOTA performance, open weights, and open data.
- LLM Visualization
- Very illustrative interactive GPT LLM visualisation.
- Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
- Interpreting high-level features (multimodal, language-agnostic, and even safety-relevant) of an LLM using a sparse autoencoder.
- Mixtral of Experts (2024)
- Mixture-of-experts paradigm for efficient inference, surpassing larger models in both speed and performance.
- NVLM: Open Frontier-Class Multimodal LLMs
- Compares multimodal approaches and introduces a novel hybrid approach.
- Language agents achieve superhuman synthesis of scientific knowledge
- RAG agent tool for scientific knowledge synthesis, supporting local models (version 2). Also introduces the LitQA2 benchmark.
- Generative Adversarial Networks
- Data generation from noise via a generator trained adversarially against a discriminator.
- High-Resolution Image Synthesis with Latent Diffusion Models
- Realistic image synthesis based on text input.
- ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters
- Learning control policies for athletic characters, improved by combining adversarial imitation learning with unsupervised reinforcement learning.
- Visualizing and Understanding Convolutional Networks (Zeiler, Fergus)
- Novel approach to visualising the feature layers of CNNs in order to gain a deeper understanding of them.
- Distilling the Knowledge in a Neural Network
- Model distillation and ensembling to improve performance.
- Revisiting ResNets: Improved Training and Scaling Strategies
- Applying new training methods to old networks (ResNet) for use as baselines. Nicely shows that some improvements came more from the training process than from the architecture.
- Deep Residual Learning for Image Recognition
- Effective training of very deep neural networks.
- Dataset Distillation
- Distilling datasets to a fraction of their original size to make learning more efficient.
- NeuralVDB: High-resolution Sparse Volume Representation using Hierarchical Neural Networks
- Improves the storage efficiency of OpenVDB with a neural network architecture.
- Can I use this publicly available dataset to build commercial AI software? -- A Case Study on Publicly Available Image Datasets
- Analysis of the licenses of common public datasets and their implications for commercial use.
- Andrej Karpathy: A Recipe for Training Neural Networks
- Inspiring blog post with a self-explanatory title.
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
- Averaging the weights of models from a grid search (the "soup") to improve performance.
- Robust fine-tuning of zero-shot models
- Improving model robustness while fine-tuning by simply ensembling the weights of the zero-shot and fine-tuned models (WiSE-FT).
- Color-to-Grayscale: Does the Method Matter in Image Recognition?
- Study of RGB-to-grayscale conversion algorithms and their influence on ML algorithms.
- Open Images Dataset V7 and Extensions
- Open-source image dataset with various labels (bounding boxes, text labels, segmentation maps, ...).
- More Inclusive Annotations for People
- Extends Open Images with additional labels for persons to achieve higher ML fairness.
- Stop using the elbow criterion for k-means and how to choose the number of clusters instead
- Critique of the elbow method for k-means clustering, with proposals of better methods.
- Software Engineering at Google
- Collection of good engineering practices for SW development. Not only about programming; it also covers team leadership, etc.
-
- Great collection of free programming books (4000+) and courses (2000+).
-
- Yearly Stanford University AI index report "Measuring trends in AI".
-
- A Developer's Guide to Enterprise-Grade RAG Systems from Galileo.
- Genesis
- Open-source physics simulation platform designed for general purpose Robotics, Embodied AI, & Physical AI applications.
- Awesome Foundation Models
- List of large scale pretrained foundation models.
- Blendify
- Lightweight Python framework that provides a high-level API for creating and rendering scenes with Blender.
- A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards
- Eavesdropping on your computer keyboard with a smartphone or over a video call is possible!
-
- Pushes forward the gr-tempest effort to intercept HDMI image transfer via electromagnetic emanations.
- Passphrase generator
- A dice-based passphrase generator. One of the best ways to create a password or passphrase.
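
To make the dice-based idea from the last entry concrete, here is a minimal sketch in Python. It is not the linked generator's code: the function names and `DEMO_WORDS` are hypothetical, and the 36-word placeholder list (two dice per word) only keeps the example short. Real dice word lists typically hold 6^5 = 7776 words, so each word is picked with five rolls. Physical dice are simulated here with the standard-library `secrets` module.

```python
import secrets

# Hypothetical placeholder list with 6**2 = 36 entries (two dice per word).
# A real list would usually contain 6**5 = 7776 distinct words.
DEMO_WORDS = [f"word{i:02d}" for i in range(36)]


def roll_dice(n_dice: int) -> str:
    """Simulate n_dice fair six-sided dice, e.g. '41526'."""
    return "".join(str(secrets.randbelow(6) + 1) for _ in range(n_dice))


def rolls_to_index(rolls: str) -> int:
    """Read a roll string as a base-6 number (die face d maps to digit d-1)."""
    index = 0
    for face in rolls:
        index = index * 6 + (int(face) - 1)
    return index


def generate_passphrase(wordlist, n_words: int = 6, separator: str = " ") -> str:
    """Pick n_words words, each selected by its own sequence of dice rolls."""
    # The list length must be a power of six so every roll maps to one word.
    n_dice, size = 0, 1
    while size < len(wordlist):
        size *= 6
        n_dice += 1
    if n_dice == 0 or size != len(wordlist):
        raise ValueError("word list length must be a power of 6 (6, 36, ..., 7776)")
    words = [wordlist[rolls_to_index(roll_dice(n_dice))] for _ in range(n_words)]
    return separator.join(words)


if __name__ == "__main__":
    print(generate_passphrase(DEMO_WORDS))
```

Each word contributes log2(list length) bits of entropy, about 12.9 bits with a 7776-word list, so the strength of the passphrase comes from the number of words rather than from the secrecy of the list.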