LeRobot Hackathons - VLA Models for SO-100
Hackathon projects using VLA and VLM models with SO-100 robot: shorts folding, kitchen tasks with Moondream, and plantain leaf cleaning
Project Overview
This project documents participation in LeRobot hackathons that explored Vision-Language-Action (VLA) models and Vision-Language Models (VLMs) for robotic manipulation with the SO-100 robot. The three projects build on one another, covering cloth manipulation, kitchen tasks, and adaptation to a culturally specific task.
Hackathon Projects
- Shorts Folding - VLA models for deformable object manipulation
- Kitchen Tasks with Moondream - VLM-guided navigation, cleaning, and utensil manipulation
- Plantain Leaf Cleaning - Cultural adaptation for Colombian culinary traditions
What are Vision-Language-Action Models?
VLA models combine vision, language understanding, and robotic control in a single end-to-end system:
- Vision: Camera input for scene perception
- Language: Natural language task instructions
- Action: Direct robot control commands
This enables robots to understand tasks from human descriptions and execute them using visual feedback.
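As a rough illustration of this interface, the sketch below bundles the three inputs into a single observation and maps it to an action. Everything in it (`Observation`, `toy_vla_policy`, the brightness heuristic) is hypothetical and only stands in for a trained network; it is not the model used in the hackathons.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """Inputs for one control step (hypothetical structure, for illustration)."""
    image: List[List[float]]       # grayscale frame, pixel intensities in [0, 1]
    instruction: str               # natural-language task description
    joint_positions: List[float]   # current robot state, one value per joint


def toy_vla_policy(obs: Observation) -> List[float]:
    """Toy stand-in for a trained VLA: returns one delta per joint.

    A real VLA would encode the image and instruction with neural networks.
    Here the mean brightness and a keyword only drive the last joint (gripper).
    """
    pixel_count = sum(len(row) for row in obs.image)
    brightness = sum(sum(row) for row in obs.image) / pixel_count
    direction = -1.0 if "close" in obs.instruction.lower() else 1.0
    action = [0.0] * len(obs.joint_positions)
    action[-1] = direction * brightness
    return action


obs = Observation(
    image=[[0.5, 0.5], [0.5, 0.5]],
    instruction="Close the gripper",
    joint_positions=[0.0] * 6,
)
action = toy_vla_policy(obs)
```

The point of the sketch is the signature: one call takes pixels, text, and robot state, and returns a command, with no hand-written perception or planning stage in between.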
Hackathon 1: Shorts Folding
Challenge: Fold shorts using VLA models - a classic deformable object manipulation task.
Approach:
- Imitation learning from teleoperated demonstrations
- Vision encoder + language instruction encoder → action policy
- Trained on SO-100 robot kinematics
Key Skills: Cloth state perception, grasp point detection, sequential folding actions
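The imitation-learning step above can be sketched in miniature. The example below is an assumption-laden toy, not the training code used in the hackathon: it fits a one-dimensional linear policy to (state, action) pairs by gradient descent on the mean squared error, which is the core idea of behavior cloning. The function name `fit_linear_policy` and the demonstration data are made up for illustration.

```python
from typing import List, Tuple


def fit_linear_policy(
    demos: List[Tuple[float, float]], lr: float = 0.1, epochs: int = 500
) -> Tuple[float, float]:
    """Fit action = w * state + b to demonstrations by minimizing MSE."""
    w, b = 0.0, 0.0
    n = len(demos)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for state, action in demos:
            error = (w * state + b) - action   # prediction minus expert action
            grad_w += 2.0 * error * state / n
            grad_b += 2.0 * error / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical "teleoperated" demonstrations generated by the rule a = 2s + 1.
demos = [(s, 2.0 * s + 1.0) for s in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, b = fit_linear_policy(demos)
```

A real policy replaces the linear map with vision and language encoders feeding an action head, but the training signal is the same: match the demonstrated action for each observed state.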
Hackathon 2: Kitchen Manipulation with Moondream
Moondream is a lightweight vision-language model (VLM) that provides scene understanding and spatial reasoning.
Tasks Completed:
Navigation
Navigate the kitchen using natural-language commands such as “Go to the sink” or “Move to the countertop”.
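One simple way to ground such commands is to match them against a table of named locations. The sketch below assumes this keyword-matching approach purely for illustration; the project itself may have relied on the VLM for grounding, and `KITCHEN_LOCATIONS` with its coordinates is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical named targets with made-up (x, y) coordinates in meters.
KITCHEN_LOCATIONS: Dict[str, Tuple[float, float]] = {
    "sink": (2.0, 0.5),
    "countertop": (1.0, 1.5),
}


def resolve_target(
    instruction: str, locations: Dict[str, Tuple[float, float]]
) -> Optional[str]:
    """Return the single location named in the instruction, or None.

    Ambiguous commands (zero or several matches) are rejected rather than guessed.
    """
    text = instruction.lower()
    matches = [name for name in locations if name in text]
    return matches[0] if len(matches) == 1 else None


sink_target = resolve_target("Go to the sink", KITCHEN_LOCATIONS)
counter_target = resolve_target("Move to the countertop", KITCHEN_LOCATIONS)
unknown_target = resolve_target("Go somewhere nice", KITCHEN_LOCATIONS)
```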
Countertop Cleaning
Visual inspection → dirt detection → wiping trajectory → execution
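The "wiping trajectory" stage of this pipeline can be illustrated with a serpentine sweep over the detected dirty region. This is a minimal sketch under the assumption that dirt detection yields an axis-aligned bounding box; the function name `wiping_waypoints`, the units, and the stride are illustrative, not taken from the project.

```python
import math
from typing import List, Tuple


def wiping_waypoints(
    x_min: float, y_min: float, x_max: float, y_max: float, stride: float
) -> List[Tuple[float, float]]:
    """Return a back-and-forth sweep covering the box, one row every `stride`."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    # Small epsilon keeps the last row when the height is an exact multiple of stride.
    n_rows = math.floor((y_max - y_min) / stride + 1e-9) + 1
    waypoints: List[Tuple[float, float]] = []
    for i in range(n_rows):
        y = y_min + i * stride
        row = [(x_min, y), (x_max, y)]
        if i % 2 == 1:
            row.reverse()   # alternate direction so the wiper never lifts off
        waypoints.extend(row)
    return waypoints


# Hypothetical dirty region of 10 x 4 units, swept with a stride of 2.
path = wiping_waypoints(0, 0, 10, 4, 2)
```

Choosing the stride slightly smaller than the wiper width gives overlapping passes, at the cost of a longer path.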
Fork & Utensil Manipulation
- Grasp forks from various orientations
- Precise placement and organization
- Sorting by type
Table Cleaning
Object detection → removal planning → surface wiping → verification
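The four stages above form a loop when verification fails. The sketch below shows one way to structure that loop, assuming each stage is available as a callable; the function `clean_table`, its parameters, and the simulated table are hypothetical and stand in for the real perception and motion code.

```python
from typing import Callable, Dict, List, Optional


def clean_table(
    detect: Callable[[], List[str]],
    remove: Callable[[str], None],
    wipe: Callable[[], None],
    verify: Callable[[], bool],
    max_passes: int = 3,
) -> Optional[int]:
    """Run detect, remove, wipe, verify until clean; return passes used or None."""
    for attempt in range(1, max_passes + 1):
        for obj in detect():
            remove(obj)
        wipe()
        if verify():
            return attempt
    return None


# Simulated table: two objects and two "units" of dirt, one removed per wipe.
table: Dict[str, object] = {"objects": ["cup", "plate"], "dirt": 2}

passes = clean_table(
    detect=lambda: list(table["objects"]),
    remove=lambda obj: table["objects"].remove(obj),
    wipe=lambda: table.update(dirt=max(0, table["dirt"] - 1)),
    verify=lambda: not table["objects"] and table["dirt"] == 0,
)
```

Bounding the number of passes keeps the robot from wiping forever when the verifier never reports a clean surface.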
Hackathon 3: Plantain Leaf Cleaning
Cultural Context
In Colombian cuisine, plantain leaves are used to wrap tamales and other traditional dishes. They must be cleaned carefully before use.
Challenge
Adapt learned manipulation skills to clean large, flexible plantain leaves without tearing them.
Approach:
- Gentle edge grasping
- Adaptive force control
- VLM-guided inspection: “Is this area clean?”, “Where is the dirt?”
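Two of the ideas above, adaptive force control and VLM-guided inspection, can be sketched as follows. This is an illustrative toy under stated assumptions: the slip signal, gains, and force units are invented, and `ask` is a hypothetical callable wrapping whatever VLM answers the question for a given leaf region. It does not use Moondream's actual API.

```python
from typing import Callable, List


def update_grip_force(
    current_force: float,
    measured_slip: float,
    max_force: float,
    gain: float = 0.5,
    relax: float = 0.05,
) -> float:
    """Tighten in proportion to slip, relax otherwise, never exceed max_force."""
    if measured_slip > 0:
        target = current_force + gain * measured_slip
    else:
        target = current_force - relax
    # The upper clamp is what protects the leaf from tearing.
    return min(max(target, 0.0), max_force)


def find_dirty_regions(
    regions: List[str], ask: Callable[[str, str], str]
) -> List[str]:
    """Ask "Is this area clean?" per region and keep those answered with "no"."""
    dirty = []
    for region in regions:
        answer = ask(region, "Is this area clean?")
        if answer.strip().lower().startswith("no"):
            dirty.append(region)
    return dirty


def fake_vlm(region: str, question: str) -> str:
    """Stub standing in for a real VLM call."""
    return "No, there is dirt." if region == "center" else "Yes."


dirty = find_dirty_regions(["tip", "center", "stem"], fake_vlm)
tightened = update_grip_force(1.0, measured_slip=0.4, max_force=2.0)
clamped = update_grip_force(1.9, measured_slip=1.0, max_force=2.0)
```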
Impact: Demonstrates how robotic systems can adapt to culturally specific tasks.
Technical Stack
Frameworks:
- LeRobot - Robot learning framework
- PyTorch - Deep learning
- Moondream - Vision-language model
Robot: SO-100 (5 DOF arm + gripper, six servos in total, ~500 mm reach)
Training: Behavior cloning from human demonstrations
Key Learnings
✅ VLAs simplify pipelines - Direct vision-to-action mapping
✅ Language enables intuition - Natural task specification
✅ Pre-training is powerful - Vision-language models provide strong priors
✅ Cultural relevance matters - Robotics should serve diverse communities
Resources
- LeRobot: huggingface.co/lerobot
- Moondream: huggingface.co/vikhyatk/moondream2
- SO-100 Robot: Open-source manipulation platform
Keywords
LeRobot, Vision-Language-Action, VLA, Moondream, SO-100, Robotic Manipulation, Imitation Learning, Kitchen Robotics, Cultural Robotics, Deep Learning, PyTorch
Vision, language, and action for intelligent manipulation 🤖✨
LeRobot hackathon overview - VLA models in action
SO-100 robot manipulation demonstration
Kitchen manipulation tasks with Moondream VLM
Plantain leaf cleaning - cultural task adaptation
Full demonstration video