Vision-Language-Action Based E2E AD
- Vision-Language Model
- Reasoning-Action Alignment
- End-to-End Framework
- Vision Language Action Based E2E AD for Reasoning-Action Alignment
- Input: [Images, Text, Ego info]
- Output: [Text (Reasoning, Meta Actions), Action (Trajectory)]
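
The input/output interface listed above could be sketched as plain Python containers. This is only an illustration under stated assumptions: the names `VLAInput`, `VLAOutput`, and `placeholder_policy`, the field types, and the `speed_mps` key are hypothetical and do not come from the source. No model is run here; the placeholder just rolls the ego speed forward in a straight line so that the shapes of the three outputs are visible.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class VLAInput:
    """Hypothetical container for the three inputs: images, text, ego info."""
    images: List[str]           # placeholder camera frame identifiers
    text: str                   # instruction or question given to the model
    ego_info: Dict[str, float]  # assumed ego state, e.g. {"speed_mps": 4.0}


@dataclass
class VLAOutput:
    """Hypothetical container for the text and action outputs."""
    reasoning: str                         # free-form reasoning text
    meta_actions: List[str]                # high-level actions expressed as text
    trajectory: List[Tuple[float, float]]  # assumed (x, y) waypoints in metres


def placeholder_policy(inp: VLAInput, horizon: int = 4, dt: float = 0.5) -> VLAOutput:
    """Stand-in for a real model: keeps the current speed and heading."""
    speed = inp.ego_info.get("speed_mps", 0.0)
    # Straight-line waypoints spaced by speed * dt along the x axis.
    waypoints = [(speed * dt * (k + 1), 0.0) for k in range(horizon)]
    return VLAOutput(
        reasoning="Placeholder: no model is run; keep current speed and heading.",
        meta_actions=["keep_lane", "maintain_speed"],
        trajectory=waypoints,
    )


sample = VLAInput(
    images=["front_cam_t0"],
    text="Drive straight.",
    ego_info={"speed_mps": 4.0},
)
result = placeholder_policy(sample)
print(result.meta_actions)
print(result.trajectory)
```

Keeping reasoning, meta actions, and trajectory in one output object makes it straightforward to compare the text outputs against the trajectory when checking reasoning-action alignment.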