What puts the A in VLA?

The field of robotics is experiencing an algorithmic shift towards Vision-Language-Action (VLA) models: teaching transformers to “speak” robot actions in the physical world. In this post, I’ll demystify robot actions (what VLAs output to make robots move) and review how action representations have evolved over the past couple of years. I don’t have a robotics background, so all of this is quite new to me, and I’ve always been perplexed by what must happen between a model outputting tokens and an actual arm flipping a pancake. What do robot actions actually look like? ...

September 10, 2025 · 25 min · Sarunas Kalade