Daniel Stenson explores how AI is about to make one of its most significant leaps yet - from the digital realm into the physical world through cognitive robots.
In 2022, AI was still a future fantasy hyped by researchers, philosophers, and VCs; then ChatGPT emerged, and within weeks, it changed everyone's view of what AI could do.
GenAI's rise has been astonishing. In two years, GenAI has massively boosted productivity worldwide. The use cases are vast, from automating entire marketing campaigns to enabling anyone to create fully functioning web apps.
But these digital superpowers have yet to translate into the physical world. Companies and consumers doing manual labor have barely tapped into AI's potential for efficiency, cost savings, and knowledge gains. What if AI in physical form could perform tasks in the real world? What impact would that have on the global economy?
We believe a ChatGPT moment for the physical world is coming, in the form of cognitive robots; we will soon see thousands of them in our industries and homes. Cognitive robots could solve many problems and, ultimately, help the global economy prosper as we face staff shortages in multiple sectors and an aging workforce.
If you don't have time to read the entire thesis, here are the high-level takeaways:
It helps to examine the evolution of robotics to understand why these cognitive robots are starting to enter homes and industries.
Wave 1: When industrial robotics first appeared: The first wave emerged in the 1980s, mainly in the electronics and automotive industries. When you imagine robots working along a car factory conveyor belt, you're picturing this era: articulated and delta robotic arms that lift and manipulate objects with exact timing and positioning. However, their robust construction makes them expensive for both producers and buyers.
Wave 2: Autonomous mobility from Amazon and Alibaba: The second wave began around 2010, focusing on autonomous mobility. It started with the growing popularity of online shopping, as e-commerce giants like Amazon and Alibaba made significant investments in autonomous robots to keep up with order fulfillment. This type of robot was among the first to make it into the home, as robotic vacuum cleaners and lawn mowers.
These robots relied on algorithms for movement, often using walls or guidelines to navigate. They provided great value for businesses and consumers and were the first type of robot to rely heavily on sensors. However, they were not cognitive and still depended on deterministic approaches.
Wave 3: The humanoid era, combining autonomous movement and manipulation: The third wave emerged around 2020. It built upon the progress made in previous waves, automating picking and movement, but this time, it attempted to combine the two. This gave rise to humanoid robots.
Humanoids had long been a fantasy in science fiction books and movies, but with the progress made in autonomous vehicles and deep learning, there was now a clear path toward turning these into a reality.
This wave began before we had seen the impact of LLMs and VLMs. At this time, the companies attempting to build humanoids had to build everything from scratch: the end-to-end AI systems, the workings of the bipedal mechanisms, customized actuators, and frames.
Despite the difficulties, these early efforts proved what was possible and inspired the future wave of robotics.
Wave 4: Specialized Cognitive Robots (Now): Today, we are witnessing the fourth wave, cognitive robots. Just as GenAI applications were initially limited to a few deep-research labs, advances in robotics have created a similar tipping point. Developments like the release of Mobile ALOHA, better and more affordable foundation models for robotics (π0, AutoRT, GR00T), and affordable 3D printing techniques have lowered barriers, enabling an explosion of new robotics companies.
Unlike language models, which train on massive internet datasets, robotics lacks equivalent training data. We believe startups focused on targeted use cases can address this by capturing domain-specific data while delivering value to their customers. This dual focus allows them to generate revenue while building the datasets necessary for advancing cognitive robotics.
Ultimately, these developments will redefine industries and set the stage for the next wave of robotics.
1) Robotics foundation models are now accessible. Over the past few years, players like Google have led this charge. Google has focused on its Robotic Transformer models, culminating in AutoRT. This model builds on the previous models, RT-1 and RT-2, but adds a VLM and an LLM. This addition enables robots to perform chain-of-thought reasoning and carry out tasks they were not explicitly trained on.
Meanwhile, Physical Intelligence recently introduced its first robotic foundation model, π0, demonstrating impressive results across various tasks. Although the exact details of the architecture were not specified, π0 introduces a notable advancement: flow matching. Unlike models that output discrete language tokens, flow matching enables π0 to generate continuous motor commands. This capability is crucial for dexterous tasks and for rapidly adapting to unexpected changes in the environment.
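To make the contrast with discrete token decoding concrete, here is a minimal sketch of flow-matching inference: starting from Gaussian noise, the sample is pushed along a learned velocity field until it becomes a continuous motor command. Everything here is a toy assumption, not π0's actual architecture: the 4-dimensional action, the `TARGET_ACTION` endpoint, and the analytically known `velocity_field` standing in for the trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-dim continuous motor command (e.g. x/y/z deltas + gripper).
TARGET_ACTION = np.array([0.12, -0.40, 0.55, 0.0])

def velocity_field(a, t):
    """Stand-in for a learned network v_theta(a, t | observation).

    For a straight-line probability path conditioned on the endpoint,
    the conditional velocity is (target - a) / (1 - t); a trained model
    predicts this quantity from the noisy sample, the time t, and the
    robot's camera and proprioceptive observations.
    """
    return (TARGET_ACTION - a) / (1.0 - t)

def sample_action(steps=16):
    """Draw a continuous action by Euler-integrating the velocity field."""
    a = rng.standard_normal(TARGET_ACTION.shape[0])  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt                                   # t runs over [0, 1)
        a = a + dt * velocity_field(a, t)
    return a

print(sample_action())  # a continuous motor command, no discrete token vocabulary
```

Because the output is a real-valued vector rather than a token from a fixed vocabulary, commands can vary smoothly, which is what makes this formulation attractive for dexterous control.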
Other tech giants are also releasing and researching robotics foundation models. Nvidia announced its robotics foundation model project, GR00T, and Meta FAIR has continued its open-source work in AI by releasing several papers on robotics foundation models.
As these models scale and train on new data, their capabilities will only improve. Already, the accessibility of these foundation models enables founders to build cognitive robots that were previously confined to a few well-funded companies (read: Figure, Tesla, 1X).
2) We now have a way to teach robotic systems to automate complex end-to-end tasks: The technical advancements extend beyond foundation models. Imitation learning, a method for teaching robots through human demonstrations, has significantly improved in recent years.
In 2023, the ALOHA framework demonstrated the ability to train bi-manual robotic systems (i.e., robots with two arms) to perform end-to-end tasks with as few as 50 real-world demonstrations. The paper showed how the framework enabled robots to automate intricate tasks such as putting on shoes and slotting batteries.
Building on this success, researchers introduced Mobile ALOHA, which adds mobility to the system, in this case, via a wheeled base. This upgrade allowed robots to take on tasks like rinsing pans, opening elevators, and cooking, again with just 50 demonstrations.
This is groundbreaking because the papers released all their code and hardware designs, allowing anyone with the resources to replicate the results.
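The imitation-learning recipe above can be sketched in miniature: record observation-action pairs from human demonstrations, then fit a policy by supervised regression (behavior cloning). The linear "expert", the dimensions, and all variable names below are illustrative assumptions; ALOHA's actual policy (ACT) is a transformer trained on camera images and joint states, predicting chunks of future actions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for ~50 teleoperated demonstrations. Each pairs an
# observation (for ALOHA: camera images + joint positions) with the action
# the human operator took. The "expert" here is a known linear map so the
# example stays small and checkable; the dimensions are made up.
TRUE_W = rng.standard_normal((6, 4))   # hypothetical obs-dim 6 -> action-dim 4

obs = rng.standard_normal((50, 6))     # 50 demonstration observations
actions = obs @ TRUE_W                 # the operator's recorded actions

# Behavior cloning = supervised regression from observations to actions.
# ALOHA's ACT policy is trained with this same objective, but with a
# transformer predicting short chunks of future actions per step.
W_hat, *_ = np.linalg.lstsq(obs, actions, rcond=None)

new_obs = rng.standard_normal(6)
predicted_action = new_obs @ W_hat     # policy output on an unseen observation
```

The striking result in the ALOHA papers is how far this simple supervised objective goes when the demonstrations are high quality: tens of demos, not millions of samples, suffice for specific tasks.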
3) Over $100 billion has been invested in robotics startups in the past decade
In addition to technological advancements making cognitive robotics more accessible for founders, VC funding in the category is experiencing significant growth. According to Crunchbase, over $100 billion has been invested in robotics startups over the past decade. After peaking in 2021, robotics funding is rising again, projected to reach $12 billion in 2024, a nearly 40% increase from 2023.
TLDR: We see the robotics landscape changing in the following way:
The infrastructure layer in robotics is large and has remained fairly unchanged over the last decade. We see the main change as the appearance of foundation model providers, from Physical Intelligence's π0 to Yaak's tools for unifying multimodal sensor data. Apart from this, we believe there will be a greater need for new and better infrastructure with the new wave of robotics, spanning everything from better simulation platforms to chip designs that are more tailored for robotics. On top of that, we believe there are many use cases for GenAI in the infrastructure layer, especially in hardware design, helping decrease the time it takes to go from idea to prototype, similar to what Cursor and Lovable have done for software.
Cognitive robots are disrupting applications: As shown in the applications layer, robotics companies have solved many tasks, ranging from underwater operations to warehouse automation. However, most companies still rely on rule-based approaches, limiting their ability to automate tasks end-to-end. We will likely see cognitive robots disrupt these verticals.
An ideal application for cognitive robots to enter has several key characteristics. The most important one is that the data collected and skills learned can be applied across different sectors. For example, learning a task like picking up an item can be used both in the kitchen (for cooking) and in the bedroom (for making the bed). Other traits include verticals with challenging unit economics, low margins, price-sensitive customers, regulatory constraints, and staff shortages. Here are some examples:
These are just a few promising applications and verticals for cognitive robotics companies, particularly for collecting data to train generalizable robotics foundation models.
On the consumer side, we believe it will be some time before cognitive robots are present in homes. We are currently in a data collection phase to enhance cognitive robot capabilities. Privacy concerns are a significant hurdle for home use, and even aside from them, the high cost of cognitive robots, especially humanoids, makes them unaffordable for most consumers right now.
Specialized, non-humanoid cognitive robots have an interesting path to reaching full autonomy before humanoids do. Since we're still in the data collection phase, having a robot that 1) is reasonably priced and 2) solves a task in a way worth paying for allows specialized cognitive robotics companies to get plenty of robots to market quickly and, in turn, collect data faster.
This advantage is clear when comparing production cost and the number of tasks that can be automated across specialized cognitive robots, humanoids, and old-gen, single-use-case robotic systems. In this comparison, specialized cognitive robots come out on top, delivering the highest value per dollar spent, an important "metric" for gauging potential demand.
But let’s explain the difference in costs. Humanoids are expensive because they include multiple arms and legs, each costing around $10,000. In contrast, specialized cognitive robots often have just one arm and wheels instead of legs, making them up to ten times cheaper to produce.
At the same time, humanoids and specialized cognitive robots have similar capabilities. This means a buyer would be paying five to ten times more for a humanoid with equivalent proficiency in a given use case.
When instead comparing specialized cognitive robots to older generations of robotics systems from companies like ABB and Universal Robots, cognitive robots offer greater versatility at a lower cost. Older systems, typically used for high-precision tasks in a single use case, can cost upwards of $100,000.
So why the big cost difference? The short answer lies in the fact that cognitive robots rely on vision, eliminating the need for expensive materials or complex component configurations to achieve precision. Their vision enables them to overcome challenges like backlash or misplacements by autonomously adjusting to the observed state, using cameras or LiDAR.
As a result, specialized cognitive robots provide the highest value per dollar spent, outperforming both humanoids and older-generation robotics systems.
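This value-per-dollar argument can be sketched as back-of-the-envelope arithmetic. The production costs follow the rough figures in the text (limbs around $10,000 each, specialized robots up to ten times cheaper, old-gen cells upwards of $100,000); the task counts are illustrative assumptions, not market data.

```python
# Back-of-the-envelope "value per dollar" comparison. Costs follow the rough
# figures in the text; the number of automatable tasks per robot is assumed.
robots = {
    # name: (approx. production cost in USD, number of tasks it can automate)
    "humanoid": (100_000, 10),             # several $10k limbs plus frame and compute
    "specialized_cognitive": (10_000, 5),  # one arm on a wheeled base, ~10x cheaper
    "old_gen_single_use": (100_000, 1),    # high-precision, single-use-case cell
}

for name, (cost, tasks) in robots.items():
    print(f"{name}: {tasks / cost:.5f} tasks automated per dollar")
```

Under these assumptions, a specialized cognitive robot automates fewer tasks than a humanoid but at a tenth of the cost, so its tasks-per-dollar ratio comes out several times higher than either alternative.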
It’s still too early to know how the cognitive robotics market will develop. Will it be concentrated among a few companies generating tens of billions of dollars in revenue? Or fragmented across many companies, each generating hundreds of millions or a few billion dollars? We believe the latter is more likely.
Let’s compare two of the biggest markets today: automotive and smartphones. Both are vast, serving billions of users, and both rest on an underlying technology that combines software and hardware, so they could reflect what the robotics market might look like down the line. What is interesting about these two markets is that they have essentially opposite dynamics: the automotive market is fragmented, while the smartphone market is led by a few big players.
Cognitive robots are complex systems with thousands of parts, produced in-house or outsourced from various suppliers. This leads us to believe that the supply chain for cognitive robots could resemble the automotive industry, which, unlike the smartphone industry, involves thousands of parts and lacks standardization.
Relying on different suppliers makes it hard to scale production and limits the ability to capture global market share, allowing regional competitors to emerge. This is how the automotive industry developed. In contrast, the smartphone market grew quickly because of a less complex, already established supply chain and a global user base, making it global from the start. This is the opposite of how robotics operates today, particularly in the consumer market.
Like compact cars, heavy-duty trucks, and sports cars, cognitive robots will come in different types, each designed for a specific use-case. For example, robots made for moving goods will likely have wheels instead of legs, because speed is important. Price will also vary depending on the task. Just like compact cars cost less than sports cars, robots will have different prices depending on the buyer’s budget.
This inherent fragmentation limits the potential for a one-size-fits-all solution.
In conclusion, while the robotics market will take years to evolve, its trajectory is more likely to mirror the automotive industry than smartphones. The high complexity, diverse use cases, and lack of standardization point to a fragmented future, where specialized robotics companies capture portions of the market rather than a few players dominating it.
There are plenty of hurdles for robotics companies to overcome. Elon Musk has said on multiple occasions that we will have fully self-driving cars "next year," yet we still don't, unless you live in San Francisco. Similarly, the robotics industry will face the same challenges in building autonomous end-to-end systems that self-driving cars have faced – namely, technological and regulatory hurdles.
Self-driving car systems have improved, but edge cases are still difficult to capture in training data. We have models that work in 99% of scenarios, but the last 1% is critical for regulatory approval, as it can mean the difference between life and death. This leads us to believe cognitive robotics companies will face similar hurdles as robots make an appearance in our homes, stores, offices, and industries.
On the technological side, a major problem is that we don't have the necessary data to build general-purpose robots. However, recent advances in imitation learning have shown a way to overcome this by collecting data via teleoperation and then training AI systems to learn task after task.
Market need, market-entry: In addition to challenges like gathering data for edge cases and navigating regulatory hurdles, a significant obstacle is the lack of demand. This is especially true for humanoid robots. We believe the lack of demand is straightforward: the functionality of these robots is not good enough for their current price point. Why would anyone pay over $100,000 for a humanoid to clean or do laundry if it performs on par with a toddler?
But we know countless technologies have faced and overcome similar challenges, and we observe a few approaches currently being tested. The first approach, used by most cognitive robotics companies, involves targeting the enterprise segment with a niche, single or dual-use case, collecting real-world data while generating revenue, and then expanding to other use cases. The second approach targets a wealthy, small customer base, where the high price tag justifies limited sales while collecting data, eventually moving downstream as costs decrease with improved supply and manufacturing chains (see the Tesla Roadster). The challenge with this approach, which many humanoid robotics companies are adopting, is that data remains critical for improving the robots' capabilities across various tasks, and with only a few robots sold, the amount of data collected may be significantly lower than in the first approach.
We believe cognitive robotics is at an inflection point and that the key elements for its success are now in place. We believe the true potential of AI will only be realized when it takes physical form.
We've already backed a few founders in this space and are eager to support more visionary builders. If you're working on something groundbreaking, reach out to me directly at daniel@byfounders.vc.