Investment Thesis
December 19, 2024

A Path Toward Physical Intelligence: Seeing AI's Full Potential

Daniel Stenson explores how AI is about to make one of its most significant leaps yet - from the digital realm into the physical world through cognitive robots.

Daniel Stenson
Investor

In 2022, AI was still a future fantasy hyped by researchers, philosophers, and VCs; then ChatGPT emerged, and within weeks, it changed everyone's view of what AI could do.

GenAI's rise has been astonishing. In just two years, it has massively boosted productivity worldwide. The use cases are vast, from automating entire marketing campaigns to enabling anyone to create fully functioning web apps.

But these digital superpowers have yet to translate into the physical world. Companies and consumers doing manual labor have barely tapped into AI's potential for efficiency, cost savings, and knowledge gains. What if AI in physical form could perform tasks in the real world? What impact would that have on the global economy? 

We believe a ChatGPT moment is coming for the physical world, taking the form of cognitive robots; we will soon see thousands of them in our industries and homes. Cognitive robots could solve many problems and, ultimately, help the global economy prosper as we face staff shortages in multiple sectors and an increasingly aging workforce.

Cognitive robots, an overview:

If you don't have time to read the entire thesis, here are the high-level takeaways:

  • Cognitive robots, robots that use AI and vision to autonomously make decisions, are the next big thing in robotics.
  • Robotics is at an inflection point, driven by progress in LLMs and robotics foundation models. The AI systems developed over the years by companies like 1X, Tesla, and Figure are now available to startups and founders worldwide.
  • We are still in the data collection phase: Robotics foundation models lack the vast amounts of openly available training data seen with LLMs. At this stage, the most critical task for robotics companies is collecting data to improve the underlying AI.
  • Specialized cognitive robotics companies present a unique opportunity. These companies can challenge humanoid players by solving a few use cases exceptionally well while collecting data to expand and improve their applications. Additionally, specialized robots often do not require the same expensive components as humanoids, enabling easier market entry with strong unit economics.

The Rise of Cognitive Robotics

It helps to examine the evolution of robotics to understand why these cognitive robots are starting to enter homes and industries.

Wave 1: When industrial robotics first appeared: The first wave emerged in the 1980s, mainly in the electronics and automotive industries. When you imagine robots working along a car factory conveyor belt, you're picturing this era: articulated and delta robotic arms that lift and manipulate objects with exact timing and position. However, their robust construction makes them expensive for both producers and buyers.


Wave 2: Autonomous mobility from Amazon and Alibaba:
The second wave began around 2010, focusing on autonomous mobility. It started with the growing popularity of online shopping, as e-commerce giants like Amazon and Alibaba made significant investments in autonomous robots to keep up with order fulfillment. This type of robot was also among the first to make it into the home, in the form of robot vacuum cleaners and lawn mowers.

These robots relied on algorithms for movement, often using walls or guidelines to navigate. They provided great value for businesses and consumers and were the first type of robot to rely heavily on sensors. However, they were not cognitive and still depended on deterministic approaches.


Wave 3: The humanoid era, combining autonomous movement and manipulation:
The third wave emerged around 2020. It built upon the progress made in previous waves, automating picking and movement, but this time, it attempted to combine the two. This gave rise to humanoid robots.
Humanoids had long been a fantasy in science fiction books and movies, but with the progress made in autonomous vehicles and deep learning, there was now a clear path toward turning them into reality.

This wave began before we had seen the impact of LLMs and VLMs. At this time, the companies attempting to build humanoids had to build everything from scratch: the end-to-end AI systems, the workings of the bipedal mechanisms, customized actuators, and frames. 

Despite the difficulties, these early efforts proved what was possible and inspired the future wave of robotics.


Wave 4: Specialized cognitive robots (now):
Today, we are witnessing the fourth wave: cognitive robots. Just as GenAI applications were initially limited to a few well-resourced research labs, advances in robotics have created a similar tipping point. Developments like the release of Mobile ALOHA, better and more affordable foundation models for robotics (π0, AutoRT, GR00T), and affordable 3D printing techniques have lowered barriers, enabling an explosion of new robotics companies.

Unlike language models, which train on massive internet datasets, robotics lacks equivalent training data. We believe startups focused on targeted use cases can address this by capturing domain-specific data while delivering value to their customers. This dual focus allows them to generate revenue while building the datasets necessary to advance cognitive robotics.

Ultimately, these developments will redefine industries and set the stage for the next wave of robotics.

Tailwinds and why the time is now

1) Robotics foundation models are now accessible. Over the past few years, players like Google have led this charge. Google has focused on its Robotic Transformer models, culminating in AutoRT. This model builds on the previous RT-1 and RT-2 models with the addition of a VLM and an LLM, enabling robots to perform chain-of-thought reasoning and carry out tasks they were not explicitly trained on.

Meanwhile, Physical Intelligence recently introduced its first robotic foundation model, π0, demonstrating impressive results across various tasks. Although the exact details of the architecture were not specified, π0 introduces a notable advancement: flow matching. Unlike models that output discrete language tokens, flow matching enables π0 to generate continuous motor commands. This capability is crucial for dexterous tasks and for rapidly adapting to unexpected changes in the environment.
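To make the flow-matching idea concrete, here is a toy Python sketch. Everything in it is an illustrative assumption, since π0's architecture is unpublished: we stand in for the learned velocity network with an analytic field that transports a noise sample toward a known target action, then integrate it with Euler steps to produce a continuous motor command rather than a sequence of discrete tokens.

```python
import numpy as np

ACTION_DIM = 7  # e.g., joint targets for a 7-DoF arm (illustrative)

def velocity_field(action, t, target):
    """Stand-in for a trained velocity network v(a, t | observation).
    Here it is the analytic velocity that carries a sample toward a
    known target action; a real model would predict this from data."""
    return (target - action) / max(1.0 - t, 1e-6)

def sample_action(target, steps=10, seed=0):
    """Integrate the velocity field from Gaussian noise (t=0) toward
    an action (t=1) with simple Euler steps."""
    rng = np.random.default_rng(seed)
    action = rng.standard_normal(ACTION_DIM)  # start from pure noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        action = action + dt * velocity_field(action, t, target)
    return action

target = np.linspace(-1.0, 1.0, ACTION_DIM)  # a made-up motor command
print(sample_action(target))  # a continuous action vector, not tokens
```

The key point of the sketch: because the output is a real-valued vector refined over a few integration steps, the policy can emit smooth, high-frequency motor commands, which is what makes this approach attractive for dexterous manipulation.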

Other tech giants are also releasing and researching robotics foundation models. Nvidia announced its robotics foundation model project, GR00T, and Meta FAIR has continued its open-source work in AI by releasing several papers on robotics foundation models.

As these models scale and train on new data, their capabilities will only improve. Nonetheless, the accessibility of these foundation models enables founders to build cognitive robots previously confined to a few well-funded companies (read: Figure, Tesla, 1X).

2) We now have a way to teach robotic systems to automate complex end-to-end tasks: The technical advancements extend beyond foundation models. Imitation learning, a method for teaching robots through human demonstrations, has significantly improved in recent years. 

In 2023, the ALOHA framework demonstrated the ability to train bi-manual robotic systems (i.e., robots with two arms) to perform end-to-end tasks with as few as 50 real-world demonstrations. The paper showed how the framework enabled robots to automate intricate tasks such as putting on shoes and slotting batteries.

Building on this success, researchers introduced Mobile ALOHA, which adds mobility to the system, in this case, via a wheeled base. This upgrade allowed robots to take on tasks like rinsing pans, opening elevators, and cooking, again with just 50 demonstrations. 

This is groundbreaking because the researchers released all their code and hardware designs, allowing anyone with the resources to replicate their results.
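The core mechanic behind this kind of imitation learning can be sketched in a few lines. This is a deliberately simplified assumption, not the ALOHA implementation (which trains deep action-chunking transformers on camera images): here we clone behavior with a linear least-squares policy on synthetic demonstrations, just to show the observation-to-action supervised setup.

```python
import numpy as np

rng = np.random.default_rng(42)
N_DEMOS, OBS_DIM, ACT_DIM = 50, 12, 4  # ~50 demos, as in the ALOHA papers

# Synthetic "teleoperation" data: a hidden expert policy maps
# observations to actions, with a little demonstration noise.
true_policy = rng.standard_normal((OBS_DIM, ACT_DIM))
observations = rng.standard_normal((N_DEMOS, OBS_DIM))
actions = observations @ true_policy + 0.01 * rng.standard_normal((N_DEMOS, ACT_DIM))

# "Training" = supervised regression from observation to demonstrated action.
learned_policy, *_ = np.linalg.lstsq(observations, actions, rcond=None)

# At deployment, the cloned policy maps a new observation to an action.
new_obs = rng.standard_normal(OBS_DIM)
predicted_action = new_obs @ learned_policy
```

The surprising empirical finding in the ALOHA line of work is that with the right architecture and data collection rig, this same supervised recipe scales to long-horizon, contact-rich tasks from only tens of demonstrations.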

3) Over $100 billion has been invested in robotics startups in the past decade

In addition to technological advancements making cognitive robotics more accessible for founders, VC funding in the category is experiencing significant growth. According to Crunchbase, over $100 billion has been invested in robotics startups over the past decade. After peaking in 2021, robotics funding is rising again, projected to reach $12 billion in 2024, a nearly 40% increase from 2023.

Market Landscape

TLDR: We see the robotics landscape changing in the following way:

  1. The infrastructure layer is due for an upgrade. Apart from the appearance of foundation models, not much has happened here in the last decade. However, much more is needed to power the next wave of robotics, from specialized chips to simulation platforms powered by GenAI.
  2. Cognitive robots are set to disrupt legacy vertical robot systems: Most vertical robots are based on old-school rule-based methods. Cognitive robots can focus on several verticals simultaneously while offering cheaper and more reliable performance.
  3. The best applications for cognitive robots are those where valuable data can be collected and where there is actual demand. Applications that fulfill this include retail, delivery, and cleaning.

The infrastructure layer in robotics is large and has remained fairly unchanged over the last decade. We see the main change as the appearance of foundation model providers, from Physical Intelligence's π0 to Yaak's tools for unifying multimodal sensor data. Apart from this, we believe there will be a greater need for new and better infrastructure with the new wave of robotics, spanning everything from better simulation platforms to chip designs that are more tailored for robotics. On top of that, we believe there are many use cases for GenAI in the infrastructure layer, especially in hardware design, helping decrease the time it takes to go from idea to prototype, similar to what Cursor and Lovable have done for software.

Cognitive robots are disrupting applications: As shown in the applications layer, robotics companies have solved many tasks, ranging from underwater operations to warehouse automation. However, most companies still perform their tasks via rule-based approaches, limiting their ability to automate tasks end-to-end. We will likely see cognitive robots disrupt these verticals.

Cognitive robots should focus on applications where the data collected can be transferred to others

An ideal application for cognitive robots has several key characteristics. The most important one is that the data collected and skills learned can be applied across different sectors. For example, learning a task like picking up an item is useful both in the kitchen (for cooking) and in the bedroom (for making the bed). Other traits include verticals with challenging unit economics, low margins, price-sensitive customers, regulatory constraints, and staff shortages. Here are some examples:

  • Delivery: We have seen the success of robotics companies in last-mile delivery, particularly in food delivery, such as Nuro and Starship. These companies are experiencing strong market demand due to macroeconomic factors such as inflation and employee regulations. We believe more intelligent robots could automate a larger portion of the delivery process, from packaging to final drop-off.
  • Retail and groceries: While the first wave of robotics brought automation to warehouses and factories, it has yet to significantly impact in-store operations. Cognitive robots can handle deliveries, restock shelves, and assist customers in person. We expect to see more of these robots as retail chains struggle with profitability.
  • Cooking: As inflation impacts global markets and restaurants face profitability challenges, this vertical shows significant potential. Cooking requires dexterity, involving precise movements, tool handling, and ingredient use. The skills learned can transfer to various industrial and domestic tasks.
  • Laundry: A universally disliked task, it is also a daily challenge for hotel chains, hospitals, factories, and more. Although industrial laundry machines exist, they still require manual operation for loading, drying, and folding.
  • Dishwashing: Kitchens worldwide use industrial dishwashers, which, like laundry machines, still require manual operation. Cognitive robots could automate even more of this process.
  • Cleaning: This vertical is challenging due to the varied nature of cleaning tasks, which range from dusting to cleaning windows or toilets. However, it is a task that all companies and consumers either outsource or perform themselves. On the commercial side, companies also struggle with low differentiation and staff shortages.


These are just a few promising applications and verticals for cognitive robotics companies, particularly for collecting data to train generalizable robotics foundation models.

On the consumer side, we believe it will be some time before cognitive robots are present in homes. We are currently in a data collection phase to enhance cognitive robot capabilities. Privacy concerns are a significant hurdle for home use, and even aside from this, the high cost of cognitive robots, especially humanoids, makes them unaffordable for most consumers right now.

Specialized cognitive robots provide high value per dollar spent

Specialized, non-humanoid cognitive robots are well positioned to challenge humanoids in the race to full autonomy. Since we’re still in the data collection phase, a robot that 1) is reasonably priced and 2) solves a task in a way worth paying for allows specialized cognitive robotics companies to get plenty of robots to market quickly and, in turn, collect data faster.

This advantage becomes clear when comparing specialized cognitive robots with humanoids and old-generation, single-use-case robotic systems on production cost and the number of tasks they can automate. Specialized cognitive robots come out on top, delivering the highest value per dollar spent, an important “metric” for gauging potential demand.

But let’s explain the difference in costs. Humanoids are expensive because they include multiple arms and legs, each costing around $10,000. In contrast, specialized cognitive robots often have just one arm and wheels instead of legs, making them up to ten times cheaper to produce.

At the same time, humanoids and specialized cognitive robots have similar capabilities. This means buyers would have to pay five to ten times more for a humanoid with equivalent proficiency in a given use case.

When instead comparing specialized cognitive robots to older generations of robotics systems from companies like ABB and Universal Robots, cognitive robots offer greater versatility at a lower cost. Older systems, typically used for high-precision tasks in a single use case, can cost upwards of $100,000.

So why the big cost difference? The short answer lies in the fact that cognitive robots rely on vision, eliminating the need for expensive materials or complex component configurations to achieve precision. Their vision enables them to overcome challenges like backlash or misplacements by autonomously adjusting to the observed state, using cameras or LiDAR.

As a result, specialized cognitive robots provide the highest value per dollar spent, outperforming both humanoids and older-generation robotics systems.
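As a back-of-the-envelope illustration, the cost figures above can be turned into a rough value-per-dollar comparison. The numbers below are assumptions extrapolated from the text (around $10,000 per limb, specialized robots up to ten times cheaper than humanoids, old-gen systems upwards of $100,000), not actual price lists.

```python
# name: (assumed production cost in USD, # of use cases it can automate)
robots = {
    "old-gen single-use system": (100_000, 1),
    "humanoid": (50_000, 5),              # multiple ~$10k limbs plus frame
    "specialized cognitive": (5_000, 5),  # one arm, wheels: ~10x cheaper
}

# Value per dollar = use cases automated per USD of production cost.
value_per_dollar = {name: uc / cost for name, (cost, uc) in robots.items()}

for name, vpd in sorted(value_per_dollar.items(), key=lambda kv: -kv[1]):
    print(f"{name:28s} {vpd:.5f} use cases per USD")
```

Even with generous assumptions for humanoids, the specialized robot wins on this crude metric simply because it matches the humanoid's use-case coverage at a fraction of the build cost.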

Future predictions

It’s still too early to know how the cognitive robotics market will develop. Will it be concentrated among a few companies generating tens of billions of dollars in revenue? Or fragmented across plenty of companies generating hundreds of millions or billions of dollars each? We believe it will be more like the latter.

Let’s compare two of the biggest markets today: automotive and smartphones. Both are enormous, with billions of users, and both are built on an underlying technology combining software and hardware, so they could reflect what the robotics market might look like down the line. What is interesting is that they have essentially opposite market dynamics: the automotive market is fragmented, while the smartphone market is led by a few big players.

Cognitive robots are complex systems with thousands of parts, produced in-house or outsourced from various suppliers. This leads us to believe that the supply chain for cognitive robots could resemble the automotive industry, which, unlike the smartphone industry, involves thousands of parts and lacks standardization. 

Relying on different suppliers makes it hard to scale production and limits the ability to capture global market share, allowing regional competitors to emerge. This is how the automotive industry developed. In contrast, the smartphone market grew quickly because of a less complex, already established supply chain and a global user base, making it global from the start. This is the opposite of how robotics operates today, particularly in the consumer market.

Like compact cars, heavy-duty trucks, and sports cars, cognitive robots will come in different types, each designed for a specific use-case. For example, robots made for moving goods will likely have wheels instead of legs, because speed is important. Price will also vary depending on the task. Just like compact cars cost less than sports cars, robots will have different prices depending on the buyer’s budget.

This inherent fragmentation limits the potential for a one-size-fits-all solution.

In conclusion, while the robotics market will take years to evolve, its trajectory is more likely to mirror the automotive industry than smartphones. The high complexity, diverse use cases, and lack of standardization point to a fragmented future, where specialized robotics companies capture portions of the market rather than a few players dominating it.

Challenges

There are plenty of hurdles for robotics companies to overcome. Elon Musk has said on multiple occasions that we will have fully self-driving cars "next year," and we still don't, unless you live in San Francisco. The robotics industry will face the same challenges in building autonomous end-to-end systems that self-driving cars have faced, namely technological and regulatory hurdles.

Self-driving car systems have improved, but edge cases are still difficult to capture in training data. We have models that work in 99% of scenarios, but the last 1% is critical for regulatory approval, as it can mean the difference between life and death. This leads us to believe cognitive robotics companies will face similar hurdles as robots make an appearance in our homes, stores, offices, and industries.

On the technological side, a major problem is that we don't have the necessary data to build general-purpose robots. However, recent advances in imitation learning have shown a way to overcome this by collecting data via teleoperation and then training AI systems to learn task after task.

Market need, market-entry: In addition to challenges like gathering data for edge cases and navigating regulatory hurdles, a significant obstacle is the lack of demand. This is especially true for humanoid robots. We believe the lack of demand is straightforward: the functionality of these robots is not good enough for their current price point. Why would anyone pay over $100,000 for a humanoid to clean or do laundry if it performs on par with a toddler?

But we know countless technologies have faced and overcome similar challenges, and we observe a few approaches currently being tested. The first approach, used by most cognitive robotics companies, involves targeting the enterprise segment with a niche single or dual use case, collecting real-world data while generating revenue, and then expanding to other use cases. The second approach targets a small, wealthy customer base, where the high price tag justifies limited sales while collecting data, eventually moving downstream as costs decrease with improved supply and manufacturing chains (see the Tesla Roadster). The challenge with this second approach, which many humanoid robotics companies are adopting, is that data remains critical for improving the robots' capabilities across tasks; with only a few robots sold, the amount of data collected may be significantly lower than in the first approach.

Calling for startups

We believe cognitive robotics is at an inflection point and that the key elements for its success are now in place. We believe the true potential of AI will only be realized when it takes physical form.

We've already backed a few founders in this space and are eager to support more visionary builders. If you're working on something groundbreaking, reach out to me directly at daniel@byfounders.vc.

More about the author(s)
Daniel Stenson
Investor

Daniel is on the investment team.
