spatial-intelligence-li
Knowledge base on spatial intelligence as the next frontier in AI, based on Fei-Fei Li's insights from her Y Combinator AI Startup School talk. Use this skill when users ask about spatial intelligence concepts, 3D world modeling, the evolution of computer vision from ImageNet to World Labs, AI research strategy and problem selection, or when seeking advice on AI entrepreneurship and founding AI companies. Also trigger when discussing the relationship between vision/spatial understanding and AGI, differentiating generative vs discriminative models in 3D contexts, or exploring the data-algorithm-compute trinity for AI breakthroughs.
SKILL.md
| Name | spatial-intelligence-li |
| Description | Knowledge base on spatial intelligence as the next frontier in AI, based on Fei-Fei Li's insights from her Y Combinator AI Startup School talk. Use this skill when users ask about spatial intelligence concepts, 3D world modeling, the evolution of computer vision from ImageNet to World Labs, AI research strategy and problem selection, or when seeking advice on AI entrepreneurship and founding AI companies. Also trigger when discussing the relationship between vision/spatial understanding and AGI, differentiating generative vs discriminative models in 3D contexts, or exploring the data-algorithm-compute trinity for AI breakthroughs. |
name: spatial-intelligence-li description: Knowledge base on spatial intelligence as the next frontier in AI, based on Fei-Fei Li's insights from her Y Combinator AI Startup School talk. Use this skill when users ask about spatial intelligence concepts, 3D world modeling, the evolution of computer vision from ImageNet to World Labs, AI research strategy and problem selection, or when seeking advice on AI entrepreneurship and founding AI companies. Also trigger when discussing the relationship between vision/spatial understanding and AGI, differentiating generative vs discriminative models in 3D contexts, or exploring the data-algorithm-compute trinity for AI breakthroughs.
Fei-Fei Li: Spatial Intelligence Knowledge Base
Core Thesis
AGI will not be complete without spatial intelligence—the ability to understand, generate, reason about, and interact with 3D worlds. This represents the hardest and most important unsolved problem in AI.
Key Concepts
Spatial Intelligence
The multifaceted ability encompassing:
- Understanding 3D world structure - Perceiving depth, objects, relationships
- Generating 3D worlds - Creating novel 3D environments and content
- Reasoning about 3D space - Making inferences about spatial relationships
- Navigating and interacting - Moving through and manipulating physical world
- Communicating spatially - Describing and understanding spatial language
Why Spatial Intelligence is Fundamental
Use evolutionary evidence to explain importance:
- Vision took ~540 million years to evolve (Cambrian explosion)
- Language took ~300-500 million years
- What evolution spent longest developing is likely most fundamental to intelligence
- Visual/spatial intelligence predates and underlies linguistic intelligence
World Models
AI models that capture 3D structure and spatial intelligence, going beyond:
- Flat pixels (2D image understanding)
- Language tokens (text-only reasoning)
- To represent true physical reality
Generation-Reconstruction Continuum
3D AI applications exist on a spectrum:
Pure Generation <-------------------------> Pure Reconstruction
| |
Gaming, Metaverse, Robotics,
Creative content Physical manipulation
| |
+------ Most applications fall somewhere between ----+
World models must serve the entire continuum.
The Data-Algorithm-Compute Trinity
Major AI breakthroughs require convergence of three elements:
| Element | ImageNet Revolution Example |
|---|---|
| Data | ImageNet dataset (14M+ labeled images) |
| Algorithm | Convolutional Neural Networks (CNNs, published 1980s) |
| Compute | GPUs enabling parallel processing |
Key insight: CNNs existed for decades before working. The algorithm was ready—data and compute were not.
Technical References
Foundational Technologies for Spatial AI
| Technology | Description | Use Case |
|---|---|---|
| NeRF (Neural Radiance Fields) | Neural network representing 3D scenes as continuous volumetric functions | Novel view synthesis, 3D reconstruction |
| Gaussian Splatting | Point-based 3D representation using Gaussian primitives | Real-time rendering, efficient 3D capture |
| Differentiable Rendering | Rendering that allows gradient backpropagation | Learning 3D from 2D supervision |
| Pulsar | Point-based neural rendering (Christoph Lassner) | Efficient 3D scene representation |
Discriminative vs Generative Models
| Approach | Function | 3D Application |
|---|---|---|
| Discriminative | Classify, recognize patterns | Object detection, scene understanding |
| Generative | Create new content | 3D world generation, content creation |
The tension between these represents different approaches to spatial AI.
AI Research Strategy
Problem Selection Framework
When choosing PhD research or startup ideas:
- Avoid collision courses - Don't compete where industry scaling advantages dominate
- Find compute-immune problems - Focus on areas where throwing more compute alone won't solve it
- Embrace delusional problems - If a problem seems impossibly hard, that's often a good sign
- Follow curiosity - Burning curiosity sustains multi-year research efforts
The Delusional Problem Mindset
"My entire career is going after problems that are just so hard,
bordering delusional... if they were easy, someone would have solved them."
Apply this framework:
- Identify problems others dismiss as impossible
- Ask: "What would need to be true for this to work?"
- Find the leverage points (data, algorithm, or compute gaps)
- Commit fully despite uncertainty
AI Entrepreneurship Advice
Ground Zero Mindset
When starting a company:
- Forget past achievements
- Forget what others think of you
- Hunker down and build
- Focus entirely on building from scratch
Open Source Strategy
Decide based on business model alignment:
| Scenario | Recommendation |
|---|---|
| Research acceleration needed | Open source to enable global collaboration |
| Network effects valuable | Open source to build ecosystem |
| Proprietary advantage critical | Selective openness |
| Community contribution beneficial | Open source with contribution model |
ImageNet example: Open sourcing enabled AlexNet's emergence and accelerated the entire field.
Hiring for AI Teams
Primary trait to seek: Intellectual fearlessness
Characteristics:
- Courage to embrace impossibly hard problems
- Willingness to go all-in on uncertain bets
- Comfort with ambiguity and potential failure
- Driven by curiosity over career optimization
Graduate School Decision Framework
Pursue a PhD only if:
- Driven by burning curiosity about a specific problem
- Cannot imagine doing anything else
- The research question obsesses you
- Willing to accept opportunity cost
Do not pursue if:
- Primarily for credentials or career advancement
- Uncertain about research interests
- Industry experience would serve goals better
Handling Imposter Syndrome
When feeling like a minority or outsider:
"Don't over-index on feeling like a minority...
that moment will pass if you focus on the work."
Apply:
- Acknowledge the feeling without dwelling
- Redirect attention to the problem at hand
- Let results speak over time
- Build community with fellow researchers
Historical Context
The ImageNet Story (2007-2012)
Timeline for understanding AI breakthrough patterns:
| Year | Event |
|---|---|
| 2007 | ImageNet conceived at Princeton |
| 2009 | Initial CVPR poster publication |
| 2009-2011 | Open sourced, challenge created, ~30% error rate |
| 2012 | AlexNet achieves breakthrough, error drops dramatically |
Key decisions that enabled success:
- Open sourcing from the beginning
- Creating competitive challenge to attract talent
- Patience through years of modest results
- Betting on data-driven paradigm shift
AI Timeline Perspective
| Era | Focus |
|---|---|
| 1956 | Dartmouth Conference - "machines that think" |
| 1980s | CNNs published (Yann LeCun et al.) |
| 2009 | ImageNet dataset |
| 2012 | Deep learning revolution (AlexNet) |
| 2015 | Image captioning breakthroughs |
| 2020s | Spatial intelligence frontier |
Applying These Insights
When Explaining Spatial Intelligence
- Start with evolutionary argument (540M years for vision)
- Distinguish from language-only AI
- Describe the generation-reconstruction continuum
- Reference concrete technologies (NeRF, Gaussian Splatting)
When Advising AI Researchers
- Assess their curiosity level and problem obsession
- Guide toward compute-immune research areas
- Encourage intellectual fearlessness
- Emphasize the data-algorithm-compute trinity
When Discussing AI Startups
- Apply ground zero mindset
- Evaluate open source strategy for their context
- Emphasize hiring for intellectual fearlessness
- Frame problems in terms of the trinity convergence
Key Organizations and Resources
| Resource | Description |
|---|---|
| World Labs (worldlabs.ai) | Fei-Fei Li's spatial intelligence startup |
| Stanford HAI | Human-Centered AI Institute |
| ImageNet (imagenet.org) | Original dataset |
| CVPR, ICCV | Primary computer vision conferences |
| "The Worlds I See" | Fei-Fei Li's book |