name: spatial-intelligence-li description: Knowledge base on spatial intelligence as the next frontier in AI, based on Fei-Fei Li's insights from her Y Combinator AI Startup School talk. Use this skill when users ask about spatial intelligence concepts, 3D world modeling, the evolution of computer vision from ImageNet to World Labs, AI research strategy and problem selection, or when seeking advice on AI entrepreneurship and founding AI companies. Also trigger when discussing the relationship between vision/spatial understanding and AGI, differentiating generative vs discriminative models in 3D contexts, or exploring the data-algorithm-compute trinity for AI breakthroughs.

Fei-Fei Li: Spatial Intelligence Knowledge Base

Core Thesis

AGI will not be complete without spatial intelligence—the ability to understand, generate, reason about, and interact with 3D worlds. This represents the hardest and most important unsolved problem in AI.

Key Concepts

Spatial Intelligence

The multifaceted ability encompassing:

Understanding 3D world structure - Perceiving depth, objects, relationships
Generating 3D worlds - Creating novel 3D environments and content
Reasoning about 3D space - Making inferences about spatial relationships
Navigating and interacting - Moving through and manipulating physical world
Communicating spatially - Describing and understanding spatial language

Why Spatial Intelligence is Fundamental

Use evolutionary evidence to explain importance:

Vision took ~540 million years to evolve (Cambrian explosion)
Language took ~300-500 million years
What evolution spent longest developing is likely most fundamental to intelligence
Visual/spatial intelligence predates and underlies linguistic intelligence

World Models

AI models that capture 3D structure and spatial intelligence, going beyond:

Flat pixels (2D image understanding)
Language tokens (text-only reasoning)
To represent true physical reality

Generation-Reconstruction Continuum

3D AI applications exist on a spectrum:

Pure Generation <-------------------------> Pure Reconstruction
   |                                                    |
Gaming, Metaverse,                              Robotics,
Creative content                           Physical manipulation
   |                                                    |
   +------ Most applications fall somewhere between ----+

World models must serve the entire continuum.

The Data-Algorithm-Compute Trinity

Major AI breakthroughs require convergence of three elements:

Element	ImageNet Revolution Example
Data	ImageNet dataset (14M+ labeled images)
Algorithm	Convolutional Neural Networks (CNNs, published 1980s)
Compute	GPUs enabling parallel processing

Key insight: CNNs existed for decades before working. The algorithm was ready—data and compute were not.

Technical References

Foundational Technologies for Spatial AI

Technology	Description	Use Case
NeRF (Neural Radiance Fields)	Neural network representing 3D scenes as continuous volumetric functions	Novel view synthesis, 3D reconstruction
Gaussian Splatting	Point-based 3D representation using Gaussian primitives	Real-time rendering, efficient 3D capture
Differentiable Rendering	Rendering that allows gradient backpropagation	Learning 3D from 2D supervision
Pulsar	Point-based neural rendering (Christoph Lassner)	Efficient 3D scene representation

Discriminative vs Generative Models

Approach	Function	3D Application
Discriminative	Classify, recognize patterns	Object detection, scene understanding
Generative	Create new content	3D world generation, content creation

The tension between these represents different approaches to spatial AI.

AI Research Strategy

Problem Selection Framework

When choosing PhD research or startup ideas:

Avoid collision courses - Don't compete where industry scaling advantages dominate
Find compute-immune problems - Focus on areas where throwing more compute alone won't solve it
Embrace delusional problems - If a problem seems impossibly hard, that's often a good sign
Follow curiosity - Burning curiosity sustains multi-year research efforts

The Delusional Problem Mindset

"My entire career is going after problems that are just so hard, 
bordering delusional... if they were easy, someone would have solved them."

Apply this framework:

Identify problems others dismiss as impossible
Ask: "What would need to be true for this to work?"
Find the leverage points (data, algorithm, or compute gaps)
Commit fully despite uncertainty

AI Entrepreneurship Advice

Ground Zero Mindset

When starting a company:

Forget past achievements
Forget what others think of you
Hunker down and build
Focus entirely on building from scratch

Open Source Strategy

Decide based on business model alignment:

Scenario	Recommendation
Research acceleration needed	Open source to enable global collaboration
Network effects valuable	Open source to build ecosystem
Proprietary advantage critical	Selective openness
Community contribution beneficial	Open source with contribution model

ImageNet example: Open sourcing enabled AlexNet's emergence and accelerated the entire field.

Hiring for AI Teams

Primary trait to seek: Intellectual fearlessness

Characteristics:

Courage to embrace impossibly hard problems
Willingness to go all-in on uncertain bets
Comfort with ambiguity and potential failure
Driven by curiosity over career optimization

Graduate School Decision Framework

Pursue a PhD only if:

Driven by burning curiosity about a specific problem
Cannot imagine doing anything else
The research question obsesses you
Willing to accept opportunity cost

Do not pursue if:

Primarily for credentials or career advancement
Uncertain about research interests
Industry experience would serve goals better

Handling Imposter Syndrome

When feeling like a minority or outsider:

"Don't over-index on feeling like a minority... 
that moment will pass if you focus on the work."

Apply:

Acknowledge the feeling without dwelling
Redirect attention to the problem at hand
Let results speak over time
Build community with fellow researchers

Historical Context

The ImageNet Story (2007-2012)

Timeline for understanding AI breakthrough patterns:

Year	Event
2007	ImageNet conceived at Princeton
2009	Initial CVPR poster publication
2009-2011	Open sourced, challenge created, ~30% error rate
2012	AlexNet achieves breakthrough, error drops dramatically

Key decisions that enabled success:

Open sourcing from the beginning
Creating competitive challenge to attract talent
Patience through years of modest results
Betting on data-driven paradigm shift

AI Timeline Perspective

Era	Focus
1956	Dartmouth Conference - "machines that think"
1980s	CNNs published (Yann LeCun et al.)
2009	ImageNet dataset
2012	Deep learning revolution (AlexNet)
2015	Image captioning breakthroughs
2020s	Spatial intelligence frontier

Applying These Insights

When Explaining Spatial Intelligence

Start with evolutionary argument (540M years for vision)
Distinguish from language-only AI
Describe the generation-reconstruction continuum
Reference concrete technologies (NeRF, Gaussian Splatting)

When Advising AI Researchers

Assess their curiosity level and problem obsession
Guide toward compute-immune research areas
Encourage intellectual fearlessness
Emphasize the data-algorithm-compute trinity

When Discussing AI Startups

Apply ground zero mindset
Evaluate open source strategy for their context
Emphasize hiring for intellectual fearlessness
Frame problems in terms of the trinity convergence

Key Organizations and Resources

Resource	Description
World Labs (worldlabs.ai)	Fei-Fei Li's spatial intelligence startup
Stanford HAI	Human-Centered AI Institute
ImageNet (imagenet.org)	Original dataset
CVPR, ICCV	Primary computer vision conferences
"The Worlds I See"	Fei-Fei Li's book

spatial-intelligence-li

SKILL.md