Agent Skill
2/7/2026

spatial-intelligence-li

Knowledge base on spatial intelligence as the next frontier in AI, based on Fei-Fei Li's insights from her Y Combinator AI Startup School talk. Use this skill when users ask about spatial intelligence concepts, 3D world modeling, the evolution of computer vision from ImageNet to World Labs, AI research strategy and problem selection, or when seeking advice on AI entrepreneurship and founding AI companies. Also trigger when discussing the relationship between vision/spatial understanding and AGI, differentiating generative vs discriminative models in 3D contexts, or exploring the data-algorithm-compute trinity for AI breakthroughs.

jona
npx skills add jona/ycombinator-skills

SKILL.md

name: spatial-intelligence-li
description: Knowledge base on spatial intelligence as the next frontier in AI, based on Fei-Fei Li's insights from her Y Combinator AI Startup School talk. Use this skill when users ask about spatial intelligence concepts, 3D world modeling, the evolution of computer vision from ImageNet to World Labs, AI research strategy and problem selection, or when seeking advice on AI entrepreneurship and founding AI companies. Also trigger when discussing the relationship between vision/spatial understanding and AGI, differentiating generative vs discriminative models in 3D contexts, or exploring the data-algorithm-compute trinity for AI breakthroughs.

Fei-Fei Li: Spatial Intelligence Knowledge Base

Core Thesis

AGI will not be complete without spatial intelligence: the ability to understand, generate, reason about, and interact with 3D worlds. Li presents this as the hardest and most important unsolved problem in AI.

Key Concepts

Spatial Intelligence

The multifaceted ability encompassing:

  1. Understanding 3D world structure - Perceiving depth, objects, relationships
  2. Generating 3D worlds - Creating novel 3D environments and content
  3. Reasoning about 3D space - Making inferences about spatial relationships
  4. Navigating and interacting - Moving through and manipulating the physical world
  5. Communicating spatially - Describing and understanding spatial language

Why Spatial Intelligence is Fundamental

Use evolutionary evidence to explain importance:

  • Vision took ~540 million years to evolve (Cambrian explosion)
  • Language took roughly 300,000-500,000 years, on the order of a thousandth of the time vision has been evolving
  • What evolution spent longest developing is likely most fundamental to intelligence
  • Visual/spatial intelligence predates and underlies linguistic intelligence

World Models

AI models that capture 3D structure and spatial intelligence. To represent physical reality, they must go beyond:

  • Flat pixels (2D image understanding)
  • Language tokens (text-only reasoning)

Generation-Reconstruction Continuum

3D AI applications exist on a spectrum:

Pure Generation <-------------------------> Pure Reconstruction
   |                                                    |
Gaming, Metaverse,                              Robotics,
Creative content                           Physical manipulation
   |                                                    |
   +------ Most applications fall somewhere between ----+

World models must serve the entire continuum.

The Data-Algorithm-Compute Trinity

Major AI breakthroughs require convergence of three elements:

| Element | ImageNet Revolution Example |
|---|---|
| Data | ImageNet dataset (14M+ labeled images) |
| Algorithm | Convolutional Neural Networks (CNNs, published 1980s) |
| Compute | GPUs enabling parallel processing |

Key insight: CNNs existed for decades before they delivered breakthrough results. The algorithm was ready; data and compute were not.
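
One way to make the convergence idea concrete is a checklist that reports which element is still missing. The sketch below is illustrative only: the `Trinity` class and its coarse yes/no readings are hypothetical, with the two example readings taken from the ImageNet story as told in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trinity:
    """Hypothetical yes/no reading of the three elements for one problem."""
    data: bool
    algorithm: bool
    compute: bool

    def missing(self) -> list[str]:
        # Names of the elements that are not yet in place.
        readings = (("data", self.data),
                    ("algorithm", self.algorithm),
                    ("compute", self.compute))
        return [name for name, ready in readings if not ready]

    def converged(self) -> bool:
        # A breakthrough is plausible only when nothing is missing.
        return not self.missing()


# Coarse readings of the ImageNet example (assumed, not measured).
before_imagenet = Trinity(data=False, algorithm=True, compute=False)
alexnet_2012 = Trinity(data=True, algorithm=True, compute=True)

print(before_imagenet.missing())
print(alexnet_2012.converged())
```

The same check can be used when looking for leverage points: whichever element `missing()` returns is the gap worth working on.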

Technical References

Foundational Technologies for Spatial AI

| Technology | Description | Use Case |
|---|---|---|
| NeRF (Neural Radiance Fields) | Neural network representing 3D scenes as continuous volumetric functions | Novel view synthesis, 3D reconstruction |
| Gaussian Splatting | Point-based 3D representation using Gaussian primitives | Real-time rendering, efficient 3D capture |
| Differentiable Rendering | Rendering that allows gradient backpropagation | Learning 3D from 2D supervision |
| Pulsar | Sphere-based neural rendering (Christoph Lassner) | Efficient 3D scene representation |
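
To show what "continuous volumetric function" means in practice, here is a minimal NumPy sketch of the compositing step a NeRF-style renderer applies along a single camera ray. In a real NeRF the densities and colors come from a trained network; here they are hand-set values, and the function name `composite_ray` is an assumption made for this example.

```python
import numpy as np


def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray (NeRF-style quadrature).

    densities: (N,) non-negative volume density at each sample
    colors:    (N, 3) RGB color at each sample
    deltas:    (N,) distance covered by each sample along the ray
    """
    densities = np.asarray(densities, dtype=float)
    colors = np.asarray(colors, dtype=float)
    deltas = np.asarray(deltas, dtype=float)

    # Opacity of each segment: alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): light that reaches sample i
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = transmittance * alphas
    # Pixel color is the weighted sum of sample colors.
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights


# Hand-set ray: empty space, then a dense red surface, then blue behind it.
rgb, weights = composite_ray(
    densities=[0.0, 50.0, 50.0],
    colors=[[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    deltas=[0.5, 0.5, 0.5],
)
print(rgb)  # approximately [1, 0, 0]: the red surface hides what is behind it
```

Every step is smooth arithmetic, which is what makes the rendering differentiable: in an autodiff framework, the error between a rendered pixel and a photograph can be pushed back to the densities and colors, so 3D structure is learned from 2D supervision.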

Discriminative vs Generative Models

| Approach | Function | 3D Application |
|---|---|---|
| Discriminative | Classify, recognize patterns | Object detection, scene understanding |
| Generative | Create new content | 3D world generation, content creation |

These are two different approaches to spatial AI: discriminative models interpret a scene that already exists, while generative models synthesize a new one.
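
The distinction can be seen in a toy example on 3D points. This is a sketch under stated assumptions, not a real perception pipeline: the two "object classes" and their centers are made up, the generative side is a simple per-axis Gaussian fit, and the discriminative side is a small logistic regression trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3D points from two hypothetical object classes (made-up centers).
chairs = rng.normal(loc=[0.0, 0.0, 0.5], scale=0.1, size=(200, 3))
tables = rng.normal(loc=[1.0, 0.0, 0.8], scale=0.1, size=(200, 3))

# Generative: model how one class's points are distributed, then sample new ones.
chair_mean, chair_std = chairs.mean(axis=0), chairs.std(axis=0)


def generate_chair_points(n):
    return rng.normal(loc=chair_mean, scale=chair_std, size=(n, 3))


# Discriminative: learn only the boundary between labels (logistic regression).
X = np.vstack([chairs, tables])
y = np.concatenate([np.zeros(200), np.ones(200)])  # 0 = chair, 1 = table
w, b = np.zeros(3), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(table | point)
    w -= X.T @ (p - y) / len(y)             # gradient step, learning rate 1.0
    b -= (p - y).mean()


def classify(point):
    return "table" if np.asarray(point) @ w + b > 0 else "chair"


new_points = generate_chair_points(5)   # new content that did not exist before
print(classify([0.0, 0.0, 0.5]))        # label for an existing point
```

The classifier can label points but cannot produce any, and the fitted Gaussian can produce points but was never asked to separate classes.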

AI Research Strategy

Problem Selection Framework

When choosing PhD research or startup ideas:

  1. Avoid collision courses - Don't compete where industry scaling advantages dominate
  2. Find compute-immune problems - Focus on areas where more compute alone will not solve the problem
  3. Embrace delusional problems - If a problem seems impossibly hard, that's often a good sign
  4. Follow curiosity - Burning curiosity sustains multi-year research efforts

The Delusional Problem Mindset

"My entire career is going after problems that are just so hard, 
bordering delusional... if they were easy, someone would have solved them."

Apply this framework:

  1. Identify problems others dismiss as impossible
  2. Ask: "What would need to be true for this to work?"
  3. Find the leverage points (data, algorithm, or compute gaps)
  4. Commit fully despite uncertainty

AI Entrepreneurship Advice

Ground Zero Mindset

When starting a company:

  • Forget past achievements
  • Forget what others think of you
  • Hunker down and build
  • Focus entirely on building from scratch

Open Source Strategy

Decide based on business model alignment:

| Scenario | Recommendation |
|---|---|
| Research acceleration needed | Open source to enable global collaboration |
| Network effects valuable | Open source to build ecosystem |
| Proprietary advantage critical | Selective openness |
| Community contribution beneficial | Open source with contribution model |

ImageNet example: Open sourcing enabled AlexNet's emergence and accelerated the entire field.

Hiring for AI Teams

Primary trait to seek: Intellectual fearlessness

Characteristics:

  • Courage to embrace impossibly hard problems
  • Willingness to go all-in on uncertain bets
  • Comfort with ambiguity and potential failure
  • Driven by curiosity over career optimization

Graduate School Decision Framework

Pursue a PhD only if:

  • Driven by burning curiosity about a specific problem
  • Cannot imagine doing anything else
  • The research question obsesses you
  • Willing to accept opportunity cost

Do not pursue if:

  • Primarily for credentials or career advancement
  • Uncertain about research interests
  • Industry experience would serve goals better

Handling Imposter Syndrome

When feeling like a minority or outsider:

"Don't over-index on feeling like a minority... 
that moment will pass if you focus on the work."

Apply:

  1. Acknowledge the feeling without dwelling
  2. Redirect attention to the problem at hand
  3. Let results speak over time
  4. Build community with fellow researchers

Historical Context

The ImageNet Story (2007-2012)

Timeline for understanding AI breakthrough patterns:

| Year | Event |
|---|---|
| 2007 | ImageNet conceived at Princeton |
| 2009 | Initial CVPR poster publication |
| 2009-2011 | Open sourced, challenge created, ~30% error rate |
| 2012 | AlexNet achieves breakthrough, top-5 error drops to about 15% |

Key decisions that enabled success:

  1. Open sourcing from the beginning
  2. Creating competitive challenge to attract talent
  3. Patience through years of modest results
  4. Betting on data-driven paradigm shift

AI Timeline Perspective

| Era | Focus |
|---|---|
| 1956 | Dartmouth Conference - "machines that think" |
| 1980s | CNNs published (Yann LeCun et al.) |
| 2009 | ImageNet dataset |
| 2012 | Deep learning revolution (AlexNet) |
| 2015 | Image captioning breakthroughs |
| 2020s | Spatial intelligence frontier |

Applying These Insights

When Explaining Spatial Intelligence

  1. Start with evolutionary argument (540M years for vision)
  2. Distinguish from language-only AI
  3. Describe the generation-reconstruction continuum
  4. Reference concrete technologies (NeRF, Gaussian Splatting)

When Advising AI Researchers

  1. Assess their curiosity level and problem obsession
  2. Guide toward compute-immune research areas
  3. Encourage intellectual fearlessness
  4. Emphasize the data-algorithm-compute trinity

When Discussing AI Startups

  1. Apply ground zero mindset
  2. Evaluate open source strategy for their context
  3. Emphasize hiring for intellectual fearlessness
  4. Frame problems in terms of the trinity convergence

Key Organizations and Resources

| Resource | Description |
|---|---|
| World Labs (worldlabs.ai) | Fei-Fei Li's spatial intelligence startup |
| Stanford HAI | Stanford Institute for Human-Centered AI |
| ImageNet (image-net.org) | Original dataset |
| CVPR, ICCV | Primary computer vision conferences |
| "The Worlds I See" | Fei-Fei Li's book |

Skills Info

Original Name: spatial-intelligence-li
Author: jona