Hello, I'm Nikita Karaev

I'm a founder of Pixelwise AI, where we're building technology to train robots using internet-scale human data. Previously, I was fortunate to do a PhD at Meta AI and the Visual Geometry Group, University of Oxford, supervised by Christian Rupprecht, Natalia Neverova, and Andrea Vedaldi. Before my PhD, I completed an engineering program at École Polytechnique in beautiful Paris. I also enjoy running and exploring mountains 🏔️


News

  • Nov 2025: Yuri and I left Meta AI and started Pixelwise AI to unlock training robots on internet-scale data via imitation learning!
  • Jul 2025: CoTracker3 and SpatialTrackerV2 are accepted at ICCV 2025!
  • Jun 2025: 🎉 VGGT won the Best Paper Award at CVPR 2025! 🎉
  • Mar 2025: We released VGGT, a feed-forward neural net that directly predicts all key 3D attributes of a scene. Try our HF demo!
  • Oct 2024: We released CoTracker3, a new point-tracking model trained on real data.
  • Jul 2024: CoTracker is accepted at ECCV 2024!
  • Mar 2024: VGGSfM is accepted at CVPR 2024 as a highlight!
  • Jan 2024: CoTracker now supports tracking 10× more points.
  • Aug 2023: We released CoTracker, a model for tracking any pixel in a video.
  • Mar 2023: My first PhD paper, DynamicStereo, has been accepted at CVPR 2023!
  • Jan 2022: I started my PhD at Meta AI and the University of Oxford!
  • Sep 2021: We climbed Mount Elbrus, the highest mountain in Europe (5,642 m)! 🏔️
  • Aug 2021: I completed my internship at FAIR.
  • May 2021: I started a research internship at Facebook AI Research (FAIR) with Natalia Neverova and Andrea Vedaldi.

Publications

VGGT: Visual Geometry Grounded Transformer

CVPR 2025 (Best Paper Award)

VGGT is a feed-forward neural net that directly predicts all key 3D attributes of a scene (cameras, point maps, depth maps, and 3D point tracks) from one, a few, or hundreds of views, within seconds.

CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos

ICCV 2025

In CoTracker3, we propose a simple and effective method for scaling point trackers trained on synthetic data by pseudo-labelling real videos.

Visual Geometry Grounded Deep Structure From Motion

CVPR 2024 (Highlight)

We propose a new fully differentiable Structure-from-Motion pipeline.

CoTracker: It is Better to Track Together

ECCV 2024

CoTracker bridges the gap between long-term point tracking and optical flow by jointly tracking multiple points (pixels) throughout an entire video.

DynamicStereo: Consistent Dynamic Depth from Stereo Videos

CVPR 2023

We introduce Dynamic Replica, a synthetic benchmark dataset for dynamic depth-from-stereo models, and propose DynamicStereo, a temporally consistent disparity estimation model that we train on this dataset.