Hello, I'm Nikita Karaev

I'm a founder of Pixelwise AI, where we're building technology to train robots using internet-scale human data. Previously, I was fortunate to do a PhD at Meta AI and the Visual Geometry Group, University of Oxford, supervised by Christian Rupprecht, Natalia Neverova, and Andrea Vedaldi. Before my PhD, I completed an engineering program at École Polytechnique in beautiful Paris. I also enjoy running and exploring mountains 🏔️


News

  • Nov 2025: Yuri and I left Meta AI and started Pixelwise AI to unlock training robots on internet-scale data via imitation learning!
  • Jul 2025: CoTracker3 and SpatialTrackerV2 are accepted at ICCV 2025!
  • Jun 2025: 🎉 VGGT won the Best Paper Award at CVPR 2025! 🎉
  • Mar 2025: We released VGGT, a feed-forward neural net that directly predicts all key 3D attributes of a scene. Try our HF demo!
  • Oct 2024: We released CoTracker3, a new point-tracking model trained on real data.
  • Jul 2024: CoTracker is accepted at ECCV 2024!
  • Mar 2024: VGGSfM is accepted at CVPR 2024 as a highlight!
  • Jan 2024: CoTracker now supports tracking 10× more points.
  • Aug 2023: We released CoTracker, a model for tracking any pixel in a video.
  • Mar 2023: My first PhD paper, DynamicStereo, has been accepted at CVPR 2023!
  • Jan 2022: I started my PhD at Meta AI and the University of Oxford!
  • Sep 2021: We climbed Mount Elbrus, the highest mountain in Europe (5,642 m)! 🏔️
  • Aug 2021: I completed my internship at FAIR.
  • May 2021: I started a research internship at Facebook AI Research (FAIR) with Natalia Neverova and Andrea Vedaldi.

Publications

VGGT: Visual Geometry Grounded Transformer

CVPR 2025 (Best Paper Award)

VGGT is a feed-forward neural net that directly predicts all key 3D attributes of a scene (cameras, point maps, depth maps, and 3D point tracks) from one, a few, or hundreds of views, within seconds.

CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos

ICCV 2025

In CoTracker3, we propose a simple and effective method for scaling point trackers trained on synthetic data by pseudo-labelling real videos.

Visual Geometry Grounded Deep Structure From Motion

CVPR 2024 (Highlight)

We propose a new fully differentiable Structure-from-Motion pipeline.

CoTracker: It is Better to Track Together

ECCV 2024

CoTracker bridges the gap between long-term point tracking and optical flow by jointly tracking multiple points (pixels) throughout an entire video.

DynamicStereo: Consistent Dynamic Depth from Stereo Videos

CVPR 2023

We introduce Dynamic Replica, a synthetic benchmark dataset for dynamic depth-from-stereo models, and propose DynamicStereo, a temporally consistent disparity estimation model that we train on this dataset.