Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics

Shibo Zhao1†*, Sifan Zhou1†*, Raphael Blanchard1, Yuheng Qiu1, Wenshan Wang1, Sebastian Scherer1

†Equal contribution, *Corresponding author, 1Carnegie Mellon University

Paper | Poster | Code (coming soon) | Dataset & Checkpoints

Results: Foundation Model Performance on Different Robot Platforms

[Video previews of TartanIMU results across robot platforms]

About TartanIMU

Despite recent advances in deep learning, most existing learning-based IMU odometry methods are trained on specific datasets, generalize poorly, and are prone to overfitting, which limits their real-world applicability. To address these challenges, we present Tartan IMU, a foundation model designed for generalizable, IMU-based state estimation across diverse robotic platforms.

Our approach consists of three stages. First, a pre-trained foundation model leverages over 100 hours of multi-platform data to establish general motion knowledge, achieving a 36% improvement in Absolute Trajectory Error (ATE) over specialized models. Second, to adapt to previously unseen tasks, we use Low-Rank Adaptation (LoRA), enabling positive transfer with only 1.1M trainable parameters. Finally, to support deployment on robots, we introduce online test-time adaptation, which removes the boundary between training and testing and lets the model continuously "learn as it operates" in real time at 200 FPS.

TartanIMU Overview

Tartan IMU is, to our knowledge, the first open-source cross-robot foundation model for pose estimation from IMU data alone.

System Architecture

System architecture

Figure 1: Three learning stages of TartanIMU. (a) Pretrained IMU Model features a shared backbone to capture generalizable IMU knowledge. (b) Efficient Fine-Tuning utilizes an adapter to enable positive transfer for new tasks. (c) Online Adaptation employs an adaptive memory buffer to support on-the-fly model updates during deployment.

Method

Stage 1: Pretrained IMU Model

Our foundation model leverages a shared backbone architecture to capture generalizable IMU motion patterns across different robotic platforms. This stage establishes the core motion understanding that serves as the foundation for subsequent adaptation stages.
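
As a rough, hypothetical illustration of this design (not the paper's architecture), the PyTorch sketch below pairs one shared encoder for raw IMU windows with per-platform motion heads, matching the platform categories named in the Limitations section; the layer sizes and per-window displacement output are assumptions.

import torch
import torch.nn as nn

class SharedIMUBackbone(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        # Shared encoder: consumes a window of raw IMU data, shape (B, 6, W),
        # with 3 accelerometer + 3 gyroscope channels.
        self.encoder = nn.Sequential(
            nn.Conv1d(6, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(64, feat_dim, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        # One motion head per platform category.
        self.heads = nn.ModuleDict({
            name: nn.Linear(feat_dim, 3)  # e.g., 3-D displacement per window
            for name in ['car', 'humanoid', 'quadruped', 'drone']
        })

    def forward(self, imu, platform):
        return self.heads[platform](self.encoder(imu))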

t-SNE visualization of learned features

t-SNE visualization of the learned ResNet feature space. Cluster separation across platforms shows the model's ability to capture platform-specific motion dynamics.

Stage 2: Efficient Fine-Tuning

Once the base TartanIMU model is pretrained, we adapt it to unseen robot motions or challenging deployment scenarios using Low-Rank Adaptation (LoRA). This technique introduces only a small number of trainable parameters while freezing the original model, preserving its robust general motion understanding.

LoRA achieves this by reparameterizing weight updates as a low-rank matrix decomposition:

\[ h = W_0 x + \Delta W x = W_0 x + B A x \tag{1} \]

Here, \(W_0 \in \mathbb{R}^{d \times k}\) is the frozen pretrained weight, and \(B \in \mathbb{R}^{d \times r}\) and \(A \in \mathbb{R}^{r \times k}\), with rank \(r \ll \min(d, k)\), are the only matrices trained for the new task. Because the update \(\Delta W = BA\) is low-rank, the number of trainable parameters stays small, keeping fine-tuning efficient even with very limited data.
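
To make Eq. (1) concrete, below is a minimal PyTorch sketch of a LoRA-wrapped linear layer. This is not the authors' implementation; the rank r, scaling factor alpha, and initialization follow common LoRA practice and are assumptions here.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA wrapper: h = W0 x + (alpha / r) * B A x, with W0 frozen."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weight W0
            p.requires_grad = False
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # A: (r, d_in), small random init
        self.B = nn.Parameter(torch.zeros(d_out, r))        # B: (d_out, r), zero init so Delta W = 0 at start
        self.scale = alpha / r

    def forward(self, x):
        # h = W0 x + scale * B (A x)
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)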

Offline finetuning results

Our LoRA-based finetuning improves accuracy on new motion tasks while keeping computational and data costs low.

One of the key benefits of LoRA adaptation is non-forgetting: the core representation remains stable across tasks. This enables lifelong learning capabilities and is particularly useful in robotics where new environments and tasks are continuously encountered.

No forgetting comparison

Comparison of LoRA vs. full fine-tuning. LoRA retains prior knowledge, while full finetuning can degrade earlier performance.

Stage 3: Online Adaptation

In the final stage of our TartanIMU pipeline, we enable real-time test-time adaptation through a novel online learning strategy. Unlike traditional pipelines that maintain a static model during deployment, we allow the model to evolve as it operates. This is critical in real-world robotics, where domain shifts such as speed, terrain, or motion patterns frequently occur.

To support this, we maintain a lightweight, adaptive training buffer that stores recent IMU samples during deployment. These samples are filtered and clustered via a Gaussian Mixture Model (GMM) based motion classifier to ensure diversity across motion types—e.g., stationary, forward motion, left turns, and right turns. The buffer actively reselects samples to avoid redundancy, enabling quick and stable updates with minimal compute.
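
Below is a minimal sketch of this idea using scikit-learn's GaussianMixture: recent IMU windows are summarized by simple statistics, clustered into motion types, and the buffer keeps the most recent samples from each cluster. The window_features descriptor, cluster count, and per-type buffer size are illustrative assumptions, not the paper's exact classifier or buffer policy.

import numpy as np
from sklearn.mixture import GaussianMixture

def window_features(acc, gyro):
    # Simple per-window statistics as a stand-in motion descriptor.
    return np.concatenate([acc.mean(0), acc.std(0), gyro.mean(0), gyro.std(0)])

def balanced_buffer(windows, n_types=4, per_type=32):
    # `windows` is a list of (acc, gyro) pairs, each array of shape (W, 3).
    feats = np.stack([window_features(a, g) for a, g in windows])
    labels = GaussianMixture(n_components=n_types, random_state=0).fit_predict(feats)
    buffer = []
    for k in range(n_types):
        idx = np.where(labels == k)[0][-per_type:]  # most recent samples per motion type
        buffer.extend(windows[i] for i in idx)
    return buffer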

Online adaptation illustration

Online adaptation results on a figure-eight trajectory using only IMU data. By maintaining a balanced buffer across diverse motion segments, TartanIMU adapts quickly during deployment, improving trajectory accuracy over time.

SLAM and IMU fallback mechanism

Performance of online adaptation on an unseen trajectory. The Tartan IMU model progressively learns an unseen circular pattern as training data arrives incrementally, acquiring the new motion pattern within 90 seconds.

Interactive Demo (Coming Soon)

Try TartanIMU with our curated sample trajectories! Select a platform and trajectory below to test with our live model on Hugging Face.

Quadruped Robot Trajectories

Outdoor Navigation

Duration: 120s | Terrain: Rough outdoor

Spot Robot - Outdoor Exploration

Boston Dynamics Spot navigating challenging outdoor terrain with rocks and vegetation.

  • Environment: Rocky outdoor terrain
  • Challenges: Uneven ground, obstacles
  • Data: 200Hz IMU, 120 seconds

Stair Climbing

Duration: 90s | Terrain: Urban stairs

Spot Robot - Stair Navigation

Complex stair climbing and descending with dynamic gait adjustments.

  • Environment: Multi-level stairs
  • Challenges: Height changes, gait adaptation
  • Data: 200Hz IMU, 90 seconds

Indoor Navigation

Duration: 150s | Environment: Office

Spot Robot - Indoor Office

Precise navigation through indoor office spaces with furniture obstacles.

  • Environment: Indoor office space
  • Challenges: Tight spaces, furniture
  • Data: 200Hz IMU, 150 seconds

Drone Flight Trajectories

3D Maneuvers

Duration: 90s | Type: Complex flight

Quadcopter - Indoor 3D Flight

Complex 3D maneuvers including loops, spirals, and rapid direction changes.

  • Environment: Indoor flight space
  • Challenges: 3D motion, rapid acceleration
  • Data: 200Hz IMU, 90 seconds

Windy Conditions

Duration: 180s | Environment: Outdoor

Drone - Wind Disturbance

Outdoor flight in windy conditions with constant stabilization adjustments.

  • Environment: Outdoor with wind
  • Challenges: Wind disturbance, stability
  • Data: 200Hz IMU, 180 seconds

Precision Hover

Duration: 60s | Type: Stationary

Drone - Precision Hovering

High-precision hovering with micro-adjustments and position holding.

  • Environment: Indoor controlled
  • Challenges: Micro-movements, stability
  • Data: 200Hz IMU, 60 seconds

Human Locomotion Trajectories

Urban Walking

Duration: 180s | Environment: City

Human - Urban Sidewalk

Natural walking patterns on urban sidewalks with turns and speed changes.

  • Environment: Urban sidewalk
  • Challenges: Variable speed, direction changes
  • Data: 200Hz IMU, 180 seconds

Jogging

Duration: 240s | Activity: Running

Human - Jogging Path

Continuous jogging with varying pace and directional changes along park paths.

  • Environment: Park jogging path
  • Challenges: Higher frequency motion, pace changes
  • Data: 200Hz IMU, 240 seconds

Indoor Navigation

Duration: 120s | Environment: Building

Human - Building Navigation

Walking through multi-level building with stairs and corridor navigation.

  • Environment: Multi-story building
  • Challenges: Stairs, elevation changes
  • Data: 200Hz IMU, 120 seconds

UGV Navigation Trajectories

Forest Trail

Duration: 200s | Terrain: Off-road

UGV - Forest Navigation

Off-road navigation through forest trails with varying terrain and obstacles.

  • Environment: Forest trail
  • Challenges: Bumpy terrain, speed variation
  • Data: 200Hz IMU, 200 seconds

Urban Street

Duration: 300s | Environment: City

UGV - City Navigation

Autonomous navigation through urban streets with traffic and intersections.

  • Environment: Urban streets
  • Challenges: Traffic, stop-and-go motion
  • Data: 200Hz IMU, 300 seconds

Parking Maneuvers

Duration: 90s | Type: Precision

UGV - Parking Precision

Complex parking maneuvers including parallel parking and tight turns.

  • Environment: Parking lot
  • Challenges: Precision maneuvers, tight spaces
  • Data: 200Hz IMU, 90 seconds


How it works

  1. Select a platform (Quadruped, Drone, Human, or UGV)
  2. Choose a specific trajectory from the available options
  3. Click "Try with TartanIMU Model" to launch our Hugging Face demo
  4. The selected trajectory data will be automatically loaded
  5. See real-time pose estimation results and compare with ground truth

Upload your own data

Want to test with your own IMU data? Our Hugging Face demo also supports custom NPZ file uploads.

Required format: NPZ file containing IMU data at 200Hz with keys: 'acc', 'gyro', 'timestamp'
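
For reference, here is a minimal sketch of producing a compliant file; the file name is illustrative and the signals are placeholder zeros, while the array shapes follow the convention in the Data Format section below.

import numpy as np

N = 200 * 60                      # 60 seconds at 200 Hz
acc = np.zeros((N, 3))            # accelerometer samples, m/s^2
gyro = np.zeros((N, 3))           # gyroscope samples, rad/s
timestamp = np.arange(N) / 200.0  # seconds, monotonically increasing
np.savez('my_trajectory.npz', acc=acc, gyro=gyro, timestamp=timestamp)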

Open Hugging Face Demo

Dataset (Coming Soon)

The TartanIMU dataset contains over 100 hours of diverse IMU data across multiple robotic platforms, environments, and motion patterns. This comprehensive collection enables robust foundation model training and evaluation.

Platform Statistics

Platform  | Robot Types                            | Environments                              | Trajectories | Total Duration | Data Rate | Download
Quadruped | Boston Dynamics Spot, ANYmal           | Indoor, Outdoor, Stairs, Rough Terrain    | 45           | 28.5 hours     | 200 Hz    | Hugging Face
Drone     | DJI M100, Custom Quadcopter            | Indoor Flight, Outdoor, Windy Conditions  | 38           | 22.7 hours     | 200 Hz    | Hugging Face
Human     | Handheld Device, Body-worn IMU         | Urban Walking, Jogging, Indoor Navigation | 52           | 31.2 hours     | 200 Hz    | Hugging Face
UGV       | RC Car, Autonomous Vehicle, SubT Robot | Off-road, Urban Streets, Forest Trails    | 42           | 25.4 hours     | 200 Hz    | Hugging Face
Total     | 12 Robot Types                         | 15+ Environments                          | 177          | 107.8 hours    | 200 Hz    | Complete Dataset

Data Format and Usage

File Format

NPZ files containing synchronized IMU data:

  • acc: 3D accelerometer data (m/s²)
  • gyro: 3D gyroscope data (rad/s)
  • timestamp: High-precision timestamps
  • pose_gt: Ground truth poses (when available)

Pre-processing

All data is:

  • Temporally synchronized across platforms
  • Calibrated and bias-corrected
  • Resampled to a consistent 200 Hz (see the sketch after this list)
  • Segmented into motion-coherent sequences
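
As a small illustration of the resampling step, a hypothetical linear-interpolation helper (the released files already ship at 200 Hz, so this only mirrors the preprocessing):

import numpy as np

def resample_to_200hz(acc, gyro, t):
    # Resample irregular samples onto a uniform 200 Hz time grid.
    t_uniform = np.arange(t[0], t[-1], 1.0 / 200.0)
    acc_r = np.stack([np.interp(t_uniform, t, acc[:, i]) for i in range(3)], axis=1)
    gyro_r = np.stack([np.interp(t_uniform, t, gyro[:, i]) for i in range(3)], axis=1)
    return acc_r, gyro_r, t_uniform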

Quick Start

Load and use the data:

import numpy as np

data = np.load('trajectory.npz')
acc = data['acc']               # accelerometer, shape (N, 3), m/s^2
gyro = data['gyro']             # gyroscope, shape (N, 3), rad/s
timestamps = data['timestamp']  # per-sample timestamps, shape (N,)
if 'pose_gt' in data:           # ground-truth poses, when available
    pose_gt = data['pose_gt']

Key Features

Multi-Environment

Indoor offices, outdoor terrains, underground caves, urban streets, forest trails

Cross-Platform

Legged robots, drones, ground vehicles, handheld devices

Synchronized

Hardware-synchronized IMU data at 200Hz across all platforms

Diverse Motion

Walking, flying, driving, climbing, hovering, maneuvering

Limitations

While TartanIMU exhibits strong generalization across vehicles, drones, and legged robots, it still cannot support arbitrary robotic platforms. However, our experiments show that the car motion head generalizes well to TartanDrive and SubT vehicles. We believe our categories (car, humanoid, quadruped, and drone) encompass most robots. For unseen platforms, introducing a new motion head or leveraging a mixture of experts (MoE) over the existing heads presents a promising future direction.

Future Research Directions

We are actively working to address these limitations through:

  • Multi-modal fusion: Integrating visual and LiDAR data for drift correction and scale recovery
  • Adaptive learning: Developing methods for continuous learning from deployment data
  • Hardware optimization: Creating efficient model variants for edge computing platforms
  • Robust estimation: Improving resilience to sensor failures and environmental disturbances
  • Extended datasets: Collecting data from more diverse platforms and challenging scenarios

Citations

@inproceedings{zhao2025tartan,
  title={Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics},
  author={Zhao, Shibo and Zhou, Sifan and Blanchard, Raphael and Qiu, Yuheng and Wang, Wenshan and Scherer, Sebastian},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={22520--22529},
  year={2025}
}