Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics

Shibo Zhao1†*, Sifan Zhou1†*, Raphael Blanchard1, Yuheng Qiu1, Wenshan Wang1, Sebastian Scherer1

†Equal contribution, *Corresponding author, 1Carnegie Mellon University

Paper | Poster | Code (coming soon) | Dataset & Checkpoints

Results: Foundation Model Performance on Different Robot Platforms

[Video previews of TartanIMU results across robot platforms]

About TartanIMU

Despite recent advances in deep learning, most existing learning-based IMU odometry methods are trained on specific datasets, generalize poorly, and are prone to overfitting, which limits their real-world applicability. To address these challenges, we present Tartan IMU, a foundation model designed for generalizable, IMU-based state estimation across diverse robotic platforms.

Our approach consists of three stages. First, a pre-trained foundation model leverages over 100 hours of multi-platform data to establish general motion knowledge, achieving a 36% improvement in Absolute Trajectory Error (ATE) over specialized models. Second, to adapt to previously unseen tasks, we use Low-Rank Adaptation (LoRA), enabling positive transfer with only 1.1M trainable parameters. Finally, to support deployment on robots, we introduce online test-time adaptation, which removes the boundary between training and testing and lets the model continuously "learn as it operates" in real time at 200 FPS.

TartanIMU Overview

Tartan IMU is, to our knowledge, the first open-source cross-robot foundation model for pose estimation from IMU data alone.

System Architecture

System architecture

Figure 1: Three learning stages of TartanIMU. (a) Pretrained IMU Model features a shared backbone to capture generalizable IMU knowledge. (b) Efficient Fine-Tuning utilizes an adapter to enable positive transfer for new tasks. (c) Online Adaptation employs an adaptive memory buffer to support on-the-fly model updates during deployment.

Method

Stage 1: Pretrained IMU Model

Our foundation model leverages a shared backbone architecture to capture generalizable IMU motion patterns across different robotic platforms. This stage establishes the core motion understanding that serves as the foundation for subsequent adaptation stages.
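
As a rough, hypothetical illustration of this design (not the paper's architecture), the PyTorch sketch below pairs one shared encoder for raw IMU windows with per-platform motion heads, matching the platform categories named in the Limitations section; the layer sizes and per-window displacement output are assumptions.

import torch
import torch.nn as nn

class SharedIMUBackbone(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        # Shared encoder: consumes a window of raw IMU data, shape (B, 6, W),
        # with 3 accelerometer + 3 gyroscope channels.
        self.encoder = nn.Sequential(
            nn.Conv1d(6, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(64, feat_dim, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        # One motion head per platform category.
        self.heads = nn.ModuleDict({
            name: nn.Linear(feat_dim, 3)  # e.g., 3-D displacement per window
            for name in ['car', 'humanoid', 'quadruped', 'drone']
        })

    def forward(self, imu, platform):
        return self.heads[platform](self.encoder(imu))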

t-SNE visualization of learned features

t-SNE visualization of the learned ResNet feature space. Cluster separation across platforms shows the model's ability to capture platform-specific motion dynamics.

Stage 2: Efficient Fine-Tuning

Once the base TartanIMU model is pretrained, we adapt it to unseen robot motions or challenging deployment scenarios using Low-Rank Adaptation (LoRA). This technique introduces only a small number of trainable parameters while freezing the original model, preserving its robust general motion understanding.

LoRA achieves this by reparameterizing weight updates as a low-rank matrix decomposition:

\[ h = W_0 x + \Delta W x = W_0 x + B A x \tag{1} \]

Here, \(W_0 \in \mathbb{R}^{d \times k}\) is the frozen pretrained weight, and \(B \in \mathbb{R}^{d \times r}\) and \(A \in \mathbb{R}^{r \times k}\), with rank \(r \ll \min(d, k)\), are the only matrices trained for the new task. Because the update \(\Delta W = BA\) is low-rank, the number of trainable parameters stays small, keeping fine-tuning efficient even with very limited data.
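
To make Eq. (1) concrete, below is a minimal PyTorch sketch of a LoRA-wrapped linear layer. This is not the authors' implementation; the rank r, scaling factor alpha, and initialization follow common LoRA practice and are assumptions here.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA wrapper: h = W0 x + (alpha / r) * B A x, with W0 frozen."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weight W0
            p.requires_grad = False
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # A: (r, d_in), small random init
        self.B = nn.Parameter(torch.zeros(d_out, r))        # B: (d_out, r), zero init so Delta W = 0 at start
        self.scale = alpha / r

    def forward(self, x):
        # h = W0 x + scale * B (A x)
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)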

Offline finetuning results

Our LoRA-based finetuning improves accuracy on new motion tasks while keeping computational and data costs low.

One of the key benefits of LoRA adaptation is non-forgetting: the core representation remains stable across tasks. This enables lifelong learning capabilities and is particularly useful in robotics where new environments and tasks are continuously encountered.

No forgetting comparison

Comparison of LoRA vs. full fine-tuning. LoRA retains prior knowledge, while full finetuning can degrade earlier performance.

Stage 3: Online Adaptation

In the final stage of our TartanIMU pipeline, we enable real-time test-time adaptation through a novel online learning strategy. Unlike traditional pipelines that maintain a static model during deployment, we allow the model to evolve as it operates. This is critical in real-world robotics, where domain shifts such as speed, terrain, or motion patterns frequently occur.

To support this, we maintain a lightweight, adaptive training buffer that stores recent IMU samples during deployment. These samples are filtered and clustered via a Gaussian Mixture Model (GMM) based motion classifier to ensure diversity across motion types—e.g., stationary, forward motion, left turns, and right turns. The buffer actively reselects samples to avoid redundancy, enabling quick and stable updates with minimal compute.
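
Below is a minimal sketch of this idea using scikit-learn's GaussianMixture: recent IMU windows are summarized by simple statistics, clustered into motion types, and the buffer keeps the most recent samples from each cluster. The window_features descriptor, cluster count, and per-type buffer size are illustrative assumptions, not the paper's exact classifier or buffer policy.

import numpy as np
from sklearn.mixture import GaussianMixture

def window_features(acc, gyro):
    # Simple per-window statistics as a stand-in motion descriptor.
    return np.concatenate([acc.mean(0), acc.std(0), gyro.mean(0), gyro.std(0)])

def balanced_buffer(windows, n_types=4, per_type=32):
    # `windows` is a list of (acc, gyro) pairs, each array of shape (W, 3).
    feats = np.stack([window_features(a, g) for a, g in windows])
    labels = GaussianMixture(n_components=n_types, random_state=0).fit_predict(feats)
    buffer = []
    for k in range(n_types):
        idx = np.where(labels == k)[0][-per_type:]  # most recent samples per motion type
        buffer.extend(windows[i] for i in idx)
    return buffer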

Online adaptation illustration

Online adaptation results on a figure-eight trajectory using only IMU data. By maintaining a balanced buffer across diverse motion segments, TartanIMU adapts quickly during deployment, improving trajectory accuracy over time.

SLAM and IMU fallback mechanism

Performance of online adaptation on an unseen trajectory. The Tartan IMU model progressively learns an unseen circular pattern as training data arrives incrementally, acquiring the new motion pattern within 90 seconds.

Interactive Demo (Coming Soon)

Try TartanIMU with our curated sample trajectories! Select a platform and trajectory below to test with our live model on Hugging Face.

Quadruped Robot Trajectories

Outdoor Navigation

Duration: 120s | Terrain: Rough outdoor

Spot Robot - Outdoor Exploration

Boston Dynamics Spot navigating challenging outdoor terrain with rocks and vegetation.

  • Environment: Rocky outdoor terrain
  • Challenges: Uneven ground, obstacles
  • Data: 200Hz IMU, 120 seconds

Stair Climbing

Duration: 90s | Terrain: Urban stairs

Spot Robot - Stair Navigation

Complex stair climbing and descending with dynamic gait adjustments.

  • Environment: Multi-level stairs
  • Challenges: Height changes, gait adaptation
  • Data: 200Hz IMU, 90 seconds

Indoor Navigation

Duration: 150s | Environment: Office

Spot Robot - Indoor Office

Precise navigation through indoor office spaces with furniture obstacles.

  • Environment: Indoor office space
  • Challenges: Tight spaces, furniture
  • Data: 200Hz IMU, 150 seconds

Drone Flight Trajectories

3D Maneuvers

Duration: 90s | Type: Complex flight

Quadcopter - Indoor 3D Flight

Complex 3D maneuvers including loops, spirals, and rapid direction changes.

  • Environment: Indoor flight space
  • Challenges: 3D motion, rapid acceleration
  • Data: 200Hz IMU, 90 seconds

Windy Conditions

Duration: 180s | Environment: Outdoor

Drone - Wind Disturbance

Outdoor flight in windy conditions with constant stabilization adjustments.

  • Environment: Outdoor with wind
  • Challenges: Wind disturbance, stability
  • Data: 200Hz IMU, 180 seconds

Precision Hover

Duration: 60s | Type: Stationary

Drone - Precision Hovering

High-precision hovering with micro-adjustments and position holding.

  • Environment: Indoor controlled
  • Challenges: Micro-movements, stability
  • Data: 200Hz IMU, 60 seconds

Human Locomotion Trajectories

Urban Walking

Duration: 180s | Environment: City

Human - Urban Sidewalk

Natural walking patterns on urban sidewalks with turns and speed changes.

  • Environment: Urban sidewalk
  • Challenges: Variable speed, direction changes
  • Data: 200Hz IMU, 180 seconds

Jogging

Duration: 240s | Activity: Running

Human - Jogging Path

Continuous jogging with varying pace and directional changes along park paths.

  • Environment: Park jogging path
  • Challenges: Higher frequency motion, pace changes
  • Data: 200Hz IMU, 240 seconds

Indoor Navigation

Duration: 120s | Environment: Building

Human - Building Navigation

Walking through multi-level building with stairs and corridor navigation.

  • Environment: Multi-story building
  • Challenges: Stairs, elevation changes
  • Data: 200Hz IMU, 120 seconds

UGV Navigation Trajectories

Forest Trail

Duration: 200s | Terrain: Off-road

UGV - Forest Navigation

Off-road navigation through forest trails with varying terrain and obstacles.

  • Environment: Forest trail
  • Challenges: Bumpy terrain, speed variation
  • Data: 200Hz IMU, 200 seconds

Urban Street

Duration: 300s | Environment: City

UGV - City Navigation

Autonomous navigation through urban streets with traffic and intersections.

  • Environment: Urban streets
  • Challenges: Traffic, stop-and-go motion
  • Data: 200Hz IMU, 300 seconds

Parking Maneuvers

Duration: 90s | Type: Precision

UGV - Parking Precision

Complex parking maneuvers including parallel parking and tight turns.

  • Environment: Parking lot
  • Challenges: Precision maneuvers, tight spaces
  • Data: 200Hz IMU, 90 seconds


How it works

  1. Select a platform (Quadruped, Drone, Human, or UGV)
  2. Choose a specific trajectory from the available options
  3. Click "Try with TartanIMU Model" to launch our Hugging Face demo
  4. The selected trajectory data will be automatically loaded
  5. See real-time pose estimation results and compare with ground truth

Upload your own data

Want to test with your own IMU data? Our Hugging Face demo also supports custom NPZ file uploads.

Required format: NPZ file containing IMU data at 200Hz with keys: 'acc', 'gyro', 'timestamp'
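
For reference, here is a minimal sketch of producing a compliant file; the file name is illustrative and the signals are placeholder zeros, while the array shapes follow the convention in the Data Format section below.

import numpy as np

N = 200 * 60                      # 60 seconds at 200 Hz
acc = np.zeros((N, 3))            # accelerometer samples, m/s^2
gyro = np.zeros((N, 3))           # gyroscope samples, rad/s
timestamp = np.arange(N) / 200.0  # seconds, monotonically increasing
np.savez('my_trajectory.npz', acc=acc, gyro=gyro, timestamp=timestamp)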

Open Hugging Face Demo

Dataset (Coming Soon)

The TartanIMU dataset contains over 100 hours of diverse IMU data across multiple robotic platforms, environments, and motion patterns. This comprehensive collection enables robust foundation model training and evaluation.

Platform Statistics

Platform  | Robot Types                            | Environments                              | Trajectories | Total Duration | Data Rate | Download
Quadruped | Boston Dynamics Spot, ANYmal           | Indoor, Outdoor, Stairs, Rough Terrain    | 45           | 28.5 hours     | 200 Hz    | Hugging Face
Drone     | DJI M100, Custom Quadcopter            | Indoor Flight, Outdoor, Windy Conditions  | 38           | 22.7 hours     | 200 Hz    | Hugging Face
Human     | Handheld Device, Body-worn IMU         | Urban Walking, Jogging, Indoor Navigation | 52           | 31.2 hours     | 200 Hz    | Hugging Face
UGV       | RC Car, Autonomous Vehicle, SubT Robot | Off-road, Urban Streets, Forest Trails    | 42           | 25.4 hours     | 200 Hz    | Hugging Face
Total     | 12 Robot Types                         | 15+ Environments                          | 177          | 107.8 hours    | 200 Hz    | Complete Dataset

Data Format and Usage

File Format

NPZ files containing synchronized IMU data:

  • acc: 3D accelerometer data (m/s²)
  • gyro: 3D gyroscope data (rad/s)
  • timestamp: High-precision timestamps
  • pose_gt: Ground truth poses (when available)

Pre-processing

All data is:

  • Temporally synchronized across platforms
  • Calibrated and bias-corrected
  • Resampled to a consistent 200 Hz (see the sketch after this list)
  • Segmented into motion-coherent sequences
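
As a small illustration of the resampling step, a hypothetical linear-interpolation helper (the released files already ship at 200 Hz, so this only mirrors the preprocessing):

import numpy as np

def resample_to_200hz(acc, gyro, t):
    # Resample irregular samples onto a uniform 200 Hz time grid.
    t_uniform = np.arange(t[0], t[-1], 1.0 / 200.0)
    acc_r = np.stack([np.interp(t_uniform, t, acc[:, i]) for i in range(3)], axis=1)
    gyro_r = np.stack([np.interp(t_uniform, t, gyro[:, i]) for i in range(3)], axis=1)
    return acc_r, gyro_r, t_uniform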

Quick Start

Load and use the data:

import numpy as np

data = np.load('trajectory.npz')
acc = data['acc']               # accelerometer, shape (N, 3), m/s^2
gyro = data['gyro']             # gyroscope, shape (N, 3), rad/s
timestamps = data['timestamp']  # per-sample timestamps, shape (N,)
if 'pose_gt' in data:           # ground-truth poses, when available
    pose_gt = data['pose_gt']

Key Features

Multi-Environment

Indoor offices, outdoor terrains, underground caves, urban streets, forest trails

Cross-Platform

Legged robots, drones, ground vehicles, handheld devices

Synchronized

Hardware-synchronized IMU data at 200Hz across all platforms

Diverse Motion

Walking, flying, driving, climbing, hovering, maneuvering

Limitations

While TartanIMU exhibits strong generalization across vehicles, drones, and legged robots, it still cannot support arbitrary robotic platforms. However, our experiments show that the car motion head generalizes well to TartanDrive and SubT vehicles. We believe our categories (car, humanoid, quadruped, and drone) encompass most robots. For unseen platforms, introducing a new motion head or leveraging a mixture of experts (MoE) over the existing heads presents a promising future direction.

Future Research Directions

We are actively working to address these limitations through:

  • Multi-modal fusion: Integrating visual and LiDAR data for drift correction and scale recovery
  • Adaptive learning: Developing methods for continuous learning from deployment data
  • Hardware optimization: Creating efficient model variants for edge computing platforms
  • Robust estimation: Improving resilience to sensor failures and environmental disturbances
  • Extended datasets: Collecting data from more diverse platforms and challenging scenarios

Citations

@inproceedings{zhao2025tartan,
  title={Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics},
  author={Zhao, Shibo and Zhou, Sifan and Blanchard, Raphael and Qiu, Yuheng and Wang, Wenshan and Scherer, Sebastian},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={22520--22529},
  year={2025}
}