
Overview

Neocortex Vision is the robot’s perception and visual processing system, transforming raw data from cameras, depth sensors, and LiDAR into actionable visual information for teleoperation, autonomy, and research. It provides synchronized multi-modal sensor data in various formats: video streams for remote operators, fused point clouds for navigation algorithms, height scans for locomotion policies, and structured recordings for machine learning datasets.

The system distributes workloads across N100 and Jetson compute boards, adapting its processing pipeline based on operational needs—whether prioritizing low-latency streaming for teleoperation or comprehensive sensor fusion for autonomous operation.


Features

For Research & Development

Time-Synchronized Sensor Data - Access time-aligned, properly formatted sensor data to integrate custom algorithms and models (see the alignment sketch after this list):

  • Position tracking from onboard SLAM for whole-body control policies
  • RGB-D camera feeds for vision-language-action models
  • Fused 3D point clouds for SLAM and world model research
  • Height scan grids for locomotion policy development
  • Multi-camera RGB streams with a custom multi-camera 3A algorithm ensuring consistent exposure, white balance, and color across cameras for seamless stitching
  • IMU data for sensor fusion and odometry
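
As a concrete illustration of consuming these time-aligned streams, the sketch below pairs each camera frame with the nearest IMU sample by timestamp. It is a minimal, generic example, not the pipeline's internal synchronization code, and it assumes timestamps are available in seconds.

```python
import numpy as np

def align_nearest(target_ts, source_ts, max_skew_s=0.005):
    """For each target timestamp, return the index of the nearest source sample.

    target_ts, source_ts: 1-D arrays of timestamps in seconds (sorted ascending).
    Entries whose best match is further than max_skew_s apart are set to -1.
    Toy nearest-neighbour alignment sketch only.
    """
    idx = np.searchsorted(source_ts, target_ts)            # insertion points
    idx = np.clip(idx, 1, len(source_ts) - 1)
    left, right = source_ts[idx - 1], source_ts[idx]
    take_left = (target_ts - left) < (right - target_ts)   # pick the closer neighbour
    nearest = np.where(take_left, idx - 1, idx)
    skew = np.abs(source_ts[nearest] - target_ts)
    return np.where(skew <= max_skew_s, nearest, -1)

# Example: pick the IMU sample closest to each 30 FPS camera frame.
# imu_index_per_frame = align_nearest(camera_ts, imu_ts)
```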

Pluggable AI Models - Swap or customize processing algorithms:

  • Multi-camera 3A for consistent image quality
  • Custom VLA models for embodied AI research
  • Gaussian splatting or other 3D reconstruction methods

Core Capabilities

  • Multi-sensor fusion - Combines LiDAR, ToF depth, and camera data into unified representations
  • Multi-camera coordination - Synchronized capture with automated 3A across all cameras for consistent imaging (see the sketch after this list)
  • Flexible processing modes - Configurable pipelines optimized for teleoperation, data collection, or autonomous operation
  • Real-time streaming - Low-latency video encoding and transmission for remote control
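
To make the multi-camera 3A goal above concrete, the toy sketch below equalizes mean luminance across cameras by computing per-camera gains. A real 3A implementation drives sensor exposure, white balance, and color correction at capture time; this is only an illustrative stand-in, not Asimov's algorithm.

```python
import numpy as np

def match_exposure_gains(frames):
    """Per-camera luminance gains so all cameras match the average brightness.

    frames: list of HxWx3 uint8 RGB images, one per camera.
    Returns one scalar gain per camera. Toy stand-in for multi-camera 3A:
    a real implementation adjusts exposure/gain/white balance on the sensor
    rather than scaling pixels after capture.
    """
    weights = np.array([0.2126, 0.7152, 0.0722])            # Rec. 709 luminance
    lum = [float(np.mean(f @ weights)) for f in frames]     # mean luminance per camera
    target = float(np.mean(lum))                            # common brightness target
    return [target / max(l, 1e-6) for l in lum]

def apply_gain(frame, gain):
    return np.clip(frame.astype(np.float32) * gain, 0, 255).astype(np.uint8)
```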

Data Collection & Monitoring

  • Structured recordings saved as MCAP files for post-processing or as RRD files for Rerun visualization (see the MCAP sketch after this list)
  • Built-in profiling and logging across the entire vision pipeline
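
Because MCAP is an open container format, recordings for post-processing can also be produced or augmented offline with the open-source mcap Python package. The snippet below writes a single illustrative IMU message; the topic name and JSON schema are hypothetical and do not reflect the robot's actual recording schema.

```python
import json
import time
from mcap.writer import Writer

with open("session.mcap", "wb") as stream:
    writer = Writer(stream)
    writer.start()

    # Illustrative JSON-encoded channel; the onboard recorder defines its own topics.
    schema_id = writer.register_schema(
        name="ImuSample",
        encoding="jsonschema",
        data=json.dumps({"type": "object"}).encode(),
    )
    channel_id = writer.register_channel(
        topic="/imu", message_encoding="json", schema_id=schema_id
    )

    now = time.time_ns()
    writer.add_message(
        channel_id=channel_id,
        log_time=now,
        publish_time=now,
        data=json.dumps({"gyro": [0.0, 0.0, 0.0], "accel": [0.0, 0.0, 9.81]}).encode(),
    )
    writer.finish()
```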

High-Level Architecture

The diagram below illustrates the complete Neocortex Vision pipeline—from raw sensor inputs at the top, through processing stages in the middle, to final outputs consumed by teleoperation interfaces and autonomous algorithms at the bottom.

Key data flows:

Teleoperation Path (left side): Raw camera feeds pass through video frame stitching to create immersive first-person and third-person views, which are rendered and streamed to the teleoperation UI for remote control.

Autonomy Path (right side): ToF sensors and LiDAR generate depth data and point clouds that are fused into a unified 3D representation. This feeds both the environment map for SLAM-based navigation and the height scan grid for locomotion policies.

Model Integration (center-left): The pipeline supports pluggable AI models—including Asimov’s multi-camera 3A algorithm, swappable VLAs, and custom user models—that can consume synchronized sensor data for research and deployment.

Legend:

  • Dark gray nodes: Hardware sensors (Camera Array, ToF, LiDAR, IMU)
  • Blue nodes: Final outputs consumed by applications (Teleoperation UI, Locomotion policy, Navigation)
  • Teal nodes with dashed borders: Pluggable AI models (can be swapped or customized)
  • Black nodes: Internal processing stages
  • Solid arrows: Main data flow
  • Dashed arrows: Optional or feedback connections

Available Data Formats

First-Person View - 180°

  • Source: 2 main cameras
  • What it provides: ~2:1 aspect ratio video stream that roughly corresponds to the robot’s left and right eye field of view at 30 FPS
  • How to access: RTP video stream
  • Processed on: N100 compute board using Vulkan
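
As an example of consuming such an RTP stream, the sketch below uses a GStreamer receive pipeline through OpenCV (built with GStreamer support). The port, payload type, and H.264 codec are assumptions; the actual endpoint and codec depend on your deployment.

```python
import cv2

# Hypothetical receive pipeline: the port, payload type, and codec must match the
# robot's actual RTP stream configuration.
pipeline = (
    'udpsrc port=5000 caps="application/x-rtp,media=video,encoding-name=H264,payload=96" '
    "! rtph264depay ! avdec_h264 ! videoconvert ! appsink drop=true sync=false"
)
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)  # requires OpenCV built with GStreamer

while cap.isOpened():
    ok, frame = cap.read()          # ~2:1 stitched first-person frame
    if not ok:
        break
    cv2.imshow("first-person-180", frame)
    if cv2.waitKey(1) == 27:        # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```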

First-Person View - 360°

  • Source: 2 main cameras + 4 secondary cameras
  • What it provides: Equirectangular video covering close to 360° × 100° field of view at 30 FPS
  • How to access: RTP video stream
  • Processed on: Jetson compute board using a CUDA pipeline (or PyTorch)
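
The geometric core of the equirectangular output is the mapping from output pixels to viewing rays. The sketch below computes those rays for a 360° × 100° field of view; it illustrates the projection only, not the CUDA stitching and blending that run on the Jetson.

```python
import numpy as np

def equirect_rays(width, height, h_fov_deg=360.0, v_fov_deg=100.0):
    """Unit view directions for each pixel of an equirectangular image.

    Longitude spans h_fov_deg around the robot, latitude spans v_fov_deg.
    A stitcher samples each source camera along these rays; this sketch shows
    only the projection geometry, not blending or camera selection.
    """
    lon = np.linspace(-0.5, 0.5, width) * np.deg2rad(h_fov_deg)    # yaw per column
    lat = np.linspace(0.5, -0.5, height) * np.deg2rad(v_fov_deg)   # pitch per row
    lon, lat = np.meshgrid(lon, lat)                               # (H, W) grids
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)                            # HxWx3 unit vectors
```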

Third-Person View

Synthetic third-person view for better situational awareness during teleoperation.

  • Source: First-Person View - 360°, Fused Point Cloud
  • What it provides: Third-person view video stream built from rendered 3D reconstruction of the surroundings at 10 FPS
  • How to access: RTP video stream, or Protobuf message via RML API
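
Conceptually, the third-person view places a virtual camera behind the robot and projects the fused 3D reconstruction into it. The sketch below shows only that geometric projection with a pinhole model, producing a crude depth image; the actual renderer and its parameters are not specified here.

```python
import numpy as np

def render_points(points_base, T_cam_base, K, width=640, height=480):
    """Project fused point-cloud points into a virtual third-person camera.

    points_base: Nx3 points in the robot base frame.
    T_cam_base:  4x4 transform from base frame to the virtual camera frame.
    K:           3x3 pinhole intrinsics of the virtual camera.
    Returns a crude depth image; a real renderer produces shaded color frames.
    """
    p = (T_cam_base[:3, :3] @ points_base.T + T_cam_base[:3, 3:4]).T  # to camera frame
    p = p[p[:, 2] > 0.1]                        # keep points in front of the camera
    uvw = (K @ p.T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    keep = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.full((height, width), np.inf, dtype=np.float32)
    np.minimum.at(depth, (v[keep], u[keep]), p[keep, 2])  # nearest point wins per pixel
    return depth
```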

Fused Point Cloud

  • Source: LiDAR point cloud, ToF depth map, known camera poses
  • What it provides: Open3D-compatible representation that combines 3D point clouds and depth information from different sensors covering different angles
  • Processed on: Jetson compute board (point cloud registration and noise filtering)
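
A minimal sketch of this kind of fusion with Open3D is shown below: both sensors' data are transformed into the robot base frame using known calibration extrinsics, concatenated, denoised, and downsampled. It demonstrates the general approach under those assumptions, not the onboard registration and filtering code.

```python
import numpy as np
import open3d as o3d

def fuse(lidar_xyz, tof_depth, tof_intrinsic, T_base_lidar, T_base_tof, voxel=0.05):
    """Fuse a LiDAR scan and a ToF depth map into one cloud in the base frame.

    lidar_xyz:     Nx3 float array of LiDAR points in the LiDAR frame.
    tof_depth:     HxW uint16 depth image in millimetres.
    tof_intrinsic: o3d.camera.PinholeCameraIntrinsic of the ToF camera.
    T_base_*:      4x4 sensor-to-base extrinsics (assumed known from calibration).
    """
    lidar = o3d.geometry.PointCloud()
    lidar.points = o3d.utility.Vector3dVector(lidar_xyz)
    lidar.transform(T_base_lidar)                       # LiDAR frame -> base frame

    tof = o3d.geometry.PointCloud.create_from_depth_image(
        o3d.geometry.Image(tof_depth), tof_intrinsic,
        extrinsic=np.linalg.inv(T_base_tof),            # Open3D expects base -> camera
        depth_scale=1000.0,
    )

    fused = lidar + tof                                 # concatenate both clouds
    fused, _ = fused.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
    return fused.voxel_down_sample(voxel)
```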

Height Map

Grid-shaped height map commonly used by locomotion policies.

  • Source: Fused Point Cloud
  • What it provides: 2D height map with 10 cm resolution (adjustable) covering a 2 m × 1 m area around the robot, updated at 10 FPS
  • How to access: Protobuf message via RML API
  • Processed on: Jetson compute board (resampling algorithm)
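
The resampling step can be illustrated with a simple binning scheme: points from the fused cloud are dropped into 10 cm cells over the 2 m × 1 m window and the highest point per cell is kept. This is a sketch of the idea, not the onboard algorithm.

```python
import numpy as np

def height_scan(points_base, x_range=(-1.0, 1.0), y_range=(-0.5, 0.5), cell=0.10):
    """Resample a fused point cloud into a 2-D height grid around the robot.

    points_base: Nx3 points in the robot base frame (metres).
    Defaults give the 2 m x 1 m, 10 cm grid described above; cells with no
    points stay NaN.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    grid = np.full((nx, ny), np.nan, dtype=np.float32)

    ix = np.floor((points_base[:, 0] - x_range[0]) / cell).astype(int)
    iy = np.floor((points_base[:, 1] - y_range[0]) / cell).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)

    # Keep the highest point per cell; fmax treats the initial NaN as "empty".
    np.fmax.at(grid, (ix[keep], iy[keep]), points_base[keep, 2].astype(np.float32))
    return grid
```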

Raw ToF Depth Map

  • Source: ToF depth camera
  • What it provides: 640×480 depth image at 10 FPS with the default hardware
  • How to access: Protobuf message via RML API
  • Processed on: Jetson compute board (denoising and hole filling)

Raw LiDAR Point Cloud

  • Source: LiDAR
  • What it provides: Point cloud at 10-20 FPS depending on the LiDAR model
  • How to access: Protobuf message via RML API
  • Processed on: Jetson compute board (filtering and downsampling)

Equirectangular Depth Map

High resolution depth map covering close to 360° × 100° field of view.

  • Source: First-Person View - 360°, Fused Point Cloud
  • What it provides: Equirectangular depth image at 10 FPS
  • How to access: Protobuf message via RML API
  • Processed on: Jetson compute board (depth completion model)

Operation Modes

Due to computational power and I/O bandwidth limitations, not all sensor-processing algorithms can run simultaneously. Therefore, we divide neocortex-vision functionality into three use cases:

  1. Teleop (real-time vision)
  2. Data collection
  3. Semi-autonomous/autonomous (machine vision)

Teleop Mode

  1. Use video frames from all cameras to create Third-Person View
  2. Use video frames from all cameras to create First-Person View - 180°/360°
  3. Encode stitched frames for streaming
  4. Process IMU data for orientation (raw data not required)

Data Collection Mode

  1. Capture raw frames (2 main cameras) for basic teleoperation
  2. Record and store all sensor inputs onboard
  3. Disable all data processing algorithms except those essential for the locomotion policy

Semi-autonomous/Autonomous Mode

  1. Disable teleoperation-only data such as Third-Person View and First-Person View - 180°/360°
  2. Enable 3D-related input and data processing algorithms

Custom Modes

Users can create custom modes by selecting specific algorithms and data processing tasks based on their requirements. There is no guarantee that all algorithms will run smoothly together; this will require some experimentation on the user’s part.
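
The configuration interface for custom modes is not detailed here; purely as an illustration, a custom mode can be thought of as a named selection of the algorithms listed under Algorithm Availability below. The dictionary format and algorithm identifiers are hypothetical.

```python
# Hypothetical custom-mode selection: the names mirror the algorithms listed under
# Algorithm Availability, but the real configuration format/API may differ.
custom_mode = {
    "name": "mapping-while-teleoperating",
    "enabled_algorithms": [
        "multi_camera_3a",
        "first_person_view_stitching",      # keep a 180° stream for the operator
        "multi_sensor_point_cloud_fusion",  # 3D data for mapping
        "point_cloud_to_height_scan",
    ],
    "disabled_algorithms": [
        "third_person_view_rendering",      # drop to save compute budget
        "equirectangular_stitching",
        "equirectangular_depth_model",
    ],
}
```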

Data Availability

Which data streams are available in each operation mode (Teleop, Data Collection, Autonomous):

  • Robot Position
  • First-Person View
  • Third-Person View
  • Main Camera Feed
  • Secondary Camera Feed
  • Fused Point Cloud
  • ToF Depth Image
  • Equirectangular Video

Algorithm Availability

Which algorithms can be enabled for each operation mode (Teleop, Data Collection, Autonomous):

  • Point Cloud to Height Scan
  • Multi-Sensor Point Cloud Fusion
  • Multi-Camera 3A
  • First-Person View Stitching
  • Equirectangular Depth Model
  • Equirectangular Stitching
  • Third-Person View Rendering