Basanth Periyapatna Roopa Kumar — Data Scientist & Designer

The Problem With Data Visualization

Most data visualization tools make the same tradeoff: legibility over feeling. You get accurate charts, clean axes, correctly-labelled data points. You understand the data intellectually. But you don't feel it.

There's a category of insight that comes not from reading a chart but from navigating a space — from having scale, distance, and topology work on you over time. Cartographers understood this long before computers: a well-made map doesn't just communicate information, it creates a relationship between the reader and the territory.

Cartograph is an attempt to apply that philosophy to arbitrary datasets.

The Core Idea: Terrain as Data

The central interaction is this: drop in a CSV, and the data becomes topography. Numeric columns map to elevation. Categorical columns map to biome — color, material, and texture variations that give different data regions distinct visual identities. Time-series data animates as geological change, the landscape shifting and reforming as you scrub through.
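
The scrubbing behaviour can be sketched as a blend between precomputed height frames. This is a minimal illustration, assuming each time step has already been reduced to one elevation per terrain cell; the helper name `sampleHeights` and the frame layout are hypothetical and not taken from Cartograph's source.

```typescript
// Hypothetical helper: blend the two height frames that bracket scrub position t.
// frames: one Float32Array of elevations per time step, all of equal length.
// t: scrub position in [0, frames.length - 1]; fractional values blend neighbours.
function sampleHeights(frames: Float32Array[], t: number): Float32Array {
  if (frames.length === 0) throw new Error("need at least one frame");
  const last = frames.length - 1;
  const clamped = Math.min(Math.max(t, 0), last);
  const i0 = Math.floor(clamped);
  const i1 = Math.min(i0 + 1, last);
  const w = clamped - i0; // blend weight toward the later frame
  const a = frames[i0];
  const b = frames[i1];
  const out = new Float32Array(a.length);
  for (let k = 0; k < a.length; k++) {
    out[k] = a[k] * (1 - w) + b[k] * w;
  }
  return out;
}
```

Linear blending is the simplest choice; an eased or spline blend would give smoother geological motion at the cost of more arithmetic per cell.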

The interface is a first-person camera navigating this terrain. You don't click on a data point to see its value; you walk toward it. Its apparent size in your field of view, compared with its neighbors, tells you something about its relative magnitude. Its material tells you which category it belongs to. Its neighborhood tells you which other points it correlates with.

Rendering Architecture

The Geometry Problem: 500K Points at 60fps

The naive approach to rendering a large dataset in WebGL is to create one mesh object per data point. At 1,000 points that works fine. At 100,000 points, you're making 100,000 draw calls per frame, and the CPU-to-GPU communication overhead destroys frame rate.

The solution is InstancedMesh, a Three.js primitive that submits all instances of a geometry in a single draw call. Per-instance data (position, scale, color) is passed through InstancedBufferAttributes, one per attribute, and the GPU processes the instances in parallel.

For Cartograph, each data point becomes an instance. The vertex shader receives position (X=feature 1, Z=feature 2, Y=numeric magnitude), color (categorical encoding), and a scale factor. The fragment shader applies the biome material using a custom color map that interpolates between six predefined biome palettes.
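
As a rough sketch of the CPU side of this mapping, the per-instance data can be packed into flat typed arrays, one per shader attribute. The `DataPoint` shape and the `packInstances` helper are hypothetical; only the attribute names (`instancePosition`, `instanceScale`, `instanceColor`) come from the shader in this section.

```typescript
// Hypothetical input shape: one record per data point, already normalized.
interface DataPoint {
  x: number;                       // feature 1
  z: number;                       // feature 2
  magnitude: number;               // numeric value mapped to height (Y)
  color: [number, number, number]; // RGB in [0, 1] from the categorical encoding
  scale: number;                   // uniform scale factor
}

interface InstanceBuffers {
  instancePosition: Float32Array; // 3 floats per instance
  instanceScale: Float32Array;    // 1 float per instance
  instanceColor: Float32Array;    // 3 floats per instance
}

// Pack all points into flat arrays laid out the way the vertex shader reads them.
function packInstances(points: DataPoint[]): InstanceBuffers {
  const n = points.length;
  const instancePosition = new Float32Array(n * 3);
  const instanceScale = new Float32Array(n);
  const instanceColor = new Float32Array(n * 3);
  points.forEach((p, i) => {
    instancePosition.set([p.x, p.magnitude, p.z], i * 3);
    instanceScale[i] = p.scale;
    instanceColor.set(p.color, i * 3);
  });
  return { instancePosition, instanceScale, instanceColor };
}
```

In Three.js, each array would then be wrapped with `new THREE.InstancedBufferAttribute(array, itemSize)` (item sizes 3, 1, and 3 here) and attached to the geometry via `geometry.setAttribute(name, attribute)`.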

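// Vertex shader (simplified). position, modelMatrix, viewMatrix,
// projectionMatrix, and cameraPosition are injected by Three.js ShaderMaterial.
attribute vec3 instancePosition; // per-instance offset
attribute float instanceScale;   // per-instance uniform scale
attribute vec3 instanceColor;    // per-instance RGB from the categorical encoding
varying vec3 vColor;
varying float vFogFactor;

void main() {
  vColor = instanceColor;

  // Scale and offset the base geometry, then move it into world space so the
  // distance to cameraPosition (a world-space uniform) is measured consistently.
  vec4 worldPos = modelMatrix * vec4(position * instanceScale + instancePosition, 1.0);

  // Distance-based fog for depth perception: 0 at 80 units, 1 at 200 units
  float fogDist = length(worldPos.xyz - cameraPosition);
  vFogFactor = clamp((fogDist - 80.0) / 120.0, 0.0, 1.0);

  gl_Position = projectionMatrix * viewMatrix * worldPos;
}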
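
The fog ramp in the shader is a clamped linear function of camera distance. A CPU-side mirror of it, sketched below, can be handy for tuning the constants or unit-testing them. The helper name `fogFactor` and its parameter names are hypothetical; the defaults 80 and 120 are the constants from the shader.

```typescript
// CPU mirror of the shader's fog ramp: 0 up to fogStart, rising linearly
// to 1 at fogStart + fogRange, and clamped outside that interval.
function fogFactor(distance: number, fogStart = 80, fogRange = 120): number {
  const t = (distance - fogStart) / fogRange;
  return Math.min(Math.max(t, 0), 1);
}
```

With the default constants, a point 140 units from the camera is half fogged and anything beyond 200 units is fully fogged.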

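Normal computation for the height-map terrain mesh (the continuous surface underneath the data points) runs in a WebAssembly module so that recalculation stays fast when a dataset changes. WebAssembly executes on whichever thread calls it, so keeping this work off the main thread also means hosting the module in a Web Worker.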
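
To make the arithmetic concrete, here is a plain TypeScript sketch of per-vertex normals for a height grid. It assumes central differences on a row-major grid, which is a common approach but not one the text above confirms; the function name `computeNormals` and its parameters are hypothetical, and the real module runs as WebAssembly rather than TypeScript.

```typescript
// Per-vertex normals for a row-major height grid of width * depth samples.
// Assumption: central differences; the source does not name the method used.
function computeNormals(
  heights: Float32Array,
  width: number,
  depth: number,
  cellSize = 1,
): Float32Array {
  if (heights.length !== width * depth) {
    throw new Error("heights must have width * depth entries");
  }
  const normals = new Float32Array(width * depth * 3);
  // Clamp lookups at the border; this underestimates the slope on edge cells.
  const at = (x: number, z: number): number => {
    const cx = Math.min(Math.max(x, 0), width - 1);
    const cz = Math.min(Math.max(z, 0), depth - 1);
    return heights[cz * width + cx];
  };
  for (let z = 0; z < depth; z++) {
    for (let x = 0; x < width; x++) {
      const dx = (at(x + 1, z) - at(x - 1, z)) / (2 * cellSize); // dh/dx
      const dz = (at(x, z + 1) - at(x, z - 1)) / (2 * cellSize); // dh/dz
      // The surface y = h(x, z) has normal (-dh/dx, 1, -dh/dz), normalized.
      const len = Math.hypot(dx, 1, dz);
      const o = (z * width + x) * 3;
      normals[o] = -dx / len;
      normals[o + 1] = 1 / len;
      normals[o + 2] = -dz / len;
    }
  }
  return normals;
}
```

The loop touches each cell once and reads only four neighbours, which is why it ports cleanly to a tight WebAssembly kernel.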

D3 as Data Infrastructure, Not Renderer

Almost every D3 tutorial starts by treating D3 as a charting library: calling d3.select, appending SVG elements, binding data. That's not how Cartograph uses D3.

D3's actual value is in its mathematical and statistical utilities: scales, projections, Voronoi diagrams, force simulations, color interpolation, contour generation. Cartograph uses D3 purely as a data transformation and coordinate mapping layer, with Three.js handling all rendering.

The pipeline for a new dataset:

Schema inference: D3's csvParse + type coercion detection classifies each column as numeric, categorical, temporal, or geographic
Scale construction: d3.scaleLinear, d3.scaleOrdinal, and d3.scaleSequential with domain/range computed from the data
Normalization: All numeric features normalized to [0, 1] for consistent spatial mapping
Outlier handling: Values beyond 3σ are clamped and visually flagged with an emission shader override
Coordinate computation: (x, y, z) world positions computed for all instances, written into Float32Array buffers
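Buffer upload: One BufferAttribute.set() call per attribute copies the instance data into the attribute's backing array; setting needsUpdate = true on the attribute uploads it to GPU memory on the next render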
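
The schema inference step at the top of this list can be sketched as follows. It operates on string cells, which is what d3.csvParse produces before any type conversion. The classification rules, the `inferColumnType` and `inferSchema` names, and the restriction of temporal values to ISO 8601 dates are all assumptions for illustration; geographic detection is left out.

```typescript
type ColumnType = "numeric" | "temporal" | "categorical";

// Assumption: only ISO 8601 dates count as temporal, because Date.parse
// behaves inconsistently across engines for other formats.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}(T.*)?$/;

// Classify one column from its raw string cells, ignoring empty cells.
function inferColumnType(values: string[]): ColumnType {
  const present = values.map((v) => v.trim()).filter((v) => v !== "");
  if (present.length === 0) return "categorical";
  if (present.every((v) => Number.isFinite(Number(v)))) return "numeric";
  if (present.every((v) => ISO_DATE.test(v) && !Number.isNaN(Date.parse(v)))) {
    return "temporal";
  }
  return "categorical";
}

// Classify every column, using the first row's keys as the column list.
function inferSchema(rows: Record<string, string>[]): Record<string, ColumnType> {
  const schema: Record<string, ColumnType> = {};
  const first = rows[0];
  if (!first) return schema;
  for (const column of Object.keys(first)) {
    schema[column] = inferColumnType(rows.map((r) => r[column] ?? ""));
  }
  return schema;
}
```

Numeric is tested before temporal so that a column of plain years such as "2021" is treated as a number rather than a date.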

The full pipeline for a 100K-row dataset runs in ~180ms on modern hardware.
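
The normalization, outlier, and coordinate steps of the pipeline can be sketched in a few small functions. This follows the order in which the steps are listed; the function names, the use of population standard deviation, and the choice to map a constant column to 0 are assumptions rather than details from Cartograph itself.

```typescript
// Min-max normalize to [0, 1]. A constant column maps to 0 (assumption).
function normalize(values: number[]): number[] {
  let min = Infinity;
  let max = -Infinity;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const span = max - min;
  return values.map((v) => (span === 0 ? 0 : (v - min) / span));
}

// Clamp values beyond k standard deviations from the mean and record which
// ones were clamped, so a renderer can flag them visually.
function clampOutliers(values: number[], k = 3): { clamped: number[]; flagged: boolean[] } {
  const n = values.length;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  const sigma = Math.sqrt(values.reduce((s, v) => s + (v - mean) ** 2, 0) / n);
  const lo = mean - k * sigma;
  const hi = mean + k * sigma;
  return {
    clamped: values.map((v) => Math.min(Math.max(v, lo), hi)),
    flagged: values.map((v) => v < lo || v > hi),
  };
}

// Interleave (x, y, z) per instance into one Float32Array, with height on Y.
function buildPositions(xs: number[], heights: number[], zs: number[]): Float32Array {
  const out = new Float32Array(xs.length * 3);
  for (let i = 0; i < xs.length; i++) {
    out[i * 3] = xs[i];
    out[i * 3 + 1] = heights[i];
    out[i * 3 + 2] = zs[i];
  }
  return out;
}
```

Loops are used instead of `Math.min(...values)` because spreading several hundred thousand values into a function call can exceed engine argument limits.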

The Camera: Navigation as Analysis

A significant design investment went into the camera system, because navigation is the analysis in Cartograph. What you see as you approach a cluster, what you notice when you turn around, what emerges as you gain altitude — these are not incidental to the experience, they are the insight mechanism.

Three camera modes:

Exploration mode is a first-person WASD+mouse controller. You navigate the terrain like you'd explore a game environment. Momentum and inertia make movement feel physical. Raycasting detects when you're looking at a data point and surfaces its raw values in a minimal HUD overlay.
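
One common way to get this kind of momentum is to accelerate toward the input direction and damp the velocity a little every frame. The sketch below illustrates that idea only; the `stepCamera` helper, the `CameraBody` shape, and the acceleration and damping constants are hypothetical rather than Cartograph's actual controller.

```typescript
interface Vec3 { x: number; y: number; z: number; }
interface CameraBody { position: Vec3; velocity: Vec3; }

// Advance the camera by one frame of length dt (seconds).
// input: desired direction from WASD, each component in [-1, 1].
function stepCamera(
  body: CameraBody,
  input: Vec3,
  dt: number,
  acceleration = 40, // units per second squared (illustrative)
  damping = 4,       // per-second decay rate (illustrative)
): CameraBody {
  // Exponential damping, so the decay rate does not depend on frame length.
  const decay = Math.exp(-damping * dt);
  const velocity: Vec3 = {
    x: (body.velocity.x + input.x * acceleration * dt) * decay,
    y: (body.velocity.y + input.y * acceleration * dt) * decay,
    z: (body.velocity.z + input.z * acceleration * dt) * decay,
  };
  const position: Vec3 = {
    x: body.position.x + velocity.x * dt,
    y: body.position.y + velocity.y * dt,
    z: body.position.z + velocity.z * dt,
  };
  return { position, velocity };
}
```

When the keys are released the input vector becomes zero and the camera coasts to a stop instead of halting abruptly, which is what makes the movement read as physical.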

Cinematic mode is triggered when you select a data point or cluster. The camera executes a smooth Bezier-path fly-to animation, approaching from a calculated "best angle" that maximizes the visual separation of the selected point from its neighbors. This is the equivalent of a zoom-to in a traditional chart, but it preserves spatial context throughout the transition.
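
A fly-to along a Bézier path can be sketched as a cubic curve evaluated with an eased progress value. The cubic degree, the raised control points, the smoothstep easing, and the names `cubicBezier` and `flyToPosition` are assumptions for illustration; the "best angle" calculation is not covered here.

```typescript
type Point3 = [number, number, number];

// Evaluate a cubic Bezier curve at parameter t in [0, 1].
function cubicBezier(p0: Point3, p1: Point3, p2: Point3, p3: Point3, t: number): Point3 {
  const u = 1 - t;
  const w0 = u * u * u;
  const w1 = 3 * u * u * t;
  const w2 = 3 * u * t * t;
  const w3 = t * t * t;
  const mix = (i: 0 | 1 | 2): number => w0 * p0[i] + w1 * p1[i] + w2 * p2[i] + w3 * p3[i];
  return [mix(0), mix(1), mix(2)];
}

// Ease in and out so the camera starts and stops gently.
function smoothstep(t: number): number {
  const c = Math.min(Math.max(t, 0), 1);
  return c * c * (3 - 2 * c);
}

// Camera position at `progress` in [0, 1]. The two inner control points are
// raised by `lift` so the path arcs over the terrain (illustrative choice).
function flyToPosition(start: Point3, end: Point3, lift: number, progress: number): Point3 {
  const dx = end[0] - start[0];
  const dz = end[2] - start[2];
  const p1: Point3 = [start[0] + dx / 3, start[1] + lift, start[2] + dz / 3];
  const p2: Point3 = [start[0] + (2 * dx) / 3, end[1] + lift, start[2] + (2 * dz) / 3];
  return cubicBezier(start, p1, p2, end, smoothstep(progress));
}
```

Calling `flyToPosition` once per frame with progress advancing from 0 to 1 yields the camera path; the look-at target can stay fixed on the selected point throughout.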

Orbit mode locks the camera to a user-defined focal point and lets you rotate around it freely. This is the primary mode for examining dense clusters — you can circle a group of related data points to understand their 3D distribution.
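
Orbiting reduces to placing the camera on a sphere around the focal point. The sketch below uses spherical coordinates with the polar angle measured from the +Y axis, the same convention as THREE.Spherical; the `orbitPosition` helper and its pole clamp are hypothetical.

```typescript
// Camera position on a sphere of `radius` around `focus`.
// azimuth: rotation around the Y axis in radians.
// polar: angle down from the +Y axis in radians.
function orbitPosition(
  focus: [number, number, number],
  radius: number,
  azimuth: number,
  polar: number,
): [number, number, number] {
  // Keep the camera slightly off the poles so the view direction never
  // becomes parallel to the up vector.
  const eps = 1e-3;
  const phi = Math.min(Math.max(polar, eps), Math.PI - eps);
  const sinPhi = Math.sin(phi);
  return [
    focus[0] + radius * sinPhi * Math.sin(azimuth),
    focus[1] + radius * Math.cos(phi),
    focus[2] + radius * sinPhi * Math.cos(azimuth),
  ];
}
```

Mouse drag deltas would typically be added to `azimuth` and `polar`, with the camera then pointed back at `focus` each frame.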

Post-Processing Pipeline

Raw WebGL rendering of the terrain looks correct but not cinematic. The post-processing stack is where the visual identity comes from:

SSAO (Screen-Space Ambient Occlusion): Adds subtle contact shadow where data points sit close to the terrain surface, grounding them visually
Bloom: Bright data points (high-magnitude outliers) emit a subtle luminescence, making them pop from the landscape
Depth of Field: Background elements blur when the camera focuses on a foreground cluster, using cinematic focus cueing to guide attention
Film grain: A subtle noise layer on the final output prevents the "too clean" look of raw GPU rendering

All post-processing runs through @react-three/postprocessing (EffectComposer), which merges compatible effects into a shared pass to minimize GPU overhead.

Results

50,000+ datasets have been processed through Cartograph since public beta launch. The use cases that emerged were more diverse than anticipated: academic researchers mapping citation networks, urban planners visualizing geospatial population data, a product team using it to navigate user behavior clusters that previously lived in flat CSV exports.

The feature that generated the most feedback was the temporal animation mode — watching a dataset's topology change over time as you scrub through a time series. Several users described it as "the first time I actually understood the trend" — because seeing the landscape physically shift is a different cognitive experience from reading a trend line.

Stack

Rendering: Three.js r160 + React-Three-Fiber 8 + Drei
Data: D3.js 7 (pipeline only — no SVG rendering)
Shaders: GLSL (custom vertex + fragment per material)
Post-processing: @react-three/postprocessing (SSAO, Bloom, DOF)
Performance: InstancedMesh + WebAssembly normal computation
Framework: React 18 + Next.js 14 (App Router)
State: Zustand + Jotai (camera/selection state split)

Cartograph: Data as Cinematic Terrain