Interactive Point Clouds: Real-Time Gaussian Splatting & Point-Cloud Art in Live Installations (2026)

Jocelyn Lecamus

Co-Founder, CEO of Utsubo

Jun 5th, 2026 · 14 min read

A point cloud you can fly through on a website is a finished object. A point cloud in a live installation is a performer — it has to render in real time, react to the people standing in front of it, and survive eight hours a day on a gallery floor.

This guide is about that second thing: using point clouds and Gaussian splats live and interactively in a physical space. Not a baked web fly-through, but real-time rendering driven by depth cameras, motion, and audio, and pushed out through projection mapping or LED. It's the intersection of two disciplines — 3D capture and interactive installation — and the constraints are very different from a web embed.

If you want the foundation — what Gaussian splatting is, how scenes are captured, and how to commission a static web viewer — read our Gaussian Splatting guide first. This article assumes that and goes straight to the live, in-the-room build.

Who this is for: Creative directors, museum and exhibition producers, experiential and event agencies, and technical artists evaluating point clouds or 3DGS for a live installation — gallery, museum, brand activation, stage, or projection-mapped space.


Key Takeaways

  • Live ≠ web. A static splat viewer streams a baked scene to a browser. A live installation renders point clouds on dedicated on-site hardware, in real time, reacting to the room — a different engineering problem entirely.
  • Interactivity comes from sensors. Depth cameras (Azure Kinect / Orbbec class), LiDAR, motion tracking, and audio analysis drive how the cloud moves, reveals, or deforms.
  • Output is physical. Projection mapping, LED walls/volumes, and large-format displays — each imposes its own brightness, resolution, and calibration demands.
  • Budget the frame, not the file. On-site you trade splat/point counts against a hard 60 FPS (or higher for some LED) frame budget. LOD and culling matter more than download size.
  • Two source paths. Pre-captured splats (cinematic, controllable) vs. live-captured point clouds from on-site depth sensors (the visitor is the content). Many installations blend both.
  • Cost range: A focused single-sensor projection piece starts around $25K–$60K; a multi-sensor, multi-surface signature installation runs $80K–$250K+, before hardware and venue.
  • Tooling: TouchDesigner and Notch dominate live point-cloud work; Unreal/Unity for heavier real-time scenes; WebGPU now makes browser-grade real-time splatting viable for kiosk-style pieces.

1. What "Interactive / Real-Time Point Cloud" Actually Means

A point cloud is a set of points in space, each with a position and color. A Gaussian splat is a richer cousin — each point carries an oriented, soft "blob" of color and opacity, which is why splats look photoreal where raw clouds look sparse. (For the full definition and capture workflow, see the Gaussian Splatting guide — we won't repeat it here.)
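To make the difference concrete, here is a minimal sketch of the two per-element layouts in Python with NumPy. The field names and the choice of 32-bit floats are assumptions for illustration; real splat formats vary and often carry extra data such as view-dependent color coefficients.

```python
import numpy as np

# Plain point cloud: one position and one RGB color per point.
POINT_DTYPE = np.dtype([
    ("position", np.float32, (3,)),
    ("color", np.float32, (3,)),
])

# Gaussian splat: the same position and color, plus the data that shapes the soft blob.
# (Illustrative layout only; not the schema of any specific file format.)
SPLAT_DTYPE = np.dtype([
    ("position", np.float32, (3,)),
    ("color", np.float32, (3,)),
    ("opacity", np.float32),
    ("scale", np.float32, (3,)),      # per-axis extent of the blob
    ("rotation", np.float32, (4,)),   # orientation as a quaternion
])


def megabytes_per_million(dtype: np.dtype) -> float:
    """Approximate memory in megabytes for one million elements of this layout."""
    return dtype.itemsize * 1_000_000 / 1e6
```

Even in this stripped-down form a splat carries more than twice the data of a raw point, which is part of why splat counts hit the frame budget sooner than point counts do.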

The word that changes everything in this article is live. There are three ways a point cloud can show up in an experience, and only the last is what we're building:

Mode | What it is | Where it runs | Reacts to people?
Baked web viewer | A pre-captured scene streamed to a browser to orbit/fly through | Visitor's device | No: fixed scene, user-controlled camera only
Real-time playback | A pre-captured cloud rendered live on-site, possibly animated | On-site GPU | Indirectly (timeline, triggers)
Interactive / sensor-driven | A cloud whose form, motion, or reveal is driven by live input | On-site GPU | Yes: directly, in real time

When someone searches "create interactive point cloud for art installation projection mapping," they mean the bottom row. The cloud is not a thing you look at; it's a thing that responds.


2. Real-Time vs. Baked: The Core Tradeoff

This is the decision that shapes the whole project.

Aspect | Baked (web/video) | Real-time (live installation)
Rendering | Pre-computed or streamed; predictable | Rendered every frame on-site; must hit frame budget
Interactivity | None to camera-only | Full: geometry can change per frame from sensor input
Quality ceiling | Very high (offline render time available) | Bounded by on-site GPU and frame budget
Failure mode | Slow load | Dropped frames, latency, sensor dropout, all visible to the audience
Content | Fixed at capture time | Can incorporate the live visitor
Best for | Marketing, documentation, remote tours | Gallery, stage, activation, museum interactives

The honest tradeoff: baked buys you quality, real-time buys you reaction. A baked Unreal render of a splat scene will always outshine the same scene rendered live. But it can never dissolve because you walked toward it. If the brief includes the word "responds," "reacts," or "the visitor," you are in real-time territory and every other decision flows from the frame budget.


3. How Interactivity Works: Driving the Cloud

Interactivity means a sensor produces a signal, and that signal drives a parameter of the point cloud — position, color, size, opacity, emission, or which points are even visible. The common input layers:

Input | Hardware (2026) | What it's good at | Watch-outs
Depth / skeleton | Azure Kinect (EOL; stock/used), Orbbec Femto, Intel RealSense | Body position, gesture, presence; live point-cloud capture of visitors | Kinect discontinued, so plan sensor supply; IR interference between units
LiDAR / ToF | 2D safety LiDAR, solid-state ToF | Large-area presence, floor zones, crowd density | 2D gives position not pose; calibration to projection space
Camera / CV | RGB + on-device ML (pose, optical flow, segmentation) | Markerless tracking, silhouette extraction, hand tracking | Lighting-dependent; latency from inference
Motion / IMU | Phones, wearables, tracked props | Per-visitor signal in a controlled flow | Onboarding friction; battery/logistics
Audio | Mic / line-in + FFT | Audio-reactive deformation, beat-synced reveals; cheap and robust | Crowd noise; needs a clean source for music pieces

The dominant pattern for sensor-driven art is a depth camera feeding a real-time engine, where the visitor's body becomes a force field acting on the cloud — they push points aside, reveal hidden structure, or become the cloud as their own depth capture is splatted live. RGB-D depth sensing plus dynamic projection mapping is a well-established real-time pipeline; the art is in the mapping from signal to motion, not the sensing itself.
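As a rough illustration of the "body as force field" mapping, the sketch below pushes points away from a single tracked body position with a linear falloff. It is a minimal example in Python with NumPy; the function name, the radius and strength values, and the single-position body model are all assumptions, and a real piece would usually work from many joints or a full depth mask.

```python
import numpy as np


def repel_points(rest_positions, body_position, radius=0.6, strength=0.3):
    """Push points away from a tracked body position, fading to zero at `radius`.

    rest_positions: (N, 3) array of the cloud's undisturbed positions, in metres.
    body_position:  (3,) position of the visitor from the depth sensor, same space.
    """
    offsets = rest_positions - body_position            # vector from body to each point
    distances = np.linalg.norm(offsets, axis=1)
    # Linear falloff: full push at the body, none at or beyond the radius.
    falloff = np.clip(1.0 - distances / radius, 0.0, 1.0)
    # Normalise safely so a point exactly at the body position does not divide by zero.
    directions = offsets / np.maximum(distances, 1e-6)[:, None]
    return rest_positions + directions * (strength * falloff)[:, None]
```

Because the displacement is computed from the rest positions every frame, the cloud settles back on its own once the visitor steps out of range.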

Latency is the whole game. A 16 ms (one-frame) lag feels alive; 80–100 ms feels broken. Budget your sensor read, processing, and render so total motion-to-photon stays under ~50 ms.
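A simple way to keep that budget honest is to write the stages down and add them up. The sketch below does only that; the stage names and millisecond values are hypothetical placeholders, not measurements, and should be replaced with numbers from your own pipeline.

```python
from dataclasses import dataclass


@dataclass
class LatencyStage:
    name: str
    milliseconds: float


def motion_to_photon(stages):
    """Sum per-stage latencies into one motion-to-photon figure (ms)."""
    return sum(stage.milliseconds for stage in stages)


def over_budget(stages, budget_ms=50.0):
    """Return (is_over, total_ms) for the given stages and budget."""
    total = motion_to_photon(stages)
    return total > budget_ms, total


# Hypothetical numbers for illustration only; measure your own pipeline.
pipeline = [
    LatencyStage("depth sensor exposure + transfer", 20.0),
    LatencyStage("tracking / signal processing", 8.0),
    LatencyStage("simulation + render (one 60 FPS frame)", 16.7),
    LatencyStage("projector / LED processor input lag", 16.0),
]
```

With these placeholder values `over_budget(pipeline)` reports the pipeline as over the 50 ms target even though no single stage looks unreasonable, which is how latency tends to creep in practice.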


4. Output: Getting the Cloud Onto a Physical Surface

A live point cloud has to land somewhere in the room. The three workhorses:

Output | Strengths | Constraints | Typical use
Projection mapping | Scales to architecture; transforms existing surfaces; relatively cheap per m² | Brightness vs. ambient light; needs dark-ish space; surface calibration; shadows from visitors | Walls, floors, objects, large rooms
LED wall / volume | Bright, sharp, works in lit spaces; high contrast for point/particle look | Cost per m²; pixel pitch limits fine detail; weight/rigging | Lobbies, stages, retail, daylight venues
Large-format display | Plug-and-play; reliable; calibration-free | Bounded size; bezels for multi-panel | Kiosks, single-screen interactives

Point clouds and splats happen to love projection and LED: a sparse, glowing point field reads beautifully against black, and the additive, particle-like aesthetic hides the seams and falloff that plague textured video. For projection specifically, the same depth sensors that drive interactivity can also feed real-time surface calibration, keeping the mapping aligned as the scene or geometry shifts.
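For a flat surface such as a floor or wall, one simple way to tie sensor coordinates to projector pixels is a planar homography fitted from four matched points. The sketch below shows that technique in Python with NumPy. It is a stand-in for illustration, not the real-time calibration described above, and the function names are assumptions.

```python
import numpy as np


def fit_homography(sensor_pts, projector_pts):
    """Solve the 3x3 homography H mapping sensor (x, y) to projector (u, v).

    Both inputs are sequences of four matching (x, y) / (u, v) pairs measured
    on the same flat surface, with no three points on one line.
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(sensor_pts, projector_pts):
        # Each correspondence contributes two linear equations in H's 8 unknowns.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)   # bottom-right entry fixed to 1


def sensor_to_projector(H, point):
    """Map one sensor-space (x, y) point to projector pixel coordinates."""
    u, v, w = H @ np.array([point[0], point[1], 1.0])
    return np.array([u / w, v / w])
```

In use, you would mark four reference spots on the surface, record where the sensor sees them and which projector pixels land on them, fit `H` once, and re-fit whenever the rig is moved.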

If projection mapping is new to you, our installation vs. projection-mapping comparison covers where each output medium fits.


5. Tooling & Pipeline

There is no single "point-cloud installation" app. You assemble a pipeline: a real-time engine, point-cloud/splat support, sensor I/O, and output/calibration. The current landscape:

Tool | Role | Point-cloud / splat support | Best for
TouchDesigner | Node-based real-time engine; the default for installations | Native point-cloud TOPs/POPs; Gaussian-splat rendering now available | Sensor-driven, projection-mapped, audio-reactive pieces
Notch | Real-time motion/VFX, strong in live events | Particle & point-cloud systems; media-server integration | Stage, broadcast, large LED shows
Unreal Engine | Heavy real-time renderer | 3DGS plugins (e.g. XScene), Niagara particles | Cinematic real-time, complex scenes, XR
Unity | Real-time engine | Splat packages, VFX Graph point clouds | Custom interactive apps, multi-platform
WebGPU (custom) | Browser-grade real-time splatting | Real-time 3DGS renderers (see below) | Kiosks, web-delivered interactives, lightweight on-site
openFrameworks / Processing | Creative-coding frameworks | Raw point clouds via depth SDKs | Bespoke, low-level control, research-y pieces

On WebGPU: real-time splatting in the browser crossed the viability line in 2025. Open renderers like web-splat report >200 FPS on an RTX 3090 (and ~130 FPS on an 8-year-old GPU), and newer WebGPU engines push ~2–16 ms/frame on millions of Gaussians — 60–135× faster than the WebGL generation. That makes a browser a legitimate on-site runtime for kiosk-scale interactives, not just a remote viewer. For a signature projection piece driven by multiple depth cameras, though, TouchDesigner or Notch remains the pragmatic choice.

For the deeper Three.js/WebGPU performance playbook, see our Three.js best practices.


6. On-Site Performance & Technical Constraints

This is where live installations live or die. A web viewer can drop to 30 FPS on a phone and nobody notices. A projected installation that stutters in front of a crowd has failed publicly.

The frame budget is the constraint. At 60 FPS you have about 16.7 ms per frame (1000 ms / 60) for everything: sensor read, signal processing, simulation, sort, render, and output. Splats must be depth-sorted every frame as the camera or geometry moves, and that sort is often the bottleneck, which is exactly why GPU compute (WebGPU / engine compute shaders) matters on-site.
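To show what that per-frame sort actually computes, here is a minimal CPU sketch in Python with NumPy. It orders splats farthest-first along the camera's view axis so they can be alpha-blended back to front. Live renderers typically do the equivalent on the GPU; this version, and the function name, are for illustration only.

```python
import numpy as np


def back_to_front_order(positions, camera_position, camera_forward):
    """Return indices that draw splats farthest-first for alpha blending.

    positions:       (N, 3) splat centres in world space.
    camera_position: (3,) camera location.
    camera_forward:  (3,) unit vector the camera looks along.
    """
    # Depth of each splat measured along the view axis.
    depths = (positions - camera_position) @ camera_forward
    # Negate so the largest depth sorts first; stable keeps ties in input order.
    return np.argsort(-depths, kind="stable")
```

The cost grows with the number of splats and has to be paid again whenever the camera or the geometry moves, which is why splat count and the sort sit at the centre of the frame budget.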

Rough planning numbers for a single on-site GPU (RTX 4070/4080-class):

Scene scale | Points / splats | Realistic target | Notes
Intimate / single surface | up to ~1M | 60 FPS comfortable | Room for heavy interactivity
Standard installation | 1–4M | 60 FPS with LOD + culling | Most projection pieces
Large / high-res LED | 4–8M | 60 FPS needs tuning, maybe dual-GPU | Sorting + fill-rate bound
Multi-surface / multi-output | 8M+ | Split across machines / media servers | Sync becomes the hard problem

Levers when you blow the budget: reduce point count (downsample), aggressive frustum/opacity culling, LOD tiers, lower the projection/render resolution, simplify the per-point shader, or split rendering across machines. What you don't do is ship raw 10M-splat captures to a live show — same lesson as web optimization, higher stakes.
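Two of those levers, downsampling and opacity culling, are simple enough to sketch. The Python/NumPy example below keeps one point per occupied voxel and drops near-transparent splats. The function names and the default opacity threshold are assumptions; sensible values depend on the scene and the output.

```python
import numpy as np


def voxel_downsample(positions, voxel_size):
    """Return indices keeping one point per occupied voxel (the first encountered)."""
    voxel_ids = np.floor(positions / voxel_size).astype(np.int64)
    # np.unique with return_index gives the first occurrence of each voxel id.
    _, keep = np.unique(voxel_ids, axis=0, return_index=True)
    return np.sort(keep)


def cull_low_opacity(opacities, threshold=0.05):
    """Return indices of splats opaque enough to be worth sorting and drawing."""
    return np.flatnonzero(opacities >= threshold)
```

Both functions return index arrays, so the same selection can be applied to every per-point attribute (color, scale, rotation) without copying the source capture.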

Other on-site realities: dedicated machine per output is safest; plan thermals for 8-hour days; have a watchdog/auto-restart; and always build a graceful idle/attract loop and a sensor-dropout fallback so the piece never shows a frozen frame or an error.
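The idle loop and sensor-dropout fallback can be expressed as a small state choice made every frame. The sketch below is one minimal way to do it in Python; the mode names and timeout values are hypothetical, and the watchdog that restarts the runtime itself would live outside this code, for example in an OS-level supervisor.

```python
from enum import Enum


class Mode(Enum):
    INTERACTIVE = "interactive"   # a visitor is present and tracked
    ATTRACT = "attract"           # sensor healthy, nobody in range
    FALLBACK = "fallback"         # sensor silent for too long


def choose_mode(now, last_frame_time, last_presence_time,
                sensor_timeout=1.0, idle_timeout=10.0):
    """Pick what to show from the age of the latest sensor data (all in seconds)."""
    if now - last_frame_time > sensor_timeout:
        return Mode.FALLBACK      # no sensor frames recently: never freeze, show a safe loop
    if now - last_presence_time > idle_timeout:
        return Mode.ATTRACT       # sensor fine, but nobody has been detected for a while
    return Mode.INTERACTIVE
```

Checking sensor health before presence matters: a dead sensor also reports "nobody here," and the two cases usually deserve different visuals and different alerts to the operator.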


7. Live-Capture vs. Pre-Captured Splats

Where do the points come from? Two philosophies, often combined:

Aspect | Pre-captured splats | Live-captured point cloud
Source | 3DGS capture of a place/object, done in advance | On-site depth cameras capturing the room/visitor in real time
Look | Photoreal, art-directed, stable | Rawer, sparser, lower per-frame fidelity
The visitor | Acts on the scene (force, reveal) | Becomes the scene (their own cloud is rendered)
Risk | Capture quality fixed up front | Sensor noise, lighting, calibration live
Feels like | A world you enter | A mirror that dissolves

The most striking pieces blend the two: a pre-captured, art-directed splat environment that the visitor's live depth capture disturbs, merges into, or reveals. Pre-captured gives you control and beauty; live capture gives you the uncanny "that's me" moment that makes people stay.
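Blending the two sources usually starts with getting the live depth points into the same coordinate space as the pre-captured scene. A minimal sketch in Python with NumPy, assuming you already have a 4x4 sensor-to-scene transform from calibration (the function names are hypothetical, and only positions are handled here):

```python
import numpy as np


def to_scene_space(live_points, sensor_to_scene):
    """Transform (N, 3) sensor-space points by a 4x4 sensor-to-scene matrix."""
    ones = np.ones((len(live_points), 1))
    homogeneous = np.hstack([live_points, ones])        # (N, 4) homogeneous coordinates
    return (homogeneous @ sensor_to_scene.T)[:, :3]


def blend_clouds(scene_points, live_points, sensor_to_scene):
    """Stack the static scene and the transformed live capture into one cloud."""
    return np.vstack([scene_points, to_scene_space(live_points, sensor_to_scene)])
```

Once both sets share a space, the live points can be drawn alongside the scene or used as the disturbance source acting on it.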


8. Cost & Effort

Live installations are priced by complexity — number of sensors, number of output surfaces, bespoke interaction design, and on-site commissioning — not by minutes of content. Indicative 2026 ranges (creative/engineering scope; hardware and venue separate):

Tier | Scope | Indicative cost (USD) | Timeline
Focused | One sensor, one surface, one clear interaction | $25K–$60K | 6–10 weeks
Signature | Multiple sensors, projection mapping or LED, bespoke interaction & content | $80K–$250K | 12–20 weeks
Flagship | Multi-room, multi-surface, live + pre-captured blend, custom hardware | $250K+ | 20+ weeks

Hardware to budget separately: projectors or LED (often the single biggest line item), on-site GPUs/machines, depth sensors, mounts/rigging, and spares. Don't forget commissioning: on-site calibration, tuning interaction in the real room, and a contingency for the gap between "works in the studio" and "works on the floor" — typically 15–25% of effort and routinely underestimated.

For how installations earn back that spend, see interactive installation revenue models.


9. Real Installation Examples

A few touchstones for the live point-cloud aesthetic and what they demonstrate:

  • Ryoji Ikeda. The reference point for data- and point-driven audiovisual installations. Works like data-verse and the transfinite turn vast datasets into immersive point/particle fields synced to sound across monumental projection surfaces: the canonical "points as material" language.
  • Depth-camera body pieces. A lineage of installations (from openFrameworks/Kinect experiments onward) where a depth sensor captures a performer or visitor and renders them as a live point cloud, often projection-mapped. The technical pattern (RGB-D capture → real-time point cloud → projection) is well documented in both art and research.
  • TouchDesigner splat experiments. The community's 2025 work bringing Gaussian splats into TouchDesigner's point operators shows the now-standard live pipeline: a captured splat scene, manipulated and rendered in real time alongside sensor input.

Use these as a vocabulary, not a template — the work that lands is the one whose interaction is specific to its space and audience.


10. Common Pitfalls

  • Treating it like a web viewer. Web optimization habits help, but on-site is a real-time rendering problem with a hard frame budget and physical output — plan for that, not for download size.
  • No fallback state. Sensors drop out, machines hiccup. Without an attract loop and a sensor-loss fallback, the audience sees a frozen frame or an error. Build these first, not last.
  • Underestimating commissioning. It always behaves differently in the real room — ambient light, surface, crowd shadows, IR interference. Budget on-site days and a contingency.
  • Latency creep. Each layer (sensor → processing → render → output) adds milliseconds. Measure motion-to-photon early; an interaction that lags doesn't feel interactive.
  • Over-scoping the cloud. A 10M-splat scene that drops frames is worse than a 2M-splat scene that holds 60 FPS. The room reads smoothness before it reads detail.
  • Single point of failure. One machine driving everything, no spares, no watchdog — fine in a demo, a liability in a month-long exhibition.

11. Why Utsubo

This article sits at the exact intersection of two things Utsubo does — Gaussian splatting and interactive installations — which is rarer than it sounds. Most splat shops build web viewers; most installation studios don't touch point clouds. We do both, which means we can take a captured splat scene and make it react in a physical room.

We're an Osaka-based creative studio with a foreign-led team, recognized with 12× FWA and 7× Awwwards for real-time and interactive work. Our interactive Hokusai installation at Expo 2025 Osaka is a live example of the pattern in this guide — a body-driven particle field where visitors control the motion in real time, exhibited on the floor of a world's fair.

If you're scoping a live point-cloud piece, we can help you make the real-time-vs-baked call, choose the sensor and output stack, and build something that survives a public floor.


Get Started with an Interactive Point-Cloud Installation

Whether you have a space and a brief, or just a sense that point clouds are the right material for your idea, we're happy to talk it through.

Book a 30-minute project discussion to pressure-test the concept, the interaction, and the technical approach, and to find out whether we're the right fit.


FAQs

What's the difference between an interactive point-cloud installation and a Gaussian splatting web viewer?
A web viewer streams a pre-captured, baked scene to a browser, where the only interaction is moving the camera. An interactive installation renders point clouds live on dedicated on-site hardware, in real time, and changes the cloud itself in response to sensors: what visitors do physically alters what they see.

What hardware drives the interactivity?
Most commonly depth cameras (Orbbec, Intel RealSense, or the now-discontinued Azure Kinect) for body and gesture, plus LiDAR for large-area presence, RGB cameras with on-device ML for markerless tracking, and audio input for sound-reactive pieces. The sensor produces a signal that drives a parameter of the point cloud.

Can I use Gaussian splats live, or only static point clouds?
Yes, splats can run live. Engines like TouchDesigner, Notch, Unreal, and Unity now render Gaussian splats in real time, and WebGPU makes browser-based real-time splatting viable. The constraint is the per-frame depth sort and your on-site GPU, not the splat format itself.

How many points or splats can I render live?
On a single modern GPU (RTX 4070/4080-class), plan for roughly 1–4 million points at a comfortable 60 FPS with level-of-detail and culling. Larger or higher-resolution scenes push 4–8M and need tuning or multiple machines. Smoothness matters more to the audience than raw point count.

Projection mapping or LED: which should I use?
Projection mapping is cheaper per square meter and transforms existing architecture, but needs a darker space and careful calibration. LED is brighter, sharper, and works in lit venues, but costs more and has a pixel-pitch limit on fine detail. The point-cloud aesthetic reads well on both because it glows against black.

What does an interactive point-cloud installation cost?
A focused single-sensor, single-surface piece typically runs $25K–$60K in creative and engineering scope. A multi-sensor signature installation with projection or LED runs $80K–$250K+. Hardware (projectors/LED, GPUs, sensors) and venue costs are separate, and on-site commissioning should be budgeted explicitly.

How long does a project take?
A focused piece is roughly 6–10 weeks; a signature installation 12–20 weeks; a flagship multi-room piece 20+ weeks. On-site commissioning (calibrating and tuning the interaction in the real room) is a distinct phase that's routinely underestimated.

What happens if a sensor fails during the exhibition?
A well-built installation never shows a frozen frame or an error. It runs an attract/idle loop when no one is present and falls back gracefully if a sensor drops out, with a watchdog that auto-restarts the runtime. These states should be designed first, not bolted on at the end.
