← All Work

Rubin Anomalies

An anomaly explorer for data from one of the most powerful survey telescopes on Earth.

A self-contained technical build focused on live API integration, Python process design, data modeling, and turning unfamiliar data into a queryable workflow.

Python Flask REST API ANTARES Stream Processing SQLite nginx systemd VPS Deploy
Type Personal project
Role Sole developer
Stack Python · Flask · ANTARES API · SQLite

The Problem

The Rubin Observatory and Zwicky Transient Facility (ZTF) generate massive alert streams: millions of astronomical events per night flagged as potentially interesting transients, variable stars, or anomalies. The ANTARES broker filters and annotates this stream and exposes it through a live REST API, but the data is only useful if you know how to query it, which fields matter, and how to separate signal from noise.

The goal was to go beyond just making API calls: understand the ANTARES data model deeply enough to write meaningful filters, persist the right subset of fields into a queryable local database, and surface insights through a web interface that makes the data actually explorable.

The Pipeline

1
ANTARES API Integration
A persistent Python process authenticates against the live ANTARES REST API, queries for new loci using tag-based filters, and paginates through results. Understanding the API's data model — loci, tags, properties, timestamps — was the core challenge, not just making the calls.
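The cursor-following pagination loop can be sketched as a generator over a page-fetching function. This is a hedged illustration, not the real ANTARES schema: the base URL, `/loci` endpoint, `tag` parameter, bearer-token auth, and the `loci`/`next` field names are all assumptions.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

API_BASE = "https://api.antares.example"  # hypothetical base URL


def fetch_page(cursor: Optional[str], token: str,
               tag: str = "lc_feature_extremes") -> dict:
    """Fetch one page of loci matching a tag.

    The endpoint path, query parameters, auth header, and response
    field names are assumptions for illustration.
    """
    params = {"tag": tag}
    if cursor:
        params["cursor"] = cursor
    url = f"{API_BASE}/loci?{urllib.parse.urlencode(params)}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def paginate(fetch: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every locus across pages, following the cursor until exhausted."""
    cursor = None
    while True:
        page = fetch(cursor)
        yield from page.get("loci", [])
        cursor = page.get("next")
        if not cursor:
            break
```

Separating `paginate` from `fetch_page` keeps the loop logic testable without a network connection: any callable that takes a cursor and returns a page dict will do.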
2
Filtering and Storage
Anomaly candidates are identified by evaluating ANTARES tag combinations and property thresholds. Qualifying loci are written to SQLite with coordinates, tags, alert counts, and summary fields selected to support meaningful queries. A backfill script seeds historical candidates on first run.
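The tag-and-threshold filter plus the SQLite write might look like the sketch below. The specific tag names, the three-alert threshold, and the column set are illustrative assumptions, not the project's actual filter criteria.

```python
import sqlite3

# Hypothetical anomaly tags and alert threshold -- the real filter
# evaluates ANTARES tag combinations and property thresholds.
ANOMALY_TAGS = {"lc_feature_extremes", "high_amplitude"}
MIN_ALERTS = 3


def is_candidate(locus: dict) -> bool:
    """Keep loci whose tags intersect the anomaly set and that have enough alerts."""
    has_tag = bool(ANOMALY_TAGS & set(locus.get("tags", [])))
    return has_tag and locus.get("alert_count", 0) >= MIN_ALERTS


def store(conn: sqlite3.Connection, locus: dict) -> None:
    """Persist a qualifying locus with the fields chosen to support queries."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS candidates (
               locus_id    TEXT PRIMARY KEY,
               ra          REAL,
               dec         REAL,
               tags        TEXT,
               alert_count INTEGER)""")
    conn.execute(
        "INSERT OR REPLACE INTO candidates VALUES (?, ?, ?, ?, ?)",
        (locus["id"], locus["ra"], locus["dec"],
         ",".join(locus.get("tags", [])), locus.get("alert_count", 0)))
    conn.commit()
```

`INSERT OR REPLACE` keyed on the locus ID makes both the stream consumer and the backfill script idempotent: re-seeing a locus updates its row rather than duplicating it.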
3
Web Interface
Flask serves the candidate list and per-locus detail views, exposing data that would otherwise require direct API access and domain knowledge to retrieve. A JSON endpoint makes the local dataset queryable externally.
4
Deployment
nginx reverse proxy in front of gunicorn; both the web process and stream consumer run as independent systemd services on a DreamHost VPS.
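The stream consumer's systemd unit might look roughly like this; the unit name, user, and paths are hypothetical, and the web app would run as a parallel unit whose `ExecStart` invokes gunicorn instead.

```ini
# /etc/systemd/system/rubin-consumer.service  (hypothetical name and paths)
[Unit]
Description=ANTARES stream consumer for Rubin Anomalies
After=network-online.target

[Service]
User=deploy
WorkingDirectory=/home/deploy/rubin-anomalies
ExecStart=/home/deploy/rubin-anomalies/venv/bin/python consumer.py
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
```

`Restart=on-failure` gives the long-lived consumer automatic recovery, and `systemctl enable --now` registers each unit to start at boot.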

What It Demonstrates

  • Real third-party API integration: authenticating against a live scientific database, understanding its data model, and writing filters that return meaningful results
  • Database design for insight: choosing which fields to persist and how to structure them so queries can surface patterns the raw API does not expose directly
  • Real-time pipeline design: continuous stream ingestion decoupled from the serving layer
  • Running two independent long-lived processes (consumer + web app) as managed services
  • Production deployment with nginx, gunicorn, systemd, and a repeatable deploy workflow
The point is not that this is an astronomy project. The point is that working with an unfamiliar live API, learning its data model from scratch, and building a pipeline that turns it into something queryable is the same skill set required to integrate any new data source — marketing, sales, or otherwise.