← All Work

Rubin Anomalies

An anomaly explorer for data from one of the most powerful survey telescopes on Earth.

A self-contained technical build focused on live API integration, Python process design, data modeling, and turning unfamiliar data into a queryable workflow.

Python Flask REST API ANTARES Stream Processing SQLite nginx systemd VPS Deploy
Type Personal project
Role Sole developer
Stack Python · Flask · ANTARES API · SQLite

The Problem

The Rubin Observatory and Zwicky Transient Facility (ZTF) generate massive alert streams: millions of astronomical events per night flagged as potentially interesting transients, variable stars, or anomalies. The ANTARES broker filters and annotates this stream and exposes it through a live REST API, but the data is only useful if you know how to query it, which fields matter, and how to separate signal from noise.

The goal was to go beyond just making API calls: understand the ANTARES data model deeply enough to write meaningful filters, persist the right subset of fields into a queryable local database, and surface insights through a web interface that makes the data actually explorable.

The Pipeline

1
ANTARES API Integration
A persistent Python process authenticates against the live ANTARES REST API, queries for new loci using tag-based filters, and paginates through results. Understanding the API's data model — loci, tags, properties, timestamps — was the core challenge, not just making the calls.
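The cursor-following pagination loop can be sketched as a generator over a page-fetching function. This is a hedged illustration, not the real ANTARES schema: the base URL, `/loci` endpoint, `tag` parameter, bearer-token auth, and the `loci`/`next` field names are all assumptions.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

API_BASE = "https://api.antares.example"  # hypothetical base URL


def fetch_page(cursor: Optional[str], token: str,
               tag: str = "lc_feature_extremes") -> dict:
    """Fetch one page of loci matching a tag.

    The endpoint path, query parameters, auth header, and response
    field names are assumptions for illustration.
    """
    params = {"tag": tag}
    if cursor:
        params["cursor"] = cursor
    url = f"{API_BASE}/loci?{urllib.parse.urlencode(params)}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def paginate(fetch: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every locus across pages, following the cursor until exhausted."""
    cursor = None
    while True:
        page = fetch(cursor)
        yield from page.get("loci", [])
        cursor = page.get("next")
        if not cursor:
            break
```

Separating `paginate` from `fetch_page` keeps the loop logic testable without a network connection: any callable that takes a cursor and returns a page dict will do.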
2
Filtering and Storage
Anomaly candidates are identified by evaluating ANTARES tag combinations and property thresholds. Qualifying loci are written to SQLite with coordinates, tags, alert counts, and summary fields selected to support meaningful queries. A backfill script seeds historical candidates on first run.
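The tag-and-threshold filter plus the SQLite write might look like the sketch below. The specific tag names, the three-alert threshold, and the column set are illustrative assumptions, not the project's actual filter criteria.

```python
import sqlite3

# Hypothetical anomaly tags and alert threshold -- the real filter
# evaluates ANTARES tag combinations and property thresholds.
ANOMALY_TAGS = {"lc_feature_extremes", "high_amplitude"}
MIN_ALERTS = 3


def is_candidate(locus: dict) -> bool:
    """Keep loci whose tags intersect the anomaly set and that have enough alerts."""
    has_tag = bool(ANOMALY_TAGS & set(locus.get("tags", [])))
    return has_tag and locus.get("alert_count", 0) >= MIN_ALERTS


def store(conn: sqlite3.Connection, locus: dict) -> None:
    """Persist a qualifying locus with the fields chosen to support queries."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS candidates (
               locus_id    TEXT PRIMARY KEY,
               ra          REAL,
               dec         REAL,
               tags        TEXT,
               alert_count INTEGER)""")
    conn.execute(
        "INSERT OR REPLACE INTO candidates VALUES (?, ?, ?, ?, ?)",
        (locus["id"], locus["ra"], locus["dec"],
         ",".join(locus.get("tags", [])), locus.get("alert_count", 0)))
    conn.commit()
```

`INSERT OR REPLACE` keyed on the locus ID makes both the stream consumer and the backfill script idempotent: re-seeing a locus updates its row rather than duplicating it.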
3
Web Interface
Flask serves the candidate list and per-locus detail views, exposing data that would otherwise require direct API access and domain knowledge to retrieve. A JSON endpoint makes the local dataset queryable externally.
4
Deployment
nginx reverse proxy in front of gunicorn; both the web process and stream consumer run as independent systemd services on a DreamHost VPS.
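The stream consumer's systemd unit might look roughly like this; the unit name, user, and paths are hypothetical, and the web app would run as a parallel unit whose `ExecStart` invokes gunicorn instead.

```ini
# /etc/systemd/system/rubin-consumer.service  (hypothetical name and paths)
[Unit]
Description=ANTARES stream consumer for Rubin Anomalies
After=network-online.target

[Service]
User=deploy
WorkingDirectory=/home/deploy/rubin-anomalies
ExecStart=/home/deploy/rubin-anomalies/venv/bin/python consumer.py
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
```

`Restart=on-failure` gives the long-lived consumer automatic recovery, and `systemctl enable --now` registers each unit to start at boot.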

What It Demonstrates

  • Real third-party API integration: authenticating against a live scientific database, understanding its data model, and writing filters that return meaningful results
  • Database design for insight: choosing which fields to persist and how to structure them so queries can surface patterns the raw API does not expose directly
  • Real-time pipeline design: continuous stream ingestion decoupled from the serving layer
  • Running two independent long-lived processes (consumer + web app) as managed services
  • Production deployment with nginx, gunicorn, systemd, and a repeatable deploy workflow
The point is not that this is an astronomy project. The point is that working with an unfamiliar live API, learning its data model from scratch, and building a pipeline that turns it into something queryable is the same skill set required to integrate any new data source — marketing, sales, or otherwise.