TimeDB Quickstart

TimeDB is a thin ClickHouse client for 3-dimensional time series. Every value carries a valid_time (the wall-clock timestamp it describes) and a knowledge_time (when the value was learned). That makes it straightforward to store forecast revisions, audit corrections, and reconstruct what was known at any past instant.
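
To make the two time axes concrete, here is a minimal in-memory sketch that does not use TimeDB at all. The `Event` record and the `as_of` helper are hypothetical names chosen for illustration; they only show why keeping every (valid_time, knowledge_time) pair is enough to reconstruct what was known at a past instant.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    """One stored value. Hypothetical record, not a TimeDB type."""

    valid_time: datetime      # the instant the value describes
    knowledge_time: datetime  # when the value became known
    value: float


def as_of(events: list[Event], cutoff: datetime) -> dict[datetime, float]:
    """Return, per valid_time, the newest value known at or before `cutoff`."""
    newest: dict[datetime, Event] = {}
    for ev in events:
        if ev.knowledge_time > cutoff:
            continue  # not yet known at the cutoff
        seen = newest.get(ev.valid_time)
        if seen is None or ev.knowledge_time > seen.knowledge_time:
            newest[ev.valid_time] = ev
    return {vt: ev.value for vt, ev in newest.items()}


vt = datetime(2026, 1, 1, tzinfo=timezone.utc)
events = [
    Event(vt, vt - timedelta(hours=12), 50.0),  # first forecast
    Event(vt, vt - timedelta(hours=6), 52.5),   # revision
]

print(as_of(events, vt - timedelta(hours=8)))  # only the first forecast is known
print(as_of(events, vt))                       # the revision wins
```

Nothing is ever overwritten in this model: the earlier value stays in `events`, and the cutoff alone decides which version is visible.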

This quickstart walks through the full 3-dimensional model:

  1. Setup

  2. Insert a forecast

  3. Insert a revised forecast

  4. Read the latest forecast (the most recent run wins)

  5. Read the full revision history

  6. Visualize how forecast revisions evolved

  7. Correct an erroneous value (immutable storage; corrections are new rows)

  8. Verify the correction and audit the trail

1. Setup

TimeDBClient reads connection settings from TIMEDB_CH_URL. Series identity is supplied externally — TimeDB itself stores no metadata, only events keyed by series_id.
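
Because the client takes its settings from the environment, it can help to fail early with a clear message when the variable is missing. The following is a minimal sketch; `require_env` is a hypothetical helper, not part of TimeDB, and it makes no assumption about the format of the URL itself.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return value
```

Calling `require_env("TIMEDB_CH_URL")` just before `TimeDBClient()` turns a missing setting into an explicit error at startup.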

[1]:
try:
    import urllib.request
    import google.colab  # noqa: F401

    urllib.request.urlretrieve(
        "https://raw.githubusercontent.com/rebase-energy/timedb/main/examples/colab_setup.py", "/tmp/colab_setup.py"
    )
    exec(open("/tmp/colab_setup.py").read())
except ImportError:
    pass
[2]:
from datetime import UTC, datetime, timedelta

import polars as pl
from timedb import TimeDBClient

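td = TimeDBClient()
# Reset to a clean schema for the demo. delete() discards whatever TimeDB
# schema already exists, so do not run this against data you want to keep.
td.delete()
td.create()
print("schema ready")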

# Series identity is owned upstream (e.g. by energydb). For this demo we just
# pick an integer.
SERIES_ID = 1
base_vt = datetime(2026, 1, 1, tzinfo=UTC)
schema ready

2. Insert a forecast

The dataframe needs three columns: series_id, valid_time, and value. The retention argument (short / medium / long) selects the TTL tier, and knowledge_time records when the forecast was issued.
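
A small pre-flight check can catch a malformed frame before it reaches the database. This is a sketch under assumptions: `check_write_args` is a hypothetical helper, and the required columns and retention tiers are taken from the description above rather than from TimeDB's own validation.

```python
REQUIRED_COLUMNS = ("series_id", "valid_time", "value")
RETENTION_TIERS = ("short", "medium", "long")


def check_write_args(columns, retention: str) -> None:
    """Raise ValueError if required columns are missing or the tier is unknown."""
    missing = [c for c in REQUIRED_COLUMNS if c not in columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if retention not in RETENTION_TIERS:
        raise ValueError(f"unknown retention tier: {retention!r}")


# With polars, pass df.columns; a plain list of names works the same way.
check_write_args(["series_id", "valid_time", "value"], retention="medium")
```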

[3]:
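def make_forecast(kt: datetime, bias: float, n: int = 24) -> pl.DataFrame:
    """Build an n-hour synthetic forecast starting at base_vt.

    kt is not written into the frame; the knowledge_time is passed to
    td.write() instead. It is accepted here only to label the run.
    """
    return pl.DataFrame(
        {
            "series_id": [SERIES_ID] * n,
            "valid_time": [base_vt + timedelta(hours=i) for i in range(n)],
            "value": [50.0 + bias + 0.5 * i for i in range(n)],
        }
    )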


kt_run1 = base_vt - timedelta(hours=12)
df_run1 = make_forecast(kt_run1, bias=0.0)

td.write(df_run1, retention="medium", knowledge_time=kt_run1)
print(f"wrote {df_run1.height} rows  knowledge_time={kt_run1}")
wrote 24 rows  knowledge_time=2025-12-31 12:00:00+00:00

3. Insert a revised forecast

Six hours later the producer issues a revised forecast for the same valid_time window. TimeDB doesn’t overwrite anything; both runs are kept.

[4]:
kt_run2 = base_vt - timedelta(hours=6)
df_run2 = make_forecast(kt_run2, bias=2.5)

td.write(df_run2, retention="medium", knowledge_time=kt_run2)
print(f"wrote {df_run2.height} rows  knowledge_time={kt_run2}")
wrote 24 rows  knowledge_time=2025-12-31 18:00:00+00:00

4. Read the latest forecast

read() returns one row per valid_time — always the most recently issued forecast (highest knowledge_time). Run 2 wins for every hour.

[5]:
latest = td.read(series_ids=[SERIES_ID])
print(f"{latest.height} rows (one per valid_time)")
print(latest)
24 rows (one per valid_time)
shape: (24, 3)
┌───────────┬─────────────────────────┬───────┐
│ series_id ┆ valid_time              ┆ value │
│ ---       ┆ ---                     ┆ ---   │
│ u64       ┆ datetime[μs, UTC]       ┆ f64   │
╞═══════════╪═════════════════════════╪═══════╡
│ 1         ┆ 2026-01-01 00:00:00 UTC ┆ 52.5  │
│ 1         ┆ 2026-01-01 01:00:00 UTC ┆ 53.0  │
│ 1         ┆ 2026-01-01 02:00:00 UTC ┆ 53.5  │
│ 1         ┆ 2026-01-01 03:00:00 UTC ┆ 54.0  │
│ 1         ┆ 2026-01-01 04:00:00 UTC ┆ 54.5  │
│ …         ┆ …                       ┆ …     │
│ 1         ┆ 2026-01-01 19:00:00 UTC ┆ 62.0  │
│ 1         ┆ 2026-01-01 20:00:00 UTC ┆ 62.5  │
│ 1         ┆ 2026-01-01 21:00:00 UTC ┆ 63.0  │
│ 1         ┆ 2026-01-01 22:00:00 UTC ┆ 63.5  │
│ 1         ┆ 2026-01-01 23:00:00 UTC ┆ 64.0  │
└───────────┴─────────────────────────┴───────┘

5. Read the full revision history

include_knowledge_time=True returns one row per (series_id, knowledge_time, valid_time) — every forecast run side-by-side.

[6]:
history = td.read(series_ids=[SERIES_ID], include_knowledge_time=True)
print(f"{history.height} rows across {history['knowledge_time'].n_unique()} runs")
print(history)
48 rows across 2 runs
shape: (48, 4)
┌───────────┬─────────────────────────┬─────────────────────────┬───────┐
│ series_id ┆ knowledge_time          ┆ valid_time              ┆ value │
│ ---       ┆ ---                     ┆ ---                     ┆ ---   │
│ u64       ┆ datetime[μs, UTC]       ┆ datetime[μs, UTC]       ┆ f64   │
╞═══════════╪═════════════════════════╪═════════════════════════╪═══════╡
│ 1         ┆ 2025-12-31 12:00:00 UTC ┆ 2026-01-01 00:00:00 UTC ┆ 50.0  │
│ 1         ┆ 2025-12-31 18:00:00 UTC ┆ 2026-01-01 00:00:00 UTC ┆ 52.5  │
│ 1         ┆ 2025-12-31 12:00:00 UTC ┆ 2026-01-01 01:00:00 UTC ┆ 50.5  │
│ 1         ┆ 2025-12-31 18:00:00 UTC ┆ 2026-01-01 01:00:00 UTC ┆ 53.0  │
│ 1         ┆ 2025-12-31 12:00:00 UTC ┆ 2026-01-01 02:00:00 UTC ┆ 51.0  │
│ …         ┆ …                       ┆ …                       ┆ …     │
│ 1         ┆ 2025-12-31 18:00:00 UTC ┆ 2026-01-01 21:00:00 UTC ┆ 63.0  │
│ 1         ┆ 2025-12-31 12:00:00 UTC ┆ 2026-01-01 22:00:00 UTC ┆ 61.0  │
│ 1         ┆ 2025-12-31 18:00:00 UTC ┆ 2026-01-01 22:00:00 UTC ┆ 63.5  │
│ 1         ┆ 2025-12-31 12:00:00 UTC ┆ 2026-01-01 23:00:00 UTC ┆ 61.5  │
│ 1         ┆ 2025-12-31 18:00:00 UTC ┆ 2026-01-01 23:00:00 UTC ┆ 64.0  │
└───────────┴─────────────────────────┴─────────────────────────┴───────┘
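
One use of the side-by-side history is measuring how much a forecast moved between runs. The sketch below works on plain tuples rather than the polars frame, and `revision_deltas` is a hypothetical helper; the three sample hours copy values from the output above.

```python
from datetime import datetime, timedelta, timezone


def revision_deltas(rows, old_kt, new_kt):
    """Map valid_time -> (value at new_kt) - (value at old_kt).

    `rows` is a list of (knowledge_time, valid_time, value) tuples.
    Only valid_times present in both runs are reported.
    """
    old = {vt: v for kt, vt, v in rows if kt == old_kt}
    new = {vt: v for kt, vt, v in rows if kt == new_kt}
    return {vt: new[vt] - old[vt] for vt in sorted(old.keys() & new.keys())}


base_vt = datetime(2026, 1, 1, tzinfo=timezone.utc)
kt_run1 = base_vt - timedelta(hours=12)
kt_run2 = base_vt - timedelta(hours=6)

rows = [
    (kt_run1, base_vt, 50.0),
    (kt_run2, base_vt, 52.5),
    (kt_run1, base_vt + timedelta(hours=1), 50.5),
    (kt_run2, base_vt + timedelta(hours=1), 53.0),
    (kt_run1, base_vt + timedelta(hours=23), 61.5),
    (kt_run2, base_vt + timedelta(hours=23), 64.0),
]

deltas = revision_deltas(rows, kt_run1, kt_run2)
print(set(deltas.values()))  # {2.5}: run 2 is run 1 shifted by a constant bias
```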

6. Visualize the 3-dimensional evolution

Each line is one forecast run. In practice, runs issued closer to the valid window tend to be more accurate; in this synthetic demo the two runs differ only by a constant bias of 2.5. The thick black line is what read() returns: the latest available value per valid_time.

[7]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(11, 4))
for kt, run_df in history.sort("valid_time").group_by("knowledge_time", maintain_order=True):
    ax.plot(
        run_df["valid_time"].to_list(),
        run_df["value"].to_list(),
        marker="o",
        markersize=3,
        alpha=0.7,
        label=f"kt={kt[0].strftime('%Y-%m-%d %H:%M')}",
    )

latest_sorted = latest.sort("valid_time")
ax.plot(
    latest_sorted["valid_time"].to_list(),
    latest_sorted["value"].to_list(),
    color="black",
    linewidth=2.5,
    label="latest (read)",
    zorder=10,
)

ax.set_xlabel("valid_time")
ax.set_ylabel("value")
ax.set_title("Forecast evolution — each run vs. latest read")
ax.legend(loc="upper left")
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
[Figure: forecast evolution, one line per forecast run plus the latest read as a thick black line; x-axis valid_time, y-axis value]

7. Correct an erroneous value

TimeDB treats stored rows as immutable, so a correction is just a new row with a later knowledge_time. Suppose hours 10–12 of run 2 were affected by a sensor glitch. We rewrite only those three rows with a fresh knowledge_time and the corrected values.

[8]:
correction_kt = base_vt
hours_to_fix = [10, 11, 12]

correction = pl.DataFrame(
    {
        "series_id": [SERIES_ID] * len(hours_to_fix),
        "valid_time": [base_vt + timedelta(hours=h) for h in hours_to_fix],
        # Replace the buggy values with something sensible
        "value": [60.0, 61.0, 62.0],
    }
)
td.write(correction, retention="medium", knowledge_time=correction_kt)
print(f"correction issued at kt={correction_kt}  ({correction.height} rows)")
correction issued at kt=2026-01-01 00:00:00+00:00  (3 rows)
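
Because the latest knowledge_time wins, a correction only takes effect if its knowledge_time is later than every version already stored for the same valid_time. A hedged sketch of that check on plain Python values follows; `correction_will_win` is a hypothetical helper, not a TimeDB API. How TimeDB resolves two rows with an identical knowledge_time is not covered in this quickstart, so the sketch requires a strictly later timestamp.

```python
from datetime import datetime, timedelta, timezone


def correction_will_win(existing_kts, correction_kt) -> bool:
    """True if `correction_kt` is strictly later than every existing knowledge_time."""
    return all(correction_kt > kt for kt in existing_kts)


base_vt = datetime(2026, 1, 1, tzinfo=timezone.utc)
existing = [base_vt - timedelta(hours=12), base_vt - timedelta(hours=6)]

print(correction_will_win(existing, base_vt))                       # True
print(correction_will_win(existing, base_vt - timedelta(hours=9)))  # False
```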

8. Verify the correction won — and audit the trail

read() shows the corrected values at hours 10–12. The full history still preserves every prior version.

[9]:
latest = td.read(series_ids=[SERIES_ID]).sort("valid_time")
window = latest.filter(pl.col("valid_time").is_between(base_vt + timedelta(hours=9), base_vt + timedelta(hours=13)))
print("Latest values around the corrected window:")
print(window)
Latest values around the corrected window:
shape: (5, 3)
┌───────────┬─────────────────────────┬───────┐
│ series_id ┆ valid_time              ┆ value │
│ ---       ┆ ---                     ┆ ---   │
│ u64       ┆ datetime[μs, UTC]       ┆ f64   │
╞═══════════╪═════════════════════════╪═══════╡
│ 1         ┆ 2026-01-01 09:00:00 UTC ┆ 57.0  │
│ 1         ┆ 2026-01-01 10:00:00 UTC ┆ 60.0  │
│ 1         ┆ 2026-01-01 11:00:00 UTC ┆ 61.0  │
│ 1         ┆ 2026-01-01 12:00:00 UTC ┆ 62.0  │
│ 1         ┆ 2026-01-01 13:00:00 UTC ┆ 59.0  │
└───────────┴─────────────────────────┴───────┘
[10]:
history = td.read(series_ids=[SERIES_ID], include_knowledge_time=True).sort(["valid_time", "knowledge_time"])
audit = history.filter(pl.col("valid_time").is_between(base_vt + timedelta(hours=9), base_vt + timedelta(hours=13)))
print("Full audit — every (knowledge_time, valid_time) pair:")
print(audit)
Full audit — every (knowledge_time, valid_time) pair:
shape: (13, 4)
┌───────────┬─────────────────────────┬─────────────────────────┬───────┐
│ series_id ┆ knowledge_time          ┆ valid_time              ┆ value │
│ ---       ┆ ---                     ┆ ---                     ┆ ---   │
│ u64       ┆ datetime[μs, UTC]       ┆ datetime[μs, UTC]       ┆ f64   │
╞═══════════╪═════════════════════════╪═════════════════════════╪═══════╡
│ 1         ┆ 2025-12-31 12:00:00 UTC ┆ 2026-01-01 09:00:00 UTC ┆ 54.5  │
│ 1         ┆ 2025-12-31 18:00:00 UTC ┆ 2026-01-01 09:00:00 UTC ┆ 57.0  │
│ 1         ┆ 2025-12-31 12:00:00 UTC ┆ 2026-01-01 10:00:00 UTC ┆ 55.0  │
│ 1         ┆ 2025-12-31 18:00:00 UTC ┆ 2026-01-01 10:00:00 UTC ┆ 57.5  │
│ 1         ┆ 2026-01-01 00:00:00 UTC ┆ 2026-01-01 10:00:00 UTC ┆ 60.0  │
│ …         ┆ …                       ┆ …                       ┆ …     │
│ 1         ┆ 2025-12-31 12:00:00 UTC ┆ 2026-01-01 12:00:00 UTC ┆ 56.0  │
│ 1         ┆ 2025-12-31 18:00:00 UTC ┆ 2026-01-01 12:00:00 UTC ┆ 58.5  │
│ 1         ┆ 2026-01-01 00:00:00 UTC ┆ 2026-01-01 12:00:00 UTC ┆ 62.0  │
│ 1         ┆ 2025-12-31 12:00:00 UTC ┆ 2026-01-01 13:00:00 UTC ┆ 56.5  │
│ 1         ┆ 2025-12-31 18:00:00 UTC ┆ 2026-01-01 13:00:00 UTC ┆ 59.0  │
└───────────┴─────────────────────────┴─────────────────────────┴───────┘
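
The audit view can be read as a version chain per valid_time. The following sketch is independent of TimeDB and uses hypothetical names: it groups (knowledge_time, valid_time, value) tuples by valid_time, orders each group by knowledge_time, and treats the last entry as the current value. The sample rows for hour 10 copy the values printed above.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def version_chains(rows):
    """Group rows by valid_time; each chain is sorted oldest to newest knowledge_time."""
    chains = defaultdict(list)
    for kt, vt, value in rows:
        chains[vt].append((kt, value))
    return {vt: sorted(chain) for vt, chain in chains.items()}


base_vt = datetime(2026, 1, 1, tzinfo=timezone.utc)
hour10 = base_vt + timedelta(hours=10)

rows = [
    (base_vt, hour10, 60.0),                        # correction
    (base_vt - timedelta(hours=12), hour10, 55.0),  # run 1
    (base_vt - timedelta(hours=6), hour10, 57.5),   # run 2
]

chain = version_chains(rows)[hour10]
superseded, current = chain[:-1], chain[-1]
print([value for _, value in superseded])  # [55.0, 57.5]
print(current[1])                          # 60.0
```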

Summary

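=================================================  ======================================================================
Concept                                            Description
=================================================  ======================================================================
valid_time                                         The timestamp the value describes
knowledge_time                                     When the value was learned
td.write(df, retention=, knowledge_time=)          Append rows; corrections are just new rows with a later knowledge_time
td.read(series_ids=)                               Latest value per valid_time
td.read(series_ids=, include_knowledge_time=True)  Every (knowledge_time, valid_time) pair (full audit)
=================================================  ======================================================================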

For richer hierarchies (sites, assets, edges, units, runs metadata) use `energydb <../../energydb/examples/quickstart.ipynb>`__ on top.