SDK Usage
=========

The timedb SDK is a single class — :class:`~timedb.TimeDBClient` — that talks to ClickHouse via HTTP. It owns no metadata, no catalog, no fluent series API: callers identify series by integer ``series_id`` and manage any naming / labeling / unit handling themselves.

Overview
--------

TimeDB stores rows in two tables:

- ``series_values`` — append-only time-series store. Sort key ``(series_id, valid_time, knowledge_time, change_time)`` covers both flat and overlapping reads. Partitioned by ``(retention, month)`` so TTLs drop whole partitions.
- ``run_series`` — tiny ``(series_id → run_id)`` mapping. Lets "which runs touched this series" lookups skip the data table.

A single row carries:

- ``series_id`` — externally assigned integer identity
- ``valid_time`` — when the value applies
- ``knowledge_time`` — when the value became known (forecast issue time)
- ``change_time`` — when the row was written
- ``value`` — float64
- ``valid_time_end`` — interval end (default ``2200-01-01``, treat as +∞)
- ``run_id`` — a UUID7 truncated to 63 bits (one per write batch unless overridden)
- ``changed_by`` / ``annotation`` — optional audit text
- ``retention`` — TTL tier: ``short`` / ``medium`` / ``long`` / ``forever``

Connecting
----------

.. code-block:: python

   from timedb import TimeDBClient

   td = TimeDBClient()                                  # reads TIMEDB_CH_URL
   td = TimeDBClient(ch_url="http://localhost:8123/")   # explicit

Schema management
-----------------

Both calls are idempotent.

.. code-block:: python

   td.create()   # CREATE TABLE IF NOT EXISTS for series_values + run_series
   td.delete()   # DROP TABLE IF EXISTS for both — destroys all data

Writing data
------------

``write()`` accepts a Pandas or Polars DataFrame. Required columns are ``series_id``, ``valid_time``, ``value``. Everything else is optional; missing columns get safe defaults stamped per batch.

.. code-block:: python

   import polars as pl
   from datetime import datetime, timezone

   kt = datetime(2025, 1, 1, 6, tzinfo=timezone.utc)
   df = pl.DataFrame({
       "series_id": [42] * 24,
       "valid_time": [datetime(2025, 1, 1, h, tzinfo=timezone.utc) for h in range(24)],
       "value": [100.0 + i * 2 for i in range(24)],
   })

   td.write(df, retention="medium", knowledge_time=kt)

Optional column / kwarg defaults:

- ``knowledge_time`` — kwarg, else ``datetime.now(UTC)`` for the batch. Cannot be passed as both kwarg *and* column simultaneously.
- ``change_time`` — column or ``datetime.now(UTC)`` for the batch.
- ``run_id`` — column or one client-generated UUID7 (top 63 bits) per batch. Time-sortable, and round-trips through ``Int64`` / ``UInt64`` cleanly.
- ``valid_time_end`` — column or sentinel ``2200-01-01``.
- ``changed_by`` / ``annotation`` — column or ``""``.
- ``retention`` — kwarg, column, or default ``"forever"`` (no TTL). Also cannot be passed as both kwarg and column.

All timestamp columns must be timezone-aware (``polars.Datetime`` with a non-null ``time_zone``). Naive timestamps raise ``ValueError``.

Each ``write()`` call also writes one ``(series_id, run_id)`` row per unique pair into ``run_series`` so that downstream metadata layers can reverse-lookup runs without scanning ``series_values``.

Forecast revisions are just additional rows with a later ``knowledge_time``:

.. code-block:: python

   kt2 = datetime(2025, 1, 2, 6, tzinfo=timezone.utc)
   df2 = pl.DataFrame({
       "series_id": [42] * 24,
       "valid_time": [datetime(2025, 1, 1, h, tzinfo=timezone.utc) for h in range(24)],
       "value": [105.0 + i * 2 for i in range(24)],
   })
   td.write(df2, retention="medium", knowledge_time=kt2)

Corrections to a *specific* forecast run use the same pattern: pass an older ``knowledge_time`` (the run being corrected) with a fresh ``change_time`` and the corrected ``value``. The reads documented below collapse correction chains automatically.

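To make the difference between a revision and a correction concrete, the following is a minimal in-memory sketch of the collapsing rules this document describes. It does not call TimeDB or ClickHouse: the ``rows`` list and the ``latest_by`` helper are hypothetical stand-ins, and the actual SQL the SDK issues may be implemented differently.

.. code-block:: python

   from datetime import datetime, timezone

   UTC = timezone.utc
   vt = datetime(2025, 1, 1, 0, tzinfo=UTC)    # the hour being forecast
   kt1 = datetime(2025, 1, 1, 6, tzinfo=UTC)   # first forecast run
   kt2 = datetime(2025, 1, 2, 6, tzinfo=UTC)   # later run (a revision)

   # Hypothetical rows standing in for series_values.
   rows = [
       {"series_id": 42, "valid_time": vt, "knowledge_time": kt1,
        "change_time": datetime(2025, 1, 1, 6, 5, tzinfo=UTC), "value": 100.0},
       {"series_id": 42, "valid_time": vt, "knowledge_time": kt2,
        "change_time": datetime(2025, 1, 2, 6, 5, tzinfo=UTC), "value": 105.0},
       # Correction of the FIRST run: old knowledge_time, fresh change_time.
       {"series_id": 42, "valid_time": vt, "knowledge_time": kt1,
        "change_time": datetime(2025, 1, 3, 9, 0, tzinfo=UTC), "value": 101.5},
   ]

   def latest_by(rows, key, order):
       """Keep, for each key, the value of the row with the largest order."""
       winners = {}
       for row in rows:
           k = key(row)
           if k not in winners or order(row) > order(winners[k]):
               winners[k] = row
       return {k: row["value"] for k, row in winners.items()}

   # Per run: the latest change_time wins within each forecast run.
   per_run = latest_by(
       rows,
       key=lambda r: (r["series_id"], r["valid_time"], r["knowledge_time"]),
       order=lambda r: r["change_time"],
   )

   # Across runs: the largest (knowledge_time, change_time) tuple wins.
   default = latest_by(
       rows,
       key=lambda r: (r["series_id"], r["valid_time"]),
       order=lambda r: (r["knowledge_time"], r["change_time"]),
   )

In this sketch the correction replaces the first run's value (``101.5`` instead of ``100.0``), but it does not displace the newer run: the across-run result is still ``105.0``, because ``knowledge_time`` is compared before ``change_time``.
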
Retention tiers
---------------

The full set of valid retention values is exported as :data:`~timedb.RETENTION_TIERS`:

.. code-block:: python

   from timedb import RETENTION_TIERS
   # frozenset({"short", "medium", "long", "forever"})

Mapping to actual TTL (defined inline in the DDL):

- ``short`` → 180 days
- ``medium`` → 1095 days (~3 years)
- ``long`` → 1825 days (~5 years)
- ``forever`` → no TTL (the default)

TTL evaluation runs against ``valid_time`` and is partition-aligned, so expirations drop whole partitions rather than walking rows.

Reading data
------------

``read()`` returns a Polars DataFrame. The shape depends on two boolean flags:

.. list-table::
   :header-rows: 1
   :widths: 25 25 50

   * - ``include_knowledge_time``
     - ``include_updates``
     - Returned columns
   * - ``False`` (default)
     - ``False`` (default)
     - ``series_id, valid_time, value``
   * - ``False``
     - ``True``
     - ``series_id, valid_time, change_time, value, changed_by, annotation``
   * - ``True``
     - ``False``
     - ``series_id, knowledge_time, valid_time, value``
   * - ``True``
     - ``True``
     - ``series_id, valid_time, knowledge_time, change_time, value, changed_by, annotation``

Each combination answers a different question:

- *Default* — latest value per ``valid_time``, picking the row with the largest ``(knowledge_time, change_time)`` tuple.
- ``include_knowledge_time=True`` — every forecast run for each ``valid_time``, side-by-side. Within a single run, the latest correction wins.
- ``include_updates=True`` — full correction chain on the currently-winning forecast run, with ``changed_by`` / ``annotation`` per state transition.
- Both — full 3D audit log: every state transition for every forecast run.

.. code-block:: python

   # Latest values
   latest = td.read(series_ids=[42])

   # Forecast history (one row per (knowledge_time, valid_time))
   history = td.read(series_ids=[42], include_knowledge_time=True)

   # Time range filters (UTC datetimes)
   from datetime import datetime, timezone
   window = td.read(
       series_ids=[42, 43],
       start_valid=datetime(2025, 1, 1, tzinfo=timezone.utc),
       end_valid=datetime(2025, 2, 1, tzinfo=timezone.utc),
   )

   # Filter by knowledge_time too
   recent_forecasts = td.read(
       series_ids=[42],
       start_known=datetime(2024, 12, 1, tzinfo=timezone.utc),
       end_known=datetime(2025, 1, 15, tzinfo=timezone.utc),
       include_knowledge_time=True,
   )

   # Restrict to one or more retention tiers
   tier = td.read(series_ids=[42], retention="medium")
   tiers = td.read(series_ids=[42], retention=["medium", "long"])

Filtering by ``retention`` is a partition prune at the storage layer — reads restricted to one tier never touch the others.

Per-window cutoffs (``read_relative``)
--------------------------------------

For backtesting and day-ahead simulation, ``read_relative()`` returns — for each window — the latest forecast issued at or before a per-window cutoff. This simulates "what forecast was available at decision time".

**Low-level mode** — explicit window length and offset:

.. code-block:: python

   from datetime import datetime, timedelta, timezone

   df = td.read_relative(
       series_ids=[42],
       window_length=timedelta(hours=24),
       issue_offset=timedelta(hours=-12),   # 12h before each window start
       start_window=datetime(2026, 2, 1, tzinfo=timezone.utc),
       start_valid=datetime(2026, 2, 1, tzinfo=timezone.utc),
       end_valid=datetime(2026, 3, 1, tzinfo=timezone.utc),
   )

**Daily shorthand** — fixed 1-day windows with a human-friendly cutoff (mirrors Energy Quantified's ``instances.relative()``):

.. code-block:: python

   from datetime import datetime, time, timezone

   df = td.read_relative(
       series_ids=[42],
       days_ahead=1,                # day-ahead
       time_of_day=time(6, 0),      # by 06:00 on the issue day
       start_valid=datetime(2026, 2, 1, tzinfo=timezone.utc),
       end_valid=datetime(2026, 2, 28, tzinfo=timezone.utc),
   )

The two parameter sets are mutually exclusive — mixing them raises ``ValueError``.

Returns ``(series_id, valid_time, value)`` — one row per cutoff-winning forecast.

Run lookups
-----------

Each ``write()`` call generates a single client-side ``run_id`` (a UUID7 truncated to 63 bits) unless ``run_id`` is supplied as a column on the DataFrame. ``run_series`` indexes the (series, run) pairs:

.. code-block:: python

   # Newest run first.
   run_ids = td.read_run_series(series_id=42)

Best practices
--------------

1. **Always use timezone-aware UTC datetimes.** Naive timestamps raise ``ValueError``.

   .. code-block:: python

      from datetime import datetime, timezone
      good = datetime(2025, 1, 1, 12, tzinfo=timezone.utc)

2. **Use** ``knowledge_time`` **for forecast revisions.** The same ``(series_id, valid_time)`` can have many ``knowledge_time`` values; each represents a distinct forecast run.

3. **Append corrections, don't UPDATE.** A correction is a new row with the same ``(series_id, valid_time, knowledge_time)`` tuple, a fresh ``change_time``, and the new ``value``. Reads pick the latest ``change_time`` automatically.

4. **Pick a retention tier per series, not per write.** The DDL partitions on ``retention``; mixing tiers within a single series defeats the partition pruning.

5. **Hold series metadata externally.** TimeDB doesn't track series names, units, or labels. Use ``energydb`` (or your own catalog table) to keep that mapping.

Error handling
--------------

.. code-block:: python

   try:
       td.write(df, retention="bogus")
   except ValueError as e:
       print(e)   # "Unknown retention 'bogus'. Valid values: [...]"

The most common ``ValueError``\ s come from:

- An unknown ``retention`` value (kwarg or column).
- Naive timestamps in any time column.
- Missing required columns (``series_id``, ``valid_time``, ``value``).
- Both kwarg and column supplied for ``retention`` or ``knowledge_time``.

Connection errors (ClickHouse unreachable, auth failure) propagate directly from ``clickhouse-connect``.
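
The ``ValueError`` causes listed above can also be checked before a batch reaches ``write()``, for example in a pipeline that wants to report every problem at once. The following is a minimal sketch under stated assumptions: ``precheck`` is a hypothetical helper, not part of the timedb API; it takes a plain ``dict`` of column lists as a stand-in for a DataFrame; and the tier set is copied from this document rather than imported from ``timedb.RETENTION_TIERS``.

.. code-block:: python

   from datetime import datetime, timezone

   VALID_TIERS = frozenset({"short", "medium", "long", "forever"})
   REQUIRED = ("series_id", "valid_time", "value")
   TIME_COLUMNS = ("valid_time", "valid_time_end", "knowledge_time", "change_time")

   def is_aware(dt):
       """True if the datetime carries a usable UTC offset."""
       return dt.tzinfo is not None and dt.utcoffset() is not None

   def precheck(columns, retention=None, knowledge_time=None):
       """Return a list of problems; an empty list means none were found."""
       problems = []

       missing = [name for name in REQUIRED if name not in columns]
       if missing:
           problems.append(f"missing required columns: {missing}")

       # retention and knowledge_time may come from a kwarg or a column, not both.
       for name, kwarg in (("retention", retention), ("knowledge_time", knowledge_time)):
           if kwarg is not None and name in columns:
               problems.append(f"{name} passed as both kwarg and column")

       tiers = set(columns.get("retention", []))
       if retention is not None:
           tiers.add(retention)
       unknown = sorted(tiers - VALID_TIERS)
       if unknown:
           problems.append(f"unknown retention: {unknown}")

       stamps = [dt for name in TIME_COLUMNS for dt in columns.get(name, [])]
       if knowledge_time is not None:
           stamps.append(knowledge_time)
       if any(not is_aware(dt) for dt in stamps):
           problems.append("naive timestamp found")

       return problems

   good = {
       "series_id": [42],
       "valid_time": [datetime(2025, 1, 1, tzinfo=timezone.utc)],
       "value": [1.0],
   }
   bad = {"series_id": [42], "valid_time": [datetime(2025, 1, 1)]}   # naive, no value

   ok_problems = precheck(good, retention="medium")
   bad_problems = precheck(bad, retention="bogus")

A check like this only mirrors the documented rules; ``write()`` remains the authority, so its ``ValueError`` should still be handled.
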