SDK Usage
The timedb SDK is a single class — TimeDBClient — that
talks to ClickHouse via HTTP. It owns no metadata, no catalog, no
fluent series API: callers identify series by integer series_id and
manage any naming / labeling / unit handling themselves.
Overview
TimeDB stores rows in two tables:
series_values— append-only time-series store. Sort key(series_id, valid_time, knowledge_time, change_time)covers both flat and overlapping reads. Partitioned by(retention, month)so TTLs drop whole partitions.run_series— tiny(series_id → run_id)mapping. Lets “which runs touched this series” lookups skip the data table.
A single row carries:
series_id— externally assigned integer identityvalid_time— when the value appliesknowledge_time— when the value became known (forecast issue time)change_time— when the row was writtenvalue— float64valid_time_end— interval end (default2200-01-01, treat as +∞)run_id— a UUID7 truncated to 63 bits (one per write batch unless overridden)changed_by/annotation— optional audit textretention— TTL tier:short/medium/long/forever
Connecting
from timedb import TimeDBClient
td = TimeDBClient() # reads TIMEDB_CH_URL
td = TimeDBClient(ch_url="http://localhost:8123/") # explicit
Schema management
Both calls are idempotent.
td.create() # CREATE TABLE IF NOT EXISTS for series_values + run_series
td.delete() # DROP TABLE IF EXISTS for both — destroys all data
Writing data
write() accepts a Pandas or Polars DataFrame. Required columns are
series_id, valid_time, value. Everything else is optional;
missing columns get safe defaults stamped per batch.
import polars as pl
from datetime import datetime, timezone
kt = datetime(2025, 1, 1, 6, tzinfo=timezone.utc)
df = pl.DataFrame({
"series_id": [42] * 24,
"valid_time": [datetime(2025, 1, 1, h, tzinfo=timezone.utc) for h in range(24)],
"value": [100.0 + i * 2 for i in range(24)],
})
td.write(df, retention="medium", knowledge_time=kt)
Optional column / kwarg defaults:
knowledge_time— kwarg, elsedatetime.now(UTC)for the batch. Cannot be passed as both kwarg and column simultaneously.change_time— column ordatetime.now(UTC)for the batch.run_id— column or one client-generated UUID7 (top 63 bits) per batch. Time-sortable, and round-trips throughInt64/UInt64cleanly.valid_time_end— column or sentinel2200-01-01.changed_by/annotation— column or"".retention— kwarg, column, or default"forever"(no TTL). Also cannot be passed as both kwarg and column.
All timestamp columns must be timezone-aware (polars.Datetime with a
non-null time_zone). Naive timestamps raise ValueError.
Each write() call also writes one (series_id, run_id) row per
unique pair into run_series so that downstream metadata layers can
reverse-lookup runs without scanning series_values.
Forecast revisions are just additional rows with a later
knowledge_time:
kt2 = datetime(2025, 1, 2, 6, tzinfo=timezone.utc)
df2 = pl.DataFrame({
"series_id": [42] * 24,
"valid_time": [datetime(2025, 1, 1, h, tzinfo=timezone.utc) for h in range(24)],
"value": [105.0 + i * 2 for i in range(24)],
})
td.write(df2, retention="medium", knowledge_time=kt2)
Corrections to a specific forecast run use the same pattern: pass an
older knowledge_time (the run being corrected) with a fresh
change_time and the corrected value. The reads documented below
collapse correction chains automatically.
Retention tiers
The full set of valid retention values is exported as
RETENTION_TIERS:
from timedb import RETENTION_TIERS
# frozenset({"short", "medium", "long", "forever"})
Mapping to actual TTL (defined inline in the DDL):
short→ 180 daysmedium→ 1095 days (~3 years)long→ 1825 days (~5 years)forever→ no TTL (the default)
TTL evaluation runs against valid_time and is partition-aligned, so
expirations drop whole partitions rather than walking rows.
Reading data
read() returns a Polars DataFrame. The shape depends on two boolean
flags:
|
|
Returned columns |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
Each combination answers a different question:
Default — latest value per
valid_time, picking the row with the largest(knowledge_time, change_time)tuple.include_knowledge_time=True— every forecast run for eachvalid_time, side-by-side. Within a single run, the latest correction wins.include_updates=True— full correction chain on the currently-winning forecast run, withchanged_by/annotationper state transition.Both — full 3D audit log: every state transition for every forecast run.
# Latest values
latest = td.read(series_ids=[42])
# Forecast history (one row per (knowledge_time, valid_time))
history = td.read(series_ids=[42], include_knowledge_time=True)
# Time range filters (UTC datetimes)
from datetime import datetime, timezone
window = td.read(
series_ids=[42, 43],
start_valid=datetime(2025, 1, 1, tzinfo=timezone.utc),
end_valid=datetime(2025, 2, 1, tzinfo=timezone.utc),
)
# Filter by knowledge_time too
recent_forecasts = td.read(
series_ids=[42],
start_known=datetime(2024, 12, 1, tzinfo=timezone.utc),
end_known=datetime(2025, 1, 15, tzinfo=timezone.utc),
include_knowledge_time=True,
)
# Restrict to one or more retention tiers
tier = td.read(series_ids=[42], retention="medium")
tiers = td.read(series_ids=[42], retention=["medium", "long"])
Filtering by retention is a partition prune at the storage layer —
reads restricted to one tier never touch the others.
Per-window cutoffs (read_relative)
For backtesting and day-ahead simulation, read_relative() returns —
for each window — the latest forecast issued at or before a per-window
cutoff. This simulates “what forecast was available at decision time”.
Low-level mode — explicit window length and offset:
from datetime import datetime, timedelta, timezone
df = td.read_relative(
series_ids=[42],
window_length=timedelta(hours=24),
issue_offset=timedelta(hours=-12), # 12h before each window start
start_window=datetime(2026, 2, 1, tzinfo=timezone.utc),
start_valid=datetime(2026, 2, 1, tzinfo=timezone.utc),
end_valid=datetime(2026, 3, 1, tzinfo=timezone.utc),
)
Daily shorthand — fixed 1-day windows with a human-friendly cutoff (mirrors Energy Quantified’s instances.relative()):
from datetime import time
df = td.read_relative(
series_ids=[42],
days_ahead=1, # day-ahead
time_of_day=time(6, 0), # by 06:00 on the issue day
start_valid=datetime(2026, 2, 1, tzinfo=timezone.utc),
end_valid=datetime(2026, 2, 28, tzinfo=timezone.utc),
)
The two parameter sets are mutually exclusive — mixing them raises
ValueError. Returns (series_id, valid_time, value) — one row
per cutoff-winning forecast.
Run lookups
Each write() call generates a single client-side run_id (a UUID7
truncated to 63 bits) unless run_id is supplied as a column on the
DataFrame. run_series indexes the (series, run) pairs:
# Newest run first.
run_ids = td.read_run_series(series_id=42)
Best practices
Always use timezone-aware UTC datetimes. Naive timestamps raise.
from datetime import datetime, timezone good = datetime(2025, 1, 1, 12, tzinfo=timezone.utc)
Use
knowledge_timefor forecast revisions. The same(series_id, valid_time)can have manyknowledge_timevalues; each represents a distinct forecast run.Append corrections, don’t UPDATE. A correction is a new row with the same
(series_id, valid_time, knowledge_time)tuple, a freshchange_time, and the newvalue. Reads pick the latestchange_timeautomatically.Pick a retention tier per series, not per write. The DDL partitions on
retention; mixing tiers within a single series defeats the partition pruning.Hold series metadata externally. TimeDB doesn’t track series names, units, or labels. Use
energydb(or your own catalog table) to keep that mapping.
Error handling
try:
td.write(df, retention="bogus")
except ValueError as e:
print(e) # "Unknown retention 'bogus'. Valid values: [...]"
The most common ValueErrors come from:
An unknown
retentionvalue (kwarg or column).Naive timestamps in any time column.
Missing required columns (
series_id,valid_time,value).Both kwarg and column supplied for
retentionorknowledge_time.
Connection errors (ClickHouse unreachable, auth failure) propagate
directly from clickhouse-connect.