Services
Postgres
Docket Analyzer includes a custom PostgreSQL wrapper built on top of the Peewee ORM. The wrapper is designed to make it easy to work with existing tables or create "schemaless" tables on the fly. It also extends Peewee's query API with convenience methods for pandas support and batch operations.
Connecting
load_psql()
reads the connection URL from da configure postgres
and returns a
Database
instance. The object lazily introspects your database to discover
available tables.
Schemaless tables
You may create an empty table by name and then add columns dynamically. Tables loaded this way infer their schema via introspection each time they are accessed, so registering a model is recommended for performance.
# Create an empty table and add columns
from peewee import TextField, IntegerField
import pandas as pd
# create the table if it doesn't exist
db.create_table("users")
# access via db.t.<table_name>
users = db.t.users
users.add_column("email", column_type="TextField", unique=True)
users.add_column("age", column_type="IntegerField")
# insert data from a DataFrame
users.add_data(pd.DataFrame([
{"email": "alice@example.com", "age": 30},
{"email": "bob@example.com", "age": 25},
]))
Registering models
For better performance and type checking you can register a standard Peewee model with the database. Registered models bypass introspection and provide the same extended query API.
class User(DatabaseModel):
email = TextField(unique=True)
age = IntegerField()
class Meta:
table_name = "users"
# register and create the table
db.register_model(User)
db.create_table(User)
# queries now use the registered model
users = db.t.users
Extended query API
The table classes returned by db.t
inherit several helpers in addition to the
standard Peewee interface:
select()
automatically selects all columns if none are specified.where()
uses the extended select and returns a query object that supports pandas conversion.pandas(*cols, copy=False)
converts query results directly to apandas.DataFrame
. Settingcopy=True
uses the PostgreSQLCOPY
command for large transfers.sample(n)
returns a randomly sampled query.batch(n, verbose=True)
yields query results in chunks ofn
rows.- Instances provide
dict()
for easy serialization.
Example usage:
# get all rows as a DataFrame
all_rows = users.pandas(copy=True)
# filter and select specific columns
emails = users.where(users.age > 20).pandas("email")
# sample and batch processing
for batch in users.sample(100).batch(25, verbose=False):
process(batch.pandas())
# delete by condition
users.delete().where(users.email == "bob@example.com").execute()
Modifying tables
Tables support schema modifications without manually writing migrations:
add_column(name, column_type, **kwargs)
– add a new column using Peewee field type names.drop_column(name)
– remove a column (prompts for confirmation by default).reload()
– reload the model definition from the database after structural changes.add_data(df, copy=False, batch_size=1000)
– insert DataFrame rows using the high‑level helpers.