docketanalyzer
Docket Management
Pacer
Utility for downloading PACER data.
Convenience wrapper around Free Law Project's juriscraper for downloading dockets and documents from PACER.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pacer_username | str | PACER account username. If not provided, will use saved config or PACER_USERNAME from environment. | None |
pacer_password | str | PACER account password. If not provided, will use saved config or PACER_PASSWORD from environment. | None |
Attributes:
Name | Type | Description |
---|---|---|
pacer_username | str | The PACER account username |
pacer_password | str | The PACER account password |
cache | dict | Internal cache for storing session and driver instances |
Source code in docketanalyzer/pacer/pacer.py
purchase_docket(docket_id, **kwargs)
Purchases a docket for a given docket ID.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
docket_id | str | The docket ID to purchase. | required |
**kwargs | Any | Additional query arguments to pass to juriscraper. | {} |
Returns:
Name | Type | Description |
---|---|---|
tuple | tuple[str, dict] | A tuple containing the raw HTML and the parsed docket JSON. |
Source code in docketanalyzer/pacer/pacer.py
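As a usage sketch, the helper below calls purchase_docket and stores both elements of the returned tuple on disk. The helper name save_docket and the output file names are assumptions made for illustration; only the purchase_docket(docket_id) call and its (raw HTML, parsed JSON) return shape come from the reference above.

```python
import json
from pathlib import Path


def save_docket(pacer, docket_id, out_dir):
    """Purchase a docket and write the raw HTML and parsed JSON to out_dir.

    `pacer` is expected to behave like the Pacer class documented above.
    The file names used here are illustrative, not a library convention.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # purchase_docket returns (raw HTML, parsed docket JSON).
    docket_html, docket_json = pacer.purchase_docket(docket_id)

    html_path = out_dir / "docket.html"
    json_path = out_dir / "docket.json"
    html_path.write_text(docket_html, encoding="utf-8")
    # default=str is a fallback for any value json cannot encode directly.
    json_path.write_text(
        json.dumps(docket_json, indent=2, default=str), encoding="utf-8"
    )
    return html_path, json_path
```

Keeping the raw HTML next to the parsed JSON means the page can be inspected or parsed again later without purchasing it a second time.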
purchase_document(pacer_case_id, pacer_doc_id, court)
Purchases a document for a given PACER case ID and document ID.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pacer_case_id | str | The PACER case ID to purchase the document from. | required |
pacer_doc_id | str | The PACER document ID to purchase. | required |
court | str | The court to purchase the document from. | required |
Returns:
Name | Type | Description |
---|---|---|
tuple | tuple[bytes, str] | A tuple containing the PDF content and the status of the purchase. |
Source code in docketanalyzer/pacer/pacer.py
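A minimal sketch of consuming the (PDF content, status) tuple follows. The helper name save_document is hypothetical, and the check for empty PDF content is an assumption: this reference does not list the possible status values or say what is returned when a purchase fails.

```python
from pathlib import Path


def save_document(pacer, pacer_case_id, pacer_doc_id, court, out_path):
    """Purchase one document and write the PDF to out_path if bytes came back.

    Returns the status string reported by purchase_document. No particular
    status values are assumed here.
    """
    pdf, status = pacer.purchase_document(pacer_case_id, pacer_doc_id, court)
    if pdf:  # assumption: empty or None content means nothing to write
        out_path = Path(out_path)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_bytes(pdf)
    return status
```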
purchase_attachment(pacer_case_id, pacer_doc_id, attachment_number, court)
Purchases an attachment for a given PACER case ID and document ID.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pacer_case_id | str | The PACER case ID to purchase the attachment from. | required |
pacer_doc_id | str | The PACER document ID to purchase the attachment from. | required |
attachment_number | str | The attachment number to purchase. | required |
court | str | The court to purchase the attachment from. | required |
Returns:
Name | Type | Description |
---|---|---|
tuple | tuple[bytes, str] | A tuple containing the PDF content and the status of the purchase. |
Source code in docketanalyzer/pacer/pacer.py
parse(docket_html, court)
Parses the raw HTML of a docket and returns the parsed docket JSON.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
docket_html | str | The raw HTML of the docket. | required |
court | str | The court to parse the docket from. | required |
Returns:
Name | Type | Description |
---|---|---|
dict | dict | The parsed docket JSON. |
Source code in docketanalyzer/pacer/pacer.py
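Because parse takes raw HTML as input, a previously saved docket page can be parsed again from disk. The sketch below assumes the HTML was written to a UTF-8 file beforehand; the helper name reparse_docket is hypothetical.

```python
from pathlib import Path


def reparse_docket(pacer, html_path, court):
    """Read saved docket HTML from disk and return the parsed docket JSON.

    `pacer` is expected to expose parse(docket_html, court) as documented above.
    """
    docket_html = Path(html_path).read_text(encoding="utf-8")
    return pacer.parse(docket_html, court)
```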
find_candidate_cases(docket_id)
Finds candidate PACER cases for a given docket ID.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
docket_id | str | The docket ID to search for. | required |
Returns:
Name | Type | Description |
---|---|---|
list | list[dict[str, str]] | A list of candidate cases. |
Source code in docketanalyzer/pacer/pacer.py
Services
services
Database
A PostgreSQL database manager that provides high-level database operations.
This class handles database connections, table management, model registration, and provides an interface for table operations with schemaless tables through the Tables class.
Source code in docketanalyzer/services/psql.py
meta
property
Get database metadata including table and column information.
Returns:
Name | Type | Description |
---|---|---|
dict | dict[str, dict[str, Any]] | Database metadata including table schemas and foreign keys |
__init__(connection=None, registered_models=None)
Initialize the database manager.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
connection | str | PostgreSQL connection URL | None |
registered_models | list | List of model classes to register with the database | None |
Source code in docketanalyzer/services/psql.py
connect()
Establish connection to the PostgreSQL database using the connection URL.
Source code in docketanalyzer/services/psql.py
status()
Check if the database connection is working.
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if connection is successful, False otherwise |
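Since status() reports connectivity as a plain boolean, it can be polled at startup. The following is a sketch under that assumption only; the helper name wait_for_database and its attempts and delay parameters are hypothetical and not part of the library.

```python
import time


def wait_for_database(db, attempts=5, delay=1.0):
    """Poll db.status() until it returns True or the attempts run out.

    Returns True as soon as a check succeeds, otherwise False.
    """
    for attempt in range(attempts):
        if db.status():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # pause between checks, but not after the last one
    return False
```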
reload()
Reload the database metadata and registered models.
register_model(model)
Register a model class with the database manager.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | type[DatabaseModel] | Peewee model class to register | required |
Source code in docketanalyzer/services/psql.py
load_table_class(name, new=False)
Dynamically create a model class for a database table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the table | required |
new | bool | Whether this is a new table being created | False |
Returns:
Name | Type | Description |
---|---|---|
type | type[DatabaseModel] | A new DatabaseModel subclass representing the table |
Raises:
Type | Description |
---|---|
KeyError | If table doesn't exist and new=False |
Source code in docketanalyzer/services/psql.py
create_table(name_or_model, exists_ok=True)
Create a new table in the database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name_or_model | Union[str, Type[DatabaseModel]] | Name of the table to create or model class | required |
exists_ok | bool | Whether to silently continue if table exists | True |
Raises:
Type | Description |
---|---|
ValueError | If table exists and exists_ok=False |
Source code in docketanalyzer/services/psql.py
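The exists_ok flag controls whether an existing table is an error. As a sketch of the strict mode, the hypothetical helper below passes exists_ok=False and converts the documented ValueError into a boolean result.

```python
def create_table_if_missing(db, name):
    """Return True if the table was created, False if it already existed.

    `db` is expected to expose create_table(name_or_model, exists_ok) as
    documented above.
    """
    try:
        db.create_table(name, exists_ok=False)
    except ValueError:
        # Documented behavior: raised when the table exists and exists_ok=False.
        return False
    return True
```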
drop_table(name, confirm=True)
Drop a table from the database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the table to drop | required |
confirm | bool | Whether to prompt for confirmation before dropping | True |
Raises:
Type | Description |
---|---|
Exception | If confirmation is required and user does not confirm |
Source code in docketanalyzer/services/psql.py
DatabaseModel
Bases: DatabaseModelQueryMixin, Model
A base model class that extends Peewee's Model with additional functionality.
This class provides enhanced database operations including pandas DataFrame conversion, batch processing, column management, and model reloading capabilities.
Source code in docketanalyzer/services/psql.py
drop_column(column_name, confirm=True)
classmethod
Drop a column from the database table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column_name | str | Name of the column to drop | required |
confirm | bool | Whether to prompt for confirmation before dropping | True |
Source code in docketanalyzer/services/psql.py
add_column(column_name, column_type, null=True, overwrite=False, exists_ok=True, **kwargs)
classmethod
Add a new column to the database table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column_name | str | Name of the new column | required |
column_type | str | Peewee field type for the column | required |
null | bool | Whether the column can contain NULL values | True |
overwrite | bool | Whether to overwrite if column exists | False |
exists_ok | bool | Whether to silently continue if column exists | True |
**kwargs | Any | Additional field parameters passed to Peewee | {} |
Source code in docketanalyzer/services/psql.py
add_data(data, copy=False, batch_size=1000)
classmethod
Add data to the table from a pandas DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | DataFrame containing the data to insert | required |
copy | bool | Whether to use Postgres COPY command for faster insertion | False |
batch_size | int | Number of records to insert in each batch when not using COPY | 1000 |
Source code in docketanalyzer/services/psql.py
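To illustrate what batch_size means when COPY is not used, the sketch below splits a sequence of records into batches of at most batch_size items. It is a generic illustration of batching written with the standard library, not the library's own insertion code; the name iter_batches is hypothetical.

```python
from itertools import islice


def iter_batches(records, batch_size=1000):
    """Yield lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

With the default batch_size of 1000, 2500 records would be split into batches of 1000, 1000, and 500.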
reload()
classmethod
Reload the model class to reflect any changes in the database schema.
Source code in docketanalyzer/services/psql.py
S3
A class for syncing local data with an S3 bucket.
Attributes:
Name | Type | Description |
---|---|---|
data_dir | Path | Local directory for data storage. |
bucket | Path | S3 bucket name. |
endpoint_url | Optional[str] | Custom S3 endpoint URL. |
client | client | Boto3 S3 client for direct API interactions. |
Source code in docketanalyzer/services/s3.py
__init__(data_dir=None)
Initialize the S3 service.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_dir | Optional[str] | Path to local data directory. If None, uses env.DATA_DIR. | None |
Source code in docketanalyzer/services/s3.py
push(path=None, from_path=None, to_path=None, **kwargs)
Push data from local storage to S3.
Syncs files from a local directory to an S3 bucket path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Optional[Union[str, Path]] | If provided, used as both from_path and to_path. | None |
from_path | Optional[Union[str, Path]] | Local source path to sync from. | None |
to_path | Optional[Union[str, Path]] | S3 destination path to sync to. | None |
**kwargs | Any | Additional arguments to pass to the AWS CLI s3 sync command. | {} |
Source code in docketanalyzer/services/s3.py
pull(path=None, from_path=None, to_path=None, **kwargs)
Pull data from S3 to local storage.
Syncs files from an S3 bucket path to a local directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Optional[Union[str, Path]] | If provided, used as both from_path and to_path. | None |
from_path | Optional[Union[str, Path]] | S3 source path to sync from. | None |
to_path | Optional[Union[str, Path]] | Local destination path to sync to. | None |
**kwargs | Any | Additional arguments to pass to the AWS CLI s3 sync command. | {} |
Source code in docketanalyzer/services/s3.py
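The path shortcut shared by push and pull can be sketched as follows. This only mirrors the documented rule that path, when provided, is used as both from_path and to_path; how the library treats the case where all three arguments are None is not covered here, and the function name resolve_sync_paths is hypothetical.

```python
def resolve_sync_paths(path=None, from_path=None, to_path=None):
    """Return (from_path, to_path), letting `path` stand in for both."""
    if path is not None:
        # Documented rule: a single `path` is used for source and destination.
        from_path = path
        to_path = path
    return from_path, to_path
```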
download(s3_key, local_path=None)
Download a single file from S3 using the boto3 client.
This method downloads a specific file from S3 to a local path. If local_path is not provided, it will mirror the S3 path structure in the data directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
s3_key | str | The key of the file in the S3 bucket. | required |
local_path | Optional[Union[str, Path]] | The local path to save the file to. If None, the file will be saved to data_dir/s3_key. | None |
Returns:
Name | Type | Description |
---|---|---|
Path | Path | The path to the downloaded file. |
Raises:
Type | Description |
---|---|
ClientError | If the download fails. |
Source code in docketanalyzer/services/s3.py
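The default destination described above (data_dir/s3_key when local_path is None) can be expressed with pathlib. This is an illustrative sketch of the documented rule, not the library's code; default_local_path is a hypothetical name.

```python
from pathlib import Path


def default_local_path(data_dir, s3_key, local_path=None):
    """Return the local destination for an S3 key.

    An explicit local_path wins; otherwise the key is mirrored under data_dir.
    """
    if local_path is not None:
        return Path(local_path)
    return Path(data_dir) / s3_key
```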
upload(local_path, s3_key=None)
Upload a single file to S3 using the boto3 client.
This method uploads a specific file from a local path to S3. If s3_key is not provided, it will use the relative path from data_dir as the S3 key.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local_path | Union[str, Path] | The local path of the file to upload. | required |
s3_key | Optional[str] | The key to use in the S3 bucket. If None, the relative path from data_dir will be used. | None |
Returns:
Name | Type | Description |
---|---|---|
str | str | The S3 key of the uploaded file. |
Raises:
Type | Description |
---|---|
FileNotFoundError | If the local file does not exist. |
ClientError | If the upload fails. |
Source code in docketanalyzer/services/s3.py
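The default key for upload is the file's path relative to data_dir. A sketch of that derivation follows; default_s3_key is a hypothetical name, and the use of forward slashes in the key is an assumption based on common S3 key conventions.

```python
from pathlib import Path


def default_s3_key(data_dir, local_path, s3_key=None):
    """Return the S3 key for a local file.

    An explicit s3_key wins; otherwise the path relative to data_dir is used.
    """
    if s3_key is not None:
        return s3_key
    # relative_to raises ValueError if local_path is not inside data_dir.
    return Path(local_path).relative_to(Path(data_dir)).as_posix()
```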
delete(s3_key)
Delete a single file from S3 using the boto3 client.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
s3_key | str | The key of the file in the S3 bucket to delete. | required |
Raises:
Type | Description |
---|---|
ClientError | If the deletion fails. |
Source code in docketanalyzer/services/s3.py
load_elastic(**kwargs)
Load an Elasticsearch client with the configured connection URL.
Run da configure elastic to set the connection URL.
load_psql()
Load a Database object using the connection url in your config.
Run da configure postgres to set your PostgreSQL connection URL.
load_redis(**kwargs)
Load a Redis client with the configured connection URL.
Run da configure redis to set the connection URL.
load_s3(data_dir=None)
Load the S3 service.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_dir | Optional[Union[str, Path]] | Path to local data directory. If None, uses env.DATA_DIR. | None |
Returns:
Name | Type | Description |
---|---|---|
S3 | S3 | An instance of the S3 class. |
Source code in docketanalyzer/services/s3.py
utils
extension_required
Context manager for extension imports.
Source code in docketanalyzer/utils/utils.py
__init__(extension)
__enter__()
__exit__(exc_type, exc_val, exc_tb)
Handle import errors with helpful messages.
Source code in docketanalyzer/utils/utils.py
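The general pattern behind such a context manager can be sketched as below: an ImportError raised inside the with block is re-raised with a message naming the missing extension. The class name require_extension and the message wording are assumptions for illustration and may differ from what extension_required actually prints.

```python
class require_extension:
    """Context manager that turns an ImportError into a more helpful one."""

    def __init__(self, extension):
        self.extension = extension

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is not None and issubclass(exc_type, ImportError):
            # Re-raise with a hint, keeping the original error as the cause.
            raise ImportError(
                f"The '{self.extension}' extension is required for this feature. "
                "Install it and try again."
            ) from exc_val
        return False  # let any other exception propagate unchanged
```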
timeit
Context manager for timing things.
Usage:
    with timeit("Task"):
        # do something
        do_something()
This will print the time taken to execute the block of code.
Source code in docketanalyzer/utils/utils.py
__init__(description='Task')
__enter__()
__exit__(exc_type, exc_val, exc_tb)
Print the execution time.
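A timing context manager of this kind can be sketched with time.perf_counter. The class below is an independent illustration with the same three methods; the name Timer, the elapsed attribute, and the printed message format are assumptions and not taken from the library.

```python
import time


class Timer:
    """Measure and print how long the enclosed block takes to run."""

    def __init__(self, description="Task"):
        self.description = description
        self.elapsed = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.elapsed = time.perf_counter() - self._start
        print(f"{self.description} took {self.elapsed:.4f} seconds")
        return False  # do not suppress exceptions raised in the block
```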
parse_docket_id(docket_id)
Parse a docket ID into a court and docket number.
construct_docket_id(court, docket_number)
Construct a docket ID from a court and docket number.
json_default(obj)
Default JSON serializer for datetime and date objects.
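A serializer of this kind is typically passed to json.dumps through its default argument. The sketch below assumes ISO 8601 strings as the output format, which this reference does not confirm; iso_json_default is a hypothetical stand-in for json_default.

```python
import json
from datetime import date, datetime


def iso_json_default(obj):
    """Serialize datetime and date objects as ISO 8601 strings."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    # Mirror json's own behavior for anything else it cannot encode.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


payload = {"filed": date(2024, 1, 15), "updated": datetime(2024, 1, 15, 9, 30)}
encoded = json.dumps(payload, default=iso_json_default)
```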
notabs(text)
download_file(url, path, description='Downloading')
Download file from URL to local path with progress bar.
Source code in docketanalyzer/utils/utils.py
generate_hash(data, salt=None, length=None)
Generate a hash for some data with optional salt.
Source code in docketanalyzer/utils/utils.py
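One way a salted, optionally truncated hash could work is sketched below. The choice of SHA-256, the way the salt is prepended, and truncation of the hex digest are all assumptions; this reference does not state which algorithm generate_hash uses, so output from this sketch should not be expected to match it.

```python
import hashlib


def sketch_hash(data, salt=None, length=None):
    """Hash the string form of data, with an optional salt and max length."""
    text = str(data)
    if salt is not None:
        text = f"{salt}{text}"  # assumption: salt is prepended to the data
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return digest if length is None else digest[:length]
```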
generate_code(length=16)
Generate a random code of specified length.
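A random code of a given length can be produced with the standard secrets module, as in the sketch below. The alphabet (ASCII letters and digits) is an assumption; the character set used by generate_code is not documented here.

```python
import secrets
import string


def sketch_code(length=16):
    """Return a random alphanumeric string of the requested length."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))
```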
pd_save_or_append(data, path, **kwargs)
Save or append a DataFrame to a CSV file.
Source code in docketanalyzer/utils/utils.py
datetime_utcnow()
list_to_array(data)
to_date(value)
Convert a value to a date if possible.