STJLib Modules

STJLib: Standard Transcription JSON Format Handler

A comprehensive implementation of the Standard Transcription JSON (STJ) format for representing transcribed audio and video data.

Module Organization:
  • Type definitions and type hints

  • Exception classes for error handling

  • Main STJ handler class

  • Helper functions and utilities

Key Features:
  • Complete STJ format implementation

  • Load and save STJ files with robust error handling

  • Comprehensive validation against specification

  • Type-safe data structures

  • Extensible architecture

  • Full support for all STJ components:
    • Metadata and source information

    • Transcript content and timing

    • Speaker identification

    • Word-level timing and confidence

    • Text formatting and styles

    • Custom extensions

Example

```python
from stjlib import StandardTranscriptionJSON

# Metadata and Transcriber are stjlib data classes and must be imported too;
# their import path is not shown in this section.

# Create a new transcript
stj = StandardTranscriptionJSON(
    metadata=Metadata(
        transcriber=Transcriber(name="MyTranscriber", version="1.0")
    )
)

# Add content
stj.add_speaker("s1", "John")
stj.add_segment(
    text="Hello world", start=0.0, end=1.5, speaker_id="s1"
)

# Load existing data
existing = StandardTranscriptionJSON.from_file("transcript.stjson")
```

Note

For detailed format information, see: https://github.com/yaniv-golan/STJ

Format Structure:

The STJ format wraps content in a root "stj" object:

    {
        "stj": {
            "version": "0.6.1",
            "metadata": {
                "transcriber": {"name": str, "version": str},
                "created_at": str,  # ISO 8601 UTC timestamp
                ...
            },
            "transcript": {
                "segments": [...],
                "speakers": [...],
                ...
            }
        }
    }
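As a rough illustration of the structure above, the following sketch builds a minimal STJ-shaped dictionary using only the standard library. The helper name `build_minimal_stj` and the sample values are assumptions made for this example; it does not show how stjlib constructs documents internally, and it makes no claim that the result passes validation.

```python
import json
from datetime import datetime, timezone


def build_minimal_stj(transcriber_name: str, transcriber_version: str) -> dict:
    """Build a dict shaped like the root "stj" object (hypothetical helper)."""
    # ISO 8601 timestamp in UTC, e.g. 2024-01-01T12:00:00+00:00
    created_at = datetime.now(timezone.utc).isoformat()
    return {
        "stj": {
            "version": "0.6.1",
            "metadata": {
                "transcriber": {
                    "name": transcriber_name,
                    "version": transcriber_version,
                },
                "created_at": created_at,
            },
            "transcript": {"segments": [], "speakers": []},
        }
    }


doc = build_minimal_stj("MyTranscriber", "1.0")
print(json.dumps(doc, indent=2))
```

Everything lives under the single "stj" key, so a consumer can tell an STJ document apart from arbitrary JSON by checking for that root object.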

exception stjlib.stj.STJError

Bases: Exception

Base class for exceptions in the STJ module.

This class is the parent of all STJ-specific exceptions, so callers can catch any STJ-related error with a single except clause.

Example

```python
from stjlib.stj import StandardTranscriptionJSON, STJError

try:
    stj = StandardTranscriptionJSON.from_file("invalid.json")
except STJError as e:
    print(f"STJ error occurred: {e}")
```

class stjlib.stj.StandardTranscriptionJSON(metadata=None, transcript=None, validate=False)

Bases: object

Handler for Standard Transcription JSON (STJ) format.

This class implements version 0.6.1 of the STJ specification, providing a high-level interface for working with STJ documents.

Format Structure:

The STJ format wraps content in a root "stj" object:

    {
        "stj": {
            "version": "0.6.1",
            "metadata": {
                "transcriber": {"name": str, "version": str},
                "created_at": str,  # ISO 8601 UTC timestamp
                ...
            },
            "transcript": {
                "segments": [...],
                "speakers": [...],
                ...
            }
        }
    }

Features:
  • Create new transcripts

  • Load existing STJ files

  • Add/modify transcript content

  • Validate STJ data

  • Save transcripts to files

Example

>>> stj = StandardTranscriptionJSON(
...     metadata=Metadata(
...         transcriber=Transcriber(name="MyTranscriber", version="1.0")
...     )
... )
>>> stj.add_speaker("s1", "John")
>>> stj.add_segment(text="Hello", start=0.0, end=1.5, speaker_id="s1")

Note

The STJ format is a standardized way to represent transcribed audio/video content with support for timing, speakers, and metadata.

Parameters:
  • metadata (Metadata | None)

  • transcript (Transcript | None)

  • validate (bool)

add_segment(text, start=None, end=None, *, speaker_id=None, language=None, **kwargs)

Add a new transcript segment.

Parameters:
  • text (str) – The transcribed text

  • start (float | None) – Start time in seconds

  • end (float | None) – End time in seconds

  • speaker_id (str | None) – Optional ID of the speaker

  • language (str | None) – Optional language code

  • **kwargs – Additional segment properties

Raises:

ValueError – If text is empty or if end time is before start time

Return type:

None
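The two error conditions listed for add_segment() can be pictured with a small stand-alone check. This is only a sketch: `check_segment_args` is a hypothetical name, and details such as whether whitespace-only text counts as empty, or whether end may equal start, are assumptions rather than documented stjlib behavior.

```python
from typing import Optional


def check_segment_args(text: str, start: Optional[float], end: Optional[float]) -> None:
    """Hypothetical stand-in for the argument checks described for add_segment()."""
    if not text:
        raise ValueError("text must not be empty")
    # Compare times only when both are given; either one may be None.
    if start is not None and end is not None and end < start:
        raise ValueError("end time must not be before start time")


check_segment_args("Hello world", 0.0, 1.5)  # passes silently
check_segment_args("No timing", None, None)  # start and end are optional
```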

add_speaker(id, name=None)

Add a new speaker.

Parameters:
  • id (str) – Unique speaker identifier

  • name (str | None) – Optional display name for the speaker

Raises:

ValueError – If id is empty or if speaker with same id already exists

Return type:

None
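The uniqueness rule for speaker ids might be sketched as follows. `SpeakerRegistry` is a hypothetical container invented for this example; stjlib stores speakers on the transcript object, and its internals may look quite different.

```python
from typing import Dict, List, Optional


class SpeakerRegistry:
    """Hypothetical container illustrating the unique-id rule for speakers."""

    def __init__(self) -> None:
        self._names: Dict[str, Optional[str]] = {}

    def add(self, speaker_id: str, name: Optional[str] = None) -> None:
        if not speaker_id:
            raise ValueError("speaker id must not be empty")
        if speaker_id in self._names:
            raise ValueError(f"speaker {speaker_id!r} already exists")
        self._names[speaker_id] = name

    def ids(self) -> List[str]:
        # Dicts preserve insertion order, so ids come back in the order added.
        return list(self._names)


registry = SpeakerRegistry()
registry.add("s1", "John")
registry.add("s2")  # the display name is optional
```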

clear_segments()

Remove all segments from the transcript.

Return type:

None

classmethod create_from_stj(stj)

Internal method that creates an instance from an STJ object.

This method is used internally by:
  • from_dict()

  • from_file()

For new instances, use the constructor.

Parameters:

stj (STJ) – An existing STJ object

Returns:

New instance wrapping the STJ object

Return type:

StandardTranscriptionJSON

Note

This is an internal method and should not be used directly. Use the constructor for creating new instances.

classmethod from_dict(data, validate=False)

Creates a StandardTranscriptionJSON instance from a dictionary.

Parameters:
  • data (Dict[str, Any]) – Dictionary containing STJ data

  • validate (bool) – Whether to validate the data (defaults to False)

Returns:

A new StandardTranscriptionJSON instance

Return type:

StandardTranscriptionJSON

Raises:

ValidationError – If validate=True and the data fails validation

classmethod from_file(filename, validate=False)

Creates a StandardTranscriptionJSON instance from a JSON file.

Parameters:
  • filename (str) – Path to the JSON file to load

  • validate (bool) – Whether to validate the loaded data

Returns:

New instance with loaded data

Return type:

StandardTranscriptionJSON

Raises:
  • FileNotFoundError – If file doesn’t exist

  • ValidationError – If validation fails or data structure is invalid
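To make the two failure modes concrete, here is a stand-alone sketch of a loader that reads a JSON file and checks for the root "stj" object. `load_stj_dict` is a hypothetical helper, and it raises a plain ValueError where stjlib is documented to raise ValidationError; the temporary file exists only so the example runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_stj_dict(filename: str) -> dict:
    """Read a JSON file and return it if it has a root "stj" object (hypothetical)."""
    # Path.read_text raises FileNotFoundError when the file does not exist.
    data = json.loads(Path(filename).read_text(encoding="utf-8"))
    if not isinstance(data, dict) or "stj" not in data:
        raise ValueError('missing root "stj" object')
    return data


# Write a tiny sample file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "transcript.stjson"
    sample.write_text(json.dumps({"stj": {"version": "0.6.1"}}), encoding="utf-8")
    loaded = load_stj_dict(str(sample))

print(loaded["stj"]["version"])
```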

get_segments_by_speaker(speaker_id)

Get all segments for a specific speaker.

Parameters:

speaker_id (str) – ID of the speaker to find segments for

Returns:

List of segments with matching speaker_id. Empty list if none found.

Return type:

List[Segment]

Raises:

ValueError – If speaker_id is empty or whitespace
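The lookup described here amounts to a filter over the segment list. The sketch below uses plain dictionaries instead of stjlib's Segment objects, and `segments_for_speaker` is a hypothetical name chosen for the example.

```python
from typing import Any, Dict, List


def segments_for_speaker(
    segments: List[Dict[str, Any]], speaker_id: str
) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: keep the segments whose "speaker_id" matches."""
    if not speaker_id or not speaker_id.strip():
        raise ValueError("speaker_id must not be empty or whitespace")
    return [seg for seg in segments if seg.get("speaker_id") == speaker_id]


segments = [
    {"text": "Hello", "start": 0.0, "end": 1.5, "speaker_id": "s1"},
    {"text": "Hi there", "start": 1.5, "end": 2.4, "speaker_id": "s2"},
    {"text": "How are you?", "start": 2.4, "end": 3.6, "speaker_id": "s1"},
]
johns = segments_for_speaker(segments, "s1")
unknown = segments_for_speaker(segments, "s9")  # no match gives an empty list
```

Returning an empty list for an unknown speaker, rather than raising, lets callers loop over the result without a special case.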

get_speaker(speaker_id)

Get speaker by ID.

Parameters:

speaker_id (str) – ID of the speaker to find

Returns:

Speaker if found, None if not found

Return type:

Optional[Speaker]

Raises:

ValueError – If speaker_id is empty or whitespace

property metadata: Metadata | None

Access to the STJ metadata.

Returns:

The metadata object or None if not present

Return type:

Optional[Metadata]

Raises:

ValueError – If STJ instance is not properly initialized

to_dict()

Convert to STJ format dictionary.

Returns:

Dictionary in the STJ format with structure:

    {
        "stj": {
            "version": str,
            "metadata": Optional[Dict[str, Any]],
            "transcript": Dict[str, Any]
        }
    }

Return type:

STJDict

to_file(filename)

Saves the STJ instance to a JSON file.

Parameters:

filename (str) – Path where the JSON file should be written

Raises:

IOError – If there’s an error writing to the file

Return type:

None

property transcript: Transcript

Access to the STJ transcript.

Returns:

The transcript object containing all content

Return type:

Transcript

Raises:

ValueError – If STJ instance is not properly initialized

validate(raise_exception=True)

Validates the STJ data according to specification requirements.

Parameters:

raise_exception (bool) – If True, raises ValidationError for any issues.

Returns:

List of validation issues if found, None if valid.

Return type:

Optional[ValidationIssues]

Raises:

ValidationError – If validation fails and raise_exception is True.
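The raise_exception switch follows a common collect-then-decide pattern, sketched below without stjlib. The function name `validate_root`, the three checks, and the use of ValueError in place of ValidationError are all assumptions for illustration; the real rule set comes from the STJ specification.

```python
from typing import List, Optional


def validate_root(data: dict, raise_exception: bool = True) -> Optional[List[str]]:
    """Collect problems first, then either raise or return them (hypothetical checks)."""
    issues: List[str] = []
    stj = data.get("stj")
    if not isinstance(stj, dict):
        issues.append('missing root "stj" object')
    else:
        if "version" not in stj:
            issues.append('missing "version"')
        if "transcript" not in stj:
            issues.append('missing "transcript"')

    if not issues:
        return None  # valid: nothing to report
    if raise_exception:
        # ValueError stands in for the library's ValidationError here.
        raise ValueError("; ".join(issues))
    return issues


ok = validate_root({"stj": {"version": "0.6.1", "transcript": {}}})
problems = validate_root({"stj": {}}, raise_exception=False)
```

Passing raise_exception=False suits tools that want to report every problem at once; the default suits code that should stop at the first invalid document.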

exception stjlib.stj.ValidationError(issues)

Bases: STJError

Exception raised when STJ validation fails.

This exception includes a list of validation issues that describe the specific problems found during validation.

Parameters:

issues (List[ValidationIssue])

issues

List of validation issues found

Type:

List[ValidationIssue]

Example

```python
from stjlib.stj import StandardTranscriptionJSON, ValidationError

try:
    stj = StandardTranscriptionJSON.from_file(
        "transcript.json", validate=True
    )
except ValidationError as e:
    print("Validation failed:")
    for issue in e.issues:
        print(f"{issue.severity}: {issue}")
```

__str__()

Returns a formatted string of all validation issues.

Returns:

Multi-line string containing all validation issues

Return type:

str
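An exception that carries a list of issues and renders them as a multi-line string could be built along these lines. `Issue`, `BaseSketchError`, and `IssuesError` are hypothetical names; stjlib's ValidationIssue and its actual output format may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Issue:
    """Hypothetical issue record; stjlib's ValidationIssue may differ."""

    severity: str
    message: str

    def __str__(self) -> str:
        return self.message


class BaseSketchError(Exception):
    """Plays the role of a library-wide base exception."""


class IssuesError(BaseSketchError):
    """Carries the issues and formats them one per line."""

    def __init__(self, issues: List[Issue]) -> None:
        super().__init__()
        self.issues = issues

    def __str__(self) -> str:
        return "\n".join(f"{issue.severity}: {issue}" for issue in self.issues)


try:
    raise IssuesError(
        [
            Issue("error", "segment end precedes start"),
            Issue("warning", "language code missing"),
        ]
    )
except BaseSketchError as exc:  # catching the base class also catches the subclass
    report = str(exc)
```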