API Reference

StandardTranscriptionJSON

class stjlib.StandardTranscriptionJSON(metadata=None, transcript=None, validate=False)

Bases: object

Handler for Standard Transcription JSON (STJ) format.

This class implements version 0.6.1 of the STJ specification, providing a high-level interface for working with STJ documents.

Format Structure:

The STJ format wraps content in a root "stj" object:

{
    "stj": {
        "version": "0.6.1",
        "metadata": {
            "transcriber": {"name": str, "version": str},
            "created_at": str,  # ISO 8601 UTC timestamp
            …
        },
        "transcript": {
            "segments": […],
            "speakers": […],
            …
        }
    }
}
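As a rough illustration of this layout, the sketch below builds a minimal document as a plain dictionary and round-trips it through the standard `json` module. It does not use stjlib; the helper name `build_minimal_stj` and the sample values are assumptions made for this example.

```python
import json
from datetime import datetime, timezone


def build_minimal_stj(text: str) -> dict:
    """Build a minimal STJ-shaped dictionary (hypothetical helper, not part of stjlib)."""
    created_at = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
    return {
        "stj": {
            "version": "0.6.1",
            "metadata": {
                "transcriber": {"name": "MyTranscriber", "version": "1.0"},
                # ISO 8601 UTC timestamp with a trailing "Z"
                "created_at": created_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
            },
            "transcript": {
                "speakers": [{"id": "s1", "name": "John"}],
                "segments": [
                    {"text": text, "start": 0.0, "end": 1.5, "speaker_id": "s1"}
                ],
            },
        }
    }


document = build_minimal_stj("Hello")
# Serialize and parse again to confirm the structure is valid JSON.
round_tripped = json.loads(json.dumps(document))
```

Everything sits under the single root key "stj", so a consumer can reject a file early if that key is missing.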

Features:
  • Create new transcripts

  • Load existing STJ files

  • Add/modify transcript content

  • Validate STJ data

  • Save transcripts to files

Example

>>> stj = StandardTranscriptionJSON(
...     metadata=Metadata(
...         transcriber=Transcriber(name="MyTranscriber", version="1.0")
...     )
... )
>>> stj.add_speaker("s1", "John")
>>> stj.add_segment(text="Hello", start=0.0, end=1.5, speaker_id="s1")

Note

The STJ format is a standardized way to represent transcribed audio/video content with support for timing, speakers, and metadata.

Parameters:
  • metadata (Metadata | None)

  • transcript (Transcript | None)

  • validate (bool)

add_segment(text, start=None, end=None, *, speaker_id=None, language=None, **kwargs)

Add a new transcript segment.

Parameters:
  • text (str) – The transcribed text

  • start (float | None) – Start time in seconds

  • end (float | None) – End time in seconds

  • speaker_id (str | None) – Optional ID of the speaker

  • language (str | None) – Optional language code

  • **kwargs – Additional segment properties

Raises:

ValueError – If text is empty or if end time is before start time

Return type:

None

add_speaker(id, name=None)

Add a new speaker.

Parameters:
  • id (str) – Unique speaker identifier

  • name (str | None) – Optional display name for the speaker

Raises:

ValueError – If id is empty or if speaker with same id already exists

Return type:

None

clear_segments()

Remove all segments from the transcript.

Return type:

None

classmethod create_from_stj(stj)

Internal method to create instance from STJ object.

This method is used internally by:
  • from_dict()

  • from_file()

For new instances, use the constructor.

Parameters:

stj (STJ) – An existing STJ object

Returns:

New instance wrapping the STJ object

Return type:

StandardTranscriptionJSON

Note

This is an internal method and should not be used directly. Use the constructor for creating new instances.

classmethod from_dict(data, validate=False)

Creates a StandardTranscriptionJSON instance from a dictionary.

Parameters:
  • data (Dict[str, Any]) – Dictionary containing STJ data

  • validate (bool) – Whether to validate the data (defaults to False)

Returns:

A new StandardTranscriptionJSON instance

Return type:

StandardTranscriptionJSON

Raises:

ValidationError – If validate=True and the data fails validation

classmethod from_file(filename, validate=False)

Creates a StandardTranscriptionJSON instance from a JSON file.

Parameters:
  • filename (str) – Path to the JSON file to load

  • validate (bool) – Whether to validate the loaded data

Returns:

New instance with loaded data

Return type:

StandardTranscriptionJSON

Raises:
  • FileNotFoundError – If file doesn’t exist

  • ValidationError – If validation fails or data structure is invalid

get_segments_by_speaker(speaker_id)

Get all segments for a specific speaker.

Parameters:

speaker_id (str) – ID of the speaker to find segments for

Returns:

List of segments with matching speaker_id. Empty list if none found.

Return type:

List[Segment]

Raises:

ValueError – If speaker_id is empty or whitespace
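The documented behavior (reject an empty or whitespace-only ID, return an empty list when nothing matches) can be sketched on plain dictionaries. This is not stjlib's implementation; `segments_for_speaker` is a hypothetical helper written only to show the contract.

```python
from typing import Any, Dict, List


def segments_for_speaker(
    segments: List[Dict[str, Any]], speaker_id: str
) -> List[Dict[str, Any]]:
    """Return segments whose speaker_id matches (illustrative, not stjlib's code)."""
    if not speaker_id or not speaker_id.strip():
        raise ValueError("speaker_id must not be empty or whitespace")
    return [seg for seg in segments if seg.get("speaker_id") == speaker_id]


segments = [
    {"text": "Hello", "speaker_id": "s1"},
    {"text": "Hi there", "speaker_id": "s2"},
    {"text": "How are you?", "speaker_id": "s1"},
]
matches = segments_for_speaker(segments, "s1")
no_matches = segments_for_speaker(segments, "s3")  # unknown ID gives an empty list
```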

get_speaker(speaker_id)

Get speaker by ID.

Parameters:

speaker_id (str) – ID of the speaker to find

Returns:

Speaker if found, None if not found

Return type:

Optional[Speaker]

Raises:

ValueError – If speaker_id is empty or whitespace

property metadata: Metadata | None

Access to the STJ metadata.

Returns:

The metadata object or None if not present

Return type:

Optional[Metadata]

Raises:

ValueError – If STJ instance is not properly initialized

to_dict()

Convert to STJ format dictionary.

Returns:

Dictionary in the STJ format with structure:

{
    "stj": {
        "version": str,
        "metadata": Optional[Dict[str, Any]],
        "transcript": Dict[str, Any]
    }
}

Return type:

STJDict

to_file(filename)

Saves the STJ instance to a JSON file.

Parameters:

filename (str) – Path where the JSON file should be written

Raises:

IOError – If there’s an error writing to the file

Return type:

None

property transcript: Transcript

Access to the STJ transcript.

Returns:

The transcript object containing all content

Return type:

Transcript

Raises:

ValueError – If STJ instance is not properly initialized

validate(raise_exception=True)

Validates the STJ data according to specification requirements.

Parameters:

raise_exception (bool) – If True, raises ValidationError for any issues.

Returns:

List of validation issues if found, None if valid.

Return type:

Optional[ValidationIssues]

Raises:

ValidationError – If validation fails and raise_exception is True.

Data Classes

Metadata

class stjlib.Metadata(transcriber=None, created_at=None, source=None, languages=None, confidence_threshold=None, extensions=None, _invalid_type=None)

Metadata for the Standard Transcription JSON (STJ).

This class contains various metadata fields that provide context and information about the transcription process and content.

Parameters:
  • transcriber (Transcriber | None)

  • created_at (datetime | None)

  • source (Source | None)

  • languages (List[str] | None)

  • confidence_threshold (float | None)

  • extensions (Dict[str, Any] | None)

  • _invalid_type (str | None)

transcriber

Information about the transcription system

Type:

Optional[Transcriber]

created_at

Timestamp when transcription was created

Type:

Optional[datetime]

source

Information about the source media

Type:

Optional[Source]

languages

List of languages in the transcription

Type:

Optional[List[str]]

confidence_threshold

Minimum confidence score for words

Type:

Optional[float]

extensions

Additional metadata key-value pairs

Type:

Optional[Dict[str, Any]]

Example

```python
from datetime import datetime, timezone

# Create metadata with transcriber and timestamp
metadata = Metadata(
    transcriber=Transcriber(name="AutoTranscribe", version="2.1.0"),
    created_at=datetime.now(timezone.utc),
    languages=["en", "es"],
    confidence_threshold=0.8,
)

# Access metadata information
print(f"Created: {metadata.created_at.isoformat()}")
print(f"Languages: {', '.join(metadata.languages)}")
```

Note

  • All fields are optional

  • created_at must be timezone-aware if present

  • confidence_threshold must be between 0.0 and 1.0

  • languages must be valid ISO 639-1 or ISO 639-3 codes

  • extensions can contain arbitrary metadata

confidence_threshold: float | None = None
created_at: datetime | None = None
extensions: Dict[str, Any] | None = None
classmethod from_dict(data)

Creates a Metadata instance from a dictionary.

Handles timestamp parsing but preserves all other data as-is without validation.

Parameters:

data (Dict[str, Any]) – Dictionary containing metadata fields

Returns:

A new Metadata instance

Return type:

Metadata

Example

```python
data = {
    "transcriber": {"name": "AutoTranscribe", "version": "2.1.0"},
    "created_at": "2023-01-01T12:00:00Z",
    "languages": ["en", "es"],
}
metadata = Metadata.from_dict(data)
```

languages: List[str] | None = None
source: Source | None = None
to_dict()

Converts the Metadata instance to a dictionary.

Returns:

Dictionary representation of the metadata. Returns None if no fields are set.

Return type:

Dict[str, Any]

Example

```python
from datetime import datetime, timezone

metadata = Metadata(
    transcriber=Transcriber(name="AutoTranscribe"),
    created_at=datetime.now(timezone.utc),
)
data = metadata.to_dict()
```

Note

  • created_at is converted to UTC and ISO format with ‘Z’ suffix

  • Empty optional fields are omitted from the output
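The timestamp conversion described in this note can be sketched with the standard library alone. The helper `to_utc_z` is hypothetical and only shows one way to get a UTC ISO 8601 string with a "Z" suffix; stjlib may do it differently.

```python
from datetime import datetime, timedelta, timezone


def to_utc_z(moment: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 UTC with a 'Z' suffix."""
    if moment.tzinfo is None or moment.utcoffset() is None:
        raise ValueError("created_at must be timezone-aware")
    utc = moment.astimezone(timezone.utc)
    # isoformat() renders UTC as "+00:00"; swap that for the "Z" shorthand.
    return utc.isoformat().replace("+00:00", "Z")


# 14:00 at UTC+02:00 is 12:00 UTC.
local = datetime(2023, 1, 1, 14, 0, 0, tzinfo=timezone(timedelta(hours=2)))
stamp = to_utc_z(local)
```

Rejecting naive datetimes up front matches the requirement that created_at be timezone-aware.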

transcriber: Transcriber | None = None

Transcript

class stjlib.Transcript(segments=<factory>, speakers=<factory>, styles=None, _invalid_segments_type=None, _invalid_type=None, _invalid_speakers_type=None, _invalid_styles_type=None)

Main content of the transcription.

This class contains the core transcription data, including speakers, segments, and optional style information. It represents the complete transcribed content with timing and formatting.

Parameters:
  • segments (List[Segment])

  • speakers (List[Speaker])

  • styles (List[Style] | None)

  • _invalid_segments_type (str | None)

  • _invalid_type (str | None)

  • _invalid_speakers_type (str | None)

  • _invalid_styles_type (str | None)

speakers

List of speakers in the transcript. Each speaker has a unique ID and optional metadata. Empty list indicates speaker identification was attempted but found none. Non-empty list indicates speakers were found.

Type:

List[Speaker]

segments

List of transcript segments. Segments contain the actual transcribed content with timing. Must not be empty.

Type:

List[Segment]

styles

Optional list of text formatting styles. Styles can be referenced by segments for formatting. Can be:
  • None: Style processing was not attempted

  • Empty list: Style processing performed but no styles defined

  • List of styles: One or more styles defined

Type:

Optional[List[Style]]

Example

```python
# Create a basic transcript with one speaker and segment
transcript = Transcript(
    speakers=[
        Speaker(id="S1", name="John")
    ],
    segments=[
        Segment(
            text="Hello world",
            start=0.0,
            end=1.5,
            speaker_id="S1"
        )
    ]
)

# Create a transcript where speaker identification found none
transcript = Transcript(
    speakers=[],  # Empty list indicates attempted but none found
    segments=[
        Segment(text="Hello world")
    ]
)

# Create a basic transcript (implicitly attempts speaker identification)
transcript = Transcript(
    segments=[
        Segment(text="Hello world")
    ]
)

# Create a transcript with multiple speakers and styles
transcript = Transcript(
    speakers=[
        Speaker(id="S1", name="John"),
        Speaker(id="S2", name="Jane")
    ],
    segments=[
        Segment(
            text="How are you?",
            start=0.0,
            end=1.5,
            speaker_id="S1",
            style_id="question"
        ),
        Segment(
            text="I'm fine, thanks!",
            start=1.6,
            end=3.0,
            speaker_id="S2"
        )
    ],
    styles=[
        Style(
            id="question",
            text={"color": "#0000FF"}
        )
    ]
)
```

Note

  • segments list must not be empty

  • speakers list will be empty if none found, non-empty if speakers found

  • styles can be:
      - None: style processing not attempted
      - Empty list: styles were processed but none defined
      - List with items: styles were defined

  • segments must be ordered by time and must not overlap

  • all IDs must be unique within their respective lists

  • if speakers/styles are included, references to them must be valid
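The constraints in this note can be sketched as checks over plain dictionaries. This is not how stjlib validates; `check_transcript` is a hypothetical helper, and two choices in it are assumptions: untimed segments are skipped, and a segment that starts exactly where the previous one ends is not treated as an overlap.

```python
from typing import Any, Dict, List


def check_transcript(transcript: Dict[str, Any]) -> List[str]:
    """Collect rule violations for a transcript dict (illustrative only)."""
    problems: List[str] = []
    segments = transcript.get("segments", [])
    speakers = transcript.get("speakers", [])

    if not segments:
        problems.append("segments list must not be empty")

    speaker_ids = [speaker["id"] for speaker in speakers]
    if len(speaker_ids) != len(set(speaker_ids)):
        problems.append("speaker IDs must be unique")

    previous_end = None
    for index, seg in enumerate(segments):
        ref = seg.get("speaker_id")
        if ref is not None and ref not in speaker_ids:
            problems.append(f"segment {index}: unknown speaker_id {ref!r}")
        start, end = seg.get("start"), seg.get("end")
        if start is None or end is None:
            continue  # untimed segments are skipped in this sketch
        if previous_end is not None and start < previous_end:
            problems.append(f"segment {index}: overlaps or is out of order")
        previous_end = end
    return problems


good = {
    "speakers": [{"id": "S1"}],
    "segments": [
        {"text": "a", "start": 0.0, "end": 1.5, "speaker_id": "S1"},
        {"text": "b", "start": 1.6, "end": 3.0, "speaker_id": "S1"},
    ],
}
bad = {
    "speakers": [{"id": "S1"}],
    "segments": [
        {"text": "a", "start": 0.0, "end": 2.0, "speaker_id": "S1"},
        {"text": "b", "start": 1.0, "end": 3.0, "speaker_id": "S2"},
    ],
}
good_problems = check_transcript(good)
bad_problems = check_transcript(bad)
```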

classmethod from_dict(data)

Creates a Transcript instance from a dictionary.

Parameters:

data (Dict[str, Any]) – Dictionary containing transcript data with fields:
  • speakers (optional): List of speaker data. Treated as an empty list if the key is missing (indicating speaker identification was attempted but none found)

  • segments (required): List of segment data

  • styles (optional): List of style data. If the key is missing, treated as "not attempted". If present but empty, treated as "none defined"

Returns:

A new Transcript instance

Return type:

Transcript
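The different defaults for a missing speakers key and a missing styles key can be sketched with `dict.get`. The helper `read_optional_lists` is hypothetical and only mirrors the documented defaults; it is not taken from stjlib.

```python
from typing import Any, Dict, List, Optional, Tuple


def read_optional_lists(
    data: Dict[str, Any]
) -> Tuple[List[Any], Optional[List[Any]]]:
    """Apply the documented defaults for missing keys (illustrative only)."""
    speakers = data.get("speakers", [])  # missing key -> empty list
    styles = data.get("styles")          # missing key -> None ("not attempted")
    return speakers, styles


missing_both = read_optional_lists({"segments": [{"text": "Hello"}]})
empty_styles = read_optional_lists({"segments": [{"text": "Hello"}], "styles": []})
```

The distinction matters on output: a None value for styles means the key is omitted, while an empty list is preserved.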

Example

```python
# Transcript with speakers found
data = {
    "speakers": [{"id": "S1", "name": "John"}],
    "segments": [{
        "text": "Hello", "start": 0.0, "end": 1.0, "speaker_id": "S1"
    }]
}

# Transcript where speaker identification found none
data = {
    "speakers": [],  # Empty list = none found
    "segments": [{
        "text": "Hello", "start": 0.0, "end": 1.0
    }]
}

# Basic transcript (implicitly attempts speaker identification)
data = {
    "segments": [{
        "text": "Hello", "start": 0.0, "end": 1.0
    }]
}

transcript = Transcript.from_dict(data)
```

segments: List[Segment]
speakers: List[Speaker]
styles: List[Style] | None = None
to_dict()

Converts the Transcript instance to a dictionary.

Returns:

Dictionary containing transcript data. Always includes segments and speakers (even if empty). Styles included only if style processing was attempted.

Return type:

Dict[str, Any]

Example

```python
# Basic transcript (implicitly attempted speaker identification)
transcript = Transcript(
    segments=[Segment(text="Hello")]
)
# Result includes empty speakers list:
# {"speakers": [], "segments": […]}

# Transcript with speakers found
transcript = Transcript(
    speakers=[Speaker(id="S1", name="John")],
    segments=[
        Segment(text="Hello", start=0.0, end=1.0, speaker_id="S1")
    ]
)
# Result includes non-empty speakers list:
# {"speakers": [{"id": "S1", "name": "John"}], "segments": […]}
```

Segment

class stjlib.Segment(text, start=None, end=None, is_zero_duration=None, speaker_id=None, confidence=<object object>, language=None, style_id=None, word_timing_mode=None, words=None, extensions=None, _invalid_type=None, _invalid_words_type=None)

Timed segment in the transcript with optional word-level detail.

A segment represents a continuous portion of the transcript with its own timing, speaker, and optional word-level information.

Parameters:
  • text (str | None)

  • start (float | None)

  • end (float | None)

  • is_zero_duration (bool | None)

  • speaker_id (str | None)

  • confidence (float | None)

  • language (str | None)

  • style_id (str | None)

  • word_timing_mode (WordTimingMode | str | None)

  • words (List[Word] | None)

  • extensions (Dict[str, Any] | None)

  • _invalid_type (str | None)

  • _invalid_words_type (str | None)

text

The transcribed text content

Type:

str

start

Start time in seconds from beginning of media

Type:

Optional[float]

end

End time in seconds from beginning of media

Type:

Optional[float]

is_zero_duration

Flag for zero-duration segments

Type:

Optional[bool]

speaker_id

Reference to a speaker.id

Type:

Optional[str]

confidence

Confidence score between 0.0 and 1.0

Type:

Optional[float]

language

ISO 639-1 or ISO 639-3 language code

Type:

Optional[str]

style_id

Reference to a style.id

Type:

Optional[str]

word_timing_mode

Word timing completeness

Type:

Optional[WordTimingMode]

words

Word-level timing and text information

Type:

Optional[List[Word]]

extensions

Additional segment metadata

Type:

Optional[Dict[str, Any]]

Example

```python
# Create a basic segment
segment = Segment(
    text="Hello world",
    start=0.0,
    end=1.5,
    speaker_id="speaker-1"
)

# Create a segment with word timing
segment = Segment(
    text="Hello world",
    start=0.0,
    end=1.5,
    words=[
        Word(text="Hello", start=0.0, end=0.8),
        Word(text="world", start=0.9, end=1.5)
    ],
    word_timing_mode=WordTimingMode.COMPLETE
)
```

Note

  • Segments must not overlap with each other

  • start and end must both be present or both absent

  • start must be >= 0 and end must be >= start

  • confidence must be between 0.0 and 1.0 if present

  • speaker_id must reference a valid speaker

  • style_id must reference a valid style

  • language must be a valid ISO code if present
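The per-segment timing and confidence rules in this note can be sketched as a small checker over a plain dictionary. `check_segment` is a hypothetical helper, not stjlib's validator, and it covers only the rules that need no outside context (speaker, style, and language lookups are left out).

```python
from typing import Any, Dict, List


def check_segment(segment: Dict[str, Any]) -> List[str]:
    """Check timing and confidence rules for one segment (illustrative only)."""
    problems: List[str] = []
    start, end = segment.get("start"), segment.get("end")

    if (start is None) != (end is None):
        problems.append("start and end must both be present or both absent")
    elif start is not None:
        if start < 0:
            problems.append("start must be >= 0")
        if end < start:
            problems.append("end must be >= start")

    confidence = segment.get("confidence")
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be between 0.0 and 1.0")
    return problems


ok = check_segment({"text": "Hello", "start": 0.0, "end": 1.5, "confidence": 0.9})
half_timed = check_segment({"text": "Hello", "start": 2.0})
reversed_times = check_segment(
    {"text": "Hello", "start": 2.0, "end": 1.0, "confidence": 1.2}
)
```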

confidence: float | None = <object object>
end: float | None = None
extensions: Dict[str, Any] | None = None
classmethod from_dict(data)

Creates a Segment instance from a dictionary.

Parameters:

data (Dict[str, Any]) – Dictionary containing segment data with fields:
  • text (required): Segment text content

  • start (optional): Start time in seconds

  • end (optional): End time in seconds

  • is_zero_duration (optional): Zero duration flag

  • speaker_id (optional): Reference to speaker

  • confidence (optional): Confidence score

  • language (optional): Language code

  • style_id (optional): Reference to style

  • word_timing_mode (optional): Word timing mode

  • words (optional): List of word data

  • extensions (optional): Additional metadata

Returns:

A new Segment instance

Return type:

Segment

Example

```python
data = {
    "text": "Hello world",
    "start": 0.0,
    "end": 1.5,
    "speaker_id": "speaker-1",
}
segment = Segment.from_dict(data)
```

is_zero_duration: bool | None = None
language: str | None = None
speaker_id: str | None = None
start: float | None = None
style_id: str | None = None
text: str | None
to_dict()

Converts the Segment instance to a dictionary.

Returns:

Dictionary containing segment data. Only includes non-None and non-empty fields.

Return type:

Dict[str, Any]

Example

```python
segment = Segment(
    text="Hello world",
    start=0.0,
    end=1.5,
    speaker_id="speaker-1",
)
data = segment.to_dict()
```

word_timing_mode: WordTimingMode | str | None = None
words: List[Word] | None = None

Enumerations

WordTimingMode

class stjlib.WordTimingMode(*values)

Word timing modes for transcript segments.

This enum defines the possible modes for word-level timing information within a segment. It indicates the completeness of timing data for words in the segment.

Values:
  • COMPLETE: All words in the segment have timing information. Use this when every word has start and end times.

  • PARTIAL: Some words in the segment have timing information. Use this when only some words have timing data.

  • NONE: No words in the segment have timing information. Use this when words array exists but has no timing data.

Example

```python
# Create a segment with complete word timing
segment = Segment(
    text="Hello world",
    word_timing_mode=WordTimingMode.COMPLETE,
    words=[
        Word(text="Hello", start=0.0, end=0.5),
        Word(text="world", start=0.6, end=1.0)
    ]
)

# Create a segment with partial word timing
segment = Segment(
    text="Hello world",
    word_timing_mode=WordTimingMode.PARTIAL,
    words=[
        Word(text="Hello", start=0.0, end=0.5),
        Word(text="world")  # No timing for this word
    ]
)

# Create a segment with no word timing
segment = Segment(
    text="Hello world",
    word_timing_mode=WordTimingMode.NONE,
    words=[
        Word(text="Hello"),
        Word(text="world")
    ]
)
```

Note

  • The word_timing_mode field affects validation requirements

  • COMPLETE mode requires all words to have timing data

  • PARTIAL mode allows mixed timing presence

  • NONE mode requires no timing data

  • The mode must match the actual timing data presence
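Because the mode must match the timing data actually present, it can be derived from the words themselves. The sketch below uses a local stand-in enum (`TimingMode`) with the same three values rather than stjlib's class, and `infer_timing_mode` is a hypothetical helper. Treating an empty word list as NONE is an assumption.

```python
from enum import Enum
from typing import Any, Dict, List


class TimingMode(Enum):
    """Local stand-in mirroring the documented values (not stjlib's enum)."""

    COMPLETE = "complete"
    PARTIAL = "partial"
    NONE = "none"


def infer_timing_mode(words: List[Dict[str, Any]]) -> TimingMode:
    """Pick the mode that matches which words carry start and end times."""
    timed = [
        w for w in words
        if w.get("start") is not None and w.get("end") is not None
    ]
    if words and len(timed) == len(words):
        return TimingMode.COMPLETE
    if timed:
        return TimingMode.PARTIAL
    return TimingMode.NONE  # also covers an empty list (assumption)


complete = infer_timing_mode([
    {"text": "Hello", "start": 0.0, "end": 0.5},
    {"text": "world", "start": 0.6, "end": 1.0},
])
partial = infer_timing_mode([
    {"text": "Hello", "start": 0.0, "end": 0.5},
    {"text": "world"},
])
none = infer_timing_mode([{"text": "Hello"}, {"text": "world"}])
```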

COMPLETE = 'complete'
NONE = 'none'
PARTIAL = 'partial'

Exceptions

class stjlib.STJError

Bases: Exception

Base class for exceptions in the STJ module.

This class serves as the parent class for all STJ-specific exceptions, allowing for specific error handling of STJ-related issues.

Example

```python
try:
    stj = StandardTranscriptionJSON.from_file("invalid.json")
except STJError as e:
    print(f"STJ error occurred: {e}")
```

class stjlib.ValidationError(issues)

Bases: STJError

Exception raised when STJ validation fails.

This exception includes a list of validation issues that describe the specific problems found during validation.

Parameters:

issues (List[ValidationIssue])

issues

List of validation issues found

Type:

List[ValidationIssue]

Example

```python
try:
    stj = StandardTranscriptionJSON.from_file(
        "transcript.json",
        validate=True
    )
except ValidationError as e:
    print("Validation failed:")
    for issue in e.issues:
        print(f"{issue.severity}: {issue}")
```

__str__()

Returns a formatted string of all validation issues.

Returns:

Multi-line string containing all validation issues

Return type:

str
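The general pattern of an exception that carries a list of issues and renders them as a multi-line string can be sketched as follows. The classes `Issue` and `SketchValidationError` are stand-ins written for this example; the `message` field and the "severity: message" line layout are assumptions, not stjlib's actual format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Issue:
    """Stand-in for a validation issue; 'message' is an assumed field name."""

    severity: str
    message: str


class SketchValidationError(Exception):
    """Illustrative exception that carries a list of issues."""

    def __init__(self, issues: List[Issue]):
        super().__init__(f"{len(issues)} validation issue(s)")
        self.issues = issues

    def __str__(self) -> str:
        # One line per issue; the exact layout here is an assumption.
        return "\n".join(f"{i.severity}: {i.message}" for i in self.issues)


error = SketchValidationError([
    Issue("ERROR", "segments list must not be empty"),
    Issue("WARNING", "confidence is outside 0.0 to 1.0"),
])
report = str(error)
```

Keeping the structured list on the exception lets callers filter by severity, while `str()` stays convenient for logging.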