API Reference

StandardTranscriptionJSON

class stjlib.StandardTranscriptionJSON(metadata=None, transcript=None, validate=False)

Bases: object

Handler for Standard Transcription JSON (STJ) format.

This class implements version 0.6.1 of the STJ specification, providing a high-level interface for working with STJ documents.

Format Structure:

The STJ format wraps content in a root "stj" object:

    {
        "stj": {
            "version": "0.6.1",
            "metadata": {
                "transcriber": {"name": str, "version": str},
                "created_at": str,  # ISO 8601 UTC timestamp
                …
            },
            "transcript": {
                "segments": […],
                "speakers": […],
                …
            }
        }
    }
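
As a rough illustration of the layout above, the following sketch builds a minimal document as a plain dictionary and serializes it with the standard `json` module. It does not use stjlib; the helper name `build_minimal_stj` and the sample values are assumptions made for this example only.

```python
import json
from datetime import datetime, timezone


def build_minimal_stj(transcriber_name, transcriber_version, segments):
    """Return a plain dict shaped like the root "stj" object (illustrative only)."""
    # Fixed sample timestamp so the output is reproducible
    created_at = datetime(2023, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    return {
        "stj": {
            "version": "0.6.1",
            "metadata": {
                "transcriber": {
                    "name": transcriber_name,
                    "version": transcriber_version,
                },
                # ISO 8601 UTC timestamp with a trailing "Z"
                "created_at": created_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
            },
            "transcript": {"segments": segments, "speakers": []},
        }
    }


doc = build_minimal_stj(
    "MyTranscriber", "1.0", [{"text": "Hello", "start": 0.0, "end": 1.5}]
)
print(json.dumps(doc, indent=2))
```

For real documents, the class described here is presumably the more convenient route, since it also offers validation.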

Features:

Create new transcripts
Load existing STJ files
Add/modify transcript content
Validate STJ data
Save transcripts to files

Example

>>> stj = StandardTranscriptionJSON(
...     metadata=Metadata(
...         transcriber=Transcriber(name="MyTranscriber", version="1.0")
...     )
... )
>>> stj.add_speaker("s1", "John")
>>> stj.add_segment(text="Hello", start=0.0, end=1.5, speaker_id="s1")

Note

The STJ format is a standardized way to represent transcribed audio/video content with support for timing, speakers, and metadata.

Parameters:

metadata (Metadata | None)
transcript (Transcript | None)
validate (bool)

add_segment(text, start=None, end=None, *, speaker_id=None, language=None, **kwargs)

Add a new transcript segment.

Parameters:

text (str) – The transcribed text
start (float | None) – Start time in seconds
end (float | None) – End time in seconds
speaker_id (str | None) – Optional ID of the speaker
language (str | None) – Optional language code
**kwargs – Additional segment properties

Raises:

ValueError – If text is empty or if end time is before start time

Return type:

None

add_speaker(id, name=None)

Add a new speaker.

Parameters:

id (str) – Unique speaker identifier
name (str | None) – Optional display name for the speaker

Raises:

ValueError – If id is empty or if speaker with same id already exists

Return type:

None

clear_segments()

Remove all segments from the transcript.

Return type: None

classmethod create_from_stj(stj)

Internal method to create instance from STJ object.

This method is used internally by from_dict() and from_file().

For new instances, use the constructor.

Parameters: stj (STJ) – An existing STJ object
Returns: New instance wrapping the STJ object
Return type: StandardTranscriptionJSON

Note

This is an internal method and should not be used directly. Use the constructor for creating new instances.

classmethod from_dict(data, validate=False)

Creates a StandardTranscriptionJSON instance from a dictionary.

Parameters:

data (Dict[str, Any]) – Dictionary containing STJ data
validate (bool) – Whether to validate the data (defaults to False)

Returns:

A new StandardTranscriptionJSON instance

Return type:

StandardTranscriptionJSON

Raises:

ValidationError – If validate=True and the data fails validation

classmethod from_file(filename, validate=False)

Creates a StandardTranscriptionJSON instance from a JSON file.

Parameters:

filename (str) – Path to the JSON file to load
validate (bool) – Whether to validate the loaded data

Returns:

New instance with loaded data

Return type:

StandardTranscriptionJSON

Raises:

FileNotFoundError – If file doesn’t exist
ValidationError – If validation fails or data structure is invalid

get_segments_by_speaker(speaker_id)

Get all segments for a specific speaker.

Parameters: speaker_id (str) – ID of the speaker to find segments for
Returns: List of segments with matching speaker_id. Empty list if none found.
Return type: List[Segment]
Raises: ValueError – If speaker_id is empty or whitespace
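
The lookup described above can be pictured as a simple filter. The sketch below works on plain segment dictionaries rather than stjlib objects; the function name `segments_for_speaker` is hypothetical, and the exact way the library detects an empty ID is an assumption.

```python
def segments_for_speaker(segments, speaker_id):
    """Return the segment dicts whose "speaker_id" equals speaker_id."""
    # Assumed check: reject empty or whitespace-only IDs, as the Raises entry describes
    if not speaker_id or not speaker_id.strip():
        raise ValueError("speaker_id must not be empty or whitespace")
    return [seg for seg in segments if seg.get("speaker_id") == speaker_id]


segments = [
    {"text": "Hello", "start": 0.0, "end": 1.0, "speaker_id": "S1"},
    {"text": "Hi there", "start": 1.1, "end": 2.0, "speaker_id": "S2"},
    {"text": "How are you?", "start": 2.1, "end": 3.0, "speaker_id": "S1"},
]

s1_texts = [seg["text"] for seg in segments_for_speaker(segments, "S1")]
unknown = segments_for_speaker(segments, "S9")  # no match gives an empty list
```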

get_speaker(speaker_id)

Get speaker by ID.

Parameters: speaker_id (str) – ID of the speaker to find
Returns: Speaker if found, None if not found
Return type: Optional[Speaker]
Raises: ValueError – If speaker_id is empty or whitespace

property metadata: Metadata | None

Access to the STJ metadata.

Returns: The metadata object or None if not present
Return type: Optional[Metadata]
Raises: ValueError – If STJ instance is not properly initialized

to_dict()

Convert to STJ format dictionary.

Returns:

Dictionary in the STJ format with structure:

    {
        "stj": {
            "version": str,
            "metadata": Optional[Dict[str, Any]],
            "transcript": Dict[str, Any]
        }
    }

Return type:

STJDict

to_file(filename)

Saves the STJ instance to a JSON file.

Parameters: filename (str) – Path where the JSON file should be written
Raises: IOError – If there is an error writing to the file
Return type: None

property transcript: Transcript

Access to the STJ transcript.

Returns: The transcript object containing all content
Return type: Transcript
Raises: ValueError – If STJ instance is not properly initialized

validate(raise_exception=True)

Validates the STJ data according to specification requirements.

Parameters: raise_exception (bool) – If True, raises ValidationError for any issues.
Returns: List of validation issues if found, None if valid.
Return type: Optional[ValidationIssues]
Raises: ValidationError – If validation fails and raise_exception is True.

Data Classes

Metadata

class stjlib.Metadata(transcriber=None, created_at=None, source=None, languages=None, confidence_threshold=None, extensions=None, _invalid_type=None)

Metadata for the Standard Transcription JSON (STJ).

This class contains various metadata fields that provide context and information about the transcription process and content.

Parameters:

transcriber (Transcriber | None)
created_at (datetime | None)
source (Source | None)
languages (List[str] | None)
confidence_threshold (float | None)
extensions (Dict[str, Any] | None)
_invalid_type (str | None)

transcriber

Information about the transcription system

Type: Optional[Transcriber]

created_at

Timestamp when transcription was created

Type: Optional[datetime]

source

Information about the source media

Type: Optional[Source]

languages

List of languages in the transcription

Type: Optional[List[str]]

confidence_threshold

Minimum confidence score for words

Type: Optional[float]

extensions

Additional metadata key-value pairs

Type: Optional[Dict[str, Any]]

Example

```python
from datetime import datetime, timezone

# Create metadata with transcriber and timestamp
metadata = Metadata(
    transcriber=Transcriber(
        name="AutoTranscribe",
        version="2.1.0"
    ),
    created_at=datetime.now(timezone.utc),
    languages=["en", "es"],
    confidence_threshold=0.8
)

# Access metadata information
print(f"Created: {metadata.created_at.isoformat()}")
print(f"Languages: {', '.join(metadata.languages)}")
```

Note

All fields are optional
created_at must be timezone-aware if present
confidence_threshold must be between 0.0 and 1.0
languages must be valid ISO 639-1 or ISO 639-3 codes
extensions can contain arbitrary metadata

confidence_threshold: float | None = None

created_at: datetime | None = None

extensions: Dict[str, Any] | None = None

classmethod from_dict(data)

Creates a Metadata instance from a dictionary.

Handles timestamp parsing but preserves all other data as-is without validation.

Parameters: data (Dict[str, Any]) – Dictionary containing metadata fields
Returns: A new Metadata instance
Return type: Metadata

Example

```python
data = {
    "transcriber": {"name": "AutoTranscribe", "version": "2.1.0"},
    "created_at": "2023-01-01T12:00:00Z",
    "languages": ["en", "es"]
}
metadata = Metadata.from_dict(data)
```

languages: List[str] | None = None

source: Source | None = None

to_dict()

Converts the Metadata instance to a dictionary.

Returns: Dictionary representation of the metadata. Returns None if no fields are set.
Return type: Dict[str, Any]

Example

```python
metadata = Metadata(
    transcriber=Transcriber(name="AutoTranscribe"),
    created_at=datetime.now(timezone.utc)
)
data = metadata.to_dict()
```

Note

created_at is converted to UTC and ISO format with ‘Z’ suffix
Empty optional fields are omitted from the output
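
The timestamp conversion mentioned in the note can be approximated with the standard library alone. This is a minimal sketch, not the library's own code; the helper name `to_utc_z` is an assumption.

```python
from datetime import datetime, timedelta, timezone


def to_utc_z(dt):
    """Format a timezone-aware datetime as ISO 8601 in UTC with a "Z" suffix."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("datetime must be timezone-aware")
    utc = dt.astimezone(timezone.utc)
    # isoformat() renders the UTC offset as "+00:00"; swap that for "Z"
    return utc.isoformat().replace("+00:00", "Z")


# 14:00 at UTC+2 is 12:00 in UTC
local = datetime(2023, 1, 1, 14, 0, 0, tzinfo=timezone(timedelta(hours=2)))
stamp = to_utc_z(local)
print(stamp)  # 2023-01-01T12:00:00Z
```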

transcriber: Transcriber | None = None

Transcript

class stjlib.Transcript(segments=<factory>, speakers=<factory>, styles=None, _invalid_segments_type=None, _invalid_type=None, _invalid_speakers_type=None, _invalid_styles_type=None)

Main content of the transcription.

This class contains the core transcription data, including speakers, segments, and optional style information. It represents the complete transcribed content with timing and formatting.

Parameters:

segments (List[Segment])
speakers (List[Speaker])
styles (List[Style] | None)
_invalid_segments_type (str | None)
_invalid_type (str | None)
_invalid_speakers_type (str | None)
_invalid_styles_type (str | None)

speakers

List of speakers in the transcript. Each speaker has a unique ID and optional metadata. Empty list indicates speaker identification was attempted but found none. Non-empty list indicates speakers were found.

Type: List[Speaker]

segments

List of transcript segments. Segments contain the actual transcribed content with timing. Must not be empty.

Type: List[Segment]

styles

Optional list of text formatting styles. Styles can be referenced by segments for formatting. Can be:

None: Style processing was not attempted
Empty list: Style processing performed but no styles defined
List of styles: One or more styles defined

Type: Optional[List[Style]]

Example

```python
# Create a basic transcript with one speaker and segment
transcript = Transcript(
    speakers=[
        Speaker(id="S1", name="John")
    ],
    segments=[
        Segment(
            text="Hello world",
            start=0.0,
            end=1.5,
            speaker_id="S1"
        )
    ]
)

# Create a transcript where speaker identification found none
transcript = Transcript(
    speakers=[],  # Empty list indicates attempted but none found
    segments=[
        Segment(text="Hello world")
    ]
)

# Create a basic transcript (implicitly attempts speaker identification)
transcript = Transcript(
    segments=[
        Segment(text="Hello world")
    ]
)

# Create a transcript with multiple speakers and styles
transcript = Transcript(
    speakers=[
        Speaker(id="S1", name="John"),
        Speaker(id="S2", name="Jane")
    ],
    segments=[
        Segment(
            text="How are you?",
            start=0.0,
            end=1.5,
            speaker_id="S1",
            style_id="question"
        ),
        Segment(
            text="I'm fine, thanks!",
            start=1.6,
            end=3.0,
            speaker_id="S2"
        )
    ],
    styles=[
        Style(
            id="question",
            text={"color": "#0000FF"}
        )
    ]
)
```

Note

segments list must not be empty
speakers list will be empty if none found, non-empty if speakers found
styles can be None (style processing not attempted), an empty list (styles were processed but none defined), or a list with items (styles were defined)
segments must be ordered by time and must not overlap
all IDs must be unique within their respective lists
if speakers/styles are included, references to them must be valid
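
To make the rules in this note concrete, here is a small sketch that checks them on plain dictionaries. It is not the library's validator: the function name `check_transcript`, the message wording, and the choice to skip reference checks when no speakers are listed are all assumptions.

```python
def check_transcript(segments, speakers):
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = []
    if not segments:
        problems.append("segments list must not be empty")

    speaker_ids = [sp["id"] for sp in speakers]
    if len(speaker_ids) != len(set(speaker_ids)):
        problems.append("speaker IDs must be unique")

    previous_end = None
    for index, seg in enumerate(segments):
        start, end = seg.get("start"), seg.get("end")
        # Ordered and non-overlapping: each start must not precede the prior end
        if start is not None and previous_end is not None and start < previous_end:
            problems.append(f"segment {index} starts before the previous segment ends")
        if end is not None:
            previous_end = end
        # Assumed rule: references are only checked when speakers are listed
        sid = seg.get("speaker_id")
        if speakers and sid is not None and sid not in speaker_ids:
            problems.append(f"segment {index} references unknown speaker {sid!r}")
    return problems


speakers = [{"id": "S1", "name": "John"}]
ok = check_transcript(
    [{"text": "Hi", "start": 0.0, "end": 1.5, "speaker_id": "S1"}], speakers
)
bad = check_transcript(
    [
        {"text": "Hi", "start": 0.0, "end": 1.5, "speaker_id": "S1"},
        {"text": "Yo", "start": 1.0, "end": 2.0, "speaker_id": "S2"},
    ],
    speakers,
)
```

In the second call the later segment both overlaps the first and points at a speaker that is not listed, so two problems are reported.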

classmethod from_dict(data)

Creates a Transcript instance from a dictionary.

Parameters:

data (Dict[str, Any]) – Dictionary containing transcript data with fields:

speakers (optional): List of speaker data. Treated as an empty list if the key is missing (indicating speaker identification was attempted but none found)
segments (required): List of segment data
styles (optional): List of style data. If the key is missing, treated as "not attempted". If present but empty, treated as "none defined"

Returns:

A new Transcript instance

Return type:

Transcript

Example

```python
# Transcript with speakers found
data = {
    "speakers": [{"id": "S1", "name": "John"}],
    "segments": [{
        "text": "Hello", "start": 0.0, "end": 1.0, "speaker_id": "S1"
    }]
}

# Transcript where speaker identification found none
data = {
    "speakers": [],  # Empty list = none found
    "segments": [{
        "text": "Hello", "start": 0.0, "end": 1.0
    }]
}

# Basic transcript (implicitly attempts speaker identification)
data = {
    "segments": [{
        "text": "Hello", "start": 0.0, "end": 1.0
    }]
}

transcript = Transcript.from_dict(data)
```

segments: List[Segment]

speakers: List[Speaker]

styles: List[Style] | None = None

to_dict()

Converts the Transcript instance to a dictionary.

Returns: Dictionary containing transcript data. Always includes segments and speakers (even if empty). Styles included only if style processing was attempted.
Return type: Dict[str, Any]

Example

```python
# Basic transcript (implicitly attempted speaker identification)
transcript = Transcript(
    segments=[Segment(text="Hello")]
)
# Result includes empty speakers list:
# {"speakers": [], "segments": […]}

# Transcript with speakers found
transcript = Transcript(
    speakers=[Speaker(id="S1", name="John")],
    segments=[
        Segment(text="Hello", start=0.0, end=1.0, speaker_id="S1")
    ]
)
# Result includes non-empty speakers list:
# {"speakers": [{"id": "S1", "name": "John"}], "segments": […]}
```
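
The difference between "not attempted" and "none defined" for styles can be sketched with plain lists. The function below only mirrors the behaviour described in the Returns entry for to_dict(); its name `transcript_to_dict` and its signature are hypothetical and it is not the library's implementation.

```python
def transcript_to_dict(segments, speakers=None, styles=None):
    """Serialize plain lists following the rules described for to_dict()."""
    result = {
        "speakers": list(speakers or []),  # always present, possibly empty
        "segments": list(segments),
    }
    if styles is not None:
        # [] means "processed, none defined"; None means "not attempted"
        result["styles"] = list(styles)
    return result


not_attempted = transcript_to_dict([{"text": "Hello"}])
none_defined = transcript_to_dict([{"text": "Hello"}], styles=[])
```

Here `not_attempted` has no "styles" key at all, while `none_defined` carries an empty "styles" list.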

Segment

class stjlib.Segment(text, start=None, end=None, is_zero_duration=None, speaker_id=None, confidence=<object object>, language=None, style_id=None, word_timing_mode=None, words=None, extensions=None, _invalid_type=None, _invalid_words_type=None)

Timed segment in the transcript with optional word-level detail.

A segment represents a continuous portion of the transcript with its own timing, speaker, and optional word-level information.

Parameters:

text (str | None)
start (float | None)
end (float | None)
is_zero_duration (bool | None)
speaker_id (str | None)
confidence (float | None)
language (str | None)
style_id (str | None)
word_timing_mode (WordTimingMode | str | None)
words (List[Word] | None)
extensions (Dict[str, Any] | None)
_invalid_type (str | None)
_invalid_words_type (str | None)

text

The transcribed text content

Type: str

start

Start time in seconds from beginning of media

Type: Optional[float]

end

End time in seconds from beginning of media

Type: Optional[float]

is_zero_duration

Flag for zero-duration segments

Type: Optional[bool]

speaker_id

Reference to a speaker.id

Type: Optional[str]

confidence

Confidence score between 0.0 and 1.0

Type: Optional[float]

language

ISO 639-1 or ISO 639-3 language code

Type: Optional[str]

style_id

Reference to a style.id

Type: Optional[str]

word_timing_mode

Word timing completeness

Type: Optional[WordTimingMode]

words

Word-level timing and text information

Type: Optional[List[Word]]

extensions

Additional segment metadata

Type: Optional[Dict[str, Any]]

Example

```python
# Create a basic segment
segment = Segment(
    text="Hello world",
    start=0.0,
    end=1.5,
    speaker_id="speaker-1"
)

# Create a segment with word timing
segment = Segment(
    text="Hello world",
    start=0.0,
    end=1.5,
    words=[
        Word(text="Hello", start=0.0, end=0.8),
        Word(text="world", start=0.9, end=1.5)
    ],
    word_timing_mode=WordTimingMode.COMPLETE
)
```

Note

Segments must not overlap with each other
start and end must both be present or both absent
start must be >= 0 and end must be >= start
confidence must be between 0.0 and 1.0 if present
speaker_id must reference a valid speaker
style_id must reference a valid style
language must be a valid ISO code if present
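
The per-segment timing and confidence rules in this note translate into a few comparisons. The sketch below is a standalone illustration under that reading; `check_segment` and its messages are hypothetical and do not reproduce the library's validation output.

```python
def check_segment(start=None, end=None, confidence=None):
    """Return a list of problems for one segment's timing and confidence fields."""
    problems = []
    if (start is None) != (end is None):
        problems.append("start and end must both be present or both absent")
    elif start is not None:
        if start < 0:
            problems.append("start must be >= 0")
        if end < start:
            problems.append("end must be >= start")
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be between 0.0 and 1.0")
    return problems


valid = check_segment(0.0, 1.5, 0.9)
untimed = check_segment()           # no timing at all is allowed
half_timed = check_segment(start=2.0)
reversed_times = check_segment(3.0, 1.0)
```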

confidence: float | None = <object object>

end: float | None = None

extensions: Dict[str, Any] | None = None

classmethod from_dict(data)

Creates a Segment instance from a dictionary.

Parameters: data (Dict[str, Any]) – Dictionary containing segment data with fields:

text (required): Segment text content
start (optional): Start time in seconds
end (optional): End time in seconds
is_zero_duration (optional): Zero duration flag
speaker_id (optional): Reference to speaker
confidence (optional): Confidence score
language (optional): Language code
style_id (optional): Reference to style
word_timing_mode (optional): Word timing mode
words (optional): List of word data
extensions (optional): Additional metadata

Returns: A new Segment instance
Return type: Segment

Example

```python
data = {
    "text": "Hello world",
    "start": 0.0,
    "end": 1.5,
    "speaker_id": "speaker-1"
}
segment = Segment.from_dict(data)
```

is_zero_duration: bool | None = None

language: str | None = None

speaker_id: str | None = None

start: float | None = None

style_id: str | None = None

text: str | None

to_dict()

Converts the Segment instance to a dictionary.

Returns: Dictionary containing segment data. Only includes non-None and non-empty fields.
Return type: Dict[str, Any]

Example

```python
segment = Segment(
    text="Hello world",
    start=0.0,
    end=1.5,
    speaker_id="speaker-1"
)
data = segment.to_dict()
```

word_timing_mode: WordTimingMode | str | None = None

words: List[Word] | None = None

Enumerations

WordTimingMode

class stjlib.WordTimingMode(*values)

Word timing modes for transcript segments.

This enum defines the possible modes for word-level timing information within a segment. It indicates the completeness of timing data for words in the segment.

Values:

COMPLETE: All words in the segment have timing information. Use this when every word has start and end times.
PARTIAL: Some words in the segment have timing information. Use this when only some words have timing data.
NONE: No words in the segment have timing information. Use this when the words array exists but has no timing data.

Example

```python
# Create a segment with complete word timing
segment = Segment(
    text="Hello world",
    word_timing_mode=WordTimingMode.COMPLETE,
    words=[
        Word(text="Hello", start=0.0, end=0.5),
        Word(text="world", start=0.6, end=1.0)
    ]
)

# Create a segment with partial word timing
segment = Segment(
    text="Hello world",
    word_timing_mode=WordTimingMode.PARTIAL,
    words=[
        Word(text="Hello", start=0.0, end=0.5),
        Word(text="world")  # No timing for this word
    ]
)

# Create a segment with no word timing
segment = Segment(
    text="Hello world",
    word_timing_mode=WordTimingMode.NONE,
    words=[
        Word(text="Hello"),
        Word(text="world")
    ]
)
```

Note

The word_timing_mode field affects validation requirements
COMPLETE mode requires all words to have timing data
PARTIAL mode allows mixed timing presence
NONE mode requires no timing data
The mode must match the actual timing data presence
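
One way to picture how a mode relates to the timing data actually present is to derive the mode from the words themselves. The sketch below does this for plain word dictionaries and returns the enum's string values; `infer_word_timing_mode` is a hypothetical helper, and treating an empty word list as "none" is an assumption.

```python
def infer_word_timing_mode(words):
    """Return "complete", "partial", or "none" based on which words carry timing."""
    timed = [
        w for w in words
        if w.get("start") is not None and w.get("end") is not None
    ]
    if words and len(timed) == len(words):
        return "complete"  # every word has start and end
    if timed:
        return "partial"   # only some words have timing
    return "none"          # no word has timing


complete = infer_word_timing_mode([
    {"text": "Hello", "start": 0.0, "end": 0.5},
    {"text": "world", "start": 0.6, "end": 1.0},
])
partial = infer_word_timing_mode([
    {"text": "Hello", "start": 0.0, "end": 0.5},
    {"text": "world"},
])
none = infer_word_timing_mode([{"text": "Hello"}, {"text": "world"}])
```

Comparing a declared mode against the inferred one is a simple way to spot a mismatch before validation.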

COMPLETE = 'complete'

NONE = 'none'

PARTIAL = 'partial'

Exceptions

class stjlib.STJError

Bases: Exception

Base class for exceptions in the STJ module.

This class serves as the parent class for all STJ-specific exceptions, allowing for specific error handling of STJ-related issues.

Example

```python
try:
    stj = StandardTranscriptionJSON.from_file("invalid.json")
except STJError as e:
    print(f"STJ error occurred: {e}")
```

class stjlib.ValidationError(issues)

Bases: STJError

Exception raised when STJ validation fails.

This exception includes a list of validation issues that describe the specific problems found during validation.

Parameters: issues (List[ValidationIssue])

issues

List of validation issues found

Type: List[ValidationIssue]

Example

```python
try:
    stj = StandardTranscriptionJSON.from_file(
        "transcript.json",
        validate=True
    )
except ValidationError as e:
    print("Validation failed:")
    for issue in e.issues:
        print(f"{issue.severity}: {issue}")
```

__str__()

Returns a formatted string of all validation issues.

Returns: Multi-line string containing all validation issues
Return type: str