API Reference
StandardTranscriptionJSON
- class stjlib.StandardTranscriptionJSON(metadata=None, transcript=None, validate=False)
Bases:
object
Handler for Standard Transcription JSON (STJ) format.
This class implements version 0.6.1 of the STJ specification, providing a high-level interface for working with STJ documents.
- Format Structure:
The STJ format wraps content in a root "stj" object:
{
    "stj": {
        "version": "0.6.1",
        "metadata": {
            "transcriber": {"name": str, "version": str},
            "created_at": str,  # ISO 8601 UTC timestamp
            …
        },
        "transcript": {
            "segments": […],
            "speakers": […],
            …
        }
    }
}
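The wrapper structure can be illustrated with plain dictionaries and the standard `json` module. This is only a sketch of the document shape shown above, not how stjlib builds it; the helper name `build_minimal_stj` is hypothetical.

```python
import json


def build_minimal_stj(transcriber_name, transcriber_version, created_at, segments):
    """Hypothetical helper: return a plain dict shaped like an STJ document."""
    return {
        "stj": {
            "version": "0.6.1",
            "metadata": {
                "transcriber": {
                    "name": transcriber_name,
                    "version": transcriber_version,
                },
                "created_at": created_at,  # ISO 8601 UTC timestamp string
            },
            "transcript": {
                "segments": segments,
                "speakers": [],
            },
        }
    }


doc = build_minimal_stj(
    "MyTranscriber",
    "1.0",
    "2023-01-01T12:00:00Z",
    [{"text": "Hello", "start": 0.0, "end": 1.5}],
)

# Serialize and parse again to confirm the structure survives a JSON round trip.
encoded = json.dumps(doc, indent=2)
roundtrip = json.loads(encoded)
```

Everything lives under the single root key "stj", so a consumer can reject a file early if that key is missing.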
- Features:
Create new transcripts
Load existing STJ files
Add/modify transcript content
Validate STJ data
Save transcripts to files
Example
>>> stj = StandardTranscriptionJSON(
...     metadata=Metadata(
...         transcriber=Transcriber(name="MyTranscriber", version="1.0")
...     )
... )
>>> stj.add_speaker("s1", "John")
>>> stj.add_segment(text="Hello", start=0.0, end=1.5, speaker_id="s1")
Note
The STJ format is a standardized way to represent transcribed audio/video content with support for timing, speakers, and metadata.
- Parameters:
metadata (Metadata | None)
transcript (Transcript | None)
validate (bool)
- add_segment(text, start=None, end=None, *, speaker_id=None, language=None, **kwargs)
Add a new transcript segment.
- Parameters:
text (str) – The transcribed text
start (float | None) – Start time in seconds
end (float | None) – End time in seconds
speaker_id (str | None) – Optional ID of the speaker
language (str | None) – Optional language code
**kwargs – Additional segment properties
- Raises:
ValueError – If text is empty or if end time is before start time
- Return type:
None
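The two documented ValueError conditions can be sketched as a standalone check. This is an assumption about how such a check might look, not stjlib's actual code; `check_segment_args` is a hypothetical name.

```python
from typing import Optional


def check_segment_args(
    text: str,
    start: Optional[float] = None,
    end: Optional[float] = None,
) -> None:
    """Hypothetical check mirroring the documented ValueError conditions."""
    if not text:
        raise ValueError("text must not be empty")
    # Only compare the times when both are given.
    if start is not None and end is not None and end < start:
        raise ValueError("end time must not be before start time")


# A well-formed call passes silently.
check_segment_args("Hello", start=0.0, end=1.5)
```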
- add_speaker(id, name=None)
Add a new speaker.
- Parameters:
id (str) – Unique speaker identifier
name (str | None) – Optional display name for the speaker
- Raises:
ValueError – If id is empty or if speaker with same id already exists
- Return type:
None
- clear_segments()
Remove all segments from the transcript.
- Return type:
None
- classmethod create_from_stj(stj)
Internal method to create instance from STJ object.
This method is used internally by from_dict() and from_file().
For new instances, use the constructor.
- Parameters:
stj (STJ) – An existing STJ object
- Returns:
New instance wrapping the STJ object
- Return type:
StandardTranscriptionJSON
Note
This is an internal method and should not be used directly. Use the constructor for creating new instances.
- classmethod from_dict(data, validate=False)
Creates a StandardTranscriptionJSON instance from a dictionary.
- Parameters:
data (Dict[str, Any]) – Dictionary containing STJ data
validate (bool) – Whether to validate the data (defaults to False)
- Returns:
A new StandardTranscriptionJSON instance
- Return type:
StandardTranscriptionJSON
- Raises:
ValidationError – If validate=True and the data fails validation
- classmethod from_file(filename, validate=False)
Creates a StandardTranscriptionJSON instance from a JSON file.
- Parameters:
filename (str) – Path to the JSON file to load
validate (bool) – Whether to validate the loaded data
- Returns:
New instance with loaded data
- Return type:
StandardTranscriptionJSON
- Raises:
FileNotFoundError – If file doesn’t exist
ValidationError – If validation fails or data structure is invalid
- get_segments_by_speaker(speaker_id)
Get all segments for a specific speaker.
- Parameters:
speaker_id (str) – ID of the speaker to find segments for
- Returns:
List of segments with matching speaker_id. Empty list if none found.
- Return type:
List[Segment]
- Raises:
ValueError – If speaker_id is empty or whitespace
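The lookup behaviour described here (reject blank IDs, return an empty list when nothing matches) can be sketched over plain dictionaries. The function name `segments_for_speaker` and the dict-based segments are assumptions for illustration; the library works with Segment objects.

```python
from typing import Any, Dict, List


def segments_for_speaker(
    segments: List[Dict[str, Any]], speaker_id: str
) -> List[Dict[str, Any]]:
    """Hypothetical filter: return the segments whose speaker_id matches."""
    if not speaker_id or not speaker_id.strip():
        raise ValueError("speaker_id must not be empty or whitespace")
    return [seg for seg in segments if seg.get("speaker_id") == speaker_id]


demo_segments = [
    {"text": "How are you?", "speaker_id": "S1"},
    {"text": "I'm fine, thanks!", "speaker_id": "S2"},
    {"text": "Good to hear.", "speaker_id": "S1"},
]
s1_segments = segments_for_speaker(demo_segments, "S1")
```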
- get_speaker(speaker_id)
Get speaker by ID.
- Parameters:
speaker_id (str) – ID of the speaker to find
- Returns:
Speaker if found, None if not found
- Return type:
Optional[Speaker]
- Raises:
ValueError – If speaker_id is empty or whitespace
- property metadata: Metadata | None
Access to the STJ metadata.
- Returns:
The metadata object or None if not present
- Return type:
Optional[Metadata]
- Raises:
ValueError – If STJ instance is not properly initialized
- to_dict()
Convert to STJ format dictionary.
- Returns:
Dictionary in the STJ format with structure: {
- ”stj”: {
“version”: str, “metadata”: Optional[Dict[str, Any]], “transcript”: Dict[str, Any]
}
}
- Return type:
STJDict
- to_file(filename)
Saves the STJ instance to a JSON file.
- Parameters:
filename (str) – Path where the JSON file should be written
- Raises:
IOError – If there’s an error writing to the file
- Return type:
None
- property transcript: Transcript
Access to the STJ transcript.
- Returns:
The transcript object containing all content
- Return type:
Transcript
- Raises:
ValueError – If STJ instance is not properly initialized
- validate(raise_exception=True)
Validates the STJ data according to specification requirements.
- Parameters:
raise_exception (bool) – If True, raises ValidationError for any issues.
- Returns:
List of validation issues if found, None if valid.
- Return type:
Optional[ValidationIssues]
- Raises:
ValidationError – If validation fails and raise_exception is True.
Data Classes
Metadata
- class stjlib.Metadata(transcriber=None, created_at=None, source=None, languages=None, confidence_threshold=None, extensions=None, _invalid_type=None)
Metadata for the Standard Transcription JSON (STJ).
This class contains various metadata fields that provide context and information about the transcription process and content.
- Parameters:
transcriber (Transcriber | None)
created_at (datetime | None)
source (Source | None)
languages (List[str] | None)
confidence_threshold (float | None)
extensions (Dict[str, Any] | None)
_invalid_type (str | None)
- transcriber
Information about the transcription system
- Type:
Optional[Transcriber]
- created_at
Timestamp when transcription was created
- Type:
Optional[datetime]
- source
Information about the source media
- Type:
Optional[Source]
- languages
List of languages in the transcription
- Type:
Optional[List[str]]
- confidence_threshold
Minimum confidence score for words
- Type:
Optional[float]
- extensions
Additional metadata key-value pairs
- Type:
Optional[Dict[str, Any]]
Example
```python
from datetime import datetime, timezone

# Create metadata with transcriber and timestamp
metadata = Metadata(
    transcriber=Transcriber(
        name="AutoTranscribe",
        version="2.1.0"
    ),
    created_at=datetime.now(timezone.utc),
    languages=["en", "es"],
    confidence_threshold=0.8
)

# Access metadata information
print(f"Created: {metadata.created_at.isoformat()}")
print(f"Languages: {', '.join(metadata.languages)}")
```
Note
All fields are optional
created_at must be timezone-aware if present
confidence_threshold must be between 0.0 and 1.0
languages must be valid ISO 639-1 or ISO 639-3 codes
extensions can contain arbitrary metadata
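Two of the constraints above (a timezone-aware created_at and a confidence_threshold within 0.0 to 1.0) can be checked with the standard library alone. The sketch below is illustrative, not the validator stjlib uses; `metadata_problems` is a hypothetical name, and language-code checking is left out.

```python
from datetime import datetime, timezone
from typing import List, Optional


def metadata_problems(
    created_at: Optional[datetime] = None,
    confidence_threshold: Optional[float] = None,
) -> List[str]:
    """Hypothetical check returning human-readable problems (empty if none)."""
    problems = []
    # A datetime is "aware" only if tzinfo is set and yields a UTC offset.
    if created_at is not None and (
        created_at.tzinfo is None or created_at.utcoffset() is None
    ):
        problems.append("created_at must be timezone-aware")
    if confidence_threshold is not None and not 0.0 <= confidence_threshold <= 1.0:
        problems.append("confidence_threshold must be between 0.0 and 1.0")
    return problems


ok = metadata_problems(datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc), 0.8)
bad = metadata_problems(datetime(2023, 1, 1, 12, 0), 1.5)
```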
- confidence_threshold: float | None = None
- created_at: datetime | None = None
- extensions: Dict[str, Any] | None = None
- classmethod from_dict(data)
Creates a Metadata instance from a dictionary.
Handles timestamp parsing but preserves all other data as-is without validation.
- Parameters:
data (Dict[str, Any]) – Dictionary containing metadata fields
- Returns:
A new Metadata instance
- Return type:
Metadata
Example
{
    "transcriber": {"name": "AutoTranscribe", "version": "2.1.0"},
    "created_at": "2023-01-01T12:00:00Z",
    "languages": ["en", "es"]
}
- languages: List[str] | None = None
- source: Source | None = None
- to_dict()
Converts the Metadata instance to a dictionary.
- Returns:
Dictionary representation of the metadata. Returns None if no fields are set.
- Return type:
Dict[str, Any]
Example
```python
from datetime import datetime, timezone

metadata = Metadata(
    transcriber=Transcriber(name="AutoTranscribe"),
    created_at=datetime.now(timezone.utc)
)
data = metadata.to_dict()
```
Note
created_at is converted to UTC and ISO format with ‘Z’ suffix
Empty optional fields are omitted from the output
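The timestamp conversion described in the note can be reproduced with the standard `datetime` module. This is a sketch of one way to get a UTC ISO string with a 'Z' suffix, not necessarily what stjlib does internally; `to_utc_z` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def to_utc_z(ts: datetime) -> str:
    """Hypothetical helper: format an aware datetime as UTC ISO 8601 with 'Z'."""
    if ts.tzinfo is None or ts.utcoffset() is None:
        raise ValueError("timestamp must be timezone-aware")
    # astimezone() shifts the wall-clock time to UTC; isoformat() then ends
    # in "+00:00", which is swapped for the shorter "Z" designator.
    return ts.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


# 13:00 at UTC+01:00 is 12:00 in UTC.
local = datetime(2023, 1, 1, 13, 0, 0, tzinfo=timezone(timedelta(hours=1)))
stamp = to_utc_z(local)
```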
- transcriber: Transcriber | None = None
Transcript
- class stjlib.Transcript(segments=<factory>, speakers=<factory>, styles=None, _invalid_segments_type=None, _invalid_type=None, _invalid_speakers_type=None, _invalid_styles_type=None)
Main content of the transcription.
This class contains the core transcription data, including speakers, segments, and optional style information. It represents the complete transcribed content with timing and formatting.
- Parameters:
segments (List[Segment])
speakers (List[Speaker])
styles (List[Style] | None)
_invalid_segments_type (str | None)
_invalid_type (str | None)
_invalid_speakers_type (str | None)
_invalid_styles_type (str | None)
- speakers
List of speakers in the transcript. Each speaker has a unique ID and optional metadata. Empty list indicates speaker identification was attempted but found none. Non-empty list indicates speakers were found.
- Type:
List[Speaker]
- segments
List of transcript segments. Segments contain the actual transcribed content with timing. Must not be empty.
- Type:
List[Segment]
- styles
Optional list of text formatting styles. Styles can be referenced by segments for formatting. Can be:
- None: Style processing was not attempted
- Empty list: Style processing performed but no styles defined
- List of styles: One or more styles defined
- Type:
Optional[List[Style]]
Example
```python
# Create a basic transcript with one speaker and segment
transcript = Transcript(
    speakers=[
        Speaker(id="S1", name="John")
    ],
    segments=[
        Segment(
            text="Hello world", start=0.0, end=1.5, speaker_id="S1"
        )
    ]
)

# Create a transcript where speaker identification found none
transcript = Transcript(
    speakers=[],  # Empty list indicates attempted but none found
    segments=[
        Segment(text="Hello world")
    ]
)

# Create a basic transcript (implicitly attempts speaker identification)
transcript = Transcript(
    segments=[
        Segment(text="Hello world")
    ]
)

# Create a transcript with multiple speakers and styles
transcript = Transcript(
    speakers=[
        Speaker(id="S1", name="John"),
        Speaker(id="S2", name="Jane")
    ],
    segments=[
        Segment(
            text="How are you?", start=0.0, end=1.5,
            speaker_id="S1", style_id="question"
        ),
        Segment(
            text="I'm fine, thanks!", start=1.6, end=3.0, speaker_id="S2"
        )
    ],
    styles=[
        Style(
            id="question", text={"color": "#0000FF"}
        )
    ]
)
```
Note
segments list must not be empty
speakers list will be empty if none found, non-empty if speakers found
styles can be:
- None: style processing not attempted
- Empty list: styles were processed but none defined
- List with items: styles were defined
segments must be ordered by time and must not overlap
all IDs must be unique within their respective lists
if speakers/styles are included, references to them must be valid
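Several of these rules (non-empty segments, time ordering without overlap, unique speaker IDs, valid speaker references) can be expressed as a small check over plain dictionaries. The sketch below is an assumption about how such rules could be tested, not the library's validator; `transcript_problems` is a hypothetical name, and segments without timing are simply skipped by the ordering check.

```python
from typing import Any, Dict, List


def transcript_problems(
    segments: List[Dict[str, Any]], speakers: List[Dict[str, Any]]
) -> List[str]:
    """Hypothetical check returning a list of rule violations (empty if none)."""
    problems = []
    speaker_ids = [spk["id"] for spk in speakers]
    if len(set(speaker_ids)) != len(speaker_ids):
        problems.append("speaker IDs must be unique")
    if not segments:
        problems.append("segments list must not be empty")

    previous_end = None
    for index, seg in enumerate(segments):
        ref = seg.get("speaker_id")
        if ref is not None and ref not in speaker_ids:
            problems.append(f"segment {index}: unknown speaker_id {ref!r}")
        start, end = seg.get("start"), seg.get("end")
        if start is None or end is None:
            continue  # untimed segments cannot be checked for ordering
        # A segment starting before the previous one ended is either
        # overlapping or out of order.
        if previous_end is not None and start < previous_end:
            problems.append(f"segment {index}: overlaps or is out of order")
        previous_end = end
    return problems


speakers = [{"id": "S1", "name": "John"}, {"id": "S2", "name": "Jane"}]
good = transcript_problems(
    [
        {"text": "How are you?", "start": 0.0, "end": 1.5, "speaker_id": "S1"},
        {"text": "I'm fine, thanks!", "start": 1.6, "end": 3.0, "speaker_id": "S2"},
    ],
    speakers,
)
overlapping = transcript_problems(
    [
        {"text": "A", "start": 0.0, "end": 2.0, "speaker_id": "S1"},
        {"text": "B", "start": 1.0, "end": 3.0, "speaker_id": "S9"},
    ],
    speakers,
)
```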
- classmethod from_dict(data)
Creates a Transcript instance from a dictionary.
- Parameters:
data (Dict[str, Any]) – Dictionary containing transcript data with fields:
- speakers (optional): List of speaker data. Treated as empty list if key missing (indicating speaker identification attempted but none found)
- segments (required): List of segment data
- styles (optional): List of style data. If key missing, treated as "not attempted". If present but empty, treated as "none defined"
- Returns:
A new Transcript instance
- Return type:
Transcript
Example
```python
# Transcript with speakers found
data = {
    "speakers": [{"id": "S1", "name": "John"}],
    "segments": [{
        "text": "Hello", "start": 0.0, "end": 1.0, "speaker_id": "S1"
    }]
}

# Transcript where speaker identification found none
data = {
    "speakers": [],  # Empty list = none found
    "segments": [{
        "text": "Hello", "start": 0.0, "end": 1.0
    }]
}

# Basic transcript (implicitly attempts speaker identification)
data = {
    "segments": [{
        "text": "Hello", "start": 0.0, "end": 1.0
    }]
}
```
- segments: List[Segment]
- speakers: List[Speaker]
- styles: List[Style] | None = None
- to_dict()
Converts the Transcript instance to a dictionary.
- Returns:
Dictionary containing transcript data. Always includes segments and speakers (even if empty). Styles included only if style processing was attempted.
- Return type:
Dict[str, Any]
Example
```python
# Basic transcript (implicitly attempted speaker identification)
transcript = Transcript(
    segments=[Segment(text="Hello")]
)
# Result includes empty speakers list:
# {"speakers": [], "segments": […]}

# Transcript with speakers found
transcript = Transcript(
    speakers=[Speaker(id="S1", name="John")],
    segments=[
        Segment(text="Hello", start=0.0, end=1.0, speaker_id="S1")
    ]
)
# Result includes non-empty speakers list:
# {"speakers": [{"id": "S1", "name": "John"}], "segments": […]}
```
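The serialization rule stated for to_dict() (speakers and segments always present, styles only when processing was attempted) can be sketched over plain lists. This is an illustration of the rule under the assumption that None means "not attempted"; `transcript_to_dict` is a hypothetical stand-in, not the method itself.

```python
from typing import Any, Dict, List, Optional


def transcript_to_dict(
    segments: List[Dict[str, Any]],
    speakers: List[Dict[str, Any]],
    styles: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Hypothetical sketch of the documented output rule."""
    result: Dict[str, Any] = {
        "speakers": list(speakers),  # always present, even if empty
        "segments": list(segments),  # always present
    }
    # None means style processing was not attempted, so the key is omitted;
    # an empty list is kept because it means "attempted, none defined".
    if styles is not None:
        result["styles"] = list(styles)
    return result


without_styles = transcript_to_dict([{"text": "Hello"}], [])
with_empty_styles = transcript_to_dict([{"text": "Hello"}], [], styles=[])
```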
Segment
- class stjlib.Segment(text, start=None, end=None, is_zero_duration=None, speaker_id=None, confidence=<object object>, language=None, style_id=None, word_timing_mode=None, words=None, extensions=None, _invalid_type=None, _invalid_words_type=None)
Timed segment in the transcript with optional word-level detail.
A segment represents a continuous portion of the transcript with its own timing, speaker, and optional word-level information.
- Parameters:
text (str | None)
start (float | None)
end (float | None)
is_zero_duration (bool | None)
speaker_id (str | None)
confidence (float | None)
language (str | None)
style_id (str | None)
word_timing_mode (WordTimingMode | str | None)
words (List[Word] | None)
extensions (Dict[str, Any] | None)
_invalid_type (str | None)
_invalid_words_type (str | None)
- text
The transcribed text content
- Type:
str
- start
Start time in seconds from beginning of media
- Type:
Optional[float]
- end
End time in seconds from beginning of media
- Type:
Optional[float]
- is_zero_duration
Flag for zero-duration segments
- Type:
Optional[bool]
- speaker_id
Reference to a speaker.id
- Type:
Optional[str]
- confidence
Confidence score between 0.0 and 1.0
- Type:
Optional[float]
- language
ISO 639-1 or ISO 639-3 language code
- Type:
Optional[str]
- style_id
Reference to a style.id
- Type:
Optional[str]
- word_timing_mode
Word timing completeness
- Type:
Optional[WordTimingMode]
- words
Word-level timing and text information
- Type:
Optional[List[Word]]
- extensions
Additional segment metadata
- Type:
Optional[Dict[str, Any]]
Example
```python
# Create a basic segment
segment = Segment(
    text="Hello world",
    start=0.0,
    end=1.5,
    speaker_id="speaker-1"
)

# Create a segment with word timing
segment = Segment(
    text="Hello world",
    start=0.0,
    end=1.5,
    words=[
        Word(text="Hello", start=0.0, end=0.8),
        Word(text="world", start=0.9, end=1.5)
    ],
    word_timing_mode=WordTimingMode.COMPLETE
)
```
Note
Segments must not overlap with each other
start and end must both be present or both absent
start must be >= 0 and end must be >= start
confidence must be between 0.0 and 1.0 if present
speaker_id must reference a valid speaker
style_id must reference a valid style
language must be a valid ISO code if present
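The per-segment timing and confidence rules above can be written as a short standalone check. This is a sketch under the assumption that each rule is reported independently; `segment_problems` is a hypothetical name and not part of stjlib.

```python
from typing import List, Optional


def segment_problems(
    start: Optional[float] = None,
    end: Optional[float] = None,
    confidence: Optional[float] = None,
) -> List[str]:
    """Hypothetical check for the timing and confidence rules of one segment."""
    problems = []
    # Exactly one of start/end being set violates the "both or neither" rule.
    if (start is None) != (end is None):
        problems.append("start and end must both be present or both absent")
    if start is not None and start < 0:
        problems.append("start must be >= 0")
    if start is not None and end is not None and end < start:
        problems.append("end must be >= start")
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be between 0.0 and 1.0")
    return problems


valid = segment_problems(start=0.0, end=1.5, confidence=0.9)
half_timed = segment_problems(start=1.0)
```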
- confidence: float | None = <object object>
- end: float | None = None
- extensions: Dict[str, Any] | None = None
- classmethod from_dict(data)
Creates a Segment instance from a dictionary.
- Parameters:
data (Dict[str, Any]) – Dictionary containing segment data with fields:
- text (required): Segment text content
- start (optional): Start time in seconds
- end (optional): End time in seconds
- is_zero_duration (optional): Zero duration flag
- speaker_id (optional): Reference to speaker
- confidence (optional): Confidence score
- language (optional): Language code
- style_id (optional): Reference to style
- word_timing_mode (optional): Word timing mode
- words (optional): List of word data
- extensions (optional): Additional metadata
- Returns:
A new Segment instance
- Return type:
Segment
Example
{
    "text": "Hello world",
    "start": 0.0,
    "end": 1.5,
    "speaker_id": "speaker-1"
}
- is_zero_duration: bool | None = None
- language: str | None = None
- speaker_id: str | None = None
- start: float | None = None
- style_id: str | None = None
- text: str | None
- to_dict()
Converts the Segment instance to a dictionary.
- Returns:
Dictionary containing segment data. Only includes non-None and non-empty fields.
- Return type:
Dict[str, Any]
Example
segment = Segment(
    text="Hello world", start=0.0, end=1.5, speaker_id="speaker-1"
)
data = segment.to_dict()
- word_timing_mode: WordTimingMode | str | None = None
- words: List[Word] | None = None
Enumerations
WordTimingMode
- class stjlib.WordTimingMode(*values)
Word timing modes for transcript segments.
This enum defines the possible modes for word-level timing information within a segment. It indicates the completeness of timing data for words in the segment.
- Values:
- COMPLETE: All words in the segment have timing information. Use this when every word has start and end times.
- PARTIAL: Some words in the segment have timing information. Use this when only some words have timing data.
- NONE: No words in the segment have timing information. Use this when words array exists but has no timing data.
Example
```python
# Create a segment with complete word timing
segment = Segment(
    text="Hello world",
    word_timing_mode=WordTimingMode.COMPLETE,
    words=[
        Word(text="Hello", start=0.0, end=0.5),
        Word(text="world", start=0.6, end=1.0)
    ]
)

# Create a segment with partial word timing
segment = Segment(
    text="Hello world",
    word_timing_mode=WordTimingMode.PARTIAL,
    words=[
        Word(text="Hello", start=0.0, end=0.5),
        Word(text="world")  # No timing for this word
    ]
)

# Create a segment with no word timing
segment = Segment(
    text="Hello world",
    word_timing_mode=WordTimingMode.NONE,
    words=[
        Word(text="Hello"),
        Word(text="world")
    ]
)
```
Note
The word_timing_mode field affects validation requirements
COMPLETE mode requires all words to have timing data
PARTIAL mode allows mixed timing presence
NONE mode requires no timing data
The mode must match the actual timing data presence
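The requirement that the mode match the actual timing data can be illustrated by deriving the mode from a list of words. The sketch uses a stand-in enum (`TimingMode`) with the same three values and plain dictionaries for words; both, along with `infer_timing_mode`, are hypothetical and not part of stjlib.

```python
from enum import Enum
from typing import Any, Dict, List


class TimingMode(Enum):
    """Stand-in enum mirroring the three documented values."""
    COMPLETE = "complete"
    PARTIAL = "partial"
    NONE = "none"


def infer_timing_mode(words: List[Dict[str, Any]]) -> TimingMode:
    """Hypothetical helper: derive the mode from which words carry timing."""
    timed = [
        word.get("start") is not None and word.get("end") is not None
        for word in words
    ]
    # The emptiness guard matters because all([]) is True.
    if timed and all(timed):
        return TimingMode.COMPLETE
    if any(timed):
        return TimingMode.PARTIAL
    return TimingMode.NONE


complete = infer_timing_mode([
    {"text": "Hello", "start": 0.0, "end": 0.5},
    {"text": "world", "start": 0.6, "end": 1.0},
])
partial = infer_timing_mode([
    {"text": "Hello", "start": 0.0, "end": 0.5},
    {"text": "world"},
])
untimed = infer_timing_mode([{"text": "Hello"}, {"text": "world"}])
```

Comparing a declared mode against the inferred one is one way to detect the mismatch that the last note item warns about.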
- COMPLETE = 'complete'
- NONE = 'none'
- PARTIAL = 'partial'
Exceptions
- class stjlib.STJError
Bases:
Exception
Base class for exceptions in the STJ module.
This class serves as the parent class for all STJ-specific exceptions, allowing for specific error handling of STJ-related issues.
Example
try:
    stj = StandardTranscriptionJSON.from_file("invalid.json")
except STJError as e:
    print(f"STJ error occurred: {e}")
- class stjlib.ValidationError(issues)
Bases:
STJError
Exception raised when STJ validation fails.
This exception includes a list of validation issues that describe the specific problems found during validation.
- Parameters:
issues (List[ValidationIssue])
- issues
List of validation issues found
- Type:
List[ValidationIssue]
Example
try:
    stj = StandardTranscriptionJSON.from_file(
        "transcript.json", validate=True
    )
except ValidationError as e:
    print("Validation failed:")
    for issue in e.issues:
        print(f"{issue.severity}: {issue}")
- __str__()
Returns a formatted string of all validation issues.
- Returns:
Multi-line string containing all validation issues
- Return type:
str
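The pattern of an exception that carries a list of issues and renders them as a multi-line string can be sketched with stand-in classes. `DemoIssue` and `DemoValidationError` are hypothetical; the real ValidationIssue fields and the exact output format of stjlib are not specified here and may differ.

```python
from typing import List


class DemoIssue:
    """Stand-in for a validation issue with a message and a severity."""

    def __init__(self, message: str, severity: str = "error") -> None:
        self.message = message
        self.severity = severity

    def __str__(self) -> str:
        return f"[{self.severity}] {self.message}"


class DemoValidationError(Exception):
    """Stand-in exception holding all issues found during validation."""

    def __init__(self, issues: List[DemoIssue]) -> None:
        super().__init__()
        self.issues = list(issues)

    def __str__(self) -> str:
        # One line per issue, in the order they were reported.
        return "\n".join(str(issue) for issue in self.issues)


error = DemoValidationError([
    DemoIssue("segments list must not be empty"),
    DemoIssue("language code looks unusual", severity="warning"),
])
rendered = str(error)
```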