STJLib Modules
STJLib: Standard Transcription JSON Format Handler
A comprehensive implementation of the Standard Transcription JSON (STJ) format for representing transcribed audio and video data.
- Module Organization:
Type definitions and type hints
Exception classes for error handling
Main STJ handler class
Helper functions and utilities
- Key Features:
Complete STJ format implementation
Load and save STJ files with robust error handling
Comprehensive validation against specification
Type-safe data structures
Extensible architecture
- Full support for all STJ components:
Metadata and source information
Transcript content and timing
Speaker identification
Word-level timing and confidence
Text formatting and styles
Custom extensions
Example
```python
from stjlib import StandardTranscriptionJSON
# Metadata and Transcriber must also be imported from stjlib;
# their module path is not shown in this section.

# Create a new transcript
stj = StandardTranscriptionJSON(
    metadata=Metadata(
        transcriber=Transcriber(name="MyTranscriber", version="1.0")
    )
)

# Add content
stj.add_speaker("s1", "John")
stj.add_segment(
    text="Hello world", start=0.0, end=1.5, speaker_id="s1"
)

# Load existing data
existing = StandardTranscriptionJSON.from_file("transcript.stjson")
```
Note
For detailed format information, see: https://github.com/yaniv-golan/STJ
- Format Structure:
The STJ format wraps content in a root "stj" object:
    {
        "stj": {
            "version": "0.6.1",
            "metadata": {
                "transcriber": {"name": str, "version": str},
                "created_at": str,  # ISO 8601 UTC timestamp
                …
            },
            "transcript": {
                "segments": […],
                "speakers": […],
                …
            }
        }
    }
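To make the layout above concrete, here is a minimal sketch that builds the same root structure as a plain Python dictionary and round-trips it through the standard `json` module. It does not use stjlib: the helper name `make_minimal_stj` and the sample values are illustrative assumptions, not part of the library.

```python
import json
from datetime import datetime, timezone


def make_minimal_stj(name: str, version: str) -> dict:
    """Build a bare-bones STJ-shaped dictionary (hypothetical helper)."""
    return {
        "stj": {
            "version": "0.6.1",
            "metadata": {
                "transcriber": {"name": name, "version": version},
                # ISO 8601 timestamp in UTC, as the structure above calls for
                "created_at": datetime.now(timezone.utc).isoformat(),
            },
            "transcript": {"segments": [], "speakers": []},
        }
    }


doc = make_minimal_stj("MyTranscriber", "1.0")
text = json.dumps(doc, indent=2)   # serialize to a JSON string
restored = json.loads(text)        # parse it back into a dictionary
```

Everything lives under the single top-level "stj" key, so a consumer can reject a file early if that key is missing.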
- exception stjlib.stj.STJError
Bases: Exception
Base class for exceptions in the STJ module.
This class is the parent of all STJ-specific exceptions, so callers can catch any STJ-related error with a single except clause.
Example
    try:
        stj = StandardTranscriptionJSON.from_file("invalid.json")
    except STJError as e:
        print(f"STJ error occurred: {e}")
- class stjlib.stj.StandardTranscriptionJSON(metadata=None, transcript=None, validate=False)
Bases: object
Handler for Standard Transcription JSON (STJ) format.
This class implements version 0.6.1 of the STJ specification, providing a high-level interface for working with STJ documents.
- Format Structure:
The STJ format wraps content in a root "stj" object:
    {
        "stj": {
            "version": "0.6.1",
            "metadata": {
                "transcriber": {"name": str, "version": str},
                "created_at": str,  # ISO 8601 UTC timestamp
                …
            },
            "transcript": {
                "segments": […],
                "speakers": […],
                …
            }
        }
    }
- Features:
Create new transcripts
Load existing STJ files
Add/modify transcript content
Validate STJ data
Save transcripts to files
Example
>>> stj = StandardTranscriptionJSON(
...     metadata=Metadata(
...         transcriber=Transcriber(name="MyTranscriber", version="1.0")
...     )
... )
>>> stj.add_speaker("s1", "John")
>>> stj.add_segment(text="Hello", start=0.0, end=1.5, speaker_id="s1")
Note
The STJ format is a standardized way to represent transcribed audio/video content with support for timing, speakers, and metadata.
- Parameters:
metadata (Metadata | None)
transcript (Transcript | None)
validate (bool)
- add_segment(text, start=None, end=None, *, speaker_id=None, language=None, **kwargs)
Add a new transcript segment.
- Parameters:
text (str) – The transcribed text
start (float | None) – Start time in seconds
end (float | None) – End time in seconds
speaker_id (str | None) – Optional ID of the speaker
language (str | None) – Optional language code
**kwargs – Additional segment properties
- Raises:
ValueError – If text is empty or if end time is before start time
- Return type:
None
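The documented `ValueError` conditions can be pictured as a small pre-check. The function below is a stand-alone sketch, not stjlib's implementation: `check_segment_args` is a hypothetical name, and treating whitespace-only text as empty and allowing `start == end` are assumptions made here.

```python
from typing import Optional


def check_segment_args(text: str, start: Optional[float] = None,
                       end: Optional[float] = None) -> None:
    """Raise ValueError for the cases listed under Raises (illustrative only)."""
    # Assumption: whitespace-only text counts as empty.
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    # Compare times only when both are given; each defaults to None.
    if start is not None and end is not None and end < start:
        raise ValueError("end time must not be before start time")


check_segment_args("Hello world", 0.0, 1.5)   # passes silently
check_segment_args("No timing information")   # passes: timing is optional

try:
    check_segment_args("Backwards", start=2.0, end=1.0)
except ValueError as exc:
    message = str(exc)
```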
- add_speaker(id, name=None)
Add a new speaker.
- Parameters:
id (str) – Unique speaker identifier
name (str | None) – Optional display name for the speaker
- Raises:
ValueError – If id is empty or if speaker with same id already exists
- Return type:
None
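The uniqueness rule for speaker IDs can be sketched with a dictionary keyed by ID. `SpeakerRegistry` below is a toy stand-in written for this example, not an stjlib class, and its error messages are assumptions.

```python
from typing import Dict, Optional


class SpeakerRegistry:
    """Toy stand-in for speaker bookkeeping; not part of stjlib."""

    def __init__(self) -> None:
        self._names: Dict[str, Optional[str]] = {}

    def add_speaker(self, id: str, name: Optional[str] = None) -> None:
        if not id:
            raise ValueError("speaker id must not be empty")
        if id in self._names:
            raise ValueError(f"speaker {id!r} already exists")
        self._names[id] = name


registry = SpeakerRegistry()
registry.add_speaker("s1", "John")
registry.add_speaker("s2")           # the display name is optional

try:
    registry.add_speaker("s1", "Johnny")   # same id as the first speaker
except ValueError as exc:
    duplicate_message = str(exc)
```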
- clear_segments()
Remove all segments from the transcript.
- Return type:
None
- classmethod create_from_stj(stj)
Internal method to create instance from STJ object.
This method is used internally by from_dict() and from_file().
For new instances, use the constructor.
- Parameters:
stj (STJ) – An existing STJ object
- Returns:
New instance wrapping the STJ object
- Return type:
StandardTranscriptionJSON
Note
This is an internal method and should not be used directly. Use the constructor for creating new instances.
- classmethod from_dict(data, validate=False)
Creates a StandardTranscriptionJSON instance from a dictionary.
- Parameters:
data (Dict[str, Any]) – Dictionary containing STJ data
validate (bool) – Whether to validate the data (defaults to False)
- Returns:
A new StandardTranscriptionJSON instance
- Return type:
StandardTranscriptionJSON
- Raises:
ValidationError – If validate=True and the data fails validation
- classmethod from_file(filename, validate=False)
Creates a StandardTranscriptionJSON instance from a JSON file.
- Parameters:
filename (str) – Path to the JSON file to load
validate (bool) – Whether to validate the loaded data (defaults to False)
- Returns:
New instance with loaded data
- Return type:
StandardTranscriptionJSON
- Raises:
FileNotFoundError – If file doesn’t exist
ValidationError – If validation fails or data structure is invalid
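As a rough picture of the load path, the sketch below reads a JSON file with the standard library and checks for the root "stj" object. It is an assumption-laden illustration rather than stjlib's code: `load_stj_dict` is a hypothetical helper, and it raises a plain `ValueError` where the library documents `ValidationError`.

```python
import json
import os
import tempfile


def load_stj_dict(filename: str) -> dict:
    """Read a JSON file and check for the root "stj" object (illustrative)."""
    # open() raises FileNotFoundError if the path does not exist.
    with open(filename, "r", encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict) or "stj" not in data:
        raise ValueError('missing root "stj" object')
    return data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "transcript.stjson")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"stj": {"version": "0.6.1", "transcript": {"segments": []}}}, fh)
    loaded = load_stj_dict(path)

    try:
        load_stj_dict(os.path.join(tmp, "missing.stjson"))
    except FileNotFoundError:
        missing_raised = True
    else:
        missing_raised = False
```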
- get_segments_by_speaker(speaker_id)
Get all segments for a specific speaker.
- Parameters:
speaker_id (str) – ID of the speaker to find segments for
- Returns:
List of segments with matching speaker_id. Empty list if none found.
- Return type:
List[Segment]
- Raises:
ValueError – If speaker_id is empty or whitespace
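The lookup behaviour described above amounts to a filter over the segment list. The sketch below uses plain dictionaries in place of `Segment` objects, which is an assumption made to keep it self-contained; `segments_for_speaker` is a hypothetical name.

```python
from typing import Any, Dict, List


def segments_for_speaker(segments: List[Dict[str, Any]],
                         speaker_id: str) -> List[Dict[str, Any]]:
    """Return the segments whose speaker_id matches (illustrative only)."""
    if not speaker_id or not speaker_id.strip():
        raise ValueError("speaker_id must not be empty or whitespace")
    return [seg for seg in segments if seg.get("speaker_id") == speaker_id]


segments = [
    {"text": "Hello", "speaker_id": "s1"},
    {"text": "Hi there", "speaker_id": "s2"},
    {"text": "How are you?", "speaker_id": "s1"},
]
s1_texts = [seg["text"] for seg in segments_for_speaker(segments, "s1")]
unknown = segments_for_speaker(segments, "s9")   # no match gives an empty list
```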
- get_speaker(speaker_id)
Get speaker by ID.
- Parameters:
speaker_id (str) – ID of the speaker to find
- Returns:
Speaker if found, None if not found
- Return type:
Optional[Speaker]
- Raises:
ValueError – If speaker_id is empty or whitespace
- property metadata: Metadata | None
Access to the STJ metadata.
- Returns:
The metadata object or None if not present
- Return type:
Optional[Metadata]
- Raises:
ValueError – If STJ instance is not properly initialized
- to_dict()
Convert to STJ format dictionary.
- Returns:
Dictionary in the STJ format with structure:
    {
        "stj": {
            "version": str,
            "metadata": Optional[Dict[str, Any]],
            "transcript": Dict[str, Any]
        }
    }
- Return type:
STJDict
- to_file(filename)
Saves the STJ instance to a JSON file.
- Parameters:
filename (str) – Path where the JSON file should be written
- Raises:
IOError – If the file cannot be written (in Python 3, IOError is an alias of OSError)
- Return type:
None
- property transcript: Transcript
Access to the STJ transcript.
- Returns:
The transcript object containing all content
- Return type:
Transcript
- Raises:
ValueError – If STJ instance is not properly initialized
- validate(raise_exception=True)
Validates the STJ data according to specification requirements.
- Parameters:
raise_exception (bool) – If True, raises ValidationError for any issues.
- Returns:
List of validation issues if found, None if valid.
- Return type:
Optional[ValidationIssues]
- Raises:
ValidationError – If validation fails and raise_exception is True.
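The `raise_exception` switch follows a common pattern: collect issues first, then either raise or return them. The sketch below shows that pattern on plain dictionaries. The two checks, the string-based issues, and the names `DemoValidationError` and `validate_segments` are all assumptions for illustration; the real rules come from the STJ specification.

```python
from typing import List, Optional


class DemoValidationError(Exception):
    """Stand-in for stjlib's ValidationError (hypothetical)."""

    def __init__(self, issues: List[str]) -> None:
        super().__init__("; ".join(issues))
        self.issues = issues


def validate_segments(segments: List[dict],
                      raise_exception: bool = True) -> Optional[List[str]]:
    """Collect issues, then raise or return them depending on the flag."""
    issues: List[str] = []
    for i, seg in enumerate(segments):
        if not seg.get("text"):
            issues.append(f"segment {i}: empty text")
        start, end = seg.get("start"), seg.get("end")
        if start is not None and end is not None and end < start:
            issues.append(f"segment {i}: end before start")
    if not issues:
        return None                  # valid data yields None, not an empty list
    if raise_exception:
        raise DemoValidationError(issues)
    return issues


good = validate_segments([{"text": "Hello", "start": 0.0, "end": 1.5}])
bad = validate_segments([{"text": "", "start": 2.0, "end": 1.0}],
                        raise_exception=False)
```

Passing `raise_exception=False` suits tools that want to report every problem at once instead of stopping at the first failure.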
- exception stjlib.stj.ValidationError(issues)
Bases: STJError
Exception raised when STJ validation fails.
This exception includes a list of validation issues that describe the specific problems found during validation.
- Parameters:
issues (List[ValidationIssue])
- issues
List of validation issues found
- Type:
List[ValidationIssue]
Example
    try:
        stj = StandardTranscriptionJSON.from_file(
            "transcript.json", validate=True
        )
    except ValidationError as e:
        print("Validation failed:")
        for issue in e.issues:
            print(f"{issue.severity}: {issue}")
- __str__()
Returns a formatted string of all validation issues.
- Returns:
Multi-line string containing all validation issues
- Return type:
str
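To illustrate how an exception can carry a list of issues and render them through `__str__`, here is a self-contained sketch with stand-in classes. `DemoIssue`, `DemoSTJError`, and `DemoValidationError` are hypothetical; the `severity` and `message` fields and the one-issue-per-line output format are assumptions, not stjlib's actual definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DemoIssue:
    """Stand-in for ValidationIssue; field names are assumptions."""
    severity: str
    message: str

    def __str__(self) -> str:
        return self.message


class DemoSTJError(Exception):
    """Stand-in for STJError, the common base class."""


class DemoValidationError(DemoSTJError):
    """Stand-in for ValidationError: keeps the issues and formats them."""

    def __init__(self, issues: List[DemoIssue]) -> None:
        super().__init__(issues)
        self.issues = issues

    def __str__(self) -> str:
        # One line per issue, prefixed with its severity.
        return "\n".join(f"{issue.severity}: {issue}" for issue in self.issues)


try:
    raise DemoValidationError([
        DemoIssue("ERROR", "missing transcript"),
        DemoIssue("WARNING", "unknown language code"),
    ])
except DemoSTJError as exc:   # catching the base class also catches the subclass
    report = str(exc)
    issue_count = len(exc.issues)
```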