Standard for transcribing phone calls into text

This document presents a standard recommended by our company for transcribing phone calls into text.

This feature is currently in BETA. You can submit your suggestions to our development team via the Hipcall Development Form.

Version

This standard follows semantic versioning. The current version is v0.1.0.
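As a minimal sketch of how a consumer might read the version string, the snippet below splits a MAJOR.MINOR.PATCH value into integers and applies a simple compatibility check. The function names `parse_version` and `is_compatible`, and the compatibility rule itself, are assumptions made for illustration; the standard does not define them.

```python
def parse_version(version):
    """Split 'MAJOR.MINOR.PATCH' (with an optional leading 'v') into three integers."""
    if version.startswith("v"):
        version = version[1:]
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def is_compatible(document_version, supported="0.1.0"):
    """Assumed rule: while MAJOR is 0, MINOR must also match; otherwise only MAJOR must match."""
    doc = parse_version(document_version)
    sup = parse_version(supported)
    if sup[0] == 0:
        return doc[:2] == sup[:2]
    return doc[0] == sup[0]
```

Under this assumed rule, a 0.1.3 document would be accepted by a consumer written for 0.1.0, while a 0.2.0 document would be rejected.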

JSON Structure

The JSON consists of four nodes.

  • version: (string) The version of the standard that the document follows. Example: “0.1.0”
  • id: (string/uuid) The ID of the call. Example: “e8945fd1-5ae0-4615-868c-29b2d50bd41b”
  • started_at: (string) The time the call started, as an ISO 8601 timestamp in UTC (Etc/UTC). Example: “2024-07-10T17:28:29.902987Z”
  • output: (object) The output of the transcription
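The four top-level nodes listed above could be checked on the consumer side roughly as follows. This is a sketch under stated assumptions: `parse_envelope` and `REQUIRED_KEYS` are hypothetical names, and the error handling is illustrative rather than part of the standard.

```python
import json
import uuid
from datetime import datetime

# Top-level nodes named by the standard.
REQUIRED_KEYS = ("version", "id", "started_at", "output")


def parse_envelope(raw):
    """Parse a transcription document and check its four top-level nodes."""
    doc = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in doc]
    if missing:
        raise ValueError("missing top-level keys: %s" % missing)
    if not isinstance(doc["output"], dict):
        raise ValueError("'output' must be an object")
    # uuid.UUID raises ValueError if the ID is not a well-formed UUID.
    call_id = uuid.UUID(doc["id"])
    # Older Python versions reject a trailing 'Z', so rewrite it as an explicit UTC offset.
    started_at = datetime.fromisoformat(doc["started_at"].replace("Z", "+00:00"))
    return {
        "version": doc["version"],
        "id": call_id,
        "started_at": started_at,
        "output": doc["output"],
    }
```

The returned `started_at` is a timezone-aware `datetime`, which avoids accidental comparisons against local time.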

Output standard

  • language: (string) The two-letter language code. For example: “en”
  • segments: (array)
    • segment: (object)
      • id: (string) A string in the form of a UUID.
      • end: (number) The time, in seconds, at which the speech ends
      • text: (string) The text of the speech
      • start: (number) The time, in seconds, at which the speech begins
      • speaker: (string) The speaker label. For example: “SPEAKER_00”
      • avg_logprob: (number) Average log probability of the segment, used as a confidence score
      • sentiment: (string) Sentiment of the speaker (can be positive, neutral, negative, or empty)
      • words: (array)
        • id: (string) A string in the form of a UUID.
        • end: (number) The time, in seconds, at which the word ends
        • word: (string) The word itself
        • start: (number) The time, in seconds, at which the word begins
        • probability: (number) Probability/confidence score
  • num_speakers: (integer) Number of speakers in the conversation. For example: 2
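To illustrate how the segment fields fit together, the sketch below sums talk time per speaker from `start` and `end`, and turns `avg_logprob` into a value between 0 and 1. Both function names are hypothetical, and treating `avg_logprob` as a natural logarithm is an assumption; the standard does not state the base.

```python
import math
from collections import defaultdict


def talk_time_by_speaker(output):
    """Sum (end - start) in seconds for each speaker label in output['segments']."""
    totals = defaultdict(float)
    for segment in output.get("segments", []):
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return dict(totals)


def segment_confidence(segment):
    """Map avg_logprob to the range (0, 1], assuming it is a natural-log probability."""
    return math.exp(segment["avg_logprob"])
```

A value of `avg_logprob` closer to 0 maps to a confidence closer to 1 under this assumption.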

JSON Example

{
  "version": "0.1.0",
  "id": "e8945fd1-5ae0-4615-868c-29b2d50bd41b",
  "started_at": "2024-07-10T17:28:29.902987Z",
  "output": {
    "language": "tr",
    "segments": [
      {
        "id": "9ffcb83c-b5d3-46dc-9cd6-d79429b03e9b",
        "end": 1443.91,
        "text": "Merhaba, Hipcall'dan Onur ben",
        "start": 1410.3,
        "words": [
          {
            "id": "a5fdf544-efb7-49c9-b79c-ef7231f09ce5",
            "end": 1410.3,
            "word": "Merhaba",
            "start": 1410.3,
            "probability": 0.8740234375
          },
          {
            "id": "8f063802-b7d7-49ca-a266-0dcb94b4d506",
            "end": 1410.44,
            "word": "Hipcall'dan",
            "start": 1410.3,
            "probability": 0.37744140625
          }
        ],
        "speaker": "SPEAKER_00",
        "avg_logprob": -0.32476635959660893,
        "sentiment": "positive"
      }
    ],
    "num_speakers": 2
  }
}
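One possible use of a document like the example above is to flag words the transcriber was unsure about. The sketch below walks the segments and collects words whose `probability` falls under a threshold. The function name `low_confidence_words` and the default threshold of 0.5 are assumptions chosen for illustration, not values defined by the standard.

```python
def low_confidence_words(document, threshold=0.5):
    """Return (speaker, word, start) for every word whose probability is below the threshold."""
    flagged = []
    for segment in document["output"]["segments"]:
        for word in segment.get("words", []):
            if word["probability"] < threshold:
                flagged.append((segment["speaker"], word["word"], word["start"]))
    return flagged
```

Applied to the example document with the default threshold, this would flag “Hipcall'dan” (probability 0.37744140625) and leave “Merhaba” (probability 0.8740234375) alone.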