Synapses: Connecting to Large Language Models (LLMs)

Synapses in IntelliBricks are your gateway to the world of Large Language Models (LLMs). A Synapse acts as a smart connector, handling the complexities of interacting with different LLM providers and models in a consistent, Pythonic way. It abstracts away low-level API details, letting you focus on what you want to achieve with LLMs rather than on how to communicate with them.

Core Concepts of Synapses

  • Abstraction Layer: Synapses provide a unified interface for interacting with various LLM providers (e.g., Google Gemini, OpenAI, Groq, Cerebras, DeepInfra). You can switch between different models or providers with minimal code changes.

  • Model Configuration: Synapses encapsulate the configuration for a specific LLM model, including model name, API keys, and provider-specific settings.

  • Text Completion and Chat: Synapses offer methods for both simple text completion (complete, complete_async) and chat-based interactions (chat, chat_async).

  • Structured Output Handling: Synapses seamlessly handle structured outputs when you specify a response_model. They instruct the LLM to return data in your desired format and automatically parse the response.

  • Fault Tolerance (Synapse Cascade): IntelliBricks provides SynapseCascade to create fault-tolerant applications. A Synapse Cascade allows you to specify a list of Synapses; if one fails, the system automatically falls back to the next one in the cascade.

  • Text Transcription Synapse: TextTranscriptionSynapse is specialized for audio transcription, connecting to transcription services such as OpenAI Whisper or Groq Whisper.

Initializing Synapses

Synapses are typically initialized using the static factory method Synapse.of(), which simplifies configuration based on model identifiers.

Basic Synapse Initialization

To initialize a Synapse for Google Gemini Pro:

from intellibricks.llms import Synapse

gemini_synapse = Synapse.of("google/genai/gemini-pro-experimental")

# Ensure you have set your GOOGLE_API_KEY environment variable.
# For Vertex AI models, project and location might also be needed.

To initialize a Synapse for OpenAI’s gpt-4o model:

openai_synapse = Synapse.of("openai/api/gpt-4o")

# Ensure you have set your OPENAI_API_KEY environment variable.

To initialize a Synapse for Groq’s mixtral-8x7b-32768 model:

groq_synapse = Synapse.of("groq/api/mixtral-8x7b-32768")

# Ensure you have set your GROQ_API_KEY environment variable.

Synapse Methods: ``complete`` and ``chat``

Synapses provide two primary methods for interacting with LLMs:

  1. complete(prompt, **kwargs) / complete_async(prompt, **kwargs): For simple text completion tasks. You provide a prompt, and the Synapse returns a completion.

  2. chat(messages, **kwargs) / chat_async(messages, **kwargs): For chat-based interactions. You provide a list of Message objects representing the conversation history, and the Synapse returns a chat response.

Using ``complete`` for Text Generation

Let’s use the complete method to generate a short story with the Gemini Synapse:

completion_response = gemini_synapse.complete("Write a short story about a robot learning to love.")
print(completion_response.text)

Key parameters for complete (and complete_async):

  • prompt (str | Prompt | PartType | Sequence[PartType]): The prompt for text generation. Can be a simple string, a Prompt object for structured prompts, or a PartType or sequence of PartType for multimodal prompts.

  • response_model (Optional[Type[S]]): An optional msgspec.Struct class to define the structure of the expected output. If provided, IntelliBricks will attempt to parse the LLM response into this structure (see the sketch after this list).

  • Other generation parameters: temperature, max_completion_tokens, top_p, top_k, stop_sequences, etc., to control the LLM’s generation behavior.
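
For example, here is a minimal sketch of structured output with complete. The MovieReview struct is hypothetical, and the attribute used to access the parsed object (.parsed below) is an assumption; check the API Reference for the exact accessor.

import msgspec

# Hypothetical response model; any msgspec.Struct subclass works here.
class MovieReview(msgspec.Struct):
    title: str
    rating: int  # e.g., 1-10
    summary: str

review_response = gemini_synapse.complete(
    "Review the movie 'Inception' in one short paragraph.",
    response_model=MovieReview,
)
# `.parsed` is an assumed accessor for the structured result.
review = review_response.parsed
print(review.title, review.rating)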

Using ``chat`` for Conversational Interactions

For chat-based interactions, use the chat method. You need to provide a sequence of Message objects to represent the conversation history.

from intellibricks.llms import UserMessage, AssistantMessage

chat_messages = [
    UserMessage.from_text("Hello, are you there?"),
    AssistantMessage.from_text("Yes, I am here. How can I help you today?"),
    UserMessage.from_text("Tell me a joke."),
]

chat_response = gemini_synapse.chat(chat_messages)
print(chat_response.text)
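
Both methods have async counterparts. As a minimal sketch, assuming chat_async mirrors the synchronous signature:

import asyncio

async def main():
    # Same arguments as chat(), awaited inside an event loop.
    response = await gemini_synapse.chat_async(chat_messages)
    print(response.text)

asyncio.run(main())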

Key parameters for chat (and chat_async):

  • messages (Sequence[Message]): A list of Message objects representing the conversation history. IntelliBricks provides UserMessage, AssistantMessage, and DeveloperMessage message types.

  • response_model (Optional[Type[S]]): Similar to complete, you can provide a response_model for structured chat responses.

  • tools (Optional[Sequence[ToolInputType]]): A list of tools that the LLM can use during the chat interaction (function calling); see the sketch after this list.

  • Other generation parameters: temperature, max_completion_tokens, top_p, top_k, stop_sequences, etc.
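
As a hedged sketch of function calling, assuming plain Python callables are accepted as ToolInputType (the exact tool format and how tool calls surface on the response are covered in the API Reference):

# Hypothetical tool; the docstring and type hints describe it to the LLM.
def get_weather(city: str) -> str:
    """Return a short weather summary for the given city."""
    return f"It is sunny in {city} today."

tool_response = gemini_synapse.chat(
    [UserMessage.from_text("What's the weather in Lisbon?")],
    tools=[get_weather],
)
print(tool_response.text)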

Synapse Cascade for Fault Tolerance

To enhance the reliability of your application, you can use SynapseCascade. It allows you to specify a list of Synapses, and if the first one fails (e.g., due to API issues, rate limits), it automatically tries the next one in the list.

from intellibricks.llms import SynapseCascade

synapse_cascade = SynapseCascade(
    synapses=[
        Synapse.of("openai/api/gpt-4o"),       # Primary Synapse
        Synapse.of("google/genai/gemini-1.5-flash"),   # Fallback Synapse 1
        Synapse.of("cerebras/api/llama-3.3-70b"), # Fallback Synapse 2
    ]
)

# Use synapse_cascade just like a regular Synapse
try:
    response = synapse_cascade.complete("Translate 'Hello' to Spanish.")
    print(response.text)
except Exception as e:
    print(f"All synapses failed: {e}")

SynapseCascade attempts to use the Synapses in the order they are listed. If a Synapse call fails, it catches the exception and tries the next Synapse in the cascade. If all Synapses fail, it raises the last encountered exception.
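
Conceptually, the cascade behaves like the following sketch (an illustration of the fallback pattern, not the library's actual implementation):

last_error = None
for synapse in [openai_synapse, gemini_synapse]:  # tried in order
    try:
        response = synapse.complete("Translate 'Hello' to Spanish.")
        break  # first successful Synapse wins
    except Exception as e:
        last_error = e  # remember the failure and try the next Synapse
else:
    raise last_error  # every Synapse failed; re-raise the last exception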

TextTranscriptionSynapse for Audio Transcription

For audio transcription tasks, use TextTranscriptionSynapse.

from intellibricks.llms import TextTranscriptionSynapse

whisper_synapse = TextTranscriptionSynapse.of("groq/api/distil-whisper-large-v3-en") # Or "openai/api/whisper-1"

audio_file_path = "path/to/your/audiofile.mp3" # Replace with your audio file

try:
    transcription_response = whisper_synapse.transcribe(audio_file_path)
    print("Transcription Text:")
    print(transcription_response.text)
    print("\nTranscription Segments (first 3):")
    for segment in transcription_response.segments[:3]:
        print(f"- Segment {segment.id}: {segment.sentence} ({segment.start:.2f}s - {segment.end:.2f}s)")

    # You can also get SRT subtitles
    srt_subtitles = transcription_response.srt
    print("\nSRT Subtitles (first few lines):")
    print(srt_subtitles[:200] + "...") # Print first 200 chars of SRT

except Exception as e:
    print(f"Transcription failed: {e}")

Key methods for TextTranscriptionSynapse:

  • transcribe(audio, **kwargs) / transcribe_async(audio, **kwargs): Transcribes audio content. The audio parameter can be a file path, bytes data, or a file-like object, as sketched below.
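
For instance, a minimal sketch passing raw bytes instead of a file path, per the parameter description above:

# Read the audio file ourselves and hand the bytes to the Synapse.
with open("path/to/your/audiofile.mp3", "rb") as f:
    audio_bytes = f.read()

bytes_response = whisper_synapse.transcribe(audio_bytes)
print(bytes_response.text)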

TextTranscriptionsSynapseCascade for Fault-Tolerant Transcriptions

Similar to SynapseCascade, IntelliBricks offers TextTranscriptionsSynapseCascade for fault-tolerant audio transcriptions.

from intellibricks.llms import TextTranscriptionsSynapseCascade

transcription_cascade = TextTranscriptionsSynapseCascade(
    synapses=[
        TextTranscriptionSynapse.of("groq/api/whisper-large-v3-turbo"), # Primary transcription service
        # Add other transcription synapses as fallbacks if needed
    ]
)

# Use transcription_cascade just like TextTranscriptionSynapse for enhanced reliability
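
For example, assuming the cascade exposes the same transcribe interface as TextTranscriptionSynapse:

try:
    result = transcription_cascade.transcribe("path/to/your/audiofile.mp3")
    print(result.text)
except Exception as e:
    print(f"All transcription synapses failed: {e}")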

Summary

IntelliBricks Synapses provide a robust and flexible way to interact with a wide range of LLMs and transcription services. They simplify model configuration, handle API communication, and offer advanced features like structured outputs and fault tolerance. By leveraging Synapses, you can easily integrate the power of LLMs into your intelligent applications without getting bogged down in low-level complexities.

API Reference