Respan is compatible with the official Google Gen AI SDK, enabling you to use Google’s Gemini models through our gateway with full observability, monitoring, and advanced features.
To switch between different Google models, simply change the model parameter.
```python
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Tell me a joke.",
)
```
Step 5: Configure parameters
Use GenerateContentConfig to control model behavior with various parameters.
```python
from google.genai import types

config = types.GenerateContentConfig(
    temperature=0.9,
    top_k=1,
    top_p=1,
    max_output_tokens=2048,
)
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is the capital of France?",
    config=config,
)
```
Step 6: Advanced configuration
Here’s a comprehensive example showcasing various parameters, including system instructions, safety settings, and tools.
```python
from google import genai
from google.genai import types
import os

client = genai.Client(
    api_key=os.environ.get("RESPAN_API_KEY"),
    http_options={
        "base_url": "https://api.respan.ai/api/google/gemini",
    },
)

# Example: Configure tools for grounding
grounding_tool = types.Tool(
    google_search=types.GoogleSearch()
)

# Comprehensive GenerateContentConfig showcasing various parameters
config = types.GenerateContentConfig(
    # System instruction to guide the model's behavior
    system_instruction="You are a helpful assistant that provides accurate, concise information about sports events.",
    # Sampling parameters
    temperature=0.7,  # Controls randomness (0.0-1.0). Lower = more focused, higher = more creative
    top_p=0.95,       # Nucleus sampling: tokens with cumulative probability up to this value are considered
    top_k=40,         # Top-k sampling: considers this many top tokens at each step
    # Output controls
    max_output_tokens=1024,     # Maximum number of tokens in the response
    stop_sequences=["\n\n\n"],  # Sequences that stop generation
    # Tools and function calling
    tools=[grounding_tool],  # Enable Google Search grounding
    # Thinking configuration (for models that support it)
    thinking_config=types.ThinkingConfig(thinking_budget=0),  # Disables thinking mode
    # Response format options
    # response_mime_type="application/json",  # Uncomment for JSON output
    # response_schema=types.Schema(           # Uncomment to enforce structured output
    #     type=types.Type.OBJECT,
    #     properties={
    #         "winner": types.Schema(type=types.Type.STRING),
    #         "year": types.Schema(type=types.Type.INTEGER),
    #     },
    # ),
    # Safety settings
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        ),
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        ),
    ],
    # Diversity controls
    presence_penalty=0.0,   # Penalize tokens based on presence in text (-2.0 to 2.0)
    frequency_penalty=0.0,  # Penalize tokens based on frequency in text (-2.0 to 2.0)
    # Reproducibility
    # seed=42,  # Uncomment to make responses more deterministic
    # Logprobs (for token analysis)
    # response_logprobs=True,  # Uncomment to get log probabilities
    # logprobs=5,              # Number of top candidate tokens to return logprobs for
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Who won Euro 2024?",
    config=config,
)
print(response.text)
```
Step 1: Install the SDK
Install the official Google Gen AI SDK for TypeScript.
```shell
npm install @google/genai
```
Step 2: Initialize the client
Initialize the client with your Respan API key and set the base URL to Respan’s endpoint.
```typescript
import { GoogleGenAI } from "@google/genai";

const GenAI = new GoogleGenAI({
  apiKey: process.env.RESPAN_API_KEY,
  httpOptions: {
    baseUrl: "https://api.respan.ai/api/google/gemini",
  },
});
```
The baseUrl can be either https://api.respan.ai/api/google/gemini or https://endpoint.respan.ai/api/google/gemini.
Step 3: Make your first request
Now you can use the client to make requests to Google’s models.
To switch between different Google models, simply change the model parameter.
```typescript
const result = await GenAI.models.generateContent({
  model: "gemini-2.0-flash-exp",
  contents: [{ role: "user", parts: [{ text: "Tell me a joke." }] }],
});
```
Step 5: Configure parameters
Use the config parameter to control model behavior with various parameters.
```typescript
const result = await GenAI.models.generateContent({
  model: "gemini-2.5-flash",
  contents: [{ role: "user", parts: [{ text: "What is the capital of France?" }] }],
  config: {
    temperature: 0.9,
    topK: 1,
    topP: 1,
    maxOutputTokens: 2048,
  },
});
```
Step 6: Advanced configuration
Here’s a comprehensive example showcasing various parameters, including system instructions, safety settings, and tools.
```typescript
import { GoogleGenAI } from "@google/genai";

const GenAI = new GoogleGenAI({
  apiKey: process.env.RESPAN_API_KEY,
  httpOptions: {
    baseUrl: "https://api.respan.ai/api/google/gemini",
  },
});

// Comprehensive config showcasing various parameters
async function run() {
  const response = await GenAI.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [
      { role: "user", parts: [{ text: "Who won Euro 2024?" }] },
    ],
    config: {
      // System instruction to guide the model's behavior
      systemInstruction:
        "You are a helpful assistant that provides accurate, concise information about sports events.",
      // Sampling parameters
      temperature: 0.7, // Controls randomness (0.0-1.0). Lower = more focused, higher = more creative
      topP: 0.95,       // Nucleus sampling: tokens with cumulative probability up to this value are considered
      topK: 40,         // Top-k sampling: considers this many top tokens at each step
      // Output controls
      maxOutputTokens: 1024,     // Maximum number of tokens in the response
      stopSequences: ["\n\n\n"], // Sequences that stop generation
      // Tools and function calling
      tools: [
        {
          googleSearch: {}, // Enable Google Search grounding
        },
      ],
      // Thinking configuration (for models that support it)
      thinkingConfig: {
        thinkingBudget: 0, // Disables thinking mode
      },
      // Response format options
      // responseMimeType: "application/json", // Uncomment for JSON output
      // responseSchema: {                     // Uncomment to enforce structured output
      //   type: "OBJECT",
      //   properties: {
      //     winner: { type: "STRING" },
      //     year: { type: "INTEGER" },
      //   },
      // },
      // Safety settings
      safetySettings: [
        {
          category: "HARM_CATEGORY_HATE_SPEECH",
          threshold: "BLOCK_MEDIUM_AND_ABOVE",
        },
        {
          category: "HARM_CATEGORY_DANGEROUS_CONTENT",
          threshold: "BLOCK_MEDIUM_AND_ABOVE",
        },
      ],
      // Diversity controls
      presencePenalty: 0.0,  // Penalize tokens based on presence in text (-2.0 to 2.0)
      frequencyPenalty: 0.0, // Penalize tokens based on frequency in text (-2.0 to 2.0)
      // Reproducibility
      // seed: 42, // Uncomment to make responses more deterministic
      // Logprobs (for token analysis)
      // responseLogprobs: true, // Uncomment to get log probabilities
      // logprobs: 5,            // Number of top candidate tokens to return logprobs for
    },
  });
  console.log(response.text);
}
run();
```
system_instruction: Sets the role and behavior guidelines for the model. This helps maintain consistent personality and response style throughout the conversation.
temperature (0.0-1.0): Controls randomness in responses. Lower values (0.0-0.3) make output more focused and deterministic, while higher values (0.7-1.0) increase creativity and variation.
top_p (0.0-1.0): Nucleus sampling parameter. The model considers tokens with cumulative probability up to this value. Lower values make responses more focused.
top_k: Limits the number of highest probability tokens considered at each step. Helps balance between creativity and coherence.
presence_penalty (-2.0 to 2.0): Penalizes tokens based on whether they appear in the text. Positive values encourage the model to talk about new topics.
frequency_penalty (-2.0 to 2.0): Penalizes tokens based on their frequency in the text. Positive values reduce repetition.
seed: Integer value for deterministic output. Using the same seed with identical inputs will produce similar outputs (not guaranteed to be exactly identical due to model updates).
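The sampling parameters above are easier to internalize with a small, SDK-free sketch. This is not Respan or Gemini code; it is a plain-Python illustration (with made-up logit values) of what the knobs conceptually do: temperature rescales the token distribution before normalization, top_k truncates it to the k most likely tokens, and top_p keeps the smallest "nucleus" of tokens whose cumulative probability reaches p.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; temperature rescales them first."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep only the k most probable tokens, renormalized."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for 4 candidate tokens

cold = softmax(logits, temperature=0.2)  # low temperature -> sharply peaked
hot = softmax(logits, temperature=1.0)   # higher temperature -> flatter
print(cold[0] > hot[0])                  # True: low temperature concentrates mass on the top token

print(top_k_filter(hot, k=2))   # only the 2 most likely tokens survive
print(top_p_filter(hot, p=0.6)) # smallest nucleus covering 60% of the probability
```

The same intuition carries over to the API: `top_k=1` (as in Step 5) makes generation greedy regardless of temperature, while a wide `top_p` with moderate temperature (as in Step 6) allows controlled variety.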