Respan is compatible with the official Google Gen AI SDK, enabling you to use Google’s Gemini models through our gateway with full observability, monitoring, and advanced features.
To switch between different Google models, simply change the model parameter.
```python
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Tell me a joke.",
)
```
Step 5: Configure parameters
Use GenerateContentConfig to control model behavior with various parameters.
```python
from google.genai import types

config = types.GenerateContentConfig(
    temperature=0.9,
    top_k=1,
    top_p=1,
    max_output_tokens=2048,
)
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is the capital of France?",
    config=config,
)
```
Step 6: Advanced configuration
Here’s a comprehensive example showcasing various parameters, including system instructions, safety settings, and tools.
```python
from google import genai
from google.genai import types
import os

client = genai.Client(
    api_key=os.environ.get("RESPAN_API_KEY"),
    http_options={
        "base_url": "https://api.respan.ai/api/google/gemini",
    },
)

# Example: Configure tools for grounding
grounding_tool = types.Tool(
    google_search=types.GoogleSearch()
)

# Comprehensive GenerateContentConfig showcasing various parameters
config = types.GenerateContentConfig(
    # System instruction to guide the model's behavior
    system_instruction="You are a helpful assistant that provides accurate, concise information about sports events.",
    # Sampling parameters
    temperature=0.7,  # Controls randomness (0.0-1.0). Lower = more focused, higher = more creative
    top_p=0.95,       # Nucleus sampling: tokens with cumulative probability up to this value are considered
    top_k=40,         # Top-k sampling: considers this many top tokens at each step
    # Output controls
    max_output_tokens=1024,     # Maximum number of tokens in the response
    stop_sequences=["\n\n\n"],  # Sequences that stop generation
    # Tools and function calling
    tools=[grounding_tool],  # Enable Google Search grounding
    # Thinking configuration (for models that support it)
    thinking_config=types.ThinkingConfig(thinking_budget=0),  # Disables thinking mode
    # Response format options
    # response_mime_type="application/json",  # Uncomment for JSON output
    # response_schema=types.Schema(           # Uncomment to enforce structured output
    #     type=types.Type.OBJECT,
    #     properties={
    #         "winner": types.Schema(type=types.Type.STRING),
    #         "year": types.Schema(type=types.Type.INTEGER),
    #     },
    # ),
    # Safety settings
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        ),
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        ),
    ],
    # Diversity controls
    presence_penalty=0.0,   # Penalize tokens based on presence in text (-2.0 to 2.0)
    frequency_penalty=0.0,  # Penalize tokens based on frequency in text (-2.0 to 2.0)
    # Reproducibility
    # seed=42,  # Uncomment to make responses more deterministic
    # Logprobs (for token analysis)
    # response_logprobs=True,  # Uncomment to get log probabilities
    # logprobs=5,              # Number of top candidate tokens to return logprobs for
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Who won Euro 2024?",
    config=config,
)
print(response.text)
```
Step 1: Install the SDK
Install the official Google Gen AI SDK for TypeScript.
```shell
npm install @google/genai
```
Step 2: Initialize the client
Initialize the client with your Respan API key and set the base URL to Respan’s endpoint.
```typescript
import { GoogleGenAI } from "@google/genai";

const GenAI = new GoogleGenAI({
  apiKey: process.env.RESPAN_API_KEY,
  httpOptions: {
    baseUrl: "https://api.respan.ai/api/google/gemini",
  },
});
```
The baseUrl can be either https://api.respan.ai/api/google/gemini or https://endpoint.respan.ai/api/google/gemini.
Step 3: Make your first request
Now you can use the client to make requests to Google’s models.
To switch between different Google models, simply change the model parameter.
```typescript
const result = await GenAI.models.generateContent({
  model: "gemini-2.0-flash-exp",
  contents: [{ role: "user", parts: [{ text: "Tell me a joke." }] }],
});
```
Step 5: Configure parameters
Use the config parameter to control model behavior with various parameters.
```typescript
const result = await GenAI.models.generateContent({
  model: "gemini-2.5-flash",
  contents: [{ role: "user", parts: [{ text: "What is the capital of France?" }] }],
  config: {
    temperature: 0.9,
    topK: 1,
    topP: 1,
    maxOutputTokens: 2048,
  },
});
```
Step 6: Advanced configuration
Here’s a comprehensive example showcasing various parameters, including system instructions, safety settings, and tools.
```typescript
import { GoogleGenAI } from "@google/genai";

const GenAI = new GoogleGenAI({
  apiKey: process.env.RESPAN_API_KEY,
  httpOptions: {
    baseUrl: "https://api.respan.ai/api/google/gemini",
  },
});

// Comprehensive config showcasing various parameters
async function run() {
  const response = await GenAI.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [
      { role: "user", parts: [{ text: "Who won Euro 2024?" }] },
    ],
    config: {
      // System instruction to guide the model's behavior
      systemInstruction:
        "You are a helpful assistant that provides accurate, concise information about sports events.",
      // Sampling parameters
      temperature: 0.7, // Controls randomness (0.0-1.0). Lower = more focused, higher = more creative
      topP: 0.95,       // Nucleus sampling: tokens with cumulative probability up to this value are considered
      topK: 40,         // Top-k sampling: considers this many top tokens at each step
      // Output controls
      maxOutputTokens: 1024,     // Maximum number of tokens in the response
      stopSequences: ["\n\n\n"], // Sequences that stop generation
      // Tools and function calling
      tools: [
        {
          googleSearch: {}, // Enable Google Search grounding
        },
      ],
      // Thinking configuration (for models that support it)
      thinkingConfig: {
        thinkingBudget: 0, // Disables thinking mode
      },
      // Response format options
      // responseMimeType: "application/json", // Uncomment for JSON output
      // responseSchema: {                     // Uncomment to enforce structured output
      //   type: "OBJECT",
      //   properties: {
      //     winner: { type: "STRING" },
      //     year: { type: "INTEGER" },
      //   },
      // },
      // Safety settings
      safetySettings: [
        {
          category: "HARM_CATEGORY_HATE_SPEECH",
          threshold: "BLOCK_MEDIUM_AND_ABOVE",
        },
        {
          category: "HARM_CATEGORY_DANGEROUS_CONTENT",
          threshold: "BLOCK_MEDIUM_AND_ABOVE",
        },
      ],
      // Diversity controls
      presencePenalty: 0.0,  // Penalize tokens based on presence in text (-2.0 to 2.0)
      frequencyPenalty: 0.0, // Penalize tokens based on frequency in text (-2.0 to 2.0)
      // Reproducibility
      // seed: 42, // Uncomment to make responses more deterministic
      // Logprobs (for token analysis)
      // responseLogprobs: true, // Uncomment to get log probabilities
      // logprobs: 5,            // Number of top candidate tokens to return logprobs for
    },
  });
  console.log(response.text);
}
run();
```
system_instruction: Sets the role and behavior guidelines for the model. This helps maintain consistent personality and response style throughout the conversation.
temperature (0.0-1.0): Controls randomness in responses. Lower values (0.0-0.3) make output more focused and deterministic, while higher values (0.7-1.0) increase creativity and variation.
top_p (0.0-1.0): Nucleus sampling parameter. The model considers tokens with cumulative probability up to this value. Lower values make responses more focused.
top_k: Limits the number of highest probability tokens considered at each step. Helps balance between creativity and coherence.
presence_penalty (-2.0 to 2.0): Penalizes tokens based on whether they appear in the text. Positive values encourage the model to talk about new topics.
frequency_penalty (-2.0 to 2.0): Penalizes tokens based on their frequency in the text. Positive values reduce repetition.
seed: Integer value for deterministic output. Using the same seed with identical inputs will produce similar outputs (not guaranteed to be exactly identical due to model updates).
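The sampling parameters above are easier to internalize with a small, SDK-free sketch. This is not Respan or Gemini code; it is a plain-Python illustration (with made-up logit values) of what the knobs conceptually do: temperature rescales the token distribution before normalization, top_k truncates it to the k most likely tokens, and top_p keeps the smallest "nucleus" of tokens whose cumulative probability reaches p.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; temperature rescales them first."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep only the k most probable tokens, renormalized."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for 4 candidate tokens

cold = softmax(logits, temperature=0.2)  # low temperature -> sharply peaked
hot = softmax(logits, temperature=1.0)   # higher temperature -> flatter
print(cold[0] > hot[0])                  # True: low temperature concentrates mass on the top token

print(top_k_filter(hot, k=2))   # only the 2 most likely tokens survive
print(top_p_filter(hot, p=0.6)) # smallest nucleus covering 60% of the probability
```

The same intuition carries over to the API: `top_k=1` (as in Step 5) makes generation greedy regardless of temperature, while a wide `top_p` with moderate temperature (as in Step 6) allows controlled variety.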