Google Gen AI
Use Google Gen AI SDK with Respan
Set up Respan
- Sign up — Create an account at platform.respan.ai
- Create an API key — Generate one on the API keys page
- Add credits or a provider key — Add credits on the Credits page or connect your own provider key on the Integrations page
Use AI
Add the Docs MCP to your AI coding tool to get help building with Respan. No API key needed.
What is Google Gen AI SDK?
Respan is compatible with the official Google Gen AI SDK, enabling you to use Google’s Gemini models through our gateway with full observability, monitoring, and advanced features.
This integration is for the Respan gateway.
Check out these example projects:
Quickstart
Step 2: Initialize the client
Initialize the client with your Respan API key and set the base URL to Respan’s endpoint.
The base_url can be either https://api.respan.ai/api/google/gemini or https://endpoint.respan.ai/api/google/gemini.
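As a sketch, initializing the official google-genai Python SDK against the Respan gateway might look like the following (the model name and environment variable name are illustrative, not prescribed by Respan):

```python
import os

from google import genai

# Point the Google Gen AI SDK at the Respan gateway instead of the
# default Google endpoint. The API key here is your Respan key.
client = genai.Client(
    api_key=os.environ["RESPAN_API_KEY"],
    http_options={"base_url": "https://api.respan.ai/api/google/gemini"},
)

# Model name is illustrative; use any Gemini model available to your account.
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Hello from Respan!",
)
print(response.text)
```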
Configuration Parameters
The GenerateContentConfig supports a wide range of parameters to control model behavior:
System Instructions
system_instruction: Sets the role and behavior guidelines for the model. This helps maintain consistent personality and response style throughout the conversation.
Sampling Parameters
- temperature (0.0-1.0): Controls randomness in responses. Lower values (0.0-0.3) make output more focused and deterministic, while higher values (0.7-1.0) increase creativity and variation.
- top_p (0.0-1.0): Nucleus sampling parameter. The model considers tokens with cumulative probability up to this value. Lower values make responses more focused.
- top_k: Limits the number of highest-probability tokens considered at each step. Helps balance creativity and coherence.
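As an illustrative sketch (the helper name is ours, not part of the SDK), these sampling parameters can be collected into a plain dict, which the google-genai SDK accepts in place of a typed GenerateContentConfig:

```python
def sampling_config(temperature: float, top_p: float, top_k: int) -> dict:
    """Range-check sampling parameters and return them as a config dict."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be in [0.0, 1.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")
    return {"temperature": temperature, "top_p": top_p, "top_k": top_k}

# Low temperature + moderate top_p: focused, near-deterministic answers.
focused = sampling_config(temperature=0.2, top_p=0.9, top_k=40)
```

The resulting dict can then be passed as config= to client.models.generate_content.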
Output Controls
- max_output_tokens: Maximum number of tokens in the generated response. Helps control response length and costs.
- stop_sequences: Array of strings that will stop generation when encountered. Useful for controlling output format.
Tools and Grounding
- tools: Array of tools the model can use, such as Google Search for grounding responses in real-time information.
- google_search: Enables the model to search the web for up-to-date information before generating responses.
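In dict form (which the SDK accepts in place of typed Tool objects), enabling Google Search grounding is a one-line config. A minimal sketch:

```python
# Enable the built-in Google Search tool so the model can ground its answer
# in live web results; dict equivalent of types.Tool(google_search=types.GoogleSearch()).
grounded_config = {
    "tools": [{"google_search": {}}],
}

# Passed as config= to client.models.generate_content; grounded responses
# carry citation metadata alongside the generated text.
```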
Thinking Configuration
- thinking_config: Controls the model's internal reasoning process for models that support thinking mode.
- thinking_budget: Number of tokens allocated for internal reasoning. Set to 0 to disable thinking mode.
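A sketch of the two common settings in dict form (the budget value is illustrative, and thinking_config only takes effect on models that support thinking mode):

```python
# Thinking disabled: no tokens reserved for internal reasoning
# (fastest and cheapest option).
no_thinking = {"thinking_config": {"thinking_budget": 0}}

# Thinking enabled with an explicit token budget for internal reasoning.
with_thinking = {"thinking_config": {"thinking_budget": 1024}}
```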
Structured Output
- response_mime_type: Specifies the output format (e.g., "application/json" for JSON responses).
- response_schema: Defines the exact structure of JSON output using a schema, ensuring responses follow a specific format.
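As a sketch, a config requesting JSON that matches an explicit schema, followed by parsing a stand-in response (the schema shape and field names are illustrative; Gemini accepts an OpenAPI-style schema for response_schema):

```python
import json

# Request JSON output constrained to an explicit schema.
structured_config = {
    "response_mime_type": "application/json",
    "response_schema": {
        "type": "OBJECT",
        "properties": {
            "title": {"type": "STRING"},
            "rating": {"type": "INTEGER"},
        },
        "required": ["title", "rating"],
    },
}

# With this config, response.text is intended to be parseable JSON, e.g.:
sample_response_text = '{"title": "Dune", "rating": 5}'  # stand-in for response.text
review = json.loads(sample_response_text)
```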
Safety Settings
safety_settings: Array of safety configurations to filter harmful content across different categories:
- HARM_CATEGORY_HATE_SPEECH: Hate speech and discriminatory content
- HARM_CATEGORY_DANGEROUS_CONTENT: Dangerous or harmful instructions
- HARM_CATEGORY_HARASSMENT: Harassment and bullying
- HARM_CATEGORY_SEXUALLY_EXPLICIT: Sexually explicit content
Threshold options:
- BLOCK_NONE: Don’t block any content
- BLOCK_ONLY_HIGH: Block only high-severity content
- BLOCK_MEDIUM_AND_ABOVE: Block medium and high-severity content
- BLOCK_LOW_AND_ABOVE: Block low, medium, and high-severity content
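A small sketch (the helper is ours, not part of the SDK) that applies one blocking threshold uniformly across all four harm categories, in the dict form the SDK accepts:

```python
HARM_CATEGORIES = [
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
]

def safety_settings(threshold: str) -> list:
    """Apply one blocking threshold across all four harm categories."""
    return [{"category": c, "threshold": threshold} for c in HARM_CATEGORIES]

# Strict filtering: block anything rated low severity or above.
strict_config = {"safety_settings": safety_settings("BLOCK_LOW_AND_ABOVE")}
```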
Diversity Controls
- presence_penalty (-2.0 to 2.0): Penalizes tokens based on whether they appear in the text. Positive values encourage the model to talk about new topics.
- frequency_penalty (-2.0 to 2.0): Penalizes tokens based on their frequency in the text. Positive values reduce repetition.
Reproducibility
seed: Integer value for deterministic output. Using the same seed with identical inputs will produce similar outputs (not guaranteed to be exactly identical due to model updates).
Token Analysis
- response_logprobs: When enabled, returns log probabilities for generated tokens. Useful for analyzing model confidence.
- logprobs: Number of top candidate tokens to return log probabilities for at each position.
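A sketch of the config in dict form, plus the standard conversion back from a log probability to a probability (the returned values are natural logarithms, so exponentiating recovers the probability):

```python
import math

# Request log probabilities for the top 3 candidate tokens at each position.
logprob_config = {
    "response_logprobs": True,
    "logprobs": 3,
}

def to_probability(logprob: float) -> float:
    """Convert a natural-log probability back to a probability in [0, 1]."""
    return math.exp(logprob)

# A logprob of 0.0 corresponds to probability 1.0 (maximum confidence).
```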