Process images and documents with state-of-the-art AI models including GPT-4 Vision, Claude 3, and Google Cloud Vision.
Get a list of all available AI models from OpenRouter.
model_type (optional) - Filter models by type:
- vision - Only vision-capable models
- chat - Only chat models
- all - All models (default)

curl "http://localhost:8000/api/available_models?model_type=vision"
{
"models": [
{
"id": "anthropic/claude-3-haiku-20240307",
"name": "Claude 3 Haiku",
"description": "Fast and affordable version of Claude 3",
"context_length": 200000,
"pricing": {
"prompt": 0.00025,
"completion": 0.00125
},
"capabilities": {
"vision": true,
"chat": true
}
},
// ... more models
],
"count": 5,
"type": "vision"
}
Process an image using any available vision model from OpenRouter.
file - Image file to process (multipart/form-data)
model (optional) - Model ID from /api/available_models (defaults to Claude 3 Haiku)

curl -X POST "http://localhost:8000/api/process_image_openrouter?model=anthropic/claude-3-haiku-20240307" \
-H "Content-Type: multipart/form-data" \
-F "file=@image.jpg"
Process a PDF file using any available chat model from OpenRouter.
file - PDF file to process (multipart/form-data)
model (optional) - Model ID from /api/available_models (defaults to Gemini Pro)

curl -X POST "http://localhost:8000/api/process_pdf_openrouter?model=anthropic/claude-3-haiku-20240307" \
-H "Content-Type: multipart/form-data" \
-F "file=@document.pdf"
Process a PDF file using Google Cloud Vision OCR and Vertex AI.
curl -X POST "http://localhost:8000/api/process_pdf" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "file=@document.pdf"
Process a PDF file using OpenRouter's models for text analysis and summarization.
file - PDF file to process (multipart/form-data)
model (optional) - Model to use for processing text:
- google/gemini-pro (default)
- anthropic/claude-3-haiku-20240307
- anthropic/claude-3-sonnet-20240229
- openai/gpt-4-vision-preview

curl -X POST "http://localhost:8000/api/process_pdf_openrouter?model=anthropic/claude-3-haiku-20240307" \
-H "Content-Type: multipart/form-data" \
-F "file=@document.pdf"
Process a single image using OpenRouter's vision models
curl -X POST "http://localhost:8000/api/process_image_openrouter?model=anthropic/claude-3-haiku-20240307" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "file=@image.png"
{
"text_content": "Extracted text from the image",
"key_details": "Important information found",
"document_type": "Type of document detected",
"model_used": "anthropic/claude-3-haiku-20240307"
}
Process multiple images using OpenRouter's vision models
curl -X POST "http://localhost:8000/api/process_images_openrouter?model=anthropic/claude-3-haiku-20240307" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "files=@image1.png" \
-F "files=@image2.png"
{
"results": [
{
"filename": "image1.png",
"analysis": {
"text_content": "Extracted text from the image",
"key_details": "Important information found",
"document_type": "Type of document detected"
},
"model_used": "anthropic/claude-3-haiku-20240307"
}
]
}
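When processing batches, the per-file results above can be flattened into a simple filename-to-text map. A minimal sketch, assuming the response shape shown:

```python
def extract_text_by_file(response: dict) -> dict:
    """Map each filename to the text extracted from it."""
    return {
        r["filename"]: r["analysis"]["text_content"]
        for r in response["results"]
    }

response = {
    "results": [
        {"filename": "image1.png",
         "analysis": {"text_content": "Invoice #123",
                      "key_details": "Total: $40",
                      "document_type": "invoice"},
         "model_used": "anthropic/claude-3-haiku-20240307"},
    ]
}
print(extract_text_by_file(response))  # {'image1.png': 'Invoice #123'}
```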
List all available vision models
curl "http://localhost:8000/api/available_vision_models"
{
"models": [
{
"id": "anthropic/claude-3-haiku-20240307",
"name": "CLAUDE3_HAIKU",
"description": "Vision model for document analysis and text extraction"
},
// ... more models
]
}
Add a document to the RAG (Retrieval Augmented Generation) system for future querying.
{
"content": "The document content in markdown format",
"metadata": {
"source": "example.md",
"author": "John Doe",
"date": "2024-03-09"
}
}
{
"status": "success",
"message": "Document added successfully"
}
Query the RAG system with a question. The system will retrieve relevant documents and generate a response based on the context.
{
"query": "What are the key features of the product?",
"num_sources": 4,
"conversation_id": "optional-conversation-id"
}
{
"answer": "Based on the documentation, the key features include...",
"sources": [
{
"content": "Document snippet that supports the answer",
"metadata": {
"source": "features.md",
"author": "Jane Smith"
}
}
],
"conversation_id": "conversation-123"
}
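The `sources` array lets a client show where an answer came from. A sketch (assuming the response shape above; the formatting is illustrative, not part of the API) that appends a numbered source list to the answer:

```python
def format_answer_with_sources(response: dict) -> str:
    """Render a RAG query response as answer text plus a source list."""
    lines = [response["answer"], "", "Sources:"]
    for i, src in enumerate(response["sources"], start=1):
        name = src["metadata"].get("source", "unknown")
        lines.append(f"  [{i}] {name}")
    return "\n".join(lines)

response = {
    "answer": "The key features include...",
    "sources": [{"content": "snippet",
                 "metadata": {"source": "features.md", "author": "Jane Smith"}}],
    "conversation_id": "conversation-123",
}
print(format_answer_with_sources(response))
```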
Retrieve the conversation history for a specific conversation ID, including all messages and their sources.
conversation_id - The unique identifier of the conversation

{
"conversation_id": "conversation-123",
"messages": [
{
"role": "user",
"content": "What are the key features?",
"created_at": "2024-03-09T10:00:00Z",
"sources": null
},
{
"role": "assistant",
"content": "Based on the documentation...",
"created_at": "2024-03-09T10:00:01Z",
"sources": [
{
"content": "Supporting document content",
"metadata": {
"source": "features.md"
}
}
]
}
]
}
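A history response like the one above is easy to render as a plain transcript. A minimal sketch, assuming the message shape shown:

```python
def to_transcript(history: dict) -> str:
    """Render a conversation-history response as 'role: content' lines."""
    return "\n".join(
        f'{m["role"]}: {m["content"]}' for m in history["messages"]
    )

history = {
    "conversation_id": "conversation-123",
    "messages": [
        {"role": "user", "content": "What are the key features?",
         "sources": None},
        {"role": "assistant", "content": "Based on the documentation...",
         "sources": [{"content": "...",
                      "metadata": {"source": "features.md"}}]},
    ],
}
print(to_transcript(history))
```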
Add a markdown file to the RAG (Retrieval Augmented Generation) system. The file will be split into chunks for better retrieval.
file (required) - Markdown file to upload
document_id (optional) - Unique identifier for the document. If provided and a document with this ID exists, it will be replaced.
metadata (optional) - JSON string containing additional metadata:
- author - Document author
- tags - Array of tags
- additional_metadata - Any additional metadata
chunk_size (optional) - Size of text chunks to split the document into (default: 2000)
chunk_overlap (optional) - Number of characters to overlap between chunks (default: 200)

curl -X POST "http://localhost:8000/rag/add/file" \
-F "file=@document.md" \
-F "document_id=doc123" \
-F 'metadata={"author": "John Doe", "tags": ["report", "2024"]}'
{
"status": "success",
"message": "File document.md processed and added successfully",
"chunks_created": 4,
"metadata": {
"source_type": "file",
"source_name": "document.md",
"document_id": "doc123",
"author": "John Doe",
"date": "2024-12-09T21:00:00.000Z",
"tags": ["report", "2024"]
},
"document_id": "doc123"
}
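The `chunks_created` count follows from `chunk_size` and `chunk_overlap`: each new chunk advances by `chunk_size - chunk_overlap` characters. A sketch of that arithmetic, assuming a simple sliding-window splitter (the server's actual splitter may differ, e.g. by breaking on markdown boundaries):

```python
import math

def estimate_chunks(doc_length: int, chunk_size: int = 2000,
                    chunk_overlap: int = 200) -> int:
    """Estimate chunk count for a window of chunk_size characters
    that advances chunk_size - chunk_overlap characters per step."""
    if doc_length <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap
    return 1 + math.ceil((doc_length - chunk_size) / step)

# A ~7,000-character markdown file with the defaults yields 4 chunks,
# consistent with "chunks_created": 4 above.
print(estimate_chunks(7000))  # 4
```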
Query the RAG system with natural language. The system will find relevant document chunks and generate a response based on them.
{
"query": "What is the document about?",
"conversation_id": "string", // Optional: for maintaining conversation context
"num_sources": 4 // Optional: number of source documents to retrieve
}
curl -X POST "http://localhost:8000/rag/query" \
-H "Content-Type: application/json" \
-d '{
"query": "What is the document about?",
"conversation_id": "conv123"
}'
{
"answer": "The document is about...",
"sources": [
{
"content": "...",
"metadata": {
"source_type": "file",
"source_name": "document.md",
"document_id": "doc123",
"chunk_index": 0,
"total_chunks": 4
}
}
],
"conversation_id": "conv123"
}
Find documents that are semantically similar to a given document.
{
"document_id": "doc123",
"similarity_threshold": 0.7, // Optional: minimum similarity score (0-1)
"max_results": 5 // Optional: maximum number of results to return
}
curl -X POST "http://localhost:8000/rag/find_related" \
-H "Content-Type: application/json" \
-d '{
"document_id": "doc123",
"similarity_threshold": 0.7,
"max_results": 5
}'
[
{
"document_id": "doc456",
"similarity": 0.85,
"metadata": {
"source_type": "file",
"source_name": "related_document.md",
"author": "Jane Smith",
"date": "2024-12-08T15:30:00.000Z"
}
}
]
Process a markdown file or direct content according to the provided instruction using AI models.
/api/process-markdown-file-or-content
{
"file_path": "/path/to/file.md", // Optional: provide either file_path or content
"content": "# Markdown Content\nSome text here", // Optional: provide either file_path or content
"instruction": "Summarize the content and extract key points",
"model": "mistralai/mixtral-8x7b-instruct" // Optional: defaults to Mistral AI
}
{
"model": "mistralai/mixtral-8x7b-instruct",
"summary": "AI-generated summary based on instruction",
"details": {
"key1": "value1",
"key2": "value2",
// Additional structured information extracted from the content
}
}
Using file path:
curl -X POST "http://localhost:8000/api/process-markdown-file-or-content" \
-H "Content-Type: application/json" \
-d '{
"file_path": "/path/to/file.md",
"instruction": "Summarize the content and extract key points"
}'
Using direct content:
curl -X POST "http://localhost:8000/api/process-markdown-file-or-content" \
-H "Content-Type: application/json" \
-d '{
"content": "# My Document\nThis is some markdown content.",
"instruction": "Summarize the content and extract key points"
}'
Update Google Cloud credentials for Vision API and Vertex AI services.
Requires admin API key in the X-API-Key header.
{
"type": "service_account",
"project_id": "your-project-id",
"private_key_id": "private-key-id",
"private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
"client_email": "service-account@project-id.iam.gserviceaccount.com",
"client_id": "client-id",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account%40project-id.iam.gserviceaccount.com"
}
curl -X POST "http://localhost:8000/admin/credentials/google" \
-H "Content-Type: application/json" \
-d @google-credentials.json
{
"success": true,
"message": "Google credentials updated successfully",
"timestamp": 1715068800.123456
}
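The `timestamp` field appears to be a Unix epoch in seconds; converting it to a readable UTC time is straightforward:

```python
from datetime import datetime, timezone

def to_utc_iso(epoch_seconds: float) -> str:
    """Convert a Unix timestamp (seconds) to an ISO-8601 UTC string."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()

print(to_utc_iso(1715068800.123456))  # 2024-05-07T08:00:00.123456+00:00
```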
Check if Google Cloud credentials are properly configured.
Requires admin API key in the X-API-Key header.
curl -X GET "http://localhost:8000/admin/credentials/google/status"
{
"success": true,
"message": "Google credentials are configured for project 'your-project-id'",
"timestamp": 1715068800.123456
}
anthropic/claude-3-haiku-20240307
Fast and efficient vision model, best for quick analysis
anthropic/claude-3-sonnet-20240229
More powerful vision model, better for detailed analysis
openai/gpt-4-vision-preview
High-quality vision model with strong reasoning capabilities
google/gemini-pro-vision
Google's vision model, good balance of speed and quality