Advanced Vision & Document Processing API

Process images and documents with state-of-the-art AI models including GPT-4 Vision, Claude 3, and Google Cloud Vision.

API Endpoints

GET /api/available_models

Get a list of all available AI models from OpenRouter.

Query Parameters

  • model_type (optional) - Filter models by type:
    • vision - Only vision-capable models
    • chat - Only chat models
    • all - All models (default)

Example Request

curl "http://localhost:8000/api/available_models?model_type=vision"

Example Response

{
    "models": [
        {
            "id": "anthropic/claude-3-haiku-20240307",
            "name": "Claude 3 Haiku",
            "description": "Fast and affordable version of Claude 3",
            "context_length": 200000,
            "pricing": {
                "prompt": 0.00025,
                "completion": 0.00125
            },
            "capabilities": {
                "vision": true,
                "chat": true
            }
        },
        // ... more models
    ],
    "count": 5,
    "type": "vision"
}
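The `model_type` filter can also be reproduced client-side. Below is a minimal sketch that filters a `/api/available_models` payload locally; the field names (`models`, `capabilities`, `vision`, `chat`) follow the example response above, and the second model entry is an invented placeholder:

```python
def filter_models(payload, model_type="all"):
    """Return the models matching model_type ('vision', 'chat', or 'all')."""
    if model_type == "all":
        return payload["models"]
    # Keep models whose capabilities flag for the requested type is true.
    return [m for m in payload["models"] if m.get("capabilities", {}).get(model_type)]

sample = {
    "models": [
        {"id": "anthropic/claude-3-haiku-20240307",
         "capabilities": {"vision": True, "chat": True}},
        {"id": "example/chat-only-model",  # hypothetical entry for illustration
         "capabilities": {"vision": False, "chat": True}},
    ]
}

vision_only = filter_models(sample, "vision")
```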

POST /api/process_image_openrouter

Process an image using any available vision model from OpenRouter.

Parameters

  • file - Image file to process (multipart/form-data)
  • model (optional) - Model ID from /api/available_models (defaults to Claude 3 Haiku)

Example Request

curl -X POST "http://localhost:8000/api/process_image_openrouter?model=anthropic/claude-3-haiku-20240307" \
    -H "Content-Type: multipart/form-data" \
    -F "file=@image.jpg"

Example Response

{
    "text_content": "Extracted text from the image",
    "key_details": "Important information found",
    "document_type": "Type of document detected",
    "model_used": "anthropic/claude-3-haiku-20240307"
}

POST /api/process_pdf_openrouter

Process a PDF file using any available chat model from OpenRouter for text analysis and summarization.

Parameters

  • file - PDF file to process (multipart/form-data)
  • model (optional) - Model ID from /api/available_models:
    • google/gemini-pro (default)
    • anthropic/claude-3-haiku-20240307
    • anthropic/claude-3-sonnet-20240229
    • openai/gpt-4-vision-preview

Example Request

curl -X POST "http://localhost:8000/api/process_pdf_openrouter?model=anthropic/claude-3-haiku-20240307" \
    -H "Content-Type: multipart/form-data" \
    -F "file=@document.pdf"

POST /api/process_pdf

Process a PDF file using Google Cloud Vision OCR and Vertex AI.

Request

curl -X POST "http://localhost:8000/api/process_pdf" \
    -H "accept: application/json" \
    -H "Content-Type: multipart/form-data" \
    -F "file=@document.pdf"



POST /api/process_images_openrouter

Process multiple images in a single request using OpenRouter's vision models.

Request

curl -X POST "http://localhost:8000/api/process_images_openrouter?model=anthropic/claude-3-haiku-20240307" \
    -H "accept: application/json" \
    -H "Content-Type: multipart/form-data" \
    -F "files=@image1.png" \
    -F "files=@image2.png"

Response

{
    "results": [
        {
            "filename": "image1.png",
            "analysis": {
                "text_content": "Extracted text from the image",
                "key_details": "Important information found",
                "document_type": "Type of document detected"
            },
            "model_used": "anthropic/claude-3-haiku-20240307"
        }
    ]
}
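The batch response nests each per-file analysis under `results`. A client might flatten it as sketched below; the field names follow the example response, and the `"invoice"` value is an invented placeholder:

```python
def summarize_batch(response):
    """Map each filename in a batch response to its detected document type."""
    return {r["filename"]: r["analysis"]["document_type"] for r in response["results"]}

sample = {
    "results": [
        {"filename": "image1.png",
         "analysis": {"text_content": "Extracted text from the image",
                      "key_details": "Important information found",
                      "document_type": "invoice"},  # placeholder value
         "model_used": "anthropic/claude-3-haiku-20240307"},
    ]
}
```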

GET /api/available_vision_models

List all available vision models.

Request

curl "http://localhost:8000/api/available_vision_models"

Response

{
    "models": [
        {
            "id": "anthropic/claude-3-haiku-20240307",
            "name": "CLAUDE3_HAIKU",
            "description": "Vision model for document analysis and text extraction"
        },
        // ... more models
    ]
}

POST /rag/add

Add a document to the RAG (Retrieval Augmented Generation) system for future querying.

Request Body

{
    "content": "The document content in markdown format",
    "metadata": {
        "source": "example.md",
        "author": "John Doe",
        "date": "2024-03-09"
    }
}

Response

{
    "status": "success",
    "message": "Document added successfully"
}
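The request body above can be assembled programmatically before posting. A small sketch, assuming the helper name `make_rag_add_payload` (hypothetical) and the metadata fields shown in the example:

```python
import json

def make_rag_add_payload(content, source, author=None, date=None):
    """Build the JSON body for POST /rag/add with the metadata fields shown above."""
    metadata = {"source": source}
    if author:
        metadata["author"] = author
    if date:
        metadata["date"] = date
    return json.dumps({"content": content, "metadata": metadata})

payload = make_rag_add_payload("# Notes\nSome text.", "example.md", author="John Doe")
```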


GET /rag/conversations/{conversation_id}/history

Retrieve the conversation history for a specific conversation ID, including all messages and their sources.

Path Parameters

  • conversation_id - The unique identifier of the conversation

Response

{
    "conversation_id": "conversation-123",
    "messages": [
        {
            "role": "user",
            "content": "What are the key features?",
            "created_at": "2024-03-09T10:00:00Z",
            "sources": null
        },
        {
            "role": "assistant",
            "content": "Based on the documentation...",
            "created_at": "2024-03-09T10:00:01Z",
            "sources": [
                {
                    "content": "Supporting document content",
                    "metadata": {
                        "source": "features.md"
                    }
                }
            ]
        }
    ]
}
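A client displaying a conversation might flatten the history payload into a plain transcript. The sketch below uses the `role` and `content` fields from the response above:

```python
def render_history(history):
    """Flatten a conversation-history payload into a role-prefixed transcript."""
    return "\n".join(f'{msg["role"]}: {msg["content"]}' for msg in history["messages"])

sample = {
    "conversation_id": "conversation-123",
    "messages": [
        {"role": "user", "content": "What are the key features?",
         "created_at": "2024-03-09T10:00:00Z", "sources": None},
        {"role": "assistant", "content": "Based on the documentation...",
         "created_at": "2024-03-09T10:00:01Z", "sources": []},
    ],
}
```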

POST /rag/add/file

Add a markdown file to the RAG (Retrieval Augmented Generation) system. The file will be split into chunks for better retrieval.

Form Parameters

  • file (required) - Markdown file to upload
  • document_id (optional) - Unique identifier for the document. If provided and a document with this ID exists, it will be replaced
  • metadata (optional) - JSON string containing additional metadata:
    • author - Document author
    • tags - Array of tags
    • additional_metadata - Any additional metadata
  • chunk_size (optional) - Size of text chunks to split the document into (default: 2000)
  • chunk_overlap (optional) - Number of characters to overlap between chunks (default: 200)

Example Request

curl -X POST "http://localhost:8000/rag/add/file" \
    -F "file=@document.md" \
    -F "document_id=doc123" \
    -F 'metadata={"author": "John Doe", "tags": ["report", "2024"]}'

Example Response

{
    "status": "success",
    "message": "File document.md processed and added successfully",
    "chunks_created": 4,
    "metadata": {
        "source_type": "file",
        "source_name": "document.md",
        "document_id": "doc123",
        "author": "John Doe",
        "date": "2024-12-09T21:00:00.000Z",
        "tags": ["report", "2024"]
    },
    "document_id": "doc123"
}
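The server's actual splitter is not specified here, but the effect of `chunk_size` and `chunk_overlap` can be illustrated with a plausible character-based sketch (an assumption, not the real implementation):

```python
def chunk_text(text, chunk_size=2000, chunk_overlap=200):
    """Split text into chunks of chunk_size, each sharing chunk_overlap characters
    with the previous chunk."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

# With the defaults, a 2,500-character document yields two overlapping chunks.
chunks = chunk_text("x" * 2500)
```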

POST /rag/query

Query the RAG system with natural language. The system will find relevant document chunks and generate a response based on them.

Request Body

{
    "query": "What is the document about?",
    "conversation_id": "string",  // Optional: for maintaining conversation context
    "num_sources": 4  // Optional: number of source documents to retrieve
}

Example Request

curl -X POST "http://localhost:8000/rag/query" \
    -H "Content-Type: application/json" \
    -d '{
        "query": "What is the document about?",
        "conversation_id": "conv123"
    }'

Example Response

{
    "answer": "The document is about...",
    "sources": [
        {
            "content": "...",
            "metadata": {
                "source_type": "file",
                "source_name": "document.md",
                "document_id": "doc123",
                "chunk_index": 0,
                "total_chunks": 4
            }
        }
    ],
    "conversation_id": "conv123"
}
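A caller usually wants the answer together with the files it was grounded in. This sketch pairs them using the metadata fields from the example response (it also accepts the plain `source` key used by documents added via /rag/add):

```python
def cite_sources(response):
    """Return the answer and the source file names it was grounded in."""
    names = [s["metadata"].get("source_name") or s["metadata"].get("source")
             for s in response["sources"]]
    return response["answer"], names

sample = {
    "answer": "The document is about...",
    "sources": [{"content": "...",
                 "metadata": {"source_type": "file", "source_name": "document.md",
                              "document_id": "doc123", "chunk_index": 0,
                              "total_chunks": 4}}],
    "conversation_id": "conv123",
}
```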

POST /rag/find_related

Find documents that are semantically similar to a given document.

Request Body

{
    "document_id": "doc123",
    "similarity_threshold": 0.7,  // Optional: minimum similarity score (0-1)
    "max_results": 5  // Optional: maximum number of results to return
}

Example Request

curl -X POST "http://localhost:8000/rag/find_related" \
    -H "Content-Type: application/json" \
    -d '{
        "document_id": "doc123",
        "similarity_threshold": 0.7,
        "max_results": 5
    }'

Example Response

[
    {
        "document_id": "doc456",
        "similarity": 0.85,
        "metadata": {
            "source_type": "file",
            "source_name": "related_document.md",
            "author": "Jane Smith",
            "date": "2024-12-08T15:30:00.000Z"
        }
    }
]
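The same threshold-and-limit logic can be applied client-side to a result list. A sketch, with the second entry (`doc789`, 0.55) invented for illustration:

```python
def related_above(results, threshold=0.7, max_results=5):
    """Keep results at or above the similarity threshold, highest first."""
    kept = [r for r in results if r["similarity"] >= threshold]
    return sorted(kept, key=lambda r: r["similarity"], reverse=True)[:max_results]

sample = [
    {"document_id": "doc456", "similarity": 0.85},
    {"document_id": "doc789", "similarity": 0.55},  # invented entry below threshold
]
```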

POST /api/process-markdown-file-or-content

Process a markdown file or direct content according to the provided instruction using AI models.

Request Body

{
    "file_path": "/path/to/file.md",  // Optional: provide either file_path or content
    "content": "# Markdown Content\nSome text here",  // Optional: provide either file_path or content
    "instruction": "Summarize the content and extract key points",
    "model": "mistralai/mixtral-8x7b-instruct"  // Optional: defaults to Mistral AI
}

Response

{
    "model": "mistralai/mixtral-8x7b-instruct",
    "summary": "AI-generated summary based on instruction",
    "details": {
        "key1": "value1",
        "key2": "value2",
        // Additional structured information extracted from the content
    }
}

Example Requests

Using file path:

curl -X POST "http://localhost:8000/api/process-markdown-file-or-content" \
    -H "Content-Type: application/json" \
    -d '{
        "file_path": "/path/to/file.md",
        "instruction": "Summarize the content and extract key points"
    }'

Using direct content:

curl -X POST "http://localhost:8000/api/process-markdown-file-or-content" \
    -H "Content-Type: application/json" \
    -d '{
        "content": "# My Document\nThis is some markdown content.",
        "instruction": "Summarize the content and extract key points"
    }'

Notes

  • Either file_path or content must be provided, but not both
  • When using file_path, the file must be local and accessible to the server
  • Only .md files are supported when using file_path
  • Processing is done using OpenRouter's AI models
  • The model parameter is optional and defaults to Mistral AI
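The rules in the Notes can be checked before sending a request. A minimal client-side validation sketch (the helper name is hypothetical; `instruction` appears in every example above, so it is treated as required):

```python
def validate_markdown_request(body):
    """Enforce the documented rules: exactly one of file_path/content, .md only."""
    has_path = bool(body.get("file_path"))
    has_content = bool(body.get("content"))
    if has_path == has_content:
        raise ValueError("Provide exactly one of file_path or content")
    if has_path and not body["file_path"].endswith(".md"):
        raise ValueError("Only .md files are supported when using file_path")
    if not body.get("instruction"):
        raise ValueError("instruction is required")
    return True
```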

Admin Endpoints

POST /admin/credentials/google

Update Google Cloud credentials for Vision API and Vertex AI services.

Authentication

Requires admin API key in the X-API-Key header.

Request Body

{
    "type": "service_account",
    "project_id": "your-project-id",
    "private_key_id": "private-key-id",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "service-account@project-id.iam.gserviceaccount.com",
    "client_id": "client-id",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account%40project-id.iam.gserviceaccount.com"
}

Example Request

curl -X POST "http://localhost:8000/admin/credentials/google" \
    -H "Content-Type: application/json" \
    -H "X-API-Key: $ADMIN_API_KEY" \
    -d @google-credentials.json

Example Response

{
    "success": true,
    "message": "Google credentials updated successfully",
    "timestamp": 1715068800.123456
}
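Before posting credentials, a client can sanity-check that the core service-account fields from the request body above are present. A sketch (the helper name is hypothetical, and only a subset of the fields is checked):

```python
REQUIRED_KEYS = {
    "type", "project_id", "private_key_id", "private_key",
    "client_email", "client_id", "auth_uri", "token_uri",
}

def missing_credential_keys(creds):
    """Return the required service-account fields absent from the credentials dict."""
    return sorted(REQUIRED_KEYS - creds.keys())
```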

GET /admin/credentials/google/status

Check if Google Cloud credentials are properly configured.

Authentication

Requires admin API key in the X-API-Key header.

Example Request

curl -X GET "http://localhost:8000/admin/credentials/google/status" \
    -H "X-API-Key: $ADMIN_API_KEY"

Example Response

{
    "success": true,
    "message": "Google credentials are configured for project 'your-project-id'",
    "timestamp": 1715068800.123456
}

Available Vision Models

Claude 3 Haiku

anthropic/claude-3-haiku-20240307

Fast and efficient vision model, best for quick analysis

Max Tokens: 1000 | Temperature: 0.1

Claude 3 Sonnet

anthropic/claude-3-sonnet-20240229

More powerful vision model, better for detailed analysis

Max Tokens: 1000 | Temperature: 0.1

GPT-4 Vision

openai/gpt-4-vision-preview

High-quality vision model with strong reasoning capabilities

Max Tokens: 1000 | Temperature: 0.1

Gemini Pro Vision

google/gemini-pro-vision

Google's vision model, good balance of speed and quality

Max Tokens: 1000 | Temperature: 0.1
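The four models above share the same generation settings, so a client might keep them in a small local lookup table (illustrative only; the /api/available_models endpoint is the source of truth):

```python
# Model IDs and display names from the list above.
VISION_MODELS = {
    "anthropic/claude-3-haiku-20240307": "Claude 3 Haiku",
    "anthropic/claude-3-sonnet-20240229": "Claude 3 Sonnet",
    "openai/gpt-4-vision-preview": "GPT-4 Vision",
    "google/gemini-pro-vision": "Gemini Pro Vision",
}

# Shared generation settings documented for every model.
GENERATION_DEFAULTS = {"max_tokens": 1000, "temperature": 0.1}

def is_supported(model_id):
    """True if model_id is one of the documented vision models."""
    return model_id in VISION_MODELS
```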