Overview
The Databricks integration provides a unified data platform for advanced analytics, enabling SQL-based queries and data management across multiple business domains including SCADA analysis, recipe control, and route management.Architecture
The Databricks integration is built on a comprehensive archetype pattern:Core Components
- DatabricksAClient - HTTP client for Databricks SQL API (
DataBricks/Database/DatabricksAClient.cs) - GenericRepository - Base repository pattern for data access (
DataBricks/Archetype/Repository/GenericRepository.cs) - DatabricksExecutor - SQL execution engine
- SqlBuilder - Dynamic SQL query builder
- DatabricksUnitOfWork - Transaction management
Service Interfaces
Location:Program.cs:377-379
Authentication
Databricks supports three authentication modes:1. Static Token
2. Azure CLI
Uses Azure CLI credentials for authentication:3. Managed Identity
Uses Azure Managed Identity (for production deployments):DataBricks/Database/DatabricksAClient.cs:9-66
Business Domains
The Databricks integration covers four main business domains:1. RegistroNIR (NIR Analysis Registration)
Manages SCADA (Supervisory Control and Data Acquisition) analysis data for semi-finished and finished products.Entities
- SemielaboradoAnalisis - Semi-finished product analysis
- SemielaboradoAnalisisReal - Real-time semi-finished analysis
- SemielaboradoAnalisisNIR - NIR analysis for semi-finished products
- ProductoTerminadoAnalisis - Finished product analysis
- ProductoTerminadoAnalisisReal - Real-time finished product analysis
- ProductoTerminadoAnalisisNIR - NIR analysis for finished products
- Homologacion - Product homologation records
2. ControlReceta (Recipe Control)
Manages product recipes, tolerances, and material specifications.Entities
- MaterialPT - Finished product materials
- ToleranciaPT - Finished product tolerances
- UnidadToleranciaCatalogo - Tolerance unit catalog
3. GestionRutas (Route Management)
Manages transportation routes and distances.Entities
- RutaDistancia - Route distance information
4. Archetype (Generic Data Operations)
Provides reusable patterns for all Databricks operations.GraphQL Queries
RegistroNIR Queries
GetSemielaboradoAnalisisById
Retrieves a semi-finished product analysis record by ID. Location:DataBricks/Schema/Queries/RegistroNIR/ScadaSemiAnalisisQuery.cs:11-16
Unique identifier for the analysis record
GetSemielaboradoAnalisisPaged
Retrieves paginated semi-finished product analysis records with filtering.Filter criteria including:
pagination- Page number and sizesort- Sort fields and directionconditions- Filter conditions
GetSemielaboradoAnalisisByConditions
Retrieves all matching records based on filter conditions.GestionRutas Queries
GetRutaDistanciaById
Retrieves route distance information by ID. Location:DataBricks/Schema/Queries/GestionRutas/RutaDistanciaQuery.cs:11-16
Year of the route data (e.g., 2025)
Route category (T0, T1, T2, IMP-TRIGO, etc.)
Origin location
Destination location
Distance in kilometers
GetRutaDistanciaAvailableImportYears
Retrieves available years for importing route data. Location:DataBricks/Schema/Queries/GestionRutas/RutaDistanciaQuery.cs:34-38
ControlReceta Queries
GetMaterialesPorSector
Retrieves materials filtered by sector. Location:DataBricks/Schema/Queries/ControlReceta/MaterialPTQuery.cs:10-11
GetMaterialesByConditions
Retrieves materials based on filter conditions.Data Models
SemielaboradoAnalisis
Table:nir_semielaborado_analisis_cabecera
Location: DataBricks/Entities/RegistroNIR/SemielaboradoAnalisis.cs
RutaDistancia
Table:ruta_distancia
Location: DataBricks/Entities/GestionRutas/RutaDistancia.cs
ToleranciaPT
Table:tolerancia_pt
Location: DataBricks/Entities/ControlReceta/ToleranciaPT.cs
Filter System
The Databricks integration uses a powerful filter system:FilterRequestInput
PaginationInput
SortInput
ConditionInput
SQL Execution
The Databricks client provides three execution methods:ExecuteQueryAsync
Executes a SELECT query and waits for results. Location:DataBricks/Database/DatabricksAClient.cs:119-156
- Timeout: 60 seconds
- Polling interval: 250ms
- Returns: JSON result set
ExecuteNonQueryAsync
Executes INSERT/UPDATE/DELETE and waits for completion. Location:DataBricks/Database/DatabricksAClient.cs:158-217
- Returns: Number of rows affected
- Includes result validation
ExecuteNonQueryFastAsync
Executes a statement without waiting for completion (fire-and-forget). Location:DataBricks/Database/DatabricksAClient.cs:219-236
- No result waiting
- Fast execution
- Use for bulk operations
Transaction Management
TheDatabricksUnitOfWork class manages transactions:
GraphQL Mutations
Mutations follow similar patterns across all domains:Error Handling
Timeout Errors
Execution Errors
HTTP Errors
Standard HTTP status codes with detailed error messages.Configuration
Required configuration inappsettings.json:
Configuration Parameters
Databricks workspace URL
SQL warehouse identifier
Azure Databricks resource ID for authentication
Authentication mode: StaticToken, AzureCli, or ManagedIdentity
Static access token (required only for StaticToken mode)
Performance Optimization
- Pagination: Always use pagination for large datasets
- Filtering: Apply filters at the database level, not in memory
- Caching: Token caching reduces authentication overhead
- Batch Operations: Use
ExecuteNonQueryFastAsyncfor bulk inserts - Connection Pooling: HttpClient is configured with connection pooling
Best Practices
- Use Filters: Leverage the FilterRequestInput for efficient queries
- Pagination: Always paginate large result sets
- Transaction Management: Use UnitOfWork for multi-statement operations
- Error Handling: Implement proper timeout and error handling
- Authentication: Use Managed Identity in production environments
- SQL Injection: Use parameterized queries (handled by SqlBuilder)
- Active Flags: Soft delete using
activefield instead of hard deletes