Technical documentation
Introduction
This page provides technical details about the GEM architecture, data formats, and system components for developers and technical users.
System architecture
GEM consists of several integrated components:
GEM dashboard application (UI)
- Framework: Web-based interactive interface
- Deployment: Hosted on Dashboard (my.tomtom.com/gem)
- Authentication: Microsoft Entra ID (Azure AD)
- Authorization: Role-based access control
Backend services
- Map Matching Engine: Advanced algorithms for road network matching
- Pipeline Orchestration: Automated job processing and management
- Job Management: Status tracking and results generation
- Storage: Secure cloud storage for data management
Data format requirements
Input data specification
Input files must meet the following requirements:
File Format:
- Format: Apache Parquet
- Extension: .parquet
- Size: No limit (Azure CLI supports files of any size)
Required Fields:
| Field | Type | Description |
|---|---|---|
| id | integer | Unique identifier for each road segment |
| is_navigable | boolean | Flag indicating if the road is navigable |
| geometry | LineString (WKT) | Road geometry in Well-Known Text format |
Example Record:
    {
      "id": 5707295,
      "is_navigable": true,
      "geometry": "LINESTRING (145.18156715700002 -37.87340530899996, 145.1809221540001 -37.87356512499997)"
    }
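As a rough illustration, the required fields above could be checked record by record before the Parquet file is written. This sketch is not part of GEM: `validate_record` is a hypothetical helper, and the regular expression is a simplified 2D LineString check, not a full WKT parser.

```python
import re

# Simplified pattern for a 2D WKT LineString with at least two "x y" pairs.
# Assumption: plain decimal coordinates; this is not a complete WKT grammar.
_NUM = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
_PAIR = rf"{_NUM}\s+{_NUM}"
_LINESTRING = re.compile(
    rf"^\s*LINESTRING\s*\(\s*{_PAIR}(?:\s*,\s*{_PAIR})+\s*\)\s*$",
    re.IGNORECASE,
)


def validate_record(record):
    """Return a list of problems found in one input record (empty if valid)."""
    problems = []
    record_id = record.get("id")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(record_id, int) or isinstance(record_id, bool):
        problems.append("id must be an integer")
    if not isinstance(record.get("is_navigable"), bool):
        problems.append("is_navigable must be a boolean")
    geometry = record.get("geometry")
    if not isinstance(geometry, str) or not _LINESTRING.match(geometry):
        problems.append("geometry must be a WKT LineString with at least two points")
    return problems


example = {
    "id": 5707295,
    "is_navigable": True,
    "geometry": "LINESTRING (145.18156715700002 -37.87340530899996, "
                "145.1809221540001 -37.87356512499997)",
}
print(validate_record(example))  # prints []
```

A check like this only catches schema-level problems; it says nothing about whether the coordinates describe a plausible road.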
Output data specification
Results File:
- Format: Apache Parquet
- Contains matched GERS IDs with confidence scores
- Includes matching metadata and statistics
Output Metrics:
| Metric | Description |
|---|---|
| roads_matched | Percentage of roads successfully matched to GERS IDs |
| roads_unmatched | Percentage of roads without any matches |
| roads_fully_matched | Percentage of roads with complete, single GERS ID assignments |
| roads_partially_matched | Percentage of partially matched roads |
| confidence_threshold | Minimum confidence score for filtering (typically >60%) |
| execution_time_sec | Job execution duration in seconds |
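To make the relationship between the percentage metrics concrete, here is a minimal sketch that derives them from per-road outcomes. It rests on two assumptions that the source does not confirm: that `roads_matched` is the sum of fully and partially matched roads, and that each road carries one of the hypothetical labels `"full"`, `"partial"`, or `"none"`.

```python
from collections import Counter


def summarize(statuses):
    """Compute percentage metrics from a list of per-road outcome labels.

    Labels ("full", "partial", "none") are hypothetical, for illustration only.
    Assumes roads_matched = fully matched + partially matched.
    """
    total = len(statuses)
    if total == 0:
        raise ValueError("no roads to summarize")
    counts = Counter(statuses)

    def pct(n):
        # Convert a count into a percentage of all roads.
        return 100.0 * n / total

    return {
        "roads_matched": pct(counts["full"] + counts["partial"]),
        "roads_unmatched": pct(counts["none"]),
        "roads_fully_matched": pct(counts["full"]),
        "roads_partially_matched": pct(counts["partial"]),
    }


metrics = summarize(["full", "full", "partial", "none"])
print(metrics)
```

Under these assumptions, `roads_matched` and `roads_unmatched` always add up to 100.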
Authorization and security
Access control
GEM uses role-based access control for secure operations:
Permission Levels:
| Permission | Capabilities |
|---|---|
| Read Access | • View storage information • View job run data • Download results |
| Full Access | • All Read Access permissions • Trigger new job runs • Upload data to storage |
Resource Authorization:
- Job runs: Users can only access their own jobs
- Storage: Access controlled by your organization's permissions
- Credentials: Temporary, scoped to specific operations
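The permission levels and the job ownership rule above can be modeled in a few lines. This is a sketch of the described policy, not GEM's implementation; the `Action` names, the `"read"`/`"full"` keys, and both functions are hypothetical.

```python
from enum import Enum


class Action(Enum):
    VIEW_STORAGE = "view_storage"
    VIEW_JOB_RUN = "view_job_run"
    DOWNLOAD_RESULTS = "download_results"
    TRIGGER_JOB_RUN = "trigger_job_run"
    UPLOAD_DATA = "upload_data"


# Read Access: view storage information, view job run data, download results.
READ_ACCESS = frozenset(
    {Action.VIEW_STORAGE, Action.VIEW_JOB_RUN, Action.DOWNLOAD_RESULTS}
)
# Full Access: everything in Read Access, plus triggering runs and uploading.
FULL_ACCESS = READ_ACCESS | {Action.TRIGGER_JOB_RUN, Action.UPLOAD_DATA}

PERMISSIONS = {"read": READ_ACCESS, "full": FULL_ACCESS}


def is_allowed(level, action):
    """Check whether a permission level grants an action."""
    return action in PERMISSIONS[level]


def can_access_job(level, action, user_id, job_owner_id):
    """Job runs are additionally restricted to the user who owns them."""
    return is_allowed(level, action) and user_id == job_owner_id


print(is_allowed("read", Action.UPLOAD_DATA))
print(can_access_job("full", Action.VIEW_JOB_RUN, "alice", "alice"))
```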
Authentication flow
- User Authentication: Microsoft Entra ID via Dashboard
- Storage Authorization: System validates storage access
- Credential Generation: Temporary tokens for Azure CLI
Azure CLI integration
Installation
Install Azure CLI from Microsoft:
    # macOS
    brew install azure-cli

    # Windows
    # Download from https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-windows

    # Linux (Debian/Ubuntu)
    curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
Authentication
Log in with a service principal, using the credentials from the GEM UI:
    az login --service-principal \
      --username <client_id> \
      --password <client_secret> \
      --tenant <tenant_id>
Upload data
    az storage blob upload \
      --account-name <storage-account> \
      --container-name <container> \
      --name data.parquet \
      --file /path/to/local/data.parquet \
      --auth-mode login
Download results
    az storage blob download \
      --account-name <storage-account> \
      --container-name <container> \
      --name predictions.parquet \
      --file /path/to/destination/predictions.parquet \
      --auth-mode login
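When transfers are scripted, the upload and download commands can be assembled programmatically. The sketch below only builds the argument list, using the same flags as the commands above; `build_blob_command` is a hypothetical helper, and the account and container values are placeholders.

```python
import shlex


def build_blob_command(operation, account, container, blob_name, local_path):
    """Build the argument list for `az storage blob upload` or `download`."""
    if operation not in ("upload", "download"):
        raise ValueError(f"unsupported operation: {operation!r}")
    return [
        "az", "storage", "blob", operation,
        "--account-name", account,
        "--container-name", container,
        "--name", blob_name,
        "--file", local_path,
        "--auth-mode", "login",
    ]


# Placeholder values; substitute the ones shown in the GEM UI.
cmd = build_blob_command(
    "upload", "<storage-account>", "<container>",
    "data.parquet", "/path/to/local/data.parquet",
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it without shell quoting issues, provided Azure CLI is installed and you are already logged in.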
Job statuses
GEM jobs progress through the following states:
| Status | Description | User Action |
|---|---|---|
| In Progress | Job is currently running | Wait for completion |
| Success | Matching completed successfully | Download results |
| Failed | Job encountered an error | Review logs, contact support |
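A client that waits for a job might poll until it reaches one of the two terminal states in the table. The source does not document a status API, so `get_status` below is a hypothetical callable supplied by the caller; the polling interval and budget are likewise illustrative.

```python
import time

TERMINAL_STATES = {"Success", "Failed"}


def wait_for_job(get_status, interval_sec=30.0, max_polls=120, sleep=time.sleep):
    """Poll until the job reports Success or Failed, then return that status.

    get_status is a hypothetical zero-argument callable returning one of the
    status strings from the table ("In Progress", "Success", "Failed").
    """
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if status != "In Progress":
            raise ValueError(f"unexpected status: {status!r}")
        sleep(interval_sec)
    raise TimeoutError("job did not finish within the polling budget")


# Demonstration with a canned sequence of statuses instead of a real service.
fake_statuses = iter(["In Progress", "In Progress", "Success"])
print(wait_for_job(lambda: next(fake_statuses), interval_sec=0))  # prints Success
```

On `Success` the next step is to download the results; on `Failed`, review the logs and contact support, as the table indicates.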
Performance metrics
System performance
Based on production metrics:
- Uptime: ≥99% (continuously monitored)
- Processing Speed: ~100,000 road segments matched per hour
- Matching Accuracy: Confidence scores above 85% for high-quality input data
- System Availability: Deployed on production cluster with Helm
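The quoted processing speed allows a back-of-the-envelope estimate of job duration. This sketch assumes throughput stays roughly constant at the approximate figure above, which may not hold for every dataset; `estimate_hours` is a hypothetical helper.

```python
SEGMENTS_PER_HOUR = 100_000  # approximate production figure quoted above


def estimate_hours(segment_count, rate=SEGMENTS_PER_HOUR):
    """Estimate job duration in hours, assuming a constant matching rate."""
    if segment_count < 0:
        raise ValueError("segment_count must be non-negative")
    return segment_count / rate


print(estimate_hours(250_000))  # prints 2.5
```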
Performance monitoring
Performance is continuously monitored to ensure:
- High system uptime
- Fast job execution
- Accurate matching results
- Minimal errors
Metrics tracked:
- System uptime
- Job execution time
- Matching accuracy
- Error rates
Access requirements
Access to GEM is controlled by your organization and project assignments. Access the service at my.tomtom.com/gem. Contact your system administrator if you need access to GEM.
Matching algorithms
AI-driven matching
GEM uses advanced algorithms for map matching:
- Algorithm Type: AI-driven road network matching
- Matching Strategy: Geometry-based with topological validation
- Confidence Scoring: Probabilistic confidence for each match
- Sub-Segment Precision: Linear referencing for detailed attribution
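For readers unfamiliar with linear referencing, the sketch below shows the general idea: a position is expressed as a fraction of the distance along a line rather than as a coordinate. It is a generic planar illustration, not GEM's algorithm; `locate_along` is a hypothetical function, and real road data in longitude/latitude would need a geodesic or projected distance instead.

```python
import math


def locate_along(line, point):
    """Return the fraction (0.0 to 1.0) along `line` closest to `point`.

    `line` is a list of (x, y) vertices; planar coordinates are assumed.
    """
    segments = list(zip(line, line[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    total = sum(lengths)
    if total == 0:
        raise ValueError("line has zero length")

    best_distance = math.inf
    best_measure = 0.0
    travelled = 0.0  # length of all segments before the current one
    for (a, b), length in zip(segments, lengths):
        if length > 0:
            # Project the point onto segment a->b and clamp to its ends.
            t = ((point[0] - a[0]) * (b[0] - a[0])
                 + (point[1] - a[1]) * (b[1] - a[1])) / length ** 2
            t = max(0.0, min(1.0, t))
            projected = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            distance = math.dist(point, projected)
            if distance < best_distance:
                best_distance = distance
                best_measure = travelled + t * length
        travelled += length
    return best_measure / total


road = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
print(locate_along(road, (5.0, 1.0)))  # prints 0.25
```

A pair of such fractions (start and end) is enough to describe a sub-segment of a road without splitting its geometry.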
Matching types
Currently supported:
- Road Matching: Match road network data to Overture Maps road segments
Future matching types (planned):
- Point of Interest (POI) matching
- Address matching
- Building footprint matching
Reference map data
Overture Maps integration
Current Release: 2024-09-24.0
GEM matches against Overture Maps Foundation datasets:
- GERS IDs: Global Entity Reference System identifiers
- Road Network: Comprehensive global road coverage
- Update Frequency: Periodic releases from the Overture Maps Foundation
- Data Quality: Community-validated and continuously improved
Technical requirements
Client requirements
- Azure CLI: Latest version installed locally
- Network: Stable internet connection for large file transfers
- Storage: Sufficient local disk space for data files
- Browser: Modern web browser for UI access (Chrome, Firefox, Safari, Edge)
Data requirements
- Input Size: No theoretical limit (Azure CLI handles any file size)
- Format Compliance: Must be valid Parquet with required schema
- Geometry Format: Valid WKT LineString geometries
- Data Quality: Better input quality leads to higher matching confidence
Security and compliance
Data security
- Encryption in Transit: TLS 1.2+ for all communications
- Encryption at Rest: Azure Blob Storage encryption
- Credential Security: Temporary tokens with limited scope
- Access Logging: Comprehensive audit trails
Compliance
- Quality Assurance: Enterprise-grade code quality scanning
- Security Scanning: Regular vulnerability detection and patching
- Database Security: Encrypted storage with network isolation
Error handling
Common error scenarios
| Error | Cause | Resolution |
|---|---|---|
| Authentication Failed | Invalid credentials | Verify Client ID and Secret |
| Upload Failed | Network or permission issue | Check storage access and retry |
| Job Submission Failed | Invalid input format | Validate Parquet schema |
| Matching Failed | Data quality or system error | Review logs, contact support |
| Download Failed | Expired credentials | Re-authenticate and retry |
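Transient network failures during upload or download are often handled with a bounded retry. The sketch below shows one common pattern, exponential backoff; it is an illustration, not GEM behavior, and the choice of retryable exceptions is an assumption. Permission errors are deliberately not retried, since those call for checking storage access instead.

```python
import time


def retry(operation, attempts=3, base_delay_sec=1.0,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call `operation` until it succeeds or the attempts are exhausted.

    Waits base_delay_sec, then twice that, and so on between attempts.
    Exceptions not listed in `retry_on` propagate immediately.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay_sec * 2 ** (attempt - 1))


# Demonstration with a stand-in operation that fails twice, then succeeds.
state = {"calls": 0}


def flaky_upload():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("simulated network drop")
    return "uploaded"


print(retry(flaky_upload, base_delay_sec=0))  # prints uploaded
```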
Support resources
For technical issues:
- Review job error logs in the detailed view
- Contact support team
Future enhancements
Planned features
Matching Capabilities:
- Support for additional entity types beyond road networks
- Enhanced matching algorithms
- Support for additional data sources
User Experience:
- Matching visualization tools
- Enhanced result analytics
- Additional export formats
Operations:
- Enhanced monitoring and notifications
- Extended support options