Technical documentation

Introduction

This page provides technical details about the GEM architecture, data formats, and system components for developers and technical users.

System architecture

GEM consists of several integrated components:

GEM dashboard application (UI)

  • Framework: Web-based interactive interface
  • Deployment: Hosted on Dashboard (my.tomtom.com/gem)
  • Authentication: Microsoft Entra ID (Azure AD)
  • Authorization: Role-based access control

Backend services

  • Map Matching Engine: Advanced algorithms for road network matching
  • Pipeline Orchestration: Automated job processing and management
  • Job Management: Status tracking and results generation
  • Storage: Secure cloud storage for data management

Data format requirements

Input data specification

Input files must meet the following requirements:

File Format:

  • Format: Apache Parquet
  • Extension: .parquet
  • Size: No limit imposed by GEM (uploads go through Azure CLI, so Azure Blob Storage size limits apply)
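
A quick local check before uploading can catch files that are not Parquet at all. The sketch below is not part of GEM: the function name `is_parquet_file` is hypothetical, and it assumes a shell where `head -c` and `tail -c` are available. It relies only on the Parquet convention that a standard file starts and ends with the 4-byte magic string `PAR1`. It does not validate the schema.

```shell
# Hypothetical helper (not part of GEM): returns 0 if the file looks like Parquet.
is_parquet_file() {
  file="$1"
  # Must be a regular, readable file.
  [ -f "$file" ] && [ -r "$file" ] || return 1
  # Must carry the .parquet extension required for input files.
  case "$file" in
    *.parquet) ;;
    *) return 1 ;;
  esac
  # Standard Parquet files begin and end with the magic bytes "PAR1".
  # tr strips NUL bytes so the shell does not warn on binary content.
  [ "$(head -c 4 "$file" | tr -d '\0')" = "PAR1" ] || return 1
  [ "$(tail -c 4 "$file" | tr -d '\0')" = "PAR1" ] || return 1
  return 0
}
```

For example, `is_parquet_file /path/to/local/data.parquet && echo "looks like Parquet"` prints the message only when all three checks pass.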

Required Fields:

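| Field        | Type             | Description                              |
|--------------|------------------|------------------------------------------|
| id           | integer          | Unique identifier for each road segment  |
| is_navigable | boolean          | Flag indicating if the road is navigable |
| geometry     | LineString (WKT) | Road geometry in Well-Known Text format  |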

Example Record:

    {
      "id": 5707295,
      "is_navigable": true,
      "geometry": "LINESTRING (145.18156715700002 -37.87340530899996, 145.1809221540001 -37.87356512499997)"
    }
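
The geometry field can be spot-checked with a regular expression before the data is written to Parquet. This is a minimal sketch, not GEM's own validation: the function name `is_wkt_linestring` is hypothetical. It accepts only plain 2D `LINESTRING` text with at least two points and decimal coordinates, and it rejects Z/M variants, exponent notation, and `LINESTRING EMPTY`.

```shell
# Hypothetical helper (not part of GEM): returns 0 for a simple 2D WKT LineString.
is_wkt_linestring() {
  # A coordinate: optional sign, digits, optional decimal part.
  num='-?[0-9]+(\.[0-9]+)?'
  # A point: two coordinates separated by one or more spaces.
  pt="${num} +${num}"
  # At least two comma-separated points inside the parentheses.
  printf '%s\n' "$1" | grep -Eq "^LINESTRING *\( *${pt}( *, *${pt})+ *\)\$"
}
```

The geometry from the example record above passes this check. A single-point string such as `LINESTRING (145.18 -37.87)` does not.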

Output data specification

Results File:

  • Format: Apache Parquet
  • Contains matched GERS IDs with confidence scores
  • Includes matching metadata and statistics

Output Metrics:

| Metric                  | Description                                                  |
|-------------------------|--------------------------------------------------------------|
| roads_matched           | Percentage of roads successfully matched to GERS IDs         |
| roads_unmatched         | Percentage of roads without any matches                      |
| roads_fully_matched     | Percentage of roads with complete, single GERS ID assignments |
| roads_partially_matched | Percentage of partially matched roads                        |
| confidence_threshold    | Minimum confidence score for filtering (typically >60%)      |
| execution_time_sec      | Job execution duration, in seconds                           |

Authorization and security

Access control

GEM uses role-based access control for secure operations:

Permission Levels:

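| Permission  | Capabilities                                                              |
|-------------|---------------------------------------------------------------------------|
| Read Access | View storage information; view job run data; download results             |
| Full Access | All Read Access permissions; trigger new job runs; upload data to storage |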
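
The two permission levels can be modeled as a small lookup. The sketch below only illustrates the table above and is not how GEM enforces access. The function name `role_allows`, the role names `read` and `full`, and the action names are all hypothetical labels chosen for this example.

```shell
# Hypothetical model of the permission table (not GEM's implementation).
# Usage: role_allows ROLE ACTION   (returns 0 if allowed, non-zero otherwise)
role_allows() {
  role="$1"
  action="$2"
  case "$action" in
    # Capabilities available to both Read Access and Full Access.
    view-storage|view-job-runs|download-results)
      [ "$role" = "read" ] || [ "$role" = "full" ] ;;
    # Capabilities that require Full Access.
    trigger-job-run|upload-data)
      [ "$role" = "full" ] ;;
    # Unknown actions are denied.
    *)
      return 1 ;;
  esac
}
```

With this model, `role_allows read download-results` succeeds and `role_allows read upload-data` fails.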

Resource Authorization:

  • Job runs: Users can only access their own jobs
  • Storage: Access controlled by your organization's permissions
  • Credentials: Temporary, scoped to specific operations

Authentication flow

  1. User Authentication: Microsoft Entra ID via Dashboard
  2. Storage Authorization: System validates storage access
  3. Credential Generation: Temporary tokens for Azure CLI

Azure CLI integration

Installation

Install Azure CLI from Microsoft:

    # macOS
    brew install azure-cli

    # Windows
    # Download from https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-windows

    # Linux (Debian/Ubuntu)
    curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

Authentication

Log in as a service principal using the credentials provided in the GEM UI:

    az login --service-principal \
      --username <client_id> \
      --password <client_secret> \
      --tenant <tenant_id>

Upload data

    az storage blob upload \
      --account-name <storage-account> \
      --container-name <container> \
      --name data.parquet \
      --file /path/to/local/data.parquet \
      --auth-mode login
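
For repeated uploads, the command above can be wrapped in a small function. This is a sketch under stated assumptions: the function name `gem_upload` and the environment variables `GEM_STORAGE_ACCOUNT` and `GEM_CONTAINER` are hypothetical and not defined by GEM. It uses only the `az storage blob upload` flags shown above and names the blob after the local file.

```shell
# Hypothetical wrapper around the upload command (not part of GEM).
# Assumes GEM_STORAGE_ACCOUNT and GEM_CONTAINER are set in the environment
# and that `az login` has already succeeded.
gem_upload() {
  src="$1"
  if [ -z "${GEM_STORAGE_ACCOUNT:-}" ] || [ -z "${GEM_CONTAINER:-}" ]; then
    echo "error: set GEM_STORAGE_ACCOUNT and GEM_CONTAINER" >&2
    return 2
  fi
  # Fail early if the local file does not exist.
  [ -f "$src" ] || { echo "error: file not found: $src" >&2; return 1; }
  # Same flags as the documented command; the blob name is the file's base name.
  az storage blob upload \
    --account-name "$GEM_STORAGE_ACCOUNT" \
    --container-name "$GEM_CONTAINER" \
    --name "$(basename "$src")" \
    --file "$src" \
    --auth-mode login
}
```

Calling `gem_upload /path/to/local/data.parquet` then uploads the file as `data.parquet`.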

Download results

    az storage blob download \
      --account-name <storage-account> \
      --container-name <container> \
      --name predictions.parquet \
      --file /path/to/destination/predictions.parquet \
      --auth-mode login

Job statuses

GEM jobs progress through the following states:

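| Status      | Description                     | User Action                  |
|-------------|---------------------------------|------------------------------|
| In Progress | Job is currently running        | Wait for completion          |
| Success     | Matching completed successfully | Download results             |
| Failed      | Job encountered an error        | Review logs, contact support |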
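
When job handling is scripted, the status labels above can be mapped to a next step. This page does not describe a way to fetch job status outside the UI, so the sketch assumes the status string is supplied by the user. The function name `next_action_for_status` and the action keywords it prints are hypothetical.

```shell
# Hypothetical mapping from a GEM job status label to a next step.
# Prints a keyword and returns non-zero for labels not listed on this page.
next_action_for_status() {
  case "$1" in
    "In Progress") echo "wait" ;;
    "Success")     echo "download-results" ;;
    "Failed")      echo "review-logs" ;;
    *)             echo "unknown"; return 1 ;;
  esac
}
```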

Performance metrics

System performance

Based on production metrics:

  • Uptime: ≥99% (continuously monitored)
  • Processing Speed: ~100,000 road segments matched per hour
  • Matching Accuracy: >85% confidence scores for high-quality input data
  • System Availability: Deployed on production cluster with Helm

Performance monitoring

Performance is continuously monitored to ensure:

  • High system uptime
  • Fast job execution
  • Accurate matching results
  • Minimal errors

Metrics tracked:

  • System uptime
  • Job execution time
  • Matching accuracy
  • Error rates

Access requirements

Access to GEM is controlled by your organization and project assignments. Access the service at my.tomtom.com/gem. Contact your system administrator if you need access to GEM.

Matching algorithms

AI-driven matching

GEM uses advanced algorithms for map matching:

  • Algorithm Type: AI-driven road network matching
  • Matching Strategy: Geometry-based with topological validation
  • Confidence Scoring: Probabilistic confidence for each match
  • Sub-Segment Precision: Linear referencing for detailed attribution

Matching types

Currently supported:

  • Road Matching: Match road network data to Overture Maps road segments

Future matching types (planned):

  • Point of Interest (POI) matching
  • Address matching
  • Building footprint matching

Reference map data

Overture Maps integration

Current Release: 2024-09-24.0

GEM matches against Overture Maps Foundation datasets:

  • GERS IDs: Global Entity Reference System identifiers
  • Road Network: Comprehensive global road coverage
  • Update Frequency: Periodic releases from Overture Foundation
  • Data Quality: Community-validated and continuously improved

Technical requirements

Client requirements

  • Azure CLI: Latest version installed locally
  • Network: Stable internet connection for large file transfers
  • Storage: Sufficient local disk space for data files
  • Browser: Modern web browser for UI access (Chrome, Firefox, Safari, Edge)

Data requirements

  • Input Size: No limit imposed by GEM (Azure Blob Storage size limits apply to uploads)
  • Format Compliance: Must be valid Parquet with required schema
  • Geometry Format: Valid WKT LineString geometries
  • Data Quality: Better input quality leads to higher matching confidence

Security and compliance

Data security

  • Encryption in Transit: TLS 1.2+ for all communications
  • Encryption at Rest: Azure Blob Storage encryption
  • Credential Security: Temporary tokens with limited scope
  • Access Logging: Comprehensive audit trails

Compliance

  • Quality Assurance: Enterprise-grade code quality scanning
  • Security Scanning: Regular vulnerability detection and patching
  • Database Security: Encrypted storage with network isolation

Error handling

Common error scenarios

| Error                 | Cause                        | Resolution                     |
|-----------------------|------------------------------|--------------------------------|
| Authentication Failed | Invalid credentials          | Verify Client ID and Secret    |
| Upload Failed         | Network or permission issue  | Check storage access and retry |
| Job Submission Failed | Invalid input format         | Validate Parquet schema        |
| Matching Failed       | Data quality or system error | Review logs, contact support   |
| Download Failed       | Expired credentials          | Re-authenticate and retry      |
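
Several resolutions above involve retrying a command. A generic retry helper is one way to script that. This is a minimal sketch: the function name `retry` is hypothetical, and it only repeats the command with a fixed delay. It does not re-authenticate, so expired credentials still require a fresh `az login` before retrying.

```shell
# Hypothetical helper (not part of GEM).
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  max="$1"
  delay="$2"
  shift 2
  attempt=1
  while :; do
    # Run the command; stop as soon as it succeeds.
    "$@" && return 0
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempt(s): $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For example, `retry 3 10 az storage blob download --account-name <storage-account> --container-name <container> --name predictions.parquet --file /path/to/destination/predictions.parquet --auth-mode login` attempts the download up to three times, waiting 10 seconds between attempts.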

Support resources

For technical issues:

  1. Review job error logs in detailed view
  2. Contact support team

Future enhancements

Planned features

Matching Capabilities:

  • Support for additional entity types beyond road networks
  • Enhanced matching algorithms
  • Support for additional data sources

User Experience:

  • Matching visualization tools
  • Enhanced result analytics
  • Additional export formats

Operations:

  • Enhanced monitoring and notifications
  • Extended support options

Additional resources