Getting Started with GEM

Introduction

Global Entity Matcher (GEM) simplifies aligning your geospatial data with standardized identifiers. This guide explains how to access GEM and start using it in your organization.

Overview

The Global Entity Matcher (GEM) enables you to match your proprietary transportation network data against Overture Maps reference datasets. GEM provides matching results with confidence scores, helping you assess the accuracy and reliability of the alignment between your data and reference map elements.

Prerequisites

Before getting started with GEM, ensure you have:

1. Console Access

  • Dashboard Account: Active access to my.tomtom.com
  • Microsoft Entra ID: Authentication credentials configured
  • Project Assignment: Assigned to a project with GEM access enabled

2. Technical Tools

  • Azure CLI: Required for data upload/download operations
    • Install from Microsoft Azure CLI
    • Azure CLI is the recommended method as it supports files of any size
    • Alternative upload methods may be restricted by system memory and network limitations
  • Terminal Access: Command-line interface on your machine
  • Stable Internet: For uploading/downloading large datasets

3. Data Requirements

  • Format: Input files must be in Apache Parquet format with a .parquet extension
    • Files in other formats can be uploaded to storage but will not trigger the matching process
  • Required Fields: Each record must contain:
    • id: Unique identifier (integer)
    • is_navigable: Boolean flag indicating if the road is navigable
    • geometry: LineString in WKT (Well-Known Text) format

Example record structure:

{
  "id": 5707295,
  "is_navigable": true,
  "geometry": "LINESTRING (145.18156715700002 -37.87340530899996, 145.1809221540001 -37.87356512499997)"
}
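Before uploading, it can help to check records against the required fields above. The following Python sketch assumes your records are already available as dictionaries (for example, rows read from your Parquet file with a library of your choice). `validate_record` and `validate_batch` are hypothetical helpers, not part of GEM, and the WKT check is a simple pattern match rather than a full WKT parser.

```python
import re

# Simplified WKT LINESTRING check: two or more "x y" pairs, 2D only.
# It does not accept Z/M coordinates, EMPTY geometries, or other WKT types.
_NUM = r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
_PAIR = _NUM + r"\s+" + _NUM
_LINESTRING = re.compile(
    r"^\s*LINESTRING\s*\(\s*" + _PAIR + r"(?:\s*,\s*" + _PAIR + r")+\s*\)\s*$",
    re.IGNORECASE,
)


def validate_record(record: dict) -> list:
    """Return the problems found in one record; an empty list means it passed."""
    problems = []
    record_id = record.get("id")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if not isinstance(record_id, int) or isinstance(record_id, bool):
        problems.append("id must be an integer")
    if not isinstance(record.get("is_navigable"), bool):
        problems.append("is_navigable must be a boolean")
    geometry = record.get("geometry")
    if not isinstance(geometry, str) or not _LINESTRING.match(geometry):
        problems.append("geometry must be a WKT LINESTRING with at least two points")
    return problems


def validate_batch(records) -> dict:
    """Map record index -> problems, including duplicate ids across the batch."""
    report = {}
    seen_ids = set()
    for index, record in enumerate(records):
        problems = validate_record(record)
        record_id = record.get("id")
        # Only track ids that passed the type check, so they are hashable integers.
        if "id must be an integer" not in problems:
            if record_id in seen_ids:
                problems.append(f"duplicate id {record_id}")
            seen_ids.add(record_id)
        if problems:
            report[index] = problems
    return report


example = {
    "id": 5707295,
    "is_navigable": True,
    "geometry": "LINESTRING (145.18156715700002 -37.87340530899996, "
                "145.1809221540001 -37.87356512499997)",
}
print(validate_record(example))            # []
print(validate_batch([example, example]))  # {1: ['duplicate id 5707295']}
```

Running the batch check over a whole file before upload surfaces type errors, malformed geometries, and non-unique IDs in one pass.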

4. Knowledge Requirements

  • Understanding of your source data structure and format
  • Familiarity with data licensing requirements and restrictions
  • Basic command-line operations
  • Understanding of geospatial data concepts

How to access GEM

GEM is available through the Dashboard platform.

Access steps:

  1. Navigate to Dashboard (my.tomtom.com)

  2. Authenticate

    • Log in using Microsoft Entra ID (Azure AD) credentials
    • Authentication is required to access the GEM interface
  3. Select Project

    • In the left navigation pane, select the appropriate project from the dropdown menu
    • Only projects with GEM access will show the Global Entity Matcher option
  4. Access GEM Dashboard

    • Click Global Entity Matcher in the sidebar
    • If this option is not visible, your organization or project may not be onboarded yet

Access requirements

Note: If your organization or project has not been enabled for the GEM UI, the "Global Entity Matcher" option will not appear in the sidebar. Contact your system administrator or support team to request access.

GEM UI capabilities

The GEM user interface supports the full matching workflow, from preparing data to downloading results:

Key features:

Request Management

  • View and track all previous matching requests
  • Search jobs by ID
  • Filter by status, storage, Overture release, and matching type
  • Access detailed information for each job run

Data Preparation

  • Retrieve storage credentials securely
  • Upload data using Azure CLI with step-by-step guidance
  • Support for large files (no size limit with Azure CLI)

Job Submission

  • Trigger matching requests with customizable parameters
  • Select input file, storage, matching type, and Overture release
  • Real-time validation of form inputs

Results Visualization

  • Monitor matching job status and progress
  • View detailed matching statistics
  • Track confidence scores and match quality metrics

Download Results

  • Retrieve matching results via Azure CLI
  • Secure credential management
  • Guided download process

How to get GEM for your organization

GEM is currently available through direct engagement with our sales team. To get started:

  1. Contact Sales: Reach out to the sales team to discuss your specific needs
  2. Define Your Use Case: Share your requirements, data types, and business objectives
  3. Onboarding: Work with our team to configure GEM for your specific datasets and workflows

Quick start guide

Follow these steps for your first matching job:

Step 1: Install Azure CLI

If not already installed, download and install Azure CLI:

macOS:

brew install azure-cli

Windows: Download from Microsoft Azure CLI

Linux (Debian/Ubuntu; for other distributions, follow Microsoft's Azure CLI installation guide):

curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

Verify installation:

az --version

Step 2: Access GEM dashboard

  1. Go to my.tomtom.com/gem
  2. Log in with your Microsoft Entra ID credentials
  3. Select your project from the dropdown
  4. You should see the GEM main page with:
    • Brief description of GEM
    • Table of previous matching runs (empty for first-time users)
    • Prepare Data button
    • Trigger matching button

Step 3: Upload your data

  1. Click the Prepare Data + button
  2. Select your storage from the dropdown
  3. Follow the authorization steps:
    • Click Unwrap to reveal credentials
    • Copy and run the az login command in your terminal
  4. Enter your local file path
  5. Copy and run the az storage blob upload command
  6. Wait for upload completion
  7. Click Finish
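The upload step above can also be scripted once you know your storage details. This Python sketch only builds the argument list for `az storage blob upload` (an Azure CLI command that accepts `--account-name`, `--container-name`, `--name`, `--file`, and `--auth-mode`). The account and container names are placeholders, and using `--auth-mode login` (the identity from `az login`) is an assumption; prefer the exact command the Prepare Data dialog shows you. No credentials appear in the code.

```python
import shlex
from pathlib import Path


def build_upload_command(account_name: str, container_name: str,
                         local_path: str, blob_name: str = "") -> list:
    """Return an `az storage blob upload` argument list without running it."""
    path = Path(local_path)
    # Only .parquet files trigger matching, so fail early on anything else.
    if path.suffix != ".parquet":
        raise ValueError(f"{path.name} is not a .parquet file")
    return [
        "az", "storage", "blob", "upload",
        "--account-name", account_name,
        "--container-name", container_name,
        "--name", blob_name or path.name,  # blob name defaults to the file name
        "--file", str(path),
        "--auth-mode", "login",            # assumption: Entra ID auth via az login
    ]


# Placeholder names for illustration only.
cmd = build_upload_command("examplestorage", "example-container", "data/my_data.parquet")
print(shlex.join(cmd))
```

To run it from Python, pass the list to `subprocess.run(cmd, check=True)`; keeping the command as a list avoids shell-quoting problems with paths that contain spaces.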

Step 4: Trigger matching

  1. Click the Trigger matching button
  2. Fill in the form:
    • Input file name: Your uploaded filename (e.g., my_data.parquet)
    • Storage Name: Select the storage you used
    • Matching Type: Road Matching (currently the only option)
    • Overture Release: Auto-populated (e.g., 2024-09-24.0)
  3. Click Submit
  4. Your job appears in the list with "In Progress" status

Step 5: Monitor and download

  1. Refresh the dashboard to check job status
  2. When status changes to "Success", click the details arrow (→)
  3. Review the matching statistics
  4. Click Download in the Download Results section
  5. Authorize storage access (if needed)
  6. Specify local destination directory
  7. Copy and run the az storage blob download command
  8. Your results are now available locally
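The download step follows the same pattern. This Python sketch builds an `az storage blob download` argument list (the command accepts `--account-name`, `--container-name`, `--name`, `--file`, and `--auth-mode`). The account, container, and result blob name are placeholders, `--auth-mode login` is again an assumption, and the command GEM displays should take precedence.

```python
import shlex
from pathlib import Path


def build_download_command(account_name: str, container_name: str,
                           blob_name: str, destination_dir: str) -> list:
    """Return an `az storage blob download` argument list without running it."""
    # Save under the blob's base name inside the chosen local directory.
    destination = Path(destination_dir) / Path(blob_name).name
    return [
        "az", "storage", "blob", "download",
        "--account-name", account_name,
        "--container-name", container_name,
        "--name", blob_name,
        "--file", str(destination),
        "--auth-mode", "login",  # assumption: Entra ID auth via az login
    ]


# Placeholder names for illustration only; the real result blob name comes from GEM.
cmd = build_download_command("examplestorage", "example-container",
                             "results/example_result_file", "gem_results")
print(shlex.join(cmd))
```

Make sure the destination directory exists before running the command, for example with `Path("gem_results").mkdir(parents=True, exist_ok=True)`.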

Processing Time: Approximately 100,000 road segments per hour. Small datasets may complete in minutes, while larger datasets may take several hours.
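The throughput figure above translates into a simple planning estimate: hours ≈ segments / 100,000, so 250,000 segments works out to about 2.5 hours. A short Python sketch of that arithmetic (the helper name is illustrative, and actual times will vary):

```python
SEGMENTS_PER_HOUR = 100_000  # approximate throughput stated in this guide


def estimate_hours(segment_count: int, segments_per_hour: int = SEGMENTS_PER_HOUR) -> float:
    """Rough processing-time estimate in hours for a given number of road segments."""
    if segment_count < 0:
        raise ValueError("segment_count must be non-negative")
    return segment_count / segments_per_hour


for segments in (20_000, 250_000, 1_200_000):
    hours = estimate_hours(segments)
    print(f"{segments:>9,} segments: about {hours:.1f} h ({hours * 60:.0f} min)")
```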

Understanding your data

To get the most out of GEM, it's important to understand:

  1. Data Format: The structure and format of your geospatial datasets
  2. Data Quality: Current quality and accuracy of your data
  3. Update Frequency: How often your data changes and needs to be synchronized
  4. Coverage Area: Geographic regions covered by your datasets

System status and performance

Current status

GEM is fully operational and deployed in production:

System Metrics:

  • Uptime: ≥99% (monitored continuously)
  • Processing Speed: ~100,000 road segments matched per hour
  • Matching Accuracy: >85% confidence scores for high-quality input data
  • Deployment: Running on the production cluster, deployed with Helm
  • Security: No critical vulnerabilities; regular security scanning is active

Infrastructure:

  • Database: Cloud database configured and operational
  • Storage: Azure Blob Storage integration active
  • Authorization: Role-based access control enforced
  • Authentication: Microsoft Entra ID integrated

Performance monitoring

System performance and health are monitored continuously to ensure high availability and reliability.

Integration approach

GEM supports an iterative approach to integration:

  1. Initial Assessment: Analyze your current datasets and conflation challenges
  2. Pilot Implementation: Start with a representative subset of data to validate the approach
  3. Quality Review: Evaluate matching results and confidence scores
  4. Gradual Rollout: Expand coverage based on business priorities and results
  5. Continuous Optimization: Refine data quality and processes based on feedback

Phase 1: Pilot (2-4 weeks)

  • Select small, representative dataset
  • Run initial matching job
  • Validate results quality
  • Identify data quality issues
  • Establish success criteria

Phase 2: Expansion (4-8 weeks)

  • Process larger datasets
  • Implement in production workflows
  • Monitor performance and accuracy
  • Gather user feedback
  • Document best practices

Phase 3: Production (Ongoing)

  • Regular data updates and matching
  • Continuous quality monitoring
  • Integration with downstream systems
  • Periodic reviews and optimization

Next steps

Once you have access to GEM and have completed your first match:

Immediate actions

  1. Review Results: Analyze matching statistics and confidence scores

    • Check roads_matched percentage
    • Review roads_fully_matched vs roads_partially_matched
    • Identify unmatched roads for investigation
  2. Validate Quality: Spot-check a sample of matched GERS (Global Entity Reference System) IDs

    • Compare with your source data
    • Verify geometry alignment
    • Check confidence scores
  3. Document Findings: Record your observations

    • Note data quality issues discovered
    • Document successful matching patterns
    • Identify areas for improvement
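To put the statistics above in context, it can help to convert counts into percentages of what you submitted. This Python sketch assumes you have the counts of fully and partially matched roads plus the total number of roads in your input, and that roads_matched equals fully plus partially matched roads; check that assumption against your own job output. The counts in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class MatchSummary:
    matched_pct: float
    fully_matched_pct: float
    partially_matched_pct: float
    unmatched: int


def summarize_matches(total_roads: int, fully_matched: int,
                      partially_matched: int) -> MatchSummary:
    """Convert match counts into percentages of the submitted roads."""
    if total_roads <= 0:
        raise ValueError("total_roads must be positive")
    matched = fully_matched + partially_matched  # assumption: matched = fully + partially
    if matched > total_roads:
        raise ValueError("more matched roads than submitted roads")

    def pct(count: int) -> float:
        return 100.0 * count / total_roads

    return MatchSummary(pct(matched), pct(fully_matched), pct(partially_matched),
                        total_roads - matched)


# Made-up counts for illustration.
summary = summarize_matches(total_roads=10_000, fully_matched=7_800, partially_matched=1_200)
print(f"matched {summary.matched_pct:.1f}% "
      f"(fully {summary.fully_matched_pct:.1f}%, partially {summary.partially_matched_pct:.1f}%), "
      f"unmatched roads to investigate: {summary.unmatched}")
```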

Ongoing activities

  1. Improve Data Quality: Based on matching results

    • Fix geometry errors
    • Complete missing fields
    • Validate navigability flags
    • Ensure unique IDs
  2. Process Additional Data: Expand your matching coverage

    • Upload remaining datasets
    • Match different geographic regions
    • Process historical data for comparison
  3. Integrate GERS IDs: Use matched identifiers in your applications

    • Update databases with GERS IDs
    • Modify data pipelines
    • Enable interoperability with other systems
  4. Monitor Performance: Track your matching jobs

    • Review processing times
    • Monitor match rates over time
    • Identify optimization opportunities
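When reviewing processing times, a measured rate can be compared with the approximate 100,000 segments per hour quoted earlier. This Python sketch assumes you note when each job was submitted and when it reached "Success"; the helper name and the example times are illustrative.

```python
from datetime import datetime


def segments_per_hour(segment_count: int, started_at: datetime,
                      finished_at: datetime) -> float:
    """Observed throughput of one matching job."""
    elapsed_hours = (finished_at - started_at).total_seconds() / 3600
    if elapsed_hours <= 0:
        raise ValueError("finished_at must be later than started_at")
    return segment_count / elapsed_hours


rate = segments_per_hour(150_000,
                         datetime(2024, 10, 1, 9, 0),
                         datetime(2024, 10, 1, 10, 30))
print(f"{rate:,.0f} segments/hour")  # 100,000 segments/hour
```

Recording this rate for each job makes slowdowns easy to spot over time.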

Support and documentation

Available resources

Self-Service Documentation:

Technical Support:

  • Support: Contact through support portal
  • Error Logs: Available in job details page

Training Resources:

  • Inline guidance and tooltips in GEM UI
  • Azure CLI documentation from Microsoft
  • Azure Blob Storage guides

Getting help

When contacting support, provide:

  • Job ID (if applicable)
  • Error messages (exact text)
  • Steps to reproduce the issue
  • Screenshots (if helpful)
  • Data sample (if related to data quality)

Typical Response Times:

  • Critical issues: Contact support immediately
  • General questions: 1-2 business days
  • Feature requests: Reviewed quarterly

Typical timeline

The time to get started with GEM varies based on your setup:

| Phase | Duration | Activities |
|---|---|---|
| Initial Setup | 1-2 days | Install Azure CLI, verify access, prepare first dataset |
| First Match | 1 day | Upload data, trigger job, download results |
| Pilot Evaluation | 1-2 weeks | Test with representative data, validate results |
| Process Refinement | 2-4 weeks | Improve data quality, optimize workflow |
| Production Deployment | Ongoing | Regular matching jobs, integration with systems |

Factors Affecting Timeline:

  • Data quality and preparation time
  • Dataset size and complexity
  • Internal approval processes
  • Integration requirements
  • Team availability and experience

Best practices for getting started

Data preparation

  • Start with clean, well-structured data
  • Validate Parquet files before upload
  • Ensure all required fields are present
  • Use descriptive filenames
  • Test with small dataset first
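One quick way to validate a Parquet file before upload needs no Parquet library: a Parquet file begins and ends with the 4-byte magic `PAR1`, and the footer length is stored as a 4-byte little-endian integer just before the trailing magic. The Python sketch below checks only that structure (the function name is illustrative). It catches files that were merely renamed to .parquet, but it does not check the schema; use a Parquet library such as pyarrow for that.

```python
import os
from pathlib import Path

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path) -> bool:
    """Cheap structural check: .parquet suffix, PAR1 magic at both ends, plausible footer."""
    path = Path(path)
    if path.suffix != ".parquet":
        return False
    size = path.stat().st_size
    # Smallest possible layout: magic (4) + footer length (4) + magic (4).
    if size < 12:
        return False
    with path.open("rb") as handle:
        head = handle.read(4)
        handle.seek(-8, os.SEEK_END)
        footer_length = int.from_bytes(handle.read(4), "little")
        tail = handle.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC and footer_length + 12 <= size


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(f"{name}: {'ok' if looks_like_parquet(name) else 'does not look like Parquet'}")
```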

Job management

  • Document each job's purpose and parameters
  • Use consistent naming conventions
  • Download results promptly after completion
  • Keep local backups of input and output data
  • Track job IDs for reference
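A lightweight way to document each job's purpose and parameters is an append-only log with one JSON object per line. This Python sketch is an illustration, not a GEM feature: the fields mirror the Trigger matching form (input file name, storage name, matching type, Overture release), and the job ID is whatever GEM shows in the job list.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_job(log_path, job_id, purpose, input_file, storage_name,
            overture_release, matching_type="Road Matching"):
    """Append one job record (JSON Lines) so parameters and IDs stay traceable."""
    entry = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "job_id": job_id,
        "purpose": purpose,
        "input_file": input_file,
        "storage_name": storage_name,
        "matching_type": matching_type,
        "overture_release": overture_release,
    }
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def load_jobs(log_path):
    """Read all logged jobs back as a list of dictionaries."""
    with Path(log_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

For example, call `log_job("gem_jobs.jsonl", job_id, "pilot run", "my_data.parquet", storage_name, "2024-09-24.0")` right after submitting a job.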

Quality assurance

  • Review matching statistics for each job
  • Investigate unmatched or low-confidence matches
  • Compare results across different data versions
  • Validate random samples manually
  • Monitor trends over time
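Investigating low-confidence matches and validating random samples can be combined into one review list. This Python sketch assumes each match is a dictionary with a numeric `confidence` value; the key name, its scale, and the threshold are assumptions to adapt to your actual result files. A fixed seed keeps the sample reproducible, so different reviewers see the same records.

```python
import random


def pick_for_review(matches, sample_size, threshold, seed=0):
    """Return (low-confidence matches, reproducible random sample of the rest)."""
    low_confidence, others = [], []
    for match in matches:
        confidence = match.get("confidence")
        # Treat a missing score as needing review.
        if confidence is None or confidence < threshold:
            low_confidence.append(match)
        else:
            others.append(match)
    rng = random.Random(seed)
    sample = rng.sample(others, min(sample_size, len(others)))
    return low_confidence, sample


# Made-up matches for illustration.
matches = [{"id": i, "confidence": c}
           for i, c in enumerate([0.95, 0.40, 0.88, 0.91, 0.30, 0.99])]
low, sample = pick_for_review(matches, sample_size=2, threshold=0.5, seed=7)
print("low confidence:", [m["id"] for m in low])
print("random sample:", [m["id"] for m in sample])
```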

Security

  • Never share credentials with unauthorized parties
  • Use credentials only in secure terminals
  • Don't store credentials in scripts or code
  • Re-authenticate when credentials expire
  • Report any security concerns immediately
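If a script does need a secret, one common approach is to read it from the environment at run time instead of writing it into the code. A minimal Python sketch; the variable name in the usage note is hypothetical. In the Azure CLI workflow described above, authentication happens through the `az login` command rather than inside your scripts.

```python
import os


def require_env(name: str) -> str:
    """Fetch a required setting from the environment, failing loudly if it is missing."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"environment variable {name} is not set; "
                           "set it in your shell session rather than in code")
    return value
```

For example, `secret = require_env("MY_STORAGE_SECRET")`, with the variable exported only in the terminal session that runs the script.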

Contact information

Ready to get started? Contact our team:

  • Dashboard Portal: my.tomtom.com
  • Product Page: GEM
  • Support: Available through support platform
  • Sales: Contact for enterprise or on-premises deployments

Additional resources