Workflow guide

Introduction

This guide walks you through the complete GEM workflow, from initial setup to downloading your matching results.

Prerequisites

Before starting with GEM, ensure you have:

1. Access Requirements

  • Console Access: Active account at my.tomtom.com
  • Project Assignment: Assigned to a project with GEM access
  • Authentication: Microsoft Entra ID credentials configured

2. Technical Setup

  • Azure CLI Installed: Download from Microsoft
  • Terminal Access: Command-line terminal on your machine
  • Storage Space: Adequate local disk space for your data files

3. Data Preparation

  • Data Format: Files in Apache Parquet format (.parquet extension)
  • Required Fields: Your data must include:
    • id (integer): Unique identifier
    • is_navigable (boolean): Navigability flag
    • geometry (LineString WKT): Road geometry

Example data record:

{
  "id": 5707295,
  "is_navigable": true,
  "geometry": "LINESTRING (145.18156715700002 -37.87340530899996, 145.1809221540001 -37.87356512499997)"
}
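
If you assemble your input programmatically, the following is a minimal sketch of writing a compliant Parquet file with pyarrow. The column names and types mirror the required fields above; the output filename and the sample record are illustrative.

# Minimal sketch: write a GEM-compatible Parquet file with pyarrow.
# The schema mirrors the required fields; sample values are illustrative.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    # id (integer): unique identifier per road segment
    "id": pa.array([5707295], type=pa.int64()),
    # is_navigable (boolean): navigability flag
    "is_navigable": pa.array([True], type=pa.bool_()),
    # geometry (string): road geometry as LineString WKT
    "geometry": pa.array(
        ["LINESTRING (145.18156715700002 -37.87340530899996, "
         "145.1809221540001 -37.87356512499997)"],
        type=pa.string(),
    ),
})

pq.write_table(table, "my_map_data.parquet")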

4. Knowledge Prerequisites

  • Understanding of your source data structure
  • Familiarity with data licensing requirements
  • Basic command-line operations

Complete workflow

Step 1: Access GEM dashboard

  1. Navigate to the dashboard at my.tomtom.com
  2. Log in using your Microsoft Entra ID credentials
  3. In the left navigation pane, select the appropriate Project from the dropdown menu
  4. Click on Global Entity Matcher in the sidebar

What you'll see:

  • List of previous matching jobs (if any)
  • Prepare Data + button for uploading new data
  • Trigger matching button to start new jobs
  • Search and filter capabilities for job history

Step 2: Prepare and upload data

2.1 Initiate upload process

  1. Click the Prepare Data + button on the dashboard
  2. The "Upload Data" modal window will appear

2.2 Select storage

In the "Select Storage" step:

  1. Storage Name: Select your target storage from the dropdown

    • If you have access to only one storage, it will be pre-selected
    • If you have access to multiple storages, choose the appropriate one for your project
  2. Click Next to proceed

Important Notes:

  • Storage access is managed through role-based access control
  • Contact your administrator if no storage appears
  • Storage must be configured before first use

2.3 Authorize storage access

The authorization step provides credentials for Azure CLI access.

Option A: Automatic Credential Integration (Recommended)

  1. Click the Unwrap button when prompted
  2. Review the security warning about displaying sensitive credentials
  3. Click Unwrap again to confirm
  4. The command will auto-populate with your credentials
  5. Copy the complete az login command
  6. Open your terminal and execute the command

Option B: Manual Credential Entry

  1. Copy the az login command template from the UI
  2. Paste it into your terminal
  3. Replace <client_id> and <client_secret> with your actual credentials
  4. Replace <tenant_id> with your tenant identifier
  5. Execute the command

Command example:

az login --service-principal \
  --username <client_id> \
  --password <client_secret> \
  --tenant <tenant_id>

[Screenshot: Authorize Storage]

Security Best Practices:

  • Never share credentials with unauthorized parties
  • Credentials are temporary and scoped to specific operations
  • Tokens expire after a set period
  • Re-authenticate if credentials expire

2.4 Upload your data file

1. In the Enter local file path field, type or paste the full path to your Parquet file
  • Windows Example: C:\Users\username\data\my_map_data.parquet
  • macOS/Linux Example: /Users/username/data/my_map_data.parquet
2. The UI will automatically extract the filename and update the upload command
3. Copy the generated az storage blob upload command
4. Execute the command in your terminal:
az storage blob upload \
  --account-name <storage-account> \
  --container-name <container> \
  --name your_data.parquet \
  --file /path/to/local/your_data.parquet \
  --auth-mode login
5. Wait for upload completion; progress will display in your terminal
6. Click Finish to close the modal

[Screenshot: Upload Data]

Upload Tips:

  • Larger files take longer to upload; be patient
  • Azure CLI supports files of any size
  • Ensure stable network connection
  • Keep the filename simple (no special characters)
  • Verify upload success before proceeding (see the sketch below)
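
To verify upload success from a script rather than by eye, you can query the uploaded blob with az storage blob show. The sketch below wraps that call in Python; the account, container, and blob names are placeholders to take from the generated upload command, and it assumes you are still authenticated from step 2.3.

# Sketch: confirm the uploaded blob exists and report its size.
# <storage-account>/<container> are placeholders; requires az on PATH
# and an active az login session (step 2.3).
import json
import subprocess

result = subprocess.run(
    [
        "az", "storage", "blob", "show",
        "--account-name", "<storage-account>",
        "--container-name", "<container>",
        "--name", "my_map_data.parquet",
        "--auth-mode", "login",
        "--output", "json",
    ],
    capture_output=True, text=True, check=True,  # raises if the blob is missing
)
blob = json.loads(result.stdout)
print("Uploaded size (bytes):", blob["properties"]["contentLength"])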

Step 3: Trigger matching job

3.1 Open matching form

  1. Return to the GEM dashboard
  2. Click the Trigger matching button
  3. The "Run GEM Matching" form will appear

3.2 Complete the form

Fill in all required fields:

Field              Description                                                              Example
Input file name    Exact filename from the upload step (including the .parquet extension)   my_map_data.parquet
Storage Name       Same storage used for the upload                                         Select from dropdown
Matching Type      Algorithm to use (currently Road Matching only)                          Road Matching
Overture Release   Reference map version (auto-populated)                                   2024-09-24.0

[Screenshot: Trigger Matching Form]

3.3 Submit the job

  1. Review all entries for accuracy
  2. Ensure the filename matches exactly (case-sensitive)
  3. Verify you selected the correct storage
  4. Click Submit

Form Validation:

  • Empty fields will trigger validation errors
  • Incorrect filename will cause job failure
  • Storage mismatch will result in file not found error

3.4 Job submission confirmation

Upon successful submission:

  • The form closes automatically
  • A new entry appears in the job list
  • Initial status shows as In Progress
  • Job ID is generated for tracking

If submission fails:

  • Error message appears in the modal
  • Common causes:
    • File not found in specified storage
    • Invalid file format
    • GEM service temporarily unavailable
    • Authorization issues
  • Review error message and correct the issue

Step 4: Monitor job progress

4.1 Job status dashboard

The main dashboard displays all your matching jobs with real-time status updates.

Job Status Types:

Status        Icon   Description                       Typical Duration
In Progress   🔄     Job is actively processing        Varies by data size (~100K roads/hour)
Success       ✅     Matching completed successfully   N/A; ready for download
Failed        ❌     Job encountered an error          N/A; requires investigation

[Screenshot: Job Status Dashboard]

4.2 Using dashboard features

Search by Job ID:

  • Use the search bar to find specific jobs
  • Enter partial or complete job ID
  • Results filter in real-time

Filter Jobs:

  • Filter by status (In Progress, Success, Failed)
  • Filter by storage
  • Filter by Overture release version
  • Filter by matching type

Sort Options:

  • Sort by submission date
  • Sort by completion date
  • Sort by job name
  • Sort by status

4.3 Refresh status

  • The dashboard auto-refreshes periodically
  • Manual refresh available via browser refresh
  • Click on a job row to view detailed information

Monitoring Tips:

  • Processing time varies based on data size
  • Average: ~100,000 road segments per hour
  • Small datasets (< 10K roads): Minutes
  • Large datasets (> 1M roads): Hours
  • No email notifications yet (planned feature)
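
For rough planning, you can estimate a job's duration from the row count of your input file and the ~100,000 roads/hour average above. A minimal sketch using pyarrow's Parquet metadata (the rate is an average, not a guarantee):

# Sketch: rough duration estimate from the input file's row count.
import pyarrow.parquet as pq

ROWS_PER_HOUR = 100_000  # average throughput quoted in the monitoring tips

num_rows = pq.ParquetFile("my_map_data.parquet").metadata.num_rows
print(f"{num_rows} roads -> roughly {num_rows / ROWS_PER_HOUR:.1f} hours")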

Step 5: View job details

5.1 Access details page

  1. Locate your job in the dashboard list
  2. Click the details arrow (→) at the end of the job row
  3. The Job Run Details page opens

5.2 Details page overview

The details page contains two main sections:

Job Run Details Section:

  • Job ID (unique identifier)
  • Input filename used
  • Storage location
  • Matching type applied
  • Overture release version
  • Submission timestamp
  • Completion timestamp (if finished)
  • Job status

Download Results Section (appears only for successful jobs):

  • Results filename
  • Download instructions
  • Azure CLI commands for downloading

[Screenshot: Job Details Page]

5.3 Interpreting results

For successful jobs, review the matching statistics:

  • Roads Matched: Percentage successfully matched to GERS IDs
  • Roads Unmatched: Percentage without matches
  • Roads Fully Matched: Complete single GERS ID assignments
  • Roads Partially Matched: Multiple potential matches
  • Confidence Threshold: Minimum score applied (typically >60%)

Quality Indicators:

  • Above 85% matched: Excellent quality
  • 70-85% matched: Good quality; review unmatched roads
  • Below 70% matched: May indicate data quality issues
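
If you want to recompute the match rate from the downloaded results yourself, a few lines of pandas suffice. Note that the gers_id column name below is an assumption about the results schema, not a documented field; substitute whatever column your results file actually uses to mark matches.

# Sketch: recompute the match rate from downloaded results.
# ASSUMPTION: one row per input road, with a "gers_id" column that is
# null for unmatched roads. Adjust the column name to the real schema.
import pandas as pd

df = pd.read_parquet("predictions.parquet")
match_rate = df["gers_id"].notna().mean() * 100

if match_rate > 85:
    verdict = "excellent quality"
elif match_rate >= 70:
    verdict = "good quality; review unmatched roads"
else:
    verdict = "may indicate data quality issues"
print(f"{match_rate:.1f}% matched: {verdict}")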

Step 6: Download results

6.1 Initiate download

For jobs with Success status:

  1. On the Job Details page, locate the Download Results section
  2. Click the Download button
  3. The "Download Data" modal appears

[Screenshot: Download Results Section]

6.2 Authorize storage (if needed)

If you're already authorized from the upload step, skip to 6.3.

If authorization expired or this is a new session:

  1. Follow the same authorization process as Step 2.3
  2. Unwrap credentials and execute az login command
  3. Proceed once authenticated

6.3 Specify download location

  1. In the Local destination directory path field, enter where you want results saved

    • Windows Example: C:\Users\username\downloads\gem_results
    • macOS/Linux Example: /Users/username/downloads/gem_results
  2. The system updates the download command with:

    • Storage account name
    • Container name
    • Results filename
    • Your specified destination
  3. Copy the complete az storage blob download command

6.4 Execute download

Run the command in your terminal:

az storage blob download \
  --account-name <storage-account> \
  --container-name <container> \
  --name predictions.parquet \
  --file /path/to/destination/predictions.parquet \
  --auth-mode login

Download Process:

  • Progress displays in terminal
  • Download time depends on results file size
  • Verify download completes successfully
  • Click Finish to close the modal

[Screenshot: Download Data Modal]

6.5 Verify downloaded results

After download completes:

  1. Navigate to your specified destination directory
  2. Confirm the predictions.parquet file exists
  3. Check that the file size is reasonable (not 0 bytes)
  4. Open the file in a Parquet viewer or analytical tool
  5. Verify data structure and content
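
Steps 3-5 above can be automated with a short script. This sketch assumes pandas is installed and only checks that the file exists, is non-empty, and parses as Parquet; the path is the destination you chose in step 6.3.

# Sketch: basic sanity checks on the downloaded results file.
import os

import pandas as pd

path = "/path/to/destination/predictions.parquet"  # your destination from 6.3

assert os.path.getsize(path) > 0, "file is empty (0 bytes)"
df = pd.read_parquet(path)  # fails loudly if the file is not valid Parquet
print(f"{len(df)} rows; columns: {list(df.columns)}")
print(df.head())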

Working with multiple jobs

Running parallel jobs

You can submit multiple jobs simultaneously:

  • Each job processes independently
  • No limit on concurrent jobs (subject to system capacity)
  • Track all jobs from the main dashboard
  • Download results as each job completes

Organizing your work

Best Practices:

  • Use descriptive filenames for easy identification
  • Keep track of which data corresponds to which job
  • Download results promptly after completion
  • Maintain local backup of input data
  • Document matching parameters used

Job history management

  • All jobs remain in your history
  • Search by job ID for quick access
  • Filter to find specific job types
  • Review past results for comparison
  • No automatic deletion of job records

Troubleshooting common issues

Authentication problems

Issue: az login fails

Solutions:

  • Verify credentials are copied correctly (no extra spaces)
  • Check credentials haven't expired
  • Confirm Client ID, Secret, and Tenant ID are correct
  • Try unwrapping credentials again from UI
  • Contact administrator if credentials are invalid

Upload failures

Issue: File upload fails or times out

Solutions:

  • Check internet connection stability
  • Verify file path is correct and file exists
  • Ensure sufficient storage permissions (Full Access role)
  • Try smaller file for testing
  • Check Azure CLI is installed correctly: az --version

Job submission errors

Issue: Cannot submit matching job

Solutions:

  • Confirm filename matches exactly (case-sensitive, including extension)
  • Verify file was successfully uploaded to storage
  • Ensure file is valid Parquet format
  • Check required fields (id, is_navigable, geometry) exist
  • Check system status or contact support

Job failures

Issue: Job status shows "Failed"

Solutions:

  • Review detailed error logs in job details page
  • Verify input data meets format requirements
  • Check data quality (valid geometries, complete records)
  • Ensure no corrupted records in Parquet file
  • Contact support with job ID for investigation

Download issues

Issue: Cannot download results

Solutions:

  • Re-authenticate if credentials expired
  • Verify destination directory exists and is writable
  • Check disk space availability
  • Ensure correct storage permissions (Read Access or Full Access)
  • Try different destination path

Tips for optimal results

Data quality

  • Clean geometries (valid WKT LineStrings)
  • Complete records (no null required fields)
  • Unique IDs for each road segment
  • Accurate is_navigable flags
  • Proper coordinate systems
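
You can screen a file against these points before uploading. The sketch below is one way to do so, assuming pandas and shapely are installed; tighten or relax the checks to suit your data.

# Sketch: pre-upload screening against the data-quality points above.
import pandas as pd
from shapely import wkt

df = pd.read_parquet("my_map_data.parquet")

assert df["id"].is_unique, "duplicate ids found"
assert not df[["id", "is_navigable", "geometry"]].isna().any().any(), \
    "null values in required fields"

for text in df["geometry"]:
    geom = wkt.loads(text)  # raises on malformed WKT
    assert geom.geom_type == "LineString", f"not a LineString: {text[:40]}..."

print("basic checks passed")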

Performance optimization

  • Process data in reasonable batches
  • Use fast, stable internet connection
  • Upload during off-peak hours for large files
  • Monitor job progress regularly
  • Download results promptly after completion

Matching quality

  • High-quality input data → higher match rates
  • Recent data → better alignment with Overture
  • Complete network coverage → fewer gaps
  • Review unmatched roads for patterns
  • Iterate and improve data quality

Getting help

Self-service resources

  1. In-App Help: Tooltips and inline guidance in GEM UI
  2. Documentation: This guide and related documentation
  3. FAQ: Common questions and answers
  4. Technical Docs: System architecture and integration details

Support channels

For Issues or Questions:

  1. Review job error logs in the detailed view
  2. Check prerequisites are met
  3. Consult troubleshooting section above
  4. Contact support team with:
    • Job ID (if applicable)
    • Error messages
    • Steps to reproduce
    • Screenshots if helpful

Support Contact:

  • Support portal
  • Email: [Contact through support]
  • Include: Job IDs, timestamps, error details

Next steps

After successfully matching your data:

  1. Analyze Results: Review matching statistics and confidence scores
  2. Validate Quality: Spot-check matched GERS IDs against your data
  3. Integrate: Use GERS IDs in your applications and workflows
  4. Iterate: Refine input data based on matching results
  5. Scale: Process additional datasets as needed

Additional resources

Workflow checklist

Use this checklist to ensure you complete all steps:

  • Prerequisites verified (access, tools, data format)
  • Logged into dashboard and accessed GEM dashboard
  • Selected appropriate project
  • Prepared data in Parquet format with required fields
  • Uploaded data via Prepare Data workflow
  • Authorized storage access with Azure CLI
  • Verified file upload success
  • Triggered matching job with correct parameters
  • Monitored job status to completion
  • Reviewed job details and matching statistics
  • Downloaded results successfully
  • Verified downloaded file integrity
  • Documented job details for records

Workflow Complete! You're now ready to use your matched data with GERS IDs.