Knowledge Graphs · SJSU · DATA 266 · 2025

Knowledge Graph-Enhanced Traffic Optimization System for the Bay Area

Anshu Reddy Dhamana, Basanth Periyapatna Roopa Kumar, Manav Rajesh Anandani, Nischitha Nagendran, Nitya Rondla, Srithareddy Devireddy, Vinuthna Papana

Department of Applied Data Science · San Jose State University · DATA 266: Generative AI

Abstract

This paper introduces a knowledge graph-based traffic optimization system for the Bay Area. Traditional traffic forecasting models struggle to fuse heterogeneous data sources and to express the intricate spatiotemporal relationships among traffic entities. We address this limitation with a comprehensive knowledge graph framework that captures semantic associations among traffic junctions, road segments, weather conditions, incidents, and events.

Our method integrates traffic meter data, weather information, incident reports, and event schedules into a unified knowledge graph, which lets the models draw on richer, relationship-based features. The experimental results show that adding knowledge graph semantics improves prediction accuracy, with the "connects_to" relationship giving the best result (92% accuracy). The system also provides actionable insight into how external factors affect traffic patterns on weekdays versus weekends, which is useful for traffic operations and city planning.

Knowledge graphs offer a viable option for modeling complex systems because they organize entities and their relationships in a systematic and meaningful manner, capturing both semantic context and the intricate interconnections within transportation systems.

Peak Prediction Accuracy

92%

Accuracy Improvement (KG vs Baseline)

+8–15%

RMSE Reduction

12–20%

Graph Nodes

550+

Edge / Relationship Types

6

Graph Embedding Dimension

64

System Architecture

Fig 1 — Knowledge Graph-Enhanced Traffic Optimization System

Traffic Data

traffic.csv · historical patterns

Commute Data

bay_area_9_all_commutes

Road Metadata

ca_meta.csv

Real-Time Traffic

511.org API · WZDx · Toll Data

Events & Incidents

Traffic Events API · JSON

Knowledge Graph (NetworkX → Neo4j)

Traffic Junctions

Intersections · highways

Road Segments

Sections between junctions

Weather Conditions

clear · rain · fog

Incidents

accidents · construction

Events

Special area events

Time Periods

Hour of day · day of week

Edge Types

connects_to (92%) · affected_by_incident (90%) · part_of_route (89%) · affected_by_weather (87%) · located_near (85%)

Node2Vec Embeddings

dim=64 · walk=30 · walks/node=200

Graph Feature Vectors

Semantic + structural node representation

Temporal Features

Hour · Day · Junction ID

Classical ML Features

Vehicle count · speed · weather

KG-Enhanced ML Pipeline

Random Forest

Nonlinear boundaries

XGBoost

Grid search tuned

LightGBM

Memory efficient

Linear / Ridge

Baseline models

KNN Regressor

Local patterns

Stacking Ensemble

Meta-model boosting

Traffic Predictions

92% accuracy with connects_to

Streamlit Dashboard

Route optimizer · junction explorer · forecasts

Data Collection & Preprocessing

Traffic & Commute Data

The primary traffic dataset (traffic.csv) includes data on traffic volume (vehicle counts), speed, and junction identification. Temporal variables such as hour of day and day of week were extracted for temporal pattern analysis. Commute data from bay_area_9_all_commutes_names.csv provided route-level linkages between road segments and popular commuting pathways, enhancing the knowledge graph with route-specific information.

Real-Time API Sources

Dynamic traffic conditions were captured via three 511.org API feeds: the Work Zones Data feed (WZDx API, covering construction sites and roadwork), the Traffic Events API (incidents, accidents, and other traffic-related occurrences), and the Toll Data API (toll rates and affected road segments). API responses were saved as JSON files and incorporated during graph construction. California road network metadata (ca_meta.csv) provided structural information about road segments, junction types, lane counts, and speed limits.

Knowledge Graph Construction

Node Types

The knowledge graph includes six node types: Traffic Junctions (intersections and highway interchanges), Road Segments (road sections between junctions), Weather Conditions (clear, rain, fog), Incidents (accidents, construction), Events (special area events), and Time Periods (times of day and week). The graph was built using NetworkX, a Python graph analysis library, and exported to Neo4j for sophisticated querying and visualization.
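As a rough illustration of this construction step, the sketch below builds a tiny typed graph in NetworkX. The node identifiers, the `node_type` and `relation` attribute names, and both helper functions are hypothetical choices made for the example; the project's actual schema is not shown here.

```python
import networkx as nx


def build_sample_graph():
    """Build a toy knowledge graph with typed nodes and typed edges."""
    g = nx.Graph()
    # Node identifiers and attribute names are illustrative assumptions.
    g.add_node("junction:1", node_type="junction")
    g.add_node("junction:2", node_type="junction")
    g.add_node("segment:1-2", node_type="road_segment")
    g.add_node("weather:rain", node_type="weather")
    g.add_node("incident:42", node_type="incident")
    # Each edge stores its relationship type as an attribute.
    g.add_edge("segment:1-2", "junction:1", relation="connects_to")
    g.add_edge("segment:1-2", "junction:2", relation="connects_to")
    g.add_edge("junction:1", "weather:rain", relation="affected_by_weather")
    g.add_edge("segment:1-2", "incident:42", relation="affected_by_incident")
    return g


def edges_of_type(g, relation):
    """Return all (u, v) pairs whose edge carries the given relation."""
    return [(u, v) for u, v, d in g.edges(data=True) if d.get("relation") == relation]


graph = build_sample_graph()
connects = edges_of_type(graph, "connects_to")
```

Storing the relationship type on each edge makes it straightforward to filter the graph by relation, which is one way a per-relationship comparison could be set up.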

Knowledge Graph Relationship Types and Prediction Accuracy

Relationship | Description | Prediction Accuracy
connects_to | Road segments ↔ intersections | 92%
affected_by_incident | Junctions/segments ↔ incidents | 90%
part_of_route | Road segments ↔ common routes | 89%
affected_by_weather | Intersections ↔ weather states | 87%
located_near | Proximity between entities | 85%

Node2Vec Embeddings

Graph structures were converted into numerical vector representations using node2vec, a popular graph embedding algorithm that extends the word2vec approach to graphs. Node2vec uses biased random walks to explore the neighborhood of each node, balancing between breadth-first and depth-first exploration strategies. Key parameters: embedding dimension 64, walk length 30, number of walks per node 200, context window size 10, minimum count 1.
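The biased walk at the heart of node2vec can be illustrated with a minimal pure-Python sketch. The return parameter `p` and in-out parameter `q` are node2vec's standard controls, but their values are not reported here, so the defaults of 1.0 are assumptions, as are the toy adjacency map and the function names. Training word2vec on the resulting walks is not shown.

```python
import random


def biased_walk(adj, start, walk_length, p=1.0, q=1.0, rng=None):
    """Second-order random walk in the style of node2vec.

    adj maps each node to the set of its neighbors.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break  # dead end: stop early
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)   # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)       # stay close to prev (breadth-first flavor)
            else:
                weights.append(1.0 / q)   # move away from prev (depth-first flavor)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks, walk_length, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every node, shuffling the node order each pass."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(adj, node, walk_length, p, q, rng))
    return walks


# Toy chain of junctions (J*) and road segments (S*); hypothetical data.
toy_adj = {
    "J1": {"S12"},
    "S12": {"J1", "J2"},
    "J2": {"S12", "S23"},
    "S23": {"J2", "J3"},
    "J3": {"S23"},
}
walks = generate_walks(toy_adj, num_walks=2, walk_length=30)
```

With the reported settings, `num_walks` would be 200 and `walk_length` 30. Lowering `q` pushes walks outward along the road network, while lowering `p` keeps them near the starting node.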

A fallback embedding strategy keeps the pipeline working when node2vec is unavailable: each node receives a structured random embedding whose first five dimensions carry a type-specific signature, which preserves semantic coherence within each node type (intersections, road segments, weather, time periods).
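One way such a fallback could look is sketched below. The one-hot style signatures, the noise scale, and the function name are assumptions chosen for illustration; only the 64-dimensional size and the five-dimension signature come from the description above.

```python
import random

EMBEDDING_DIM = 64
SIGNATURE_DIMS = 5

# Hypothetical type signatures for the first five dimensions.
TYPE_SIGNATURES = {
    "junction": [1.0, 0.0, 0.0, 0.0, 0.0],
    "road_segment": [0.0, 1.0, 0.0, 0.0, 0.0],
    "weather": [0.0, 0.0, 1.0, 0.0, 0.0],
    "time_period": [0.0, 0.0, 0.0, 1.0, 0.0],
}


def fallback_embedding(node_id, node_type, dim=EMBEDDING_DIM, noise=0.1):
    """Return a deterministic pseudo-random embedding with a type signature."""
    # Seeding with a string makes the vector reproducible for each node.
    rng = random.Random(f"{node_type}:{node_id}")
    signature = TYPE_SIGNATURES.get(node_type, [0.0] * SIGNATURE_DIMS)
    tail = [rng.gauss(0.0, noise) for _ in range(dim - SIGNATURE_DIMS)]
    return list(signature) + tail


vec_a = fallback_embedding("1", "junction")
vec_b = fallback_embedding("2", "junction")
vec_w = fallback_embedding("rain", "weather")
```

Nodes of the same type share their first five values and differ only in the random tail, so a downstream model can still separate node types even without learned structure.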

Machine Learning Models

Ensemble of Regression Models

Feature inputs comprised both conventional traffic features (junction ID, time, and day of week) and knowledge graph node embeddings (64 dimensions), allowing the models to use both explicit features and graph relationship data. Six model configurations were evaluated: Random Forest Regressor, XGBoost (grid search-tuned), LightGBM, Linear/Ridge Regression baselines, KNN Regressor, and a Stacking Ensemble meta-model.
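The combination of explicit features and embeddings amounts to a simple concatenation, sketched below. The function name, the feature order, and the zero-vector fallback for junctions without an embedding are assumptions made for the example.

```python
EMBEDDING_DIM = 64


def build_feature_row(junction_id, hour, day_of_week, embeddings):
    """Concatenate explicit features with the junction's graph embedding."""
    # Assumption: junctions missing from the lookup get a zero vector.
    embedding = embeddings.get(junction_id, [0.0] * EMBEDDING_DIM)
    return [float(junction_id), float(hour), float(day_of_week)] + list(embedding)


# Hypothetical embedding lookup keyed by junction ID.
embeddings = {1: [0.5] * EMBEDDING_DIM}
row_known = build_feature_row(1, hour=8, day_of_week=0, embeddings=embeddings)
row_unknown = build_feature_row(9, hour=17, day_of_week=4, embeddings=embeddings)
```

Each row has 3 + 64 = 67 values. Dropping the embedding columns yields the baseline feature set, which is the kind of comparison an ablation between baseline and KG-enhanced models relies on.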

The AdvancedTrafficPredictionModel and KnowledgeGraphEnhancedModel classes implemented standard and categorical feature preprocessing using scikit-learn pipelines, a temporal split for training and evaluation, feature importance analysis, and a comparative ablation between baseline and KG-enhanced models.
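A time-based split can be sketched as follows, assuming it means training on earlier observations and evaluating on later ones. The record layout, the `timestamp` key, and the 80/20 fraction are assumptions, not details taken from the project code.

```python
def temporal_split(records, time_key="timestamp", train_fraction=0.8):
    """Split records chronologically: earliest part for training, rest for evaluation."""
    ordered = sorted(records, key=lambda r: r[time_key])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical hourly records; ISO-formatted timestamps sort chronologically as strings.
records = [
    {"timestamp": f"2025-01-01T{hour:02d}:00", "vehicles": 10 + hour}
    for hour in range(10)
]
train, test = temporal_split(records)
```

Unlike a random shuffle, this keeps every evaluation record later in time than every training record, so the model is never scored on periods it has effectively already seen.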

Evaluation Framework

Evaluation Metrics Used

Metric | Formula | Purpose
R² Score | 1 − SS_res / SS_tot | Variance explained by model
RMSE | √(mean((y_pred − y_true)²)) | Forecast error magnitude
MAE | mean(|y_pred − y_true|) | Average absolute error
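The three metrics can be computed directly from their definitions, as in this dependency-free sketch. The sample vehicle counts are made up for illustration and are not project results.

```python
import math


def r2_score(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up vehicle counts for illustration only.
y_true = [10, 20, 30, 40]
y_pred = [12, 18, 33, 39]
scores = {
    "r2": r2_score(y_true, y_pred),     # 1 - 18/500 = 0.964
    "rmse": rmse(y_true, y_pred),       # sqrt(4.5) ≈ 2.121
    "mae": mae(y_true, y_pred),         # 8/4 = 2.0
}
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two are commonly reported together.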

Results

Impact of Knowledge Graph Relationships

A key finding is that different knowledge graph relationship types affect prediction accuracy to different degrees. The connects_to relationship yielded the highest prediction accuracy (92%), followed by affected_by_incident (90%), part_of_route (89%), affected_by_weather (87%), and located_near (85%). This suggests that topological connectedness between junctions and road segments is the most useful feature for traffic forecasting, followed by incident data.

External Factors Analysis

Our investigation compared the impact of external influences on traffic patterns during weekdays and weekends. Weather conditions (particularly rain and snow) have a more significant impact on weekday traffic. Special events have a more pronounced effect on weekend traffic. Construction activities affect weekday traffic more heavily. Accidents impact both but with different temporal patterns.

Performance vs. Baseline Methods

The knowledge graph-enhanced approach outperformed the baseline approaches on all metrics, with accuracy increases ranging from 8% to 15% and RMSE reductions of 12% to 20%. This highlights the value of knowledge graph data for traffic prediction tasks: the semantic links between traffic components provide information that standard feature engineering does not capture.

Streamlit Dashboard

A Streamlit web application lets users explore the knowledge graph and prediction models through an interactive interface. Key features: Optimal Route Finding (start/end junction selection with congestion-aware recommendations), Temporal Optimization (best departure times based on past and expected traffic patterns), Traffic Level Forecasting (color-coded vehicle count visualizations across the day), and an Interactive Junction Explorer (heatmap interface showing traffic flow patterns and junction connectivity strength).
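Congestion-aware route finding can be approximated with a shortest-path search in which each edge cost reflects predicted congestion. The sketch below uses Dijkstra's algorithm as a stand-in, since the dashboard's actual routing method is not described here. The junction labels and edge costs are hypothetical.

```python
import heapq


def least_congested_route(graph, start, goal):
    """Dijkstra's algorithm over a dict of {node: {neighbor: congestion_cost}}."""
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if goal not in best:
        return None, float("inf")  # goal unreachable
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, best[goal]


# Hypothetical predicted congestion costs between junctions.
congestion = {
    "J1": {"J2": 40, "J3": 15},
    "J2": {"J4": 10},
    "J3": {"J2": 10, "J4": 50},
    "J4": {},
}
route, total_cost = least_congested_route(congestion, "J1", "J4")
```

In this toy network the search prefers the longer detour J1 → J3 → J2 → J4 (cost 35) over the direct J1 → J2 → J4 (cost 50), because the cost measures congestion and not hop count.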

Conclusion

This study describes a knowledge graph-enhanced traffic optimization system for the Bay Area that combines multiple data sources into a single framework for traffic prediction and analysis. The method uses the semantic links between traffic components contained in a knowledge graph to increase prediction accuracy and provide interpretable insights into traffic dynamics. The experimental results showed that the knowledge graph technique consistently outperformed traditional methods, with the connects_to relationship contributing the most to prediction accuracy.

Future work will address scalability for very large transit networks, temporal dynamics via graph neural networks (GNNs), real-time knowledge graph updates as new data becomes available, multi-city generalization, and user feedback integration for driver and commuter relevance.

Tech Stack

Python 3.8+ · NetworkX · Neo4j · Node2Vec · XGBoost · LightGBM · scikit-learn · NumPy · Pandas · Matplotlib · Seaborn · Streamlit · 511.org API · WZDx API