Knowledge Graph-Enhanced Traffic Optimization System for the Bay Area
Anshu Reddy Dhamana, Basanth Periyapatna Roopa Kumar, Manav Rajesh Anandani, Nischitha Nagendran, Nitya Rondla, Srithareddy Devireddy, Vinuthna Papana
Department of Applied Data Science · San Jose State University · DATA 266: Generative AI
Abstract
This paper introduces a knowledge graph-based traffic optimization system for the Bay Area. Traditional traffic forecasting models have difficulty fusing heterogeneous data sources and expressing the intricate spatiotemporal relations among traffic entities. We address this limitation with a knowledge graph framework that captures semantic associations among traffic junctions, road segments, weather conditions, incidents, and events.
Our method integrates traffic meter data, weather information, incident reports, and event schedules into a unified knowledge graph, which yields richer, relationship-based features. The experimental results show that adding knowledge graph semantics improves prediction accuracy, with the "connects_to" relationship giving the best result (92% accuracy). The system also provides actionable insight into how various external factors affect traffic patterns on weekdays versus weekends, which is useful for traffic operations and city planning.
Knowledge graphs offer a viable option for modeling complex systems because they organize entities and their relationships in a systematic and meaningful manner, capturing both semantic context and the intricate interconnections within transportation systems.
| Metric | Value |
|---|---|
| Peak Prediction Accuracy | 92% |
| Accuracy Improvement (KG vs Baseline) | +8–15% |
| RMSE Reduction | 12–20% |
| Graph Nodes | 550+ |
| Edge / Relationship Types | 6 |
| Graph Embedding Dimension | 64 |
System Architecture
Fig 1 — Knowledge Graph-Enhanced Traffic Optimization System
Data sources: Traffic Data (traffic.csv · historical patterns); Commute Data (bay_area_9_all_commutes); Road Metadata (ca_meta.csv); Real-Time Traffic (511.org API · WZDx · Toll Data); Events & Incidents (Traffic Events API · JSON)
Knowledge Graph (NetworkX → Neo4j), with typed nodes and edges. Node types: Traffic Junctions (intersections · highways); Road Segments (sections between junctions); Weather Conditions (clear · rain · fog); Incidents (accidents · construction); Events (special area events); Time Periods (hour of day · day of week)
Node2Vec Embeddings (dim=64 · walk=30 · walks/node=200) → Graph Feature Vectors (semantic + structural node representation)
Additional features: Temporal Features (hour · day · junction ID); Classical ML Features (vehicle count · speed · weather)
KG-Enhanced ML Pipeline: Random Forest (nonlinear boundaries); XGBoost (grid search tuned); LightGBM (memory efficient); Linear / Ridge (baseline models); KNN Regressor (local patterns); Stacking Ensemble (meta-model boosting)
Outputs: Traffic Predictions (92% accuracy with connects_to); Streamlit Dashboard (route optimizer · junction explorer · forecasts)
Data Collection & Preprocessing
Traffic & Commute Data
The primary traffic dataset (traffic.csv) includes data on traffic volume (vehicle counts), speed, and junction identification. Temporal variables such as hour of day and day of week were extracted for temporal pattern analysis. Commute data from bay_area_9_all_commutes_names.csv provided route-level linkages between road segments and popular commuting pathways, enhancing the knowledge graph with route-specific information.
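The temporal feature extraction described above can be sketched with the Python standard library. The column names (`DateTime`, `Junction`, `Vehicles`), the timestamp format, and the sample rows are assumptions made for illustration; the actual header of traffic.csv is not given here.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names and rows; the real traffic.csv header may differ.
SAMPLE_CSV = """DateTime,Junction,Vehicles
2017-06-05 08:00:00,1,42
2017-06-10 17:00:00,2,15
"""

def add_temporal_features(rows):
    """Attach hour, day of week, and a weekend flag to each traffic record."""
    enriched = []
    for row in rows:
        ts = datetime.strptime(row["DateTime"], "%Y-%m-%d %H:%M:%S")
        enriched.append({
            "junction": int(row["Junction"]),
            "vehicles": int(row["Vehicles"]),
            "hour": ts.hour,
            "day_of_week": ts.weekday(),      # Monday = 0, Sunday = 6
            "is_weekend": ts.weekday() >= 5,
        })
    return enriched

records = add_temporal_features(csv.DictReader(io.StringIO(SAMPLE_CSV)))
```

The weekend flag is not mentioned in the source; it is included because the later weekday versus weekend comparison needs some such indicator.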
Real-Time API Sources
Dynamic traffic conditions were captured via three 511.org API feeds: the Work Zones Data feed (WZDx API, covering construction sites and roadwork), the Traffic Events API (incidents, accidents, and other traffic-related occurrences), and the Toll Data API (toll rates and affected road segments). API responses were saved as JSON files and used in graph construction. California road network metadata (ca_meta.csv) provided structural information about road segments, junction types, lane counts, and speed limits.
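As a rough illustration of how a saved JSON response could be turned into graph-ready records, the sketch below flattens events into (incident, type, road) triples. The payload shape and every field name (`events`, `event_type`, `roads`, `name`) are hypothetical; they are not taken from the 511.org schema, which should be consulted before adapting this.

```python
import json

# Hypothetical payload shape; the actual 511.org response schema is not given in the source.
SAMPLE_RESPONSE = json.dumps({
    "events": [
        {"id": "evt-1", "event_type": "INCIDENT", "roads": [{"name": "US-101"}]},
        {"id": "evt-2", "event_type": "CONSTRUCTION",
         "roads": [{"name": "I-280"}, {"name": "CA-85"}]},
    ]
})

def extract_incident_links(raw_json):
    """Return (incident_id, incident_type, road_name) triples for graph construction."""
    payload = json.loads(raw_json)
    links = []
    for event in payload.get("events", []):
        for road in event.get("roads", []):
            links.append((event["id"], event["event_type"].lower(), road["name"]))
    return links

incident_links = extract_incident_links(SAMPLE_RESPONSE)
```

Each triple would correspond to one incident node and one affected_by_incident edge in the knowledge graph.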
Knowledge Graph Construction
Node Types
The knowledge graph includes six node types: Traffic Junctions (intersections and highway interchanges), Road Segments (road sections between junctions), Weather Conditions (clear, rain, fog), Incidents (accidents, construction), Events (special area events), and Time Periods (times of day and week). The graph was built using NetworkX, a Python graph analysis library, and exported to Neo4j for sophisticated querying and visualization.
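The typed-node, typed-edge structure can be sketched as follows. The study built its graph with NetworkX; to stay dependency-free, this sketch uses plain dictionaries instead, and the class name, node identifiers, and attribute names are all hypothetical. The same node type and relation labels would map onto node and edge attributes in a graph library.

```python
from collections import defaultdict

class TrafficGraph:
    """Minimal typed graph: nodes carry a node_type, edges carry a relation label."""

    def __init__(self):
        self.nodes = {}                      # node_id -> {"node_type": ...}
        self.adjacency = defaultdict(dict)   # node_id -> {neighbor_id: relation}

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = {"node_type": node_type}

    def add_edge(self, u, v, relation):
        if u not in self.nodes or v not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.adjacency[u][v] = relation
        self.adjacency[v][u] = relation      # edges are treated as undirected

    def neighbors(self, node_id, relation=None):
        """Sorted neighbors, optionally restricted to one relation type."""
        return sorted(n for n, r in self.adjacency[node_id].items()
                      if relation is None or r == relation)

# Hypothetical identifiers for illustration
graph = TrafficGraph()
graph.add_node("junction_1", "junction")
graph.add_node("junction_2", "junction")
graph.add_node("segment_a", "road_segment")
graph.add_node("rain", "weather")
graph.add_edge("segment_a", "junction_1", "connects_to")
graph.add_edge("segment_a", "junction_2", "connects_to")
graph.add_edge("junction_1", "rain", "affected_by_weather")
```

Filtering neighbors by relation is what makes a per-relationship comparison possible: the same graph can be restricted to one edge type at a time.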
Knowledge Graph Relationship Types and Prediction Accuracy
| Relationship | Description | Prediction Accuracy |
|---|---|---|
| connects_to | Road segments ↔ intersections | 92% |
| affected_by_incident | Junctions/segments ↔ incidents | 90% |
| part_of_route | Road segments ↔ common routes | 89% |
| affected_by_weather | Intersections ↔ weather states | 87% |
| located_near | Proximity between entities | 85% |
Node2Vec Embeddings
Graph structures were converted into numerical vector representations using node2vec, a popular graph embedding algorithm that extends the word2vec approach to graphs. Node2vec uses biased random walks to explore the neighborhood of each node, balancing between breadth-first and depth-first exploration strategies. Key parameters: embedding dimension 64, walk length 30, number of walks per node 200, context window size 10, minimum count 1.
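The biased random walk at the core of node2vec can be sketched in pure Python. The return parameter `p` and in-out parameter `q` are part of the node2vec algorithm, but their values are not reported here, so the defaults of 1.0 are an assumption; the toy adjacency is hypothetical. Only the walk length (30) and walks per node (200) come from the text.

```python
import random

def biased_walk(adj, start, walk_length, p=1.0, q=1.0, rng=None):
    """One node2vec-style second-order walk over an adjacency dict of sets."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbors = sorted(adj[current])
        if not neighbors:
            break
        if len(walk) == 1:
            # First step has no previous node, so it is uniform
            walk.append(rng.choice(neighbors))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)      # return to the previous node
            elif candidate in adj[previous]:
                weights.append(1.0)          # stay close to the previous node (BFS-like)
            else:
                weights.append(1.0 / q)      # move outward (DFS-like)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

def generate_walks(adj, num_walks, walk_length, p=1.0, q=1.0, seed=0):
    """Run num_walks walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = sorted(adj)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(adj, node, walk_length, p, q, rng))
    return walks

# Hypothetical toy graph: three junctions joined by two road segments
toy_adj = {
    "j1": {"s_a"}, "j2": {"s_a", "s_b"}, "j3": {"s_b"},
    "s_a": {"j1", "j2"}, "s_b": {"j2", "j3"},
}
walks = generate_walks(toy_adj, num_walks=200, walk_length=30)
```

In the full algorithm these walks are treated as sentences and passed to a skip-gram word2vec model, which is where the embedding dimension, context window, and minimum count parameters apply.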
A fallback embedding strategy provides resilience when node2vec is unavailable: structured random embeddings, whose first five dimensions carry a type-specific signature, maintain semantic coherence for each node type (intersections, road segments, weather, time periods).
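One way such a fallback could look is sketched below. The signature values, the Gaussian noise scale, and the function name are assumptions; the text specifies only that the first five dimensions are type-specific and that the embedding has 64 dimensions.

```python
import random

EMBEDDING_DIM = 64
SIGNATURE_DIMS = 5

# Hypothetical signatures: the source does not list the actual values.
TYPE_SIGNATURES = {
    "intersection": [1.0, 0.0, 0.0, 0.0, 0.0],
    "road_segment": [0.0, 1.0, 0.0, 0.0, 0.0],
    "weather":      [0.0, 0.0, 1.0, 0.0, 0.0],
    "time_period":  [0.0, 0.0, 0.0, 1.0, 0.0],
}

def fallback_embeddings(node_types, seed=42):
    """Type signature in the first 5 dimensions, seeded random noise in the rest."""
    rng = random.Random(seed)
    embeddings = {}
    for node_id in sorted(node_types):       # sorted for reproducibility
        signature = TYPE_SIGNATURES[node_types[node_id]]
        noise = [rng.gauss(0.0, 0.1) for _ in range(EMBEDDING_DIM - SIGNATURE_DIMS)]
        embeddings[node_id] = signature + noise
    return embeddings

fallback = fallback_embeddings(
    {"j1": "intersection", "j2": "intersection", "s_a": "road_segment"}
)
```

Nodes of the same type share their leading dimensions, so a downstream model can still separate node types even though the remaining dimensions carry no structural information.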
Machine Learning Models
Ensemble of Regression Models
Feature inputs comprised both conventional traffic features (junction ID, time, and day of week) and knowledge graph node embeddings (64 dimensions), allowing the models to use both explicit features and graph relationship data. Six model families were evaluated: Random Forest Regressor, XGBoost (grid search-tuned), LightGBM, Linear/Ridge Regression baselines, KNN Regressor, and a Stacking Ensemble meta-model.
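Combining explicit features with graph embeddings amounts to a concatenation per record. The sketch below assumes embeddings are keyed by junction ID and that junctions missing from the graph get a zero vector; both choices, and the function name, are illustrative rather than taken from the study.

```python
def build_feature_vector(record, embeddings, dim=64):
    """Concatenate explicit features with the junction's graph embedding."""
    explicit = [
        float(record["junction"]),
        float(record["hour"]),
        float(record["day_of_week"]),
    ]
    # Junctions missing from the graph fall back to a zero vector (an assumption of this sketch)
    embedding = embeddings.get(record["junction"], [0.0] * dim)
    return explicit + list(embedding)

# Hypothetical embedding table with a single known junction
toy_embeddings = {1: [0.1] * 64}
known = build_feature_vector({"junction": 1, "hour": 8, "day_of_week": 0}, toy_embeddings)
unknown = build_feature_vector({"junction": 9, "hour": 17, "day_of_week": 5}, toy_embeddings)
```

With three explicit features and a 64-dimensional embedding, each row has 67 columns; a baseline model for comparison would use only the first three.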
The AdvancedTrafficPredictionModel and KnowledgeGraphEnhancedModel classes implemented standard and categorical feature preprocessing using scikit-learn pipelines, a temporal split for training and evaluation, feature importance analysis, and a comparative ablation between baseline and KG-enhanced models.
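A time-ordered split is one common reading of a temporal train/evaluation split, and can be sketched as below. The 80/20 ratio, the `timestamp` field, and the toy records are assumptions; the exact procedure used in the study is not described.

```python
from datetime import datetime

def temporal_split(records, train_fraction=0.8):
    """Sort by timestamp and cut once, so no test row is earlier than any training row."""
    ordered = sorted(records, key=lambda r: r["timestamp"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Hypothetical hourly records, deliberately out of order
toy_records = [
    {"timestamp": datetime(2017, 6, 5, hour), "vehicles": 10 + hour}
    for hour in (9, 3, 7, 1, 5, 8, 2, 6, 0, 4)
]
train, test = temporal_split(toy_records)
```

Unlike a random shuffle, this keeps future observations out of the training set, which matters when the features include time of day and day of week.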
Evaluation Framework
Evaluation Metrics Used
| Metric | Formula | Purpose |
|---|---|---|
| R² Score | 1 - SS_res/SS_tot | Variance explained by model |
| RMSE | √(mean((y_pred−y_true)²)) | Forecast error magnitude |
| MAE | mean(|y_pred−y_true|) | Average absolute error |
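The three metrics in the table can be computed directly from their formulas. The sketch below is a plain-Python version with made-up sample values, intended only to show how the formulas relate; it is not the study's evaluation code.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute R², RMSE, and MAE from paired true and predicted values."""
    n = len(y_true)
    residuals = [p - t for p, t in zip(y_pred, y_true)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }

# Illustrative values only
metrics = regression_metrics([3, 5, 7, 9], [2, 5, 8, 9])
```

For these values the residuals are -1, 0, 1, 0, so SS_res = 2 and SS_tot = 20, giving R² = 0.9, RMSE = √0.5 ≈ 0.707, and MAE = 0.5.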
Results
Impact of Knowledge Graph Relationships
One of the study's key findings is that different knowledge graph relationships affect prediction accuracy to different degrees. The connects_to relationship yielded the highest prediction accuracy (92%), followed by affected_by_incident (90%), part_of_route (89%), affected_by_weather (87%), and located_near (85%). This suggests that topological connectedness between junctions and road segments is the most useful feature in traffic forecasting, followed by incident data.
External Factors Analysis
Our investigation compared the impact of external influences on traffic patterns during weekdays and weekends. Weather conditions (particularly rain and snow) have a more significant impact on weekday traffic. Special events have a more pronounced effect on weekend traffic. Construction activities affect weekday traffic more heavily. Accidents impact both but with different temporal patterns.
Performance vs. Baseline Methods
The knowledge graph-enhanced approach outperformed the baseline approaches on all metrics, with accuracy gains of 8% to 15% and RMSE reductions of 12% to 20%. This highlights the value of knowledge graph data for traffic prediction tasks: the semantic links between traffic components provide information that standard feature engineering cannot replicate.
Streamlit Dashboard
A Streamlit web application lets users apply the knowledge graph and prediction models through an interactive interface. Key features: Optimal Route Finding (start/end junction selection with congestion-aware recommendations), Temporal Optimization (best departure times based on past and expected traffic patterns), Traffic Level Forecasting (color-coded vehicle count visualizations across the day), and an Interactive Junction Explorer (heatmap interface showing traffic flow patterns and junction connectivity strength).
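Congestion-aware route finding can be illustrated with Dijkstra's algorithm over travel times scaled by a congestion factor. The cost formula `minutes * (1 + congestion)`, the edge tuple layout, and the junction names are assumptions of this sketch; the dashboard's actual routing logic is not described.

```python
import heapq

def congestion_aware_route(edges, start, goal):
    """Dijkstra over edge costs of base travel time scaled by (1 + congestion)."""
    graph = {}
    for u, v, minutes, congestion in edges:
        cost = minutes * (1.0 + congestion)
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, []).append((u, cost))
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue                          # stale queue entry
        for neighbor, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if goal not in best:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1], best[goal]

# Hypothetical edges: (junction, junction, base minutes, congestion level)
toy_edges = [
    ("J1", "J2", 5.0, 1.0),   # direct but congested: cost 10.0
    ("J1", "J3", 4.0, 0.0),   # cost 4.0
    ("J3", "J2", 4.0, 0.0),   # cost 4.0
]
route, total = congestion_aware_route(toy_edges, "J1", "J2")
```

Here the detour through J3 (total cost 8.0) beats the direct but congested segment (cost 10.0). In a deployed system the congestion values would come from the prediction models rather than fixed numbers.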
Conclusion
This study describes a knowledge graph-enhanced traffic optimization system for the Bay Area that combines multiple data sources into a single framework for traffic prediction and analysis. The method uses the semantic links between traffic components contained in a knowledge graph to increase prediction accuracy and provide interpretable insights into traffic dynamics. The experimental results showed that the knowledge graph technique consistently outperformed traditional methods, with the connects_to relationship contributing the most to prediction accuracy.
Future work will address scalability for very large transit networks, temporal dynamics via graph neural networks (GNNs), real-time knowledge graph updates as new data becomes available, multi-city generalization, and integration of user feedback so that recommendations stay relevant to drivers and commuters.
Tech Stack