Weather Delay 🚂⛈️
🚂⛈️ Overview
Weather Delay is a machine learning project focused on predicting train delays from historical delay behavior and weather conditions.
The broader idea is simple: if delay predictions become more reliable, rail travel becomes easier to trust, and that can support a shift toward more sustainable transportation.
Why I built it
Delays are one of the main pain points for train users. I wanted to explore whether weather signals, combined with operational delay patterns, could improve predictive quality enough to be practically useful.
The project aims to:
- →Improve delay estimates for passengers
- →Support operator-side planning decisions
- →Increase trust in rail as an alternative to car travel
- →Contribute to sustainability-oriented mobility strategies
🦾 Repository
Code and notebooks: github.com/gianlucarea/weather-delay
📊 Methodology
The study evaluates both classification and regression pipelines:
- →Classification: Predict whether a train will be delayed beyond a chosen threshold
- →Regression: Estimate the delay duration as a continuous value
Both approaches model relationships between:
- →Departure delay (key historical indicator)
- →Weather conditions at the time of travel
- →Route and temporal factors
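As a rough illustration of how the same delay data can feed both framings, here is a minimal sketch that derives a regression target and a classification target from one delay column. The column names (`departure_delay_min`, `arrival_delay_min`), the 5-minute threshold, and the sample values are assumptions made for the example, not the project's actual schema or settings.

```python
import pandas as pd

# Hypothetical threshold in minutes; the project's actual cutoff may differ.
DELAY_THRESHOLD_MIN = 5


def build_targets(df: pd.DataFrame, threshold: float = DELAY_THRESHOLD_MIN) -> pd.DataFrame:
    """Add a regression target (y_reg) and a binary classification target (y_clf)."""
    out = df.copy()
    # Regression target: the delay itself, as a float number of minutes.
    out["y_reg"] = out["arrival_delay_min"].astype(float)
    # Classification target: 1 if the delay exceeds the threshold, else 0.
    out["y_clf"] = (out["arrival_delay_min"] > threshold).astype(int)
    return out


# Made-up rows, only to show the shape of the data.
trips = pd.DataFrame({
    "departure_delay_min": [0, 3, 12, 1],
    "arrival_delay_min": [1, 6, 15, 0],
})
targets = build_targets(trips)
print(targets[["departure_delay_min", "y_reg", "y_clf"]])
```

Keeping both targets in one frame makes it easy to train the two kinds of model on identical features and compare them fairly.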
Modeling workflow
- →Collect and align train and weather datasets.
- →Clean and preprocess temporal and route-level features.
- →Engineer predictive features from delay and weather signals.
- →Train and compare classification and regression models.
- →Evaluate performance tradeoffs and feature impact.
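The first step, aligning train and weather records, could look something like the sketch below. It assumes hourly weather observations per station and attaches the most recent observation (within one hour) to each departure using `pandas.merge_asof`. The station names, column names, and values are illustrative assumptions; the notebook may align the datasets differently.

```python
import pandas as pd

# Made-up train departures (hypothetical columns).
trains = pd.DataFrame({
    "station": ["Firenze", "Pisa", "Firenze"],
    "departure_time": pd.to_datetime(
        ["2024-01-10 08:20", "2024-01-10 09:05", "2024-01-10 10:50"]
    ),
    "departure_delay_min": [2, 0, 7],
})

# Made-up hourly weather observations per station (hypothetical columns).
weather = pd.DataFrame({
    "station": ["Firenze", "Firenze", "Firenze", "Pisa", "Pisa"],
    "time": pd.to_datetime([
        "2024-01-10 08:00", "2024-01-10 09:00", "2024-01-10 10:00",
        "2024-01-10 08:00", "2024-01-10 09:00",
    ]),
    "precipitation_mm": [0.0, 1.2, 3.4, 0.0, 0.5],
})

# merge_asof requires both frames to be sorted by their time keys.
aligned = pd.merge_asof(
    trains.sort_values("departure_time"),
    weather.sort_values("time"),
    left_on="departure_time",
    right_on="time",
    by="station",                   # only match weather from the same station
    direction="backward",           # use the latest observation before departure
    tolerance=pd.Timedelta("1h"),   # ignore observations older than one hour
)
print(aligned[["station", "departure_time", "precipitation_mm"]])
```

Matching backward in time avoids leaking weather observed after departure into the features.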
📈 Data Sources
The project combines multiple data providers:
- →Weather Data: Open-Meteo - Free weather API with historical data
- →Train Data: Tuscany Train Feed - Regional train schedules and operations
- →Delay Information: Italian Train Delay Data - Historical delay records
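For the weather side, a request to Open-Meteo's historical archive endpoint might be assembled as in the sketch below. It only builds the URL and does not call the network. The endpoint and parameter names reflect Open-Meteo's public documentation as far as I know and should be checked against the current docs; the coordinates (roughly Florence), date range, and chosen variables are assumptions for illustration.

```python
from urllib.parse import urlencode

# Historical weather endpoint (verify against the current Open-Meteo docs).
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_weather_url(lat: float, lon: float, start: str, end: str) -> str:
    """Build a query URL for hourly temperature and precipitation."""
    params = {
        "latitude": lat,
        "longitude": lon,
        "start_date": start,   # ISO dates, YYYY-MM-DD
        "end_date": end,
        "hourly": "temperature_2m,precipitation",
        "timezone": "Europe/Rome",
    }
    # Keep commas unescaped so the variable list stays readable.
    return f"{ARCHIVE_URL}?{urlencode(params, safe=',')}"


# Illustrative coordinates and dates, not the project's actual query.
url = build_weather_url(43.77, 11.25, "2024-01-01", "2024-01-07")
print(url)
```

The JSON response can then be loaded into a DataFrame and joined with the train records by timestamp.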
🛠️ Tech Stack
- →Python: Core implementation language
- →Pandas: Data manipulation and analysis
- →NumPy: Numerical computations
- →Scikit-Learn: Machine learning algorithms and evaluation
- →Jupyter Notebooks: Interactive data analysis and experimentation
🚀 Project Structure
Main analysis artifact:
- →02-Data Analytics: Main data analysis and model development notebook, covering:
  - →Data loading and exploration
  - →Feature engineering from weather and delay data
  - →Model training and evaluation
  - →Comparison of classification and regression approaches
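One way to put the regression and classification views side by side is to score a single regressor both ways: by mean absolute error on the predicted duration, and by accuracy after thresholding its predictions. The sketch below does this on synthetic data. The data-generating formula, the gradient boosting model, and the 5-minute threshold are all assumptions for the example and do not reproduce the notebook's models or results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import accuracy_score, mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data only: arrival delay driven by departure delay and rain.
rng = np.random.default_rng(42)
n = 500
departure_delay = rng.exponential(4.0, n)
precipitation = rng.exponential(1.0, n)
X = np.column_stack([departure_delay, precipitation])
arrival_delay = departure_delay + 1.5 * precipitation + rng.normal(0, 1, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, arrival_delay, test_size=0.25, random_state=42
)

# Regression view: predict the delay duration and measure the average error.
reg = GradientBoostingRegressor(random_state=42).fit(X_train, y_train)
pred = reg.predict(X_test)
mae = mean_absolute_error(y_test, pred)

# Classification view: threshold both truth and prediction (hypothetical cutoff).
THRESHOLD_MIN = 5.0
acc = accuracy_score(y_test > THRESHOLD_MIN, pred > THRESHOLD_MIN)

print(f"MAE: {mae:.2f} min, thresholded accuracy: {acc:.2f}")
```

A dedicated classifier may still do better near the threshold, which is one reason to evaluate both pipelines separately.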
Note: data files must be downloaded from the listed sources before preprocessing and training.
📝 Research Focus
- →Comparing different classification algorithms for delay prediction
- →Evaluating regression models for delay duration estimation
- →Feature importance analysis of weather and temporal factors
- →Model performance trade-offs between accuracy and simplicity
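The comparison and feature-importance questions above can be sketched with scikit-learn as follows: two classifiers of different complexity are cross-validated on the same features, then permutation importance is computed for one of them. The data is synthetic and the feature names, models, and scoring metric are assumptions for illustration, not the algorithms or findings of the actual study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data only; feature names are hypothetical.
rng = np.random.default_rng(0)
n = 400
feature_names = ["departure_delay_min", "precipitation_mm", "hour_of_day"]
X = np.column_stack([
    rng.exponential(4.0, n),   # departure delay
    rng.exponential(1.0, n),   # precipitation
    rng.integers(5, 23, n),    # hour of day (pure noise in this toy setup)
])
# Label: delayed when departure delay plus a rain effect exceeds 5 minutes.
y = (X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 1, n) > 5).astype(int)

# Compare a simple and a more flexible classifier with 5-fold cross-validation.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
cv_scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in models.items()
}

# Permutation importance on a held-out split for the random forest.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
forest = models["random_forest"].fit(X_train, y_train)
result = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)
importances = dict(zip(feature_names, result.importances_mean))

print(cv_scores)
print(importances)
```

If the simpler model scores close to the more complex one, that is a concrete argument for preferring it on interpretability grounds.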
What I learned
- →How weather and operational signals interact in delay prediction tasks
- →How to frame the same domain problem as both classification and regression
- →How to communicate model tradeoffs clearly in notebook-based experiments
🤝 Contributing
Contributions are welcome. If you want to improve data preparation, model evaluation, or documentation clarity, feel free to open a pull request.
Suggested contribution flow:
- →Fork the repository
- →Create a new branch for your feature or bugfix
- →Commit your changes and push them to your fork
- →Submit a pull request with a detailed description of your changes
Please ensure that your code follows the project's coding standards and includes appropriate documentation.
📜 License
This project is licensed under the MIT License. See the LICENSE file for more details.