Weather Delay 🚂⛈️

Gianluca Rea

🚂⛈️ Overview

Weather Delay is a machine learning project focused on predicting train delays from historical delay behavior and weather conditions.

The broader idea is simple: if delay predictions become more reliable, rail travel becomes easier to trust, and that can support a shift toward more sustainable transportation.

Why I built it

Delays are one of the main pain points for train users. I wanted to explore whether weather signals, combined with operational delay patterns, could improve predictive quality enough to be practically useful.

The project aims to:

  • Improve delay estimates for passengers
  • Support operator-side planning decisions
  • Increase trust in rail as an alternative to car travel
  • Contribute to sustainability-oriented mobility strategies

🦾 Repository

Code and notebooks: github.com/gianlucarea/weather-delay

📊 Methodology

The study evaluates both classification and regression pipelines:

  1. Classification: Predict whether a train's delay will exceed a given threshold
  2. Regression: Estimate the delay duration as a continuous value
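
As a minimal sketch of how one delay column can feed both framings: the column name `arrival_delay_min`, the sample values, and the 5-minute threshold below are all illustrative assumptions, not values taken from the project.

```python
import pandas as pd

# Hypothetical arrival delays in minutes (illustrative values, not project data)
delays = pd.DataFrame({"arrival_delay_min": [0.0, 2.5, 7.0, 15.0, 4.9]})

# Assumed cutoff; the project's actual threshold is not stated here
DELAY_THRESHOLD_MIN = 5.0

# Regression target: the delay duration itself
y_reg = delays["arrival_delay_min"]

# Classification target: 1 if the delay exceeds the threshold, else 0
y_clf = (delays["arrival_delay_min"] > DELAY_THRESHOLD_MIN).astype(int)

print(y_clf.tolist())
```

Deriving the label from the same column keeps the two tasks directly comparable, since both models then see identical inputs and differ only in the target.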

Both approaches model relationships between:

  • Departure delay (key historical indicator)
  • Weather conditions at the time of travel
  • Route and temporal factors
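
One possible way to combine these three feature groups in Scikit-Learn is sketched below. The column names (`departure_delay_min`, `precip_mm`, `temperature_c`, `route`, `hour`), the toy rows, and the choice of `LogisticRegression` are assumptions for illustration; the notebook may use different features and models.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature table (illustrative values, not project data)
X = pd.DataFrame({
    "departure_delay_min": [0.0, 6.0, 1.0, 14.0, 3.0, 9.0],
    "precip_mm": [0.0, 3.5, 0.2, 8.0, 0.0, 5.1],
    "temperature_c": [12.0, 4.0, 15.0, -1.0, 10.0, 2.0],
    "route": ["R1", "R2", "R1", "R2", "R1", "R2"],
    "hour": [8, 17, 9, 18, 7, 16],
})
y = pd.Series([0, 1, 0, 1, 0, 1], name="delayed")

numeric = ["departure_delay_min", "precip_mm", "temperature_c", "hour"]
categorical = ["route"]

# Scale numeric columns and one-hot encode the route identifier
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Stand-in classifier; any Scikit-Learn estimator could be swapped in here
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
model.fit(X, y)
predictions = model.predict(X)
```

Wrapping preprocessing and the estimator in one `Pipeline` means the scaler and encoder are fit only on training data during cross-validation, which avoids leaking information from held-out rows.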

Modeling workflow

  1. Collect and align train and weather datasets.
  2. Clean and preprocess temporal and route-level features.
  3. Engineer predictive features from delay and weather signals.
  4. Train and compare classification and regression models.
  5. Evaluate performance tradeoffs and feature impact.
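
Step 1 is often the trickiest part in practice, because train events and weather observations rarely share timestamps. The sketch below shows one way to align them with `pandas.merge_asof`, matching each departure to the most recent weather reading at the same station. The table layouts, column names, and the one-hour tolerance are assumptions, not the project's actual schema.

```python
import pandas as pd

# Hypothetical train events (illustrative values)
trains = pd.DataFrame({
    "station": ["A", "A", "B"],
    "departure_time": pd.to_datetime(
        ["2024-01-01 08:05", "2024-01-01 09:40", "2024-01-01 08:30"]
    ),
    "departure_delay_min": [3.0, 12.0, 0.0],
})

# Hypothetical hourly weather observations (illustrative values)
weather = pd.DataFrame({
    "station": ["A", "A", "B"],
    "observed_at": pd.to_datetime(
        ["2024-01-01 08:00", "2024-01-01 09:00", "2024-01-01 08:00"]
    ),
    "precip_mm": [0.0, 4.2, 1.1],
})

# merge_asof requires both frames to be sorted by their time key
aligned = pd.merge_asof(
    trains.sort_values("departure_time"),
    weather.sort_values("observed_at"),
    left_on="departure_time",
    right_on="observed_at",
    by="station",                    # only match readings from the same station
    direction="backward",            # use the latest reading at or before departure
    tolerance=pd.Timedelta("1h"),    # assumed maximum age of a usable reading
)

# A simple temporal feature derived from the departure timestamp
aligned["hour"] = aligned["departure_time"].dt.hour
```

Using `direction="backward"` ensures that only weather known at or before departure is attached to each train, so the features do not include information from the future.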

📈 Data Sources

The project combines train delay and weather data from multiple providers.

🛠️ Tech Stack

  • Python: Core implementation language
  • Pandas: Data manipulation and analysis
  • NumPy: Numerical computations
  • Scikit-Learn: Machine learning algorithms and evaluation
  • Jupyter Notebooks: Interactive data analysis and experimentation

🚀 Project Structure

Main analysis artifact:

  • 02-Data Analytics: Main data analysis and model development notebook
    • Data loading and exploration
    • Feature engineering from weather and delay data
    • Model training and evaluation
    • Comparison of classification and regression approaches

Note: data files must be downloaded from the listed sources before preprocessing and training.

📝 Research Focus

  • Comparing different classification algorithms for delay prediction
  • Evaluating regression models for delay duration estimation
  • Feature importance analysis of weather and temporal factors
  • Model performance tradeoffs between accuracy and simplicity
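
A comparison along these lines could be sketched as follows, using synthetic data in place of the real datasets. The feature names, the data-generating formula, the two candidate models, and the F1 metric are all assumptions chosen for illustration and do not reflect the project's reported setup or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (not project data)
rng = np.random.default_rng(0)
n = 300
departure_delay = rng.exponential(5.0, n)
precip = rng.exponential(2.0, n)
hour = rng.integers(5, 23, n)

X = np.column_stack([departure_delay, precip, hour])
feature_names = ["departure_delay_min", "precip_mm", "hour"]

# Assumed relationship used only to generate labels for this sketch
arrival_delay = departure_delay + 0.5 * precip + rng.normal(0.0, 1.0, n)
y = (arrival_delay > 5.0).astype(int)

# A simple model versus a more flexible one
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated F1 score per candidate
scores = {
    name: cross_val_score(est, X, y, cv=5, scoring="f1").mean()
    for name, est in candidates.items()
}

# Permutation importance: the score drop when one feature is shuffled
forest = candidates["random_forest"].fit(X, y)
result = permutation_importance(forest, X, y, n_repeats=5, random_state=0)
importances = dict(zip(feature_names, result.importances_mean))

print(scores)
print(importances)
```

For brevity the importances here are computed on the training data; measuring them on a held-out split generally gives a more honest picture of which features matter.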

What I learned

  • How weather and operational signals interact in delay prediction tasks
  • How to frame the same domain problem as both classification and regression
  • How to communicate model tradeoffs clearly in notebook-based experiments

🤝 Contributing

Contributions are welcome. If you want to improve data preparation, model evaluation, or documentation clarity, feel free to open a pull request.

Suggested contribution flow:

  1. Fork the repository
  2. Create a new branch for your feature or bugfix
  3. Commit your changes and push them to your fork
  4. Submit a pull request with a detailed description of your changes

Please ensure that your code follows the project's coding standards and includes appropriate documentation.

📜 License

This project is licensed under the MIT License. See the LICENSE file for more details.