Weather Delay 🚂⛈️
🚂⛈️ Overview
A machine learning research project that studies algorithms for predicting train delays from departure delay and weather conditions. The goal is to make rail travel more attractive than driving and thereby reduce transport-related carbon emissions.
🎯 Project Motivation
Understanding and predicting train delays is crucial for improving passenger satisfaction and encouraging a modal shift toward sustainable transportation. By combining weather data with historical delay information, we can develop models that:
- →Provide better delay predictions to passengers
- →Help operators optimize scheduling
- →Encourage rail travel as a reliable alternative to cars
- →Reduce carbon footprint through sustainable transportation
🦾 Repository
See the code: Weather Delay Repository
📊 Methodology
This study explores classification and regression algorithms to:
- →Classification: Predict if a train will be delayed beyond a threshold
- →Regression: Estimate the delay duration
Both approaches analyze the relationship between:
- →Departure delay (key historical indicator)
- →Weather conditions at the time of travel
- →Route and temporal factors
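The two framings above can be sketched side by side with Scikit-Learn. This is a minimal illustration, not the project's actual pipeline: the data is synthetic, and the column names, the 5-minute threshold, and the choice of random forests are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; every column name here is hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "departure_delay_min": rng.exponential(4.0, n),
    "precipitation_mm": rng.exponential(1.0, n),
    "wind_speed_kmh": rng.uniform(0, 60, n),
    "hour_of_day": rng.integers(5, 23, n),
})
# Made-up relationship so the models have something to learn.
df["arrival_delay_min"] = (
    0.8 * df["departure_delay_min"]
    + 0.5 * df["precipitation_mm"]
    + rng.normal(0, 1.0, n)
)

DELAY_THRESHOLD_MIN = 5.0  # assumed threshold for "delayed"
features = ["departure_delay_min", "precipitation_mm", "wind_speed_kmh", "hour_of_day"]
X = df[features]
y_reg = df["arrival_delay_min"]                      # regression target
y_clf = (y_reg > DELAY_THRESHOLD_MIN).astype(int)    # classification target

# Split features and both targets with the same row assignment.
X_train, X_test, y_reg_train, y_reg_test, y_clf_train, y_clf_test = train_test_split(
    X, y_reg, y_clf, test_size=0.25, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_clf_train)
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_reg_train)

accuracy = accuracy_score(y_clf_test, clf.predict(X_test))
mae = mean_absolute_error(y_reg_test, reg.predict(X_test))
print(f"classification accuracy: {accuracy:.3f}")
print(f"regression MAE (minutes): {mae:.3f}")
```

Deriving the classification label from the regression target keeps both tasks on the same rows, so their results can be compared directly.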
📈 Data Sources
The project utilizes data from:
- →Weather Data: Open-Meteo - Free weather API with historical data
- →Train Data: Tuscany Train Feed - Regional train schedules and operations
- →Delay Information: Italian Train Delay Data - Historical delay records
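As an illustration of how historical weather might be requested from Open-Meteo, the sketch below only builds a request URL and does not send it. The endpoint and parameter names reflect my understanding of Open-Meteo's historical archive API and should be checked against its documentation; the coordinates (roughly Florence), the dates, and the helper name `build_archive_url` are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Open-Meteo historical archive API.
ARCHIVE_ENDPOINT = "https://archive-api.open-meteo.com/v1/archive"


def build_archive_url(latitude, longitude, start_date, end_date,
                      hourly=("temperature_2m", "precipitation")):
    """Build a query URL for hourly historical weather (hypothetical helper)."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,  # ISO date, e.g. "2023-01-01"
        "end_date": end_date,
        "hourly": ",".join(hourly),
    }
    # safe="," keeps the comma-separated variable list readable in the URL.
    return f"{ARCHIVE_ENDPOINT}?{urlencode(params, safe=',')}"


url = build_archive_url(43.77, 11.25, "2023-01-01", "2023-01-31")
print(url)
```

The resulting URL can be fetched with any HTTP client, and the JSON response loaded into a Pandas DataFrame for the preprocessing step.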
🛠️ Tech Stack
- →Python: Core implementation language
- →Pandas: Data manipulation and analysis
- →NumPy: Numerical computations
- →Scikit-Learn: Machine learning algorithms and evaluation
- →Jupyter Notebooks: Interactive data analysis and experimentation
🚀 Project Structure
Files and their purposes:
- →02-Data Analytics: Main data analysis and model development notebook, covering:
  - →Data loading and exploration
  - →Feature engineering from weather and delay data
  - →Model training and evaluation
  - →Comparison of classification and regression approaches
Note: Data files must be downloaded from the sources above before preprocessing.
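One plausible way to do the feature engineering step is to attach the most recent hourly weather observation to each train record with `pandas.merge_asof`. The sketch below uses tiny hand-written tables; the column names and values are invented for the example and do not come from the project's data.

```python
import pandas as pd

# Hypothetical train records (names and values are illustrative only).
trains = pd.DataFrame({
    "train_id": ["R100", "R101", "R102"],
    "scheduled_departure": pd.to_datetime(
        ["2023-01-05 07:12", "2023-01-05 08:47", "2023-01-05 10:05"]
    ),
    "departure_delay_min": [2.0, 7.0, 0.0],
})

# Hypothetical hourly weather observations for the same area.
weather = pd.DataFrame({
    "time": pd.to_datetime(
        ["2023-01-05 07:00", "2023-01-05 08:00", "2023-01-05 09:00", "2023-01-05 10:00"]
    ),
    "precipitation_mm": [0.0, 1.2, 3.4, 0.1],
    "temperature_c": [4.1, 4.5, 5.0, 6.2],
})

# For each train, take the latest weather row at or before its departure time.
# merge_asof requires both frames to be sorted on their time keys.
merged = pd.merge_asof(
    trains.sort_values("scheduled_departure"),
    weather.sort_values("time"),
    left_on="scheduled_departure",
    right_on="time",
    direction="backward",
)

# Simple temporal and weather features derived from the joined table.
merged["hour_of_day"] = merged["scheduled_departure"].dt.hour
merged["is_raining"] = merged["precipitation_mm"] > 0

print(merged[["train_id", "precipitation_mm", "hour_of_day", "is_raining"]])
```

With `direction="backward"`, each train only sees weather observed at or before its departure, which avoids leaking future information into the features.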
📝 Research Focus
- →Comparing different classification algorithms for delay prediction
- →Evaluating regression models for delay duration estimation
- →Feature importance analysis of weather and temporal factors
- →Trade-offs between model accuracy and simplicity
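A comparison along these lines could be set up as in the sketch below, which cross-validates a simple and a more complex classifier and then estimates feature importance by permutation on a held-out split. The data is synthetic, and the feature names, the two models, and the delay threshold are assumptions chosen for illustration rather than the project's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data; feature names are hypothetical.
rng = np.random.default_rng(42)
n = 400
feature_names = ["departure_delay_min", "precipitation_mm", "hour_of_day"]
X = np.column_stack([
    rng.exponential(4.0, n),
    rng.exponential(1.0, n),
    rng.integers(5, 23, n),
])
# Label: delayed beyond an assumed 5-minute threshold (made-up relationship).
y = (0.8 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1.0, n) > 5.0).astype(int)

# A simple linear model versus a more flexible ensemble.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy per model.
cv_scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, score in cv_scores.items():
    print(f"{name}: {score:.3f}")

# Permutation importance measured on a held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
result = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)
importances = dict(zip(feature_names, result.importances_mean))
for name, value in importances.items():
    print(f"{name}: {value:.3f}")
```

If the simpler model scores close to the ensemble, that is one concrete way to weigh accuracy against simplicity.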
🤝 Contributing
Contributions are welcome! Please follow these steps:
- →Fork the repository
- →Create a new branch for your feature or bugfix
- →Commit your changes and push them to your fork
- →Submit a pull request with a detailed description of your changes
Please ensure that your code follows the project's coding standards and includes appropriate documentation.
📜 License
This project is licensed under the MIT License. See the LICENSE file for more details.