HIGH SCHOOL ENROLLMENT PROJECTIONS
Time Series, Linear Regression, Ridge Regularization, ARIMA, Hybrid Model, Time Series Cross-Validation
CONTEXT AND OBJECTIVE
Forecasting the number of students who will enroll in high schools 15 years from now is crucial for the government. High school attendance is continually increasing, necessitating the regular construction of new schools to accommodate the demand. Since building a new school takes 10 to 15 years in Switzerland, from the project phase to actual construction, enrollment forecasts up to 15 years in the future are essential for effective planning.
With a standard statistical approach based on historical enrollment data, reliable forecasts could be generated only for the next 10 years, i.e., up to 2033 at the time the project was carried out.
The goal of this project was to develop a machine learning model capable of providing projections up to 2040, thereby filling the gap in long-term forecasts.
WHAT WAS DONE
Using Python’s statsmodels and scikit-learn packages, the first step was to identify a model capable of learning the general trend from the historical data and from the forecasts provided by the reference statistical method. Several linear regression variants were tested, and the best results were achieved with linear and quadratic time-step features. Ridge regularization was added to prevent overfitting.
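As an illustration of the trend step described above, here is a minimal sketch of a ridge regression on linear and quadratic time-step features. It is not the project's code: the series is synthetic, and the names (`enrollment`, `trend_model`, `horizon`), the start year, and the value `alpha=1.0` are assumptions chosen only for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic placeholder series (NOT the project's data): 30 yearly values.
rng = np.random.default_rng(0)
years = np.arange(1994, 2024)
t = np.arange(len(years)).reshape(-1, 1)  # time-step feature: 0, 1, 2, ...
enrollment = (
    5000 + 60 * t.ravel() + 1.5 * t.ravel() ** 2 + rng.normal(0, 40, len(years))
)

# Linear and quadratic time dependency, scaled, then ridge-regularized.
trend_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),  # produces t and t^2
    StandardScaler(),  # puts both features on a comparable scale for the penalty
    Ridge(alpha=1.0),  # alpha is an assumed value; it would normally be tuned
)
trend_model.fit(t, enrollment)

# Extrapolate the trend over an assumed horizon of 17 future time steps.
horizon = 17
t_future = np.arange(len(years), len(years) + horizon).reshape(-1, 1)
trend_forecast = trend_model.predict(t_future)
print(trend_forecast.round(0))
```

Scaling the features before the ridge penalty is a design choice in this sketch: without it, the quadratic term has a much larger numeric range than the linear one, so the same `alpha` would penalize the two coefficients very differently.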
In the second step, an ARIMA model was trained on the residuals, resulting in a final hybrid model that combined the predictions of the linear and ARIMA models.
The hybrid model’s ability to generalize and provide reliable predictions was assessed using time series cross-validation (TSCV).
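A minimal sketch of time series cross-validation, using scikit-learn's `TimeSeriesSplit`, is shown below. For brevity it evaluates only the ridge trend component on synthetic data; in a hybrid setup the ARIMA residual model would presumably be refit inside each fold as well. The number of splits, the test size, and the error metric are assumptions, not values from the project.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic placeholder series (NOT the project's data).
rng = np.random.default_rng(2)
n = 40
t = np.arange(n)
y = 5000 + 50 * t + 1.2 * t**2 + rng.normal(0, 30, n)
X = np.column_stack([t, t**2])

# Expanding-window splits: each fold trains on the past only
# and tests on the 4 time steps that immediately follow.
tscv = TimeSeriesSplit(n_splits=5, test_size=4)

fold_mae = []
for train_idx, test_idx in tscv.split(X):
    model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
    predictions = model.predict(X[test_idx])
    fold_mae.append(mean_absolute_error(y[test_idx], predictions))

print("MAE per fold:", np.round(fold_mae, 1))
print("Mean MAE:", round(float(np.mean(fold_mae)), 1))
```

Unlike ordinary k-fold cross-validation, the folds are never shuffled, so the model is always scored on observations that come after its training window, which mirrors how a forecast is actually used.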