Tag: ARIMA

Time Series Model for Birmingham Parking Occupancy Using Python and ARIMA Part 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a time series prediction model and document the end-to-end steps using a template. The Birmingham Parking Occupancy dataset presents a time series problem in which we forecast future outcomes from past data points.

INTRODUCTION: The problem is to forecast the hourly parking occupancy for a parking facility in Birmingham. The dataset describes a time series of parking occupancy over the three months from October 2016 through December 2016, with 1834 hourly observations. We used the first 90% of the observations for training various models while holding back the remaining observations for validating the final model.

In this Part 1 iteration, we will train and validate the model using just one facility, BHMBCCMKT01, within the dataset.
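The 90/10 hold-out described above can be sketched as a chronological split, with no shuffling so that the validation points always come after the training points. This is a minimal illustration, not the report's own code: the function name `chronological_split` is hypothetical, the observations are placeholder integers, and truncating the split index with `int()` is an assumption about how the boundary was chosen.

```python
def chronological_split(series, train_fraction):
    """Split an ordered series into (train, validation) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # Assumption: truncate the boundary index toward zero.
    split_index = int(len(series) * train_fraction)
    return series[:split_index], series[split_index:]


# Placeholder values standing in for the 1834 hourly observations.
observations = list(range(1834))
train, validation = chronological_split(observations, 0.90)
print(len(train), len(validation))  # 1650 184
```

Under these assumptions the split leaves 1650 observations for training and 184 for validation.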

ANALYSIS: The baseline prediction (or persistence) for the dataset resulted in an RMSE of 46. After performing a grid search for the best ARIMA parameters, the final ARIMA non-seasonal order was (2, 0, 1) with the seasonal order (2, 0, 0, 24). The chosen model processed the validation data with an RMSE of 22, which, as expected, was better than the baseline model.
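For readers unfamiliar with the persistence baseline, the sketch below shows one common way to compute it: each validation point is predicted as the observation immediately before it, and the predictions are scored with RMSE. The helper names and the toy numbers are hypothetical, and the one-step-ahead formulation is an assumption, since the summary does not say exactly how the baseline was evaluated.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def persistence_forecast(train, validation):
    """Predict every validation point as the value observed one step earlier."""
    # The first prediction uses the last training value; later ones use
    # the previous actual validation value (one-step-ahead evaluation).
    return [train[-1]] + list(validation[:-1])


# Toy occupancy counts, not taken from the Birmingham dataset.
train = [100, 120, 150]
validation = [160, 140, 170]

predictions = persistence_forecast(train, validation)
print(predictions)  # [150, 160, 140]
print(round(rmse(validation, predictions), 3))  # 21.602
```

Any candidate model is then only worth keeping if its validation RMSE falls below this baseline figure.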

CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result, and we should consider using ARIMA for further modeling.

Dataset Used: Parking Birmingham Data Set

Dataset ML Model: Time series forecast with numerical attribute

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Parking+Birmingham

The HTML formatted report can be found here on GitHub.

Time Series Model for Housing Starts in the USA Using Python and ARIMA

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Housing Starts in the USA dataset presents a time series problem in which we forecast future outcomes from past data points.

INTRODUCTION: The problem is to forecast the monthly total number of housing starts in the USA. A housing start occurs when excavation begins for the footings or foundation of a building. The dataset describes a time series of housing starts (thousands of units) over more than 60 years (1959-2020), and there are 739 monthly observations. We used the first 80% of the observations for training various models while holding back the remaining observations for validating the final model.

ANALYSIS: The baseline prediction (or persistence) for the dataset resulted in an RMSE of 9.705. After performing a grid search for the best ARIMA parameters, the final ARIMA non-seasonal order was (4, 0, 4) with the seasonal order being (1, 0, 2, 12). The chosen model processed the validation data with an RMSE of 8.763, which, as expected, was better than the baseline model.
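A grid search of the kind mentioned above might look like the following sketch. It assumes the `statsmodels` `SARIMAX` class as the model, scores each candidate by multi-step forecast RMSE on the hold-out set, and uses hypothetical candidate ranges; the summary does not state which library, ranges, or scoring scheme the report used. The forecasting function is passed in as a parameter so the search loop can be tried with any model.

```python
import itertools
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def sarimax_forecast(train, steps, order, seasonal_order):
    """Fit a seasonal ARIMA with statsmodels and forecast `steps` points ahead."""
    # Imported lazily so the search loop itself does not require statsmodels.
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    fitted = SARIMAX(train, order=order, seasonal_order=seasonal_order).fit(disp=False)
    return list(fitted.forecast(steps=steps))


def grid_search(train, validation, orders, seasonal_orders, forecast_fn=sarimax_forecast):
    """Return (best_rmse, order, seasonal_order), or None if every fit failed."""
    best = None
    for order, seasonal_order in itertools.product(orders, seasonal_orders):
        try:
            predicted = forecast_fn(train, len(validation), order, seasonal_order)
        except Exception:
            # Some configurations are invalid or fail to converge; skip them.
            continue
        score = rmse(validation, predicted)
        if best is None or score < best[0]:
            best = (score, order, seasonal_order)
    return best


# Hypothetical candidate ranges, chosen only so they contain the reported orders.
orders = list(itertools.product(range(0, 5), range(0, 2), range(0, 5)))
seasonal_orders = [(P, 0, Q, 12) for P in range(0, 2) for Q in range(0, 3)]
```

Calling `grid_search(train, validation, orders, seasonal_orders)` with real data fits 300 models under these ranges, so in practice the ranges are usually narrowed first.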

CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result, and we should consider using the algorithm for further modeling.

Dataset Used: Housing Starts: Total: New Privately Owned Housing Units Started

Dataset ML Model: Time series forecast with numerical attribute

Dataset Reference: U.S. Census Bureau and U.S. Department of Housing and Urban Development, Housing Starts: Total: New Privately Owned Housing Units Started [HOUSTNSA], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/HOUSTNSA, August 23, 2020.

The HTML formatted report can be found here on GitHub.

Time Series Model for Private Housing Permits for California Using Python and ARIMA

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Private Housing Permits for California dataset presents a time series problem in which we forecast future outcomes from past data points.

INTRODUCTION: The problem is to forecast the monthly total number of building permits for all structure types for the state of California. The dataset describes a time-series of permits issued over 30 years (1991-2020), and there are 354 observations. We used the first 80% of the observations for training various models while holding back the remaining observations for validating the final model.

ANALYSIS: The baseline prediction (or persistence) for the dataset resulted in an RMSE of 2153. After performing a grid search for the best ARIMA parameters, the final ARIMA non-seasonal order was (0, 1, 1) with the seasonal order being (1, 0, 2, 12). The chosen model processed the validation data with an RMSE of 1486, which, as expected, was better than the baseline model.

CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result, and we should consider using the algorithm for further modeling.

Dataset Used: Monthly New Private Housing Units Authorized by Building Permits for California

Dataset ML Model: Time series forecast with numerical attribute

Dataset Reference: U.S. Census Bureau, New Private Housing Units Authorized by Building Permits for California [CABPPRIV], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/CABPPRIV, August 23, 2020.

The HTML formatted report can be found here on GitHub.

Time Series Model for Metro Bus Ridership Using Python and ARIMA

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Metro Bus Ridership dataset presents a time series problem in which we forecast future outcomes from past data points.

INTRODUCTION: The problem is to forecast the monthly number of bus riders for the Los Angeles County Metro district. The dataset describes a time-series of bus riders between January 2009 and June 2020, and there are 138 observations. We used the first 80% of the observations for training various models while holding back the remaining observations for validating the final model.

ANALYSIS: The baseline prediction (or persistence) for the dataset resulted in an RMSE of 2.480 million. After performing a grid search for the best ARIMA parameters, the final ARIMA non-seasonal order was (4, 1, 2) with the seasonal order being (1, 0, 2, 12). The chosen model processed the validation data with an RMSE of 2.397 million, which was only slightly better than the baseline model.
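To put a number on how small that margin is, the short calculation below expresses the gain as a percentage reduction in RMSE relative to the baseline. The function name is hypothetical; the two RMSE figures are the ones reported above.

```python
def rmse_reduction_pct(baseline_rmse, model_rmse):
    """Percentage by which the model's RMSE undercuts the baseline RMSE."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# RMSE values in millions of riders, as reported in the analysis.
reduction = rmse_reduction_pct(2.480, 2.397)
print(f"{reduction:.1f}%")  # 3.3%
```

A reduction of roughly 3% means the ARIMA forecasts were only marginally more accurate than repeating the previous month's ridership.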

CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result, and we should consider using the algorithm for further modeling.

Dataset Used: Metro Interactive Estimated Ridership Stats

Dataset ML Model: Time series forecast with numerical attribute

Dataset Reference: http://isotp.metro.net/MetroRidership/Index.aspx

The HTML formatted report can be found here on GitHub.

Time Series Model for Exports of Goods for California Using Python and ARIMA

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Exports of Goods for California dataset presents a time series problem in which we forecast future outcomes from past data points.

INTRODUCTION: The problem is to forecast the monthly exports of manufactured and non-manufactured commodities for the state of California. The dataset describes a time-series of exports of goods (in millions of dollars) over 25 years (1995-2020), and there are 298 observations. We used the first 80% of the observations for training various models while holding back the remaining observations for validating the final model.

ANALYSIS: The baseline prediction (or persistence) for the dataset resulted in an RMSE of 1101. After performing a grid search for the best ARIMA parameters, the final ARIMA non-seasonal order was (0, 1, 3) with the seasonal order being (1, 0, 1, 12). The chosen model processed the validation data with an RMSE of 724, which, as expected, was better than the baseline model.

CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result, and we should consider using the algorithm for further modeling.

Dataset Used: Monthly Exports of Goods for California

Dataset ML Model: Time series forecast with numerical attribute

Dataset Reference: U.S. Census Bureau, Exports of Goods for California [EXPTOTCA], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/EXPTOTCA, August 2, 2020.

The HTML formatted report can be found here on GitHub.