Zero Hunger · Time Series · Machine Learning

Food Demand Forecasting for Zero Hunger Initiatives

Compared LSTM and ARIMA time-series models on FAOSTAT food consumption, income, and population data from 50 African countries to support data-driven planning for the UN Sustainable Development Goal 2 – Zero Hunger.

2025 · MSc Data Science - Dissertation

Domain: Time Series Forecasting · Food Security

Outcome: LSTM outperformed ARIMA (RMSE 0.221 vs 0.298; MAE 0.173 vs 0.246)

Overview

Project summary

This dissertation investigates how modern time-series models can improve food demand forecasting to reduce food waste and hunger. Focusing on Africa, I used multivariate data – historical per-capita food consumption, Gross National Income (GNI), and population – from FAOSTAT (2019–2022) for 50 countries to build and compare LSTM and ARIMA forecasting models.

The goal was not only to forecast future demand more accurately, but to understand how macro-level indicators influence consumption and how these insights can help governments and organisations allocate food more efficiently in support of SDG 2.

Problem

Research aim & objectives

Traditional food demand forecasts often ignore socioeconomic drivers such as income and population, making it harder to match production with real needs and avoid waste. This project set out to:

Accurately forecast regional food demand for Africa using time-series models.
Integrate macro-level indicators (income, population) with historical consumption.
Compare machine-learning-based LSTM with statistical ARIMA on predictive performance.
Provide actionable insights for policymakers and organisations working on Zero Hunger initiatives.

Approach

Data, methodology & modelling

Data collection: Retrieved secondary data from FAOSTAT’s “Suite for Food Security Indicators”, “Macro-Economic Indicators” and “Population and Employment” for 2019–2022, covering 50 African countries and producing 200 country–year observations.
EDA & correlation analysis: Conducted exploratory data analysis, plotted consumption distributions and scatterplots (e.g., consumption vs GNI), and computed a correlation matrix. Consumption showed a weak positive correlation with GNI (~0.27) and almost no correlation with population (~–0.0007).
Preprocessing: Cleaned and merged FAOSTAT datasets into a single time-series-ready dataframe, handled missing values, and normalised features using MinMaxScaler to stabilise training.
LSTM model: Built a univariate/multivariate LSTM in TensorFlow/Keras with 50 units, ReLU activation, dropout 0.2, Adam optimizer and MSE loss. Trained for 30 epochs with batch size 32 using an 80/20 train–test split.
ARIMA model: Tested stationarity using the Augmented Dickey–Fuller test, then iterated ARIMA(p,d,q) configurations using ACF/PACF plots, selecting ARIMA(4,1,0) based on significance and AIC/BIC values.
Regression analysis: Fitted an OLS regression of consumption on GNI to quantify the direct income–demand relationship (coefficient 0.2903, R² = 0.074), confirming that income matters but does not fully explain consumption patterns.
Evaluation: Compared models using MAE and RMSE on the test set and visually inspected actual vs predicted series plots to assess how well each model tracked turning points.
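
The scaling, splitting and scoring steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the dissertation's code: the project used scikit-learn's MinMaxScaler, whereas here the same formula is written out by hand. The function names, the placeholder numbers and the naive "repeat the last value" forecast are all assumptions made for the example. Fitting the scaler on the training portion only is also an assumption; the write-up does not say how the scaler was fitted.

```python
import math


def fit_min_max(train_values):
    """Return the (min, max) of the training values, used to scale all data."""
    return min(train_values), max(train_values)


def apply_min_max(values, lo, hi):
    """Rescale values with (v - lo) / (hi - lo); a constant series maps to 0.0."""
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def chronological_split(series, train_fraction=0.8):
    """Split an ordered series into train and test parts without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


# Placeholder values for illustration only, NOT FAOSTAT data.
consumption = [2100.0, 2250.0, 2400.0, 2300.0, 2600.0,
               2500.0, 2700.0, 2650.0, 2900.0, 3100.0]

train_raw, test_raw = chronological_split(consumption)  # 80/20, order preserved
lo, hi = fit_min_max(train_raw)                         # fit on train only (assumption)
train = apply_min_max(train_raw, lo, hi)
test = apply_min_max(test_raw, lo, hi)

# Hypothetical stand-in for a model: repeat the last training value.
naive_forecast = [train[-1]] * len(test)

print(f"MAE:  {mae(test, naive_forecast):.3f}")
print(f"RMSE: {rmse(test, naive_forecast):.3f}")
```

Because the errors are computed on scaled values, they are unitless, and RMSE is never smaller than MAE for the same predictions since it weights large misses more heavily.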

Results

Key insights & outcomes

LSTM vs ARIMA: LSTM outperformed ARIMA on the test set with RMSE 0.221 vs 0.298 and MAE 0.173 vs 0.246, supporting the hypothesis that a nonlinear ML model is better suited to this multivariate food demand problem.
Nonlinear relationships: The LSTM model captured complex shifts in consumption that ARIMA missed, especially where changes in income and population interacted with existing consumption trends.
Income effects: OLS regression showed a statistically significant but modest positive relationship between GNI and consumption (p < 0.001, R² = 0.074), indicating that other factors (culture, geography, self-production, aid) also play a major role.
Policy relevance: Forecasts generated by the LSTM model can help anticipate future demand patterns and inform food distribution and stock planning, reducing overproduction in some regions and shortages in others.
Data limitations: The limited time window (2019–2022) and incomplete coverage for some African countries highlight the need for richer, more granular data ecosystems to further improve model performance and impact.
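
Two quick arithmetic checks help put the reported figures in context. The sketch below uses only numbers stated above; the function and variable names are illustrative. The second check assumes the OLS model had a single predictor (GNI) plus an intercept, in which case R² equals the squared correlation coefficient.

```python
import math


def relative_reduction(baseline, improved):
    """Percentage by which the `improved` error is lower than the `baseline` error."""
    return (baseline - improved) / baseline * 100.0


# Reported test-set errors.
arima_rmse, lstm_rmse = 0.298, 0.221
arima_mae, lstm_mae = 0.246, 0.173

rmse_reduction = relative_reduction(arima_rmse, lstm_rmse)
mae_reduction = relative_reduction(arima_mae, lstm_mae)

# For a one-predictor OLS fit with an intercept, |r| = sqrt(R^2).
r_squared = 0.074
implied_correlation = math.sqrt(r_squared)

print(f"RMSE reduction: {rmse_reduction:.1f}%")
print(f"MAE reduction:  {mae_reduction:.1f}%")
print(f"Correlation implied by R^2: {implied_correlation:.3f}")
```

On these figures the LSTM's error is roughly 26% lower than ARIMA's on RMSE and roughly 30% lower on MAE, and the correlation implied by the regression R² (about 0.27) matches the value reported in the correlation analysis.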

Reflection

What I learned

This project strengthened my skills across the full time-series forecasting workflow: sourcing and cleaning macro-level data, running EDA, building and tuning LSTM and ARIMA models, and evaluating them with appropriate metrics.

More importantly, it taught me how to connect technical model outputs to real-world sustainability goals. Translating RMSE and MAE improvements into concrete guidance for Zero Hunger initiatives gave me practice in communicating with non-technical stakeholders – a skill I want to keep applying in future data-for-policy roles.