COVID-19: A Trend Forecast. The Italy Case

DP
Apr 4, 2020
Photo by Jonathan Bean on Unsplash

Italy was hit by COVID-19, the disease caused by a novel coronavirus, in the second half of February 2020 and was able to flatten the curve rapidly over the following weeks using two different approaches:

  • First strategy: quarantine and isolation protocols in small cities and areas
  • Second strategy (applied only from 3 March 2020): social distancing and a complete shutdown of the entire commercial and retail economy

This is the first time Italy has faced an enemy so hard to isolate and manage.

The aim of this analysis is to build a forecast of the death trend, starting from the official data published in the public repository of the Johns Hopkins University Center for Systems Science and Engineering (CSSE).

Data Acquisition

First things first: we have to pull the data down. The links below point to the Johns Hopkins University GitHub repository, which provides regularly updated data. Each file can be downloaded with the wget.download function and then passed to the read_csv function:

import wget

urls = [
    'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv',
    'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv',
]
# Download each file into the current working directory
for url in urls:
    filename = wget.download(url)
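The rest of the analysis works on a DataFrame named df with Country_Region, Date and Deaths columns, whereas the downloaded files hold one column per date. The reshaping step is not shown in the article, so the following is only a minimal sketch of one way to do it. The helper name to_long and the sample values are assumptions made for illustration; the four identifier columns match the layout of the Johns Hopkins time series files.

```python
import pandas as pd

def to_long(wide: pd.DataFrame, value_name: str) -> pd.DataFrame:
    """Reshape a wide table (one column per date) into one row per country and date.

    Hypothetical helper, not part of the original analysis.
    """
    id_cols = ['Province/State', 'Country/Region', 'Lat', 'Long']
    long = wide.melt(id_vars=id_cols, var_name='Date', value_name=value_name)
    # Column headers look like 3/1/20 (month/day/two-digit year)
    long['Date'] = pd.to_datetime(long['Date'], format='%m/%d/%y')
    long = long.rename(columns={'Country/Region': 'Country_Region'})
    # Sum over provinces so that each country has a single row per date
    return long.groupby(['Country_Region', 'Date'], as_index=False)[value_name].sum()

# Tiny made-up sample with the same layout as the downloaded files
sample = pd.DataFrame({
    'Province/State': [None, None],
    'Country/Region': ['Italy', 'Spain'],
    'Lat': [41.9, 40.5],
    'Long': [12.6, -3.7],
    '3/1/20': [10, 0],
    '3/2/20': [15, 0],
})
df_sample = to_long(sample, 'Deaths')
print(df_sample)
```

With the real data, the same helper could be applied to the result of pd.read_csv on the downloaded deaths file, assuming the file keeps this column layout.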

Aggregation and Lagging

Once the data are acquired, the next step is to compute the change in deaths over the following intervals (called lags here):

  1. Lag 1: the change since yesterday
  2. Lag 7: the change over the medium term (one week)
  3. Lag 15: the change over the incubation period. A visible effect over this period indicates whether or not the strategies mentioned above are effective.
df.sort_values(by=['Country_Region', 'Date'], inplace=True)
df['Deaths_Lag_1'] = df.groupby(['Country_Region'])['Deaths'].diff().fillna(0)
df['Deaths_Lag_7'] = df.groupby(['Country_Region'])['Deaths'].diff(7).fillna(0)
df['Deaths_Lag_15'] = df.groupby(['Country_Region'])['Deaths'].diff(15).fillna(0)
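Note that diff(n) returns the change between a row and the row n positions earlier within each country, not the earlier value itself (which shift(n) would return). The sketch below illustrates the difference on a small made-up table; the country names A and B, the values, and the column names Change_1, Change_2 and Previous_1 are assumptions chosen for the example.

```python
import pandas as pd

# Made-up cumulative death counts for two hypothetical countries
toy = pd.DataFrame({
    'Country_Region': ['A'] * 4 + ['B'] * 4,
    'Date': list(pd.date_range('2020-03-01', periods=4)) * 2,
    'Deaths': [1, 3, 6, 10, 0, 2, 2, 5],
})
toy = toy.sort_values(['Country_Region', 'Date'])

# diff(n): current value minus the value n rows earlier, computed per country
change_1 = toy.groupby('Country_Region')['Deaths'].diff().fillna(0)
change_2 = toy.groupby('Country_Region')['Deaths'].diff(2).fillna(0)
# shift(n): the value n rows earlier itself, computed per country
previous_1 = toy.groupby('Country_Region')['Deaths'].shift(1)

toy['Change_1'] = change_1
toy['Change_2'] = change_2
toy['Previous_1'] = previous_1
print(toy)
```

Because the input counts are cumulative, the differenced columns represent new deaths over each window, which is why they can take a bell shape while the cumulative series only grows.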

Plotting the data:

import matplotlib.pyplot as plt

log_flag = False
fig = plt.figure(figsize=(15, 7))
fig.suptitle('Death number by Lag')
ax1 = fig.add_subplot(411)
ax2 = fig.add_subplot(412)
ax3 = fig.add_subplot(413)
ax4 = fig.add_subplot(414)
# Select Italy once and aggregate only the numeric columns by date
cols = ['Deaths', 'Deaths_Lag_1', 'Deaths_Lag_7', 'Deaths_Lag_15']
italy = df.query('Country_Region == "Italy"').groupby('Date')[cols].sum()
italy['Deaths'].plot(ax=ax1, color='SkyBlue', label='Lag 0', logy=log_flag)
italy['Deaths_Lag_1'].plot(ax=ax2, color='Orange', label='Lag 1', logy=log_flag)
italy['Deaths_Lag_7'].plot(ax=ax3, color='RoyalBlue', label='Lag 7', logy=log_flag)
italy['Deaths_Lag_15'].plot(ax=ax4, color='Magenta', label='Lag 15', logy=log_flag)
fig.legend(labels=['Lag 0', 'Lag 1', 'Lag 7', 'Lag 15'], loc='center left')
Italy death trend

So far we can see an encouraging trend that resembles a bell, suggesting that the virus may have passed the peak of its lethal force.

Forecasting

To check the statement above, we can process the data with the Prophet package.

Note that the number of deaths has been pre-processed with a log function in order to make the algorithm more sensitive to relative differences.

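import numpy as np
import pandas as pd
# The package is published as "prophet"; releases before 1.0 used the name "fbprophet"
from prophet import Prophet
from prophet.plot import add_changepoints_to_plot

df_IT = df.query('Country_Region == "Italy"')
df_IT_L15 = df_IT[['Date', 'Deaths_Lag_15']]
df_IT_L15.tail(3)

# Use data lagged by 15
d = df_IT_L15
# Prepare the data set as Prophet requires
df_p = pd.DataFrame()
df_p['ds'] = pd.to_datetime(d['Date'])
df_p['y'] = d.iloc[:, 1]
# Log; zeros are replaced with NaN first so that log(0) is never taken
df_p[['y']] = np.log(df_p[['y']].replace(0, np.nan))
prophet = Prophet()
prophet.fit(df_p)
future = prophet.make_future_dataframe(periods=30, freq='D')
forecast = prophet.predict(future)
fig = prophet.plot(forecast)
a = add_changepoints_to_plot(fig.gca(), prophet, forecast)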

The forecast is a 30-day projection based on the data lagged by 15 days (the incubation period).

Lag 15 forecast for next 30 days

This chart depicts a really interesting situation in which the forecast confidence interval seems to suggest a bell.

Unfortunately, the curve is not yet complete or settled: at this point its shape has not bent enough to allow a more accurate statement.
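Since the model was fitted on log-transformed values, the forecast and its interval are expressed in log units. The following is a minimal sketch of converting them back to death counts with np.exp. The frame forecast_log is a made-up stand-in for the output of prophet.predict, and the name forecast_deaths is an assumption; yhat, yhat_lower and yhat_upper are the column names Prophet uses for the prediction and its uncertainty interval.

```python
import numpy as np
import pandas as pd

# Stand-in for the frame returned by prophet.predict(); the numbers are made up
forecast_log = pd.DataFrame({
    'ds': pd.date_range('2020-04-05', periods=3),
    'yhat': np.log([100.0, 80.0, 60.0]),
    'yhat_lower': np.log([50.0, 40.0, 30.0]),
    'yhat_upper': np.log([200.0, 160.0, 120.0]),
})

# Undo the log transform on the prediction and on both interval bounds
cols = ['yhat', 'yhat_lower', 'yhat_upper']
forecast_deaths = forecast_log.copy()
forecast_deaths[cols] = np.exp(forecast_deaths[cols])
print(forecast_deaths)
```

An interval that is symmetric in log units becomes asymmetric after the conversion, with a wider upper side, which is worth keeping in mind when reading the shape of the chart.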

Conclusion

Italy is facing a really hard challenge: people are obliged to stay at home, and the shutdown could jeopardize many businesses.

But according to the forecast, this sacrifice seems to be effective.

My prediction is that the coming days, say the next week, will be the most important for establishing the definitive trend in the data.
