Italy was hit by COVID-19, a disease caused by a novel coronavirus, in the second half of February 2020 and has been working to rapidly flatten the curve in the following weeks using two different approaches:
- First strategy: quarantine/isolation protocols in small cities/areas
- Second strategy (applied only from 3 March 2020): social distancing and a complete shutdown of the commercial and retail economy
For Italy it is the first time we are facing an enemy so hard to isolate and manage.
The aim of this analysis is to build a forecast of the trend in deaths, starting from the official data obtained from the Johns Hopkins University CSSE public repository.
Data Acquisition
First things first: we have to pull the data down. Here are the links that you'll want to pass into the read_csv
function.
These links point to the Johns Hopkins University GitHub page, where they provide regularly updated data, so the files can be easily downloaded with the wget
function:
import wget

urls = ['https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv',
        'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv']
for url in urls:
    filename = wget.download(url)
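The downloaded CSVs are in wide format, with one column per date, while the lagging code further below expects one row per country and date. A minimal reshaping sketch, where the tiny inline frame stands in for the real read_csv output and the column names Country_Region, Date and Deaths are the ones used later:

```python
import pandas as pd

# Tiny stand-in for the JHU wide-format CSV (one column per date);
# in practice this frame comes from pd.read_csv on the downloaded file.
wide = pd.DataFrame({
    'Province/State': [None],
    'Country/Region': ['Italy'],
    'Lat': [41.9], 'Long': [12.6],
    '3/1/20': [34], '3/2/20': [52],
})

# Melt the date columns into rows, then rename to the column
# names used by the aggregation code below.
df = wide.melt(
    id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'],
    var_name='Date', value_name='Deaths'
)
df = df.rename(columns={'Country/Region': 'Country_Region'})
df['Date'] = pd.to_datetime(df['Date'])
```
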
Aggregation and Lagging
Once the data are acquired, the next step is to lag them over the following intervals:
- Lag 1: to check yesterday's situation
- Lag 7: to capture the medium-term situation
- Lag 15: to monitor the incubation period. A meaningful signal over this interval will tell us whether or not the strategies mentioned above are effective.
df.sort_values(by=['Country_Region', 'Date'], inplace=True)
df['Deaths_Lag_1'] = df.groupby(['Country_Region'])['Deaths'].diff().fillna(0)
df['Deaths_Lag_7'] = df.groupby(['Country_Region'])['Deaths'].diff(7).fillna(0)
df['Deaths_Lag_15'] = df.groupby(['Country_Region'])['Deaths'].diff(15).fillna(0)
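As a quick illustration of what .diff(n) produces here (on a toy cumulative series, not real data): the lag-1 column holds daily new deaths, while lag-7 holds the change over the last week:

```python
import pandas as pd

# Toy cumulative death count over 8 days.
cumulative = pd.Series([0, 2, 5, 10, 18, 30, 45, 70])

# diff() compares each value with the previous day;
# diff(7) compares it with the value 7 days earlier.
lag_1 = cumulative.diff().fillna(0)   # new deaths since yesterday
lag_7 = cumulative.diff(7).fillna(0)  # new deaths over the last week
```
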
Plotting the data:
import matplotlib.pyplot as plt

log_flag = False
fig = plt.figure(figsize=(15, 7))
fig.suptitle('Death number by Lag')
ax1 = fig.add_subplot(411)
ax2 = fig.add_subplot(412)
ax3 = fig.add_subplot(413)
ax4 = fig.add_subplot(414)
df.query('Country_Region == "Italy"').groupby(['Date']).sum()['Deaths'].plot(ax=ax1, color='SkyBlue', label='Lag 0', logy=log_flag)
df.query('Country_Region == "Italy"').groupby(['Date']).sum()['Deaths_Lag_1'].plot(ax=ax2, color='Orange', label='Lag 1', logy=log_flag)
df.query('Country_Region == "Italy"').groupby(['Date']).sum()['Deaths_Lag_7'].plot(ax=ax3, color='RoyalBlue', label='Lag 7', logy=log_flag)
df.query('Country_Region == "Italy"').groupby(['Date']).sum()['Deaths_Lag_15'].plot(ax=ax4, color='Magenta', label='Lag 15', logy=log_flag)
fig.legend(labels=['Lag 0', 'Lag 1', 'Lag 7', 'Lag 15'], loc="center left")
So far we can see an encouraging trend, shaped like a bell, suggesting that the virus has spent most of its fatal force.
Forecasting
In order to confirm the observation above, we can process the data with the Prophet
package.
Note that the number of deaths has been pre-processed with a log transform, in order to make the algorithm more sensitive to relative differences.
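The effect of the log transform can be seen on a toy series: equal multiplicative growth becomes equal additive steps, so the model reacts to relative rather than absolute changes:

```python
import numpy as np

# Toy counts growing by a factor of 10 each step.
counts = np.array([10.0, 100.0, 1000.0])

# In log space the two steps are identical (both equal log(10)),
# even though the absolute increases differ by a factor of 10.
log_counts = np.log(counts)
steps = np.diff(log_counts)
```
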
import numpy as np
import pandas as pd
# Note: in newer releases the package is imported as `prophet`
from fbprophet import Prophet
from fbprophet.plot import add_changepoints_to_plot

# Use data lagged by 15 days
df_IT = df.query('Country_Region == "Italy"')
df_IT_L15 = df_IT[['Date', 'Deaths_Lag_15']]
df_IT_L15.tail(3)

d = df_IT_L15

# Prepare the data set as Prophet requires (columns 'ds' and 'y')
df_p = pd.DataFrame()
df_p['ds'] = pd.to_datetime(d['Date'])
df_p['y'] = d.iloc[:, 1]

# Log transform (zeros replaced with NaN to avoid log(0))
df_p[['y']] = np.log(df_p[['y']].replace(0, np.nan))

prophet = Prophet()
prophet.fit(df_p)
future = prophet.make_future_dataframe(periods=30, freq='D')
forecast = prophet.predict(future)
fig = prophet.plot(forecast)
a = add_changepoints_to_plot(fig.gca(), prophet, forecast)
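Because the model was fit on log-transformed counts, yhat, yhat_lower and yhat_upper come back in log space. A sketch of mapping them back to death counts with np.exp, where the small inline frame stands in for Prophet's forecast output:

```python
import numpy as np
import pandas as pd

# Stand-in for the relevant columns of Prophet's `forecast` frame,
# holding log-space values (here log of 100/250 deaths etc.).
forecast_log = pd.DataFrame({
    'yhat':       [np.log(100.0), np.log(250.0)],
    'yhat_lower': [np.log(80.0),  np.log(200.0)],
    'yhat_upper': [np.log(120.0), np.log(300.0)],
})

# Exponentiating undoes the log transform applied before fitting,
# returning the point forecast and its bounds on the original scale.
forecast_deaths = forecast_log[['yhat', 'yhat_lower', 'yhat_upper']].apply(np.exp)
```
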
The forecast is a 30-day projection starting from data lagged by 15 days (the incubation period).
The chart depicts a really interesting situation: the forecast confidence interval seems to suggest a bell shape.
Unfortunately the curve is not yet complete and settled: at the time of writing, its shape has not bent downward enough for us to be more accurate.
Conclusion
Italy is facing a really hard challenge, where people are obliged to stay at home and the shutdown may jeopardize several economic activities.
But according to the forecast, this sacrifice seems to be effective.
My prediction is that the coming days, say the next week, will be the most decisive for defining the definitive trend in the data.