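A Python program for analysing ECDC data

Below is Python code that analyses the COVID-19 JSON data published by the European CDC (ECDC).

The ECDC data can be downloaded from the data download link. This program, however, fetches it live: the data is downloaded each time the program runs.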
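To illustrate the two ways of getting the data (a saved download or a live fetch), here is a minimal sketch that wraps both in one helper. The function name `load_records` and its `path` parameter are hypothetical, and the sketch uses the standard library's `urllib` instead of the `requests` call in the program, only to keep the example free of extra dependencies.

```python
import json
from urllib.request import urlopen

ECDC_URL = 'https://opendata.ecdc.europa.eu/covid19/casedistribution/json'


def load_records(path=None, url=ECDC_URL):
    """Return the list of daily records from a saved file or a live download.

    Hypothetical helper: `path` points at a previously saved copy of the
    JSON; when it is None the data is fetched from `url` instead.
    """
    if path is not None:
        with open(path, 'r', encoding='utf-8') as f:
            data = json.load(f)
    else:
        with urlopen(url) as resp:
            data = json.load(resp)
    # The payload wraps the rows in a top-level "records" list.
    return data['records']
```

Calling `load_records()` with no arguments downloads the data at run time, while `load_records('some_saved_copy.json')` reads a local file (the file name here is only an example).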

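1: Download the data, or load it from a file

2: Format the data, i.e. convert fields such as cases and deaths to int

3: Sum cases and deaths for each country

4: Sort by cumulative deaths, largest first (this is the key the code below sorts on; change it to sumcase to rank by total cases)

5: Take the top 20 countries in that ranking

6: Analyse the data for the last 20 days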
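The steps above can be tried without any network access. The sketch below runs steps 2 to 6 on a small made-up data set, ranking by cumulative deaths as the program below does. The country names, codes and numbers are invented for illustration; only the column names are taken from the ECDC data.

```python
from datetime import datetime, timedelta

import pandas as pd

# Made-up records in the same shape as the feed (numbers arrive as strings).
now = datetime.now()
records = []
for code, name, scale in [('XXX', 'Xland', 1), ('YYY', 'Yland', 3)]:
    for days_ago in range(30):
        d = now - timedelta(days=days_ago)
        records.append({
            'dateRep': d.strftime('%d/%m/%Y'),
            'cases': str(10 * scale),
            'deaths': str(scale),
            'countriesAndTerritories': name,
            'countryterritoryCode': code,
        })
df = pd.DataFrame.from_records(records)

# Step 2: convert the counts to int and parse the report date.
df[['cases', 'deaths']] = df[['cases', 'deaths']].astype(int)
df['date'] = pd.to_datetime(df['dateRep'], format='%d/%m/%Y')

# Step 3: cumulative cases and deaths per country.
totals = df.groupby('countryterritoryCode').agg(
    country=('countriesAndTerritories', 'last'),
    sumcase=('cases', 'sum'),
    sumdeath=('deaths', 'sum'))

# Steps 4 and 5: rank by cumulative deaths and keep at most 20 countries.
top = totals.sort_values('sumdeath', ascending=False).head(20)

# Step 6: daily figures for the last 20 days, newest first.
cutoff = now - timedelta(days=20)
recent = df[df['date'] > cutoff]
for code in top.index:
    rows = recent[recent['countryterritoryCode'] == code]
    rows = rows.sort_values('date', ascending=False)
    print(top.loc[code, 'country'], top.loc[code, 'sumcase'], top.loc[code, 'sumdeath'])
    print(rows[['dateRep', 'cases', 'deaths']].to_string(index=False))
```

Sorting returns a new frame here rather than using `inplace=True`, which avoids modifying a filtered slice of the original data.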

"""
Created on Mon Apr 13 18:18:19 2020

@author: liwenz
"""

import json
from datetime import datetime, timedelta

import pandas as pd
import requests

# To analyse a previously downloaded file instead of fetching live:
# with open('corid2019.json', 'r') as f:
#     data = json.load(f)

# Download the data at run time
url = 'https://opendata.ecdc.europa.eu/covid19/casedistribution/json'
r = requests.get(url)
r.raise_for_status()
data = r.json()

# Convert the numeric fields to int and parse the report date
df = pd.DataFrame.from_records(data['records'])
for col in ['day', 'month', 'year', 'cases', 'deaths']:
    df[col] = df[col].astype('int')
df['date'] = pd.to_datetime(df['dateRep'], format='%d/%m/%Y')

# Cut-off date for the "last 20 days" window
today = datetime.now()
print(today.day, today.month, today.year)
daybefore20 = today - timedelta(days=20)
print(daybefore20)

df1 = df[df['date'] > daybefore20]

# Cumulative cases and deaths per country
df2 = df.groupby('countryterritoryCode').agg(
    country=('countriesAndTerritories', 'last'),
    sumcase=('cases', 'sum'),
    sumdeath=('deaths', 'sum'),
    popu=('popData2019', 'last'))

# Sort by cumulative deaths, largest first, and keep the top 20 countries
a = df2.sort_values(by='sumdeath', ascending=False).head(20)

# Print each country's totals, then its daily figures for the last 20 days
for i, x in enumerate(a.index, start=1):
    print(i, a.loc[x, 'country'], a.loc[x, 'sumcase'], a.loc[x, 'sumdeath'], a.loc[x, 'popu'])
    # Sort a new copy instead of sorting a slice in place (avoids SettingWithCopyWarning)
    tmp = df1[df1['countryterritoryCode'] == x].sort_values(by='date', ascending=False)
    print(tmp.loc[:, ['dateRep', 'cases', 'deaths']].to_string(index=False))


The column headers of the ECDC data have changed: the former popData2018 column is now popData2019. The current header information is as follows:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25726 entries, 0 to 25725
Data columns (total 12 columns):
dateRep                    25726 non-null object
day                        25726 non-null int32
month                      25726 non-null int32
year                       25726 non-null int32
cases                      25726 non-null int32
deaths                     25726 non-null int32
countriesAndTerritories    25726 non-null object
geoId                      25726 non-null object
countryterritoryCode       25662 non-null object
popData2019                25564 non-null float64
continentExp               25726 non-null object
date                       25726 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(1), int32(5), object(5)
memory usage: 1.9+ MB
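
Two details in this listing affect the aggregation. The population column's name depends on the data version (popData2018 or popData2019), and countryterritoryCode is missing in 64 of the 25726 rows. pandas `groupby` drops rows with a missing key by default, so those rows would not appear in the per-country totals. The sketch below is one possible way to handle both, not part of the original program: the helper names `population_column` and `totals_by_country` are hypothetical, and falling back to geoId for a missing code is an assumption about what is wanted.

```python
import numpy as np
import pandas as pd


def population_column(df):
    """Return whichever popData* column is present (hypothetical helper)."""
    matches = [c for c in df.columns if c.startswith('popData')]
    if not matches:
        raise KeyError('no popData* column found')
    return matches[0]


def totals_by_country(df):
    """Sum cases and deaths per country, keeping rows whose code is missing."""
    # Assumed fallback: use geoId where the three-letter code is absent.
    key = df['countryterritoryCode'].fillna(df['geoId'])
    pop = population_column(df)
    return df.groupby(key).agg(
        sumcase=('cases', 'sum'),
        sumdeath=('deaths', 'sum'),
        popu=(pop, 'last'))


# Made-up rows: the last one has no country code and no population figure.
sample = pd.DataFrame({
    'geoId': ['XX', 'XX', 'ZZ'],
    'countryterritoryCode': ['XXX', 'XXX', np.nan],
    'cases': [5, 7, 3],
    'deaths': [1, 0, 2],
    'popData2018': [1000.0, 1000.0, np.nan],
})
totals = totals_by_country(sample)
print(totals)
```

In this sample the row keyed by geoId ZZ stays in the result, whereas grouping directly on countryterritoryCode would leave it out.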
