Sunday, October 11, 2015

Data Management and Visualization - Wesleyan University (Coursera) - Week 2 - Ammar Shigri

CODE

import pandas
import numpy


pandas.set_option('display.float_format',lambda x:'%f'%x)



data = pandas.read_csv('20151009gap.csv', low_memory=False)

print(len(data))            #Number of observations (Rows)
print(len(data.columns))    #Number of Variables (Columns)




# Convert the variables of interest to numeric.
# errors='coerce' turns entries that cannot be parsed (e.g. blanks) into NaN.
data['polityscore'] = pandas.to_numeric(data['polityscore'], errors='coerce')
data['suicideper100th'] = pandas.to_numeric(data['suicideper100th'], errors='coerce')
data['employrate'] = pandas.to_numeric(data['employrate'], errors='coerce')
data['incomeperperson'] = pandas.to_numeric(data['incomeperperson'], errors='coerce')
data['co2emissions'] = pandas.to_numeric(data['co2emissions'], errors='coerce')


"""
print("count of similar suiciderate distributed over countries, plus percentage of that value")
c1 = data["suicideper100th"].value_counts(sort=False, dropna = False)
print (c1)

p1 = data["suicideper100th"].value_counts(sort=False, normalize=True)
print (p1)


print("count of similar employment rate distributed over countries, plus percentage of that value")
c2 = data["employrate"].value_counts(sort=False, dropna = False)
print (c2)

p2 = data["employrate"].value_counts(sort=False, normalize=True)
print (p2)


print("count of similar income distributed over countries, plus percentage of that value")
c3 = data["incomeperperson"].value_counts(sort=False, dropna = False)
print (c3)

p3 = data["incomeperperson"].value_counts(sort=False, normalize=True)
print (p3)


print("count of similar political score distributed over countries, plus percentage of that value")
c4 = data["polityscore"].value_counts(sort=False, dropna = False)
print (c4)

p4 = data["polityscore"].value_counts(sort=False, normalize=True)
print (p4)


print("count of similar Co2Emissions distributed over countries, plus percentage of that value")
c5 = data["co2emissions"].value_counts(sort=False, dropna = False)
print (c5)

p5 = data["co2emissions"].value_counts(sort=False, normalize=True)
print (p5)


ct1 = data.groupby("suicideper100th").size()
print (ct1)

pt1 = data.groupby("suicideper100th").size()*100/len(data)
print(pt1)
"""



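# Criteria-1 subset: countries with employrate <= 60.
# Comparisons with NaN evaluate to False, so rows missing any of these variables are excluded.
sub1=data[(data['incomeperperson']<=110000) & (data['polityscore']<=10) & (data['employrate']<=60) & (data['suicideper100th']>=1)]

# Work on a copy so that later changes do not touch the original data frame.
sub2=sub1.copy()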


print('\n\n\n\n\n\n\ncriteria-1 / what is polity score when Employrate is less and equal to 60\n')

print('\n\ncount employrate from table with criteria-1\n')
c6 = sub2["employrate"].value_counts(sort=False)
print (c6)


"""
print('percentage suicideper100th from table with criteria-1')
p6 = sub2["suicideper100th"].value_counts(sort=False, normalize=True)
print (p6)
"""


print('\n\ncount polityscore per person from table with criteria-1\n')
c7 = sub2["polityscore"].value_counts(sort=False)
print (c7)

"""
print('\npercentage polityscore per person from table with criteria-1\n')
p7 = sub2["polityscore"].value_counts(sort=False, normalize=True)
print (p7)
"""

print('\n\ncount suicideper100th from the table with criteria-1\n')
c10 = sub2["suicideper100th"].value_counts(sort=False)
print (c10)





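# Criteria-2 subset: countries with employrate > 60.
# Comparisons with NaN evaluate to False, so rows missing any of these variables are excluded.
sub3=data[(data['incomeperperson']<=110000) & (data['polityscore']<=10) & (data['employrate']>60) & (data['suicideper100th']>=1)]

# Work on a copy so that later changes do not touch the original data frame.
sub4=sub3.copy()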


print('\n\n\n\n\n\n criteria-2 / what is polity score when Employrate is greater than 60\n')

print('\n\n\ncount employrate from table with criteria-1\n')
c8 = sub4["employrate"].value_counts(sort=False)
print (c8)


print('\n\n\ncount polityscore per person from table with criteria-2\n')
c9 = sub4["polityscore"].value_counts(sort=False)
print (c9)

print('\n\n\ncount suicideper100th from the table with criteria-1\n')
c11 = sub4["suicideper100th"].value_counts(sort=False)
print (c11)















OUTPUT from the code


criteria-1 / what is polity score when Employrate is less and equal to 60

count employrate from table with criteria-1
52.099998    1
57.200001    1
56.299999    2
51.299999    1
58.599998    1
59.299999    1
53.099998    1
58.799999    2
46.900002    1
59.900002    3
58.500000    1
34.900002    1
46.400002    1
54.400002    1
37.400002    1
57.900002    1
39.000000    1
40.099998    1
47.799999    1
42.400002    1
47.099998    1
45.700001    1
46.000000    2
47.299999    3
48.599998    2
49.599998    1
50.500000    1
51.400002    1
52.700001    1
53.400002    2
            ..
59.099998    2
48.700001    1
58.900002    3
49.500000    1
41.099998    1
57.599998    1
56.400002    1
50.900002    2
52.500000    1
51.000000    2
42.799999    1
56.799999    1
57.299999    1
59.000000    1
44.299999    1
41.599998    1
50.700001    1
57.500000    2
53.500000    1
58.200001    2
54.599998    1
51.200001    2
55.099998    1
55.400002    1
59.799999    1
55.900002    2
46.200001    1
59.700001    1
42.000000    1
56.500000    1
Name: employrate, dtype: int64

count polityscore per person from table with criteria-1
0.000000       2
1.000000       1
2.000000       1
3.000000       2
4.000000       2
5.000000       4
6.000000       4
7.000000       5
8.000000      11
9.000000      12
10.000000     22
-10.000000     1
-9.000000      3
-8.000000      1
-7.000000      4
-6.000000      1
-4.000000      4
-3.000000      3
-2.000000      2
Name: polityscore, dtype: int64

count suicideper100th from the table with criteria-1
5.888479     1
1.799904     1
2.206169     1
3.741588     1
4.848770     1
5.931845     1
6.597168     1
7.699330     1
15.542603    1
9.216544     1
10.171870    1
11.213970    1
12.367980    1
13.094370    1
14.776250    1
15.953850    1
16.959240    1
17.032646    1
18.583826    1
16.234370    1
20.317930    1
22.404560    1
6.882952     1
26.874690    1
9.875281     1
3.940259     1
10.059320    1
33.341860    1
35.752872    1
2.816705     1
            ..
7.858619     1
8.021970     1
9.211085     1
8.188375     1
5.838315     1
20.162010    1
10.571910    1
7.060184     1
18.954570    1
13.637060    1
3.563325     1
27.874160    1
19.422610    1
6.265789     1
4.417507     1
6.021882     1
7.745065     1
5.213720     1
12.122269    1
4.119620     1
10.645740    1
12.872222    1
15.714571    1
12.216769    1
13.089616    1
7.214221     1
15.538490    1
6.105282     1
20.369590    1
14.091530    1
Name: suicideper100th, dtype: int64

 criteria-2 / what is polity score when Employrate is greater than 60

count employrate from table with criteria-1
62.299999    1
73.199997    1
71.000000    1
73.099998    1
72.000000    1
68.000000    1
62.400002    1
61.799999    1
75.199997    1
60.400002    2
71.699997    1
61.700001    1
62.700001    1
63.900002    1
63.799999    1
78.199997    2
63.700001    1
63.099998    1
60.900002    1
61.500000    3
63.200001    1
63.500000    1
64.500000    1
65.099998    1
66.199997    1
65.900002    1
68.099998    1
65.000000    3
70.400002    2
71.599998    1
72.800003    1
64.199997    1
65.599998    1
75.699997    1
76.000000    1
77.000000    1
78.900002    1
79.800003    1
80.699997    1
81.300003    1
64.900002    1
83.199997    2
81.500000    1
61.000000    2
60.700001    1
71.800003    1
66.000000    1
64.300003    1
67.300003    1
83.000000    1
68.300003    1
68.900002    1
60.500000    1
66.800003    1
71.300003    1
61.299999    1
Name: employrate, dtype: int64

count polityscore per person from table with criteria-2
0.000000       2
1.000000       2
2.000000       1
4.000000       2
5.000000       3
6.000000       6
7.000000       8
8.000000       7
9.000000       2
10.000000     10
-1.000000      4
-10.000000     1
-8.000000      1
-7.000000      6
-6.000000      1
-5.000000      2
-4.000000      2
-3.000000      2
-2.000000      3
Name: polityscore, dtype: int64

count suicideper100th from the table with criteria-1
1.380965     1
2.034178     1
4.414990     1
5.767406     1
6.057740     1
7.443826     1
8.470030     1
9.873761     1
10.100990    1
11.396111    1
8.973104     1
13.548420    1
14.554677    1
16.913248    1
2.234896     1
11.980497    1
25.404600    1
26.219198    1
9.127511     1
1.658908     1
10.550375    1
2.515721     1
4.777007     1
4.751084     1
10.071942    1
9.847460     1
1.922485     1
12.411181    1
13.905267    1
12.019036    1
            ..
14.547167    1
4.961071     1
11.653322    1
8.204222     1
4.409532     1
8.283071     1
8.913363     1
10.171735    1
8.211067     1
8.164005     1
9.257976     1
10.129350    1
1.392951     1
6.369888     1
13.239810    1
14.713020    1
9.927033     1
11.655210    1
4.907702     1
6.288555     1
4.484753     1
12.179760    1
11.115830    1
14.680936    1
12.289122    1
13.117949    1
7.184853     1
6.811439     1
10.937718    1
14.538357    1
Name: suicideper100th, dtype: int64



Description

Definition of variables
1 - Employrate (2007 total employees age 15+, % of population): percentage of the total population aged 15 and above that was employed during the given year.
2 - Polityscore (2009 democracy score, Polity): overall polity score from the Polity IV dataset, calculated by subtracting an autocracy score from a democracy score. It is a summary measure of a country's democratic and free nature; -10 is the lowest value and 10 the highest.
3 - Incomeperperson (2010 Gross Domestic Product per capita in constant 2000 US$): inflation, but not the differences in the cost of living between countries, has been taken into account.
4 - Co2emissions (2006 cumulative CO2 emissions, metric tons): total amount of CO2 emitted, in metric tons, since 1751. Source: CDIAC (Carbon Dioxide Information Analysis Center).
5 - Suicideper100TH (2005 suicide rate, age adjusted, per 100,000): mortality due to self-inflicted injury per 100,000 standard population, age adjusted.

I am particularly interested in finding out how the suicide rate relates to employment rate, polity score, income per person and CO2 emissions.

From the output of the code I was able to observe one interesting fact about how the polity score relates to employrate.

As you can see, when employrate is less than or equal to 60, polityscore leans towards the higher values: 45 of the 85 countries in that group (about 53%) have a score of 8 or more.
When employrate is higher than 60, polityscore is more evenly distributed from higher to lower values: only 19 of the 65 countries in that group (about 29%) have a score of 8 or more.
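
To make the comparison above easier to check, here is a minimal sketch that summarizes the two polityscore frequency tables. The counts are copied by hand from the output above; the helper name `summarize` and the cut-off of 8 for a "high" polity score are my own illustrative choices, not part of the original assignment code.

```python
import pandas

# Polity-score frequency tables copied from the output above
# (key = polity score, value = number of countries).
low_employ = pandas.Series(
    {0: 2, 1: 1, 2: 1, 3: 2, 4: 2, 5: 4, 6: 4, 7: 5, 8: 11, 9: 12, 10: 22,
     -10: 1, -9: 3, -8: 1, -7: 4, -6: 1, -4: 4, -3: 3, -2: 2})   # employrate <= 60
high_employ = pandas.Series(
    {0: 2, 1: 2, 2: 1, 4: 2, 5: 3, 6: 6, 7: 8, 8: 7, 9: 2, 10: 10,
     -1: 4, -10: 1, -8: 1, -7: 6, -6: 1, -5: 2, -4: 2, -3: 2, -2: 3})  # employrate > 60


def summarize(counts, threshold=8):
    """Return (number of countries, share with score >= threshold, mean score).

    The threshold of 8 is an assumed cut-off for a "high" polity score.
    """
    total = int(counts.sum())
    high_share = float(counts[counts.index >= threshold].sum()) / total
    # Mean polity score, weighting each score by its number of countries.
    mean_score = sum(score * n for score, n in counts.items()) / total
    return total, high_share, mean_score


low_total, low_share, low_mean = summarize(low_employ)
high_total, high_share, high_mean = summarize(high_employ)

print('employrate <= 60: n=%d, share with score >= 8: %.3f, mean score: %.2f'
      % (low_total, low_share, low_mean))
print('employrate >  60: n=%d, share with score >= 8: %.3f, mean score: %.2f'
      % (high_total, high_share, high_mean))
```

Comparing shares rather than raw counts matters here because the two groups are not the same size.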

This is an interesting finding. I had previously assumed that employrate and polityscore are positively correlated, which may not be the case and needs to be investigated.

My core goal is to find out how the suicide rate relates to the other variables, but by building frequency tables I was able to identify something I was not initially aware of.
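
One caveat about the frequency tables in the output: for continuous variables such as employrate and suicideper100th almost every value occurs only once, so the raw counts say little. A possible next step, sketched below, is to group the values into ranges with `pandas.cut` before counting. The values are the first few employrate entries from the criteria-1 output above (repeated according to their counts); the bin edges are an assumption chosen only for illustration.

```python
import pandas

# First few employrate values from the criteria-1 output above,
# repeated according to the counts shown there.
employrate = pandas.Series([
    52.099998, 57.200001, 56.299999, 56.299999, 51.299999,
    58.599998, 59.299999, 53.099998, 58.799999, 58.799999,
    46.900002, 59.900002, 59.900002, 59.900002,
])

# Illustrative bin edges (an assumption, not from the original analysis).
bins = [40, 45, 50, 55, 60]

# pandas.cut assigns each value to a right-closed interval such as (55, 60].
binned = pandas.cut(employrate, bins=bins)

# Frequency table per range; sort=False keeps the ranges in ascending order.
freq = binned.value_counts(sort=False)
print(freq)

# Same table as percentages of all values.
print(freq * 100 / len(employrate))
```

The same approach would apply to suicideper100th and incomeperperson, with bin edges suited to their ranges.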

I will do further research as the course progresses to identify interesting facts about the data.



