Pandas: Heroes Of Pymoli - Analysis Report
Observed Trend 1: Mourning Blade is the most profitable item for file 2, Retribution Axe is the most profitable item for file 1
Observed Trend 2: most people only buy 1 item and male is the main buyer for both file 1 & 2
Observed Trend 3: age group(20-24) consumed the most items for file 1 & 2
# Read the JSON file
fpath = input ("Please choice file (1) or file (2)?" )
if (fpath == "1" ):
JS = pd .read_json ("purchase_data.json" )
elif (fpath == "2" ):
JS = pd .read_json ("purchase_data2.json" )
else :
print ("no such file" )
JS .head ()
ee = JS .to_excel ("test" + fpath + ".xlsx" ,"Sheet1" )
Please choice file (1) or file (2)?1
# Total Number of Players
Player_Count = pd .DataFrame ([{"Total Players" :JS ["SN" ].nunique ()}])
Player_Count
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
</style>
Purchasing Analysis (Total)
#Number of Unique Items
UItem = JS ["Item ID" ].nunique ()
#Average Purchase Price
AvePur = JS ["Price" ].mean ()
#Total Number of Purchases
NPur = JS ["Price" ].count ()
#Total Revenue
TRev = JS ["Price" ].sum ()
PurAnsys = pd .DataFrame ([{"Number of Unique Items" :UItem , "Average Price" :"$" + str (round (AvePur ,2 )), "Number of Purchases" :NPur , "Total Revenue" :"$" + str (round (TRev ,2 ))}])
PurAnsys = PurAnsys [["Number of Unique Items" ,"Average Price" ,"Number of Purchases" ,"Total Revenue" ]]
PurAnsys
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
</style>
Number of Unique Items
Average Price
Number of Purchases
Total Revenue
0
183
$2.93
780
$2286.33
TotalC = JS ["Gender" ].count ()
MaleC = JS ["Gender" ].value_counts ()
MaleP = round (MaleC / TotalC * 100 ,2 )
# Creating a new DataFrame using both duration and count
GenderDemo = pd .DataFrame ({"Percentage of players" :MaleP ,"Total Count" :MaleC })
GenderDemo
#Alternative Method
#TotalC = JS["Gender"].count()
#MaleC =pd.DataFrame(JS["Gender"].value_counts())
#MaleP =pd.DataFrame(round(MaleC / TotalC*100,2))
#MaleC = MaleC.reset_index().rename(columns={'index':'Gender','Gender':'Total Count'})
#MaleP = MaleP.reset_index().rename(columns={'index':'Gender','Gender':'Percentage of Players'})
#GenderDemo = pd.merge(MaleP,MaleC,how='left',on='Gender')
#GenderDemo
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
</style>
Percentage of players
Total Count
Male
81.15
633
Female
17.44
136
Other / Non-Disclosed
1.41
11
Purchasing Analysis (Gender)
The below each broken by gender
#different method of normalization
#JSMax = JS['Price'].max()
#JSMin = JS['Price'].min()
#JSMean = JS['Price'].mean()
#JSStd = JS['Price'].std()
#JSN = (JS['Price'] - JSMin)/(JSMax-JSMin)
#JS['JSNor']=JSN
#JS.head()
JSGroup = JS .groupby (['Gender' ])
#Purchase Count
PCbyG = JSGroup ['Price' ].count ()
#Average Purchase Price
APbyG = round (JSGroup ['Price' ].mean (),2 )
#Total Purchase Value
TPbyG = JSGroup ['Price' ].sum ()
#Normalized Totals
# diff Method TPbyGN = JSGroup['JSNor'].sum()
TPbyGN = round (TPbyG / PCbyG , 2 )
PAbyG = pd .DataFrame ({"Purchase Count" :PCbyG ,"Average Purchase Price" :APbyG ,"Total Purchase Value" :TPbyG ,"Normalized Totals" :TPbyGN })
PAbyG = PAbyG [["Purchase Count" ,"Average Purchase Price" ,"Total Purchase Value" ,"Normalized Totals" ]]
PAbyG
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
</style>
Purchase Count
Average Purchase Price
Total Purchase Value
Normalized Totals
Gender
Female
136
2.82
382.91
2.82
Male
633
2.95
1867.68
2.95
Other / Non-Disclosed
11
3.25
35.74
3.25
The below each broken into bins of 4 years (i.e. <10, 10-14, 15-19, etc.)
# Create the bins in which Data will be held
JS ["Age" ]= pd .to_numeric (JS ['Age' ])
bins = [0 ,9 ,14 ,19 ,24 ,29 ,34 ,39 ,40 ]
group_names = ['<10' ,'10-14' ,'15-19' ,'20-24' ,'25-29' ,'30-34' ,'35-39' ,'40+' ]
JS ["Age Group" ] = pd .cut (JS ["Age" ],bins ,labels = group_names )
JSgrAge = JS .groupby (['Age Group' ])
TCbyA = JS ["Age Group" ].count ()
VCbyA = JS ["Age Group" ].value_counts ()
PCbyA = round (VCbyA / TCbyA * 100 ,2 )
# Creating a new DataFrame using both duration and count
AgeDemo = pd .DataFrame ({"Percentage of players" :PCbyA ,"Total Count" :VCbyA })
AgeDemo = AgeDemo .sort_index ()
AgeDemo
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
</style>
Percentage of players
Total Count
<10
3.60
28
10-14
4.50
35
15-19
17.12
133
20-24
43.24
336
25-29
16.09
125
30-34
8.24
64
35-39
5.41
42
40+
1.80
14
JSgrAge = JS .groupby (['Age Group' ])
#Purchase Count
PCbyA = JSgrAge ['Price' ].count ()
#Average Purchase Price
APbyA = round (JSgrAge ['Price' ].mean (),2 )
#Total Purchase Value
TPbyA = JSgrAge ['Price' ].sum ()
#Normalized Totals
#MaxbyA = JSgrAge['Price'].max() (diff way of Calc)
#MinbyA = JSgrAge['Price'].min() (diff Way of Calc)
#TPbyAN = round((TPbyA-APbyA) / (MaxbyA-MinbyA),2)
TPbyAN = round (TPbyA / PCbyA , 2 )
ADbyG = pd .DataFrame ({"Purchase Count" :PCbyA ,"Average Purchase Price" :APbyA ,"Total Purchase Value" :TPbyA ,"Normalized Totals" :TPbyAN })
ADbyG = ADbyG [["Purchase Count" ,"Average Purchase Price" ,"Total Purchase Value" ,"Normalized Totals" ]]
ADbyG
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
</style>
Purchase Count
Average Purchase Price
Total Purchase Value
Normalized Totals
Age Group
<10
28
2.98
83.46
2.98
10-14
35
2.77
96.95
2.77
15-19
133
2.91
386.42
2.91
20-24
336
2.91
978.77
2.91
25-29
125
2.96
370.33
2.96
30-34
64
3.08
197.25
3.08
35-39
42
2.84
119.40
2.84
40+
14
3.22
45.11
3.22
Identify the the top 5 spenders in the game by total purchase value, then list (in a table):
#SN
GroupbySN = JS .groupby (['SN' ])
#Total Purchase Value by SN
TPbyS = GroupbySN ['Price' ].sum ()
#TPbyS = TPbyA.sort_values('Price', ascending=False)
#Purchase Count
PCbyS = GroupbySN ['Price' ].count ()
#Average Purchase Price
APbyS = GroupbySN ['Price' ].mean ()
TopS = pd .DataFrame ({"Purchase Count" :PCbyS ,"Average Purchase Price" :round (APbyS ,2 ),"Total Purchase Value" :TPbyS })
TopS = TopS [["Purchase Count" ,"Average Purchase Price" ,"Total Purchase Value" ]]
TopS = TopS .sort_values ('Total Purchase Value' ,ascending = False )
TopS .head (5 )
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
</style>
Purchase Count
Average Purchase Price
Total Purchase Value
SN
Undirrala66
5
3.41
17.06
Saedue76
4
3.39
13.56
Mindimnya67
4
3.18
12.74
Haellysu29
3
4.24
12.73
Eoda93
3
3.86
11.58
Identify the 5 most popular items by purchase count, then list (in a table):
#Item ID
GroupbyID = JS .groupby (['Item ID' ,'Item Name' ])
#Total Purchase Value by ITem
TPbyI = GroupbyID ['Price' ].sum ()
#TPbyS = TPbyA.sort_values('Price', ascending=False)
#Purchase Count
PCbyI = GroupbyID ['Price' ].count ()
#Average Purchase Price
APbyI = GroupbyID ['Price' ].mean ()
#Item Price
IPbyI = GroupbyID ['Price' ]
TopI = pd .DataFrame ({"Purchase Count" :PCbyI ,"Item Price" :APbyI ,"Total Purchase Value" :TPbyI })
TopI = TopI [["Purchase Count" ,"Item Price" ,"Total Purchase Value" ]]
TopI = TopI .sort_values ('Purchase Count' ,ascending = False )
TopI .head ()
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
</style>
Purchase Count
Item Price
Total Purchase Value
Item ID
Item Name
39
Betrayal, Whisper of Grieving Widows
11
2.35
25.85
84
Arcane Gem
11
2.23
24.53
31
Trickster
9
2.07
18.63
175
Woeful Adamantite Claymore
9
1.24
11.16
13
Serenity
9
1.49
13.41
Identify the 5 most profitable items by total purchase value, then list (in a table):
#Item ID
GroupbyID = JS .groupby (['Item ID' ,'Item Name' ])
#Total Purchase Value by ITem
TPbyI = GroupbyID ['Price' ].sum ()
#TPbyS = TPbyA.sort_values('Price', ascending=False)
#Purchase Count
PCbyI = GroupbyID ['Price' ].count ()
#Average Purchase Price
APbyI = GroupbyID ['Price' ].mean ()
#Item Price
IPbyI = GroupbyID ['Price' ]
TopI = pd .DataFrame ({"Purchase Count" :PCbyI ,"Item Price" :APbyI ,"Total Purchase Value" :TPbyI })
TopI = TopI [["Purchase Count" ,"Item Price" ,"Total Purchase Value" ]]
TopI = TopI .sort_values ('Total Purchase Value' ,ascending = False )
TopI .head ()
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
</style>
Purchase Count
Item Price
Total Purchase Value
Item ID
Item Name
34
Retribution Axe
9
4.14
37.26
115
Spectral Diamond Doomblade
7
4.25
29.75
32
Orenmir
6
4.95
29.70
103
Singed Scalpel
6
4.87
29.22
107
Splitter, Foe Of Subtlety
8
3.61
28.88