Value error in assigning to dataframe

Question

I am assigning several pieces of data to one DataFrame, and I got the following error:

ValueError: If using all scalar values, you must pass an index

I followed a similar question posted by someone else here, but it did not solve my problem.

The following is my code. You can copy and paste it into an IDE to reproduce the error.

import pandas as pd
import numpy as np

#Loading Team performance Data (ExpG (Home away)) For and against
epl_1718 = pd.read_csv("http://www.football-data.co.uk/mmz4281/1718/E0.csv")

epl_1718 = epl_1718[['HomeTeam','AwayTeam','FTHG','FTAG']]

epl_1718 = epl_1718.rename(columns={'FTHG': 'HomeGoals', 'FTAG': 'AwayGoals'})
Home_goal_avg = epl_1718['HomeGoals'].mean()
Away_goal_avg = epl_1718['AwayGoals'].mean()


Home_team_goals        = epl_1718.groupby(['HomeTeam'])['HomeGoals'].sum()
Home_count             = epl_1718.groupby(['HomeTeam'])['HomeTeam'].count()
Home_team_avg_goal     = Home_team_goals/Home_count
Home_team_concede      = epl_1718.groupby(['HomeTeam'])['AwayGoals'].sum()
EPL_Home_average_score = epl_1718['HomeGoals'].mean()
EPL_Home_average_conc  = epl_1718['HomeGoals'].mean()
Home_team_avg_conc     = Home_team_concede/Home_count

Away_team_goals        = epl_1718.groupby(['AwayTeam'])['AwayGoals'].sum()
Away_count             = epl_1718.groupby(['AwayTeam'])['AwayTeam'].count()
Away_team_avg_goal     = Away_team_goals/Away_count
Away_team_concede      = epl_1718.groupby(['AwayTeam'])['HomeGoals'].sum()
EPL_Away_average_score = epl_1718['AwayGoals'].mean()
EPL_Away_average_conc  = epl_1718['HomeGoals'].mean()
Away_team_avg_conc     = Away_team_concede/Away_count



Home_attk_sth = Home_team_avg_goal/EPL_Home_average_score
Home_attk_sth = Home_attk_sth.sort_index().reset_index()

Home_def_sth  = Home_team_avg_conc/EPL_Home_average_conc
Home_def_sth  = Home_def_sth .sort_index().reset_index()

Away_attk_sth = Away_team_avg_goal/EPL_Away_average_score
Away_attk_sth = Away_attk_sth .sort_index().reset_index()


Away_def_sth  = Away_team_avg_conc/EPL_Away_average_conc
Away_def_sth = Away_def_sth.sort_index().reset_index()

Home_def_sth
HomeTeam = epl_1718['HomeTeam'].drop_duplicates().sort_index().reset_index().set_index('HomeTeam')
AwayTeam = epl_1718['AwayTeam'].drop_duplicates().sort_index().reset_index().sort_values(['AwayTeam']).set_index(['AwayTeam'])
#HomeTeam = HomeTeam.sort_index().reset_index()

Team = HomeTeam.append(AwayTeam).drop_duplicates()





Data = pd.DataFrame({"Team":Team,
                     "Home_attkacking":Home_attk_sth,
                     "Home_def": Home_def_sth,
                     "Away_Attacking":Away_attk_sth,
                     "Away_def":Away_def_sth,
                     "EPL_Home_avg_score":EPL_Home_average_score,
                     "EPL_Home_average_conc":EPL_Home_average_conc,
                     "EPL_Away_average_score":EPL_Away_average_score,
                     "EPL_Away_average_conc":EPL_Away_average_conc},
                    columns =['Team','Home_attacking','Home_def','Away_attacking','Away_def',
                             'EPL_Home_avg_score','EPL_Home_avg_conc','EPL_Away_avg_score','EPL_Away_average_conc'])

In this code, I am trying to get the average goals scored per team per game and the average goals conceded per team per game. I then calculate other performance factors such as attacking strength and defensive strength.

I had to paste the full code because, with a simplified example, creating the data frame works. Thanks for understanding, and thanks in advance for any advice.

The columns of the final data frame should look as follows:

Team Home Attacking Home Defensive Away attacking away defensive

and so on as mentioned in the data frame.

That is, there will be only 20 teams in the Team column, and the shape of the data frame will be (20, 9).

Regards,

Zep

If you check `Team`, there is a duplicated value, `Southampton`. Do you need to sum the values (5 + 22), or do you need only the first duplicated row, here `5`? — jezrael, Jun 14 '18 at 07:04
Hi Jez, thanks for the reply. I need only one row, but the values have to be summed. I dropped the duplicates, so why is it still there? — Zephyr, Jun 14 '18 at 07:08
Thank you so much. I will edit the post to show the format of the final data frame. — Zephyr, Jun 14 '18 at 07:10

score 1 · Accepted Answer · answered Jun 14 '18 at 07:26

The main idea here is to remove reset_index so that each Series stays indexed by team. The variable Team is then unnecessary, and the Team column is created in the last step by reset_index. Also be careful with the column names in the DataFrame constructor: if they differ, like EPL_Home_average_conc in the dictionary and EPL_Home_avg_conc in columns, you get columns filled with NaNs:

Home_team_goals        = epl_1718.groupby(['HomeTeam'])['HomeGoals'].sum()
Home_count             = epl_1718.groupby(['HomeTeam'])['HomeTeam'].count()
Home_team_avg_goal     = Home_team_goals/Home_count
Home_team_concede      = epl_1718.groupby(['HomeTeam'])['AwayGoals'].sum()
EPL_Home_average_score = epl_1718['HomeGoals'].mean()
EPL_Home_average_conc  = epl_1718['AwayGoals'].mean()  # goals conceded at home are the away side's goals
Home_team_avg_conc     = Home_team_concede/Home_count

Away_team_goals        = epl_1718.groupby(['AwayTeam'])['AwayGoals'].sum()
Away_count             = epl_1718.groupby(['AwayTeam'])['AwayTeam'].count()
Away_team_avg_goal     = Away_team_goals/Away_count
Away_team_concede      = epl_1718.groupby(['AwayTeam'])['HomeGoals'].sum()
EPL_Away_average_score = epl_1718['AwayGoals'].mean()
EPL_Away_average_conc  = epl_1718['HomeGoals'].mean()
Away_team_avg_conc     = Away_team_concede/Away_count


#removed reset_index
Home_attk_sth = Home_team_avg_goal/EPL_Home_average_score
Home_attk_sth = Home_attk_sth.sort_index()

Home_def_sth  = Home_team_avg_conc/EPL_Home_average_conc
Home_def_sth  = Home_def_sth.sort_index()

Away_attk_sth = Away_team_avg_goal/EPL_Away_average_score
Away_attk_sth = Away_attk_sth.sort_index()

Away_def_sth  = Away_team_avg_conc/EPL_Away_average_conc
Away_def_sth  = Away_def_sth.sort_index()

Data = pd.DataFrame({"Home_attacking":Home_attk_sth,
                     "Home_def": Home_def_sth,
                     "Away_attacking":Away_attk_sth,
                     "Away_def":Away_def_sth,
                     "EPL_Home_average_score":EPL_Home_average_score,
                     "EPL_Home_average_conc":EPL_Home_average_conc,
                     "EPL_Away_average_score":EPL_Away_average_score,
                     "EPL_Away_average_conc":EPL_Away_average_conc},
                    columns =['Home_attacking','Home_def','Away_attacking','Away_def',
                              'EPL_Home_average_score','EPL_Home_average_conc',
                              'EPL_Away_average_score','EPL_Away_average_conc'])

#column from index
Data = Data.rename_axis('Team').reset_index()
print (Data)
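
To see these points in isolation, here is a minimal sketch with made-up numbers. The team values, the variable names and the League_avg column are hypothetical and are not taken from the CSV. It shows that a dictionary of scalars alone raises the ValueError, that Series indexed by team align by label while a scalar is repeated for every row, and that a name in columns which is not a dictionary key gives a column of NaNs:

```python
import pandas as pd

# Hypothetical per-team values indexed by team name (not real EPL figures)
home_attack = pd.Series({"Arsenal": 1.2, "Burnley": 0.8, "Chelsea": 1.1})
away_attack = pd.Series({"Chelsea": 1.0, "Arsenal": 0.9, "Burnley": 0.7})
league_avg = 1.5  # a scalar, playing the role of EPL_Home_average_score

# Only scalars in the dict: pandas has no index to build rows from
scalar_error = None
try:
    pd.DataFrame({"x": league_avg, "y": league_avg})
except ValueError as exc:
    scalar_error = str(exc)
print(scalar_error)

# The Series are aligned on their team labels and the scalar is broadcast
data = pd.DataFrame({"Home_attacking": home_attack,
                     "Away_attacking": away_attack,
                     "League_avg": league_avg})

# Turn the team index into an ordinary column as the last step
data = data.rename_axis("Team").reset_index()
print(data)

# "League_avg" is not a key of the dict below, so that column is all NaN
typo = pd.DataFrame({"Home_attacking": home_attack,
                     "League_average": league_avg},
                    columns=["Home_attacking", "League_avg"])
print(typo)
```

Because the alignment is done on the index labels, the order of the teams in each Series does not matter, which is why a separate Team variable is not needed.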

Thank you so much, Jez. I was not aware of the impact of reset_index here. — Zephyr, Jun 14 '18 at 07:32
