1

I have a directory with a hundred the CSV files inside. One of the CSV looks like this;

Time    ID
09:00   A
..      ..

I want to join all of the csv into one dataframe with including name of file (append by axis=1) I used this code:

files = glob.glob(data/*.csv')
df = pd.concat([pd.read_csv(fp).assign(File=os.path.basename(fp).split('.')[0]) for fp in files], axis=1)
df.to_csv('new.csv')
df

I got a result looks like this

Time    ID  File  Time  ID  File    ..
09:00   A   01    09:00 B   02      ..
..      ..  ..    ..    ..  ..      ..

I want to join the ID column name with the file name as a column name. my expected result looks like this:

Time    01_ID   Time    02_ID   ..
09:00   A       09:00   B       ..
..      ..      ..      ..      ..
Arief Hidayat
  • 937
  • 1
  • 8
  • 19
  • `pandas.assign(col=...)` treats the word `col` as the *string* name of the new column, and not as a variable named `col`. You can bypass the issue by passing it a dictionary with **, but it don't think it's possible to do in a list comprehension. https://stackoverflow.com/questions/39767718/pandas-assign-with-new-column-name-as-string – Aryerez Sep 25 '19 at 05:33

1 Answers1

2

You can use dictionary comprehension first:

comp = {os.path.basename(fp).split('.')[0]: pd.read_csv(fp) for fp in files}
df = pd.concat(comp, axis=1)

And then filter in list comprehension for convert MultiIndex in columns:

df.columns = [f"{a}_{b}" if b == 'ID' else b for a, b in df.columns]
print (df)
    Time 01_ID   Time 02_ID
0  09:00     A  09:00     B

df.to_csv('new.csv')

EDIT: Better solution is create unique columns names:

df.columns = df.columns.map('_'.join)
print (df)
  01_Time 01_ID 02_Time 02_ID
0   09:00     A   09:00     B
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • thank you for comments. How about the only a particular column or only one column (not all of the column include the file name)? – Arief Hidayat Sep 25 '19 at 05:34