Create a new column in the dataframe that contains a list of unique items from the 'source' column.

Unique list of items contained in a dataframe column

2 min read
import pandas as pd
df=pd.DataFrame({'state':['NY','CA','CT','NJ','NY','FL','FL','CA'],\
                 'source':['source_a','source_a','source_a','source_b','source_b','source_b',\
                 'source_d','source_c']})
df

To create the unique list for each of ‘state’ value, use ‘groupby’ and apply(list)

df_grouped=df.groupby('state')['source'].unique().apply(list).reset_index()
df_grouped
state source
0 CA [source_a, source_c]
1 CT [source_a]
2 FL [source_b, source_d]
3 NJ [source_b]
4 NY [source_a, source_b]

Merge with original dataframe

result=pd.merge(df,df_grouped,on='state',how='left')
result
state source_x source_y
0 NY source_a [source_a, source_b]
1 CA source_a [source_a, source_c]
2 CT source_a [source_a]
3 NJ source_b [source_b]
4 NY source_b [source_a, source_b]
5 FL source_b [source_b, source_d]
6 FL source_d [source_b, source_d]
7 CA source_c [source_a, source_c]

Rename ‘source_x’ and ‘source_y’ columns

result.rename(columns={'source_x':'source_original','source_y':'unique_list'},inplace=True)
result
state source_original unique_list
0 NY source_a [source_a, source_b]
1 CA source_a [source_a, source_c]
2 CT source_a [source_a]
3 NJ source_b [source_b]
4 NY source_b [source_a, source_b]
5 FL source_b [source_b, source_d]
6 FL source_d [source_b, source_d]
7 CA source_c [source_a, source_c]

Join the Discussion on Discord

Have questions about this post? Want to share your own Python automation stories? Join our growing community of finance and accounting professionals who are learning Python!

Discuss on Discord

💬 Get help with your code
🎯 Share your automation wins
🤝 Connect with peers in finance & accounting