
Swapping values between two pandas columns in Python; merging lists with pandas

Time: 2023-05-04 03:52:48  Views: 232841  Author: 1099

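I'm trying to combine two columns of the same DataFrame (start_end) into one while dropping the null values. I want to merge "Start station" and "End station" into a single "station" column and keep "Duration" aligned with the new "station" column. I've tried pd.merge, pd.concat and append, but I can't get it to work.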

The start_end DataFrame:

    Duration      End station                         Start station
14      1407              NaN                        14th & V St NW
19       509              NaN                        21st & I St NW
20       638  15th & P St NW.                                   NaN
27      1532              NaN  Massachusetts Ave & Dupont Circle NW
28       759              NaN           Adams Mill & Columbia Rd NW

Expected output:

    Duration                              stations
14      1407                        14th & V St NW
19       509                        21st & I St NW
20       638                        15th & P St NW
27      1532  Massachusetts Ave & Dupont Circle NW
28       759           Adams Mill & Columbia Rd NW

My code so far:

#start_end is the dataframe, 'start station', 'end station', 'duration'

start_end = pd.concat([df_start, df_end])
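The NaN pattern in start_end is what pd.concat produces when the frames being stacked have differently named columns. The question does not show df_start and df_end, so the sketch below assumes (hypothetically) that df_start holds Duration plus "Start station" and df_end holds Duration plus "End station". Under that assumption, renaming both station columns to a shared name before concatenating avoids the NaNs in the first place:

```python
import pandas as pd

# Hypothetical inputs: the question does not show df_start / df_end
df_start = pd.DataFrame(
    {"Duration": [1407, 509], "Start station": ["14th & V St NW", "21st & I St NW"]}
)
df_end = pd.DataFrame({"Duration": [638], "End station": ["15th & P St NW."]})

# Concatenating as-is: each frame lacks the other's station column, so NaNs appear
with_nans = pd.concat([df_start, df_end])

# Renaming to a shared column first puts every station value in one column
start_end = pd.concat(
    [
        df_start.rename(columns={"Start station": "station"}),
        df_end.rename(columns={"End station": "station"}),
    ],
    ignore_index=True,  # renumber rows 0..n-1 instead of repeating labels
)
print(start_end)
```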

This is what I tried:

station = pd.merge([start_end['Start station'],start_end['End station']])
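pd.merge joins two DataFrames on key columns and requires both a left and a right argument, so passing a single list of two Series cannot work. Because every row here has exactly one non-null station, a row-wise alternative (not one of the answers below, but a standard pandas method) is Series.combine_first, which fills the gaps in one Series with the values of another at the same index label. A minimal sketch, assuming the sample data above; if a row ever had both stations filled, "Start station" would win:

```python
import numpy as np
import pandas as pd

# Sample data from the question (index labels kept as shown)
start_end = pd.DataFrame(
    {
        "Duration": [1407, 509, 638, 1532, 759],
        "End station": [np.nan, np.nan, "15th & P St NW.", np.nan, np.nan],
        "Start station": [
            "14th & V St NW",
            "21st & I St NW",
            np.nan,
            "Massachusetts Ave & Dupont Circle NW",
            "Adams Mill & Columbia Rd NW",
        ],
    },
    index=[14, 19, 20, 27, 28],
)

# Take 'Start station' where present, otherwise fall back to 'End station'
stations = start_end["Start station"].combine_first(start_end["End station"])

result = pd.DataFrame({"Duration": start_end["Duration"], "stations": stations})
print(result)
```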

Solution:

>>> df
   Duration      End station                         Start station
0      1407              NaN                        14th & V St NW
1       509              NaN                        21st & I St NW
2       638  15th & P St NW.                                   NaN
3      1532              NaN  Massachusetts Ave & Dupont Circle NW
4       759              NaN           Adams Mill & Columbia Rd NW

Give both station columns the same name:

>>> df.columns = df.columns.str.replace('.*?station', 'station', regex=True)

>>> df
   Duration          station                               station
0      1407              NaN                        14th & V St NW
1       509              NaN                        21st & I St NW
2       638  15th & P St NW.                                   NaN
3      1532              NaN  Massachusetts Ave & Dupont Circle NW
4       759              NaN           Adams Mill & Columbia Rd NW
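The renaming step only works if '.*?station' is treated as a regular expression. Since pandas 2.0, str.replace defaults to regex=False, so the flag has to be passed explicitly. A small self-contained check that uses only the column labels from the question:

```python
import pandas as pd

columns = pd.Index(["Duration", "End station", "Start station"])

# Non-greedy '.*?station' swallows whatever prefix comes before 'station'
renamed = columns.str.replace(".*?station", "station", regex=True)

# With regex=False (the default in pandas >= 2.0) the pattern is matched literally
literal = columns.str.replace(".*?station", "station", regex=False)

print(list(renamed))  # ['Duration', 'station', 'station']
print(list(literal))  # ['Duration', 'End station', 'Start station']
```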

Then stack and unstack:

>>> s = df.stack()
>>> s
0  Duration                                  1407
   station                         14th & V St NW
1  Duration                                   509
   station                         21st & I St NW
2  Duration                                   638
   station                        15th & P St NW.
3  Duration                                  1532
   station   Massachusetts Ave & Dupont Circle NW
4  Duration                                   759
   station            Adams Mill & Columbia Rd NW
dtype: object

>>> df = s.unstack()
>>> df
   Duration                               station
0      1407                        14th & V St NW
1       509                        21st & I St NW
2       638                       15th & P St NW.
3      1532  Massachusetts Ave & Dupont Circle NW
4       759           Adams Mill & Columbia Rd NW

Here is why I think this works:

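.stack creates a Series with a MultiIndex and takes care of the null values for you (the legacy implementation drops them by default). The second level of that MultiIndex comes from the column names; because both station columns now have the same name, that level contains a single "station" entry, so unstacking produces only one station column.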

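This is only a guess, based on comparing the index of the stacked Series with and without renaming the columns (output from an older pandas version, in which a MultiIndex's codes were still called labels):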

>>> # without changing column names
>>> s.index
MultiIndex(levels=[[0, 1, 2, 3, 4], ['Duration', 'End station', 'Start station']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4], [0, 2, 0, 2, 0, 1, 0, 2, 0, 2]])
>>> # column names the same
>>> s.index
MultiIndex(levels=[[0, 1, 2, 3, 4], ['Duration', 'station']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]])

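It seems a bit tricky; maybe someone can comment on it. Note that it depends on stack dropping the NaN values, which the legacy implementation does by default. The new implementation added in pandas 2.1 (future_stack=True) does not drop them, so on recent pandas versions the pd.concat/.dropna alternative is the safer choice.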
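A sketch of the same "reshape to long format, drop the NaN half" idea that depends neither on stack's NaN handling nor on duplicate column names, using DataFrame.melt instead (a swapped-in technique, not part of the original answer). The ignore_index=False argument (available since pandas 1.1) keeps the original row labels so the result can be sorted back into input order:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Duration": [1407, 509, 638, 1532, 759],
        "End station": [np.nan, np.nan, "15th & P St NW.", np.nan, np.nan],
        "Start station": [
            "14th & V St NW",
            "21st & I St NW",
            np.nan,
            "Massachusetts Ave & Dupont Circle NW",
            "Adams Mill & Columbia Rd NW",
        ],
    }
)

# One row per (original row, station column); Duration is repeated for each
long = df.melt(
    id_vars="Duration",
    value_vars=["Start station", "End station"],
    value_name="station",
    ignore_index=False,  # keep row labels 0..4 instead of renumbering
)

# Each original row has exactly one NaN station; drop that half and restore order
result = long.dropna(subset=["station"])[["Duration", "station"]].sort_index()
print(result)
```

Unlike the stack/unstack route, Duration keeps its integer dtype here, because it never shares a column with the station strings.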

Alternative: use pd.concat and .dropna (starting again from the original df):

>>> stations = pd.concat([df.iloc[:, 1], df.iloc[:, 2]]).dropna()
>>> stations.name = 'stations'
>>> stations
2                         15th & P St NW.
0                          14th & V St NW
1                          21st & I St NW
3    Massachusetts Ave & Dupont Circle NW
4             Adams Mill & Columbia Rd NW
Name: stations, dtype: object
>>> df2 = pd.concat([df['Duration'], stations], axis=1)
>>> df2
   Duration                              stations
0      1407                        14th & V St NW
1       509                        21st & I St NW
2       638                       15th & P St NW.
3      1532  Massachusetts Ave & Dupont Circle NW
4       759           Adams Mill & Columbia Rd NW
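A self-contained version of the pd.concat/.dropna approach, rebuilding the sample frame with a default 0-4 index. It selects the columns by name rather than by position, which is only an editorial choice; the logic is the same. It assumes each row has exactly one non-null station: if a row had both, its label would appear twice in stations and would no longer line up one-to-one with Duration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Duration": [1407, 509, 638, 1532, 759],
        "End station": [np.nan, np.nan, "15th & P St NW.", np.nan, np.nan],
        "Start station": [
            "14th & V St NW",
            "21st & I St NW",
            np.nan,
            "Massachusetts Ave & Dupont Circle NW",
            "Adams Mill & Columbia Rd NW",
        ],
    }
)

# Put the two station columns end to end; each value keeps its original row label
stations = pd.concat([df["End station"], df["Start station"]]).dropna()
stations.name = "stations"  # the two inputs had different names, so set one

# axis=1 aligns on row labels, pairing each station with its Duration
df2 = pd.concat([df["Duration"], stations], axis=1)
print(df2)
```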

Tags: python, pandas, dataframe, merge, append
