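While trying to debug the application of a function in a `groupby`, someone suggested I use a dummy function to "see what is being passed" into the function for each group. Sure, I'm game: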
```python
import numpy as np
import pandas as pd

np.random.seed(0)  # so we can all play along at home

categories = list('abc')
categories = categories * 4
data_1 = np.random.randn(len(categories))
data_2 = np.random.randn(len(categories))
df = pd.DataFrame({'category': categories,
                   'data_1': data_1,
                   'data_2': data_2})

def f(x):
    # dummy function: report the type of whatever is passed in, return it unchanged
    print(type(x))
    return x

print('single column transform')
df.groupby(['category'])['data_1'].transform(f)
print('\n')
print('single column (nested) transform')
df.groupby(['category'])[['data_1']].transform(f)
print('\n')
print('multiple column transform')
df.groupby(['category'])[['data_1', 'data_2']].transform(f)
print('\n')
print('\n')
print('single column apply')
df.groupby(['category'])['data_1'].apply(f)
print('\n')
print('single column (nested) apply')
df.groupby(['category'])[['data_1']].apply(f)
print('\n')
print('multiple column apply')
df.groupby(['category'])[['data_1', 'data_2']].apply(f)
```
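The same probing idea can be written so that the type names are collected in a list instead of printed, which makes the sequence easier to inspect or compare. This is a minimal sketch, assuming Python 3 and a recent pandas; `make_recorder`, `transform_log` and `apply_log` are hypothetical names, not part of pandas. It covers only the single-column (`SeriesGroupBy`) case, where each of the three groups is passed to the function once as a `Series`.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
categories = list('abc') * 4
df = pd.DataFrame({'category': categories,
                   'data_1': np.random.randn(len(categories)),
                   'data_2': np.random.randn(len(categories))})

def make_recorder(log):
    """Hypothetical helper: build a dummy function that logs the type
    name of each argument it receives and returns the argument unchanged."""
    def f(x):
        log.append(type(x).__name__)
        return x
    return f

# Single-column transform: record what reaches the dummy function
transform_log = []
df.groupby('category')['data_1'].transform(make_recorder(transform_log))

# Single-column apply: same probe, separate log
apply_log = []
df.groupby('category')['data_1'].apply(make_recorder(apply_log))

print(transform_log)
print(apply_log)
```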
This prints the following to my stdout:
```text
single column transform


single column (nested) transform


multiple column transform




single column apply


single column (nested) apply


multiple column apply
```
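So it looks like:

- `transform`
  - single column: 3 `Series`
  - single column (nested): 2 `Series` and 3 `DataFrame`
  - multiple columns: 3 `Series` and 3 `DataFrame`
- `apply`
  - single column: 3 `Series`
  - single column (nested): 4 `DataFrame`
  - multiple columns: 4 `DataFrame`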
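One way to dig further into the mixed `Series`/`DataFrame` calls of the multi-column `transform` is to record each `Series` argument's `.name` along with its type. This is a minimal sketch, assuming Python 3 and a recent pandas; `calls` and `series_labels` are hypothetical names. The exact number of calls may differ between pandas versions, so the sketch only checks the pattern: any `Series` that reaches the function carries a column label (`data_1` or `data_2`) rather than a group key, which suggests those calls receive individual columns of a group, not whole groups.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
categories = list('abc') * 4
df = pd.DataFrame({'category': categories,
                   'data_1': np.random.randn(len(categories)),
                   'data_2': np.random.randn(len(categories))})

calls = []  # one (type name, Series name or None) entry per call

def f(x):
    # Only record .name for Series; DataFrame arguments get None
    label = x.name if isinstance(x, pd.Series) else None
    calls.append((type(x).__name__, label))
    return x

df.groupby('category')[['data_1', 'data_2']].transform(f)

for kind, label in calls:
    print(kind, label)

# Labels of every Series argument seen during the transform
series_labels = {label for kind, label in calls if kind == 'Series'}
print(series_labels)
```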
What is going on here? Can anyone explain why each of these six calls results in the sequence of objects described above being passed to the function?