This comes from @jreback's clever data alignment trick: http://stackoverflow.com/questions/16637171/pandas-reshaping-data
I'm not sure where this should go, but it comes up quite frequently, e.g. with the MovieLens data set https://raw.github.com/pydata/pydata-book/master/ch02/movielens/movies.dat:
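from pandas import Series, read_table
# a multi-character separator like '::' is parsed as a regex, which needs the python engine
df = read_table('https://raw.github.com/pydata/pydata-book/master/ch02/movielens/movies.dat', header=None, sep='::', engine='python')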
In [10]: genres = df[2].str.split('|')
In [11]: genres
Out[11]:
0 [Animation, Children's, Comedy]
1 [Adventure, Children's, Fantasy]
2 [Comedy, Romance]
3 [Comedy, Drama]
4 [Comedy]
5 [Action, Crime, Thriller]
6 [Comedy, Romance]
7 [Adventure, Children's]
8 [Action]
9 [Action, Adventure, Thriller]
10 [Comedy, Drama, Romance]
11 [Comedy, Horror]
12 [Animation, Children's]
13 [Drama]
14 [Action, Adventure, Romance]
...
3868 [Horror]
3869 [Horror]
3870 [Horror]
3871 [Horror]
3872 [Horror]
3873 [Comedy]
3874 [Comedy, Drama]
3875 [Adventure, Animation, Children's]
3876 [Action, Drama, Thriller]
3877 [Thriller]
3878 [Comedy]
3879 [Drama]
3880 [Drama]
3881 [Drama]
3882 [Drama, Thriller]
Name: 2, Length: 3883, dtype: object
In [12]: dummies = genres.apply(lambda x: Series(1, index=x)).fillna(0)
In [13]: dummies[:4].T
Out[13]:
             0  1  2  3
Action       0  0  0  0
Adventure    0  1  0  0
Animation    1  0  0  0
Children's   1  1  0  0
Comedy       1  0  1  1
Crime        0  0  0  0
Documentary  0  0  0  0
Drama        0  0  0  1
Fantasy      0  1  0  0
Film-Noir    0  0  0  0
Horror       0  0  0  0
Musical      0  0  0  0
Mystery      0  0  0  0
Romance      0  0  1  0
Sci-Fi       0  0  0  0
Thriller     0  0  0  0
War          0  0  0  0
Western      0  0  0  0
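
For anyone who wants to try this without downloading the file, here is a minimal sketch of the same alignment idea on the first four genre strings shown above. The names `raw` and `builtin` are only for illustration, and the comparison assumes your pandas version provides `Series.str.get_dummies`. The explicit sort and integer cast are my additions to make the two results comparable; they are not part of the original trick.

```python
import pandas as pd

# First four genre strings, matching rows 0-3 of the output above.
raw = pd.Series([
    "Animation|Children's|Comedy",
    "Adventure|Children's|Fantasy",
    "Comedy|Romance",
    "Comedy|Drama",
])

# Alignment trick: one indicator Series per row, indexed by that row's genres.
# Building a DataFrame from them aligns on the union of labels; genres a row
# lacks become NaN, which fillna(0) turns into 0.
genres = raw.str.split('|')
dummies = (
    pd.DataFrame([pd.Series(1, index=g) for g in genres])
    .fillna(0)
    .astype('int64')
    .sort_index(axis=1)  # sort columns so the comparison below is order-independent
)

# Assumed-available string method that produces indicator columns directly.
builtin = raw.str.get_dummies(sep='|')

print(dummies.T)
```

On this small sample both approaches should give the same 0/1 table. The string method avoids building a Python-level Series per row, which probably matters on larger data.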