Tags to dummies helper function #3695

Closed
@wesm

Description

Based on @jreback's clever data alignment trick: http://stackoverflow.com/questions/16637171/pandas-reshaping-data

Not sure where this should go, but it comes up quite frequently, e.g. with the MovieLens data set https://raw.github.com/pydata/pydata-book/master/ch02/movielens/movies.dat:

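from pandas import Series, read_table

# A multi-character separator such as '::' needs the Python parsing engine
df = read_table('https://raw.github.com/pydata/pydata-book/master/ch02/movielens/movies.dat', header=None, sep='::', engine='python')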

In [10]: genres = df[2].str.split('|')

In [11]: genres
Out[11]: 
0      [Animation, Children's, Comedy]
1     [Adventure, Children's, Fantasy]
2                    [Comedy, Romance]
3                      [Comedy, Drama]
4                             [Comedy]
5            [Action, Crime, Thriller]
6                    [Comedy, Romance]
7              [Adventure, Children's]
8                             [Action]
9        [Action, Adventure, Thriller]
10            [Comedy, Drama, Romance]
11                    [Comedy, Horror]
12             [Animation, Children's]
13                             [Drama]
14        [Action, Adventure, Romance]
...
3868                              [Horror]
3869                              [Horror]
3870                              [Horror]
3871                              [Horror]
3872                              [Horror]
3873                              [Comedy]
3874                       [Comedy, Drama]
3875    [Adventure, Animation, Children's]
3876             [Action, Drama, Thriller]
3877                            [Thriller]
3878                              [Comedy]
3879                               [Drama]
3880                               [Drama]
3881                               [Drama]
3882                     [Drama, Thriller]
Name: 2, Length: 3883, dtype: object

In [12]: dummies = genres.apply(lambda x: Series(1, index=x)).fillna(0)

In [13]: dummies[:4].T
Out[13]: 
             0  1  2  3
Action       0  0  0  0
Adventure    0  1  0  0
Animation    1  0  0  0
Children's   1  1  0  0
Comedy       1  0  1  1
Crime        0  0  0  0
Documentary  0  0  0  0
Drama        0  0  0  1
Fantasy      0  1  0  0
Film-Noir    0  0  0  0
Horror       0  0  0  0
Musical      0  0  0  0
Mystery      0  0  0  0
Romance      0  0  1  0
Sci-Fi       0  0  0  0
Thriller     0  0  0  0
War          0  0  0  0
Western      0  0  0  0
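
For discussion, here is a minimal sketch of what such a helper could look like. The name `tags_to_dummies` and its `sep` parameter are hypothetical and not an existing pandas API; the body simply wraps the alignment trick above and runs on a small inline sample instead of the downloaded file. It assumes no tag is repeated within a single row. For comparison, the last line calls `Series.str.get_dummies`, which pandas provides for delimiter-separated strings.

```python
import pandas as pd


def tags_to_dummies(tags, sep="|"):
    """Hypothetical helper: one 0/1 indicator column per distinct tag."""
    split = tags.str.split(sep)
    # Each row becomes a Series of 1s indexed by its own tags. apply() aligns
    # the rows on the union of all tags, leaving NaN where a tag is absent.
    # Assumes tags are unique within a row (duplicate labels cannot be aligned).
    dummies = split.apply(lambda x: pd.Series(1, index=x)).fillna(0)
    # Cast back to integers and sort columns for a deterministic layout.
    return dummies.astype(int).sort_index(axis=1)


# Small inline sample mirroring the first four MovieLens rows
genres = pd.Series([
    "Animation|Children's|Comedy",
    "Adventure|Children's|Fantasy",
    "Comedy|Romance",
    "Comedy|Drama",
])

dummies = tags_to_dummies(genres)

# Existing pandas method that operates on the raw delimited strings
builtin = genres.str.get_dummies(sep="|")
```

The helper takes the unsplit string column so that the split and the alignment live in one place; accepting pre-split lists instead would be an equally reasonable design.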

Metadata

Labels

Enhancement, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
