read_csv in multiple threads causes segmentation fault #11786

Closed
@mrocklin

Description

The following script causes a segfault on my machine:

from io import BytesIO
from multiprocessing.pool import ThreadPool
import pandas as pd

# Make many fake CSV files in memory
payloads = ['\n'.join(['%d,%d,%d' % (i,i,i) for i in range(10000)]).encode()
            for j in range(100)]
files = [BytesIO(b) for b in payloads]

# Read all files in many threads
pool = ThreadPool(8)
pool.map(pd.read_csv, files)

$ python script.py
Segmentation fault (core dumped)

Python 3.4, Pandas 0.17.1, Ubuntu 14.04
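
One way to check whether the crash depends on calls running concurrently is to serialize them behind a lock and see if the segfault still occurs. The sketch below is a diagnostic aid, not a confirmed fix: `serialized` and `parse_csv_bytes` are hypothetical helpers introduced here, and the stand-in parser exists only so the example runs without pandas.

```python
import threading
from functools import wraps
from multiprocessing.pool import ThreadPool


def serialized(func):
    """Wrap func so that only one thread can run it at a time."""
    lock = threading.Lock()

    @wraps(func)
    def wrapper(*args, **kwargs):
        with lock:
            return func(*args, **kwargs)

    return wrapper


def parse_csv_bytes(buf):
    # Hypothetical stand-in for a CSV reader: one tuple of ints per line.
    return [tuple(int(x) for x in line.decode().split(','))
            for line in buf.splitlines()]


# Small in-memory CSV payloads, in the same shape as the script above
payloads = ['\n'.join('%d,%d,%d' % (i, i, i) for i in range(100)).encode()
            for j in range(20)]

pool = ThreadPool(8)
try:
    # Threads still pick up the work, but the wrapped calls never overlap
    results = pool.map(serialized(parse_csv_bytes), payloads)
finally:
    pool.close()
    pool.join()

# With pandas installed, the same wrapper applies to the script above:
#     pool.map(serialized(pd.read_csv), files)
```

If the segfault disappears with the lock in place, that would suggest the problem is tied to overlapping `read_csv` calls rather than to the input data.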
