Before working with an example, let's try to understand what we mean by the word chunking. According to Wikipedia, chunking refers to strategies for improving performance by using special knowledge of a situation to aggregate related memory-allocation requests.

In other words, instead of reading all the data into memory at once, we can divide it into smaller parts, or chunks. In the case of CSV files, this means loading only a few lines into memory at any given point in time.

Pandas' read_csv() function comes with a chunksize parameter that controls the size of the chunk. We'll be working with the exact dataset that we used earlier in the article, but instead of loading it all in a single go, we'll divide it into parts and load it. To enable chunking, we declare the size of the chunk at the beginning; calling read_csv() with the chunksize parameter then returns an object we can iterate over.

We choose a chunk size of 50,000, which means only 50,000 rows of data will be imported at a time:

```python
import pandas as pd

chunk_size = 50000
batch_no = 1

# Iterate over the file 50,000 rows at a time and write each
# chunk out as its own CSV file.
for chunk in pd.read_csv('yellow_tripdata_2016-02.csv', chunksize=chunk_size):
    chunk.to_csv('chunk' + str(batch_no) + '.csv', index=False)
    batch_no += 1
```

Here is a video of how the main CSV file splits into multiple files.
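Splitting into files is one use of chunking, but the same pattern also lets you compute an aggregate over a file too large to fit in memory. The sketch below is illustrative only: it builds a tiny synthetic CSV (the file name `example.csv` and column `fare` are assumptions, not part of the original dataset) and streams it in small chunks.

```python
import pandas as pd

# Build a small stand-in CSV; in practice this would be a file
# too large to load in one go (e.g. the taxi dataset above).
pd.DataFrame({'fare': range(10)}).to_csv('example.csv', index=False)

# Stream the file in chunks of 4 rows, accumulating a running
# sum and row count instead of holding everything in memory.
total = 0.0
rows = 0
for chunk in pd.read_csv('example.csv', chunksize=4):
    total += chunk['fare'].sum()
    rows += len(chunk)

mean_fare = total / rows
print(rows, mean_fare)  # 10 rows, mean 4.5
```

Each iteration only ever holds one chunk's worth of rows, so peak memory usage is bounded by the chunk size rather than the file size.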