One day of taxi data for a city comes to 33,210,000 records. How do you separate each vehicle's data into its own file?
  The idea is simple:
  Loop over the 33,210,000 records and move each vehicle's data into its own file.
 
 But with more than 30 million records, a plain one-by-one loop is far too slow: it took me 2 hours to move 600,000 records, so 33 million would take about 100 hours, which is 4-5 days. The machine would have to stay on for those five days without a single hiccup.
 Therefore, we need the technique of a parallel for loop:
  Because a CSV with 30 million rows cannot be opened directly, I first cut the big CSV into chunks of 600,000 rows each with the split utility, giving 53 CSV files in total.
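If the split utility is not at hand, the same chunking can be done in Python. This is a minimal sketch, not the tool the author used; the function name, the chunk-file naming scheme, and the assumption that the input CSV has no header row are all mine.

```python
import csv
import os

def split_csv(src, out_dir, rows_per_chunk=600000):
    """Split a large CSV into fixed-size chunk files (chunk_000.csv, chunk_001.csv, ...)."""
    os.makedirs(out_dir, exist_ok=True)
    with open(src, newline='') as f:
        chunk, idx = [], 0
        for row in csv.reader(f):
            chunk.append(row)
            if len(chunk) == rows_per_chunk:
                _write_chunk(out_dir, idx, chunk)
                chunk, idx = [], idx + 1
        if chunk:  # flush the final, possibly short, chunk
            _write_chunk(out_dir, idx, chunk)
            idx += 1
        return idx  # number of chunk files written

def _write_chunk(out_dir, idx, rows):
    path = os.path.join(out_dir, 'chunk_%03d.csv' % idx)
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
```

Reading row by row keeps memory bounded by one chunk, so the 30-million-row file never has to fit in memory at once.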
 
 My original plan was to read the folder, get the list of 600,000-row CSV files, and then process each 600,000-row CSV in turn. That is still effectively 33,210,000 iterations; a parallel for loop instead processes several 600,000-row CSV files at the same time, cutting the total time by a corresponding factor.
 The parallel for loop was inspired by the following approach.
 I had previously written an ordinary loop like this:

words = ['apple', 'banana', 'cake', 'dumpling']
for word in words:
    print(word)
The parallel for loop looks like this:

from multiprocessing.dummy import Pool as ThreadPool

items = list()
pool = ThreadPool()
pool.map(process, items)
pool.close()
pool.join()

 Here, process is the function that handles each item.
  A complete example:

# -*- coding: utf-8 -*-
import time
from multiprocessing.dummy import Pool as ThreadPool

def process(item):
    print('Inside the parallel for loop')
    print(item)
    time.sleep(5)

items = ['apple', 'banana', 'cake', 'dumpling']
pool = ThreadPool()
pool.map(process, items)
pool.close()
pool.join()