Saturday, March 14, 2020

Reading CSV | JSON > 5 GB as Pandas Dataframe


In [1]:
import pandas as pd
import os
import sys
In [2]:
for x in os.listdir():
    print(x)
.ipynb_checkpoints
AWS SOUMIL CREDS
Code
converted-keys.ppk
netflix_titles.csv
puttygen (1).exe
Untitled.ipynb
Untitled1.ipynb
ytdtest.pem
In [11]:
ChunkSize = 10
for chunk in pd.read_csv("netflix_titles.csv", chunksize=ChunkSize):
    print(chunk.shape)
    print("="*66)
    print(chunk.head(2))
    print("="*66)
    break
(10, 12)
==================================================================
    show_id   type                                    title  \
0  81145628  Movie  Norm of the North: King Sized Adventure   
1  80117401  Movie               Jandino: Whatever it Takes   

                   director  \
0  Richard Finn, Tim Maltby   
1                       NaN   

                                                cast  \
0  Alan Marriott, Andrew Toth, Brian Dobson, Cole...   
1                                   Jandino Asporaat   

                                    country         date_added  release_year  \
0  United States, India, South Korea, China  September 9, 2019          2019   
1                            United Kingdom  September 9, 2016          2016   

  rating duration                           listed_in  \
0  TV-PG   90 min  Children & Family Movies, Comedies   
1  TV-MA   94 min                     Stand-Up Comedy   

                                         description  
0  Before planning an awesome wedding for his gra...  
1  Jandino Asporaat riffs on the challenges of ra...  
==================================================================
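The title also mentions JSON, so here is a minimal sketch of the same chunked pattern for a JSON Lines file (one JSON object per line). It relies on `pd.read_json` with `lines=True`, which is required for `chunksize` to work. The in-memory sample data and the names `json_lines` and `json_chunk_shapes` are assumptions for illustration only and are not part of the original notebook.

```python
import io
import pandas as pd

# Hypothetical stand-in for a large JSON Lines file: one JSON object per line.
json_lines = io.StringIO(
    '{"show_id": 1, "type": "Movie"}\n'
    '{"show_id": 2, "type": "TV Show"}\n'
    '{"show_id": 3, "type": "Movie"}\n'
)

# chunksize is only supported together with lines=True.
reader = pd.read_json(json_lines, lines=True, chunksize=2)

# Each item yielded by the reader is an ordinary DataFrame.
json_chunk_shapes = [chunk.shape for chunk in reader]
print(json_chunk_shapes)
```

For a real file, the `io.StringIO` object would be replaced by the file path.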
In [12]:
MyList = []      # will hold one DataFrame per chunk
ChunkSize = 10   # rows per chunk
for chunk in pd.read_csv("netflix_titles.csv", chunksize=ChunkSize):
    MyList.append(chunk)

What we did

  • MyList is now a list of DataFrames: [df1, df2, df3 ......, dfN]

  • Number of chunks = 6234 rows / ChunkSize (10), rounded up

  • That gives 624 chunks
In [13]:
len(MyList)
Out[13]:
624
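The chunk count can be checked with a quick calculation. This is a small sketch using the row count (6234) and chunk size (10) from this notebook; the variable names are illustrative.

```python
import math

n_rows = 6234      # total rows in netflix_titles.csv (see df.shape)
chunk_size = 10    # same value as ChunkSize above

# Every chunk is full except possibly the last one, so round up.
n_chunks = math.ceil(n_rows / chunk_size)

# Rows left over for the final, partial chunk.
last_chunk_rows = n_rows - (n_chunks - 1) * chunk_size

print(n_chunks, last_chunk_rows)  # 624 4
```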
In [15]:
df = pd.read_csv("netflix_titles.csv")
df.shape
Out[15]:
(6234, 12)

Let us look at one of the chunks

In [17]:
MyList[0]
Out[17]:
  | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description
0 | 81145628 | Movie | Norm of the North: King Sized Adventure | Richard Finn, Tim Maltby | Alan Marriott, Andrew Toth, Brian Dobson, Cole... | United States, India, South Korea, China | September 9, 2019 | 2019 | TV-PG | 90 min | Children & Family Movies, Comedies | Before planning an awesome wedding for his gra...
1 | 80117401 | Movie | Jandino: Whatever it Takes | NaN | Jandino Asporaat | United Kingdom | September 9, 2016 | 2016 | TV-MA | 94 min | Stand-Up Comedy | Jandino Asporaat riffs on the challenges of ra...
2 | 70234439 | TV Show | Transformers Prime | NaN | Peter Cullen, Sumalee Montano, Frank Welker, J... | United States | September 8, 2018 | 2013 | TV-Y7-FV | 1 Season | Kids' TV | With the help of three human allies, the Autob...
3 | 80058654 | TV Show | Transformers: Robots in Disguise | NaN | Will Friedle, Darren Criss, Constance Zimmer, ... | United States | September 8, 2018 | 2016 | TV-Y7 | 1 Season | Kids' TV | When a prison ship crash unleashes hundreds of...
4 | 80125979 | Movie | #realityhigh | Fernando Lebrija | Nesta Cooper, Kate Walsh, John Michael Higgins... | United States | September 8, 2017 | 2017 | TV-14 | 99 min | Comedies | When nerdy high schooler Dani finally attracts...
5 | 80163890 | TV Show | Apaches | NaN | Alberto Ammann, Eloy Azorín, Verónica Echegui,... | Spain | September 8, 2017 | 2016 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, Spanis... | A young journalist is forced into a life of cr...
6 | 70304989 | Movie | Automata | Gabe Ibáñez | Antonio Banderas, Dylan McDermott, Melanie Gri... | Bulgaria, United States, Spain, Canada | September 8, 2017 | 2014 | R | 110 min | International Movies, Sci-Fi & Fantasy, Thrillers | In a dystopian future, an insurance adjuster f...
7 | 80164077 | Movie | Fabrizio Copano: Solo pienso en mi | Rodrigo Toro, Francisco Schultz | Fabrizio Copano | Chile | September 8, 2017 | 2017 | TV-MA | 60 min | Stand-Up Comedy | Fabrizio Copano takes audience participation t...
8 | 80117902 | TV Show | Fire Chasers | NaN | NaN | United States | September 8, 2017 | 2017 | TV-MA | 1 Season | Docuseries, Science & Nature TV | As California's 2016 fire season rages, brave ...
9 | 70304990 | Movie | Good People | Henrik Ruben Genz | James Franco, Kate Hudson, Tom Wilkinson, Omar... | United States, United Kingdom, Denmark, Sweden | September 8, 2017 | 2014 | R | 90 min | Action & Adventure, Thrillers | A struggling couple can't believe their luck w...

Combining Chunks

In [18]:
df1 = pd.concat(MyList, axis=0)
In [20]:
df1.shape
Out[20]:
(6234, 12)
In [21]:
df1.head()
Out[21]:
  | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description
0 | 81145628 | Movie | Norm of the North: King Sized Adventure | Richard Finn, Tim Maltby | Alan Marriott, Andrew Toth, Brian Dobson, Cole... | United States, India, South Korea, China | September 9, 2019 | 2019 | TV-PG | 90 min | Children & Family Movies, Comedies | Before planning an awesome wedding for his gra...
1 | 80117401 | Movie | Jandino: Whatever it Takes | NaN | Jandino Asporaat | United Kingdom | September 9, 2016 | 2016 | TV-MA | 94 min | Stand-Up Comedy | Jandino Asporaat riffs on the challenges of ra...
2 | 70234439 | TV Show | Transformers Prime | NaN | Peter Cullen, Sumalee Montano, Frank Welker, J... | United States | September 8, 2018 | 2013 | TV-Y7-FV | 1 Season | Kids' TV | With the help of three human allies, the Autob...
3 | 80058654 | TV Show | Transformers: Robots in Disguise | NaN | Will Friedle, Darren Criss, Constance Zimmer, ... | United States | September 8, 2018 | 2016 | TV-Y7 | 1 Season | Kids' TV | When a prison ship crash unleashes hundreds of...
4 | 80125979 | Movie | #realityhigh | Fernando Lebrija | Nesta Cooper, Kate Walsh, John Michael Higgins... | United States | September 8, 2017 | 2017 | TV-14 | 99 min | Comedies | When nerdy high schooler Dani finally attracts...
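Note that concatenating every chunk rebuilds the full DataFrame in memory, which may not be possible for a file that is genuinely larger than RAM. One alternative, sketched here under assumptions, is to reduce each chunk to a small result and keep only that. The example counts rows per `type` value; the in-memory CSV text and the names `csv_text` and `type_counts` are hypothetical stand-ins, not part of the original notebook.

```python
import io
from collections import Counter

import pandas as pd

# Hypothetical stand-in for a large CSV file with a "type" column.
csv_text = "show_id,type\n1,Movie\n2,TV Show\n3,Movie\n4,Movie\n5,TV Show\n"

type_counts = Counter()
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Summarise the chunk, add it to the running total, then let the chunk go.
    type_counts.update(chunk["type"].value_counts().to_dict())

print(dict(type_counts))
```

Only one chunk plus the running totals is held in memory at any time, so peak memory use depends on the chunk size rather than on the file size.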

2 comments:

  1. This is so helpful. Thank you!

  2. This is not working in PyCharm. I have a 150 MB file and PyCharm is not creating its dataframe, even after reading chunkwise.
