df_gamelogs.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 171907 entries, 0 to 171906
Columns: 161 entries, date to acquisition_info
dtypes: float64(77), int64(6), object(78)
memory usage: 860.5 MB
November 10, 2020
We can check some basic information about the data with the pandas .info() method.
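Passing memory_usage='deep' matters for this dataset because 78 of the 161 columns have object dtype. Without it, pandas counts only the 8-byte pointer stored for each object cell, not the Python strings those pointers refer to. The sketch below uses a synthetic weekday column (an assumption, not the real game-log data) to show the gap between the two measurements:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a string column such as day_of_week (not the real game logs).
rng = np.random.default_rng(0)
n = 100_000
s = pd.Series(
    rng.choice(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], size=n),
    dtype=object,  # force object dtype so the example behaves the same across pandas versions
)

# Shallow: counts only the 8-byte object pointers (plus the index).
shallow = s.memory_usage(deep=False)
# Deep: also measures every Python string object the pointers refer to.
deep = s.memory_usage(deep=True)

print(f"shallow: {shallow / 1024**2:.2f} MB, deep: {deep / 1024**2:.2f} MB")
```

The same deep accounting is what df.memory_usage(deep=True).sum() reports for a whole DataFrame.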
We can see the data has 171,907 rows and 161 columns and uses 860.5 MB of memory. Let’s see how much dtype_diet can optimize it.
| Column | Current dtype | Proposed dtype | Current Memory (MB) | Proposed Memory (MB) | Ram Usage Improvement (MB) | Ram Usage Improvement (%) |
|---|---|---|---|---|---|---|
| date | int64 | int32 | 671.574219 | 335.818359 | 335.755859 | 49.995347 |
| number_of_game | int64 | int8 | 671.574219 | 84.001465 | 587.572754 | 87.491857 |
| day_of_week | object | category | 5036.400391 | 84.362793 | 4952.037598 | 98.324939 |
| v_name | object | category | 5036.400391 | 174.776367 | 4861.624023 | 96.529736 |
| v_league | object | category | 4952.461426 | 84.359375 | 4868.102051 | 98.296617 |
| ... | ... | ... | ... | ... | ... | ... |
| h_player_9_id | object | category | 4955.471680 | 412.757324 | 4542.714355 | 91.670675 |
| h_player_9_name | object | category | 5225.463379 | 421.197266 | 4804.266113 | 91.939523 |
| h_player_9_def_pos | float64 | float16 | 671.574219 | 167.940430 | 503.633789 | 74.993020 |
| additional_info | object | category | 2714.671875 | 190.601074 | 2524.070801 | 92.978854 |
| acquisition_info | object | category | 4749.209961 | 84.070801 | 4665.139160 | 98.229794 |
161 rows × 6 columns
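The report lists a proposal per column but not how each one is chosen. One plausible heuristic that reproduces the kinds of proposals shown above (narrowest integer type that covers the value range, a smaller float type only when values survive the round trip, category for repetitive strings) is sketched below. The function propose_dtype and its max_unique_ratio threshold are hypothetical; this is not dtype_diet's actual implementation.

```python
import numpy as np
import pandas as pd


def propose_dtype(s: pd.Series, max_unique_ratio: float = 0.5) -> str:
    """Hypothetical heuristic: suggest a smaller dtype for one column.

    Illustrative only; not dtype_diet's code, and the threshold is an assumption.
    """
    if pd.api.types.is_integer_dtype(s):
        lo, hi = s.min(), s.max()
        # Pick the narrowest signed integer type whose range covers the column.
        for candidate in (np.int8, np.int16, np.int32, np.int64):
            info = np.iinfo(candidate)
            if info.min <= lo and hi <= info.max:
                return np.dtype(candidate).name
    if pd.api.types.is_float_dtype(s):
        # Accept a narrower float only if every value (NaN included) survives the round trip.
        for candidate in (np.float16, np.float32):
            if s.astype(candidate).astype(s.dtype).equals(s):
                return np.dtype(candidate).name
    if s.dtype == object:
        # Repetitive strings store well as category: small integer codes plus one copy of each label.
        if s.nunique(dropna=False) / max(len(s), 1) <= max_unique_ratio:
            return "category"
    return s.dtype.name
```

For example, a date stored as an integer like 18710504 exceeds the int16 range and lands on int32, while a game number between 0 and 3 fits in int8, matching the first two rows of the report.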
print(f'Original df memory: {df_gamelogs.memory_usage(deep=True).sum()/1024/1024:.2f} MB')
print(f'Proposed df memory: {new_df.memory_usage(deep=True).sum()/1024/1024:.2f} MB')
Original df memory: 860.50 MB
Proposed df memory: 79.04 MB
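From the printed figures, the optimized frame needs about 79.04 / 860.50 ≈ 9.2% of the original memory, a reduction of roughly 90.8%. The sketch below reproduces the same before/after measurement on a small synthetic frame (an assumption, not df_gamelogs), applying a hand-written dtype mapping with plain pandas DataFrame.astype rather than any dtype_diet function. The mapping should only narrow types that the data actually fits.

```python
import numpy as np
import pandas as pd


def memory_mb(df: pd.DataFrame) -> float:
    """Total memory in MB (1024 * 1024 bytes), counting string contents via deep=True."""
    return df.memory_usage(deep=True).sum() / 1024 / 1024


# Small synthetic frame standing in for df_gamelogs (not the real data).
rng = np.random.default_rng(0)
n = 200_000
df = pd.DataFrame({
    "number_of_game": rng.integers(0, 3, size=n),  # int64 by default
    "day_of_week": pd.Series(
        rng.choice(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], size=n),
        dtype=object,
    ),
})

# Hypothetical mapping, e.g. copied from the "Proposed dtype" column of a report like the one above.
proposed = {"number_of_game": "int8", "day_of_week": "category"}
new_df = df.astype(proposed)

before, after = memory_mb(df), memory_mb(new_df)
print(f"Original df memory: {before:.2f} MB")
print(f"Proposed df memory: {after:.2f} MB")
print(f"Reduction: {100 * (1 - after / before):.1f}%")
```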