A minimal example showing how to use dtype_diet to optimize a DataFrame's memory footprint.
```python
# sell_prices.csv.zip
# Source data: https://www.kaggle.com/c/m5-forecasting-uncertainty/
import pandas as pd
from dtype_diet import report_on_dataframe, optimize_dtypes

df = pd.read_csv('data/sell_prices.csv')
df.info(memory_usage='deep')

# Report a proposed dtype and the estimated saving for each column
proposed_df = report_on_dataframe(df, unit="MB")
proposed_df
```
The report shows a proposed dtype for each column. Review the proposals before applying them: a narrower type can overflow or lose precision if future data falls outside the current range, so modify the proposals where needed.
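For instance, before accepting an integer downcast you can check the column's observed range against the target dtype's limits. This is a minimal sketch, not part of dtype_diet's API; the helper name and the `'wm_yr_wk'`/`int16` pairing are illustrative:

```python
import numpy as np
import pandas as pd

def fits_int_dtype(series: pd.Series, dtype: str) -> bool:
    """Return True if an integer Series fits within dtype's representable range."""
    limits = np.iinfo(dtype)
    return series.min() >= limits.min and series.max() <= limits.max

# Hypothetical check: is int16 wide enough for 'wm_yr_wk' today?
# (It could still overflow later if new data exceeds this range.)
print(fits_int_dtype(df['wm_yr_wk'], 'int16'))
```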
```python
new_df = optimize_dtypes(df, proposed_df)
```
`optimize_dtypes` takes your `df` and the `proposed_df` as arguments and returns a new dataframe converted to the proposed dtypes.
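To review what actually changed, one option is to line up the original and converted dtypes column by column (a small sketch; `comparison` is just an illustrative name):

```python
import pandas as pd

# Side-by-side view of each column's dtype before and after conversion
comparison = pd.DataFrame({'original': df.dtypes, 'converted': new_df.dtypes})
print(comparison)
```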
```python
print(f'Original df memory: {df.memory_usage(deep=True).sum()/1024/1024} MB')
print(f'Proposed df memory: {new_df.memory_usage(deep=True).sum()/1024/1024} MB')
```
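If you reload the same CSV repeatedly, you can capture the optimized dtypes once and pass them back to `pd.read_csv` so the savings apply at load time rather than after a full-size read. A sketch, assuming the converted dtypes (including `'category'`) are all accepted by pandas' `dtype` argument; `dtype_map` and `df_small` are illustrative names:

```python
import pandas as pd

# Reuse the optimized dtypes on future loads of the same file
dtype_map = new_df.dtypes.apply(str).to_dict()
df_small = pd.read_csv('data/sell_prices.csv', dtype=dtype_map)
print(f"Reloaded df memory: {df_small.memory_usage(deep=True).sum()/1024/1024} MB")
```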