Creating and Saving a Complex Pandas DataFrame with 1 Million Rows
3 min read · Mar 28, 2023
Introduction:
In this tutorial, we’ll learn how to create a complex pandas DataFrame with 1 million rows and various data types, such as integers, floats, dates, and categorical data. We’ll then save the DataFrame as a CSV file. This can be useful when generating large datasets for testing, analysis, or machine learning purposes.
Import required libraries
First, we need to import the necessary libraries for creating and handling DataFrames and generating random data.
import pandas as pd
import numpy as np
Set the number of rows
We define a variable, num_rows, to store the desired number of rows (1 million) for our dataset.
num_rows = 1000000
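Before generating anything, it can help to estimate how much memory a dataset of this size needs. The snippet below is a rough back-of-the-envelope sketch, not part of the original workflow: it assumes 8-byte numeric columns (int64 or float64), and the names bytes_per_column and megabytes_per_column are illustrative only.

```python
import numpy as np

num_rows = 1_000_000  # same value as above; the underscores are only for readability

# Assumption: each value is stored as an 8-byte number (int64 or float64).
bytes_per_column = num_rows * np.dtype("float64").itemsize
megabytes_per_column = bytes_per_column / 1e6

print(f"{megabytes_per_column:.1f} MB per 8-byte numeric column")
# prints "8.0 MB per 8-byte numeric column"
```

By this estimate a handful of numeric columns fits comfortably in memory on most machines. String or object columns can take considerably more.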
Generate random data
We create a dictionary called data, where each key represents a column name and each value is an array of random data.
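To see the key-to-column mapping on a small scale first, here is a minimal sketch that builds a five-row DataFrame from a dictionary. It is an illustration under a few assumptions, not the tutorial's own code: the column names, the seed value, and the use of NumPy's newer default_rng generator (instead of the np.random functions used in this tutorial) are all choices made for the example.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator so the sample is repeatable; 42 is arbitrary.
rng = np.random.default_rng(seed=42)
n = 5  # tiny stand-in for num_rows

# Hypothetical column names; each key becomes a column, each array its values.
sample = {
    "ints": rng.integers(0, 100, n),   # integers from 0 to 99 (upper bound exclusive)
    "floats": rng.normal(0, 1, n),     # floats drawn from a standard normal distribution
    "dates": pd.date_range("2020-01-01", periods=n, freq="D"),  # five consecutive days
}
sample_df = pd.DataFrame(sample)

print(sample_df.shape)   # (5, 3)
print(sample_df.dtypes)  # one dtype per column
```

Checking shape and dtypes on a tiny sample like this is a cheap way to catch mistakes before generating the full dataset. The full-size dictionary follows the same pattern: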
data = {
'A': np.random.randint(0, 100, num_rows), # Integers from 0 to 99 (the upper bound, 100, is exclusive)
'B': np.random.normal(0, 1, num_rows), # Normally distributed floats with mean 0 and std deviation 1
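# freq='min' instead of 'D': 1 million daily steps from 2020 would run past 2262-04-11, the limit of pandas' default nanosecond timestamps, and raise OutOfBoundsDatetime
'C': pd.date_range('2020-01-01', periods=num_rows, freq='min'), # Timestamps at one-minute intervals starting…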