Creating and Saving a Complex Pandas DataFrame with 1 Million Rows

Neural pAi
3 min read · Mar 28, 2023

Introduction:

In this tutorial, we’ll learn how to create a complex pandas DataFrame with 1 million rows and various data types, such as integers, floats, dates, and categorical data. We’ll then save the DataFrame as a CSV file. A synthetic dataset of this size is useful for testing, analysis, or machine learning experiments.

Import required libraries

First, we need to import the necessary libraries for creating and handling DataFrames and generating random data.

import pandas as pd
import numpy as np
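Because the columns in this tutorial are filled with random values, it can help to fix a seed so the dataset comes out identical on every run. The following is a minimal sketch, assuming NumPy 1.17 or later, where `np.random.default_rng` is available. The seed value `42` and the names `rng`, `sample_a`, and `sample_b` are arbitrary choices for illustration and are not part of the original tutorial.

```python
import numpy as np

# Hypothetical seed; any integer works
rng = np.random.default_rng(42)

# Two generators built from the same seed produce the same values
sample_a = rng.integers(0, 100, size=5)  # integers in the range 0..99
sample_b = np.random.default_rng(42).integers(0, 100, size=5)

print(np.array_equal(sample_a, sample_b))  # True
```

The legacy functions such as `np.random.randint` can be seeded in a similar spirit with `np.random.seed`.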

Set the number of rows

We define a variable num_rows to store the desired number of rows (1 million) for our dataset.

num_rows = 1_000_000  # 1 million rows
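Before generating this many rows, a rough memory estimate can be useful. The sketch below assumes 8 bytes per value, which holds for `float64` and nanosecond datetime columns and for `int64` (the usual default integer type, though it can vary by platform and NumPy version). The column count of 3 is likewise an assumption for illustration.

```python
num_rows = 1_000_000
bytes_per_value = 8   # assumption: 64-bit integers, floats, and timestamps
num_columns = 3       # assumption: three numeric or date columns

estimated_mb = num_rows * bytes_per_value * num_columns / 1e6
print(f"Roughly {estimated_mb:.0f} MB before any string or categorical columns")
```

Once a DataFrame exists, `df.memory_usage(deep=True)` reports the actual per-column figures.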

Generate random data

We create a dictionary called data where each key represents a column name and each value is an array of generated data with num_rows entries.

data = {
    'A': np.random.randint(0, 100, num_rows), # Integers from 0 to 99 (the upper bound 100 is exclusive)
    'B': np.random.normal(0, 1, num_rows), # Normally distributed floats with mean 0 and standard deviation 1
    'C': pd.date_range('2020-01-01', periods=num_rows, freq='min'), # Timestamps one minute apart starting…
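The pieces above can be assembled into a DataFrame and written to disk roughly as follows. This is a minimal sketch under stated assumptions, not the tutorial's full code: the categorical column `D`, its labels, and the file name `large_dataset.csv` are made up for illustration. It spaces the dates one minute apart, because one million daily steps would span about 2,738 years, past the year-2262 limit of nanosecond-resolution timestamps (the default in pandas 1.x and 2.x).

```python
import os
import tempfile

import numpy as np
import pandas as pd

num_rows = 1_000_000

# One million one-minute steps cover about 1.9 years,
# which stays well inside the representable timestamp range.
dates = pd.date_range('2020-01-01', periods=num_rows, freq='min')

df = pd.DataFrame({
    'A': np.random.randint(0, 100, num_rows),  # integers from 0 to 99
    'B': np.random.normal(0, 1, num_rows),     # mean 0, standard deviation 1
    'C': dates,
    # Hypothetical categorical column; the labels are illustrative only
    'D': pd.Categorical(np.random.choice(['low', 'medium', 'high'], num_rows)),
})

# Write to a temporary directory; the file name is a placeholder
csv_path = os.path.join(tempfile.mkdtemp(), 'large_dataset.csv')
df.to_csv(csv_path, index=False)  # index=False omits the row index column

print(df.shape)
print(df.dtypes)
```

Note that CSV stores plain text, so the categorical and datetime types are not preserved in the file. When reading it back, `pd.read_csv(csv_path, parse_dates=['C'], dtype={'D': 'category'})` is one way to restore them.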
