Need Data for Testing or Analysis? Use This Advanced Python Script.

What Does the Script Do?
This script generates a comprehensive dataset of daily sales records, complete with realistic attributes. The data will be saved in CSV, XLS, XLSX, and SQL script formats to your desktop. Each row represents a simulated transaction with the following fields:
- Date: The date of the transaction.
- Timestamp: The full timestamp in
YYYY-MM-DD HH:MM:SS
format. - Unix Date: The UNIX timestamp of the transaction date.
- Year, Month, Day: Breakdown of the transaction date into year, month, and day for easier analysis.
- FirstName and LastName: Randomly generated customer names.
- Contact: Customer name with a randomly assigned role (e.g., “Owner,” “Manager”).
- Product: One of three products (“Product A,” “Product B,” or “Product C”).
- Channel: The sales channel (“Online,” “Store,” or “Social Media”).
- Sales: A calculated sales value based on product-specific ranges, seasonal multipliers, and weekend adjustments.
- ROI: A calculated return on investment value based on sales.
The script includes logic for:
- Seasonal trends (e.g., higher sales in December and November).
- Weekend adjustments (e.g., increased sales on Saturdays and Sundays).
- Multiple output formats (CSV, XLS, XLSX, and SQL).
Installation Commands
To use this script, ensure the following Python libraries are installed:
pip install pandas openpyxl
How to Run the Script
- Save the script as
create_data.py
on your computer. - Ensure Python and the required libraries are installed.
- Run the script from your terminal or IDE:
python create_data.py
- Locate the generated files on your desktop:
- CSV file:
sales_data.csv
- XLS file:
sales_data.xls
- XLSX file:
sales_data.xlsx
- SQL file:
sales_data.sql
- CSV file:
Example: SQL Script Output
The SQL file includes table creation and data insertion statements, such as:
CREATE TABLE sales_data (
Date TEXT,
Timestamp TEXT,
Unix_Date INTEGER,
Year INTEGER,
Month INTEGER,
Day INTEGER,
FirstName TEXT,
LastName TEXT,
Contact TEXT,
Product TEXT,
Channel TEXT,
Sales REAL,
ROI REAL
);
INSERT INTO sales_data (Date, Timestamp, Unix_Date, Year, Month, Day, FirstName, LastName, Contact, Product, Channel, Sales, ROI)
VALUES ("2023-12-31", "2023-12-31 23:59:59", 1672444799, 2023, 12, 31, "John", "Doe", "John Doe, Manager", "Product A", "Online", 145.67, 1.25);
NOTE: With respect to the .xls file only, you may receive the error below. In the event you do get that error, select “Yes”, let the data load and use save_as to render the file as true .xls.

Why Choose the Script?
The script provides:
- Flexible Output Formats: Generate data in your preferred format (CSV, Excel, or SQL).
- Dynamic Realism: Simulated trends for seasons, weekends, and product-specific behavior.
- Enhanced Use Cases: Ideal for testing database ingestion, analysis pipelines, or data visualization tools.
FULL CODE:
import os
import random
import pandas as pd
from datetime import datetime, timedelta
# Get current user's desktop path
desktop_path = os.path.join(os.path.expanduser("~"), "Desktop")
file_csv_path = os.path.join(desktop_path, "sales_data.csv")
file_xls_path = os.path.join(desktop_path, "sales_data.xls")
file_xlsx_path = os.path.join(desktop_path, "sales_data.xlsx")
file_sql_path = os.path.join(desktop_path, "sales_data.sql")
# Define date range
start_date = datetime(2017, 1, 1)
end_date = datetime(2023, 12, 31)
num_rows = (end_date - start_date).days + 1 # Total number of days in range
# Define first and last names pools
first_names = [
"John", "Jane", "Alex", "Emily", "Chris", "Sara", "Michael", "Jessica",
"David", "Laura", "Robert", "Anna", "James", "Sophia", "Daniel", "Olivia",
"Paul", "Mia", "Mark", "Chloe", "Andrew", "Isabella", "Peter", "Emma", "Liam"
]
last_names = [
"Smith", "Doe", "Johnson", "Brown", "Lee", "Taylor", "Wilson", "Anderson",
"Thomas", "White", "Harris", "Martin", "Thompson", "Garcia", "Martinez",
"Robinson", "Clark", "Rodriguez", "Lewis", "Walker", "Young", "Allen",
"King", "Scott", "Adams"
]
roles = ["Owner", "Manager", "Asst. Manager", "LLC"]
# Create sample data
data = []
for i in range(num_rows):
current_date = start_date + timedelta(days=i)
timestamp = current_date.strftime("%Y-%m-%d %H:%M:%S")
unix_date = int(current_date.timestamp())
first_name = random.choice(first_names)
last_name = random.choice(last_names)
contact = f"{first_name} {last_name}, {random.choice(roles)}"
# Define product-specific sales ranges
product = random.choice(["Product A", "Product B", "Product C"])
base_sales = {
"Product A": (20, 100),
"Product B": (30, 150),
"Product C": (10, 200),
}[product]
# Apply seasonal multiplier
month = current_date.month
seasonal_multiplier = {
12: 1.5, # December (holidays)
11: 1.3, # November (Black Friday/Cyber Monday)
6: 1.2, # June (summer sales)
}.get(month, 1.0)
# Increase sales for weekends
if current_date.weekday() >= 5: # Saturday, Sunday
seasonal_multiplier *= 1.2
# Calculate sales
sales = round(random.uniform(*base_sales) * seasonal_multiplier, 2)
# Calculate ROI based on sales
roi = round((sales / random.uniform(50, 300)), 2)
# Random channel
channel = random.choice(["Online", "Store", "Social Media"])
# Append data
data.append({
"Date": current_date.strftime("%Y-%m-%d"),
"Timestamp": timestamp,
"Unix Date": unix_date,
"Year": current_date.year,
"Month": current_date.month,
"Day": current_date.day,
"FirstName": first_name,
"LastName": last_name,
"Contact": contact,
"Product": product,
"Channel": channel,
"Sales": sales,
"ROI": roi,
})
# Convert to DataFrame
df = pd.DataFrame(data)
# Save as .csv
df.to_csv(file_csv_path, index=False)
# Save as .xls
with pd.ExcelWriter(file_xls_path, engine='openpyxl') as writer:
df.to_excel(writer, index=False)
# Save as .xlsx
with pd.ExcelWriter(file_xlsx_path, engine='openpyxl') as writer:
df.to_excel(writer, index=False)
# Save as .sql file
with open(file_sql_path, "w") as sql_file:
sql_file.write("CREATE TABLE sales_data (\n")
sql_file.write(",\n".join([
" Date TEXT",
" Timestamp TEXT",
" Unix_Date INTEGER",
" Year INTEGER",
" Month INTEGER",
" Day INTEGER",
" FirstName TEXT",
" LastName TEXT",
" Contact TEXT",
" Product TEXT",
" Channel TEXT",
" Sales REAL",
" ROI REAL"
]))
sql_file.write("\n);\n")
for _, row in df.iterrows():
sql_file.write(f"INSERT INTO sales_data (Date, Timestamp, Unix_Date, Year, Month, Day, FirstName, LastName, Contact, Product, Channel, Sales, ROI) VALUES (\"{row['Date']}\", \"{row['Timestamp']}\", {row['Unix Date']}, {row['Year']}, {row['Month']}, {row['Day']}, \"{row['FirstName']}\", \"{row['LastName']}\", \"{row['Contact']}\", \"{row['Product']}\", \"{row['Channel']}\", {row['Sales']}, {row['ROI']});\n")
print(f"Sample data saved to: {file_csv_path}, {file_xls_path}, {file_xlsx_path}, and SQL file at {file_sql_path}")
Thank you for following along with this tutorial. We hope you found it helpful and informative. If you have any questions, or if you would like to suggest new Python code examples or topics for future tutorials/articles, please feel free to join and comment. Your feedback and suggestions are always welcome!