How to generate fake data in Postgres with faker_fdw

NEW: I'm on Bluesky now!

Few months ago I wrote about how to populate a Postgres database quickly using generate_series function. It is a function that generates a set of values, from start to stop, at the interval specified by step. But in my example, the samples have values selected from a limited set of options.

In this post, I'll show you a better way to generate fake data in Postgres using faker_fdw. You can follow the installation steps from the repository of faker_fdw to install faker_fdw. But if you want to quickly give it a try, you can simply use the following Dockerfile to create a Docker image that has Postgres and faker_fdw installed:

Dockerfile

# Stage 1: Build environment
FROM postgres:16 AS builder

# Switch to root to install dependencies
USER root

# Install necessary build dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    wget \
    postgresql-server-dev-16 \
    python3 python3-dev python3-setuptools python3-pip python3.11-venv

# Create a virtual environment
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Install Faker and faker_fdw
RUN pip3 install --no-cache-dir Faker
RUN pip3 install https://github.com/guedes/faker_fdw/archive/v0.2.4.zip

# Install multicorn2
RUN wget https://github.com/pgsql-io/multicorn2/archive/refs/tags/v2.5.tar.gz && \
    tar -xvf v2.5.tar.gz && \
    cd multicorn2-2.5 && \
    make && \
    make install

# Stage 2: Runtime environment
FROM postgres:16

# Copy the built artifacts from the builder stage
COPY --from=builder /usr/lib/postgresql/16/lib/multicorn.so /usr/lib/postgresql/16/lib/
COPY --from=builder /usr/lib/postgresql/16/lib/bitcode/multicorn /usr/lib/postgresql/16/lib/bitcode/multicorn
COPY --from=builder /usr/share/postgresql/16/extension/multicorn* /usr/share/postgresql/16/extension/
COPY --from=builder /opt/venv /opt/venv

# Set environment variable for the virtual environment
ENV PATH="/opt/venv/bin:$PATH"

# Install Python and required libraries
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 \
    python3-dev \
    libpython3.11 \
    && rm -rf /var/lib/apt/lists/*

# Set the library path to include Python libraries
ENV LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:${LD_LIBRARY_PATH}"

# Switch back to postgres user
USER postgres

# Expose the PostgreSQL port
EXPOSE 5432

# Set the default command to run when starting the container
CMD ["postgres"]

Then you can run the container with the following command:

docker run  \
    --name postgres-faker \
    -e POSTGRES_PASSWORD=mysecretpassword \
    -p 5432:5432 \
    postgres_faker_fdw

Once the container is running, run the following query in the Postgres database:

CREATE EXTENSION multicorn;

CREATE SERVER faker_srv
 FOREIGN DATA WRAPPER multicorn
 OPTIONS (wrapper 'faker_fdw.FakerForeignDataWrapper');

Then you can use the faker_srv server to generate fake data. First, create a foreign table, then you can use the people table to generate fake data:

CREATE FOREIGN TABLE "public".people_random (
    ssn varchar,
    name varchar,
    address text,
    msisdn  text,
    email text,
    ipv4_public text
) SERVER faker_srv OPTIONS (max_results '100');

SELECT * FROM people_random;

You'll see something like this:

Image

You can use any of the provider names from Faker as the table name here to generate the data that you want. For example, you can use credit_card_number and credit_card_provider as explained in the credit_card providers document to generate credit card numbers and providers.

One thing to note is that the foreign table will return different results every time you run the query. To make it deterministic, you can either set a seed option

alter foreign table fake.person options ( add seed '1234' );

or insert the data into a normal table first

create TABLE "public".people (
    ssn varchar,
    name varchar,
    address text,
    msisdn  text,
    email text,
    ipv4_public text
);

insert into people select * from people_random;

Are you working in a team environment and your pull request process slows your team down? Then you have to grab a copy of my book, Pull Request Best Practices.

🥳 Thank you for reading until the end! Can I bother you to check my latest project DataQueue.dev? Greatly appreciate it!

How to generate fake data in Postgres with faker_fdw

Image

🔥 Popular this week

Taming React Component

How to render zoomable image with Shadcn UI and Tailwind CSS in Next.js

Use Async Local Storage to prevent props drilling in Next.js Route handlers

Running Long Jobs with Queues in Next.js using Bull and Redis

Nesktop: Make an offline desktop app with Next.js

How to test React Server Component