Exporting Clerk Users to a CSV

Hey everyone! 👋 Today, I’m excited to share a project I recently worked on.

We often need to export Clerk user data for things like mass email campaigns. The problem is that the Clerk dashboard doesn’t let you export it. You can contact their support team, but that takes time and is mostly geared toward teams that are moving away from Clerk. It’s much more efficient to do it yourself with a simple script, which can even export user profile images.

We’ll fetch user data from an API, download their profile images, and save everything into a CSV file. It’s a fun and practical exercise, perfect for anyone looking to get hands-on with Python, APIs, and data handling.

What You Need

  1. Python: Our main tool for this project.
  2. Requests Library: For making HTTP requests.
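  3. CSV Library: To handle CSV file operations (the csv module ships with Python, so there is nothing to install).
  4. A Clerk API Key: To access the user data.
  5. An image_url Field: Present on each user record and used to fetch profile images.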

Setting Up the Environment

Before we dive into the code, let’s set up a virtual environment. This keeps our dependencies organized and prevents conflicts with other projects. Here’s how to do it:

  1. Navigate to Your Project Directory:
cd /path/to/your/project
  2. Create a Virtual Environment:
python3 -m venv env
  3. Activate the Virtual Environment:
source env/bin/activate
  4. Install Required Packages:
pip install requests
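
If you want to double-check the setup before moving on, a small sketch like this can help. It only uses the standard library, and the helper names in_virtualenv and is_installed are made up for this post.

```python
import importlib.util
import sys


def in_virtualenv():
    # Inside a venv, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python install.
    return sys.prefix != sys.base_prefix


def is_installed(package_name):
    # find_spec returns None when a top-level package cannot be found.
    return importlib.util.find_spec(package_name) is not None


print("virtual environment active:", in_virtualenv())
print("requests installed:", is_installed("requests"))
```

If either line prints False, go back and repeat the activation or install step.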

The Core Script: Fetching Data and Downloading Images

Now, let’s get to the main script. This script fetches user data, downloads their profile images, and saves everything into a CSV file. Save it as fetch_users.py, the file name used in the run command later on.

import os
import requests
import csv
import time
from urllib.parse import urlparse
from pathlib import Path

def fetch_all_users(api_key):
    url = "https://api.clerk.dev/v1/users"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    offset = 0
    limit = 500
    all_users = []

    while True:
        params = {
            "limit": limit,
            "offset": offset
        }
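        # Retry the same page until it succeeds or fails with a non-rate-limit error
        while True:
            response = None
            try:
                # A timeout keeps the script from hanging forever on a stalled connection
                response = requests.get(url, headers=headers, params=params, timeout=30)
                response.raise_for_status()
            except requests.exceptions.HTTPError as err:
                if response.status_code == 429:
                    print("Rate limit exceeded. Waiting for 60 seconds before retrying...")
                    time.sleep(60)
                    continue
                else:
                    # Any other HTTP error (bad key, server error) stops the script
                    raise SystemExit(err)
            break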

        data = response.json()
        all_users.extend(data)

        if len(data) < limit:
            break
        offset += limit
        time.sleep(1)  # To prevent rate limiting

    return all_users

def download_image(url, save_folder):
    try:
        response = requests.get(url, stream=True, timeout=30)
        response.raise_for_status()

        # Prefer the Content-Type header; fall back to the suffix of the URL path
        content_type = response.headers.get('Content-Type', '').split(';')[0].strip().lower()
        known_types = {
            'image/jpeg': '.jpg',
            'image/png': '.png',
            'image/gif': '.gif',
            'image/webp': '.webp',
        }
        extension = known_types.get(content_type)
        if extension is None:
            url_suffix = Path(urlparse(url).path).suffix.lower()
            extension = '.jpg' if url_suffix == '.jpeg' else url_suffix
            if extension not in known_types.values():
                print(f"Unknown image type for URL: {url}, Content-Type: {content_type}")
                return None

        # Extract filename from URL and add the extension
        parsed_url = urlparse(url)
        filename = os.path.basename(parsed_url.path)
        if not filename.endswith(extension):
            filename += extension

        save_path = os.path.join(save_folder, filename)

        # Save image to the specified folder
        with open(save_path, 'wb') as file:
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)

        return save_path
    except requests.RequestException as e:
        print(f"Error downloading image: {e}")
        return None

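def write_to_csv(users, image_folder):
    if not users:
        print("No users to write.")
        return

    # Collect every key seen across all users (in first-seen order), so a user
    # with a field the first user lacks doesn't make DictWriter raise
    keys = list(dict.fromkeys(key for user in users for key in user)) + ['local_image_path']
    Path(image_folder).mkdir(parents=True, exist_ok=True)

    with open('users.csv', 'w', newline='') as output_file:
        dict_writer = csv.DictWriter(output_file, keys)
        dict_writer.writeheader()

        for user in users:
            # Download the image and get the local path
            image_url = user.get('image_url', '')
            if image_url:
                local_image_path = download_image(image_url, image_folder) or 'Image download failed'
            else:
                # No image URL on this user, so there was nothing to download
                local_image_path = ''
            user['local_image_path'] = local_image_path

            dict_writer.writerow(user)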

if __name__ == "__main__":
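    _api_key = os.getenv("CLERK_API_KEY")
    if not _api_key:
        raise SystemExit("CLERK_API_KEY is not set. Export it before running the script.")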
    _users = fetch_all_users(_api_key)
    image_folder = 'images'
    write_to_csv(_users, image_folder)
    print(f"Total users fetched: {len(_users)}")
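
One thing to keep in mind when you open users.csv: Clerk user objects include nested fields (a list of email addresses, for example), and csv.DictWriter writes those as Python-style strings. If all you need is a flat list for an email campaign, a minimal sketch like this could flatten each user first. The flatten_user helper is hypothetical, and the field names primary_email_address_id, email_addresses, and email_address are assumptions about the shape of Clerk’s user object, so check them against a real response before relying on it.

```python
def flatten_user(user):
    """Return a flat dict with a few commonly needed fields.

    The field names used here are assumptions about the Clerk user object.
    """
    primary_id = user.get("primary_email_address_id")
    emails = user.get("email_addresses") or []
    # Pick the address whose id matches the primary id, or "" if none matches.
    primary_email = next(
        (e.get("email_address") for e in emails if e.get("id") == primary_id),
        "",
    )
    return {
        "id": user.get("id") or "",
        "first_name": user.get("first_name") or "",
        "last_name": user.get("last_name") or "",
        "primary_email": primary_email or "",
        "image_url": user.get("image_url") or "",
    }


# Made-up sample record, only to show the shape the helper expects.
sample = {
    "id": "user_123",
    "first_name": "Ada",
    "last_name": "Lovelace",
    "primary_email_address_id": "idn_2",
    "email_addresses": [
        {"id": "idn_1", "email_address": "old@example.com"},
        {"id": "idn_2", "email_address": "ada@example.com"},
    ],
    "image_url": "",
}

row = flatten_user(sample)
print(row["primary_email"])
```

You could then pass a list of flattened rows to write_to_csv instead of the raw user objects.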

What Does This Code Do?

  1. Fetching User Data:
    • The fetch_all_users(api_key) function makes requests to the Clerk API to get user data. It handles rate limiting and pagination.
  2. Downloading Images:
    • The download_image(url, save_folder) function downloads images from the URLs and saves them in the specified folder with the correct file extension.
  3. Saving Data to CSV:
    • The write_to_csv(users, image_folder) function writes the user data to a CSV file, including the local paths of the downloaded images.
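
If the pagination part feels abstract, here is the same offset/limit idea stripped down to its core, with no network calls. The paginate function and the fake_fetch_page stand-in are hypothetical names used only for illustration; the real script does the same thing against the Clerk API.

```python
def paginate(fetch_page, limit=500):
    """Collect items from an offset/limit source until a short page signals the end."""
    items = []
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        items.extend(page)
        # A page shorter than the limit means there is nothing left to fetch.
        if len(page) < limit:
            break
        offset += limit
    return items


# Hypothetical stand-in for the real API: serves slices of an in-memory list.
fake_users = [{"id": f"user_{i}"} for i in range(7)]


def fake_fetch_page(limit, offset):
    return fake_users[offset:offset + limit]


all_users = paginate(fake_fetch_page, limit=3)
print(len(all_users))  # 7
```

Note that when the total number of users is an exact multiple of the limit, this approach makes one extra request that returns an empty page before stopping.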

Running the Script

  1. Activate the Virtual Environment:
    • Ensure your virtual environment is activated, and the necessary packages are installed.
  2. Set the API Key:
    • Export your Clerk API key to the environment:
export CLERK_API_KEY="your_api_key_here"

  3. Run the Script:
    • Execute the script:
python fetch_users.py

This will fetch the data, download the images to the images folder, and save the information in users.csv.
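
To sanity-check the result, you could read the CSV back and count the rows. This is a small sketch, not part of the original script; summarize_export is a made-up helper, and it assumes the local_image_path column written by write_to_csv.

```python
import csv
from pathlib import Path


def summarize_export(csv_path):
    """Count rows in the export and how many of them lack a downloaded image."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    # The script stores a placeholder (or nothing) when no image was saved.
    no_image = sum(
        1 for row in rows
        if row.get("local_image_path") in (None, "", "Image download failed")
    )
    return len(rows), no_image


if Path("users.csv").exists():
    total, no_image = summarize_export("users.csv")
    print(f"{total} users exported, {no_image} without a local image")
```

The total should match the "Total users fetched" number the script prints at the end.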

Final Thoughts

And that’s a wrap! This project was a great way to dive into working with APIs, handling data, and managing Python environments. We fetched user data, downloaded images, and saved everything neatly in a CSV file. If you’re looking to expand your skills, this is a practical and rewarding project to tackle.

Feel free to experiment and tweak the code to suit your needs. Happy coding! 🧑‍💻