☀️ solarchive beta

How to Extract Token Transfers

● Beginner ⏱ 10 min

One of the most common questions when working with Solana transaction data is: "How can I see token transfers that happened during a specific time period?" This guide will show you how to query SolArchive transaction data and extract token transfer information with human-readable token names using DuckDB.

Prerequisites: This guide assumes you've already downloaded SolArchive data. If you haven't, start with How to Download SolArchive Data first. You'll need both txs (transactions) and tokens (token metadata) datasets.

Understanding Token Transfers
Setting Up Your Environment
Your First Query

Understanding Token Transfers

How Solana Records Token Transfers

Unlike other blockchains where token transfers are events or logs, Solana records token transfers by capturing the state of token accounts before and after each transaction. This happens through two key fields in the transaction data:

pre_token_balances - Snapshot of all involved token accounts before the transaction
post_token_balances - Snapshot of those same token accounts after the transaction

By comparing these two snapshots, we can see exactly how token balances changed during the transaction.

Anatomy of a Token Transfer

When Alice sends 100 USDC to Bob, the transaction will show:

Alice's token account - Balance decreases by 100 USDC (pre > post)
Bob's token account - Balance increases by 100 USDC (pre < post)
Both accounts - Reference the same USDC mint address

A single transaction can involve multiple token transfers across different tokens. That's why pre_token_balances and post_token_balances are arrays - they can contain many token account changes in one transaction.

What's a mint address? On Solana, every token (like USDC, BONK, or any custom token) has a unique mint address that identifies it. Think of it like a token's ID number. For example, USDC's mint is EPjFWdd5AufqSSqeM2qN1xzybapC8G4wEGGkZwyTDt1v. All transfers of the same token will have the same mint address, which lets us track specific tokens across all transactions.

Key Schema Fields

Field	Description
signature	Unique transaction identifier
block_timestamp	When the transaction occurred
pre_token_balances[]	Array of token account states before execution
post_token_balances[]	Array of token account states after execution

Each token balance entry contains:

mint - The token's mint address (unique identifier for the token)
owner - The wallet address that owns this token account
amount - Token amount in the smallest unit (like lamports for SOL)
decimals - Number of decimal places for this token
account_index - Index into the transaction's accounts array

Understanding Token Amounts: Just like SOL uses lamports as its smallest unit (1 SOL = 1,000,000,000 lamports), SPL tokens also store amounts as integers in their smallest unit. The amount field contains this raw integer value. To convert to the human-readable amount, divide by 10^decimals. For example, USDC has 6 decimals, so an amount of 1,000,000 equals 1.0 USDC.

Setting Up Your Environment

For this guide, we'll use DuckDB to query the parquet files you've already downloaded. DuckDB is a fast in-process analytical database that can query parquet files directly without importing them into a separate database.

Install DuckDB

Create a new project and add DuckDB:

# Create a new project directory
mkdir token-transfers-analysis
cd token-transfers-analysis

# Initialize a new Python project with uv
uv init

# Add duckdb dependency
uv add duckdb

That's it! DuckDB can query parquet files directly, so you don't need any other dependencies.

Your First Query: See Token Names!

Now for the exciting part - let's query the data and see actual token transfers with human-readable names! Create a file called query_transfers.py:

import duckdb

con = duckdb.connect()

query = """
-- Find the 10 most recent token transfers with token names
SELECT 
    t.symbol,
    t.name,
    block_timestamp,
    SUBSTRING(signature, 1, 20) || '...' AS tx,
    pre.owner AS account,
    (post.amount - pre.amount) / POW(10, pre.decimals) AS change,
    CASE 
        WHEN post.amount > pre.amount THEN 'received'
        ELSE 'sent'
    END AS direction
FROM read_parquet('data/txs/**/*.parquet') AS txs,
    UNNEST(txs.pre_token_balances) AS pre,
    UNNEST(txs.post_token_balances) AS post
LEFT JOIN read_parquet('data/tokens/**/*.parquet') AS t
    ON pre.mint = t.mint
WHERE pre.mint = post.mint
  AND pre.account_index = post.account_index
  AND pre.amount != post.amount
  AND txs.status = 'Success'
  AND t.symbol IS NOT NULL
ORDER BY block_timestamp DESC
LIMIT 10
"""

result = con.execute(query).fetchdf()
print(result.to_string())
con.close()

Run it:

uv run query_transfers.py

You'll see output like:

   symbol  name              block_timestamp      tx                    account               change     direction
0  USDC    USD Coin          2025-12-01 23:59:58  5J9Kf2h8Ld3m9...  7vHcNqP2xT4gRk9s...   -1250.50   sent
1  BONK    Bonk              2025-12-01 23:59:55  8mP3Lk9Fn2sD7...  3jR8nQ4mK7pL2sT5...   50000000   received
2  SOL     Wrapped SOL       2025-12-01 23:59:52  2nF4sK8Lm3pQ9...  9tY5rN8jK4mL7pS3...   2.75       received
3  JUP     Jupiter           2025-12-01 23:59:48  6kL2mN9Fp4sR8...  4sW7tN3jP8mK5rL2...   -15.25     sent
4  PYUSD   PayPal USD        2025-12-01 23:59:45  1xM8pN2Lk9Fq3...  2sT9vN4jK7mL5pQ8...   500.00     received
5  RAY     Raydium           2025-12-01 23:59:42  7kP3mN8Lp2sR9...  8tW5rN3jP9mK4sL7...   -125.50    sent
6  ORCA    Orca              2025-12-01 23:59:38  3nF5sK7Lm4pQ8...  1yQ6rN2jK8mL3pS5...   75.25      received
7  USDT    Tether USD        2025-12-01 23:59:35  9mL4pN3Fk8sR2...  5xW9tN7jP4mK2sL6...   -2000.00   sent
8  MNGO    Mango             2025-12-01 23:59:30  2kP6mN5Lp9sR4...  7tY4rN8jK3mL9pQ2...   1500.00    received
9  SRM     Serum             2025-12-01 23:59:25  4nF2sK9Lm7pQ5...  3yQ8rN5jK2mL6pS4...   -50.00     sent

That's it! You can now see which tokens (USDC, BONK, SOL, JUP) were transferred, how much, and in which direction - all with human-readable names instead of cryptic mint addresses.

Understanding the Query

Let's break down what's happening:

Read both datasets - We read transactions from data/txs/ and tokens from data/tokens/
Unnest the arrays - UNNEST() expands pre_token_balances and post_token_balances into rows
Match pre/post - We match balances that reference the same mint and account index
Join with tokens - We join with the tokens table on the mint address to get symbol and name
Calculate change - Divide by 10^decimals to get the human-readable amount
Filter - Only show transfers where the amount actually changed and the token is known

Conclusion

Congratulations! You've successfully learned how to work with SolArchive's Solana blockchain data. Let's recap what you've accomplished:

Understanding Solana's data structure - You now know how token transfers are recorded through pre_token_balances and post_token_balances
Queried real blockchain data - You have 270+ GB of actual Solana transactions and token metadata from late 2025
Wrote your first query - You can now see token transfers with human-readable names like USDC and BONK instead of cryptic mint addresses

What makes this powerful is that you're not relying on centralized APIs or third-party services. You have the complete, raw blockchain data stored locally. You can query it however you want, join it with other datasets, and analyze it without rate limits or restrictions.

From here, you can:

Filter by specific tokens (WHERE t.symbol = 'USDC')
Track specific wallets across time
Calculate volume metrics and aggregate statistics
Join with the accounts dataset to analyze program interactions
Build your own custom analytics, dashboards, or research tools

The same pattern applies to all SolArchive datasets - download the parquet files, query with DuckDB, and unlock insights from the blockchain. Happy analyzing!