solarchive is a project to archive Solana's public transaction data and make it freely accessible in ergonomic formats for developers, researchers, and the entire Solana community.
Today we are publishing datasets in Apache Parquet format covering all user transaction history (votes excluded), along with snapshots of token metadata and account states, all licensed under CC-BY-4.0.
Our top priority right now is publishing all historical data to date (December 2025) and reducing the delay from on-chain activity to usable datasets to under a week.
But processing and hosting hundreds of terabytes of data isn't free. Storing 2025 alone will cost us nearly $10,000 in yearly storage fees! So we need your help to make this a reality. Your donation helps cover infrastructure costs, data engineering, and more.
If you're an enterprise needing expert support, you can get in touch.
Here's a sample of what's in the archive, queried live with DuckDB-WASM. These three queries load the latest available partition of each dataset (transactions, tokens, and accounts) right here in your browser.
SELECT
block_slot,
CONCAT(SUBSTRING(block_hash, 1, 16), '...') as block_hash,
block_timestamp,
CONCAT(SUBSTRING(recent_block_hash, 1, 16), '...') as recent_block_hash,
CONCAT(SUBSTRING(signature, 1, 16), '...') as signature,
index,
fee / 1e9 as fee_sol,
status,
err,
compute_units_consumed,
accounts,
log_messages,
balance_changes,
pre_token_balances,
post_token_balances
FROM read_parquet('https://data.solarchive.org/txs/2025-12-04/000000000038.parquet')
WHERE len(balance_changes) >= 2
ORDER BY block_timestamp DESC
LIMIT 5
SELECT
block_slot,
CONCAT(SUBSTRING(block_hash, 1, 16), '...') as block_hash,
block_timestamp,
CONCAT(SUBSTRING(tx_signature, 1, 16), '...') as tx_signature,
retrieval_timestamp,
is_nft,
CONCAT(SUBSTRING(mint, 1, 24), '...') as mint,
CONCAT(SUBSTRING(update_authority, 1, 16), '...') as update_authority,
name,
symbol,
uri,
seller_fee_basis_points,
creators,
primary_sale_happened,
is_mutable
FROM (
SELECT DISTINCT ON (name) *
FROM read_parquet('https://data.solarchive.org/tokens/2025-12/000000000000.parquet')
WHERE name IS NOT NULL AND name != ''
ORDER BY name, block_slot DESC
)
ORDER BY block_slot DESC
LIMIT 5
SELECT
block_slot,
CONCAT(SUBSTRING(block_hash, 1, 16), '...') as block_hash,
block_timestamp,
CONCAT(SUBSTRING(pubkey, 1, 24), '...') as pubkey,
CONCAT(SUBSTRING(tx_signature, 1, 16), '...') as tx_signature,
retrieval_timestamp,
executable,
lamports / 1e9 as balance_sol,
CONCAT(SUBSTRING(owner, 1, 24), '...') as owner,
rent_epoch,
program,
space,
account_type,
is_native,
CONCAT(SUBSTRING(mint, 1, 24), '...') as mint,
state,
token_amount,
token_amount_decimals,
program_data,
authorized_voters,
CONCAT(SUBSTRING(authorized_withdrawer, 1, 16), '...') as authorized_withdrawer,
prior_voters,
CONCAT(SUBSTRING(node_pubkey, 1, 16), '...') as node_pubkey,
commission,
epoch_credits,
votes,
root_slot,
last_timestamp,
data
FROM (
SELECT DISTINCT ON (owner) *
FROM read_parquet('https://data.solarchive.org/accounts/2025-12/000000000000.parquet')
ORDER BY owner, lamports DESC
)
ORDER BY lamports DESC
LIMIT 5
Datasets are archived as Parquet files, partitioned by day (transactions) or by month (tokens and account snapshots). Alongside each dataset and each partition there is an index.json file that tells you which datasets are available, how many partitions a dataset has, and the full list of published files in a partition, each with a checksum for verifying the integrity of your downloads.
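As a quick illustration, here is a minimal Python sketch of the partition naming. The URL layout is taken from the table below; the helper functions are ours, not part of any solarchive tooling:

```python
# Build partition prefixes for each dataset's layout: transactions are
# partitioned daily, tokens and accounts monthly.
from datetime import date

BASE = "https://data.solarchive.org"

def txs_prefix(d: date) -> str:
    # daily partitions, e.g. .../txs/2025-11-01/
    return f"{BASE}/txs/{d:%Y-%m-%d}/"

def monthly_prefix(dataset: str, d: date) -> str:
    # monthly partitions, e.g. .../accounts/2023-02/
    return f"{BASE}/{dataset}/{d:%Y-%m}/"

print(txs_prefix(date(2025, 11, 1)))
print(monthly_prefix("accounts", date(2023, 2, 1)))
```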
You can download all this data for free with any HTTP client:
| Data | URL |
|---|---|
| Index of txs for Nov 1, 2025 | https://data.solarchive.org/txs/2025-11-01/index.json |
| All txs for Nov 1, 2025 | https://data.solarchive.org/txs/2025-11-01/*.parquet |
| Account snapshots for Feb, 2023 | https://data.solarchive.org/accounts/2023-02/*.parquet |
| Token snapshots for Sep, 2024 | https://data.solarchive.org/tokens/2024-09/*.parquet |
| Specific file | https://data.solarchive.org/txs/2025-11-01/000000000014.parquet |
Each transactions file contains vote-filtered transaction data in Parquet format. You can process the raw data directly: import it into DuckDB, pandas, Spark, or any other analytics tool. Note that the wildcard rows in the table above are illustrative; the server does not expand globs, so use the index.json files (see below) to enumerate the actual file URLs.
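For example, here is a minimal sketch that queries one daily partition over HTTP with the duckdb Python package (pandas is needed for `.df()`; the columns come from the transactions schema used in the sample queries above):

```python
# Query a single daily transactions partition over HTTP with DuckDB.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")  # enable reading parquet over https
con.execute("LOAD httpfs")

df = con.sql("""
    SELECT block_slot, signature, fee / 1e9 AS fee_sol, status
    FROM read_parquet('https://data.solarchive.org/txs/2025-11-01/000000000014.parquet')
    ORDER BY block_timestamp DESC
    LIMIT 10
""").df()
print(df)
```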
For programmatic access, use the index files to discover available data:
https://data.solarchive.org/index.json - Root index (all datasets)
https://data.solarchive.org/txs/index.json - List of available days with metadata
https://data.solarchive.org/txs/YYYY-MM-DD/index.json - File list for specific day
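Here is a sketch that walks a day's index and downloads every listed file, checking sizes along the way. The JSON field names used below ("files", "url", "size") are assumptions; inspect a real index.json and adjust them to match:

```python
# Enumerate and download one day's parquet files via its index.json.
import json
import os
import urllib.request

BASE = "https://data.solarchive.org"

def fetch_json(url: str) -> dict:
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

day_index = fetch_json(f"{BASE}/txs/2025-11-01/index.json")

# "files", "url", and "size" are assumed field names -- check the real index.
for entry in day_index.get("files", []):
    name = entry["url"].rsplit("/", 1)[-1]
    urllib.request.urlretrieve(entry["url"], name)
    if "size" in entry:
        # Verify integrity by comparing the downloaded size against the index.
        assert os.path.getsize(name) == entry["size"], f"size mismatch: {name}"
```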
For schema documentation, see the schema files:
https://data.solarchive.org/schemas/solana/transactions.json - Transactions schema
https://data.solarchive.org/schemas/solana/accounts.json - Accounts schema
https://data.solarchive.org/schemas/solana/tokens.json - Tokens schema
Coming soon: a Hugging Face datasets library integration. Your support helps us ship these features faster!
If you find this data useful, consider supporting the project. You can use the Solana Pay buttons throughout the site, or send directly to:
solarchive.sol (Solana Name Service domain) • Any amount appreciated
Frequently asked questions:

How do I download all the files for a day? Fetch that day's index.json file to see what files are available. For example, download https://data.solarchive.org/txs/2025-11-01/index.json - this lists all parquet files for that day with their URLs. Then download each file URL from the index. You cannot use wildcards like *.parquet directly - you must read the index.json first to get the actual file URLs.

How big are the downloads? Check the index.json files for exact sizes before downloading.

Where are the schemas documented? See the schema files listed above for transactions, accounts, and tokens. The schemas include field descriptions, data types, and examples.

How can I verify data integrity? Each index.json file includes file metadata like size and last-modified timestamp. You can verify file integrity by comparing downloaded file sizes against the index. For critical applications, you can cross-reference specific transactions against Solana RPC nodes using the signature field.

How can I support the project? Donate to solarchive.sol or use the Solana Pay buttons throughout the site. Any amount helps cover storage, bandwidth, and processing costs. For enterprises needing dedicated support, custom data processing, or higher bandwidth access, contact leandro@abstractmachines.dev to discuss premium support options.

Will the data be available on Hugging Face? That's planned: you would use the datasets library to load Solana data directly, e.g. from datasets import load_dataset; ds = load_dataset('solarchive/solana-txs'). This will make it much easier to use the data in ML/AI workflows.

How do I find out when new data lands? Check the index.json files periodically to see new partitions. The RSS feed lists the latest 50 partitions across all datasets (transactions, accounts, tokens) and is updated at build time.
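To cross-reference an archived transaction against an RPC node, as mentioned in the integrity answer above, here is a minimal sketch using the standard getTransaction JSON-RPC method (the signature placeholder and the choice of endpoint are yours to fill in):

```python
# Cross-reference an archived signature against a Solana RPC node.
import json
import urllib.request

RPC = "https://api.mainnet-beta.solana.com"  # any RPC endpoint works
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "getTransaction",
    # Replace the placeholder with a signature taken from the archive.
    "params": ["<signature from the archive>",
               {"maxSupportedTransactionVersion": 0}],
}
req = urllib.request.Request(
    RPC,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)["result"]
print("known to the node:", result is not None)
```

Note that most public RPC nodes only retain recent history, so older signatures may require an archival endpoint.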