Configuration
The configuration variables for the data collection process are split across two files: `.env` and `cfg.json`.
⚠️ Some configuration values must be kept consistent across both files, for example the full `node_url` or the full `db_dsn`. Before you start the process, check whether a value you change in one file also has to be updated in the other.
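The cross-file check can be scripted. The sketch below is a hypothetical helper (not part of the project) and assumes that `node_url` is built as `http://<ERIGON_HOST>:<ERIGON_PORT>` and that `db_dsn` carries the same user, password and database name as `POSTGRES_USER`, `POSTGRES_PASSWORD` and `POSTGRES_DB`. The DSN host and port are not compared: the sample points at `db_pool:6432`, which appears to be a connection pooler rather than the published `POSTGRES_PORT`. The example `.env` values are chosen to match the sample `cfg.json`.

```python
from urllib.parse import urlsplit


def check_consistency(env: dict, cfg: dict) -> list:
    """Hypothetical helper: list mismatches between .env values and cfg.json."""
    problems = []

    # Assumption: node_url is http://<ERIGON_HOST>:<ERIGON_PORT>
    expected_node_url = f"http://{env['ERIGON_HOST']}:{env['ERIGON_PORT']}"
    if cfg["node_url"] != expected_node_url:
        problems.append(f"node_url is {cfg['node_url']!r}, expected {expected_node_url!r}")

    # Compare credentials and database name only; host/port may point at a pooler
    dsn = urlsplit(cfg["db_dsn"])
    for key, actual in (
        ("POSTGRES_USER", dsn.username),
        ("POSTGRES_PASSWORD", dsn.password),
        ("POSTGRES_DB", dsn.path.lstrip("/")),
    ):
        if env[key] != actual:
            problems.append(f"db_dsn has {actual!r} where .env {key} is {env[key]!r}")
    return problems


env = {
    "ERIGON_HOST": "host.docker.internal",
    "ERIGON_PORT": "8547",
    "POSTGRES_USER": "user",
    "POSTGRES_PASSWORD": "postgres",
    "POSTGRES_DB": "db",
}
cfg = {
    "node_url": "http://host.docker.internal:8547",
    "db_dsn": "postgres://user:postgres@db_pool:6432/db",
}
print(check_consistency(env, cfg))  # → []
```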
.env
To create a `.env` file, copy the provided `.env.default` and edit the values as needed.
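A minimal Python sketch of that step, plus a parser for reading the result back. `ensure_env_file` and `parse_env` are hypothetical helpers, and the parser assumes plain `KEY=VALUE` lines (no multi-line values and no `${VAR}` expansion), which may be stricter or looser than what Docker Compose accepts.

```python
import shutil
from pathlib import Path


def ensure_env_file(directory: Path) -> Path:
    """Copy .env.default to .env in `directory`, unless .env already exists."""
    env_path = directory / ".env"
    if not env_path.exists():  # never overwrite local edits
        shutil.copyfile(directory / ".env.default", env_path)
    return env_path


def parse_env(text: str) -> dict:
    """Parse plain KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


# Usage: env = parse_env(ensure_env_file(Path(".")).read_text())
```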
Available environment variables
| ENV_VAR | Description | Default value |
|---|---|---|
| PROJECT_NAME | Prefix for container names and the Docker network name | "bdc" |
| DATA_DIR | Persistent data destination directory (PostgreSQL) | "./data" |
| KAFKA_DATA_DIR | Persistent data destination directory (Kafka, ZooKeeper) | "./data" |
| LOG_LEVEL | Logging level of consumers and producers | "INFO" |
| DATA_UID | Data directory owner user ID (can be left blank) | `id -u` |
| DATA_GID | Data directory owner group ID (can be left blank) | `getent group bdlt \| cut -d: -f3` |
| N_CONSUMERS | Number of consumers to use for each topic (blockchain) | 2 |
| N_CONSUMER_INSTANCES | Number of DataConsumer instances per consumer container | 2 |
| KAFKA_N_PARTITIONS | Number of partitions per topic | `2 * N_CONSUMERS * N_CONSUMER_INSTANCES` |
| SENTRY_DSN | DSN for error monitoring via Sentry (optional) | None |
| POSTGRES_PORT | Published host port for PostgreSQL | 13338 |
| POSTGRES_USER | Username for connecting to the PostgreSQL service | "username" |
| POSTGRES_PASSWORD | Password for connecting to the PostgreSQL service | "postgres" |
| POSTGRES_DB | Default PostgreSQL database name | "db" |
| ERIGON_PORT | Port of the Erigon node | 8547 |
| ERIGON_HOST | Host of the Erigon node | "host.docker.internal" |
| WEB3_REQUESTS_TIMEOUT | Timeout for each web3 request (in seconds) | 30 |
| WEB3_REQUESTS_RETRY_LIMIT | Number of retries for each failed web3 request | 10 |
| WEB3_REQUESTS_RETRY_DELAY | Delay between retries (in seconds) | 5 |
| KAFKA_EVENT_RETRIEVAL_TIMEOUT | Time without receiving any event after which consumers exit (in seconds) | 600 |
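Two of these settings are derived or interact at runtime. The sketch below computes the documented `KAFKA_N_PARTITIONS` default and shows a fixed-delay retry loop of the kind the `WEB3_REQUESTS_RETRY_*` variables configure. Both helpers are hypothetical; in particular, it is an assumption that `WEB3_REQUESTS_RETRY_LIMIT` does not count the first attempt, and the per-request `WEB3_REQUESTS_TIMEOUT` is expected to be enforced inside `request` itself, not by this loop.

```python
import time


def kafka_partitions(env: dict) -> int:
    """KAFKA_N_PARTITIONS if set, else the default 2 * N_CONSUMERS * N_CONSUMER_INSTANCES."""
    explicit = env.get("KAFKA_N_PARTITIONS", "")
    if explicit:
        return int(explicit)
    n_consumers = int(env.get("N_CONSUMERS") or 2)  # blank or missing -> default 2
    n_instances = int(env.get("N_CONSUMER_INSTANCES") or 2)
    return 2 * n_consumers * n_instances


def call_with_retries(request, retry_limit=10, retry_delay=5.0, sleep=time.sleep):
    """Call request(); after a failure, wait retry_delay seconds and retry, at most retry_limit times."""
    for attempt in range(retry_limit + 1):  # assumption: first call + retry_limit retries
        try:
            return request()
        except Exception:
            if attempt == retry_limit:
                raise  # retries exhausted: surface the last error
            sleep(retry_delay)
```

With the defaults, `kafka_partitions({})` returns 2 × 2 × 2 = 8. Under the same assumption about the retry count, a request that keeps timing out could take roughly 11 × 30 s + 10 × 5 s = 380 s before the error is raised.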
cfg.json
The JSON configuration file (`cfg.json`) selects the data collection mode and sets the connection URLs for the node, the database, Redis and Kafka.
A sample configuration file for the Ethereum blockchain that uses the partial collection mode to collect USDT transfers between blocks 13000000 and 13000020:
```json
{
  "node_url": "http://host.docker.internal:8547",
  "db_dsn": "postgres://user:postgres@db_pool:6432/db",
  "redis_url": "redis://redis",
  "kafka_url": "kafka:9092",
  "kafka_topic": "eth",
  "data_collection": [
    {
      "mode": "partial",
      "start_block": 13000000,
      "end_block": 13000020,
      "contracts": [
        {
          "address": "0xdAC17F958D2ee523a2206206994597C13D831ec7",
          "symbol": "USDT",
          "category": "erc20",
          "events": [
            "TransferFungibleEvent"
          ]
        }
      ]
    }
  ]
}
```
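Before starting a run, the `data_collection` entries can be checked against the required fields listed for each mode under "Data collection mode". This is a hypothetical validator, not the project's own; it assumes that an entry without a `mode` key falls back to the default `"partial"`, and it treats `"log_filter"` as unsupported because that mode is described as not or only partially implemented.

```python
import json

# Required fields per mode, taken from the "Data collection mode" list
REQUIRED_FIELDS = {
    "partial": ("start_block", "end_block", "contracts"),
    "full": ("start_block", "end_block"),
    "get_logs": ("params",),
}


def validate_collection(entry: dict) -> list:
    """Return problems for one data_collection entry (an empty list means it looks valid)."""
    mode = entry.get("mode", "partial")  # assumption: a missing mode means the default "partial"
    if mode not in REQUIRED_FIELDS:
        return [f"unsupported mode {mode!r}"]
    return [f"{mode}: missing {field!r}" for field in REQUIRED_FIELDS[mode] if field not in entry]


def validate_config(cfg: dict) -> list:
    """Validate every entry of the data_collection array."""
    return [p for entry in cfg.get("data_collection", []) for p in validate_collection(entry)]


def validate_file(path: str = "cfg.json") -> list:
    with open(path) as f:
        return validate_config(json.load(f))
```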
Data collection mode
- `"partial"` = the default mode; only stores the web3 data of the contracts and events defined in `cfg.json`
  - required fields: `start_block`, `end_block`, `contracts`
  - transactions, internal transactions and logs are stored if:
    - the `to_address` of the transaction is one of the contract addresses,
    - the `address` of any log in the transaction is one of the contract addresses, or
    - the `contractAddress` of the transaction receipt is one of the contract addresses
- `"full"` = stores all web3 data (all transactions) within a block range, including internal transactions and logs
  - required fields: `start_block`, `end_block`
- `"get_logs"` = producers send the transactions received from the `eth_getLogs` RPC method to the consumers
  - required fields: `params` (same spec as for `eth_getLogs`)
- `"log_filter"` = not implemented or only partially implemented; the `get_all_entries` method on `web3.filter` doesn't work with Erigon
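The three storage conditions of the partial mode amount to a single predicate over a transaction and its receipt. The sketch below is an illustration, not the project's code: it assumes the transaction and receipt are dicts shaped like JSON-RPC responses (`to`, `logs[].address`, `contractAddress`) and compares addresses case-insensitively so that checksummed and lowercase forms match.

```python
def is_relevant(tx: dict, receipt: dict, contract_addresses) -> bool:
    """Partial mode: keep a transaction (with its internal transactions and logs) if it touches a watched contract."""
    # Compare lowercase hex so checksummed and lowercase addresses match
    watched = {address.lower() for address in contract_addresses}

    def hit(address) -> bool:
        return address is not None and address.lower() in watched

    return (
        hit(tx.get("to"))  # to_address of the transaction
        or any(hit(log.get("address")) for log in receipt.get("logs", []))  # address of any log
        or hit(receipt.get("contractAddress"))  # contract created by the transaction
    )
```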
contracts field
The `contracts` field is an array of objects, each describing a contract and the events that should be collected for it.
Example contract object for USDT for which three events are collected:
```json
{
  "address": "0xdAC17F958D2ee523a2206206994597C13D831ec7",
  "symbol": "USDT",
  "category": "erc20",
  "events": [
    "TransferFungibleEvent",
    "MintFungibleEvent",
    "BurnFungibleEvent"
  ]
}
```
| Field | Type | Description | Required |
|---|---|---|---|
| address | string | Address of the contract | Yes |
| symbol | string | Symbol of the contract (only used for convenience in the config file) | No |
| category | string | Category of the contract; must match one of the keys defined in `contract_abi.json` | Yes |
| events | array of strings | Exhaustive list of all events that will result in data being saved in the DB | Yes |
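The field table translates into a small per-contract check. This is a hypothetical sketch: it assumes the categories are the top-level keys of `contract_abi.json` and that an address is `0x` followed by 40 hex digits (it does not verify the EIP-55 checksum). `symbol` is optional and therefore not checked.

```python
import json
import re

ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")


def load_abi_categories(path: str = "contract_abi.json") -> set:
    """Assumption: categories are the top-level keys of contract_abi.json."""
    with open(path) as f:
        return set(json.load(f))


def validate_contract(contract: dict, abi_categories) -> list:
    """Check one entry of the contracts array against the field table; symbol is optional."""
    problems = []
    address = contract.get("address")
    if not isinstance(address, str) or not ADDRESS_RE.fullmatch(address):
        problems.append(f"invalid or missing address: {address!r}")
    category = contract.get("category")
    if category not in abi_categories:
        problems.append(f"category {category!r} is not a key of contract_abi.json")
    events = contract.get("events")
    if not isinstance(events, list) or not all(isinstance(e, str) for e in events):
        problems.append("events must be an array of strings")
    return problems
```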
params field
The `params` field is an object that is passed directly to the web3 `eth_getLogs` RPC method call.
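Assuming `params` is forwarded unchanged, it ends up as the single filter object of an `eth_getLogs` JSON-RPC request. The sketch below builds such a request body with a hypothetical helper. The params values are illustrative only: they cover the block range of the sample `cfg.json`, hex-encode block numbers as the JSON-RPC spec expects, and filter on the standard ERC-20 `Transfer` topic.

```python
import json


def get_logs_request(params: dict, request_id: int = 1) -> str:
    """Wrap a params object in a JSON-RPC 2.0 eth_getLogs request body."""
    # eth_getLogs takes the filter object as its single positional parameter
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "eth_getLogs", "params": [params]})


# Illustrative params: USDT Transfer logs in the block range of the sample cfg.json
params = {
    "fromBlock": hex(13000000),  # block numbers are hex-encoded quantities: "0xc65d40"
    "toBlock": hex(13000020),
    "address": "0xdAC17F958D2ee523a2206206994597C13D831ec7",
    # topic 0 = keccak256("Transfer(address,address,uint256)")
    "topics": ["0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"],
}
body = get_logs_request(params)
```

The resulting `body` can be sent as an HTTP POST with `Content-Type: application/json` to the node at `node_url` to inspect what a given `params` object returns.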
Example params object: