Most WhatsApp teams in India can tell you how many messages they sent last month. Far fewer can tell you, without a fight, how many were actually read, how many drove a reply, which campaign and template each one belonged to, and what every one of them cost — all from a single trustworthy place. That gap is not a reporting problem; it is a data-engineering problem. The webhook events Meta fires for every send, delivery, read and reply are a rich event stream, but they arrive as fragile, out-of-order JSON that vanishes if you only ever read them to update a chat UI. This is a build blueprint for the data team and the founder who sponsors them: how to capture WhatsApp message events from the webhook, move them through a queue, land them in a warehouse on a clean star schema, define the metrics so "delivered" never gets confused with "read" or "engaged" again, join billing events back to campaigns and templates for honest cost-per-outcome, and design the whole thing to be DPDP-safe with sane retention and pseudonymisation. The aim is practical enough that a data engineer can start building this week — and honest about its limits: every Meta payload shape, message category and regulatory specific here is directional and must be verified against the official Meta and DPDP sources as of 2026. This is general engineering and compliance guidance, not legal advice.
Why message events deserve a real pipeline
A WhatsApp conversation generates a stream of discrete events: a template goes out and Meta acknowledges it with a message ID, then fires sent, delivered and read status callbacks against that ID, and separately an inbound webhook lands when the contact replies. If your application only consumes those callbacks to flip a tick from grey to blue in the inbox, you are throwing away the exact data that answers every commercial question a founder asks: what is our real read rate, which template earns replies, what did this campaign cost per conversation, and is our 24-hour-window spend going up or down? A pipeline beats ad-hoc SQL over your operational tables for three reasons. First, webhooks are unreliable and unordered: Meta may retry, deliver a read status before you have processed the delivered status, or send duplicates, so you need an idempotent landing zone, not a live UPDATE. Second, your operational database is tuned for serving chats, not for scanning millions of rows to answer "engaged rate by template by week"; that analysis belongs in a warehouse. Third, metrics drift the moment two people compute them two ways, and a single modelled fact table is the only durable cure. Treat the message-event stream as a first-class data product with an owner, a schema and tests (the same discipline you would give billing), and the reporting questions stop being arguments. For the numbers this pipeline ultimately needs to serve, our companion guide to WhatsApp campaign KPIs and metrics covers the "which KPIs matter" question that this blueprint exists to feed.
The architecture: webhook to queue to warehouse
The backbone is a classic, boring, reliable shape — and boring is the point. (1) Ingest. A thin webhook endpoint receives Meta's status and inbound callbacks, validates the signature, and does the absolute minimum: write the raw payload to an append-only landing store (a raw_events table or object storage) with a received-at timestamp and a hash for deduplication, then return 200 fast so Meta does not retry. Do no business logic here. (2) Queue. Push a lightweight job per event onto a durable queue (Redis, SQS or similar) so a spike of delivery callbacks during a campaign blast cannot overwhelm the endpoint or the warehouse — the queue is your shock absorber and your retry mechanism. (3) Transform. Workers pull jobs, parse the payload into typed columns, resolve the message ID to its campaign, template and contact, and write to staging. (4) Load and model. A scheduled batch (every few minutes for near-real-time, or hourly) upserts staging into the warehouse fact and dimension tables, deduplicating on the event hash so a replayed webhook is harmless. (5) Serve. BI tools and the exec dashboard read only the modelled tables, never the raw stream. The golden rule across all five stages is idempotency: because Meta can and will redeliver, every write must be safe to run twice, keyed on a natural identity like (wamid, status). Build that in from line one and replays become a non-event instead of a double-counted disaster.
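A minimal sketch of stage (1), assuming Meta's X-Hub-Signature-256 header (an "sha256=" prefix followed by an HMAC-SHA256 of the raw request body, keyed with your app secret) and using in-memory SQLite as a stand-in for the append-only raw_events store. The function names are hypothetical, and wiring this into a real HTTP framework is left out on purpose.

```python
import hashlib
import hmac
import sqlite3
import time


def create_raw_store() -> sqlite3.Connection:
    """In-memory stand-in for the append-only raw_events landing table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_events ("
        " event_hash TEXT PRIMARY KEY,"  # dedup key: SHA-256 of the raw body
        " received_at REAL NOT NULL,"    # our clock, not Meta's
        " payload TEXT NOT NULL)"        # the JSON exactly as received
    )
    return conn


def verify_signature(raw_body: bytes, signature_header: str, app_secret: bytes) -> bool:
    """Check an 'sha256=<hex>' HMAC-SHA256 signature computed over the raw body."""
    expected = "sha256=" + hmac.new(app_secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so a mismatch leaks no timing hint.
    return hmac.compare_digest(expected.encode(), (signature_header or "").encode())


def ingest(conn: sqlite3.Connection, raw_body: bytes, signature_header: str,
           app_secret: bytes) -> int:
    """Land one webhook body and return the HTTP status code to send back."""
    if not verify_signature(raw_body, signature_header, app_secret):
        return 401  # refuse unsigned or tampered bodies; nothing is stored
    event_hash = hashlib.sha256(raw_body).hexdigest()
    # INSERT OR IGNORE turns a byte-identical redelivery into a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO raw_events (event_hash, received_at, payload) VALUES (?, ?, ?)",
        (event_hash, time.time(), raw_body.decode("utf-8")),
    )
    conn.commit()
    return 200  # acknowledge fast; parsing and modelling happen downstream
```

The body hash only catches byte-identical redeliveries. Retries that differ in any byte still land twice in the raw store, which is why the (wamid, status) key further downstream remains the real idempotency guarantee.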
The core idea in one line: capture every WhatsApp webhook event into an append-only raw store, buffer through a durable queue, model it into a warehouse star schema keyed for idempotency, and define your metrics once — so "delivered", "read", "engaged" and "billable" mean exactly one thing and every cost ties cleanly back to a campaign and template. The pipeline is the asset; the dashboard is just its newest reader.
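To make "keyed for idempotency" concrete, here is a sketch of stages (2) and (3): a worker drains queued status events into a status-grained table whose composite primary key (wamid, status) is the idempotency key. Python's in-process queue.Queue stands in for a durable queue such as Redis or SQS (it is not durable), SQLite stands in for the warehouse, and the table and function names are illustrative assumptions.

```python
import queue
import sqlite3
from typing import List


def create_status_table(conn: sqlite3.Connection) -> None:
    # Status-grained fact: the composite primary key IS the idempotency key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_message_status ("
        " wamid TEXT NOT NULL, status TEXT NOT NULL,"
        " status_ts INTEGER NOT NULL, error_code INTEGER,"
        " PRIMARY KEY (wamid, status))"
    )


def upsert_status(conn: sqlite3.Connection, event: dict) -> None:
    """Write one parsed status event; running it twice leaves the table unchanged."""
    params = {"error_code": None, **event}
    conn.execute(
        "INSERT INTO fact_message_status (wamid, status, status_ts, error_code)"
        " VALUES (:wamid, :status, :status_ts, :error_code)"
        " ON CONFLICT (wamid, status) DO UPDATE SET"
        " status_ts = MIN(status_ts, excluded.status_ts)",  # keep the earliest timestamp
        params,
    )


def drain(conn: sqlite3.Connection, jobs: "queue.Queue", max_attempts: int = 3) -> List[dict]:
    """Process queued (event, attempt) jobs; return events that exhausted their retries."""
    dead_letters = []
    while True:
        try:
            event, attempt = jobs.get_nowait()
        except queue.Empty:
            return dead_letters
        try:
            upsert_status(conn, event)
            conn.commit()
        except sqlite3.Error:
            conn.rollback()
            if attempt + 1 < max_attempts:
                jobs.put((event, attempt + 1))  # re-queue for another try
            else:
                dead_letters.append(event)  # park it for inspection
```

Because the write is an upsert on the natural key, replaying the whole queue after an outage is safe: duplicates collapse into the existing row instead of inflating delivered or read counts.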
Raw events and what they actually mean
Before modelling anything, get precise about the raw events, because a large share of WhatsApp reporting errors come from misreading a payload. The table maps the common event types to their meaning and the warehouse table each one feeds. Treat the exact field names and payload shapes as directional and verify them against Meta's current webhook reference as of 2026, because Meta evolves the schema.
| Raw event | What it actually means | Warehouse destination |
|---|---|---|
| Message sent (accepted) | Meta accepted your outbound and issued a message ID — not yet on the device | fact_messages (one row per outbound, status = sent) |
| Status delivered | The message reached the recipient's device — says nothing about whether a human saw it | fact_message_status (delivered timestamp on the message) |
| Status read | The recipient opened the chat — only if they have read receipts on; absence is not proof of non-read | fact_message_status (read timestamp) |
| Status failed | Delivery failed (invalid number, block, policy) with an error code worth keeping | fact_message_status (failed + error_code) |
| Inbound message | The contact sent you something — a reply, a button tap, a Flow submission | fact_messages (inbound row) + reply linkage |
| Conversation / pricing event | A billable conversation or category signal Meta attaches for charging | fact_billing_events (joined to the message/campaign) |
The single most expensive misconception lives in this table: delivered and read are not interchangeable, and read is not even a reliable proxy for attention because a recipient with read receipts disabled generates no read event despite having read the message. Engineers who collapse these into one "seen" number quietly mislead the whole company. Keep every status as its own timestamped column on the message, never overwrite an earlier status with a later one, and let the metrics layer decide what to call "engaged".
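A sketch of that rule in code. It assumes the commonly documented Cloud API shape, where status callbacks sit under entry[].changes[].value.statuses[] with id, status, a string timestamp and an optional errors list; treat that shape as something to verify against Meta's reference. The dataclass, column names and helpers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Each tracked status gets its own column on the fact_messages row.
STATUS_COLUMNS = {"sent": "sent_at", "delivered": "delivered_at",
                  "read": "read_at", "failed": "failed_at"}


@dataclass(frozen=True)
class StatusEvent:
    wamid: str
    status: str
    ts: int                          # Unix seconds (the payload carries it as a string)
    error_code: Optional[int] = None


def parse_statuses(payload: dict) -> List[StatusEvent]:
    """Flatten entry[].changes[].value.statuses[] into typed events.

    Message bodies are deliberately not extracted: the metrics do not need them.
    """
    events = []
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            for s in change.get("value", {}).get("statuses", []):
                errors = s.get("errors") or [{}]
                events.append(StatusEvent(s["id"], s["status"], int(s["timestamp"]),
                                          errors[0].get("code")))
    return events


def apply_status(rows: Dict[str, dict], ev: StatusEvent) -> None:
    """Set one status column; never erase another status, keep the earliest replay."""
    column = STATUS_COLUMNS.get(ev.status)
    if column is None:
        return  # ignore status types the model does not track yet
    row = rows.setdefault(ev.wamid, {"wamid": ev.wamid})
    current = row.get(column)
    row[column] = ev.ts if current is None else min(current, ev.ts)
    if ev.error_code is not None:
        row["error_code"] = ev.error_code
```

If a read callback arrives before its delivered callback, both still end up on the row in their own columns, and a replayed read keeps the earliest timestamp rather than drifting later.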
The star schema: one fact, a few dimensions
Model the stream as a star: a central fact table of message-grained events surrounded by conformed dimensions you join for slicing. A workable core schema looks like this. fact_messages: one row per message (outbound or inbound), with the WhatsApp message ID (wamid) as the natural key, direction, the sent/delivered/read/failed timestamps (or a companion fact_message_status if you prefer status-grained rows), an error code, and foreign keys to every dimension. dim_contact: the recipient, but pseudonymised (see the DPDP section), holding a surrogate key, a hashed phone identifier, and coarse non-identifying attributes only. dim_template: template name, language, category (marketing/utility/authentication) and version, so you can compare templates honestly. dim_campaign: the broadcast or journey the message belonged to, with its objective and owner. dim_date (and a time-of-day dimension): the standard calendar dimension for trend and cohort analysis. fact_billing_events: a second fact at the conversation/charge grain, carrying the cost and category, with foreign keys back to the campaign and template so spend is sliceable the same way volume is. Two design notes save pain later. First, make wamid the deduplication key for fact_messages (and (wamid, status) for any status-grained table) so replayed webhooks upsert rather than duplicate. Second, keep dimensions conformed, with one dim_campaign shared by fact_messages and fact_billing_events, so cost and outcome always join on the same campaign identity. That conformance is what makes the cost-allocation join described later trivial instead of a reconciliation nightmare.
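Here is the core star expressed as DDL. It is a sketch run against SQLite only so that it executes anywhere; a production warehouse will differ in types and constraint support. Columns not named above (region, iso_week, billing_id, cost_paise, context_wamid) are illustrative assumptions, and the time-of-day dimension is omitted for brevity.

```python
import sqlite3

STAR_SCHEMA = """
CREATE TABLE dim_contact (
    contact_key   INTEGER PRIMARY KEY,
    phone_hash    TEXT NOT NULL UNIQUE,   -- keyed hash, never the plain number
    region        TEXT                    -- coarse, non-identifying attribute only
);
CREATE TABLE dim_template (
    template_key  INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    language      TEXT NOT NULL,
    category      TEXT NOT NULL
                  CHECK (category IN ('marketing', 'utility', 'authentication')),
    version       INTEGER NOT NULL,
    UNIQUE (name, language, version)
);
CREATE TABLE dim_campaign (
    campaign_key  INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    objective     TEXT,
    owner         TEXT
);
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,    -- e.g. 20260115
    calendar_date TEXT NOT NULL,
    iso_week      INTEGER NOT NULL
);
CREATE TABLE fact_messages (
    wamid         TEXT PRIMARY KEY,       -- natural key: a replay upserts, never duplicates
    direction     TEXT NOT NULL CHECK (direction IN ('outbound', 'inbound')),
    contact_key   INTEGER REFERENCES dim_contact (contact_key),
    template_key  INTEGER REFERENCES dim_template (template_key),
    campaign_key  INTEGER REFERENCES dim_campaign (campaign_key),
    date_key      INTEGER REFERENCES dim_date (date_key),
    sent_at       INTEGER,                -- one column per status
    delivered_at  INTEGER,
    read_at       INTEGER,
    failed_at     INTEGER,
    error_code    INTEGER,
    context_wamid TEXT                    -- inbound rows: the outbound message replied to
);
CREATE TABLE fact_billing_events (
    billing_id    TEXT PRIMARY KEY,       -- hypothetical charge identifier
    category      TEXT NOT NULL,
    cost_paise    INTEGER NOT NULL,       -- integer paise avoids float rounding in sums
    campaign_key  INTEGER REFERENCES dim_campaign (campaign_key),  -- same conformed dim
    template_key  INTEGER REFERENCES dim_template (template_key),
    date_key      INTEGER REFERENCES dim_date (date_key)
);
"""


def create_star_schema(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
    conn.executescript(STAR_SCHEMA)
```

Storing cost as integer paise rather than a float is a deliberate choice: summing thousands of small charges in floating point drifts, while integer sums reconcile exactly with an invoice.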
Canonical metric definitions: delivered, read, engaged, billable
This is the section that ends the meetings where two dashboards disagree. Write these definitions down once, in the semantic layer or a dbt model, and forbid anyone from redefining them in a BI tool. The table is the canonical contract; the denominators matter as much as the numerators.
| Metric | Canonical definition (illustrative) | Common mistake it prevents |
|---|---|---|
| Sent | Count of outbound messages Meta accepted (has a wamid) | Counting attempted-but-rejected sends as sent |
| Delivered rate | Messages with a delivered status ÷ messages sent | Treating delivered as "seen" by a person |
| Read rate | Messages with a read status ÷ messages delivered — flagged as a floor, since read-receipts-off recipients never emit a read | Reading a low read rate as low attention when it is just disabled receipts |
| Engaged / reply rate | Distinct conversations with an inbound reply, button tap or Flow submit ÷ outbound conversations started | Conflating a passive read with active engagement |
| Billable conversations | Count from fact_billing_events, by category — not the raw message count | Estimating cost from message volume instead of Meta's conversation charges |
| Failure rate | Failed-status messages ÷ sent, broken out by error code | Hiding a number-quality or policy problem inside an average |
Three principles make these stick. First, delivered ≠ read ≠ engaged ≠ billable: four different definitions and four different business meanings, so never let a slide blur them. Second, every rate must publish its denominator next to it, because a 70% read rate measured over delivered messages and a 70% read rate measured over sent messages are different claims. Third, mark read rate explicitly as a lower bound in the metric description so nobody panics over a structural undercount. The deeper measurement question, whether a campaign actually caused incremental sales rather than just correlating with engaged contacts, sits above this layer; our guide to WhatsApp marketing incrementality measurement covers the holdout-and-lift methods that turn these engagement metrics into causal claims.
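One way to enforce the "denominator travels with the number" rule is to make a rate impossible to render without it. This is a minimal sketch, not a dbt model; the Rate type and canonical_rates function are hypothetical, and the counts in the usage note are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rate:
    name: str
    numerator: int
    denominator: int
    denominator_label: str
    is_lower_bound: bool = False  # True for read rate: receipts-off users never emit "read"

    @property
    def value(self) -> float:
        # Display choice: an empty period shows 0.0 instead of raising ZeroDivisionError.
        return self.numerator / self.denominator if self.denominator else 0.0

    def label(self) -> str:
        floor = " (floor)" if self.is_lower_bound else ""
        return f"{self.name}: {self.value:.1%} of {self.denominator:,} {self.denominator_label}{floor}"


def canonical_rates(sent: int, delivered: int, read: int, failed: int,
                    engaged_convs: int, outbound_convs: int) -> List[Rate]:
    """The single place where each rate's numerator and denominator are fixed."""
    return [
        Rate("Delivered rate", delivered, sent, "sent"),
        Rate("Read rate", read, delivered, "delivered", is_lower_bound=True),
        Rate("Engaged rate", engaged_convs, outbound_convs, "outbound conversations"),
        Rate("Failure rate", failed, sent, "sent"),
    ]
```

With illustrative counts of 1,000 sent, 900 delivered and 630 read, the read-rate tile renders as "Read rate: 70.0% of 900 delivered (floor)". The same 630 reads divided by 1,000 sent would be 63%, which is exactly the ambiguity a bare "70%" hides.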
Cost-allocation joins: tying spend to campaigns and templates
The question founders ask that most WhatsApp setups cannot answer is "what did this campaign cost, and what was its cost per reply?" The pipeline answers it because billing arrives as events too. Meta charges by message category, and those charge signals land in fact_billing_events. Because that fact carries the same conformed dim_campaign and dim_template foreign keys as fact_messages, the allocation is a straight join: aggregate billing cost by campaign, aggregate engaged conversations by the same campaign, and divide for an honest cost-per-engaged-conversation. The same join by dim_template tells you which template earns its keep and which burns marketing-category spend for no reply. A few honest cautions. Under conversation-based billing, cost does not map one-to-one to messages (many messages inside one 24-hour window can be a single charged conversation), so always cost from fact_billing_events, never by multiplying message count by a rate. Keep utility, authentication and marketing categories separate in the cost rollup, because their economics differ sharply and blending them hides where money goes. And verify Meta's current pricing model and category definitions as of 2026: Meta has revised this structure before, including announcing a move from per-conversation to per-message pricing for template messages from July 2025, and may do so again. Costing from billing events rather than from a volume-times-rate estimate holds under either model. Done right, unit economics become a dashboard tile instead of a quarterly spreadsheet exercise; for the wider framing, see our guide to WhatsApp cost optimisation and unit economics.
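A sketch of the allocation query, assuming the fact_messages and fact_billing_events columns from the star schema above (campaign_key, direction, context_wamid, cost_paise). As a simplifying assumption it counts "engaged" as outbound messages that received at least one linked inbound reply, a proxy for the canonical per-conversation definition. Each side is aggregated in its own CTE before the join, because joining raw billing rows to raw message rows fans out and double-counts spend.

```python
import sqlite3
from typing import List, Tuple

COST_PER_ENGAGED_SQL = """
WITH spend AS (              -- one row per campaign: total billed cost
    SELECT campaign_key, SUM(cost_paise) AS cost_paise
    FROM fact_billing_events
    GROUP BY campaign_key
),
engaged AS (                 -- one row per campaign: outbound messages that got a reply
    SELECT o.campaign_key, COUNT(DISTINCT o.wamid) AS engaged
    FROM fact_messages AS o
    JOIN fact_messages AS i
      ON i.direction = 'inbound' AND i.context_wamid = o.wamid
    WHERE o.direction = 'outbound'
    GROUP BY o.campaign_key
)
SELECT s.campaign_key,
       s.cost_paise,
       COALESCE(e.engaged, 0) AS engaged,
       CASE WHEN e.engaged > 0 THEN s.cost_paise / 100.0 / e.engaged END AS inr_per_engaged
FROM spend AS s
LEFT JOIN engaged AS e ON e.campaign_key = s.campaign_key
ORDER BY s.campaign_key
"""


def cost_per_engaged(conn: sqlite3.Connection) -> List[Tuple]:
    """Return (campaign_key, cost_paise, engaged, inr_per_engaged) per campaign."""
    return conn.execute(COST_PER_ENGAGED_SQL).fetchall()
```

The LEFT JOIN keeps campaigns that spent money but earned no replies; they surface with an engaged count of 0 and a NULL cost per engaged conversation, which is precisely the waste the exec view should expose. Adding category to the spend CTE's GROUP BY is the natural next step for keeping marketing, utility and authentication separate.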
Build checklist (directional): 1) Thin signed webhook endpoint that writes raw payloads append-only and returns 200 fast. 2) Durable queue between ingest and transform as a shock absorber and retry layer. 3) Idempotent upserts keyed on (wamid, status) so replays never double-count. 4) Star schema — fact_messages + fact_billing_events + conformed dim_contact (pseudonymised), dim_template, dim_campaign, dim_date. 5) One canonical metrics model where delivered, read, engaged and billable are defined once with explicit denominators. 6) Cost-allocation joins billing to campaign and template on conformed keys. 7) Retention and pseudonymisation policy enforced in the warehouse, not bolted on later. Verify every Meta payload, category and DPDP specific as of 2026.
DPDP-safe design: retention and pseudonymisation
A WhatsApp event warehouse is, by definition, a store of personal data: phone numbers are identifiers, and message metadata is linked to identifiable people. India's Digital Personal Data Protection framework therefore applies, and the honest posture is to build privacy into the schema rather than apologise for it later. Verify the operative DPDP Rules and your obligations with qualified counsel as of 2026; the following is directional engineering practice, not legal advice.
Pseudonymise at the boundary. The phone number should not sit in plain text across your analytical tables. Hash it (with a keyed hash) into a stable surrogate in dim_contact so joins and cohorting still work, keep any necessary reversible mapping in a separately access-controlled vault, and let analysts work only against the pseudonymised key.
Practise purpose limitation and minimisation. The warehouse exists for delivery analytics and cost allocation, and it does not need message body text to compute read rates, so do not haul content into analytics by default. Store the metadata the metrics actually require and leave the rest in the operational system under its own controls.
Set retention windows by data class. Raw payloads, modelled facts and any reversible identity mapping should each have an explicit time-to-live, after which they are purged or further de-identified. The table below is an illustrative retention model to adapt with counsel.
| Data class | Example contents | Illustrative retention posture (verify as of 2026) |
|---|---|---|
| Raw webhook payloads | Full JSON as received, for replay and debugging | Short window (e.g. weeks), then purge — keep only the modelled facts |
| Pseudonymised facts | fact_messages with hashed contact key, timestamps, FKs | Longer analytical window; no plain-text PII so lower sensitivity |
| Reversible identity map | Hash-to-phone mapping in a controlled vault | Tightly access-controlled, shortest justifiable retention, audited access |
| Message body / content | The text a contact sent or received | Generally keep out of the analytics warehouse; retain in ops only as needed |
| Billing facts | Conversation charges by category and campaign | Retain for finance/audit needs; aggregate, low direct-PII sensitivity |
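Retention only holds if something enforces it on a schedule. A minimal sketch of a TTL purge job follows, assuming each table carries a Unix-seconds timestamp column; the table names, column names and TTL values are illustrative placeholders, not recommendations, and the sketch only deletes, whereas the table above also allows further de-identification.

```python
import sqlite3
import time
from typing import Dict, Optional, Tuple

DAY = 86_400

# Illustrative placeholders only: agree the real windows per data class with counsel.
RETENTION_POLICY: Dict[str, Tuple[str, int]] = {
    # table: (timestamp column, time-to-live in seconds)
    "raw_events": ("received_at", 30 * DAY),     # short window for replay/debugging
    "identity_map": ("created_at", 90 * DAY),    # shortest justifiable, if kept here at all
    "fact_messages": ("loaded_at", 730 * DAY),   # longer analytical window
}


def purge_expired(conn: sqlite3.Connection,
                  policy: Dict[str, Tuple[str, int]] = RETENTION_POLICY,
                  now: Optional[float] = None) -> Dict[str, int]:
    """Delete rows older than each data class's TTL; return rows removed per table."""
    now = time.time() if now is None else now
    removed = {}
    for table, (ts_column, ttl_seconds) in policy.items():
        # Table and column names come from the trusted policy above, never user input.
        cur = conn.execute(f"DELETE FROM {table} WHERE {ts_column} < ?", (now - ttl_seconds,))
        removed[table] = cur.rowcount
    conn.commit()
    return removed
```

Returning per-table counts gives you an audit trail for free: log them on every run and you can show when each data class was last purged.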
The point is that retention and pseudonymisation are schema decisions, not afterthoughts — design the data classes and their lifecycles before the first event lands, and a data-deletion or access request becomes a routine query against a known location instead of a frantic hunt across systems.
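As an illustration of that "routine query", here is a sketch combining keyed-hash pseudonymisation with an erasure request. A keyed HMAC matters because a plain SHA-256 of a phone number is easy to reverse by brute force over the small space of possible numbers. The normalisation rule (treating a bare 10-digit number as Indian), the table names and whether to delete facts or merely detach them are all assumptions to settle with counsel; a reversible identity map held in a separate vault would need its own deletion step.

```python
import hashlib
import hmac
import re
import sqlite3


def normalise_msisdn(raw: str, country_code: str = "91") -> str:
    """Digits only, with country code, so one person always hashes to one key.

    Assumption: a bare 10-digit number (optionally with a leading 0) is an Indian mobile.
    """
    digits = re.sub(r"\D", "", raw, flags=re.ASCII)
    if len(digits) == 11 and digits.startswith("0"):
        digits = digits[1:]
    if len(digits) == 10:
        digits = country_code + digits
    return digits


def pseudonymise(raw_phone: str, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) stored as dim_contact.phone_hash instead of the number."""
    return hmac.new(key, normalise_msisdn(raw_phone).encode("ascii"), hashlib.sha256).hexdigest()


def erase_contact(conn: sqlite3.Connection, raw_phone: str, key: bytes) -> dict:
    """Serve an erasure request: find the contact by keyed hash, delete its rows."""
    row = conn.execute(
        "SELECT contact_key FROM dim_contact WHERE phone_hash = ?",
        (pseudonymise(raw_phone, key),),
    ).fetchone()
    if row is None:
        return {}  # nothing held for this number in the warehouse
    removed = {}
    # The locations are known in advance because the schema defines them.
    for table in ("fact_messages", "dim_contact"):
        cur = conn.execute(f"DELETE FROM {table} WHERE contact_key = ?", (row[0],))
        removed[table] = cur.rowcount
    conn.commit()
    return removed
```

Normalising before hashing is what makes the lookup reliable: "+91 98765 43210" and "09876543210" resolve to the same key, so the request finds every row rather than only the ones stored in one format.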
The one-page exec dashboard spec
All this engineering exists to make a founder's single screen trustworthy. Resist the urge to expose forty tiles; a useful exec view is six to eight numbers, each with its denominator and a trend. A directional spec: (1) Messages sent this period vs last, split utility / marketing / authentication. (2) Delivered rate with its denominator stated. (3) Read rate labelled as a floor. (4) Engaged/reply rate — the number that actually tracks whether anyone is responding. (5) Failure rate with the top error code, as an early-warning of number-quality or policy trouble. (6) Billable conversations and total spend, by category, straight from fact_billing_events. (7) Cost per engaged conversation — the unit-economics headline, from the cost-allocation join. (8) Top campaigns/templates by reply rate and by cost, so wins and waste are both visible. Two rules keep it honest: every tile names its denominator and time window so no one misreads it, and every tile traces to the canonical metric model — if a number cannot be derived from the modelled tables, it does not belong on the page. Built this way, the dashboard updates itself as new events flow through the pipeline, and the founder stops asking "where did this number come from" because the lineage is the pipeline. For how this data ties into the broader contact and pipeline system, our comparison of the best WhatsApp CRM in India covers the operational side these analytics sit beside.
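The two honesty rules can be checked mechanically if the dashboard is declared as data rather than assembled by hand in a BI tool. A small sketch, assuming hypothetical metric names for the canonical model and a tile definition of our own invention:

```python
from dataclasses import dataclass
from typing import List

# Illustrative names for metrics defined once in the canonical metrics model.
CANONICAL_METRICS = {
    "messages_sent", "delivered_rate", "read_rate", "engaged_rate", "failure_rate",
    "billable_conversations", "spend", "cost_per_engaged",
}
MAX_TILES = 8


@dataclass(frozen=True)
class Tile:
    title: str
    metric: str        # must name a canonical metric, never an ad-hoc BI formula
    denominator: str   # printed on the tile; may be empty only for counts and totals
    window: str        # e.g. "last 7 days vs previous 7 days"
    is_rate: bool = True


def validate(tiles: List[Tile]) -> List[str]:
    """List the problems that would make the exec page misleading."""
    problems = []
    if len(tiles) > MAX_TILES:
        problems.append(f"{len(tiles)} tiles; keep the page to {MAX_TILES} or fewer")
    for t in tiles:
        if t.metric not in CANONICAL_METRICS:
            problems.append(f"{t.title}: '{t.metric}' is not in the canonical model")
        if t.is_rate and not t.denominator:
            problems.append(f"{t.title}: rate tile has no stated denominator")
        if not t.window:
            problems.append(f"{t.title}: no time window")
    return problems
```

Running validate in CI means a tile such as an invented "seen rate" fails the build before it ever reaches the founder's screen.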
How RichAutomate fits — honestly scoped
A blueprint needs an event source, and that is where a platform earns its place: RichAutomate runs on the official Meta WhatsApp Business API and emits exactly the send, delivery, read, reply and billing signals this pipeline consumes — a clean, categorised stream your data team can land in the warehouse above. It gives you the no-code campaign and template structure that makes dim_campaign and dim_template meaningful, a shared inbox and Flows that generate the inbound and submission events your engagement metrics count, and per-message billing transparency that maps to fact_billing_events. Commercially there is no platform tax to muddy your unit economics: ₹0 platform fee, ₹0 setup, ₹0 monthly, pay per message only. On Client Pay that is ₹0.10 per message with Meta's conversation charges billed to you directly by Meta — the lowest software markup and the cleanest cost data, since Meta's charges arrive straight from Meta. On SaaS Pay it is ₹1.20 per marketing message and ₹0.30 per utility/authentication message, all-in with Meta's charge absorbed — one predictable per-message number to model against. New teams start on a 14-day free trial with 100 credits, enough to generate real events and prototype the warehouse before committing. What the platform does not do, stated plainly: it is not your warehouse, your metrics layer or your compliance programme — it is a well-behaved event source. The pipeline, the canonical definitions, the retention policy and the DPDP obligations remain your data team's to own. Model your spend on the pricing page, and verify Meta's current category charges as of 2026.
This article is general engineering and product guidance for data teams, analytics engineers and founders, not legal, compliance or data-protection advice. Meta's WhatsApp Business webhook payload shapes, status and event types, message categories and conversation-based pricing, and India's DPDP Act and Rules including pseudonymisation, retention and data-subject-request obligations all change, and every specific here — the event field names, the status semantics, the message categories, the metric definitions and denominators, the cost-allocation mechanics, and the retention windows and data classes — is illustrative and directional and must be verified against the official Meta documentation, the DPDP Rules and qualified legal advice as of 2026. The cost figures use real RichAutomate per-message rates with illustrative volumes and metric examples; your actual bill and your actual metrics depend on your message mix and Meta's current charges. A platform provides a clean event source and features that help; it is not a warehouse, a metrics layer or a compliance programme, and responsibility for the pipeline, the definitions, consent, retention and pseudonymisation remains yours. RichAutomate's ₹0 platform / ₹0 setup / ₹0 monthly posture, Client Pay ₹0.10/message with Meta billed to you directly, SaaS Pay ₹1.20 marketing / ₹0.30 utility-auth, and 14-day trial with 100 credits are current as described but should be confirmed on the pricing page. Verify everything before you rely on it.
Get a clean WhatsApp event stream worth building a warehouse on
RichAutomate runs on the official Meta WhatsApp Business API and emits the send, delivery, read, reply and billing signals your data pipeline needs — with a no-code campaign and template structure that makes campaign and template attribution meaningful, a shared team inbox and Flows that generate real engagement events, and transparent per-message billing you can model your unit economics against. It is a well-behaved event source, not your warehouse or your compliance programme. ₹0 platform fee, ₹0 setup, ₹0 monthly — pay per message only: Client Pay ₹0.10/msg with Meta's conversation charges billed to you directly by Meta, or SaaS Pay ₹1.20 marketing / ₹0.30 utility-auth. 14-day free trial with 100 credits. See full pricing, WhatsApp us at 917434901027, or book a 30-minute walkthrough at https://calendly.com/inrichdaddy/30min.