Validating email domains in Power Query, without AI

You don't need AI or a paid validation service to flag dead emails. Power Query plus a public DNS endpoint handles the job β€” including at real-world scale.

What to use

If you've got a long list of customer emails and you need to flag the ones that aren't real, the reflex in 2026 is to reach for AI or a paid validation service. Neither is the right first tool.

For checking whether a domain actually exists and accepts mail β€” which is what a "dead email" usually means β€” Power Query handles the whole job for free. No API key, no registration, no LLM call. A public DNS endpoint and a small function are all it takes.

Here's the approach, and how to scale it once the list gets bigger.

The building block

A DNS-over-HTTPS endpoint accepts a domain name and returns whether MX (mail exchange) records exist for it. Google publishes one at dns.google/resolve that needs no authentication and returns plain JSON. Anything with MX records is a live, mail-accepting domain. Anything without is either not a real domain, a parked domain, or a dead one.

Here's a minimal M function that takes a domain and returns the lookup result:

// fnCheckMX β€” returns MX lookup result for a single domain
(domain as text) as record =>
let
Raw = Json.Document(
Web.Contents(
"https://dns.google",
[
RelativePath = "resolve",
Query = [name = domain, type = "MX"]
]
)
),
Answers = Record.FieldOrDefault(Raw, "Answer", {}),
MXAnswers = List.Select(Answers, each _[type] = 15),
HasMX = List.Count(MXAnswers) > 0
in
[ HasMX = HasMX, Status = Raw[Status], MXHost = if HasMX then MXAnswers{0}[data] else null , CheckedOn = DateTime.LocalNow() ]

That's the whole lookup. Status = 0 means the resolver answered. HasMX = true means the domain accepts mail.

The simple case β€” a small list

For a short email list, wire it up like this:

  1. Load the email list as a Power Query source.

  2. Add a column that extracts the domain: Text.Lower(Text.AfterDelimiter([Email], "@")).

  3. Call fnCheckMX on each row.

  4. Expand the returned record.

First escalation β€” dedupe before calling

Real email lists are not collections of unique domains. A list of 10,000 customer emails in the Baltics typically contains at most a few hundred unique domains β€” Gmail, inbox.lv, Hotmail, and a long tail. Calling fnCheckMX per row makes ten thousand API calls when a few hundred would have done.

Always dedupe the domains first, call the function on the distinct list, then join back:

let
Source = Excel.CurrentWorkbook(){[Name="EmailList"]}[Content],
WithDomain = Table.AddColumn(Source, "Domain",
each Text.Lower(Text.AfterDelimiter([Email], "@"))),
// Dedupe BEFORE calling the function
UniqueDomains = Table.Distinct(Table.SelectColumns(WithDomain, {"Domain"})),
Checked = Table.AddColumn(UniqueDomains, "MX", each fnCheckMX([Domain])),
Expanded = Table.ExpandRecordColumn(Checked, "MX", {"HasMX", "Status", "MXHost"}),
// Join back to the full email list
Joined = Table.NestedJoin(WithDomain, {"Domain"}, Expanded, {"Domain"}, "X", JoinKind.LeftOuter),
Final = Table.ExpandTableColumn(Joined, "X", {"HasMX", "Status", "MXHost"})
in
Final

This alone takes a 10,000-row list from a refresh that might take fifteen minutes to one that runs in under a minute. Simple change, biggest single impact.

A pitfall worth fixing β€” SERVFAIL is not NXDOMAIN

One thing worth getting right before pushing further. DNS lookups have three possible outcomes, not two:

  • Status = 0 β€” domain exists, here are the records

  • Status = 3 β€” domain does not exist (NXDOMAIN)

  • Status = 2 β€” the resolver couldn't get a clean answer (SERVFAIL)

SERVFAIL is not "no MX records." It's "I don't know right now." Soft fail, not hard fail. The minimal function above lumps SERVFAIL and NXDOMAIN together under HasMX = false, which will false-positive on perfectly good domains β€” especially on .lt, .lv, and other smaller ccTLDs where the authoritative nameservers are more flaky.

On a real client job I watched this happen to kemperiai365.lt β€” a Lithuanian camper and trailer company I'd bought a trailer from a few years back, and obviously still in business. Google's resolver returned SERVFAIL. Cloudflare's resolver returned the MX records cleanly on the next try.

The fix is a dual-resolver fallback: try Google first, and if the result is SERVFAIL, retry against Cloudflare's cloudflare-dns.com/dns-query before labelling the domain dead. For a client deliverable, bucket results three ways β€” confirmed good, confirmed dead, needs human review β€” not two. The SERVFAIL rows belong in the third bucket.

Second escalation β€” the staging cache

Removing duplicate domain names solves a lot while it most likely is not enough.

On a real job, a client handed over to me 240,000+ emails with approximatelly 12,000 unique domains. Twelve thousand calls to a free public endpoint in one session is the kind of thing that should raise questions if Google will block you or not. I pushed a test run up to around five hundred domains and stopped. It was my gut call, not a measurement, but I wasn't going to find out where the ceiling was in front of a client.

The fix is a staging cache: a persistent Excel table that remembers every domain already checked, so each refresh only hits the API for genuinely new domains.

Set up a worksheet table β€” call it CheckedDomains β€” with columns Domain, HasMX, Status, MXHost, Note, CheckedOn. Empty on day one. Then restructure the query:

  1. Extract unique domains from the full email list.

  2. Anti-join against CheckedDomains β€” this leaves only the domains not yet in the cache.

  3. Take the next batch of 100 (or 500, or whatever batch size your API patience allows).

  4. Run them through fnCheckMX.

  5. Append the results back into CheckedDomains, with today's date in CheckedOn.

  6. Join the full email list against CheckedDomains to produce the final labelled output.

  7. Make sure that the query outputs the same table as CheckedDomains table (first load it as separate table, delete the original CheckedDomains blank table, rename the new outputed table to CheckedDomains)

let
Source = Excel.CurrentWorkbook(){[Name="CheckedDomains"]}[Content],
CheckedDomains = Table.SelectRows(Source, each [Domain] <> null and [Domain] <> ""),
EmailDomainsList = Table.Distinct(Table.SelectColumns(EmailList, {"Domain"})),
AllDomains = Table.Combine({CheckedDomains, EmailDomainsList}),
UniqueDomains = Table.Distinct(AllDomains, {"Domain"}),
UncheckedDomains = Table.SelectRows(UniqueDomains, each [CheckedOn] = null),
BatchToValidate = Table.FirstN(UncheckedDomains, domains_to_validate_at_once),
BatchDomainsOnly = Table.SelectColumns(BatchToValidate, {"Domain"}),
BatchWithMX = Table.AddColumn(
BatchDomainsOnly,
"MXResult",
each try fnCheckMX_CheckedOn([Domain])
otherwise [HasMX = false, Status = "Error", MXHost = null, CheckedOn = DateTime.LocalNow()]
),
CheckedDomainsNew = Table.ExpandRecordColumn(
BatchWithMX,
"MXResult",
{"HasMX", "Status", "MXHost", "CheckedOn"},
{"HasMX", "Status", "MXHost", "CheckedOn"}
),
FinalResult = Table.Combine({CheckedDomains, CheckedDomainsNew})
in
FinalResult

Each refresh chips at the unchecked list. Silently, slowly, the cache fills up. The client doesn't have to babysit it β€” they kick off a refresh, wait for the batch, and eventually every domain in their list is covered.

Two things make this architecture pay off more than it looks like it should.

First, the top-frequency domains β€” Gmail, inbox.lv, Hotmail, a handful of Baltic ISPs β€” land in the cache within the first couple of runs and represent the majority of rows across every customer list the client will ever load. The second CSV they import barely touches the API. The third touches it even less. The cache becomes an asset.

Second, the CheckedOn column gives a clean way to keep the cache honest. Once a year, drop rows older than 365 days and let them re-check. MX records do change β€” companies migrate to Google Workspace, domains get sold, nameservers get reconfigured β€” so periodic re-verification is worth the cost. A shorter window for SERVFAIL rows specifically β€” one to seven days β€” catches transient failures without hammering the resolver.

The pattern

Any validation that depends on a free, public, external API at scale wants this same shape:

  • Dedupe before you call.

  • Check only what you haven't checked before.

  • Stamp each result with when you checked it.

  • Invalidate on a schedule that matches how often the underlying data actually changes.

This is less about architecture and more about respecting a constraint. Free public APIs are free because no single caller is hammering them. Treat the API like a bottomless well and you will find the bottom.

The pattern is not specific to DNS. The same shape applies to geocoding (Nominatim for addresses), VAT number verification (VIES), currency reference data, company registry lookups β€” anywhere a cheap external API gives you a ground-truth answer you want to apply at volume.

Where it breaks

  • The cache tells you a domain exists and accepts mail, not that a specific mailbox does. For mailbox-level verification you need a paid commercial API (ZeroBounce, Mails.so, NeverBounce). Be explicit with clients about which question you're answering.

  • Corporate firewalls sometimes block public DoH endpoints. If Web.Contents fails with a connection error rather than a DNS response, that's usually why. Cloudflare's 1.1.1.1 is blocked slightly more often than Google's dns.google in my experience β€” which is the other reason Google-primary with Cloudflare-fallback is the right default.

Where AI fits

AI is genuinely useful on the residual that falls out of this pipeline β€” typo detection on NXDOMAIN rows (gmial.com β†’ gmail.com), flagging role-based addresses that tend to bounce (info@, noreply@), categorizing business versus free-provider addresses. Those are tasks for after the structural layer has run, not replacements for it.

Running an LLM on 240,000 rows when 99% of them resolve cleanly with a dedupe, a DNS call, and a cache is slow, expensive, and less accurate than the deterministic path. The right first tool for this problem is Power Query and a public DNS resolver. AI is the fifth tool, not the first.

Download sample file