Walmart is a top target for price tracking and assortment monitoring, but its product pages are built with Next.js — which means the cleanest data source isn't the visible HTML, it's the __NEXT_DATA__ JSON blob the page ships with. Parsing that script tag gives you structured, reliable fields (name, price, availability) without brittle CSS selectors. The samples below fetch the page through the Bright Data Web Unlocker and read straight from __NEXT_DATA__.
Prerequisites
export PROXY_URL="http://brd-customer-<id>-zone-<unblocker_zone>:<password>@brd.superproxy.io:22225"
New to Bright Data? 7-day free trial, no credit card required. Get started →
Products are identified by their item ID from the URL: https://www.walmart.com/ip/<id>.
PHP
<?php
// Run: php walmart.php 5689919121
$proxy = getenv('PROXY_URL');
$itemId = $argv[1] ?? '5689919121';
$ch = curl_init("https://www.walmart.com/ip/$itemId");
curl_setopt_array($ch, [
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_PROXY => $proxy,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_TIMEOUT => 60,
CURLOPT_HTTPHEADER => ['Accept-Language: en-US,en;q=0.9'],
]);
$html = curl_exec($ch);
curl_close($ch);
// Walmart embeds all product data as JSON in a __NEXT_DATA__ script tag.
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xp = new DOMXPath($doc);
$json = $xp->query('//script[@id="__NEXT_DATA__"]')->item(0)?->textContent;
$data = json_decode($json ?? '{}', true);
$p = $data['props']['pageProps']['initialData']['data']['product'] ?? [];
$product = [
'id' => $p['usItemId'] ?? $itemId,
'name' => $p['name'] ?? null,
'price' => $p['priceInfo']['currentPrice']['price'] ?? null,
'priceString' => $p['priceInfo']['currentPrice']['priceString'] ?? null,
'availability' => $p['availabilityStatus'] ?? null,
];
echo json_encode($product, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
Node.js
// walmart.mjs — node walmart.mjs 5689919121
// Install: npm i axios https-proxy-agent cheerio
import axios from 'axios';
import { HttpsProxyAgent } from 'https-proxy-agent';
import * as cheerio from 'cheerio';
const agent = new HttpsProxyAgent(process.env.PROXY_URL);
const itemId = process.argv[2] ?? '5689919121';
const { data: html } = await axios.get(`https://www.walmart.com/ip/${itemId}`, {
httpsAgent: agent, proxy: false, timeout: 60_000,
headers: { 'Accept-Language': 'en-US,en;q=0.9' },
});
const $ = cheerio.load(html);
const next = JSON.parse($('#__NEXT_DATA__').text() || '{}');
const p = next?.props?.pageProps?.initialData?.data?.product ?? {};
console.log(JSON.stringify({
id: p.usItemId ?? itemId,
name: p.name ?? null,
price: p.priceInfo?.currentPrice?.price ?? null,
priceString: p.priceInfo?.currentPrice?.priceString ?? null,
availability: p.availabilityStatus ?? null,
}, null, 2));
Rust
// Cargo.toml:
// reqwest = { version = "0.12", features = ["blocking"] }
// scraper = "0.20"
// serde_json = "1"
use scraper::{Html, Selector};
use serde_json::Value;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let item_id = std::env::args().nth(1).unwrap_or_else(|| "5689919121".into());
let client = reqwest::blocking::Client::builder()
.proxy(reqwest::Proxy::all(std::env::var("PROXY_URL")?)?)
.danger_accept_invalid_certs(true)
.build()?;
let html = client
.get(format!("https://www.walmart.com/ip/{item_id}"))
.header("Accept-Language", "en-US,en;q=0.9")
.send()?
.text()?;
let doc = Html::parse_document(&html);
let sel = Selector::parse("script#__NEXT_DATA__").unwrap();
let raw = doc
.select(&sel)
.next()
.map(|e| e.text().collect::<String>())
.unwrap_or_default();
let data: Value = serde_json::from_str(&raw)?;
let p = &data["props"]["pageProps"]["initialData"]["data"]["product"];
let product = serde_json::json!({
"id": p["usItemId"].as_str().unwrap_or(&item_id),
"name": p["name"],
"price": p["priceInfo"]["currentPrice"]["price"],
"priceString": p["priceInfo"]["currentPrice"]["priceString"],
"availability": p["availabilityStatus"],
});
println!("{}", serde_json::to_string_pretty(&product)?);
Ok(())
}
Notes
- Parsing
__NEXT_DATA__is far more stable than scraping rendered HTML — the same blob also carries seller info, ratings, shipping, and variant data underproduct. - If
initialDatais ever absent, Walmart occasionally hydrates from a__PRELOADED_STATE__script instead; dump the JSON keys to confirm the current path. - For a tracker, persist
{id, price, timestamp}per run, schedule with cron, and alert on drops — see the Amazon tracker for the full pattern.
See our E-commerce Web Scraping Solutions overview, the Bright Data Web Unlocker review, and How to Avoid Getting Blocked.