
How to Parse Dynamic CSS Classes When Web Scraping

Learn how to scrape websites with dynamic CSS class names that change on every page load. Use semantic HTML, data attributes, XPath, and structural selectors.

The Problem with Dynamic CSS Classes

You build a scraper that works perfectly. Two days later the site deploys and all your selectors break. The class that was product-title is now css-1k7j2h9. Welcome to CSS-in-JS.

Modern frameworks like Styled Components, Emotion, and CSS Modules generate unique class names at build time. Every deployment can produce new hashes, and your class-based selectors become useless.

Here's what these dynamic classes look like in the wild. Styled Components produces pairs like sc-bdVaJa fKHaBc, CSS Modules generates Price_container__3xK7q, and Emotion emits hashes like css-1dbjc4n. None of these are stable, and none of them belong in your selectors.
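The failure mode is easy to reproduce. Here's a quick sketch with made-up hashes showing a class selector that works against one deploy and silently returns nothing against the next:

```python
from bs4 import BeautifulSoup

# The same element, rendered by two different deployments
# (hashes are made up for illustration)
before_deploy = '<h1 class="css-1k7j2h9">MacBook Pro</h1>'
after_deploy = '<h1 class="css-9q2ww4x">MacBook Pro</h1>'

selector = '.css-1k7j2h9'  # written against the first deploy

print(BeautifulSoup(before_deploy, 'lxml').select_one(selector))  # matches the h1
print(BeautifulSoup(after_deploy, 'lxml').select_one(selector))   # None: the hash changed
```

Nothing crashes, no exception is raised; the selector just stops matching, which is why these breakages often go unnoticed until the downstream data dries up.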

Reliable Selector Strategies

Target Semantic HTML Elements

HTML5 introduced semantic elements, and well-built sites use them. Instead of chasing random class names, look at the structure: tags like article, main, header, section, and aside give you anchors that don't change between deployments.

from bs4 import BeautifulSoup

html = '''
<main>
    <article class="css-xyz123">
        <header class="css-abc789">
            <h1 class="css-title99">MacBook Pro 14"</h1>
            <span class="css-price12">$1,999</span>
        </header>
        <section class="css-desc45">
            <p>M3 Pro chip with 11-core CPU</p>
        </section>
    </article>
</main>
'''

soup = BeautifulSoup(html, 'lxml')

# Ignore classes entirely, use semantic structure
article = soup.find('article')
title = article.find('h1').get_text(strip=True)
price = article.find('header').find('span').get_text(strip=True)
description = article.find('section').find('p').get_text(strip=True)

print(f"{title}: {price}")
print(description)

Use Data Attributes

Data attributes are gold for scrapers. Developers add them for JavaScript functionality and automated testing. They rarely change because changing them breaks the site's own code.

Look for attributes like data-testid, data-product-id, data-price, or data-sku. These are meant to be stable identifiers.

from bs4 import BeautifulSoup

html = '''
<div class="sc-fzXfNJ kLPsaK" data-testid="product-card" data-product-id="12345">
    <img data-testid="product-image" src="/img/laptop.jpg" alt="Laptop">
    <div data-testid="product-details">
        <span data-testid="product-name">Gaming Laptop</span>
        <span data-testid="product-price" data-currency="USD">1299.00</span>
        <span data-testid="product-rating" data-score="4.5">4.5 stars</span>
    </div>
    <button data-action="add-to-cart" data-sku="GL-2024">Add to Cart</button>
</div>
'''

soup = BeautifulSoup(html, 'lxml')

card = soup.find(attrs={'data-testid': 'product-card'})
name = card.find(attrs={'data-testid': 'product-name'}).text
price_el = card.find(attrs={'data-testid': 'product-price'})
price = price_el.text
currency = price_el.get('data-currency')
rating = card.find(attrs={'data-testid': 'product-rating'}).get('data-score')
sku = card.find(attrs={'data-action': 'add-to-cart'}).get('data-sku')

print(f"SKU: {sku}, Name: {name}, Price: {currency} {price}, Rating: {rating}")

XPath for Complex Navigation

XPath gives you precise control when you need to find elements based on their position, text content, or relationship to other elements. It's particularly useful when you can identify a label and need to grab the value next to it.

from lxml import html

page_content = '''
<div class="css-random1">
    <div class="css-random2">
        <span>Regular Price:</span>
        <span class="css-random3">$299.99</span>
    </div>
    <div class="css-random4">
        <span>Sale Price:</span>
        <span class="css-random5">$199.99</span>
    </div>
    <div class="css-random6">
        <span>In Stock:</span>
        <span class="css-random7">Yes</span>
    </div>
</div>
'''

tree = html.fromstring(page_content)

# Find span that comes after "Sale Price:" text
sale_price = tree.xpath("//span[contains(text(), 'Sale Price:')]/following-sibling::span/text()")[0]

# Find span after "In Stock:"
stock_status = tree.xpath("//span[contains(text(), 'In Stock:')]/following-sibling::span/text()")[0]

# Find all price values (spans that contain $)
all_prices = tree.xpath("//span[contains(text(), '$')]/text()")

print(f"Sale: {sale_price}, Stock: {stock_status}")
print(f"All prices: {all_prices}")

Match Partial Class Names with Regex

Sometimes only part of a class name is randomized, leaving a stable prefix or suffix. CSS Modules often produces classes like ProductCard_title__d4e5f, where ProductCard_title stays constant while the hash changes. You can match the stable part with a regex.

from bs4 import BeautifulSoup
import re

html = '''
<div class="ProductCard_wrapper__a1b2c ProductCard_featured__x9y8z">
    <h2 class="ProductCard_title__d4e5f">Wireless Headphones</h2>
    <div class="ProductCard_pricing__g7h8i">
        <span class="ProductCard_originalPrice__j1k2l">$199</span>
        <span class="ProductCard_salePrice__m3n4o">$149</span>
    </div>
    <div class="ProductCard_specs__p5q6r">
        <span class="ProductCard_spec__s7t8u">40hr battery</span>
        <span class="ProductCard_spec__v9w0x">Active noise canceling</span>
    </div>
</div>
'''

soup = BeautifulSoup(html, 'lxml')

wrapper = soup.find('div', class_=re.compile(r'^ProductCard_wrapper'))
title = soup.find('h2', class_=re.compile(r'^ProductCard_title')).text
sale_price = soup.find('span', class_=re.compile(r'^ProductCard_salePrice')).text
specs = soup.find_all('span', class_=re.compile(r'^ProductCard_spec__'))

print(f"Product: {title}")
print(f"Price: {sale_price}")
print(f"Specs: {[s.text for s in specs]}")

Extract JSON from Script Tags

Many sites embed product data as JSON in the page. This is often more reliable than parsing visible HTML because it's structured data meant for the application itself. Look for JSON-LD schema markup and JavaScript state objects like __INITIAL_STATE__ or __NEXT_DATA__.

from bs4 import BeautifulSoup
import json
import re

html = '''
<html>
<head>
    <script type="application/ld+json">
    {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": "Ergonomic Office Chair",
        "description": "Adjustable lumbar support",
        "sku": "EOC-2024",
        "offers": {
            "@type": "Offer",
            "price": "449.00",
            "priceCurrency": "USD",
            "availability": "https://schema.org/InStock"
        }
    }
    </script>
    <script>
        window.__INITIAL_STATE__ = {"products":[{"id":1,"name":"Chair","price":449}]};
    </script>
</head>
<body>
    <div class="css-gibberish">Product content here</div>
</body>
</html>
'''

soup = BeautifulSoup(html, 'lxml')

# Parse JSON-LD structured data
json_ld = soup.find('script', type='application/ld+json')
if json_ld:
    product = json.loads(json_ld.string)
    print(f"From JSON-LD: {product['name']} - ${product['offers']['price']}")

# Extract embedded JavaScript state
scripts = soup.find_all('script')
for script in scripts:
    if script.string and '__INITIAL_STATE__' in script.string:
        match = re.search(r'__INITIAL_STATE__\s*=\s*({.*?});', script.string)
        if match:
            state = json.loads(match.group(1))
            for prod in state['products']:
                print(f"From state: {prod['name']} - ${prod['price']}")
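The other state object mentioned above, __NEXT_DATA__, is even easier to handle: Next.js embeds its page state in a script tag with a fixed id whose body is pure JSON, so no regex is needed. A minimal sketch (the payload shape here is illustrative, not from any real site):

```python
from bs4 import BeautifulSoup
import json

# Next.js ships page state in a script tag with a fixed id and a JSON body
html = '''
<script id="__NEXT_DATA__" type="application/json">
{"props":{"pageProps":{"product":{"name":"Desk Lamp","price":39.99}}}}
</script>
'''

soup = BeautifulSoup(html, 'lxml')
tag = soup.find('script', id='__NEXT_DATA__')
if tag:
    data = json.loads(tag.string)
    product = data['props']['pageProps']['product']
    print(f"From __NEXT_DATA__: {product['name']} - ${product['price']}")
```

The exact path to the data inside pageProps varies per site, so inspect the JSON once in a browser before writing the extraction code.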

Combine Multiple Strategies with Fallbacks

Real scrapers need resilience. Build a function that tries multiple strategies in order. If the site changes and breaks one approach, the fallbacks keep things working.

from bs4 import BeautifulSoup
import re

def extract_price(soup):
    """Try multiple strategies to find the price"""

    el = soup.find(attrs={'data-testid': 'price'})
    if el:
        return el.get_text(strip=True)

    el = soup.find(attrs={'data-price': True})
    if el:
        return el.get('data-price')

    el = soup.find(attrs={'itemprop': 'price'})
    if el:
        return el.get('content') or el.get_text(strip=True)

    el = soup.find(class_=re.compile(r'(price|Price|cost|Cost)'))
    if el:
        return el.get_text(strip=True)

    text = soup.get_text()
    match = re.search(r'\$[\d,]+\.?\d*', text)
    if match:
        return match.group()

    return None

test_cases = [
    '<span data-testid="price">$99.99</span>',
    '<div data-price="149.99">See price in cart</div>',
    '<span itemprop="price" content="199.99">$199.99</span>',
    '<span class="ProductPrice_value__abc">$249.99</span>',
    '<div class="info">Great deal at $299.99 today</div>',
]

for html in test_cases:
    soup = BeautifulSoup(html, 'lxml')
    print(f"Found: {extract_price(soup)}")

Professional Solutions

For production scraping where you can't afford selector breakage, ScrapingForge handles this automatically.

import requests
import json

response = requests.get(
    "https://api.scrapingforge.com/v1/scrape",
    params={
        'api_key': 'YOUR_API_KEY',
        'url': 'https://shop.example.com/product/12345',
        'render_js': 'true',
        # Serialize the rules to a JSON string so they survive URL encoding;
        # requests can't encode a nested dict as a query parameter
        'extract_rules': json.dumps({
            'name': {'selector': 'h1', 'type': 'text'},
            'price': {'selector': '[data-testid="price"], [itemprop="price"], .price', 'type': 'text'},
            'image': {'selector': 'img[data-testid="product-image"]', 'type': 'attribute', 'attribute': 'src'}
        })
    }
)

data = response.json()

The API uses AI to identify product data regardless of class names. When selectors fail, it falls back to visual and semantic analysis.

What to Remember

Never trust class names alone on modern JavaScript sites. Always check for JSON-LD first because it's structured and reliable. Data attributes used by the site's own JavaScript are your best friends. Use semantic HTML tags when classes are random. Build selector chains with multiple fallback strategies so one breaking doesn't kill your scraper. Monitor for extraction failures and alert when things break.
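That last point, monitoring, can start as simply as wrapping every extractor and logging the fields that come back empty. A minimal sketch, assuming you alert via logging (swap in email or Slack as needed; the field names are illustrative):

```python
from bs4 import BeautifulSoup
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("scraper")

def extract_with_monitoring(extractors, soup):
    """Run each named extractor, logging any field that fails to extract."""
    results, failures = {}, []
    for field, fn in extractors.items():
        try:
            value = fn(soup)
        except Exception:
            value = None
        if value is None:
            failures.append(field)
        results[field] = value
    if failures:
        # Swap this for an email/Slack alert in production
        logger.warning("Extraction failed for fields: %s", failures)
    return results

soup = BeautifulSoup('<span data-testid="price">$99</span>', 'lxml')
result = extract_with_monitoring({
    'price': lambda s: s.find(attrs={'data-testid': 'price'}).get_text(strip=True),
    'title': lambda s: s.find('h1').get_text(strip=True),  # will fail and be logged
}, soup)
print(result)  # {'price': '$99', 'title': None}
```

Because failures are caught per field instead of crashing the run, a single broken selector degrades the output rather than killing the whole scrape, and the warning log gives you something concrete to alert on.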

If the HTML itself is missing, check Why Your Scraper Doesn't See the Data. For cleaning up the text after you find elements, see How to Turn HTML to Text. When content loads dynamically, the JavaScript Rendering Issues guide helps. And if bot detection is breaking your selectors, read about CAPTCHA Blocking.

From the blog, the Ecommerce Web Scraping Guide covers handling product pages at scale, and How to Bypass CreepJS Fingerprinting explains how to avoid detection on protected sites.