How to extract headings from SERP for any keyword

Learn how to extract H1, H2, H3, and H4 headings from the top-ranking pages for any keyword.

Sk Rafiqul Islam

SEO consultant and co-founder of Accrue SERP

👉 Schedule a 15-minute call with me to learn how our SEO services can scale your business.

This Google Colab script will save time for every SEO professional.

Here’s what it does:

The script extracts the H1, H2, H3, and H4 headings of the top 10 search results for your keyword in a chosen country, and lets you download the data as a CSV file.

Purpose

This SOP outlines the steps to extract H1, H2, H3, and H4 headings from the top 10 search results for a specific keyword in a chosen country using a Google Colab script.

The extracted data can be downloaded as a CSV file for content analysis and creation.

Prerequisites

  1. Create a free account with SerpAPI to obtain an API key.
  2. Access the Google Colab script via the provided link: https://colab.research.google.com/drive/1xIe-AHR-5T7AsNB2MxG26n4Gau_Px8hj?usp=sharing 

Here’s the step-by-step process to extract headings

Step 1: Copy the Script

Open the Google Colab link and save a copy to your own Drive (File > Save a copy in Drive).

Step 2: Run the script in Google Colab

Run the first cell to install the required libraries.

Once the required libraries are installed, you will be asked to enter your SerpAPI key.

Step 3: Enter Your API Key

  1. Go to your SerpAPI dashboard.
  2. Copy the API key.
  3. In the next cell, paste your SerpAPI key when prompted.
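If you would rather not type the key into a visible prompt every session, a small helper can read it from an environment variable first. This is a sketch, not part of the original script, and the `SERPAPI_KEY` variable name is my assumption:

```python
import os

def get_api_key() -> str:
    # Prefer an environment variable so the key never shows up in cell output.
    # SERPAPI_KEY is an assumed variable name, not part of the original script.
    key = os.environ.get("SERPAPI_KEY", "").strip()
    if not key:
        key = input("Enter your SerpAPI key: ").strip()
    return key
```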

Step 4: Input Your Keyword and select the country

When prompted, enter the keyword you want to analyze and the country code corresponding to your target audience (e.g., ‘us’ for the United States).
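The `gl` parameter expects a lowercase two-letter code, so a small guard like this can catch typos before the API call is made. The code list below is a deliberately incomplete, illustrative subset; Google accepts many more values:

```python
# Illustrative subset of Google's gl country codes; the full list is much longer.
KNOWN_GL_CODES = {"us", "uk", "de", "fr", "in", "au", "ca", "es", "it", "nl"}

def normalize_country(code: str) -> str:
    # Lowercase and trim so 'US ' and 'us' behave the same.
    code = code.strip().lower()
    if code not in KNOWN_GL_CODES:
        raise ValueError(f"Unrecognized country code: {code!r}")
    return code
```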

Step 5: Process the Search Results

  1. The script will retrieve the top 10 search results for your keyword and country.
  2. It will extract the H1, H2, H3, and H4 headings from each result.
  3. The extracted data will be compiled into a pandas DataFrame.
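The extraction step above boils down to a few BeautifulSoup calls. Here is the same idea applied to an inline HTML snippet (the sample markup is made up for illustration):

```python
from bs4 import BeautifulSoup

sample_html = """
<html><body>
  <h1>On-Page SEO Guide</h1>
  <h2>What is on-page SEO?</h2>
  <h2>Why it matters</h2>
  <h3>Title tags</h3>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")
# Collect the text of every heading tag, keyed by level.
headings = {
    tag.upper(): [h.get_text(strip=True) for h in soup.find_all(tag)]
    for tag in ("h1", "h2", "h3", "h4")
}
print(headings["H2"])  # ['What is on-page SEO?', 'Why it matters']
```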

Step 6: Save the Results

  1. The script will save the DataFrame to a CSV file named "[keyword]_[country]_headings.csv".
  2. In Google Colab, the CSV file will start downloading automatically; if it doesn't, a download link will be provided.

Step 7: Export to Google Sheet

To analyze the data better, I’d recommend importing the CSV file into a Google Sheet and checking the headings used by top-ranking pages.
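Before importing into Google Sheets, you can also reshape the newline-joined cells into one heading per row with pandas, which makes filtering and counting much easier. The tiny DataFrame below stands in for the real CSV:

```python
import pandas as pd

# Stand-in for pd.read_csv("keyword_us_headings.csv") with the script's layout.
df = pd.DataFrame({
    "Rank": [1, 2],
    "URL": ["https://example.com/a", "https://example.com/b"],
    "H2": ["Intro\nPricing", "Intro\nFAQ"],
})

# Split each newline-joined H2 cell into a list, then one row per heading.
long_form = df.assign(H2=df["H2"].str.split("\n")).explode("H2")
print(long_form["H2"].tolist())  # ['Intro', 'Pricing', 'Intro', 'FAQ']
```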


Use case

The extracted headings can be used for content analysis and creation:

  1. Identify key topics and themes covered by top-ranking pages.
  2. Create content outlines that align with SEO intent.
  3. Improve your information gain by covering the topic in more depth than the top pages.
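For point 1, a quick way to surface recurring themes is to count headings that appear on more than one ranking page. The sample strings below mimic the script's newline-joined H2 column:

```python
from collections import Counter

# Newline-joined H2 cells, one string per ranking page (illustrative data).
h2_cells = [
    "What is SEO\nHow SEO works\nSEO tools",
    "How SEO works\nSEO pricing",
    "How SEO works\nSEO tools",
]

# Count each heading across pages; headings on 2+ pages are likely core topics.
counts = Counter(h for cell in h2_cells for h in cell.split("\n"))
recurring = [h for h, n in counts.most_common() if n >= 2]
print(recurring)  # ['How SEO works', 'SEO tools']
```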

Troubleshooting

If you encounter any issues during the process, ensure that:

  1. Your SerpAPI key is valid and has not expired.
  2. Your internet connection is stable and allows access to Google Colab and the SerpAPI.
  3. You have entered the correct keyword and country code.
  4. Some results may show no headings. This usually means the website blocks scraping.
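For point 4, some sites return errors or empty pages to requests that don't look like a browser. One optional tweak, not part of the original script, is to send a browser-like User-Agent header when fetching each page:

```python
import requests

# A browser-like header; some sites refuse the default python-requests agent.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )
}

def fetch_page(url: str) -> str:
    # Drop-in replacement for the requests.get call in extract_headings.
    response = requests.get(url, headers=BROWSER_HEADERS, timeout=10)
    response.raise_for_status()  # Fail loudly on 403/404 instead of parsing error pages
    return response.text
```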

Explanation of the Code

!pip install requests beautifulsoup4 google-search-results pandas

import requests
from bs4 import BeautifulSoup
from serpapi import GoogleSearch
import pandas as pd
from google.colab import files

def extract_headings(url):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # Treat HTTP errors (403, 404, etc.) as failures
        soup = BeautifulSoup(response.text, 'html.parser')

        h1 = [h.text.strip() for h in soup.find_all('h1')]
        h2 = [h.text.strip() for h in soup.find_all('h2')]
        h3 = [h.text.strip() for h in soup.find_all('h3')]
        h4 = [h.text.strip() for h in soup.find_all('h4')]  # Added H4 extraction

        return {
            'H1': '\n'.join(h1),
            'H2': '\n'.join(h2[:5]),  # Limit to first 5 H2 headings
            'H3': '\n'.join(h3[:5]),  # Limit to first 5 H3 headings
            'H4': '\n'.join(h4[:5])   # Limit to first 5 H4 headings
        }
    except Exception as e:
        print(f"Error extracting headings from {url}: {str(e)}")
        return {'H1': 'N/A', 'H2': 'N/A', 'H3': 'N/A', 'H4': 'N/A'}

def get_search_results(keyword, country, api_key):
    params = {
        "engine": "google",
        "q": keyword,
        "api_key": api_key,
        "num": 10,
        "gl": country
    }

    try:
        search = GoogleSearch(params)
        results = search.get_dict()
        return results.get('organic_results', [])
    except Exception as e:
        print(f"Error getting search results: {str(e)}")
        return []

def main():
    api_key = input("Enter your SerpAPI key: ")
    keyword = input("Enter your keyword: ")
    country = input("Enter the country code (e.g., us, uk, de): ")

    print(f"\nSearching for '{keyword}' in '{country}'")
    results = get_search_results(keyword, country, api_key)

    if not results:
        print("No results found. Please check your API key and search parameters.")
        return

    data = []
    for i, result in enumerate(results, 1):
        url = result['link']
        print(f"\nProcessing Result {i}: {url}")
        headings = extract_headings(url)
        data.append({
            'Rank': i,
            'URL': url,
            'H1': headings['H1'],
            'H2': headings['H2'],
            'H3': headings['H3'],
            'H4': headings['H4']  # Added H4 to the data
        })

    # Create DataFrame
    df = pd.DataFrame(data)

    # Save to CSV with line breaks preserved
    csv_filename = f"{keyword}_{country}_headings.csv"
    df.to_csv(csv_filename, index=False, quoting=1, quotechar='"', escapechar='\\')

    # Display the table in the notebook
    print("\nHeadings Table (preview):")
    pd.set_option('display.max_colwidth', None)
    print(df.head().to_string())

    # Provide download link
    files.download(csv_filename)
    print(f"\nCSV file '{csv_filename}' has been created and is ready for download.")

if __name__ == "__main__":
    main()

This Google Colab script consists of several key components:

Libraries Used

!pip install requests beautifulsoup4 google-search-results pandas
  • requests: For making HTTP requests to fetch web pages.
  • BeautifulSoup: For parsing HTML and extracting data.
  • GoogleSearch: To interact with the SerpAPI and retrieve search results.
  • pandas: For data manipulation and exporting to CSV.

Functions Defined

  1. extract_headings(url):
    • This function takes a URL as input and retrieves the H1, H2, H3, and H4 headings from the page.
    • It uses requests to fetch the page content and BeautifulSoup to parse the HTML.
    • The function returns a dictionary containing the headings, limiting the H2, H3, and H4 headings to the first five entries for brevity.
  2. get_search_results(keyword, country, api_key):
    • This function interacts with the SerpAPI to fetch the top 10 organic search results for the specified keyword and country.
    • It constructs a request with the necessary parameters and returns a list of search results.
  3. main():
    • This is the main driver function that orchestrates the script’s execution.
    • It prompts the user for their API key, keyword, and country code.
    • It retrieves search results, processes each URL to extract headings, and compiles the data into a pandas DataFrame.
    • Finally, it saves the DataFrame to a CSV file and provides a download link.

Code Execution

The script is executed by calling the main() function, which initiates user input and processes the search results. The output is a CSV file that contains the rank, URL, and extracted headings (H1, H2, H3, and H4) for each of the top 10 search results.

if __name__ == "__main__":
    main()

This line ensures that the main() function runs when the script is executed directly in Google Colab.


Sk Rafiqul Islam

Rafiqul is an SEO and content marketing strategist who works closely with clients to ensure execution goes according to the strategy. He designs content and organic growth strategies for brands.

Connect with him on Twitter or LinkedIn, or mail our team at [email protected].

Let's make SEO the biggest growth engine for your brand.

We started Accrue SERP with one clear goal: making SEO an outcome-based marketing channel (leads, signups, & conversions) rather than chasing site traffic that doesn't add any business value to your bottom line.

Let’s schedule a 15 to 30-minute meeting and understand how we can help you achieve business growth with our outcome-based SEO services in India.

Book a strategy call