WEB_CONTENT

Overview

This function fetches the content of a web page and returns it in Markdown format using the Jina Reader API (https://jina.ai/reader/ ). It’s useful as a starting point for extracting specific information, summarizing content, or other text processing tasks directly in Excel, enabling business users to quickly analyze, summarize, or reference web-based information without leaving their spreadsheet environment.

Usage

Fetches the content from the specified URL.


=WEB_CONTENT(url)

Parameters

Parameter	Type	Required	Description
url	string	Yes	The full URL of the web page to fetch.

Return Value

Return Value	Type	Description
Content	string	The main content of the web page in Markdown format, extracted by Jina Reader.

Examples

Example 1: Company Analysis for Market Research

A business analyst wants to extract and summarize information about a competitor from their company profile page to include in a market research report.


=WEB_CONTENT("https://www.ycombinator.com/companies/airbnb")

The function returns the extracted content about Airbnb, including their business model and company history. The analyst can use Excel formulas to further summarize or categorize the information for reporting.

Example 2: News Article Extraction for Trend Analysis

A marketing team wants to monitor the latest startup news for industry trends and funding announcements.


=WEB_CONTENT("https://techcrunch.com/category/startups/")

The function returns the latest startup news articles from TechCrunch. The team can use Excel’s text analysis tools to identify trends or key topics.

Limitations

If the URL is invalid or unreachable, an error or empty result is returned.
Some web pages may block automated access or require authentication, resulting in incomplete or missing content.
The function only returns the main content as determined by the Jina Reader API; some details or formatting may be lost.
Large or complex web pages may be truncated.

Benefits

Native Excel does not provide a built-in way to fetch and extract web page content as Markdown. While Power Query can import web data, it is limited to tables and may not extract the main content or text. Manual copy-paste is error-prone and inefficient.

Why use this Python function?

Automates extraction of readable web content for analysis or reporting.
Enables integration of web-based research directly into Excel workflows.
More flexible and robust than native Excel web import tools for unstructured content.

Source Code


import requests
 
def web_content(url):
    """
    Returns web page content in markdown format using Jina.  Useful as a starting point for extraction, summarization, etc.
 
    Args:
        url (str): The full URL to fetch.
 
    Returns:
        str: The content of the response from the URL.
    """
    headers = {
        "X-Retain-Images": "none"
    }
    base_url = "https://r.jina.ai/"
    full_url = base_url + url
    response = requests.get(full_url, headers=headers)
    response.raise_for_status()
    # Extract content after 'Markdown Content:' marker
    try:
        content = response.text.split("Markdown Content:")[1]
    except IndexError:
        # Handle cases where the marker might not be present
        content = response.text 
    return content.strip() # Strip leading/trailing whitespace