WEB_CONTENT
Overview
This function fetches the content of a web page and returns it in Markdown format using the Jina Reader API (https://jina.ai/reader/). It is a starting point for extracting specific information, summarizing content, or other text processing tasks directly in Excel, so business users can analyze, summarize, or reference web-based information without leaving the spreadsheet.
Usage
Fetches the content from the specified URL.
=WEB_CONTENT(url)
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
url | string | Yes | The full URL of the web page to fetch. |
Return Value
Return Value | Type | Description |
---|---|---|
Content | string | The main content of the web page in Markdown format, extracted by Jina Reader. |
Examples
Example 1: Company Analysis for Market Research
A business analyst wants to extract and summarize information about a competitor from their company profile page to include in a market research report.
=WEB_CONTENT("https://www.ycombinator.com/companies/airbnb")
The function returns the main content of the Airbnb profile page as Markdown, including its business model and company history. The analyst can use Excel formulas to further summarize or categorize the information for reporting.
Example 2: News Article Extraction for Trend Analysis
A marketing team wants to monitor the latest startup news for industry trends and funding announcements.
=WEB_CONTENT("https://techcrunch.com/category/startups/")
The function returns the Markdown content of TechCrunch's startups category page, which lists the latest startup news articles. The team can then search or filter the returned text in Excel to identify trends or key topics.
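As one way to look for trends in the returned text, the Markdown can be scanned for a handful of keywords. The following is a minimal sketch, not part of the function itself: `keyword_counts` and the sample text are hypothetical, and it only handles single-word keywords.

```python
import re
from collections import Counter


def keyword_counts(markdown_text, keywords):
    """Count case-insensitive occurrences of single-word keywords.

    Hypothetical helper for illustration; it is not part of WEB_CONTENT.
    """
    # Lowercase the text and split it into simple word tokens
    words = re.findall(r"[a-z0-9']+", markdown_text.lower())
    counts = Counter(words)
    # Counter returns 0 for keywords that never appear
    return {kw: counts[kw.lower()] for kw in keywords}


# Made-up sample standing in for text returned by WEB_CONTENT
sample = (
    "# Startup raises Series A funding\n"
    "AI startup announces funding round. Another AI launch."
)
result = keyword_counts(sample, ["funding", "AI", "startup", "IPO"])
print(result)
```

Counting whole-word tokens rather than substrings avoids matching "AI" inside words such as "raises".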
Limitations
- If the URL is invalid or unreachable, an error or empty result is returned.
- Some web pages may block automated access or require authentication, resulting in incomplete or missing content.
- The function only returns the main content as determined by the Jina Reader API; some details or formatting may be lost.
- Large or complex web pages may be truncated.
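The first two limitations can be softened by checking the URL before fetching and by turning failures into readable text. The sketch below assumes that showing an error string in the cell is preferable to a raised exception. `is_fetchable_url` and `safe_web_content` are hypothetical names, and `fetch` stands for any callable that takes a URL and returns text, such as the `web_content` function in the Source Code section.

```python
from urllib.parse import urlparse


def is_fetchable_url(url):
    """Return True if url looks like an absolute http(s) URL."""
    if not isinstance(url, str):
        return False
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        # urlparse can reject malformed input such as a broken IPv6 host
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def safe_web_content(url, fetch):
    """Call fetch(url) and return an error string instead of raising.

    Hypothetical wrapper for illustration only.
    """
    if not is_fetchable_url(url):
        return "Error: url must be a full http(s) URL"
    try:
        return fetch(url.strip())
    except Exception as exc:
        # Broad on purpose: the caller gets text rather than a traceback
        return f"Error: {exc}"
```

Passing the fetcher in as an argument keeps the wrapper testable without network access.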
Benefits
Native Excel does not provide a built-in way to fetch and extract web page content as Markdown. While Power Query can import web data, it is limited to tables and may not extract the main content or text. Manual copy-paste is error-prone and inefficient.
Why use this Python function?
- Automates extraction of readable web content for analysis or reporting.
- Enables integration of web-based research directly into Excel workflows.
- More flexible and robust than native Excel web import tools for unstructured content.
Source Code
import requests


def web_content(url):
    """
    Returns web page content in Markdown format using Jina Reader. Useful as a starting point for extraction, summarization, etc.

    Args:
        url (str): The full URL to fetch.

    Returns:
        str: The Markdown content of the page, with leading and trailing whitespace removed.
    """
    headers = {
        "X-Retain-Images": "none"
    }
    base_url = "https://r.jina.ai/"
    full_url = base_url + url
    # Set a timeout so an unresponsive server cannot hang the call indefinitely
    response = requests.get(full_url, headers=headers, timeout=30)
    # Raise an exception for 4xx/5xx responses
    response.raise_for_status()
    # Extract content after the first 'Markdown Content:' marker
    try:
        content = response.text.split("Markdown Content:", 1)[1]
    except IndexError:
        # Handle cases where the marker might not be present
        content = response.text
    return content.strip()  # Strip leading/trailing whitespace
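
The marker handling at the end of the function can be isolated and checked without a network call. This is a minimal sketch, assuming the response is a short header block followed by a `Markdown Content:` line. `extract_markdown` is a hypothetical helper and the sample response is made up for illustration.

```python
def extract_markdown(text, marker="Markdown Content:"):
    """Return the text after the first marker, or all of it if no marker exists."""
    # partition splits on the first occurrence only, so a later
    # "Markdown Content:" inside the page body is left untouched
    _, found, after = text.partition(marker)
    body = after if found else text
    return body.strip()


# Made-up response in the assumed header-plus-marker layout
sample = (
    "Title: Example Domain\n"
    "URL Source: https://example.com/\n"
    "Markdown Content:\n"
    "# Example Domain\n\nThis domain is for use in examples.\n"
)
extracted = extract_markdown(sample)
print(extracted)
```

Using `str.partition` avoids the `IndexError` branch entirely, because it always returns three parts even when the marker is missing.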