JINA_READER
Overview
JINA_READER fetches a web page and returns its main content in Markdown format using the Jina Reader API (https://jina.ai/reader/). The Jina Reader API is a content extraction service that identifies and returns the most relevant, readable sections of a web page. It works with a wide range of web content, including articles, company profiles, and news, and outputs clean Markdown for easy integration into downstream workflows.
This function is useful for extracting specific information, summarizing content, or other text processing tasks directly in Excel. It enables business users to quickly analyze, summarize, or reference web-based information without leaving their spreadsheet environment. The Jina Reader API is robust against clutter, advertisements, and navigation elements, focusing on delivering the core readable content. It is ideal for automating research, reporting, and integrating web-based data into business processes.
Usage
To use the JINA_READER function in Excel, enter it as a formula in a cell, specifying the URL of the web page you want to fetch. Optionally, you can provide an API key if you have one:
=JINA_READER(url, [api_key])
Arguments
Argument | Type | Required | Description | Example |
---|---|---|---|---|
url | string | Yes | The full URL of the web page to fetch | "https://www.ycombinator.com/companies/airbnb" |
api_key | string | No | API key for authentication (if required) | "your_api_key" |
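As a rough sketch of how these two arguments map onto the HTTP request, the helper below mirrors the request-building step of the Python code on this page. The name `build_jina_request` is hypothetical and introduced only for illustration; it performs no network call.

```python
def build_jina_request(url, api_key=None):
    """Return (request_url, headers) for a Jina Reader call (illustrative helper)."""
    # Same header the function on this page sends with every request.
    headers = {"X-Retain-Images": "none"}
    if api_key:
        # The optional key is sent as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    # The target URL is appended directly to the Reader endpoint.
    return "https://r.jina.ai/" + url.strip(), headers


request_url, headers = build_jina_request(
    "https://www.ycombinator.com/companies/airbnb", "your_api_key"
)
print(request_url)
print(headers)
```

Omitting `api_key` simply leaves the `Authorization` header out of the request.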
Returns
Returns | Type | Description | Example |
---|---|---|---|
Content | string | The main content of the web page in Markdown format, extracted by Jina Reader. | "# Airbnb…" |
Error | string | Error message if the URL is invalid, the request fails, or no content is returned. | "Error: Invalid URL" |
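Because both outcomes come back as strings, downstream Python code may want to tell them apart. The sketch below assumes that every error message starts with the `Error:` prefix, as in the example above; `is_reader_error` is a hypothetical helper, not part of the function itself.

```python
def is_reader_error(result):
    """True if a JINA_READER result looks like an error message (hypothetical helper)."""
    # Assumption: error strings always begin with the literal prefix "Error:".
    return isinstance(result, str) and result.startswith("Error:")


for value in ["# Airbnb…", "Error: Invalid URL"]:
    label = "error" if is_reader_error(value) else "content"
    print(repr(value), "->", label)
```

A page whose own text happens to begin with "Error:" would be misclassified by this check, so treat it as a heuristic.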
Examples
Company Analysis for Market Research
Sample Input:
URL | API Key |
---|---|
https://www.ycombinator.com/companies/airbnb | (optional) |
Sample Call:
=JINA_READER("https://www.ycombinator.com/companies/airbnb")
=JINA_READER("https://www.ycombinator.com/companies/airbnb", "your_api_key")
Sample Output: Returns the extracted content about Airbnb, including its business model and company history (in Markdown format).
Python Code
def jina_reader(url, api_key=None):
    """
    Returns web page content in Markdown format using Jina Reader. Useful as a starting point for extraction, summarization, etc.

    Args:
        url (str): The full URL to fetch.
        api_key (str, optional): API key for authentication. Default is None.

    Returns:
        str: The content of the response from the URL, or an error message string if the request fails or input is invalid.
    """
    import requests

    # Reject non-string or blank input before making a request
    if not isinstance(url, str) or not url.strip():
        return "Error: Invalid URL"

    headers = {
        "X-Retain-Images": "none"
    }
    # Send the API key as a bearer token only when one is provided
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"

    base_url = "https://r.jina.ai/"
    # Strip surrounding whitespace so it does not end up inside the request URL
    full_url = base_url + url.strip()

    try:
        response = requests.get(full_url, headers=headers, timeout=15)
        if response.status_code != 200:
            return f"Error: HTTP {response.status_code} - {response.reason}"
        # Extract everything after the first 'Markdown Content:' marker
        try:
            content = response.text.split("Markdown Content:", 1)[1]
        except IndexError:
            # Marker not found: fall back to the full response text
            content = response.text
        return content.strip() if content.strip() else "Error: No content returned"
    except requests.exceptions.RequestException as e:
        return f"Error: {str(e)}"
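The marker-extraction step can be tried in isolation, without a network call. The sketch below is a minimal variant using `str.partition`; the helper name `extract_markdown` and the sample response text are made up for illustration, and the layout of a real Jina Reader response is assumed only from the marker the function looks for.

```python
def extract_markdown(response_text):
    """Return the text after the first 'Markdown Content:' marker, or all of it."""
    _, marker, body = response_text.partition("Markdown Content:")
    # partition() leaves `marker` empty when the marker is absent,
    # in which case the whole response is treated as content.
    content = body if marker else response_text
    return content.strip()


# Invented sample text standing in for a response body.
sample = (
    "Title: Example\n\n"
    "URL Source: https://example.com\n\n"
    "Markdown Content:\n"
    "# Example\n\nBody text."
)
print(extract_markdown(sample))
```

Splitting only on the first occurrence keeps the rest of the page intact even if the page text itself contains the phrase "Markdown Content:".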