JINA_READER

Overview

JINA_READER fetches a web page and returns its main content as Markdown using the Jina Reader API (https://jina.ai/reader/). The Jina Reader API identifies and returns the most relevant, readable sections of a page. It works with a wide range of web content, including articles, company profiles, and news, and outputs clean Markdown that is easy to pass to downstream workflows.

This function is useful for extracting specific information, summarizing content, and other text processing tasks directly in Excel, so business users can analyze or reference web-based information without leaving their spreadsheet. The Jina Reader API filters out clutter, advertisements, and navigation elements and returns the core readable content, which makes it well suited to automating research, reporting, and bringing web-based data into business processes.

Usage

To use the JINA_READER function in Excel, enter it as a formula in a cell, specifying the URL of the web page you want to fetch. Optionally, you can provide an API key if you have one:


=JINA_READER(url, [api_key])

Arguments

Argument	Type	Required	Description	Example
url	string	Yes	The full URL of the web page to fetch	"https://www.ycombinator.com/companies/airbnb"
api_key	string	No	API key for authentication (if required)	"your_api_key"

Returns

Returns	Type	Description	Example
Content	string	The main content of the web page in Markdown format, extracted by Jina Reader.	"# Airbnb…"
Error	string	Error message if the URL is invalid, the request fails (including a non-200 HTTP status), or no content is returned.	"Error: Invalid URL"
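
Because failures come back as ordinary strings rather than exceptions, a caller has to inspect the result before using it. The following is a minimal sketch of such a check in Python. The helper names `is_reader_error` and `content_or_default` are hypothetical and not part of the function itself; the sketch assumes only that every error string starts with "Error:", as in the table above.

```python
ERROR_PREFIX = "Error:"  # assumption: all error strings begin with this prefix


def is_reader_error(result):
    """Return True if `result` looks like an error string rather than page content."""
    return isinstance(result, str) and result.startswith(ERROR_PREFIX)


def content_or_default(result, default=""):
    """Return the page content, or `default` when the result is an error string."""
    return default if is_reader_error(result) else result


print(is_reader_error("Error: Invalid URL"))
print(content_or_default("# Airbnb…", default="n/a"))
```

A prefix check is a heuristic: a page whose own text happens to begin with "Error:" would be misclassified.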

Examples

Company Analysis for Market Research

Sample Input:

URL	API Key
https://www.ycombinator.com/companies/airbnb	(optional)

Sample Call:


=JINA_READER("https://www.ycombinator.com/companies/airbnb")
=JINA_READER("https://www.ycombinator.com/companies/airbnb", "your_api_key")

Sample Output: Returns the extracted content about Airbnb, including their business model and company history (in Markdown format).

Python Code


def jina_reader(url, api_key=None):
    """
    Returns web page content in markdown format using Jina. Useful as a starting point for extraction, summarization, etc.
 
    Args:
        url (str): The full URL to fetch.
        api_key (str, optional): API key for authentication. Default is None.
 
    Returns:
        str: The content of the response from the URL, or an error message string if the request fails or input is invalid.
    """
    import requests

    # Reject non-string or empty input before making a request
    if not isinstance(url, str) or not url.strip():
        return "Error: Invalid URL"
    # Ask Reader to leave images out of the returned Markdown
    headers = {
        "X-Retain-Images": "none"
    }
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    # Reader is called by prefixing the target URL with its base URL
    base_url = "https://r.jina.ai/"
    full_url = base_url + url.strip()
    try:
        response = requests.get(full_url, headers=headers, timeout=15)
        if response.status_code != 200:
            return f"Error: HTTP {response.status_code} - {response.reason}"
        # Keep everything after the first 'Markdown Content:' marker;
        # if the marker is absent, fall back to the full response body
        _, marker, content = response.text.partition("Markdown Content:")
        if not marker:
            content = response.text
        content = content.strip()
        return content if content else "Error: No content returned"
    except requests.exceptions.RequestException as e:
        return f"Error: {e}"
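
The marker handling in the function can be tried without a network call. The sketch below applies the same split to a hand-written sample string. The helper name `split_reader_response` is hypothetical, and the `Title:` and `URL Source:` lines in the sample are an assumption about what precedes the marker, not a guaranteed response format.

```python
def split_reader_response(text, marker="Markdown Content:"):
    """Split response text into (header, content) at the first marker.

    If the marker is absent, the header is empty and the whole text is content.
    """
    header, found, content = text.partition(marker)
    if not found:
        return "", text.strip()
    return header.strip(), content.strip()


# Illustrative sample only; the lines before the marker are assumed
sample = (
    "Title: Airbnb\n\n"
    "URL Source: https://www.ycombinator.com/companies/airbnb\n\n"
    "Markdown Content:\n"
    "# Airbnb\n\nNotes that mention Markdown Content: stay intact.\n"
)

header, content = split_reader_response(sample)
print(content)
```

Splitting only at the first occurrence keeps the page body whole even when the page text itself contains the marker phrase.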
