JINA_READER

Overview

JINA_READER fetches a web page, extracts its main content, and returns it as Markdown using the Jina Reader API (https://jina.ai/reader/). The Jina Reader API is a developer-oriented service that identifies the most relevant, readable sections of a web page and returns them. It works with a wide range of web content, including articles, company profiles, and news, and outputs clean Markdown for easy integration into downstream workflows.

This function is useful for extracting specific information, summarizing content, and other text-processing tasks directly in Excel, so business users can analyze or reference web-based information without leaving the spreadsheet. The Jina Reader API filters out clutter, advertisements, and navigation elements to deliver the core readable content, which makes it well suited to automating research and reporting and to integrating web-based data into business processes.

Usage

To use the JINA_READER function in Excel, enter it as a formula in a cell, specifying the URL of the web page you want to fetch. Optionally, you can provide an API key if you have one:

=JINA_READER(url, [api_key])

Arguments

| Argument | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| url | string | Yes | The full URL of the web page to fetch | https://www.ycombinator.com/companies/airbnb |
| api_key | string | No | API key for authentication (if required) | "your_api_key" |
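As a rough illustration of how the two arguments end up in the HTTP request, the sketch below mirrors the URL and header construction in the Python code on this page. The helper name `build_reader_request` is hypothetical, and the leading/trailing whitespace trimming is an added assumption; no request is sent.

```python
def build_reader_request(url, api_key=None):
    """Return the (full_url, headers) pair for a Jina Reader request.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    # The target URL is appended directly to the Reader endpoint.
    # Trimming whitespace is an assumption added for this sketch.
    full_url = "https://r.jina.ai/" + url.strip()

    # Header copied from the function's source on this page.
    headers = {"X-Retain-Images": "none"}

    # The API key, when given, is sent as a Bearer token.
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"

    return full_url, headers


full_url, headers = build_reader_request(
    "https://www.ycombinator.com/companies/airbnb", "your_api_key"
)
print(full_url)
print(headers["Authorization"])
```

When `api_key` is omitted, no `Authorization` header is added and the request is sent unauthenticated.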

Returns

| Returns | Type | Description | Example |
| --- | --- | --- | --- |
| Content | string | The main content of the web page in Markdown format, extracted by Jina Reader. | "# Airbnb…" |
| Error | string | Error message if the URL is invalid or unreachable. | "Error: Invalid URL" |
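Because both outcomes come back as plain strings, downstream code has to tell them apart by inspecting the text. A minimal sketch, assuming every error message begins with the prefix `Error:` (as the messages in the Python code on this page do); `is_reader_error` is a hypothetical helper name:

```python
def is_reader_error(result):
    """Return True if a JINA_READER result looks like an error message."""
    # Assumption: error strings always start with "Error:".
    return isinstance(result, str) and result.startswith("Error:")


for value in ["Error: Invalid URL", "# Airbnb"]:
    label = "error" if is_reader_error(value) else "content"
    print(f"{label}: {value}")
```

One caveat of this approach: a page whose extracted Markdown itself starts with `Error:` would be misclassified, so treat the check as a heuristic.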

Examples

Company Analysis for Market Research

Sample Input:

| URL | API Key |
| --- | --- |
| https://www.ycombinator.com/companies/airbnb | (optional) |

Sample Call:

=JINA_READER("https://www.ycombinator.com/companies/airbnb")
=JINA_READER("https://www.ycombinator.com/companies/airbnb", "your_api_key")

Sample Output: Returns the extracted content about Airbnb, including its business model and company history, in Markdown format.

Python Code

def jina_reader(url, api_key=None):
    """
    Returns web page content in markdown format using Jina. Useful as a starting point for extraction, summarization, etc.

    Args:
        url (str): The full URL to fetch.
        api_key (str, optional): API key for authentication. Default is None.

    Returns:
        str: The content of the response from the URL, or an error message string if the request fails or input is invalid.
    """
    import requests

    # Reject non-string or blank input before making any request
    if not isinstance(url, str) or not url.strip():
        return "Error: Invalid URL"

    headers = {"X-Retain-Images": "none"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"

    base_url = "https://r.jina.ai/"
    # Trim stray whitespace so it does not end up inside the request URL
    full_url = base_url + url.strip()

    try:
        response = requests.get(full_url, headers=headers, timeout=15)
        if response.status_code != 200:
            return f"Error: HTTP {response.status_code} - {response.reason}"

        # Extract content after the first 'Markdown Content:' marker;
        # fall back to the full response text if the marker is absent
        try:
            content = response.text.split("Markdown Content:", 1)[1]
        except IndexError:
            content = response.text

        content = content.strip()
        return content if content else "Error: No content returned"
    except requests.exceptions.RequestException as e:
        return f"Error: {str(e)}"
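The marker-handling step in the function above can be tried in isolation, without a network call. The sketch below is an illustration only: `extract_markdown` is a hypothetical helper, and the sample response text is made up to resemble a response containing the `Markdown Content:` marker.

```python
MARKER = "Markdown Content:"


def extract_markdown(response_text):
    """Return the text after the first marker, or the whole text if absent."""
    # partition() splits on the first occurrence only; `marker` is an
    # empty string when the marker was not found.
    _before, marker, after = response_text.partition(MARKER)
    body = after if marker else response_text
    return body.strip()


# Made-up response text, for illustration only.
sample = (
    "Title: Airbnb\n\n"
    "URL Source: https://www.ycombinator.com/companies/airbnb\n\n"
    "Markdown Content:\n"
    "# Airbnb\n"
)
print(extract_markdown(sample))
```

Splitting on the first occurrence only means that a page which happens to contain the phrase `Markdown Content:` in its own body is not cut short.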
