tfrere's picture
tfrere HF Staff
Clean repository - remove missing LFS files
6afedde
|
raw
history blame
9.8 kB

Notion Importer

Complete Notion to MDX (Markdown + JSX) importer optimized for Astro with advanced media handling, interactive components, and seamless integration.

๐Ÿš€ Quick Start

Method 1: Using NOTION_PAGE_ID (Recommended)

# Install dependencies
npm install

# Setup environment variables
cp env.example .env
# Edit .env with your Notion token and page ID

# Complete Notion โ†’ MDX conversion (fetches title/slug automatically)
NOTION_TOKEN=secret_xxx NOTION_PAGE_ID=abc123 node index.mjs

# Or use .env file
node index.mjs

Method 2: Using pages.json (Legacy)

# Install dependencies
npm install

# Setup environment variables
cp env.example .env
# Edit .env with your Notion token

# Configure pages in input/pages.json
# {
#   "pages": [
#     {
#       "id": "your-page-id",
#       "title": "Title",
#       "slug": "slug"
#     }
#   ]
# }

# Complete Notion โ†’ MDX conversion
node index.mjs

# For step-by-step debugging
node notion-converter.mjs    # Notion โ†’ Markdown
node mdx-converter.mjs       # Markdown โ†’ MDX

๐Ÿ“ Structure

notion-importer/
โ”œโ”€โ”€ index.mjs                    # Complete Notion โ†’ MDX pipeline
โ”œโ”€โ”€ notion-converter.mjs         # Notion โ†’ Markdown with notion-to-md v4
โ”œโ”€โ”€ mdx-converter.mjs           # Markdown โ†’ MDX with Astro components
โ”œโ”€โ”€ post-processor.mjs          # Markdown post-processing
โ”œโ”€โ”€ package.json                # Dependencies and scripts
โ”œโ”€โ”€ env.example                 # Environment variables template
โ”œโ”€โ”€ static/                     # Static files injected at build time
โ”‚   โ”œโ”€โ”€ frontmatter.mdx        # Static frontmatter (overrides all others)
โ”‚   โ””โ”€โ”€ bibliography.bib       # Static bibliography
โ”œโ”€โ”€ input/                      # Configuration
โ”‚   โ””โ”€โ”€ pages.json             # Notion pages to convert
โ””โ”€โ”€ output/                     # Results
    โ”œโ”€โ”€ *.md                   # Intermediate Markdown
    โ”œโ”€โ”€ *.mdx                  # Final MDX for Astro
    โ””โ”€โ”€ media/                 # Downloaded media files

โœจ Key Features

๐ŸŽฏ Advanced Media Handling

  • Local download: Automatic download of all Notion media (images, files, PDFs)
  • Path transformation: Smart path conversion for web accessibility
  • Image components: Automatic conversion to Astro Image components with zoom/download
  • Media organization: Structured media storage by page ID

๐Ÿงฎ Interactive Components

  • Callouts โ†’ Notes: Notion callouts converted to Astro Note components
  • Enhanced tables: Tables wrapped in styled containers
  • Code blocks: Enhanced with copy functionality
  • Automatic imports: Smart component and image import generation

๐ŸŽจ Smart Formatting

  • Link fixing: Notion internal links converted to relative links
  • Artifact cleanup: Removal of Notion-specific formatting artifacts
  • Static frontmatter: Priority injection of custom frontmatter from static/frontmatter.mdx
  • Static bibliography: Automatic copying of static/bibliography.bib
  • Astro compatibility: Full compatibility with Astro MDX processing

๐Ÿ”ง Robust Pipeline

  • Notion preprocessing: Advanced page configuration and media strategy
  • Post-processing: Markdown cleanup and optimization
  • MDX conversion: Final transformation with Astro components
  • Auto-copy: Automatic copying to Astro content directory

๐Ÿ“„ Static Files Configuration

The importer supports static files for consistent metadata and bibliography:

Frontmatter (static/frontmatter.mdx)

Create this file to override frontmatter across all conversions:

---
title: "My Article Title"
subtitle: "Optional subtitle"
description: "Article description for SEO"
authors:
  - name: "Jane Doe"
    url: "https://example.com"
    affiliations:
      - "Hugging Face"
tags:
  - AI
  - Research
doi: "10.1000/182"
tableOfContentsAutoCollapse: true
---

This static frontmatter takes highest priority over any Notion metadata or existing frontmatter.

Bibliography (static/bibliography.bib)

Add your BibTeX entries to be copied to src/content/bibliography.bib:

@article{example2024,
  title={Example Article},
  author={Doe, Jane and Smith, John},
  journal={Example Journal},
  year={2024}
}

๐Ÿ“Š Example Workflow

# 1. Configure your Notion pages
# Edit input/pages.json with your page IDs

# 2. Complete automatic conversion
NOTION_TOKEN=your_token node index.mjs --clean

# 3. Generated results
ls output/
# โ†’ getting-started.md (Intermediate Markdown)  
# โ†’ getting-started.mdx (Final MDX for Astro)
# โ†’ media/ (downloaded images and files)

๐Ÿ“‹ Conversion Result

The pipeline generates MDX files optimized for Astro with:

---
title: "Getting Started with Notion"
published: "2024-01-15"
tableOfContentsAutoCollapse: true
---

import Image from '../components/Image.astro';
import Note from '../components/Note.astro';
import gettingStartedImage from './media/getting-started/image1.png';

## Introduction

Here is some content with a callout:

<Note type="info" title="Important">
This is a converted Notion callout.
</Note>

And an image:

<Figure
  src={gettingStartedImage}
  alt="Getting started screenshot"
  zoomable
  downloadable
  layout="fixed"
/>

โš™๏ธ Required Astro Configuration

To use the generated MDX files, ensure your Astro project has the required components:

// src/components/Figure.astro
---
export interface Props {
  src: any;
  alt?: string;
  caption?: string;
  zoomable?: boolean;
  downloadable?: boolean;
  layout?: string;
  id?: string;
}

const { src, alt, caption, zoomable, downloadable, layout, id } = Astro.props;
---

<figure {id} class="figure">
  <img src={src} alt={alt} />
  {caption && <figcaption>{caption}</figcaption>}
</figure>

๐Ÿ› ๏ธ Prerequisites

  • Node.js with ESM support
  • Notion Integration: Set up an integration in your Notion workspace
  • Notion Token: Copy the "Internal Integration Token"
  • Shared Pages: Share the specific Notion page(s) with your integration
  • Astro to use the generated MDX

๐ŸŽฏ Technical Architecture

4-Stage Pipeline

  1. Notion Preprocessing (notion-converter.mjs)

    • Configuration loading from pages.json
    • Notion API client initialization
    • Media download strategy configuration
  2. Notion-to-Markdown (notion-to-md v4)

    • Page conversion with NotionConverter
    • Media downloading with downloadMediaTo()
    • File export with DefaultExporter
  3. Markdown Post-processing (post-processor.mjs)

    • Notion artifact cleanup
    • Link fixing and optimization
    • Table and code block enhancement
  4. MDX Conversion (mdx-converter.mjs)

    • Component transformation (Figure, Note)
    • Automatic import generation
    • Frontmatter enhancement
    • Astro compatibility optimization

๐Ÿ“Š Configuration Options

Pages Configuration (input/pages.json)

{
  "pages": [
    {
      "id": "your-notion-page-id",
      "title": "Page Title",
      "slug": "page-slug"
    }
  ]
}

Environment Variables

Copy env.example to .env and configure:

cp env.example .env
# Edit .env with your actual Notion token

Required variables:

NOTION_TOKEN=secret_your_notion_integration_token_here

Command Line Options

# Full workflow
node index.mjs --clean --token=your_token

# Notion to Markdown only
node index.mjs --notion-only

# Markdown to MDX only  
node index.mjs --mdx-only

# Custom paths
node index.mjs --input=my-pages.json --output=converted/

๐Ÿ“Š Conversion Statistics

For a typical Notion page:

  • Media files automatically downloaded and organized
  • Callouts converted to interactive Note components
  • Images transformed to Figure components with zoom/download
  • Tables enhanced with proper styling containers
  • Code blocks enhanced with copy functionality
  • Links fixed for proper internal navigation

โœ… Project Status

๐ŸŽ‰ Complete Features

  • โœ… Notion โ†’ MDX Pipeline: Full end-to-end functional conversion
  • โœ… Media Management: Automatic download and path transformation
  • โœ… Component Integration: Seamless Astro component integration
  • โœ… Smart Formatting: Intelligent cleanup and optimization
  • โœ… Robustness: Error handling and graceful degradation
  • โœ… Flexibility: Modular pipeline with step-by-step options

๐Ÿš€ Production Ready

The toolkit is now 100% operational for converting Notion pages to MDX/Astro with all advanced features (media handling, component integration, smart formatting).

๐Ÿ”— Integration with notion-to-md v4

This toolkit leverages the powerful notion-to-md v4 library with:

  • Advanced Media Strategies: Download, upload, and direct media handling
  • Custom Renderers: Block transformers and annotation transformers
  • Exporter Plugins: File, buffer, and stdout output options
  • Database Support: Full database property and frontmatter transformation
  • Page References: Smart internal link handling

๐Ÿ“š Additional Resources