Doc Extract
v1.0.4 - Latest Release

A powerful Node.js library for reading and extracting text from various document formats including PDF, DOCX, DOC, PPT, PPTX, and TXT files.

PDF
DOCX
DOC
PPT
PPTX
TXT

Powerful Features

Everything you need to extract text and metadata from documents with ease and reliability.

Multiple Format Support

Extract text from PDF, DOCX, DOC, PPT, PPTX, and TXT files with a unified API.
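
`readDocumentFromBuffer`, used in the Express example further down, takes the original filename and a MIME type along with the bytes. If you only have a filename, a small lookup can supply the MIME type. The sketch below assumes that usage. `mimeTypeFor` is a hypothetical helper, not part of doc-extract. The MIME strings are the standard registered types for the six listed formats.

```typescript
// Hypothetical helper (not part of doc-extract): standard MIME types
// for the six supported file extensions.
const MIME_TYPES: { [ext: string]: string } = {
  pdf: "application/pdf",
  docx: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  doc: "application/msword",
  pptx: "application/vnd.openxmlformats-officedocument.presentationml.presentation",
  ppt: "application/vnd.ms-powerpoint",
  txt: "text/plain",
};

// Returns the MIME type for a filename, or undefined when the extension
// is missing or is not one of the supported formats.
function mimeTypeFor(filename: string): string | undefined {
  const dot = filename.lastIndexOf(".");
  if (dot === -1 || dot === filename.length - 1) return undefined;
  const ext = filename.slice(dot + 1).toLowerCase();
  // Own-property check so names like "file.constructor" don't match Object.prototype.
  return Object.prototype.hasOwnProperty.call(MIME_TYPES, ext)
    ? MIME_TYPES[ext]
    : undefined;
}
```

Extension matching is case-insensitive, so `mimeTypeFor("Slides.PPTX")` resolves to the PPTX type.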

Rich Metadata

Get document statistics such as word count, character count, and page count.
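
This page does not describe how doc-extract counts words and characters. To show what such counts usually measure, here is a minimal sketch that treats each run of whitespace as a word boundary. `textStats` and its fields are hypothetical. The library's own figures may differ, for example in how it handles punctuation or line breaks.

```typescript
// Hypothetical result shape for this sketch only.
interface TextStats {
  words: number;
  characters: number;
  charactersNoSpaces: number;
}

function textStats(text: string): TextStats {
  const trimmed = text.trim();
  // Any run of whitespace (spaces, tabs, newlines) separates two words.
  const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  return {
    words,
    // .length counts UTF-16 code units, so some emoji count as 2.
    characters: text.length,
    charactersNoSpaces: text.replace(/\s/g, "").length,
  };
}
```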

Buffer Support

Read documents directly from memory buffers without writing to disk.
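
When you work from a buffer, you can also sanity-check an upload before passing it to the reader. The sketch below assumes a coarse check is enough. PDF files begin with `%PDF-`. DOCX and PPTX are ZIP archives, which begin with `PK\x03\x04`. Legacy DOC and PPT files are OLE2 compound documents with a fixed 8-byte signature. `sniffContainer` is a hypothetical helper, not part of doc-extract. It cannot tell DOCX from PPTX, or DOC from PPT, without looking inside the container. Plain text has no signature at all.

```typescript
type Container = "pdf" | "zip" | "ole2" | "unknown";

// Leading-byte signatures: "%PDF-", the ZIP local file header "PK\x03\x04",
// and the OLE2 compound file header D0 CF 11 E0 A1 B1 1A E1.
const SIGNATURES: Array<[Container, number[]]> = [
  ["pdf", [0x25, 0x50, 0x44, 0x46, 0x2d]],
  ["zip", [0x50, 0x4b, 0x03, 0x04]],
  ["ole2", [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]],
];

function startsWithBytes(data: Uint8Array, prefix: number[]): boolean {
  if (data.length < prefix.length) return false;
  for (let i = 0; i < prefix.length; i++) {
    if (data[i] !== prefix[i]) return false;
  }
  return true;
}

// Strict check at offset 0; a PDF with leading junk bytes reports "unknown".
function sniffContainer(data: Uint8Array): Container {
  for (const [kind, signature] of SIGNATURES) {
    if (startsWithBytes(data, signature)) return kind;
  }
  return "unknown";
}
```

A Node.js `Buffer`, such as Multer's `req.file.buffer`, is a `Uint8Array` subclass, so you can pass it directly.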

TypeScript Ready

Full TypeScript support with comprehensive type definitions and IntelliSense.
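
Prefer the type definitions that ship with the library. Where data crosses a boundary the compiler cannot check, such as the JSON returned by the `/upload` endpoint in the Express example below, a small runtime guard keeps the types honest. `UploadResponse` and `isUploadResponse` are hypothetical. They mirror only the `text` and `metadata` fields that example sends back, and `metadata` is treated as an opaque object because this page does not describe its shape.

```typescript
// Hypothetical shape of the /upload response, based only on the fields the
// Express example returns; not doc-extract's own type definition.
interface UploadResponse {
  text: string;
  metadata: Record<string, unknown>;
}

// Type guard: narrows an unknown (e.g. JSON.parse output) to UploadResponse.
function isUploadResponse(value: unknown): value is UploadResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { text?: unknown; metadata?: unknown };
  return (
    typeof v.text === "string" &&
    typeof v.metadata === "object" &&
    v.metadata !== null &&
    !Array.isArray(v.metadata)
  );
}
```

After `if (isUploadResponse(body))`, TypeScript treats `body.text` as a `string` inside the branch.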

Promise-based API

Reader methods return Promises, so they drop straight into async/await code.
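
Because each read returns a Promise, you can process a batch of documents concurrently. Starting every read at once with `Promise.all` can exhaust memory on large batches, so a common pattern is a small worker pool. `mapWithLimit` below is a hypothetical, library-agnostic sketch. You would pass it a callback that wraps the reader call you use, for example `reader.readDocumentFromBuffer(...)`.

```typescript
// Runs `fn` over `items` with at most `limit` calls in flight at once.
// Results keep input order. Like Promise.all, it rejects on the first error;
// work already in flight is not cancelled.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker keeps claiming the next index until none are left.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  const workers: Promise<void>[] = [];
  for (let w = 0; w < Math.min(limit, items.length); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```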

Error Handling

Comprehensive error handling with custom error types and detailed error messages.
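
This page does not name doc-extract's error classes, so the sketch below defines a stand-in. The idea is to map error types to HTTP responses in one place. That way an unsupported file could return 415 instead of the blanket 500 in the Express example below. `UnsupportedFormatError` and `toHttpError` are hypothetical names. Substitute the library's real error classes from its type definitions.

```typescript
// Hypothetical stand-in for a library-specific error type.
class UnsupportedFormatError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "UnsupportedFormatError";
    // Keeps `instanceof` working when compiling to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface HttpError {
  status: number;
  message: string;
}

// Maps any thrown value to an HTTP status and a client-safe message.
function toHttpError(err: unknown): HttpError {
  if (err instanceof UnsupportedFormatError) {
    return { status: 415, message: err.message };
  }
  if (err instanceof Error) {
    // Unexpected failure: don't leak internal details to the client.
    return { status: 500, message: "Failed to extract text from document" };
  }
  return { status: 500, message: "Unknown error" };
}
```

In a route's `catch` block, call `const { status, message } = toHttpError(error);` and then `res.status(status).json({ error: message });`.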

Supported formats: 6+
TypeScript coverage: 100%
License: MIT (open source)

Documentation

Complete API reference and usage examples to get you started quickly.

Installation & Setup

Install the Package

npm install doc-extract

System Dependencies

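Some features depend on external command-line tools. For full functionality, install these system packages:

Ubuntu/Debian:
sudo apt-get install antiword unrtf poppler-utils tesseract-ocr
macOS:
brew install antiword unrtf poppler tesseract
Windows (this command installs only poppler and tesseract; antiword and unrtf are not included):
choco install poppler tesseract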

Real-World Examples

See how to integrate doc-extract into your applications with these practical examples.

Express.js Integration

Handle file uploads and extract text in an Express.js application.

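import express from "express";
import multer from "multer";
import { DocumentReader } from "doc-extract";

const app = express();
// multer() with no options keeps uploads in memory, so req.file.buffer is set
const upload = multer();
const reader = new DocumentReader();

app.post("/upload", upload.single("document"), async (req, res) => {
  try {
    if (!req.file) {
      res.status(400).json({ error: "No file uploaded" });
      return;
    }

    // Extract text straight from the in-memory buffer, passing the
    // original filename and MIME type along with the bytes
    const content = await reader.readDocumentFromBuffer(
      req.file.buffer,
      req.file.originalname,
      req.file.mimetype
    );

    res.json({
      text: content.text,
      metadata: content.metadata,
    });
  } catch (error) {
    // Caught values are `unknown` in strict TypeScript; narrow before using .message
    const message = error instanceof Error ? error.message : String(error);
    res.status(500).json({ error: message });
  }
});

app.listen(3000, () => {
  console.log("Server running on port 3000");
});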

Need More Examples?

Check out our comprehensive documentation and community examples on GitHub.