v1.1.5

13 March 2026 · 2 min read

Programmer and data architect

Version 1.1.5 internal test release

Testing .docx and .pdf text extraction

Following a conversation about StudentTasks, a technology test was completed: a service that asynchronously retrieves a base64-encoded document from a FileMaker record and extracts its text. It would be triggered after a submit action finalises a student task submission. There is no UI requirement, so status information becomes key to confirming completed, failed or stalled extractions. Parameters are {taskID, fileType}.

{
  "text":       "extracted plain text content",
  "charCount":  17961,
  "durationMs": 209,
  "library":    "mammoth",
  "version":    "1.8.0",
  "filename":   "report.docx",
  "lineBreaks": "single",
  "hash":       "c99b32954d120bf62ad818944660275f"
}
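Because there is no UI, a caller has to infer progress from fields on the StudentTask record. The following is a minimal sketch of how completed, failed and stalled extractions might be told apart. Only the 'processing' status appears in this release; the 'complete' and 'failed' values, the extractionStartedAt field and the 60-second threshold are assumptions made for illustration.

```javascript
// Assumption: an extraction still 'processing' after this long is treated as stalled.
const STALL_AFTER_MS = 60000;

// Classify a StudentTask record's extraction state.
// `extractionStartedAt` is a hypothetical ISO timestamp field.
function classifyExtraction(record, now = Date.now()) {
  const { extractionStatus, extractionStartedAt } = record;

  // Terminal states (hypothetical names) are reported as-is.
  if (extractionStatus === 'complete' || extractionStatus === 'failed') {
    return extractionStatus;
  }

  if (extractionStatus === 'processing') {
    const elapsedMs = now - new Date(extractionStartedAt).getTime();
    return elapsedMs > STALL_AFTER_MS ? 'stalled' : 'processing';
  }

  // No status set yet: the extraction has not been picked up.
  return 'pending';
}
```

A scheduled check could run this over recently committed tasks and flag anything reported as 'stalled' or 'failed' for a retry.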

This has been extended to provide a microservice that accepts a fileType and a base64-encoded file (b64), along with fileName and returnHash. As this is an open endpoint, we shall be adding a state or session parameter to reduce fake attempts.

curl https://server/extraction/api/extract/direct -H "Content-Type: application/json" -d "{\"b64\": \"${B64}\", \"fileType\": \"pdf\", \"fileName\": \"test4.pdf\", \"returnHash\": true}"

{
    "text": "extracted text",
    "charCount": 2115,
    "durationMs": 241,
    "library": "mammoth",
    "version": "1.12.0",
    "filename": "test.docx",
    "hash": "c99b32954d120bf62ad818944660275f"
}
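The same direct call can be made from Node. This is a sketch only: the helper names buildDirectPayload and extractDirect are hypothetical, the use of POST is assumed from the JSON body in the cURL example, and deriving fileType from the file extension is an illustrative choice rather than something the service requires.

```javascript
// Build the JSON body for the direct endpoint from a file already in memory.
function buildDirectPayload(fileBuffer, fileName, returnHash = true) {
  const fileType = fileName.split('.').pop().toLowerCase();
  if (fileType !== 'docx' && fileType !== 'pdf') {
    throw new Error(`Unsupported file type: ${fileType}`);
  }
  return {
    b64: fileBuffer.toString('base64'),
    fileType,
    fileName,
    returnHash,
  };
}

// Send the payload and return the parsed JSON response.
// Uses the global fetch available in Node 18+.
async function extractDirect(baseUrl, payload) {
  const res = await fetch(`${baseUrl}/api/extract/direct`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!res.ok) {
    throw new Error(`Extraction failed: HTTP ${res.status}`);
  }
  return res.json();
}
```

Typical use would be to read the file into a Buffer, pass it to buildDirectPayload, and hand the result to extractDirect with 'https://server/extraction' as the base URL.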

The response is fast, under 300 ms for a 4-page test .docx file, and that figure covers:

TLS handshake
nginx proxy overhead and routing
base64 decode
mammoth parsing the DOCX XML
JSON serialisation of the response
network latency both ways

A playbook has been written and fully tested.

Upload → stores b64 of doc on StudentTask record
(may repeat — each upload overwrites previous b64)
Commit → OData PATCH (answers + timestamp + locked)
→ fire-and-forget POST to extraction service
POST /api/extract
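The fire-and-forget step in the commit flow above could look roughly like this. It is a sketch under stated assumptions: the base URL is borrowed from the cURL example, the body shape follows the {taskID, fileType} parameters, and the function names are hypothetical. The transport is injectable so the caller can swap in its own HTTP client.

```javascript
// Assumption: base URL taken from the cURL example; adjust for the real deployment.
const EXTRACTION_BASE_URL = 'https://server/extraction';

// Default transport: global fetch (Node 18+). Returns a promise.
function postJson(path, body) {
  return fetch(`${EXTRACTION_BASE_URL}${path}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
}

// Fire-and-forget: the promise is deliberately not awaited, so the commit
// is not held up by the extraction. Only transport-level failures are logged
// here; the outcome of the extraction itself is read later from the record.
function notifyExtractionService(taskID, fileType, post = postJson) {
  post('/api/extract', { taskID, fileType }).catch((err) => {
    console.error(`Extraction request for task ${taskID} was not sent: ${err.message}`);
  });
}
```

Since nothing is returned to the caller, the status fields on the StudentTask record remain the only way to learn the outcome.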

Node Extraction Service (Express)

PATCH StudentTask → extractionStatus: 'processing'
Fetch b64 from StudentTask via OData
Decode → Buffer
Branch: DOCX → mammoth | PDF → pdfjs-dist
Compute MD5 hash of original binary
PATCH StudentTask → extracted text + metadata
POST to FileMaker script → archive b64 to Documents

FileMaker

StudentTask record: extracted text, status fields
Documents record: reconstituted original binary linked to StudentTask
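The decode, hash and branch steps of the service can be sketched as below. This is not the service's actual code: extractDocument and the extractors map are hypothetical, and the real mammoth and pdfjs-dist calls are left to the caller so the sketch depends only on Node's built-in crypto module.

```javascript
const { createHash } = require('node:crypto');

// Decode the base64 payload and compute the MD5 of the original binary.
function decodeAndHash(b64) {
  const buffer = Buffer.from(b64, 'base64');
  const hash = createHash('md5').update(buffer).digest('hex');
  return { buffer, hash };
}

// `extractors` maps a fileType to an async function (Buffer) => string.
// For DOCX, one option would be:
//   docx: async (buf) => (await mammoth.extractRawText({ buffer: buf })).value
// A pdfjs-dist based function would be supplied for 'pdf' in the same way.
async function extractDocument(b64, fileType, extractors) {
  if (!Object.hasOwn(extractors, fileType)) {
    throw new Error(`Unsupported fileType: ${fileType}`);
  }
  const { buffer, hash } = decodeAndHash(b64);

  const started = Date.now();
  const text = await extractors[fileType](buffer);

  // Field names follow the response examples in this post.
  return {
    text,
    charCount: text.length,
    durationMs: Date.now() - started,
    hash,
  };
}
```

Keeping the extractors injectable means a further format could be added without touching the decode and hash steps.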

If questions are prefixed with a known character (§), the text can be extracted with single carriage returns, and extra blank lines can then be substituted before that character for presentation purposes. The extraction is written as a service, so it could be called from other places in the FileMaker ecosystem.
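The § substitution might be done along these lines. It is a sketch that assumes the extracted text uses single line breaks (CR, LF or CRLF) and that § only ever appears at the start of a question; spaceQuestions is a hypothetical name.

```javascript
// Insert a blank line before every question marker (§) for presentation.
function spaceQuestions(text) {
  // Normalise CRLF and bare CR (FileMaker-style returns) to LF first.
  const normalised = text.replace(/\r\n?/g, '\n');
  // Collapse whatever line breaks precede a § into exactly two,
  // so running the function twice does not add further blank lines.
  return normalised.replace(/\n+(?=§)/g, '\n\n');
}

console.log(spaceQuestions('Intro\r§1 First question\rAnswer\r§2 Second question'));
```

A § at the very start of the text is left alone, since there is no preceding line break to expand.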
