GTSCO Cleanroom Report Pipeline
A Python automation pipeline engineered for GTSCO that intercepts raw particle counter machine dumps, validates every reading against ISO 14644 Class 7/8 air cleanliness limits, generates Matplotlib charts, and dispatches a fully branded client-ready Word + PDF compliance report — with zero manual steps between machine and mailbox.
Hours of manual work per report — with zero tolerance for error
After every cleanroom service, GTSCO engineers manually opened the particle counter's XLS dump, copied readings into Excel, rebuilt charts from scratch, checked each value against ISO 14644 Class 7 and Class 8 limits by hand, wrote the cover letter, formatted the Word report using the branded template, and then converted it to PDF.
Each report took hours. With multiple rooms, multiple service dates, and multiple clients per week, documentation could consume an entire working day. Worse, a single mistyped reading on a safety-critical compliance document could expose the business to liability.
Double-click. Walk away. Report in your inbox.
Built a 9-step Python pipeline wrapped in a double-click BAT file that non-technical staff can operate. Drop the machine XLS in the inputs folder, run it, answer a few prompts (client name, ISO class, room names), and the pipeline handles everything: parsing two different XLS format variants, ISO validation, chart generation, Word template population at the XML level, mastersheet update, and PDF export.
The Word report is generated by directly manipulating the OOXML structure of the branded GTSCO template — preserving all logos, headers, and formatting while injecting live data, charts, and a dynamic cover letter. No copy-pasting. No format drift. No calculation errors.
Engineering choices that mattered.
Dual-format XLS parser
The particle counter machine exports in two different XLS layouts depending on firmware version — Format A (timestamp-indexed) and Format B (location-grouped). The pipeline auto-detects the format by inspecting the header row and routes to the correct parser, so staff never need to know which format they have.
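A minimal sketch of header-based detection and routing, under stated assumptions: the header labels, the rule of looking only at the first non-empty header cell, and the `parse_format_a` / `parse_format_b` helpers are all hypothetical stand-ins, since the real column names in the machine dumps are not shown here. Rows are plain lists, as they would be after reading a sheet.

```python
from typing import Callable, Dict, List, Sequence

Row = Sequence[object]

# Hypothetical header labels; the real dumps may use different wording.
FORMAT_A_FIRST = {"timestamp", "date/time", "date time"}
FORMAT_B_FIRST = {"location", "location id", "sample point"}


def detect_format(header_row: Row) -> str:
    """Return "A" or "B" based on the first non-empty header cell."""
    first = next(
        (str(c).strip().lower() for c in header_row if str(c).strip()), ""
    )
    if first in FORMAT_A_FIRST:
        return "A"
    if first in FORMAT_B_FIRST:
        return "B"
    raise ValueError(f"Unrecognised header row: {list(header_row)!r}")


def parse_format_a(rows: List[Row]) -> List[dict]:
    # Placeholder parser: one record per timestamped row.
    return [{"timestamp": r[0], "values": list(r[1:])} for r in rows[1:]]


def parse_format_b(rows: List[Row]) -> List[dict]:
    # Placeholder parser: one record per location row.
    return [{"location": r[0], "values": list(r[1:])} for r in rows[1:]]


PARSERS: Dict[str, Callable[[List[Row]], List[dict]]] = {
    "A": parse_format_a,
    "B": parse_format_b,
}


def parse_dump(rows: List[Row]) -> List[dict]:
    """Detect the layout from the header row, then route to its parser."""
    return PARSERS[detect_format(rows[0])](rows)
```

Raising on an unrecognised header, rather than guessing, keeps a new firmware layout from being silently misread.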
OOXML manipulation over from-scratch generation
Rather than generating a Word file from scratch (losing GTSCO's branding, fonts, and table styles), the pipeline operates directly on the existing branded MASTER_TEMPLATE.docx at the XML level using lxml. It clones room table sections, replaces drawings with live Matplotlib charts, and injects data while leaving the template's logos, headers, and formatting untouched.
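The table-cloning idea can be sketched on a tiny WordprocessingML fragment. This is an illustration only, with several assumptions: it uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra installs, the `ROOM_NAME` placeholder and the `clone_room_tables` helper are hypothetical, and a real template would be read from `word/document.xml` inside the .docx archive.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)

# Stand-in for word/document.xml: one paragraph and one room table.
DOC = (
    '<w:document xmlns:w="' + W + '"><w:body>'
    "<w:p><w:r><w:t>Cover letter</w:t></w:r></w:p>"
    "<w:tbl><w:tr><w:tc><w:p><w:r><w:t>ROOM_NAME</w:t></w:r></w:p></w:tc></w:tr></w:tbl>"
    "</w:body></w:document>"
)


def q(tag: str) -> str:
    """Qualify a tag name with the WordprocessingML namespace."""
    return "{%s}%s" % (W, tag)


def clone_room_tables(body: ET.Element, room_names: List[str],
                      placeholder: str = "ROOM_NAME") -> None:
    """Replace the single template table with one filled copy per room."""
    template = body.find(q("tbl"))
    if template is None:
        raise ValueError("template table not found")
    index = list(body).index(template)
    body.remove(template)
    for offset, name in enumerate(room_names):
        clone = copy.deepcopy(template)  # keeps all styling children intact
        for text_node in clone.iter(q("t")):
            if text_node.text and placeholder in text_node.text:
                text_node.text = text_node.text.replace(placeholder, name)
        body.insert(index + offset, clone)


root = ET.fromstring(DOC)
body = root.find(q("body"))
clone_room_tables(body, ["Room A", "Room B"])
```

One caveat for real templates: Word often splits visible text across several `w:r` runs, so a placeholder may need to be matched across runs rather than inside a single `w:t` element.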
Two-path PDF conversion
PDF conversion tries LibreOffice headless first (cross-platform), then falls back to COM automation via Microsoft Word if LibreOffice isn't installed. If neither is available, the pipeline still produces the finished DOCX, which staff can convert manually, so every run ends with a deliverable.
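A hedged sketch of the fallback chain described above. The function names are hypothetical; the LibreOffice flags (`--headless`, `--convert-to`, `--outdir`) and the Word `SaveAs` call with `FileFormat=17` (`wdFormatPDF`) are the standard ones. It assumes `soffice` is on PATH and, for the Word path, that pywin32 is installed.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List, Optional, Sequence


def libreoffice_command(docx: Path, out_dir: Path,
                        binary: str = "soffice") -> List[str]:
    """Build the headless LibreOffice conversion command."""
    return [binary, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(docx)]


def convert_with_libreoffice(docx: Path) -> Path:
    binary = shutil.which("soffice") or shutil.which("libreoffice")
    if binary is None:
        raise RuntimeError("LibreOffice not found on PATH")
    subprocess.run(libreoffice_command(docx, docx.parent, binary),
                   check=True, timeout=120)
    return docx.with_suffix(".pdf")


def convert_with_word(docx: Path) -> Path:
    import win32com.client  # pywin32, Windows only; ImportError triggers fallback

    pdf = docx.with_suffix(".pdf")
    word = win32com.client.Dispatch("Word.Application")
    try:
        doc = word.Documents.Open(str(docx.resolve()))
        doc.SaveAs(str(pdf.resolve()), FileFormat=17)  # 17 = wdFormatPDF
        doc.Close()
    finally:
        word.Quit()
    return pdf


def convert_to_pdf(
    docx: Path,
    converters: Sequence[Callable[[Path], Path]] = (
        convert_with_libreoffice, convert_with_word),
) -> Optional[Path]:
    """Try each converter in order; return None if all of them fail."""
    for convert in converters:
        try:
            return convert(docx)
        except Exception:
            continue  # fall through to the next converter
    return None  # caller keeps the DOCX as the deliverable
```

Returning `None` instead of raising lets the caller report that only the DOCX was produced, which matches the behaviour described above when neither converter is available.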
Technology deployed.
Reads legacy .xls particle counter machine dumps. Auto-detects two format variants. Extracts readings, timestamps, and location codes from both.
Generates a per-client Excel mastersheet with professional styling, column groups, alternating row colours, and merged headers. Each run appends a new sheet — never overwrites history.
Generates one line chart per room per particle size (3 sizes × n rooms). Each chart overlays actual readings against ISO Class 7 and Class 8 limit lines. Rendered headlessly at 150 DPI for crisp print output.
Operates on the GTSCO MASTER_TEMPLATE.docx at the OOXML XML level. Clones room table sections dynamically, replaces chart placeholder drawings with live PNGs, fills data tables and cover letter text while preserving all branding.
Dual-path PDF conversion: LibreOffice headless subprocess first, COM automation via Microsoft Word as fallback. Produces a PDF on any Windows machine that has either LibreOffice or Microsoft Word installed.
A double-click BAT file that auto-installs all Python dependencies on first run and then launches the pipeline. Zero technical knowledge required from GTSCO staff — drop XLS, double-click, collect report.
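The chart step listed above (one line chart per room per particle size, with Class 7 and Class 8 limit lines, rendered headlessly at 150 DPI) might look roughly like the sketch below. The `plot_room_chart` name, the colours, and the choice of 0.5, 1.0 and 5.0 µm as the three sizes are assumptions. The limit values are the ISO 14644-1 maxima in particles per cubic metre.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend: no display required
import matplotlib.pyplot as plt
from typing import Sequence

# ISO 14644-1 maximum concentrations (particles/m³) by particle size in µm.
# The three sizes are an assumption about what the counter reports.
ISO_LIMITS = {
    "ISO Class 7": {0.5: 352_000, 1.0: 83_200, 5.0: 2_930},
    "ISO Class 8": {0.5: 3_520_000, 1.0: 832_000, 5.0: 29_300},
}


def plot_room_chart(room: str, size_um: float,
                    readings: Sequence[float], out_path: str) -> str:
    """Plot one room's readings for one particle size against both limits."""
    fig, ax = plt.subplots(figsize=(8, 4))
    samples = range(1, len(readings) + 1)
    ax.plot(samples, readings, marker="o", label="Measured")
    for (name, limits), colour in zip(ISO_LIMITS.items(), ("orange", "red")):
        ax.axhline(limits[size_um], linestyle="--", color=colour,
                   label=f"{name} limit")
    ax.set_title(f"{room}: particles >= {size_um} µm")
    ax.set_xlabel("Sample number")
    ax.set_ylabel("Particles per m³")
    ax.legend()
    fig.tight_layout()
    fig.savefig(out_path, dpi=150)  # 150 DPI, as used for print output
    plt.close(fig)  # release memory when many charts are generated
    return out_path
```

Calling this once per room and per size yields the 3 × n charts that are then placed into the report.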
Hardcoded safety limits. Zero calculation drift.
ISO 14644 limits are immutable constants — not spreadsheet formulas that can drift. Every reading in every room for every particle size is individually validated against the selected class. One reading over the limit fails the entire room.
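As an illustration of the rule above, here is a minimal sketch with the limits held in read-only mappings. The `validate_room` helper and the three particle sizes (0.5, 1.0, 5.0 µm) are assumptions. The limit values are the ISO 14644-1 maxima in particles per cubic metre, and a reading exactly at the limit is treated as a pass.

```python
from types import MappingProxyType
from typing import Dict, List, Sequence, Tuple

# Read-only ISO 14644-1 limits (particles/m³), keyed by class then size in µm.
ISO_LIMITS = MappingProxyType({
    7: MappingProxyType({0.5: 352_000, 1.0: 83_200, 5.0: 2_930}),
    8: MappingProxyType({0.5: 3_520_000, 1.0: 832_000, 5.0: 29_300}),
})


def validate_room(
    readings: Dict[float, Sequence[float]], iso_class: int
) -> Tuple[bool, List[Tuple[float, float]]]:
    """Check every reading; one value over its limit fails the whole room.

    Returns (passed, failures), where failures lists (size_um, value) pairs.
    """
    limits = ISO_LIMITS[iso_class]
    failures = [
        (size, value)
        for size, values in readings.items()
        for value in values
        if value > limits[size]
    ]
    return (not failures, failures)


room = {0.5: [100_000, 400_000], 1.0: [10_000], 5.0: [500]}
print(validate_room(room, 7))  # fails: 400,000 exceeds the Class 7 limit
print(validate_room(room, 8))  # passes under the looser Class 8 limits
```

Wrapping the tables in `MappingProxyType` means an accidental assignment such as `ISO_LIMITS[7][0.5] = ...` raises a `TypeError` instead of quietly changing a limit.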
What shipped.
Handles any number of rooms in a single run. Each room gets its own 3-chart section in the report, its own Pass/Fail status, and its own rows in the mastersheet.
A per-client Excel mastersheet accumulates every run as a new sheet. GTSCO's team can pull up any client's full service history at any time — zero data loss across runs.
One double-click BAT file. No Python knowledge needed. Auto-installs dependencies on first run. Any GTSCO staff member can generate a report on any Windows machine.