PII Redaction Tool

Application to detect and redact Personally Identifiable Information.

🔗 Repository: https://github.com/PSavvateev/pii-redaction-tool.git

🧩 Overview

Purpose

This application automatically detects and redacts Personally Identifiable Information (PII) from customer interactions.

Designed for:

  • Customer support teams needing to remove sensitive data from tickets
  • Data engineers and analysts preparing datasets
  • Privacy and compliance officers ensuring GDPR/PII protection

Why use it:

  • Automates PII redaction
  • Integrates with internal systems via modular connectors
  • Uses LLMs for flexible, accurate detection
  • Reduces compliance risk (e.g., GDPR, CCPA)

πŸ›‘οΈ GDPR Context

PII (Personally Identifiable Information) refers to any data that can be used, either alone or in combination with other information, to identify, contact, or locate an individual.

Examples of PII:

  • Direct identifiers: Full name, social security number, driver's license number, passport number, email address, phone number, date of birth.
  • Indirect identifiers: IP addresses, device identifiers, biometric data, credit card and bank account numbers.
  • Other sensitive data: Medical information, criminal history, citizenship or immigration status, ethnicity, or religious affiliation.

PII redaction is the process of detecting and removing (or masking) this data in order to protect privacy and reduce exposure to sensitive information leaks.

Why redaction matters under GDPR:

  • ✅ Protect individuals' privacy and prevent unauthorized access to personal data.
  • ✅ Support data minimization, keeping only the data that's truly necessary.
  • ✅ Reduce risk when data is shared, stored, or processed, especially by third parties or internal teams not authorized to access PII.

βš™οΈ Functionality

General Workflow (System-to-System Integration)

  1. Customer interactions, such as support tickets, messages, chats, or emails, are stored in a connected data source (e.g., CRM, analytics platform, or internal database).
  2. The app retrieves ticket data via a pre-configured API connector, using a unique ticket ID. Multiple connectors can be supported simultaneously, making the app easily extendable.
  3. The app analyzes the ticket content, detects any PII, and applies redaction according to the configured strategy.
  4. The redacted ticket is then pushed back to the original system, replacing the unredacted version.
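The four steps above form a fetch, detect, redact, update loop. The sketch below shows one hypothetical way to orchestrate it, with the connector, the agent, and the redactor injected as plain functions. The names run_redaction, Ticket, and Span are illustrative assumptions, not necessarily what redaction_service.py contains.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Ticket:  # hypothetical stand-in for the app's ticket model
    ticket_id: str
    text: str


# Hypothetical span format: (start offset, end offset, PII type)
Span = tuple[int, int, str]


def run_redaction(
    source: str,
    ticket_id: str,
    fetch: Callable[[str, str], Ticket],
    detect: Callable[[str], list[Span]],
    redact: Callable[[str, list[Span]], str],
    update: Callable[[str, Ticket], None],
) -> Ticket:
    ticket = fetch(source, ticket_id)  # step 2: pull the ticket through its connector
    spans = detect(ticket.text)        # step 3: locate PII in the ticket content
    redacted = Ticket(ticket.ticket_id, redact(ticket.text, spans))  # step 3: apply the strategy
    update(source, redacted)           # step 4: push back, replacing the original
    return redacted
```

Passing the steps in as callables keeps the loop independent of any particular CRM or model, which is the same idea the connector modules rely on.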

CRM-Agent Workflow (Zendesk example)

  1. A support agent opens a ticket in the CRM and clicks a pre-configured “Redact PII” button.
  2. This button triggers a webhook to the app, passing the ticket ID and CRM source.
  3. The app fetches the ticket from the CRM database.
  4. The app identifies PII entities.
  5. For the Zendesk integration, redaction is executed at the CRM level. Depending on the requirements of other CRM systems, redaction can instead be executed within the app.
  6. The ticket content is updated in the CRM database with the redacted version.
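Step 2 relies on the webhook body carrying the ticket ID and the CRM source. A minimal validation sketch follows; the JSON field names source and ticket_id, and the RedactionRequest type, are assumptions, since the webhook template configured in the CRM is not shown here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RedactionRequest:  # hypothetical type for an incoming webhook call
    source: str      # e.g. "zendesk"; selects the connector
    ticket_id: str


def parse_webhook(body: bytes) -> RedactionRequest:
    """Validate a body such as {"source": "zendesk", "ticket_id": "123"}.

    The field names are assumptions; match them to the CRM's webhook template.
    """
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("webhook body must be a JSON object")
    source = data.get("source")
    ticket_id = data.get("ticket_id")
    if not isinstance(source, str) or not source:
        raise ValueError("missing or empty 'source'")
    # CRMs often send numeric IDs; bool is excluded because it subclasses int.
    if isinstance(ticket_id, bool) or not isinstance(ticket_id, (str, int)) or ticket_id == "":
        raise ValueError("missing or invalid 'ticket_id'")
    return RedactionRequest(source=source, ticket_id=str(ticket_id))
```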

LLM Agent

The main decision-making module of the app is a PII-identifying agent: an LLM agent built with the Google ADK framework, which requires access to the Google API. I used gemini-1.5-flash, the cheapest model available, which seems sufficient for this task.

However, Google ADK allows other models to be used as well.
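LLMs often wrap structured answers in markdown code fences, which is what markdown_stripper.py is listed as cleaning up. The sketch below assumes, hypothetically, that the agent is prompted to return a JSON list of {"type", "value"} objects, and shows how such output could be stripped and parsed. The real prompt and response schema live in prompts.py and the data models and may differ.

```python
import json
import re

# Matches an opening fence (three backticks plus an optional language tag such
# as "json") at the very start, or a closing fence at the very end.
# Written as `{3} so the pattern contains no literal triple backticks.
_FENCE = re.compile(r"^\s*`{3}[a-zA-Z]*\s*\n?|\n?\s*`{3}\s*$")


def strip_markdown_fences(raw: str) -> str:
    """Remove a leading/trailing markdown code fence from LLM output."""
    return _FENCE.sub("", raw).strip()


def parse_entities(raw: str) -> list[dict[str, str]]:
    """Parse the (assumed) JSON list of {"type": ..., "value": ...} objects."""
    data = json.loads(strip_markdown_fences(raw))
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of entities")
    # Silently skip malformed items instead of failing the whole ticket.
    return [
        {"type": str(item["type"]), "value": str(item["value"])}
        for item in data
        if isinstance(item, dict) and "type" in item and "value" in item
    ]
```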

Redaction Strategies

The app supports multiple redaction strategies to handle detected PII.

| Strategy | Description | Example output |
|----------|-------------|----------------|
| mask | Replaces every character in the PII span with a *. | Email: ******************** |
| tokenize | Replaces the PII with a structured placeholder that includes the type. | Email: [PII::email] |
| hash | Replaces the PII with a hashed version (useful for anonymized comparisons). | Email: 6f8db599de986fab7a21625b7916589c |

ℹ️ Default strategy: mask
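The three strategies in the table can be sketched as a single replacement function plus a span-based redactor. The function names are hypothetical, and the hash function is an assumption: the app's choice is not documented, so this sketch uses SHA-256 truncated to 32 hex characters only to match the length of the example output.

```python
import hashlib


def apply_strategy(value: str, pii_type: str, strategy: str = "mask") -> str:
    """Return the replacement for one PII value under the given strategy."""
    if strategy == "mask":
        return "*" * len(value)  # one asterisk per character
    if strategy == "tokenize":
        return f"[PII::{pii_type}]"
    if strategy == "hash":
        # Assumption: SHA-256, truncated to 32 hex chars to match the example length.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()[:32]
    raise ValueError(f"unknown strategy: {strategy!r}")


def redact(text: str, spans: list[tuple[int, int, str]], strategy: str = "mask") -> str:
    """Replace each (start, end, pii_type) span; spans must not overlap."""
    # Work right-to-left so earlier offsets stay valid after length-changing replacements.
    for start, end, pii_type in sorted(spans, reverse=True):
        text = text[:start] + apply_strategy(text[start:end], pii_type, strategy) + text[end:]
    return text
```

One design caveat: an unsalted hash of low-entropy values such as phone numbers can be reversed by brute force, so a keyed hash (for example HMAC with a secret key) is safer when hashed values leave the system.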

🔌 Creating and Using Connectors (General workflow)

To integrate with a new CRM or data platform, implement a connector module that defines two functions:

 
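```python
from __future__ import annotations  # Ticket/RedactedTicket are the app's data models

def fetch_ticket(source: str, ticket_id: str) -> Ticket: ...  # load a ticket from the source
def update_ticket(source: str, redacted_ticket: RedactedTicket) -> None: ...  # write the redacted ticket back
```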
 
 

Requirements:

  • The connector must reside in the connectors/ directory and be registered in connector_registry.py:
```python
# Maps each source string to the dotted path of its connector module.
_CONNECTORS = {
    "test": "app.connectors.test_crm_connector",
    "zendesk": "app.connectors.zendesk_crm_connector",
    "salesforce": "app.connectors.salesforce_crm_connector",
}
```
  • The source string passed to the app (e.g. "zendesk", "test") is used to route the request to the correct connector.
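Because the registry maps source strings to dotted module paths, connectors can be loaded dynamically. A minimal sketch of such a lookup with importlib.import_module follows; get_connector is a hypothetical name, and the interface check is an illustrative addition rather than something the registry is documented to do.

```python
import importlib
from types import ModuleType

# Same shape as the registry above: source string -> dotted module path.
_CONNECTORS: dict[str, str] = {
    "test": "app.connectors.test_crm_connector",
    "zendesk": "app.connectors.zendesk_crm_connector",
}


def get_connector(source: str, registry: dict[str, str] = _CONNECTORS) -> ModuleType:
    """Resolve a source string (e.g. "zendesk") to its connector module."""
    try:
        module_path = registry[source]
    except KeyError:
        raise ValueError(f"no connector registered for source {source!r}") from None
    module = importlib.import_module(module_path)
    # Fail early if the module does not implement the two-function interface.
    for fn in ("fetch_ticket", "update_ticket"):
        if not callable(getattr(module, fn, None)):
            raise TypeError(f"{module_path} does not define {fn}()")
    return module
```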

A test/mock connector is included out of the box under connectors/test_crm_connector.py. This allows testing the system end-to-end without any real data source.

The test connector uses a local file, connectors/mock_db.json, as an example ticket database.
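A file-backed connector of this kind only needs to read and rewrite the JSON file. The sketch below is hypothetical: the file layout ({"<ticket_id>": {"text": "..."}}), the dataclass stand-ins for Ticket and RedactedTicket, and the extra optional db_path parameter (added for testability) are all assumptions, not the contents of test_crm_connector.py.

```python
import json
from dataclasses import dataclass
from pathlib import Path


# Hypothetical stand-ins for the app's Ticket / RedactedTicket models.
@dataclass
class Ticket:
    ticket_id: str
    text: str


@dataclass
class RedactedTicket:
    ticket_id: str
    redacted_text: str


# Assumed location and layout: {"<ticket_id>": {"text": "..."}, ...}
DB_PATH = Path("app/connectors/mock_db.json")


def _load(db_path: Path) -> dict:
    return json.loads(db_path.read_text(encoding="utf-8"))


def fetch_ticket(source: str, ticket_id: str, db_path: Path = DB_PATH) -> Ticket:
    """Read one ticket from the JSON file; `source` is unused by this mock."""
    db = _load(db_path)
    if ticket_id not in db:
        raise KeyError(f"ticket {ticket_id!r} not found in {db_path}")
    return Ticket(ticket_id=ticket_id, text=db[ticket_id]["text"])


def update_ticket(source: str, redacted_ticket: RedactedTicket, db_path: Path = DB_PATH) -> None:
    """Overwrite the stored text with the redacted version."""
    db = _load(db_path)
    db[redacted_ticket.ticket_id]["text"] = redacted_ticket.redacted_text
    db_path.write_text(json.dumps(db, indent=2), encoding="utf-8")
```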

You can try it by sending a request to the /ticket-redaction/test/{ticket_id} endpoint.
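A client-side sketch using only the standard library is shown below. Several details are assumptions: the general route pattern /ticket-redaction/{source}/{ticket_id} is inferred from the test endpoint, the HTTP method is assumed to be POST, and the base URL assumes a local uvicorn server on its default port 8000. Check main.py for the actual route definition.

```python
import urllib.parse
import urllib.request


def build_redaction_request(base_url: str, source: str, ticket_id: str) -> urllib.request.Request:
    """Build (but do not send) a request to the redaction endpoint.

    Assumed: route /ticket-redaction/{source}/{ticket_id}, method POST, no body.
    """
    # Percent-encode both path segments so IDs containing "/" cannot alter the route.
    path = "/ticket-redaction/{}/{}".format(
        urllib.parse.quote(source, safe=""), urllib.parse.quote(ticket_id, safe="")
    )
    return urllib.request.Request(base_url.rstrip("/") + path, method="POST")
```

With the app running, the request can be sent with `urllib.request.urlopen(build_redaction_request("http://localhost:8000", "test", "1"))`.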

🔌 Using the Zendesk connector (CRM-Agent Workflow)

in progress

πŸ› οΈ Tech Details

Project Structure

📂 app/
│
├── main.py                         # FastAPI entry point and routes
│
├── 📂 config/                      # App-level configurations
│   └── logger_config.py            # Logger setup and format
│
├── 📂 models/
│   └── pydentic_models.py          # Pydantic data models
│
├── 📂 agents/                      # Google ADK LLM agent(s)
│   ├── pii_detector_agent.py       # LLM interface for identifying PII
│   ├── pii_detector_runner.py      # Runner to initialize the agent
│   └── prompts.py                  # Prompt templates for LLM
│
├── 📂 connectors/                  # API connectors
│   ├── connector_registry.py       # Register/load external service connectors
│   ├── test_crm_connector.py       # Example CRM connector
│   └── mock_db.json                # Test local DB data
│
├── 📂 services/                    # Core logic and business services
│   └── redaction_service.py        # Main workflow: fetch, detect, redact, update
│
└── 📂 utils/                       # Utility functions
    ├── markdown_stripper.py        # Clean markdown artifacts from LLM output
    ├── pii_redactor.py             # Redaction logic
    └── pii_spans_locator.py        # Identify spans in the text for redaction
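pii_spans_locator.py is described as identifying the spans to redact. One plausible approach, assuming the agent returns entity strings rather than character offsets, is to search the text for every exact, case-sensitive occurrence of each value and merge overlapping hits. The sketch below is hypothetical (locate_spans and the {"type", "value"} input shape are assumptions) and may not match the app's implementation.

```python
def locate_spans(text: str, entities: list[dict[str, str]]) -> list[tuple[int, int, str]]:
    """Return sorted, non-overlapping (start, end, pii_type) spans for each entity value."""
    hits: list[tuple[int, int, str]] = []
    for entity in entities:
        value = entity["value"]
        if not value:
            continue  # an empty string would match at every position
        start = text.find(value)
        while start != -1:  # collect every occurrence, not just the first
            hits.append((start, start + len(value), entity["type"]))
            start = text.find(value, start + 1)
    # Sort by start, longest first, then merge overlaps keeping the first span's type.
    hits.sort(key=lambda h: (h[0], -h[1]))
    merged: list[tuple[int, int, str]] = []
    for start, end, pii_type in hits:
        if merged and start < merged[-1][1]:
            prev_start, prev_end, prev_type = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_type)
        else:
            merged.append((start, end, pii_type))
    return merged
```

Merging matters because overlapping spans would otherwise be redacted twice, corrupting the offsets used by the redaction step.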

Stack

  • 🐍 Python v3.13
  • 🚀 FastAPI
  • 🤖 Google ADK (Agent Development Kit) v1.5.0

Versions

  • v1.0.0 (10 Jul 2025)