PII Redaction Tool
Application to detect and redact Personally Identifiable Information.

Repository: https://github.com/PSavvateev/pii-redaction-tool.git
Overview
Purpose
This application automatically detects and redacts Personally Identifiable Information (PII) from customer interactions.
Designed for:
- Customer support teams needing to remove sensitive data from tickets
- Data engineers and analysts preparing datasets
- Privacy and compliance officers ensuring GDPR/PII protection
Why use it:
- Automates PII redaction
- Integrates with internal systems via modular connectors
- Uses LLMs for flexible, accurate detection
- Reduces compliance risk (e.g., GDPR, CCPA)
GDPR Context
PII (Personally Identifiable Information) refers to any data that can be used, either alone or in combination with other information, to identify, contact, or locate an individual.
Examples of PII:
- Direct identifiers: Full name, social security number, driver's license number, passport number, email address, phone number, date of birth.
- Indirect identifiers: IP addresses, device identifiers, biometric data, credit card and bank account numbers.
- Other sensitive data: Medical information, criminal history, citizenship or immigration status, ethnicity, or religious affiliation.
PII redaction is the process of detecting and removing (or masking) this data in order to protect privacy and reduce exposure to sensitive information leaks.
Why redaction matters under GDPR:
- Protect individuals' privacy and prevent unauthorized access to personal data.
- Support data minimization, keeping only the data that's truly necessary.
- Reduce risk when data is shared, stored, or processed, especially by third parties or internal teams not authorized to access PII.
Functionality
General Workflow (System-to-System Integration)

- Customer interactions (such as support tickets, messages, chats, or emails) are stored in a connected data source (e.g., CRM, analytics platform, or internal database).
- The app retrieves ticket data via a pre-configured API connector, using a unique ticket ID. Multiple connectors can be supported simultaneously, making the app easily extendable.
- The app analyzes the ticket content, detects any PII, and applies redaction according to the configured strategy.
- The redacted ticket is then pushed back to the original system, replacing the unredacted version.
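The four steps above can be sketched as one orchestration function. This is a minimal illustration rather than the app's actual code: the `Ticket` dataclass, the in-memory `FAKE_STORE`, and the regex-based `detect_pii` (a stand-in for the LLM agent so the example runs offline) are all assumptions made for the sketch.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Ticket:
    ticket_id: str
    text: str


# Hypothetical in-memory store standing in for a real CRM or database.
FAKE_STORE = {
    ("test", "42"): Ticket("42", "Please reply to jane@example.com today."),
}

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def fetch_ticket(source: str, ticket_id: str) -> Ticket:
    return FAKE_STORE[(source, ticket_id)]


def update_ticket(source: str, ticket: Ticket) -> None:
    FAKE_STORE[(source, ticket.ticket_id)] = ticket


def detect_pii(text: str) -> list[tuple[int, int]]:
    """Stand-in detector: returns (start, end) spans of email-like strings."""
    return [m.span() for m in EMAIL_RE.finditer(text)]


def mask(text: str, spans: list[tuple[int, int]]) -> str:
    """Replace every character inside each span with '*'."""
    chars = list(text)
    for start, end in spans:
        chars[start:end] = "*" * (end - start)
    return "".join(chars)


def redact_ticket(source: str, ticket_id: str) -> Ticket:
    ticket = fetch_ticket(source, ticket_id)                    # retrieve by ID
    spans = detect_pii(ticket.text)                             # detect PII
    redacted = replace(ticket, text=mask(ticket.text, spans))   # apply strategy
    update_ticket(source, redacted)                             # push back
    return redacted
```

Calling `redact_ticket("test", "42")` overwrites the stored ticket with the masked text, mirroring the last step of the workflow.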
CRM-Agent Workflow (Zendesk example)

- A support agent opens a ticket in the CRM and clicks a pre-configured "Redact PII" button.
- This button triggers a webhook to the app, passing the ticket ID and CRM source.
- The app fetches the ticket from the CRM database.
- The app identifies PII entities.
- With the Zendesk integration, redaction is executed at the CRM level. Depending on the requirements of other CRM systems, redaction can instead be executed within the app.
- The ticket content is updated in the CRM database with the redacted version.
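As a rough sketch of the webhook step, the handler below validates an incoming payload before any ticket is fetched. The field names `ticket_id` and `source`, the `RedactionRequest` type, and the `SUPPORTED_SOURCES` set are assumptions for illustration; the real payload shape depends on how the CRM button is configured.

```python
import json
from dataclasses import dataclass

# Hypothetical set of sources the app knows how to route.
SUPPORTED_SOURCES = {"zendesk", "test"}


@dataclass(frozen=True)
class RedactionRequest:
    ticket_id: str
    source: str


def parse_webhook(body: bytes) -> RedactionRequest:
    """Parse and validate a webhook body carrying a ticket ID and CRM source."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    missing = {"ticket_id", "source"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    source = str(data["source"]).lower()
    if source not in SUPPORTED_SOURCES:
        raise ValueError(f"unsupported source: {source!r}")
    # Ticket IDs are normalized to strings so numeric IDs are accepted too.
    return RedactionRequest(ticket_id=str(data["ticket_id"]), source=source)
```

Rejecting unknown sources at this point keeps malformed requests from reaching the connector layer.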
LLM Agent
The main decision-making module of the app is a PII-identifying LLM agent built with the Google ADK framework, which requires access to the Google API.
I used the cheapest available model, gemini-1.5-flash, which seems to be sufficient for this task.
However, Google ADK allows other models to be used.
Redaction Strategies
The app supports multiple redaction strategies to handle detected PII.
| Strategy | Description | Example Output |
|---|---|---|
| mask | Replaces every character in the PII span with a `*`. | Email: ******************** |
| tokenize | Replaces the PII with a structured placeholder that includes the type. | Email: [PII::email] |
| hash | Replaces the PII with a hashed version (useful for anonymized comparisons). | Email: 6f8db599de986fab7a21625b7916589c |

Default strategy: mask
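The three strategies can be illustrated with a small sketch. The function names `redact_span` and `redact_text` are hypothetical, and MD5 is an assumption inferred only from the 32-character hex digest in the example output; the app may use a different hash.

```python
import hashlib


def redact_span(value: str, pii_type: str, strategy: str = "mask") -> str:
    """Return the replacement for one detected PII value."""
    if strategy == "mask":
        return "*" * len(value)
    if strategy == "tokenize":
        return f"[PII::{pii_type}]"
    if strategy == "hash":
        # Assumption: MD5, chosen to match the 32-hex-character example.
        return hashlib.md5(value.encode("utf-8"), usedforsecurity=False).hexdigest()
    raise ValueError(f"unknown strategy: {strategy!r}")


def redact_text(text: str, spans: list[tuple[int, int, str]], strategy: str = "mask") -> str:
    """Apply a strategy to (start, end, pii_type) spans in the text."""
    # Work right to left so replacements of a different length
    # do not shift the offsets of spans that come earlier.
    for start, end, pii_type in sorted(spans, reverse=True):
        replacement = redact_span(text[start:end], pii_type, strategy)
        text = text[:start] + replacement + text[end:]
    return text
```

For example, `redact_text("Email: a@b.co", [(7, 13, "email")], "tokenize")` yields `Email: [PII::email]`. Because the hash is deterministic, the same value always maps to the same digest, which is what makes anonymized comparisons possible.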
Creating and Using Connectors (General workflow)
To integrate with a new CRM or data platform, implement a connector module that defines two functions:
def fetch_ticket(source: str, ticket_id: str) -> Ticket: ...
def update_ticket(source: str, redacted_ticket: RedactedTicket) -> None: ...
Requirements:
- The connector must reside in the connectors/ directory and be registered in connector_registry.py:
_CONNECTORS = {
"test": "app.connectors.test_crm_connector",
"zendesk": "app.connectors.zendesk_crm_connector",
"salesforce": "app.connectors.salesforce_crm_connector",
}
- The source string passed to the app (e.g. "zendesk", "test") is used to route to the correct connector.
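One way the routing from a source string to a connector module could work is a lazy import keyed by the registry. The sketch below is an assumption about the mechanism, not the contents of connector_registry.py: `load_connector` is a hypothetical helper, and only the `_CONNECTORS` mapping and the two required function names come from the description above.

```python
import importlib
from types import ModuleType

_CONNECTORS = {
    "test": "app.connectors.test_crm_connector",
    "zendesk": "app.connectors.zendesk_crm_connector",
}

REQUIRED_FUNCTIONS = ("fetch_ticket", "update_ticket")


def load_connector(source: str, registry: dict[str, str] = _CONNECTORS) -> ModuleType:
    """Import the connector module registered for `source` and check its interface."""
    try:
        module_path = registry[source]
    except KeyError:
        raise ValueError(f"no connector registered for source {source!r}") from None
    module = importlib.import_module(module_path)
    # Fail early if the module does not expose the two required functions.
    for name in REQUIRED_FUNCTIONS:
        if not callable(getattr(module, name, None)):
            raise TypeError(f"{module_path} does not define {name}()")
    return module
```

Importing lazily means a connector's dependencies are only needed when its source is actually requested.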
A test/mock connector is included out of the box under connectors/test_crm_connector.py. This allows testing the system end-to-end without any real data source.
The test connector uses a local file, connectors/mock_db.json, as an example ticket database.
You can use it by sending a request to the /ticket-redaction/test/{ticket_id} endpoint.
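To show what a file-backed connector might look like, here is a minimal sketch of the two required functions reading and writing a JSON file. The schema (a flat object mapping ticket IDs to ticket text), the `Ticket` and `RedactedTicket` dataclasses, and the extra `db_path` parameter are assumptions made so the example is self-contained; the bundled test connector may differ.

```python
import json
from dataclasses import dataclass
from pathlib import Path

# Assumed default location; overridable through the db_path parameter.
DB_PATH = Path("connectors/mock_db.json")


@dataclass
class Ticket:
    ticket_id: str
    text: str


@dataclass
class RedactedTicket:
    ticket_id: str
    text: str


def _load(db_path: Path) -> dict[str, str]:
    return json.loads(db_path.read_text(encoding="utf-8"))


def fetch_ticket(source: str, ticket_id: str, db_path: Path = DB_PATH) -> Ticket:
    """Look up one ticket by ID in the JSON file."""
    db = _load(db_path)
    if ticket_id not in db:
        raise KeyError(f"ticket {ticket_id!r} not found in {source!r}")
    return Ticket(ticket_id=ticket_id, text=db[ticket_id])


def update_ticket(source: str, redacted_ticket: RedactedTicket, db_path: Path = DB_PATH) -> None:
    """Overwrite the stored ticket text with the redacted version."""
    db = _load(db_path)
    db[redacted_ticket.ticket_id] = redacted_ticket.text
    db_path.write_text(json.dumps(db, indent=2), encoding="utf-8")
```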
Using Zendesk connector (CRM-Agent Workflow)
In progress.
Tech Details
Project Structure
app/
│
├── main.py                       # FastAPI entry point and routes
│
├── config/                       # App-level configurations
│   └── logger_config.py          # Logger setup and format
│
├── models/
│   └── pydentic_models.py        # Pydantic data models
│
├── agents/                       # Google ADK LLM agent(s)
│   ├── pii_detector_agent.py     # LLM interface for identifying PII
│   ├── pii_detector_runner.py    # Runner to initialize the agent
│   └── prompts.py                # Prompt templates for LLM
│
├── connectors/                   # API connectors
│   ├── connector_registry.py     # Register/load external service connectors
│   ├── test_crm_connector.py     # Example CRM connector
│   └── mock_db.json              # Test local DB data
│
├── services/                     # Core logic and business services
│   └── redaction_service.py      # Main workflow: fetch, detect, redact, update
│
└── utils/                        # Utility functions
    ├── markdown_stripper.py      # Clean markdown artifacts from LLM output
    ├── pii_redactor.py           # Redaction logic
    └── pii_spans_locator.py      # Identify spans in the text for redaction
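The two helper roles named in utils/ (cleaning markdown from LLM output and locating spans) can be sketched as follows. Both functions are hypothetical illustrations based only on the one-line descriptions above; they assume the LLM returns the PII values as plain strings, possibly wrapped in a markdown code fence.

```python
FENCE = "`" * 3  # a markdown code fence, built this way to keep the example readable


def strip_markdown_fence(raw: str) -> str:
    """Remove a surrounding code fence (and optional language tag) from LLM output."""
    text = raw.strip()
    if not (text.startswith(FENCE) and text.endswith(FENCE) and len(text) >= 2 * len(FENCE)):
        return text
    body = text[len(FENCE):-len(FENCE)]
    first_line, newline, rest = body.partition("\n")
    # Drop the opening line when it is empty or just a language tag such as "json".
    if newline and (not first_line.strip() or first_line.strip().isalnum()):
        body = rest
    return body.strip()


def locate_spans(text: str, values: list[str]) -> list[tuple[int, int]]:
    """Find every (start, end) occurrence of each detected value in the text."""
    spans = set()
    for value in values:
        if not value:
            continue  # an empty string would match everywhere
        start = text.find(value)
        while start != -1:
            spans.add((start, start + len(value)))
            start = text.find(value, start + len(value))
    return sorted(spans)
```

Locating spans by exact string search means a value the LLM paraphrases or reformats will not be found, so a real implementation may need fuzzier matching.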
Stack
- Python v3.13
- FastAPI
- Google ADK (Agent Development Kit) v1.5.0
Versions
- v1.0.0 (10 Jul 2025)