Development
July 1, 2024
9 min read

Customizing Fake Data with Regular Expressions

Discover advanced techniques for creating custom data patterns using regular expressions and custom generation rules for specific business requirements.

regular expressions
custom patterns
data validation
business rules
pattern matching


Regular expressions provide powerful pattern-matching capabilities that can transform how you generate custom fake data. Whether you need specific format validation, industry-standard identifiers, or complex business rule compliance, regex-driven data generation ensures your test data matches exact requirements.

Understanding Regex-Driven Data Generation

Traditional fake data libraries provide general-purpose data, but many applications require data that follows specific patterns, formats, or business rules. Regular expressions allow you to define precise patterns and generate data that conforms to your exact specifications.

Benefits of Regex-Based Generation

  • Precision: Generate data that exactly matches your validation rules
  • Compliance: Ensure data meets industry standards and formats
  • Flexibility: Create complex patterns that adapt to business logic
  • Validation: Test edge cases and boundary conditions systematically
  • Consistency: Maintain format consistency across large datasets

Basic Regex Pattern Generation

1. Simple Pattern Matching

Start with basic patterns for common data types:

const RandExp = require('randexp');

// Generate phone numbers in a specific format
const phonePattern = /\(\d{3}\) \d{3}-\d{4}/;
const phoneGenerator = new RandExp(phonePattern);

console.log(phoneGenerator.gen()); // e.g. "(555) 123-4567"
console.log(phoneGenerator.gen()); // e.g. "(892) 456-7890"

// Generate product codes
const productCodePattern = /[A-Z]{2}\d{4}-[A-Z]{3}/;
const productGenerator = new RandExp(productCodePattern);

console.log(productGenerator.gen()); // e.g. "AB1234-XYZ"
console.log(productGenerator.gen()); // e.g. "CD5678-QRS"

// Generate license plates
const licensePlatePattern = /[A-Z]{3}-\d{3}/;
const plateGenerator = new RandExp(licensePlatePattern);

console.log(plateGenerator.gen()); // e.g. "ABC-123"
console.log(plateGenerator.gen()); // e.g. "XYZ-789"
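RandExp does the heavy lifting in these snippets. To make its job less magical, here is a deliberately tiny, hypothetical generator (`generateFromPattern` is our own name, not a RandExp API) that walks a pattern's source and picks random characters for each token. It is a sketch that only understands literal characters, escapes such as `\(`, the `\d` class, bracketed ranges like `[A-Z]`, and `{n}` / `{n,m}` quantifiers; alternation, groups, `+` and `*` are out of scope.

```javascript
// Hypothetical mini-generator illustrating the idea behind RandExp.
// Supports only: literals, escapes like \( or \., \d, [A-Z]-style classes, {n} and {n,m}.

function randInt(min, max) {
  // Random integer in the inclusive range [min, max]
  return min + Math.floor(Math.random() * (max - min + 1));
}

function expandClass(body) {
  // Turn a class body such as "A-Z0-9" into the full list of allowed characters
  let chars = '';
  for (let j = 0; j < body.length; j++) {
    if (body[j + 1] === '-' && j + 2 < body.length) {
      for (let c = body.charCodeAt(j); c <= body.charCodeAt(j + 2); c++) {
        chars += String.fromCharCode(c);
      }
      j += 2;
    } else {
      chars += body[j];
    }
  }
  return chars;
}

function parseTokens(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    let choices;
    if (source[i] === '\\') {
      // \d means any digit; any other escape is treated as a literal character
      choices = source[i + 1] === 'd' ? '0123456789' : source[i + 1];
      i += 2;
    } else if (source[i] === '[') {
      const end = source.indexOf(']', i);
      choices = expandClass(source.slice(i + 1, end));
      i = end + 1;
    } else {
      choices = source[i];
      i += 1;
    }
    // Optional {n} or {n,m} quantifier right after the token
    const q = /^\{(\d+)(?:,(\d+))?\}/.exec(source.slice(i));
    let min = 1;
    let max = 1;
    if (q) {
      min = Number(q[1]);
      max = q[2] === undefined ? min : Number(q[2]);
      i += q[0].length;
    }
    tokens.push({ choices, min, max });
  }
  return tokens;
}

function generateFromPattern(regex) {
  return parseTokens(regex.source)
    .map(({ choices, min, max }) => {
      let out = '';
      const count = randInt(min, max);
      for (let k = 0; k < count; k++) {
        out += choices[randInt(0, choices.length - 1)];
      }
      return out;
    })
    .join('');
}

console.log(generateFromPattern(/\(\d{3}\) \d{3}-\d{4}/)); // shaped like (ddd) ddd-dddd
console.log(generateFromPattern(/[A-Z]{2}\d{4}-[A-Z]{3}/)); // shaped like AA0000-AAA
```

A real library also has to handle alternation, groups, negated classes and unbounded quantifiers, which is why the article relies on RandExp rather than a hand-rolled parser.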

2. Advanced Pattern Techniques

Create more sophisticated patterns for complex requirements:

class CustomPatternGenerator {
  constructor() {
    this.patterns = new Map();
  }
  
  addPattern(name, regex, options = {}) {
    this.patterns.set(name, {
      regex: new RandExp(regex),
      options: options
    });
  }
  
  generate(patternName, count = 1) {
    const pattern = this.patterns.get(patternName);
    if (!pattern) throw new Error(`Pattern '${patternName}' not found`);
    
    const results = [];
    for (let i = 0; i < count; i++) {
      let generated = pattern.regex.gen();
      
      // Apply post-processing if specified
      if (pattern.options.transform) {
        generated = pattern.options.transform(generated);
      }
      
      results.push(generated);
    }
    
    return count === 1 ? results[0] : results;
  }
}

// Usage examples
const generator = new CustomPatternGenerator();

// Social Security Numbers (US format)
generator.addPattern('ssn', /\d{3}-\d{2}-\d{4}/, {
  // Patch parts that are never assigned: area 000, 666 or 9xx; group 00; serial 0000
  transform: (value) => {
    let [area, group, serial] = value.split('-');
    if (area === '000' || area === '666' || area[0] === '9') area = '123';
    if (group === '00') group = '01';
    if (serial === '0000') serial = '0001';
    return `${area}-${group}-${serial}`;
  }
});

// Employee IDs with department prefix (already uppercase; the transform is only a safeguard)
generator.addPattern('employeeId', /(ENG|SAL|MKT|HR)\d{5}/, {
  transform: (value) => value.toUpperCase()
});

// Custom email patterns for testing
generator.addPattern('testEmail', /[a-z]{3,8}\.[a-z]{3,8}@(test|dev|staging)\.(com|org|net)/);

// International phone numbers
generator.addPattern('intlPhone', /\+\d{1,3}-\d{3}-\d{3}-\d{4}/);

// Generate samples
console.log(generator.generate('ssn', 5));
console.log(generator.generate('employeeId', 3));
console.log(generator.generate('testEmail', 2));
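Patching values in a transform works, but it also helps to check generated SSNs explicitly. The rules below encode the ranges the Social Security Administration never assigns: area numbers 000, 666 and 900 through 999, group 00, and serial 0000. `isPlausibleSsn` is a hypothetical helper for illustration; it checks structure only and says nothing about whether a number was actually issued.

```javascript
// Hypothetical helper: structural SSN check based on ranges that are never assigned.
// It cannot tell whether a number was actually issued to someone.
function isPlausibleSsn(value) {
  const match = /^(\d{3})-(\d{2})-(\d{4})$/.exec(value);
  if (!match) return false;
  const [, area, group, serial] = match;
  if (area === '000' || area === '666' || area[0] === '9') return false; // unassigned areas
  if (group === '00') return false; // group 00 is never used
  if (serial === '0000') return false; // serial 0000 is never used
  return true;
}

// Example: filter a batch of generated values before using them as fixtures
const candidates = [
  '123-45-6789',
  '000-12-3456',
  '666-12-3456',
  '912-34-5678',
  '123-00-4567',
  '123-45-0000',
];
console.log(candidates.filter(isPlausibleSsn)); // [ '123-45-6789' ]
```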


Industry-Specific Pattern Generation

1. Financial Data Patterns

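Generate identifiers shaped like common financial formats. These patterns reproduce the format only; they do not compute the check digits that real card, routing and IBAN numbers carry: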

const { faker } = require('@faker-js/faker');

class FinancialPatternGenerator {
  constructor() {
    this.patterns = {
      // Credit card patterns (test numbers only)
      visa: /4\d{3}-\d{4}-\d{4}-\d{4}/,
      mastercard: /5[1-5]\d{2}-\d{4}-\d{4}-\d{4}/,
      amex: /3[47]\d{2}-\d{6}-\d{5}/,
      
      // Bank routing numbers (ABA format; the check digit is not enforced)
      routingNumber: /[0-9]{9}/,
      
      // Account numbers
      accountNumber: /[0-9]{8,12}/,
      
      // IBAN pattern (simplified)
      iban: /[A-Z]{2}\d{2}[A-Z0-9]{4}\d{7}([A-Z0-9]?){0,16}/,
      
      // SWIFT codes
      swift: /[A-Z]{6}[A-Z0-9]{2}([A-Z0-9]{3})?/
    };
  }
  
  generateCreditCard(type = 'visa') {
    const pattern = this.patterns[type];
    if (!pattern) throw new Error(`Unknown card type: ${type}`);
    
    const generator = new RandExp(pattern);
    let cardNumber = generator.gen();
    
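    // Mark Visa numbers with a fixed '4000' prefix so they stand out in test data
    cardNumber = this.makeTestCardNumber(cardNumber, type);
    
    return {
      number: cardNumber,
      type: type,
      cvv: this.generateCVV(type),
      expiryDate: this.generateExpiryDate(),
      isTestCard: true
    };
  }
  
  makeTestCardNumber(cardNumber, type) {
    // Only Visa numbers are rewritten: forcing '4000' onto a Mastercard or Amex
    // number would replace its brand prefix. The result is not guaranteed to pass
    // the Luhn check, so validators that verify the card checksum may reject it.
    if (type !== 'visa') return cardNumber;
    return cardNumber.replace(/^\d{4}/, '4000');
  }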
  
  generateCVV(type) {
    const length = type === 'amex' ? 4 : 3;
    return new RandExp(new RegExp(`\\d{${length}}`)).gen();
  }
  
  generateExpiryDate() {
    const futureDate = new Date();
    futureDate.setFullYear(futureDate.getFullYear() + Math.floor(Math.random() * 5) + 1);
    
    const month = String(futureDate.getMonth() + 1).padStart(2, '0');
    const year = String(futureDate.getFullYear()).slice(-2);
    
    return `${month}/${year}`;
  }
  
  generateBankAccount() {
    return {
      routingNumber: new RandExp(this.patterns.routingNumber).gen(),
      accountNumber: new RandExp(this.patterns.accountNumber).gen(),
      accountType: faker.helpers.arrayElement(['checking', 'savings']),
      isTestAccount: true
    };
  }
  
  generateIBAN(countryCode = 'GB') {
    // Simplified IBAN generation for testing: the check digits are random and the
    // length is not country-specific, so strict IBAN validators will typically reject it
    const checkDigits = String(Math.floor(Math.random() * 100)).padStart(2, '0');
    const bankCode = new RandExp(/[A-Z]{4}/).gen();
    const accountNumber = new RandExp(/\d{8}/).gen();
    
    return `${countryCode}${checkDigits}${bankCode}${accountNumber}`;
  }
}

// Usage
const finGenerator = new FinancialPatternGenerator();

console.log(finGenerator.generateCreditCard('visa'));
console.log(finGenerator.generateCreditCard('mastercard'));
console.log(finGenerator.generateBankAccount());
console.log(finGenerator.generateIBAN('GB')); // the US does not use IBANs
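Because the patterns above only reproduce formats, most generated values will fail any validator that checks digits. If your system under test does check them, the following dependency-free sketch implements three standard checks: the Luhn algorithm for card numbers, the ABA 3-7-1 weighting for routing numbers, and the ISO 13616 mod-97 rule for IBAN check digits. The function names are our own; wiring them into `generateCreditCard`, `generateBankAccount` or `generateIBAN` is left as an assumption about your setup.

```javascript
// Luhn: double every second digit from the right, starting next to the check digit
function luhnCheckDigit(payload) {
  let sum = 0;
  let double = true; // the rightmost payload digit sits next to the check digit
  for (let i = payload.length - 1; i >= 0; i--) {
    let d = Number(payload[i]);
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return String((10 - (sum % 10)) % 10);
}

function isLuhnValid(number) {
  const digits = number.replace(/\D/g, ''); // ignore dashes and spaces
  return digits.length > 1 && luhnCheckDigit(digits.slice(0, -1)) === digits.slice(-1);
}

function randomDigits(n) {
  let out = '';
  for (let i = 0; i < n; i++) out += Math.floor(Math.random() * 10);
  return out;
}

// Keep the brand prefix (e.g. '4' for Visa, '34'/'37' for Amex) and append a valid check digit
function generateLuhnNumber(prefix, length) {
  const payload = prefix + randomDigits(length - 1 - prefix.length);
  return payload + luhnCheckDigit(payload);
}

// ABA: 3*(d1+d4+d7) + 7*(d2+d5+d8) + (d3+d6+d9) must be divisible by 10.
// Real routing numbers also restrict the first two digits; this sketch ignores that.
const ABA_WEIGHTS = [3, 7, 1, 3, 7, 1, 3, 7, 1];

function abaWeightedSum(digits) {
  return [...digits].reduce((acc, d, i) => acc + Number(d) * ABA_WEIGHTS[i], 0);
}

function isValidRoutingNumber(routing) {
  return /^\d{9}$/.test(routing) && abaWeightedSum(routing) % 10 === 0;
}

function generateRoutingNumber() {
  const first8 = randomDigits(8);
  // The 9th digit has weight 1, so it is simply whatever completes the multiple of 10
  return first8 + String((10 - (abaWeightedSum(first8) % 10)) % 10);
}

// IBAN: move country code + check digits to the end, map A=10 ... Z=35, then mod 97
function ibanToNumber(rearranged) {
  return BigInt(rearranged.replace(/[A-Z]/g, (c) => String(c.charCodeAt(0) - 55)));
}

function ibanCheckDigits(countryCode, bban) {
  const remainder = ibanToNumber(`${bban}${countryCode}00`) % 97n;
  return String(98n - remainder).padStart(2, '0');
}

function isValidIban(iban) {
  const compact = iban.replace(/\s+/g, '').toUpperCase();
  return ibanToNumber(compact.slice(4) + compact.slice(0, 4)) % 97n === 1n;
}

console.log(isLuhnValid('4242 4242 4242 4242')); // true
console.log(isValidRoutingNumber('021000021')); // true
console.log(ibanCheckDigits('GB', 'WEST12345698765432')); // 82
```

A checksum-valid value is still synthetic, but it exercises the same validation path as real input, which random digits usually do not.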

2. Healthcare Data Patterns

Generate synthetic healthcare identifiers that contain no real patient information:

class HealthcarePatternGenerator {
  constructor() {
    this.patterns = {
      // National Provider Identifier (NPI): 10 digits, first digit 1 or 2 (check digit not enforced)
      npi: /[12]\d{9}/,
      
      // Medical Record Numbers
      mrn: /(MRN|MR)-\d{6,10}/,
      
      // Prescription numbers
      prescriptionNumber: /RX\d{7,10}/,
      
      // Insurance member IDs
      insuranceMemberId: /[A-Z]{3}\d{7,9}/,
      
      // Lab result IDs
      labResultId: /LAB-\d{4}-\d{6}/,
      
      // Appointment IDs
      appointmentId: /APT\d{8}/
    };
  }
  
  generatePatientData() {
    return {
      // Synthetic identifiers only
      patientId: `PAT-${new RandExp(/\d{8}/).gen()}`,
      mrn: new RandExp(this.patterns.mrn).gen(),
      
      // Age groups instead of specific dates
      ageGroup: faker.helpers.arrayElement([
        '0-17', '18-34', '35-54', '55-74', '75+'
      ]),
      
      // General geographic region
      region: faker.helpers.arrayElement([
        'Northeast', 'Southeast', 'Midwest', 'Southwest', 'West'
      ]),
      
      // Synthetic medical data
      primaryProvider: {
        npi: new RandExp(this.patterns.npi).gen(),
        name: `Dr. ${faker.person.lastName()}`,
        specialty: faker.helpers.arrayElement([
          'Family Medicine', 'Internal Medicine', 'Cardiology', 'Neurology'
        ])
      },
      
      insurance: {
        memberId: new RandExp(this.patterns.insuranceMemberId).gen(),
        groupNumber: new RandExp(/[A-Z0-9]{6,10}/).gen(),
        provider: faker.helpers.arrayElement([
          'TestCare Insurance', 'Demo Health Plan', 'Sample Medical Group'
        ])
      },
      
      // Synthetic flags
      syntheticRecord: true,
      hipaaCompliant: true
    };
  }
  
  generatePrescription() {
    return {
      prescriptionNumber: new RandExp(this.patterns.prescriptionNumber).gen(),
      medication: faker.helpers.arrayElement([
        'Generic Medication A', 'Test Drug B', 'Sample Prescription C'
      ]),
      dosage: `${faker.number.int({ min: 5, max: 500 })}mg`,
      frequency: faker.helpers.arrayElement([
        'Once daily', 'Twice daily', 'Three times daily', 'As needed'
      ]),
      prescribedDate: faker.date.past({ years: 1 }),
      prescribingProvider: new RandExp(this.patterns.npi).gen(),
      isTestPrescription: true
    };
  }
}
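Real NPIs are not arbitrary 10-digit strings: the last digit is a Luhn check digit computed as if the first nine digits were prefixed with the health-industry code 80840. If the system under test validates NPIs, a sketch like the one below produces values that pass that check. The helper names are hypothetical, and a checksum-valid synthetic NPI may coincide with a real registered one, so keep such values inside test environments.

```javascript
// Luhn check digit over a digit string (the rightmost payload digit is doubled)
function luhnCheckDigit(payload) {
  let sum = 0;
  let double = true;
  for (let i = payload.length - 1; i >= 0; i--) {
    let d = Number(payload[i]);
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return String((10 - (sum % 10)) % 10);
}

// NPI check digit: Luhn over "80840" followed by the first nine NPI digits
function npiCheckDigit(firstNine) {
  return luhnCheckDigit('80840' + firstNine);
}

function generateNpi() {
  let firstNine = String(1 + Math.floor(Math.random() * 2)); // leading digit 1 or 2
  while (firstNine.length < 9) firstNine += Math.floor(Math.random() * 10);
  return firstNine + npiCheckDigit(firstNine);
}

function isValidNpi(npi) {
  return /^[12]\d{9}$/.test(npi) && npiCheckDigit(npi.slice(0, 9)) === npi[9];
}

console.log(isValidNpi('1234567893')); // true
console.log(generateNpi()); // a random checksum-valid NPI
```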


Business Rule Implementation

1. Conditional Pattern Generation

Generate data that follows complex business rules:

class BusinessRuleGenerator {
  constructor() {
    this.rules = new Map();
  }
  
  addRule(name, conditions, patterns = {}) {
    this.rules.set(name, { conditions, patterns });
  }
  
  generateByRule(ruleName, context = {}) {
    const rule = this.rules.get(ruleName);
    if (!rule) throw new Error(`Rule '${ruleName}' not found`);
    
    // Evaluate conditions to determine which pattern to use
    for (const condition of rule.conditions) {
      if (condition.when(context)) {
        const pattern = condition.pattern;
        const generator = new RandExp(pattern);
        let result = generator.gen();
        
        // Apply any transformations
        if (condition.transform) {
          result = condition.transform(result, context);
        }
        
        return result;
      }
    }
    
    // Default pattern if no conditions match
    const defaultPattern = rule.patterns.default || /[A-Z0-9]{8}/;
    return new RandExp(defaultPattern).gen();
  }
}

// Example: Employee ID generation based on department and seniority
const businessRules = new BusinessRuleGenerator();

businessRules.addRule('employeeId', [
  {
    when: (ctx) => ctx.department === 'Engineering' && ctx.seniority === 'Senior',
    pattern: /SE\d{4}/, // Senior Engineer
    transform: (value, ctx) => `${value}-${ctx.location || 'HQ'}`
  },
  {
    when: (ctx) => ctx.department === 'Engineering',
    pattern: /EN\d{4}/ // Engineer
  },
  {
    when: (ctx) => ctx.department === 'Sales' && ctx.seniority === 'Senior',
    pattern: /SS\d{4}/ // Senior Sales
  },
  {
    when: (ctx) => ctx.department === 'Sales',
    pattern: /SL\d{4}/ // Sales
  },
  {
    when: (ctx) => ctx.seniority === 'Manager',
    pattern: /MG\d{4}/ // Manager
  }
]);

// Generate employee IDs based on context
console.log(businessRules.generateByRule('employeeId', {
  department: 'Engineering',
  seniority: 'Senior',
  location: 'NYC'
})); // e.g. "SE1234-NYC"

console.log(businessRules.generateByRule('employeeId', {
  department: 'Sales',
  seniority: 'Junior'
})); // e.g. "SL5678"
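Because `generateByRule` returns on the first matching condition, rule order is part of the business logic: the specific Senior Engineering rule has to come before the general Engineering rule, or it never fires. The dependency-free sketch below isolates that first-match behavior; `firstMatch` and the `build` callbacks are hypothetical stand-ins for the RandExp-based patterns.

```javascript
// First-match evaluation: rules are tried in order and the first `when` returning true wins
function firstMatch(rules, context, fallback) {
  for (const rule of rules) {
    if (rule.when(context)) return rule.build(context);
  }
  return fallback(context);
}

// Random zero-padded 4-digit suffix (stand-in for RandExp with \d{4})
const digits4 = () => String(Math.floor(Math.random() * 10000)).padStart(4, '0');

const specificFirst = [
  {
    when: (c) => c.department === 'Engineering' && c.seniority === 'Senior',
    build: (c) => `SE${digits4()}-${c.location || 'HQ'}`,
  },
  {
    when: (c) => c.department === 'Engineering',
    build: () => `EN${digits4()}`,
  },
];

// Same two rules in the wrong order: the general rule shadows the specific one
const generalFirst = [specificFirst[1], specificFirst[0]];

const seniorEngineer = { department: 'Engineering', seniority: 'Senior', location: 'NYC' };
const fallback = () => 'UNMATCHED';

console.log(firstMatch(specificFirst, seniorEngineer, fallback)); // shaped like SE####-NYC
console.log(firstMatch(generalFirst, seniorEngineer, fallback)); // shaped like EN####
```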

2. Cross-Field Validation Patterns

Generate data where multiple fields must be consistent:

class CrossFieldPatternGenerator {
  constructor() {
    this.fieldRelationships = new Map();
  }
  
  addRelationship(primaryField, dependentField, relationship) {
    if (!this.fieldRelationships.has(primaryField)) {
      this.fieldRelationships.set(primaryField, []);
    }
    
    this.fieldRelationships.get(primaryField).push({
      field: dependentField,
      relationship: relationship
    });
  }
  
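  generateRelatedFields(primaryField, primaryValue, result = {}) {
    result[primaryField] = primaryValue;
    const relationships = this.fieldRelationships.get(primaryField) || [];
    
    // Recurse so dependents of dependents (e.g. state -> country) are filled in too.
    // Assumes the relationships contain no cycles.
    for (const rel of relationships) {
      this.generateRelatedFields(rel.field, rel.relationship(primaryValue), result);
    }
    
    return result;
  }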
}

// Example: Generate consistent address data
const addressGenerator = new CrossFieldPatternGenerator();

// Define relationships between address fields
addressGenerator.addRelationship('zipCode', 'state', (zipCode) => {
  // Simplified: determine state from zip code range
  const zip = parseInt(zipCode, 10);
  if (zip >= 10000 && zip <= 14999) return 'NY';
  if (zip >= 90000 && zip <= 96199) return 'CA';
  if (zip >= 60000 && zip <= 60999) return 'IL';
  return 'XX'; // Default for test data
});

addressGenerator.addRelationship('zipCode', 'city', (zipCode) => {
  // Generate city name based on zip code
  return `TestCity${zipCode.slice(-3)}`;
});

addressGenerator.addRelationship('state', 'country', (state) => {
  return 'US'; // All states map to US
});

// Generate consistent address
const zipCode = new RandExp(/\d{5}/).gen();
const address = addressGenerator.generateRelatedFields('zipCode', zipCode);

console.log(address);
// e.g. for zipCode "10001":
// {
//   zipCode: "10001",
//   state: "NY",
//   country: "US",
//   city: "TestCity001"
// }
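With a uniformly random five-digit zip, most values fall outside the three ranges and come back as `'XX'`. If every record needs to land in a known state, one option is to invert the dependency: pick the state first, then draw the zip from that state's range. The sketch reuses the article's simplified ranges (they are not the complete USPS allocation), and `generateAddressForState` is a hypothetical name.

```javascript
// Simplified zip ranges from the example above (not the full USPS allocation)
const ZIP_RANGES = {
  NY: [10000, 14999],
  CA: [90000, 96199],
  IL: [60000, 60999],
};

function randomIntInclusive(min, max) {
  return min + Math.floor(Math.random() * (max - min + 1));
}

function generateAddressForState(state) {
  const range = ZIP_RANGES[state];
  if (!range) throw new Error(`No zip range configured for ${state}`);
  // Every configured range starts at 10000 or above, so the zip is always five digits
  const zipCode = String(randomIntInclusive(range[0], range[1]));
  return {
    zipCode,
    state,
    country: 'US',
    city: `TestCity${zipCode.slice(-3)}`, // same synthetic naming as above
  };
}

// Pick a state at random, then derive everything else from it
const states = Object.keys(ZIP_RANGES);
const address = generateAddressForState(states[randomIntInclusive(0, states.length - 1)]);
console.log(address);
```

The trade-off is that the state distribution is now whatever you choose when picking states, rather than an accident of how the random zips fall.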

Data Validation and Testing

1. Pattern Validation Testing

Test your regex patterns thoroughly:

class PatternValidator {
  static validatePattern(pattern, testCases, expectedMatches = true) {
    const results = {
      pattern: pattern.toString(),
      passed: 0,
      failed: 0,
      failures: []
    };
    
    for (const testCase of testCases) {
      const matches = pattern.test(testCase);
      
      if (matches === expectedMatches) {
        results.passed++;
      } else {
        results.failed++;
        results.failures.push({
          input: testCase,
          expected: expectedMatches,
          actual: matches
        });
      }
    }
    
    return results;
  }
  
  static generateAndValidate(pattern, count = 100) {
    const generator = new RandExp(pattern);
    const generated = [];
    const validationErrors = [];
    
    for (let i = 0; i < count; i++) {
      const value = generator.gen();
      generated.push(value);
      
      // Validate that generated value matches the pattern
      if (!pattern.test(value)) {
        validationErrors.push({
          value: value,
          error: 'Generated value does not match pattern'
        });
      }
    }
    
    return {
      generated: generated,
      errors: validationErrors,
      successRate: ((count - validationErrors.length) / count * 100).toFixed(2) + '%'
    };
  }
}

// Test phone number pattern
const phonePattern = /^\(\d{3}\) \d{3}-\d{4}$/;

const validPhones = [
  '(555) 123-4567',
  '(800) 555-1234',
  '(123) 456-7890'
];

const invalidPhones = [
  '555-123-4567',  // Wrong format
  '(555) 1234567', // Missing dash
  '(55) 123-4567'  // Wrong digit count
];

console.log('Valid phone tests:');
console.log(PatternValidator.validatePattern(phonePattern, validPhones, true));

console.log('\nInvalid phone tests:');
console.log(PatternValidator.validatePattern(phonePattern, invalidPhones, false));

console.log('\nGenerated phone validation:');
console.log(PatternValidator.generateAndValidate(phonePattern, 50));
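Note that the phone pattern above is anchored with `^` and `$`. A generator produces values of the pattern's shape either way, but `RegExp.prototype.test` needs the anchors to reject values with extra characters, so validating against an unanchored pattern can pass strings your real validator would refuse. A small hypothetical helper can anchor any pattern while keeping its flags:

```javascript
// Hypothetical helper: require a pattern to match the whole string, keeping its flags.
// The non-capturing group stops alternations like a|b binding to only one anchor;
// the 'g' flag is dropped so test() does not become stateful through lastIndex.
function anchored(regex) {
  return new RegExp(`^(?:${regex.source})$`, regex.flags.replace('g', ''));
}

const loose = /\(\d{3}\) \d{3}-\d{4}/;
const strict = anchored(loose);

const value = '(555) 123-45678'; // one digit too many
console.log(loose.test(value)); // true: a substring matches
console.log(strict.test(value)); // false: the whole string must match
```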

2. Edge Case Pattern Testing

Generate edge cases for thorough testing:

class EdgeCaseGenerator {
  static generateEdgeCases(basePattern) {
    const edgeCases = [];
    
    // Generate minimum length cases
    const minPattern = this.createMinimumPattern(basePattern);
    if (minPattern) {
      edgeCases.push({
        type: 'minimum_length',
        pattern: minPattern,
        value: new RandExp(minPattern).gen()
      });
    }
    
    // Generate maximum length cases
    const maxPattern = this.createMaximumPattern(basePattern);
    if (maxPattern) {
      edgeCases.push({
        type: 'maximum_length',
        pattern: maxPattern,
        value: new RandExp(maxPattern).gen()
      });
    }
    
    // Generate boundary cases
    const boundaryPatterns = this.createBoundaryPatterns(basePattern);
    for (const boundaryPattern of boundaryPatterns) {
      edgeCases.push({
        type: 'boundary_case',
        pattern: boundaryPattern,
        value: new RandExp(boundaryPattern).gen()
      });
    }
    
    return edgeCases;
  }
  
  static createMinimumPattern(pattern) {
    // Convert quantifiers to their minimum counts. This is a naive string rewrite:
    // it assumes the pattern has no escaped +, * or ? and no (?:...) groups.
    let minPattern = pattern.source;
    minPattern = minPattern.replace(/\{(\d+),\d*\}/g, '{$1}'); // {2,5} -> {2}
    minPattern = minPattern.replace(/\+/g, '');                // + -> single occurrence
    minPattern = minPattern.replace(/\*/g, '{0}');             // * -> no occurrence
    minPattern = minPattern.replace(/\?/g, '{0}');             // ? -> no occurrence
    
    return new RegExp(minPattern, pattern.flags);
  }
  
  static createMaximumPattern(pattern) {
    // Convert quantifiers to maximum reasonable values (same assumptions as above)
    let maxPattern = pattern.source;
    maxPattern = maxPattern.replace(/\{\d*,(\d+)\}/g, '{$1}'); // {2,5} -> {5}
    maxPattern = maxPattern.replace(/\+/g, '{10}');             // + -> reasonable max
    maxPattern = maxPattern.replace(/\*/g, '{10}');             // * -> reasonable max
    maxPattern = maxPattern.replace(/\?/g, '');                 // ? -> single occurrence
    
    return new RegExp(maxPattern, pattern.flags);
  }
  
  static createBoundaryPatterns(pattern) {
    // Create patterns for testing character class boundaries
    const boundaries = [];
    
    // Number boundaries
    if (pattern.toString().includes('\\d')) {
      boundaries.push(/0+/);  // All zeros
      boundaries.push(/9+/);  // All nines
    }
    
    // Letter boundaries
    if (pattern.toString().includes('[A-Z]')) {
      boundaries.push(/A+/);  // All A's
      boundaries.push(/Z+/);  // All Z's
    }
    
    return boundaries;
  }
}

// Example usage
const emailPattern = /[a-z]{3,8}\.[a-z]{3,8}@[a-z]{3,10}\.(com|org|net)/;
const edgeCases = EdgeCaseGenerator.generateEdgeCases(emailPattern);

console.log('Edge cases for email pattern:');
edgeCases.forEach(edge => {
  console.log(`${edge.type}: ${edge.value}`);
});
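Minimum- and maximum-length values only probe the inside of a range. Classic boundary-value analysis also tests one step outside it, which is where off-by-one mistakes in validators tend to hide. For a bounded quantifier such as `{3,8}`, a hypothetical helper can emit lengths `min - 1`, `min`, `max` and `max + 1` together with the expected verdict. Here it is checked against the first name segment `[a-z]{3,8}` of the example email pattern:

```javascript
// Hypothetical helper: boundary-value cases for a bounded length range {min,max}.
// Each case repeats `fill` to the target length and records whether it should be accepted.
function lengthBoundaryCases(min, max, fill = 'a') {
  const lengths = [min - 1, min, max, max + 1].filter((n) => n >= 0);
  return lengths.map((length) => ({
    value: fill.repeat(length),
    length,
    expectValid: length >= min && length <= max,
  }));
}

// Validator for the first name segment of the example email pattern
const localPart = /^[a-z]{3,8}$/;

for (const c of lengthBoundaryCases(3, 8)) {
  const actual = localPart.test(c.value);
  console.log(`length ${c.length}: expected ${c.expectValid}, got ${actual}`);
}
```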

Performance Optimization

1. Pattern Compilation Caching

Optimize regex performance for high-volume generation:

class OptimizedPatternGenerator {
  constructor() {
    this.compiledPatterns = new Map();
    this.generationStats = new Map();
  }
  
  compilePattern(name, pattern, options = {}) {
    const compiled = {
      regex: new RandExp(pattern),
      originalPattern: pattern,
      options: options,
      compiledAt: Date.now()
    };
    
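    // RandExp's `max` caps repetitions for unbounded quantifiers such as * and +,
    // not the total string length; patterns with only bounded quantifiers are unaffected
    if (options.maxLength) {
      compiled.regex.max = options.maxLength;
    }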
    
    this.compiledPatterns.set(name, compiled);
    this.generationStats.set(name, { generated: 0, totalTime: 0 });
    
    return compiled;
  }
  
  generate(patternName, count = 1) {
    const pattern = this.compiledPatterns.get(patternName);
    if (!pattern) {
      throw new Error(`Pattern '${patternName}' not compiled`);
    }
    
    // performance.now() offers sub-millisecond resolution, unlike Date.now()
    const startTime = performance.now();
    const results = [];
    
    for (let i = 0; i < count; i++) {
      results.push(pattern.regex.gen());
    }
    
    // Update statistics
    const stats = this.generationStats.get(patternName);
    stats.generated += count;
    stats.totalTime += performance.now() - startTime;
    
    return count === 1 ? results[0] : results;
  }
  
  getPerformanceStats(patternName) {
    const stats = this.generationStats.get(patternName);
    if (!stats) return null;
    
    return {
      totalGenerated: stats.generated,
      totalTimeMs: stats.totalTime,
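      averageTimeMs: stats.generated > 0 ? stats.totalTime / stats.generated : 0,
      // Avoid dividing by zero when a batch finished within the timer's resolution
      generationsPerSecond: stats.totalTime > 0 ? Math.round(stats.generated / stats.totalTime * 1000) : null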
    };
  }
  
  optimizePattern(patternName) {
    const pattern = this.compiledPatterns.get(patternName);
    if (!pattern) return false;
    
    // Apply common optimizations
    let optimizedPattern = pattern.originalPattern.source;
    
    // Cap unbounded quantifiers so RandExp does not emit up to `max` (default 100) repeats.
    // Naive rewrite: an escaped \+ or \* would also be changed.
    optimizedPattern = optimizedPattern.replace(/\+/g, '{1,5}');
    optimizedPattern = optimizedPattern.replace(/\*/g, '{0,5}');
    
    // Create optimized version, keeping the original flags
    const optimized = new RegExp(optimizedPattern, pattern.originalPattern.flags);
    pattern.regex = new RandExp(optimized);
    
    return true;
  }
}

// Usage example
const optimizer = new OptimizedPatternGenerator();

// Compile frequently used patterns
optimizer.compilePattern('userId', /USER_\d{6}_[A-Z]{3}/, { maxLength: 15 });
optimizer.compilePattern('sessionId', /[a-f0-9]{32}/, { maxLength: 32 });

// Generate data
console.time('generation');
const userIds = optimizer.generate('userId', 10000);
const sessionIds = optimizer.generate('sessionId', 10000);
console.timeEnd('generation');

// Check performance
console.log('User ID generation stats:', optimizer.getPerformanceStats('userId'));
console.log('Session ID generation stats:', optimizer.getPerformanceStats('sessionId'));
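The caching idea itself does not depend on RandExp: build each pattern object once, key it by the pattern's source and flags, and reuse it. Below is a minimal, hypothetical memoizing wrapper around any factory; the built-in `RegExp` constructor stands in for `new RandExp(...)` so the sketch runs without extra packages, and a `performance.now()` timer measures the loop.

```javascript
// Hypothetical memoizing wrapper: runs the factory once per (flags, source) key
function createPatternCache(factory) {
  const cache = new Map();
  let builds = 0;
  return {
    lookup(pattern) {
      const key = `${pattern.flags}:${pattern.source}`;
      if (!cache.has(key)) {
        cache.set(key, factory(pattern));
        builds += 1;
      }
      return cache.get(key);
    },
    get builds() {
      return builds; // how many times the factory actually ran
    },
  };
}

// Stand-in factory; with RandExp installed this could be (p) => new RandExp(p)
const patternCache = createPatternCache((p) => new RegExp(p.source, p.flags));
const userIdPattern = /USER_\d{6}_[A-Z]{3}/;

const start = performance.now();
for (let i = 0; i < 10000; i++) {
  patternCache.lookup(userIdPattern).test('USER_123456_ABC');
}
const elapsedMs = performance.now() - start;

console.log(`factory calls: ${patternCache.builds}, elapsed: ${elapsedMs.toFixed(2)} ms`);
```

Keying by source and flags rather than by object identity means two separately written but identical regex literals share one cached entry.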

Conclusion

Regular expressions unlock powerful customization capabilities for fake data generation, enabling you to create data that precisely matches your application's requirements:

Key Benefits:

  • Precision: Generate data that exactly matches validation rules
  • Compliance: Ensure data meets industry standards and formats
  • Flexibility: Handle complex business logic and conditional patterns
  • Testing: Create comprehensive edge cases and boundary conditions
  • Performance: Optimize generation for high-volume scenarios

Best Practices:

  • Start with simple patterns and iterate toward complexity
  • Validate your patterns thoroughly with edge cases
  • Cache compiled patterns for performance in high-volume generation
  • Document complex patterns for team understanding
  • Test generated data against your actual validation logic

Common Use Cases:

  • Industry-specific identifiers (medical, financial, legal)
  • Custom format validation testing
  • Business rule compliance verification
  • Edge case and boundary testing
  • Integration with existing validation systems

Ready to create custom data patterns for your specific needs? Start building with our advanced pattern generator that supports regex-driven data generation.

Related Articles:

  • The Ultimate Guide to Test Data Generation
  • Generating Realistic User Data for Web Applications
  • Techniques for Generating Large Volumes of Test Data

Need help implementing complex regex patterns for your specific business requirements? Contact our pattern experts for specialized assistance.
