πŸ” Detective’s Quick Reference

Essential pandas commands every accounting detective needs to know


🏁 Getting Started - Crime Scene Setup

| Task | Pandas Command | Example |
| --- | --- | --- |
| Load CSV file | pd.read_csv() | df = pd.read_csv('transactions.csv') |
| Load Excel file | pd.read_excel() | df = pd.read_excel('ledger.xlsx', sheet_name='GL') |
| First look at data | df.head() | df.head(10)  # First 10 rows |
| Data shape | df.shape | df.shape  # (rows, columns) |
| Column info | df.info() | Shows data types and null counts |
| Quick stats | df.describe() | Summary statistics for numeric columns |
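To try the crime-scene setup without a real file on disk, you can feed pd.read_csv an in-memory buffer; a minimal sketch with hypothetical contents:

```python
import io
import pandas as pd

# Simulate a small transactions.csv (hypothetical contents)
csv_text = """date,account,amount
2023-01-02,5010,120.50
2023-01-03,6020,-45.00
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (rows, columns)
print(df.head())   # First look at the data
```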

πŸ”Ž Investigation Basics - Excel to Pandas

| Excel Function | Pandas Equivalent | Detective Example |
| --- | --- | --- |
| VLOOKUP | merge() | df.merge(vendor_df, on='vendor_id') |
| Pivot Table | pivot_table() | df.pivot_table(values='amount', index='account', aggfunc='sum') |
| Filter | df[condition] | df[df['amount'] > 10000]  # Large transactions |
| SUMIFS | groupby().sum() | df.groupby('department')['amount'].sum() |
| COUNTIFS | groupby().count() | df.groupby('account')['transaction_id'].count() |
| Remove Duplicates | drop_duplicates() | df.drop_duplicates(['date', 'amount', 'vendor']) |
| Sort | sort_values() | df.sort_values('date', ascending=False) |
| IF Formula | np.where() | df['flag'] = np.where(df['amount'] < 0, 'Credit', 'Debit') |
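Here is how two of these equivalents look end-to-end on toy data (frame and column names are hypothetical):

```python
import pandas as pd

# Toy ledger and vendor master
ledger = pd.DataFrame({
    'vendor_id': [1, 2, 1, 3],
    'department': ['IT', 'HR', 'IT', 'HR'],
    'amount': [100.0, 250.0, 50.0, 75.0],
})
vendors = pd.DataFrame({
    'vendor_id': [1, 2, 3],
    'vendor_name': ['Acme', 'Globex', 'Initech'],
})

# VLOOKUP equivalent: bring vendor_name onto each transaction
enriched = ledger.merge(vendors, on='vendor_id')

# SUMIFS equivalent: total amount per department
totals = ledger.groupby('department')['amount'].sum()
```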

πŸ•΅οΈ Advanced Detective Work

Data Quality Investigation

# Find missing values (the vanishing data mystery)
df.isna().sum()                           # Count missing per column
df[df.isna().any(axis=1)]                # Rows with any missing data
df.dropna()                              # Remove rows with missing values
df.fillna(0)                             # Fill missing with zeros

# Spot duplicates (the clone case)
df.duplicated().sum()                    # Count duplicates
df[df.duplicated(keep=False)]            # Show all duplicate rows
df.drop_duplicates(keep='first')         # Keep first occurrence
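A minimal, runnable version of the quality checks above, using a toy frame with one missing value and one duplicated row:

```python
import numpy as np
import pandas as pd

# Toy data: row 1 duplicates row 0, row 2 has a missing amount
df = pd.DataFrame({
    'date': ['2023-01-02', '2023-01-02', '2023-01-03'],
    'amount': [100.0, 100.0, np.nan],
})

missing_per_col = df.isna().sum()        # One missing 'amount'
dup_count = df.duplicated().sum()        # One duplicate row
cleaned = df.drop_duplicates().dropna()  # One fully clean row remains
```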

Pattern Recognition

# Text contains investigation
df[df['description'].str.contains('fraud', case=False)]   # Find fraud mentions
df['description'].str.extract(r'(\$[\d,]+)')             # Extract dollar amounts
df[df['account'].str.startswith('5')]                    # Accounts starting with 5

# Date detective work (assumes df['date'] is datetime; run pd.to_datetime(df['date']) first if not)
df['month'] = df['date'].dt.month                        # Extract month
df['quarter'] = df['date'].dt.quarter                    # Extract quarter
df[df['date'].between('2023-01-01', '2023-12-31')]      # Date range filter
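The text and date tricks above, run on a small toy frame (descriptions and dates hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    'description': ['Refund $1,200', 'Possible FRAUD case', 'Office supplies'],
    'date': pd.to_datetime(['2023-02-15', '2023-07-01', '2023-11-30']),
})

fraud_rows = df[df['description'].str.contains('fraud', case=False)]
dollars = df['description'].str.extract(r'(\$[\d,]+)')   # NaN where no match
df['quarter'] = df['date'].dt.quarter
```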

Reconciliation Magic

# The merge mystery techniques
pd.merge(df1, df2, on='id', how='left')                 # Left join
pd.merge(df1, df2, on='id', how='outer', indicator=True) # Show merge status
pd.merge_asof(df1, df2, on='date', by='customer')       # Nearest date match

# Set operations (finding the difference)
set(df1['id']) - set(df2['id'])                         # IDs in df1 but not df2
df1[~df1['id'].isin(df2['id'])]                         # Rows not in df2
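A reconciliation sketch on two toy sides of the books (column names hypothetical); indicator=True adds a _merge column flagging each row as 'both', 'left_only', or 'right_only':

```python
import pandas as pd

book = pd.DataFrame({'id': [1, 2, 3], 'amount': [10.0, 20.0, 30.0]})
bank = pd.DataFrame({'id': [2, 3, 4], 'amount': [20.0, 30.0, 40.0]})

# Outer merge with merge status per row
recon = pd.merge(book, bank, on='id', how='outer', indicator=True)

# Book entries with no bank counterpart
unmatched_book = book[~book['id'].isin(bank['id'])]
```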

πŸ’° Accounting-Specific Mysteries

Financial Analysis

# Monthly summaries
df.groupby(df['date'].dt.to_period('M'))['amount'].sum()

# Running totals (bank reconciliation style)
df['running_total'] = df['amount'].cumsum()

# Percentage of total
df['pct_of_total'] = df['amount'] / df['amount'].sum() * 100

# Top N analysis
df.nlargest(10, 'amount')                               # Top 10 by amount
df.nsmallest(5, 'amount')                               # Bottom 5 by amount
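The running-total, monthly-summary, and percentage patterns, worked on three toy transactions:

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2023-01-05', '2023-01-20', '2023-02-10']),
    'amount': [100.0, -40.0, 60.0],
})

df['running_total'] = df['amount'].cumsum()
monthly = df.groupby(df['date'].dt.to_period('M'))['amount'].sum()
df['pct_of_total'] = df['amount'] / df['amount'].sum() * 100
```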

Legacy System Cleanup

# Fix negative formatting (Mystery #002 style): flag credits BEFORE stripping the suffix
is_credit = df['amount'].str.endswith(('-', 'CR'))
df['amount'] = df['amount'].str.replace(r'(-|CR)$', '', regex=True)
df.loc[is_credit, 'amount'] = '-' + df.loc[is_credit, 'amount']

# Clean currency formatting
df['amount'] = df['amount'].str.replace('[$,]', '', regex=True)
df['amount'] = pd.to_numeric(df['amount'], errors='coerce')

# Extract numbers from text
df['extracted'] = df['description'].str.extract(r'(\d+\.?\d*)')
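Putting the cleanup steps together on a toy legacy export (the trailing '-'/'CR' credit markers are the hypothetical Mystery #002 format):

```python
import pandas as pd

# Raw legacy strings: trailing '-' or 'CR' marks a credit
raw = pd.Series(['$1,200.00', '350.00-', '75.00CR'])

is_credit = raw.str.endswith(('-', 'CR'))              # Flag before stripping
cleaned = (raw.str.replace(r'(-|CR)$', '', regex=True) # Drop credit suffix
              .str.replace('[$,]', '', regex=True))    # Drop currency formatting
amounts = pd.to_numeric(cleaned, errors='coerce')
amounts.loc[is_credit] *= -1                           # Apply the sign last
```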

🚨 Fraud Detection Patterns

# Benford's Law check (first digit analysis; abs() so '-' isn't counted as a digit)
first_digits = df['amount'].abs().astype(str).str[0]
first_digits.value_counts().sort_index()

# Round number bias
df[df['amount'] % 100 == 0]                             # Exactly divisible by 100
df[df['amount'] % 1000 == 0]                            # Suspiciously round thousands

# Weekend transactions (unusual timing)
df[df['date'].dt.dayofweek >= 5]                        # Saturday/Sunday transactions

# Statistical outliers
Q1 = df['amount'].quantile(0.25)
Q3 = df['amount'].quantile(0.75)
IQR = Q3 - Q1
df[(df['amount'] < Q1 - 1.5*IQR) | (df['amount'] > Q3 + 1.5*IQR)]
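The IQR outlier rule above, applied to a toy series with one planted anomaly:

```python
import pandas as pd

amounts = pd.Series([100, 102, 98, 101, 99, 5000])  # One obvious outlier

Q1 = amounts.quantile(0.25)
Q3 = amounts.quantile(0.75)
IQR = Q3 - Q1
outliers = amounts[(amounts < Q1 - 1.5 * IQR) | (amounts > Q3 + 1.5 * IQR)]
```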

πŸ“Š Quick Visualizations

# Detective's visual toolkit
df['amount'].hist(bins=20)                              # Distribution plot
df.groupby('account')['amount'].sum().plot(kind='bar')  # Account totals
df['date'].value_counts().sort_index().plot()           # Transaction frequency over time
df.boxplot(column='amount', by='department')            # Department comparison

⚑ One-Liner Power Moves

# Quick wins for busy detectives
df.groupby('vendor')['amount'].agg(['count', 'sum', 'mean'])    # Vendor analysis
df.assign(month=df['date'].dt.month).pivot_table('amount', 'account', 'month')  # Monthly pivot
df.query('amount > 1000 and department == "IT"')               # Multiple conditions
df.sample(100)                                                  # Random sample
df.drop_duplicates().reset_index(drop=True)                    # Clean duplicates & reset index

πŸ”§ Data Export for Reports

# Share your detective findings
df.to_csv('investigation_results.csv', index=False)
df.to_excel('audit_findings.xlsx', sheet_name='Summary', index=False)

# Multi-sheet Excel export
with pd.ExcelWriter('full_investigation.xlsx') as writer:
    summary_df.to_excel(writer, sheet_name='Summary', index=False)
    detail_df.to_excel(writer, sheet_name='Detail', index=False)

πŸ’‘ Pro Detective Tips

Performance Boosters:

  • Use df.query() instead of df[df['col'] == value] for complex conditions
  • Chain operations: df.groupby('x').sum().sort_values('y', ascending=False)
  • Use pd.cut() for binning amounts into ranges
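The pd.cut() tip in action, binning toy amounts into labeled ranges (bin edges and labels are illustrative):

```python
import pandas as pd

amounts = pd.Series([50, 500, 5000, 50000])
bins = [0, 100, 1000, 10000, float('inf')]
labels = ['small', 'medium', 'large', 'jumbo']

# Each amount falls into the (left, right] bin it belongs to
amount_band = pd.cut(amounts, bins=bins, labels=labels)
```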

Memory Savers:

  • Stream huge files in chunks: for chunk in pd.read_csv('big_file.csv', chunksize=1000): ...
  • Convert text to categories: df['category'] = df['category'].astype('category')
  • Use appropriate data types: pd.to_numeric(df['col'], downcast='integer')
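To see the category-dtype saver pay off, compare memory on a repetitive toy text column:

```python
import pandas as pd

# Repetitive text column: only two distinct values across 1,000 rows
dept = pd.Series(['IT', 'HR', 'IT', 'HR'] * 250)
dept_cat = dept.astype('category')

obj_bytes = dept.memory_usage(deep=True)      # Object dtype: one string per row
cat_bytes = dept_cat.memory_usage(deep=True)  # Category: small codes + 2 labels
```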

Debugging Helpers:

  • df.dtypes - Check data types
  • df.memory_usage() - Memory consumption
  • df.nunique() - Count unique values per column

πŸ“š Mystery-Specific References

  • Rounding Issues (Mystery #001): df.round(2) vs round(df.sum(), 2)
  • Legacy Formats (Mystery #002): str.replace() with regex patterns
  • Historical Matching (Mystery #003): pd.merge_asof() for time-based joins
  • Missing Data (Mystery #004): isna(), fillna(), dropna()
  • Text Extraction (Mystery #005): str.extract() with regex patterns
  • Nearest Matching (Mystery #006): merge_asof() with direction parameters

Keep this reference handy during your data investigations! Every great detective needs reliable tools. πŸ”


Need help with a specific case? Email us at hello@pandaudit.com with your data mystery!