#041 - Python Essentials | 04 - Data Types: Strings
A Guide to Understanding and Utilizing Python Strings for Various Engineering Tasks. Time for String Theory!
"String theory is the most promising candidate for a complete theory of the universe."
- Stephen Hawking
Couldn’t agree more Stephen, today’s discussion is on ‘Strings’, one of the key data types in Python. If you’re looking for the quantum variety, you’ve been hoodwinked and you’re in the wrong place.
This article covers 95% of how I use strings. Once you read through it, I recommend you experiment with some of the code examples mentioned herein, which you can do in this Colab notebook or you can fork this GitHub Repo.
If you are new to this, it’s a lot to take in but the first step is to understand the scope of possibilities, then you can choose what’s relevant to you, and progress from there.
Up front, I can tell you that understanding strings has massive benefits in managing and using data effectively.
Introduction
Strings are easy to understand from a fundamental perspective, but they are so flexible that many of their convenient properties and use cases are often overlooked. We will briefly review some of the basics and then explore a couple of interesting uses for strings, from an engineering perspective.
Strings are one of the elemental data types in Python programming, used to represent and manipulate text-based information. That makes them useful for many purposes, from creating and documenting calculations and generating reports to interfacing with other software tools.
For context, Python has other fundamental data types, which we will cover in a future article…
Integers: Whole numbers. Example:
42
Floats: Decimal numbers. Example:
3.14
Booleans: True or False. Example:
True
Lists: Mutable sequence. Example:
[1, 2, 3]
Tuples: Immutable sequence. Example:
(1, 2, 3)
Dictionaries: Key-value pairs. Example:
{'name': 'John'}
Sets: Unique items. Example:
{1, 2, 3}
NoneType: Represents no value. Example:
None
Understanding Python Strings
Definition and Basics
In Python, a string is a sequence of characters enclosed in either single quotes (' '), double quotes (" "), or triple quotes (''' or """). Strings are used to represent and store text-based information. They can contain letters, numbers, symbols, and whitespace characters.
Here are a few examples of strings (sometimes referred to as ‘string literals’) in Python:
string1 = 'Hello, World!'
string2 = "We are learning Python for Engineers"
string3 = '''This is a
multiline string'''
Creating Strings and Handling Quotes
Python provides multiple ways to define strings. You can use single quotes, double quotes, or triple quotes depending on your preference and the content of the string.
Single and Double Quotes: Used for single-line strings. I almost always use single quotes.
Triple Quotes: Used for multiline strings or strings that contain both single and double quotes.
When dealing with strings that contain single quotes (') or double quotes ("), you can use different types of quotes to enclose the string or use escape characters. If your string contains single quotes, you can use double quotes to enclose the string, and vice versa.
single_quote_inside = "This string contains a single quote: '"
double_quote_inside = 'This string contains a double quote: "'
You can also use escape characters. I try not to do this and prefer the method above, but sometimes you need options!
escaped_single_quote = 'It\'s a beautiful day!'
escaped_double_quote = "He said, \"Hello!\""
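As a small sketch of the triple-quote option mentioned above, the snippet below holds both quote types in one string without any escaping and compares it with the escaped spelling. The variable names and the sample sentence are made up for illustration.

```python
# Triple quotes let a string hold both ' and " without backslashes.
note = '''The inspector's note said, "Replace the 6" pipe."'''

# The same text written with an escape character, for comparison.
escaped = 'The inspector\'s note said, "Replace the 6" pipe."'

print(note)
print(note == escaped)  # True: both spellings produce the same string
```

Both forms produce an identical string, so the choice comes down to which one is easier to read in your code.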
String Operations
Python supports various operations on strings, allowing you to manipulate and combine them.
Concatenation (+): You can concatenate (join) two or more strings using the + operator. Handy for renaming or modifying database columns.
string1 = "Hello"
string2 = "World"
result = string1 + " " + string2 # Note the space as a string too
print(result) # Output: Hello World
Repetition (*): You can repeat a string multiple times using the * operator. I have never used this.
string = 'Hello'
result = string * 3
print(result) # Output: HelloHelloHello
String Length
To determine the length of a string (i.e., the number of characters it contains), you can use the len() function. This comes in handy in more ways than you think!
Determine string length for validation, like a beam section size or a report ID.
Calculate padding for text alignment in reports.
Measure input size for character constraints.
string = "Python"
length = len(string)
print(length) # Output: 6
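To make the validation and padding uses listed above more concrete, here is a minimal sketch. The helper names, the 8-character ID rule, and the 12-character label width are all assumptions chosen for illustration, not rules from any standard.

```python
def is_valid_report_id(report_id, expected_length=8):
    """Return True when the ID has exactly the expected number of characters."""
    return len(report_id) == expected_length


def pad_label(label, width=12):
    """Right-pad a label with spaces so that values line up in a column."""
    return label + " " * (width - len(label))


print(is_valid_report_id("RPT-0042"))  # True: 8 characters
print(is_valid_report_id("RPT-42"))    # False: 6 characters
print(pad_label("Span:") + "12.5 m")
print(pad_label("Material:") + "Steel")
```

Note that the padding helper also uses the repetition operator: a space multiplied by the number of missing characters.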
Accessing Characters
Python allows you to access individual characters within a string using indexing and slicing.
Indexing: Retrieve specific characters from a database, analysis results or product lists for validation.
string = "Python"
first_char = string[0]
print(first_char) # Output: P
Slicing: Extract substrings from FEM elements or outputs for categorization.
string = "Python"
substring = string[0:3]
print(substring) # Output: Pyt
Negative Indexing: Access the last character of serial numbers or specifications to determine version or location.
shape = "W14X426" # From AISC Shapes Database
# Get the last three characters (this is the full weight only when the weight has exactly three digits, as it does here)
weight = shape[-3:]
print(f"Weight per foot: {weight} lb/ft") # Output: Weight per foot: 426 lb/ft
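A fixed slice like [-3:] only works when the weight has exactly three digits. As a hedged alternative, the sketch below splits the designation at the "X" instead. The function name parse_w_shape is hypothetical, and "W99X1234" is a made-up designation used only to exercise a four-digit weight; it is not taken from the AISC database.

```python
def parse_w_shape(designation):
    """Split a designation like 'W14X426' into (nominal depth, weight per foot)."""
    # rpartition splits at the last "X": 'W14X426' -> ('W14', 'X', '426')
    prefix, _, weight = designation.upper().rpartition("X")
    depth = prefix.lstrip("W")
    return int(depth), int(weight)


# "W99X1234" is a made-up designation, used only to test a four-digit weight.
for shape in ["W14X426", "W14X22", "W99X1234"]:
    depth, weight = parse_w_shape(shape)
    print(f"{shape}: nominal depth {depth} in, weight {weight} lb/ft")

# For comparison, the fixed slice picks up the "X" on a two-digit weight.
print("W14X22"[-3:])  # X22
```

Splitting on a delimiter is usually more robust than counting characters when the parts of an identifier can vary in length.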
String Methods and Manipulations
Python provides a set of built-in methods for manipulating and working with strings. These methods allow you to perform various operations such as changing case, removing whitespace, searching for substrings, and more.
Common String Methods
Changing Case: .upper(), .lower(), .title()
string = "Hello, World!"
print(string.upper()) # Output: HELLO, WORLD!
print(string.lower()) # Output: hello, world!
print(string.title()) # Output: Hello, World!
Removing Whitespace: .strip(), .lstrip(), .rstrip()
string = " Hello, World! "
print(f"Original string: '{string}'")
print(f"After strip(): '{string.strip()}'")
print(f"After lstrip(): '{string.lstrip()}'")
print(f"After rstrip(): '{string.rstrip()}'")
Original string: ' Hello, World! '
After strip(): 'Hello, World!'
After lstrip(): 'Hello, World! '
After rstrip(): ' Hello, World!'
Searching and Replacing: .find(), .replace()
string = "Hello, World!"
index = string.find("World")
replaced = string.replace("World", "Python")
print(index) # Output: 7
print(replaced) # Output: Hello, Python!
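Two behaviours of these methods are worth seeing once: find() returns -1 when the substring is missing, and replace() returns a new string rather than changing the original. A minimal sketch, with variable names chosen for illustration:

```python
text = "Hello, World!"

print(text.find("Python"))  # -1: find() signals "not found" with -1
print("World" in text)      # True: the in operator is a plain membership test

# Guard against -1 before using the index in a slice.
start = text.find("World")
if start != -1:
    print(text[start:])     # World!

# Strings are immutable, so replace() leaves the original untouched.
replaced = text.replace("World", "Python")
print(text)      # Hello, World!
print(replaced)  # Hello, Python!
```

If you only need to know whether a substring is present, the in operator reads more clearly than comparing find() against -1.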
String Formatting
Python provides various ways to format strings, allowing you to insert values and create dynamic string representations. Let's explore a few string formatting techniques.
This is a very important concept for engineering.
F-Strings (Formatted String Literals): Introduced in Python 3.6, f-strings provide a concise and readable way to embed expressions inside strings. I highly recommend them over older methods like the % operator or str.format(). Don’t bother using these older methods if you can avoid it.
Developing an intuitive understanding of f-strings will be a huge help to your workflow. This will help you to ‘think’ in Python.
Basic Usage:
name = "James"
age = 38
formatted_string = f"My name is {name} and I am {age} years old."
print(formatted_string) # Output: My name is James and I am 38 years old.
Expressions inside f-strings:
length = 10
width = 5
area = length * width
formatted_string = f"The area of the rectangle is {length} * {width} = {area} sq units."
print(formatted_string) # Output: The area of the rectangle is 10 * 5 = 50 sq units.
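The braces in an f-string can also hold the calculation itself, not just a variable that was computed earlier. A short sketch of that, reusing the rectangle values from the example above (the perimeter and the label are added purely for illustration):

```python
length = 10
width = 5

# The arithmetic is evaluated inside the braces when the string is built.
summary = f"Area: {length * width} sq units, perimeter: {2 * (length + width)} units"
print(summary)  # Area: 50 sq units, perimeter: 30 units

# Method calls work inside the braces too.
label = "rectangle"
print(f"Shape: {label.upper()}")  # Shape: RECTANGLE
```

For anything longer than a simple expression, computing the value first and naming it usually keeps the string easier to read.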
Common f-String Methods
Decimal Precision and Scientific Notation: Control decimal places in calculations and format using scientific notation.
stress = 12345.6789
print(f"Stress: {stress:.2f} MPa") # Output: Stress: 12345.68 MPa
print(f"Stress: {stress:.2e} MPa") # Output: Stress: 1.23e+04 MPa
Inline Debugging: Quickly debug variables during development.
This is an important and useful distinction. The f-string debugging syntax (f"{variable=}", available from Python 3.8) displays both the variable name and its value in a single, concise line. That makes it easier to track multiple variables while developing and troubleshooting engineering calculations, which is an inevitability in engineering.
beam_length = 5.5 # meters
print(f"{beam_length=}") # Output: beam_length=5.5
Thousands Separator with Underscores: Format large numbers with underscores for readability.
load = 1000000
print(f"Load: {load:_} N") # Output: Load: 1_000_000 N
Date Formatting: Format dates in reports or logs.
from datetime import datetime
now = datetime.now()
print(f"Report Date: {now:%c}") # Output: Report Date: Mon Aug 26 21:29:01 2024
Center Alignment: Center-align text within a fixed width.
material = "Steel"
print(f"Material: {material:^20}") # Output: "Material:        Steel        " (Steel centered in a 20-character field)
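Centering is one of three alignment options in the format specifier. The sketch below combines left and right alignment to build a small table. The material names and modulus values are illustrative placeholders, not design data.

```python
# Illustrative placeholder values only, not design data.
rows = [("Steel", 200.0), ("Concrete", 30.0), ("Timber", 11.0)]

# < left-aligns, > right-aligns, ^ centers; the number is the field width.
header = f"{'Material':<10}|{'E (GPa)':>10}"
print(header)
print("-" * len(header))
for material, modulus in rows:
    print(f"{material:<10}|{modulus:>10.1f}")
```

Left-aligning text and right-aligning numbers is a common convention because it keeps decimal points stacked in a column.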
Practical Use Cases in Engineering
Python strings find numerous applications in various aspects of civil and structural engineering projects. Let's explore a few practical use cases where strings can be leveraged to streamline workflows and automate tasks.
Data Cleaning
In engineering projects, we often encounter datasets that require cleaning before they can be effectively analyzed. Let's examine a real-world example using borehole data from a geotechnical investigation. You can find the sample code for this data cleaning process here.
Identifying Data Issues
Before we start cleaning, it's crucial to identify the problems in our dataset. Here are the main issues we've found in our borehole data. These are common examples of data cleaning problems.
Inconsistent column naming
Inconsistent soil type formatting
Dates stored as strings instead of datetime objects
Inconsistent representation of non-plastic soil properties
Potential issues with numeric data storage
Lack of derived data (e.g., plasticity index)
Cleaning Process
To address these issues, we'll use various string manipulation techniques and pandas functions. Here's a step-by-step approach:
Import Libraries:
import pandas as pd: Imports the Pandas library.
Define the clean_borehole_data Function:
Reads an Excel file into a DataFrame.
Cleans column names by stripping whitespace, converting to lowercase, and replacing spaces/hyphens with underscores.
Cleans the 'soil_type' column by stripping whitespace and converting to title case.
Converts the 'date_sampled' column to a datetime format.
Replaces 'NP' with 'Non-Plastic' in the entire DataFrame.
Use the Function:
Calls clean_borehole_data with the file path, cleans the data, and stores the result in cleaned_df.
Prints the first few rows of the cleaned DataFrame.
Saves the cleaned DataFrame to a new Excel file.
import pandas as pd

def clean_borehole_data(file_path):
    # Read the Excel file
    df = pd.read_excel(file_path)
    print("Original column names:")
    print(df.columns.tolist())
    # Clean up column names using string methods
    df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_').str.replace('-', '_')
    print("\nCleaned column names:")
    print(df.columns.tolist())
    # Clean up soil type using string methods
    df['soil_type'] = df['soil_type'].str.strip().str.title()
    # Convert date to datetime
    df['date_sampled'] = pd.to_datetime(df['date_sampled'], format='%d/%m/%Y')
    # Replace 'NP' with 'Non-Plastic' for clarity
    df = df.replace('NP', 'Non-Plastic')
    return df

# Use the function
cleaned_df = clean_borehole_data('../data/borehole_data.xlsx')
print("\nCleaned DataFrame (first few rows):")
print(cleaned_df.head().to_string(index=False))
# Optionally, save the cleaned data to a new Excel file
cleaned_df.to_excel('../data/cleaned_borehole_data.xlsx', index=False)
Benefits of Data Cleaning
By applying these string manipulation and data cleaning techniques, we achieve several benefits:
Improved Consistency: Our data now follows a uniform format, making it easier to work with and understand.
Enhanced Readability: Cleaned column names and standardized soil descriptions improve the dataset's clarity.
Better Data Types: Dates are now stored as datetime objects rather than strings, facilitating further analysis.
Added Value: With consistent values in place, derived quantities such as the plasticity index can be calculated in code rather than by hand (the function shown above does not include that step).
This cleaned dataset is now more suitable for analysis, visualization, and integration with other engineering tools and workflows.
Remember, the specific cleaning steps may vary depending on your dataset, but the principles of using string methods and pandas functions to standardize and enhance your data remain the same. Always inspect your data thoroughly before and after cleaning to ensure the process has improved your dataset without introducing new issues.
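The same string methods work without pandas, which makes them easy to try in isolation. The sketch below cleans a list of column names and derives a plasticity index from text values. The column names, helper names, and sample values are assumptions made up for illustration; the only formula used is the standard definition of plasticity index as liquid limit minus plastic limit.

```python
# Hypothetical raw column names, similar in spirit to a messy spreadsheet header.
raw_columns = [" Borehole ID", "Soil Type ", "Date-Sampled", "Liquid Limit"]


def clean_name(name):
    """Strip, lowercase, and replace spaces or hyphens with underscores."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


print([clean_name(c) for c in raw_columns])
# ['borehole_id', 'soil_type', 'date_sampled', 'liquid_limit']


def plasticity_index(liquid_limit, plastic_limit):
    """Return LL - PL as a float, or None when either value is marked 'NP'."""
    values = [str(v).strip().upper() for v in (liquid_limit, plastic_limit)]
    if "NP" in values:
        return None
    return float(values[0]) - float(values[1])


print(plasticity_index("45", " 22 "))  # 23.0
print(plasticity_index("NP", "NP"))    # None
```

Returning None for non-plastic samples keeps the text marker from leaking into a numeric calculation; how you represent that case in your own dataset is a design choice.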
Documenting Calculations
This is about how you construct your automations and your outputs to provide clear and readable results. Not to be confused with a Jupyter notebook where we can lean on markdown cells to document our thoughts and work.
With strings, we are literally designing the formats of our programming outputs to generate text outputs for reports and documentation.
Example: Automating beam calculation reports
def generate_beam_report(beam_name, length, width, height, material):
    # Format spec 6.1f: 6 characters wide, 1 decimal place, float.
    # This ensures alignment and consistent decimal places.
    # These notes sit outside the f-string; inside it they would be printed.
    return f"""Beam Calculation Report
Beam Name: {beam_name}
Dimensions:
  Length: {length:6.1f} m
  Width:  {width:6.1f} m
  Height: {height:6.1f} m
Material: {material}
"""

# Usage
beam_report = generate_beam_report("B1", 5.0, 0.3, 0.4, "Concrete")
print(beam_report)
Output:
Beam Calculation Report
Beam Name: B1
Dimensions:
  Length:    5.0 m
  Width:     0.3 m
  Height:    0.4 m
Material: Concrete
Note the f-string format specifier :6.1f, which ensures that all numerical values are aligned and consistently displayed. This is particularly useful when generating reports with multiple entries or when you want a neat, tabular appearance for your data.
This might seem like a lot of code for a short section of text, but once written, these functions can be reused indefinitely. You can also create classes to apply this same type of formatting output to columns, slabs, walls, or whatever you need.
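To illustrate the reuse point, here is a minimal sketch that applies the same width-and-precision formatting to several members at once. The function name format_member_table, the 8-character column widths, and the member data are assumptions for illustration.

```python
def format_member_table(members):
    """Return an aligned text table from (name, length, width, height) tuples."""
    # Header labels use the same 8-character fields as the values below.
    lines = [f"{'Member':<8}{'L (m)':>8}{'W (m)':>8}{'H (m)':>8}"]
    for name, length, width, height in members:
        lines.append(f"{name:<8}{length:8.1f}{width:8.1f}{height:8.1f}")
    return "\n".join(lines)


# Hypothetical member data.
members = [("B1", 5.0, 0.3, 0.4), ("B2", 12.5, 0.5, 0.6)]
print(format_member_table(members))
```

Because every field has a fixed width, each row comes out the same length no matter how many members you pass in.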
Multiline Strings and Documentation
Python allows you to create multiline strings using triple quotes (''' or """). These strings can span multiple lines and are commonly used for documentation purposes, such as docstrings. It helps you, or anyone else reading your code in the future, understand the purpose and functionality of the code without having to decipher it from scratch. Trust me, you won't remember your thought process later, so document it now!
"The first principle is that you must not fool yourself, and you are the easiest person to fool.” - Richard P. Feynman
Example: Documenting Python scripts for engineering projects
def calculate_beam_deflection(length, load, modulus, moment_of_inertia):
    """
    Calculate the maximum deflection of a simply supported beam under a uniformly distributed load.

    Parameters:
    - length (float): The length of the beam in meters.
    - load (float): The uniformly distributed load on the beam in N/m.
    - modulus (float): The modulus of elasticity of the beam material in Pa.
    - moment_of_inertia (float): The moment of inertia of the beam cross-section in m^4.

    Returns:
    - deflection (float): The maximum deflection of the beam in meters.
    """
    deflection = (5 * load * length**4) / (384 * modulus * moment_of_inertia)
    return deflection

# Usage
beam_length = 5.0
beam_load = 10000
beam_modulus = 2e11
beam_moi = 1e-4
max_deflection = calculate_beam_deflection(beam_length, beam_load, beam_modulus, beam_moi)
print(f"Maximum deflection: {max_deflection:.5f} m")
Maximum deflection: 0.00407 m
In this example, the calculate_beam_deflection function is documented using a multiline string (docstring) that provides information about the function's purpose, parameters, and return value. This documentation helps other people understand and use the function correctly. The docstring has no effect on the calculation itself; Python stores it in the function's __doc__ attribute, where tools such as help() can display it. It’s purely for clarity and guidance.
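A docstring is more than a comment because Python keeps it attached to the function. The sketch below shows how to read it back. The function beam_self_weight is a hypothetical helper written only for this illustration.

```python
def beam_self_weight(area, density, g=9.81):
    """Return the self-weight per unit length of a beam in N/m.

    area is in m^2, density in kg/m^3, and g in m/s^2.
    """
    return area * density * g


# The docstring does not affect the result, but it stays attached to the function.
print(beam_self_weight.__doc__)

# The calculation itself runs as normal.
print(f"{beam_self_weight(0.12, 2400):.1f} N/m")
```

Calling help(beam_self_weight) in an interactive session displays the same text together with the function signature, which is how most editors and notebooks surface your documentation.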
Combining Strings with Data Analysis Libraries
Python strings can be effectively combined with data analysis libraries like Pandas to clean, format, and manipulate engineering datasets.
Example: Cleaning and formatting data in engineering datasets
import pandas as pd
import numpy as np

def clean_dataset(df):
    # Remove leading/trailing whitespace from column names
    df.columns = df.columns.str.strip()
    # Convert all numeric columns to appropriate type
    numeric_columns = ['Span', 'Width', 'Height']
    df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric, errors='coerce')
    # Format 'Material' column to capitalize all words
    df['Material'] = df['Material'].str.title()
    # Strip whitespace from string columns
    string_columns = ['Bridge Name', 'Material']
    df[string_columns] = df[string_columns].apply(lambda x: x.str.strip())
    # Remove rows with missing values
    df.dropna(inplace=True)
    return df

# Create synthetic data
data = {
    'Bridge Name': ['Golden Gate ', ' Brooklyn', ' London', ' Sydney Harbor', 'Forth '],
    'Span': [1280, 1595, 283, 503, 521],
    'Width': [27, 26, 32, 49, 37],
    'Height': ['227', '84', '13', '134', '110'],
    'Material': ['Steel', 'steel', 'concrete', 'STEEL', 'steel']
}
# Convert to DataFrame
df = pd.DataFrame(data)
# Clean the dataset
cleaned_data = clean_dataset(df)
# Print the cleaned data
print(cleaned_data)
Output:
     Bridge Name  Span  Width  Height  Material
0    Golden Gate  1280     27     227     Steel
1       Brooklyn  1595     26      84     Steel
2         London   283     32      13  Concrete
3  Sydney Harbor   503     49     134     Steel
4          Forth   521     37     110     Steel
In this example, the clean_dataset function uses Pandas along with string methods to clean and format an engineering dataset. The function removes whitespace from column names, converts the 'Height' column to numeric type, capitalizes the 'Material' column, and removes rows with missing values. The resulting cleaned dataset is more consistent and easier to work with.
When I think of the countless hours I spent manually cleaning and wrestling with Excel spreadsheets earlier in my career, it fills me with sadness and regret. If you're a young engineer, avoid this pitfall. Use Python instead.
I’ve been playing with Polars as a data management library, some very interesting capabilities when compared to Pandas, although no native seaborn support yet! Will write about this soon.
Case Study: Automating Report Generation
Let’s demonstrate how to generate a structured PDF report for a structural analysis project using the fpdf library.
There are many other libraries that do similar things (e.g. ReportLab) but this one is simple.
I should preface this by mentioning that my most common approach is to write my design documents in a Jupyter notebook using markdown. I really like writing in markdown, it’s intuitive, consistent and simple.
Rarely do I need to generate entire reports using Python. However, for larger finite element models or robust CFD work, automating report generation can be helpful.
In this very simple example, we'll create a Python script that generates a structural report as a PDF. This script demonstrates how to automate report generation for engineering applications, combining text and visual data. Here's what the script does:
Creates a custom PDF layout with headers and footers
Compiles structural data into a report
Project Name, Number of floors, floor height, beam spacing, column spacing
Generates a bar chart to visualize building properties
Embeds the chart into the PDF
Outputs a complete report
from fpdf import FPDF
import matplotlib.pyplot as plt

class PDF(FPDF):
    def header(self):
        self.set_font('Arial', 'B', 12)
        self.cell(0, 10, 'Flocode | Example Structural Analysis Report', 0, 1, 'C')

    def footer(self):
        self.set_y(-15)
        self.set_font('Arial', 'I', 8)
        self.cell(0, 10, f'Page {self.page_no()} - Flocode Sample Report', 0, 0, 'C')

def create_structural_report(project_name, num_floors, floor_height, beam_spacing, column_spacing):
    pdf = PDF()
    pdf.add_page()
    # Project Details
    pdf.set_font('Arial', 'B', 14)
    pdf.cell(0, 10, 'Project Details', 0, 1)
    pdf.set_font('Arial', '', 12)
    pdf.cell(0, 10, f'Project Name: {project_name}', 0, 1)
    # Building Specifications
    pdf.set_font('Arial', 'B', 14)
    pdf.cell(0, 10, 'Building Specifications', 0, 1)
    pdf.set_font('Arial', '', 12)
    specs = [
        f'Number of Floors: {num_floors}',
        f'Floor Height: {floor_height} m',
        f'Beam Spacing: {beam_spacing} m',
        f'Column Spacing: {column_spacing} m',
        f'Total Height: {num_floors * floor_height} m'
    ]
    for spec in specs:
        pdf.cell(0, 10, spec, 0, 1)
    # Create a simple plot
    plt.figure(figsize=(8, 4))
    plt.bar(['Floors', 'Floor Height', 'Beam Spacing', 'Column Spacing'],
            [num_floors, floor_height, beam_spacing, column_spacing])
    plt.title('Building Specifications')
    plt.savefig('building_specs.png')
    plt.close()
    # Add plot to PDF
    pdf.image('building_specs.png', x=10, w=190)
    pdf.output('structural_report.pdf')

# Usage
create_structural_report("Flocode Example Tower", 10, 3.5, 5, 6)
Automating report generation using Python can save significant time and effort. By leveraging libraries like fpdf for PDF generation, you can produce consistent and professional reports relatively easily.
This example is bare bones, but you can add your own bells and whistles, like branding and formatting, as needed.
The best way to learn is by doing. Use this code as a starting point, experiment with it, and adapt it to your specific needs.
Closing
We did it.
We've covered the basics of string creation, manipulation, and formatting, along with practical examples tailored to engineering scenarios.
If you want to play with the code examples mentioned herein, check out the GitHub Repository or Google Colab and go nuts.
In summary, Python strings are an essential data type that enables you to:
Represent and manipulate text-based information.
Format and align outputs for clear and readable results.
Automate repetitive tasks, such as generating reports and processing data.
While this article focused on the basics and practical applications of strings, there is much more to explore. Advanced string usage includes regular expressions for pattern matching, more sophisticated text processing techniques, and integration with other data types for complex data structures.
In the next installment of the "Essentials Series," we will examine lists, dictionaries, and tuples. These data structures provide powerful ways to organize and manage data in your Python projects. We'll explore their unique properties, common methods, and practical applications in engineering contexts.
Thanks for your time, I know this one was a lot to digest but the juice is worth the squeeze.
Flocode is going well, we have engineers in 122 countries around the world, learning and progressing together. More updates coming soon.
See you in the next one!
James 🌊