Data & Society: CHARTING THE TITANIC

Date
2024
Software
R Studio, Excel, Illustrator

Prompt

You are tasked with exploring, cleaning, and creating a chart (that tells a story) with the Titanic Data Set. Import the data set into Excel or another spreadsheet program (the data is a CSV file) and look at the metadata to understand the data.
Metadata
Data covers passengers only, not crew.
Each row is a different passenger.
survival - Survival (0 = No; 1 = Yes)
class - Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
name - Name
sex - Sex
age - Age
sibsp - Number of Siblings/Spouses Aboard
parch - Number of Parents/Children Aboard
ticket - Ticket Number
fare - Passenger Fare (how much paid)
cabin - Cabin
embarked - Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
boat - Lifeboat (if survived)
body - Body number (if did not survive and body was recovered)

My Process

This was a quarter-long project in which students worked through multiple iterations before arriving at a final chart.
Attempt #1
I began exploring the Titanic data by reading through the metadata to understand what each column means. I then imported the CSV file into Microsoft Excel and, to check for inconsistencies, used the “Sort” feature on each column to find cells with missing information. Some columns, such as age and port of embarkation, had missing values, so I highlighted those rows in gray to flag them (I did not do this for columns such as cabin).
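The same missing-value check can be sketched in code. This is a minimal Python illustration, not the Excel workflow used in this project: the sample rows are made up (they are not real passenger records), and `count_missing` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample rows, NOT taken from the real data set.
# Column names follow the metadata list.
SAMPLE_CSV = """name,age,cabin,embarked
Passenger A,29,B5,S
Passenger B,,,C
Passenger C,40,,
Passenger D,2,C22,S
"""

def count_missing(csv_text):
    """Return {column: number of empty cells} for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {column: 0 for column in reader.fieldnames}
    for row in reader:
        for column in counts:
            value = row[column]
            # Treat absent fields and blank strings as missing.
            if value is None or value.strip() == "":
                counts[column] += 1
    return counts

missing = count_missing(SAMPLE_CSV)
for column, n in missing.items():
    print(f"{column}: {n} missing")
```

Counting empty cells per column gives the same information as sorting each column in Excel, without having to scroll through the sheet.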
excel sheet screenshot
To clean the data further, I looked for columns that could be turned into logical values (in case I imported the data into R). I made a “survivedL” column that recodes “survive” as either “yes” or “no”. I also created two columns, “ageClassC” for children and “ageClassA” for adults, to turn age into logical values too, and a column labeled “embarkedA” that abbreviates “Cherbourg” to “C”, “Queenstown” to “Q”, and “Southampton” to “S”. I then brought the data set into R Studio: I set a working directory, read the file in, and used the head(), tail(), and print() commands to make sure the data had been read in correctly. I loaded the tidyverse package in order to make a chart to visualize the data. To label age groups on the x-axis, I created an age group variable, converted it to a factor, and applied labels. Then I used the ggplot() function to create the graph.
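The derived columns can be illustrated with a short sketch. It is written in Python rather than R or Excel, so it is not the code used for this project. The child/adult cutoff of 18, the age bins, and the `ageGroup` column name are all assumptions, since the write-up does not state them, and the sample rows are hypothetical.

```python
# Assumption: the write-up does not say where "child" ends, so 18 is a guess.
ADULT_AGE = 18

# Assumption: illustrative bins as (lower bound inclusive, upper bound exclusive, label).
AGE_GROUPS = [
    (0, 13, "0-12"),
    (13, 18, "13-17"),
    (18, 40, "18-39"),
    (40, 60, "40-59"),
    (60, 200, "60+"),
]

def age_group(age):
    """Return the label of the bin containing age, or None if no bin matches."""
    for low, high, label in AGE_GROUPS:
        if low <= age < high:
            return label
    return None

def derive(row):
    """Return a copy of one passenger row with the derived columns added."""
    out = dict(row)
    out["survivedL"] = "yes" if row["survival"] == "1" else "no"
    age_text = row["age"].strip()
    age = float(age_text) if age_text else None
    # Missing ages stay missing (None) instead of being forced into a class.
    out["ageClassC"] = None if age is None else age < ADULT_AGE
    out["ageClassA"] = None if age is None else age >= ADULT_AGE
    out["ageGroup"] = None if age is None else age_group(age)
    return out

# Hypothetical rows for illustration only; not real passenger records.
rows = [
    {"survival": "1", "age": "8"},
    {"survival": "0", "age": "35"},
    {"survival": "0", "age": ""},
]
cleaned = [derive(row) for row in rows]
for row in cleaned:
    print(row)
```

Leaving missing ages as `None` keeps the gray-highlighted rows distinguishable from real children or adults when the chart is built.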
Rstudio screenshot
graph made in Rstudio
Attempt #2
To start my second attempt at analyzing the Titanic data set and making a chart, I came up with a question to guide me toward a better chart, this time using Excel (my professor wanted everyone to use Excel for this instead of R). The question was: Did the place you embarked from help you survive more if you were first class or not? It uses the “survival”, “embarked”, and “pclass” variables in the data set. It raises the possibility that first class passengers had a higher probability of surviving than those in second or third class because they (most likely) had higher priority when being evacuated from the ship.
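One way to put a number on this question is the share of survivors within each class and port combination. The following is a Python sketch under stated assumptions: the rows are invented for illustration (so the printed rates say nothing about the real Titanic data), and `survival_rates` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical rows for illustration only; NOT real Titanic records.
sample_rows = [
    {"pclass": "1", "embarked": "C", "survival": "1"},
    {"pclass": "1", "embarked": "C", "survival": "0"},
    {"pclass": "1", "embarked": "S", "survival": "1"},
    {"pclass": "3", "embarked": "S", "survival": "0"},
    {"pclass": "3", "embarked": "S", "survival": "0"},
    {"pclass": "3", "embarked": "Q", "survival": "1"},
    {"pclass": "2", "embarked": "", "survival": "1"},  # missing port, skipped
]

def survival_rates(rows):
    """Return {(pclass, embarked): fraction of passengers who survived}."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        if row["embarked"] == "":
            continue  # a row without a port cannot be placed in any group
        key = (row["pclass"], row["embarked"])
        totals[key] += 1
        if row["survival"] == "1":
            survivors[key] += 1
    return {key: survivors[key] / totals[key] for key in totals}

rates = survival_rates(sample_rows)
for (pclass, port), rate in sorted(rates.items()):
    print(f"class {pclass}, port {port}: {rate:.0%} survived")
```

Rates make groups of different sizes comparable, while raw counts show how many passengers each bar represents.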
sketch of graph
excel screenshot
excel graph of 1st class
excel graph of 2nd and 3rd class
After coming up with a question, I developed a sketch of a possible new chart. For the sketch, I decided to stick with a bar graph because it makes the most sense for showing how many passengers survived and for comparing passengers across classes. I wanted to show the different places of embarkation on the x-axis, separated by class, and the number of passengers on the y-axis. I created a title that gave a clear answer to the question, but I was not completely sure it told a story. That is one thing I kept struggling with for this entire assignment (until attempt #3). I gave the graph a key/legend that helps the viewer tell apart the bars for passengers who died and passengers who survived. The title of the graph gives context on what the numbers on the y-axis are, so I left the y-axis label off the graph itself. I used the PivotTable feature in Excel to organize the data for these charts. Below is the final chart with 1st, 2nd, and 3rd Class combined into one chart.
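The grouping that the PivotTable performs can be approximated in a few lines of code. This Python sketch is only an analogue of the Excel step, not a reproduction of it: the rows are hypothetical, and `pivot_counts` is an illustrative name.

```python
from collections import Counter

# Hypothetical rows for illustration only; NOT real Titanic records.
passengers = [
    {"pclass": "1", "embarked": "C", "survival": "1"},
    {"pclass": "1", "embarked": "C", "survival": "0"},
    {"pclass": "1", "embarked": "S", "survival": "1"},
    {"pclass": "3", "embarked": "S", "survival": "0"},
    {"pclass": "3", "embarked": "S", "survival": "0"},
    {"pclass": "3", "embarked": "Q", "survival": "1"},
]

def pivot_counts(rows):
    """Count passengers per (class, port, outcome), like a pivot table."""
    counts = Counter()
    for row in rows:
        outcome = "survived" if row["survival"] == "1" else "died"
        counts[(row["pclass"], row["embarked"], outcome)] += 1
    return counts

counts = pivot_counts(passengers)

# One output row per class/port pair, one column per outcome.
groups = sorted({(pclass, port) for pclass, port, _ in counts})
print(f"{'class':<6}{'port':<6}{'died':>6}{'survived':>10}")
for pclass, port in groups:
    died = counts[(pclass, port, "died")]
    survived = counts[(pclass, port, "survived")]
    print(f"{pclass:<6}{port:<6}{died:>6}{survived:>10}")
```

Each printed row corresponds to one pair of bars in the chart: the died and survived counts for a given class and port.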
excel graph 1
Attempt #3
My third attempt, a revised Titanic chart, is based on feedback I got from my peer reviewers and teaching assistant (TA). They recommended changing the title so it conveys a story, adding absolute data values, separating the classes, increasing the type size of the data, and adding labels to the y-axis. Here is the chart developed from their feedback.
revised final excel graph (white version)
Final Attempt
For my final chart, I made the title much bigger than I originally had it after receiving feedback from my TA. I also made the title caption smaller and nestled it under the title. I played around with the background color of my chart to increase contrast and settled on a final version with a black background. I also changed the gridlines that line up with the numbers on the y-axis from solid to dotted and adjusted their transparency to make them less noticeable while keeping them visible. And voilà! My final chart is done.
final excel graph (black version)