Data Extraction in JSON Format through the Star Wars API
The Star Wars API (SWAPI) bills itself as the world's first quantitative Star Wars data set that can be used for programming. Its developers have aggregated data on several types of entities from the Star Wars films. There is also a Python package, swapi-python, built by SWAPI's author, Paul Hallett.
The API exposes six endpoints, one for each type of entity:
- Films: http://swapi.co/api/films/1
- People: http://swapi.co/api/people/1
- Starships: http://swapi.co/api/starships/1
- Vehicles: http://swapi.co/api/vehicles/1
- Species: http://swapi.co/api/species/1
- Planets: http://swapi.co/api/planets/1
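As a small illustration of how these endpoints are laid out, the sketch below builds the URL for a given entity type and ID. The helper name resource_url is hypothetical, and the default base URL is an assumption taken from the requests call used later in this post; neither is part of SWAPI or swapi-python.

```python
# Hypothetical helper: build the URL of a single SWAPI resource.
RESOURCES = ("films", "people", "starships", "vehicles", "species", "planets")


def resource_url(resource, resource_id, base="https://swapi.dev/api"):
    # The base URL is an assumption; pass a different one if needed.
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource type: {resource!r}")
    return f"{base.rstrip('/')}/{resource}/{int(resource_id)}/"


print(resource_url("people", 1))
```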
You can find my GitHub repository for this blog here.
Get information for all characters
It is quite convenient to get data about characters through SWAPI. It takes just one line of code.
import requests
characters = requests.get('https://swapi.dev/api/people/').json()
This returns only the first page of characters, wrapped in a JSON object with a results list and a next link, which is awkward to work with directly. So I wrote a small loop that follows the next links, collects all characters, and converts them into a pandas DataFrame.
import pandas as pd

people = []
# Collect the first page, which was fetched above.
print(len(characters['results']), 'people are found!')
people.extend(characters['results'])
# Follow the 'next' link until the last page, where it is None.
while characters['next'] is not None:
    characters = requests.get(characters['next']).json()
    print(len(characters['results']), 'people are found!')
    people.extend(characters['results'])
len(people) ## 82
# Flatten the list of dicts into a DataFrame.
people = pd.json_normalize(people)
SWAPI lists 82 characters from the Star Wars universe.
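The same pagination pattern can be wrapped in a reusable function. The following is a minimal sketch, not the code used for this post: the name fetch_all and the injected get_json callable are assumptions made so the logic can be tried offline, and the two pages in the demonstration are made-up stand-ins for real responses.

```python
def fetch_all(url, get_json):
    """Collect every item from a paginated endpoint.

    get_json is any callable that maps a URL to the decoded JSON page,
    for example: lambda u: requests.get(u).json()
    """
    items = []
    while url is not None:
        page = get_json(url)
        items.extend(page["results"])
        url = page.get("next")  # None on the last page ends the loop
    return items


# Offline demonstration with two made-up pages (not real SWAPI data).
fake_pages = {
    "page1": {"results": [{"name": "A"}, {"name": "B"}], "next": "page2"},
    "page2": {"results": [{"name": "C"}], "next": None},
}
print(len(fetch_all("page1", fake_pages.__getitem__)))  # 3
```

With requests installed, passing the people endpoint and a lambda that calls requests.get(u).json() should collect the same records as the loop above.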
Find the oldest character
In the codebook, the birth year of each character uses the in-universe standard of BBY or ABY: Before the Battle of Yavin or After the Battle of Yavin. The Battle of Yavin is the battle that occurs at the end of Star Wars Episode IV: A New Hope. Characters born BBY are therefore older than those born ABY. However, the birth year of several characters is unknown, so we can't determine whether they are younger or older than the characters with a known birth year.
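One way to make these values directly comparable is to map each string to a signed number of years relative to the Battle of Yavin, so that a smaller number always means an earlier birth. The sketch below is only an illustration: the helper name yavin_offset is hypothetical, and it assumes that an unknown birth year arrives as a string that does not match the number-plus-era pattern.

```python
import re

# Hypothetical helper: signed years relative to the Battle of Yavin
# (negative = before, positive = after).
_BIRTH_YEAR = re.compile(r"^(\d+(?:\.\d+)?)\s*(BBY|ABY)$")


def yavin_offset(birth_year):
    match = _BIRTH_YEAR.match(birth_year.strip())
    if match is None:
        return None  # assumption: anything else means "not known"
    years = float(match.group(1))
    return -years if match.group(2) == "BBY" else years


print(yavin_offset("896BBY"))   # -896.0
print(yavin_offset("unknown"))  # None
```

With this encoding, the oldest character is simply the one with the minimum offset, regardless of era.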
I first split the characters with a birth year into two groups, BBY_people and ABY_people. The former includes all characters who were born before the Battle of Yavin, and the latter includes all characters who were born after it.
BBY_people = people[people.birth_year.str.contains('BBY')]
ABY_people = people[people.birth_year.str.contains('ABY')]
BBY_people.shape[0], ABY_people.shape[0] ## (43, 0)
There are 43 characters born before the Battle of Yavin and none born after it. Since every known birth year is a BBY value, the oldest character is the one with the largest number.
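# Keep only the leading integer of each birth year; any fractional part is dropped.
BBY_people = BBY_people.assign(birth_year = BBY_people.birth_year.str.extract(r'(\d+)', expand=False))
BBY_people['birth_year'] = BBY_people['birth_year'].astype('int32')
# The largest BBY value is the earliest birth, i.e. the oldest character.
BBY_people.loc[BBY_people['birth_year'].idxmax()]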
name | Yoda |
---|---|
height | 66 |
mass | 17 |
hair_color | white |
skin_color | green |
eye_color | brown |
birth_year | 896 |
gender | male |
homeworld | http://swapi.dev/api/planets/28/ |
films | [http://swapi.dev/api/films/2/, http://swapi.d... |
species | [http://swapi.dev/api/species/6/] |
vehicles | [] |
starships | [] |
created | 2014-12-15T12:26:01.042000Z |
edited | 2014-12-20T21:17:50.345000Z |
url | http://swapi.dev/api/people/20/ |
Yoda is the oldest of all characters with a known birth year.
Find the titles of all the films where the oldest character appeared
[requests.get(url).json()['title'] for url in BBY_people.loc[BBY_people['birth_year'].idxmax()]['films']]
['The Empire Strikes Back', 'Return of the Jedi', 'The Phantom Menace', 'Attack of the Clones', 'Revenge of the Sith']
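The list comprehension above sends one request per film URL. If the same lookup were repeated for several characters, many of those URLs would repeat, so a small cache might save requests. The following is a sketch under that assumption: the names film_titles and fake_get_json are hypothetical, and the stand-in fetcher returns made-up titles so the example runs offline.

```python
def film_titles(urls, get_json, cache=None):
    """Return the title for each film URL, fetching each URL at most once."""
    cache = {} if cache is None else cache
    titles = []
    for url in urls:
        if url not in cache:
            cache[url] = get_json(url)["title"]
        titles.append(cache[url])
    return titles


# Offline demonstration with a stand-in fetcher that records its calls.
calls = []


def fake_get_json(url):
    calls.append(url)
    return {"title": "Film " + url.rstrip("/").rsplit("/", 1)[-1]}


shared_cache = {}
print(film_titles(["films/2/", "films/3/"], fake_get_json, shared_cache))
print(film_titles(["films/2/"], fake_get_json, shared_cache))
print(len(calls))  # 2: the second lookup was served from the cache
```

Against the real API, get_json could be a lambda that calls requests.get(url).json(), as in the one-liner above.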
We're done! Yoda is the oldest character among those with a known birth year, and he appears in five films.