Deliverables
Due date: Friday, April 17
By the end of this assignment, you should have done the following:
- Inside your Github repo named compjour-hw, create a subfolder named: json-quiz/
- Inside json-quiz/ will be five files, named and numbered as 1.py, 2.py, etc., each corresponding to the problems listed below.
Each problem has a list of tasks (lettered A through G, or so) and the expected output. When you run each script, it should generate exactly the expected output, i.e. you will be using print() calls.
Some of the problems have partial solutions. The first problem is completely solved for you as an example. Copy that into your repo (as json-quiz/1.py) and run it to see that you get the expected output.
Quick intro to JSON, Dicts, Lists, etc.
Understanding how text turns into data structures is pretty much fundamental to doing anything useful in programming, especially in the journalism/scientific domain.
Lists are how ordered collections of objects (numbers, strings, other data objects) are represented:
[1, 2, 3, 'bingo']
And dictionaries are how Python represents objects with attributes, i.e. a car is an "object" and that "object" has attributes such as color, mpg, and make:
{"color" : "red", "mpg" : 20.5, "make": "Honda"}
In other words, lists and dictionaries can be used to describe pretty much every real-world object there is. Hence, a lot of programming involves turning real-life data objects into lists and dictionaries, so that they can be processed with code. And that's why we learn these data structures.
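Real data usually combines the two structures. As a minimal sketch, here are two cars stored as a list of dicts; the second car and its values are made up purely for illustration.

```python
# Hypothetical sample data: two made-up cars, each a dict of attributes,
# collected in a list so the cars keep their order.
cars = [
    {"color": "red", "mpg": 20.5, "make": "Honda"},
    {"color": "blue", "mpg": 31.0, "make": "Toyota"},
]

# Lists are indexed by position (starting at 0)...
first_car = cars[0]
# ...and dicts are indexed by key.
print(first_car["make"])  # Honda

# Looping over the list visits each car dict in order.
for car in cars:
    print(car["make"], car["mpg"])
```

Most of the problems below boil down to this pattern: index into a list by position, then into a dict by key.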
Why JSON
While there are several different textual data formats, JSON is the most ubiquitous among modern services, and the most versatile. And just as importantly, in Python (as well as Ruby and JavaScript), the dict (dictionary) and list structures look nearly the same in code as they do in JSON text files.
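The resemblance is close but not exact: a few Python values are spelled differently in JSON. This sketch uses only the standard json module; the record dict is an invented example.

```python
import json

# A Python dict using some Python-only spellings: True, None,
# a single-quoted string, and a tuple.
record = {"ok": True, "missing": None, "make": 'Honda', "pair": (1, 2)}

# Encoding converts True -> true, None -> null, single quotes -> double
# quotes, and tuples -> JSON arrays.
txt = json.dumps(record)
print(txt)
# {"ok": true, "missing": null, "make": "Honda", "pair": [1, 2]}

# Decoding gives back a dict, but the tuple comes back as a list.
print(json.loads(txt))
# {'ok': True, 'missing': None, 'make': 'Honda', 'pair': [1, 2]}

# JSON requires double-quoted strings, so Python-style text fails to parse.
try:
    json.loads("{'ok': True}")
except json.JSONDecodeError as err:
    print("Not valid JSON:", err.msg)
```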
From Python data object to JSON
For instance, this is a Python dictionary object:
d = {"apples": 20, "pears": 40, "kiwis": 90}
print(d['pears'])
# 40
This is how you write that data structure to a JSON-formatted text file, which is referred to as encoding the data object as JSON.
import json
d = {"apples": 20, "pears": 40, "kiwis": 90}
txt = json.dumps(d, indent = 2)
f = open("/tmp/test.json", "w")
f.write(txt)
f.close()
The resulting text file looks like this (i.e. pretty much identical):
{
  "apples": 20,
  "pears": 40,
  "kiwis": 90
}
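The json module also has json.dump and json.load, which write to and read from file objects directly. A minimal round-trip sketch, assuming you want the file closed automatically by with blocks; a temporary directory stands in for /tmp so it runs on any operating system.

```python
import json
import os
import tempfile

d = {"apples": 20, "pears": 40, "kiwis": 90}

# A fresh temporary directory replaces the hard-coded /tmp path.
path = os.path.join(tempfile.mkdtemp(), "test.json")

# Encode: json.dump writes straight to the open file; "with" closes it.
with open(path, "w") as f:
    json.dump(d, f, indent=2)

# Decode: json.load reads and parses the file in one step.
with open(path) as f:
    loaded = json.load(f)

print(loaded == d)  # True
```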
From JSON to Python data object
A more frequent situation is that you'll have data in a JSON text file and you want to bring it into your Python program. This is referred to as parsing, or decoding, the JSON text file.
For example, the current Github status is found at this URL:
https://status.github.com/api/status.json
Its contents look something like this:
{"status":"good","last_updated":"2015-04-13T16:36:48Z"}
If we want to write a program that tells us whether Github's status is good or not (i.e. an istheLtrainf___ed.com, but for Github), we have to:
- Fetch the text file from the URL
- Use Python's JSON parser to convert that text into a Python data object (in this case, a dict)
Here's one way to do it:
First, we have to get the contents of that URL (https://status.github.com/api/status.json) as a text string:
import requests
import json
xfile = requests.get("https://status.github.com/api/status.json")
# xfile is not a text string but a requests Response object, so
# we need this intermediary step to get its text:
txt = xfile.text
print(txt)
# {"status":"good","last_updated":"2015-04-13T16:36:48Z"}
At this point, txt is just text. To make it a dict data object, we use Python's json library to decode (or parse) that text string:
data = json.loads(txt)
print(data['last_updated'])
# 2015-04-13T16:36:48Z
print("Is Github f___ed?")
if data['status'] == 'good':
    print("Github is not f___ed")
else:
    print("Github may be f___ed")
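Because the decision only depends on the decoded text, the logic can be tried offline against the sample response shown earlier, which is handy since URLs like this one can change or disappear over time. The github_status_message function below is a hypothetical helper for this sketch. (Also note that requests' Response objects have a .json() method, so json.loads(xfile.text) can be shortened to xfile.json().)

```python
import json

def github_status_message(txt):
    """Decode a status.json-style string and return a verdict.

    Hypothetical helper: assumes the text has a "status" key,
    like the sample response above.
    """
    data = json.loads(txt)
    if data['status'] == 'good':
        return "Github is not f___ed"
    else:
        return "Github may be f___ed"

# The sample response from above, pasted in as a string (no network needed).
sample = '{"status":"good","last_updated":"2015-04-13T16:36:48Z"}'
print(github_status_message(sample))  # Github is not f___ed
```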
Helpful references
Zed Shaw's Learn Python the Hard Way has good chapters on dictionaries and on lists (and how to loop through them).
Check out the Codecademy lesson on lists and dictionaries.
Problems
1. Realtime visitors to USA.GOV websites
The data is displayed on analytics.usa.gov
Data URL
http://2015.compjour.org/files/code/json-examples/analyticsgov-realtime.json
Original source: https://analytics.usa.gov/data/live/top-pages-realtime.json
Tasks
A. Print the value of the name attribute of the top-level object.
B. Print the value of the taken_at attribute of the top-level object.
C. Print the name attribute of the meta object.
D. Print the number of active_visitors, which is part of the first object in the data list.
E. Print a comma-separated list of the top-level object's keys (i.e. attributes).
Note: For E, you may get the keys in a slightly different order. Before Python 3.7, the keys/attributes in a Python dict had no guaranteed order, and in any version, key order doesn't matter when comparing dicts. In other words, these two dicts are equivalent:
a = {'apples': 10, 'oranges': 50}
b = {'oranges': 50, 'apples': 10}
print(a == b)
# True
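On Python 3.7 and later, json.loads inserts keys in the order they appear in the text, and dicts keep that order. If you need the same order on every version, sort the keys. A sketch with a trimmed-down, made-up string whose first keys mirror Problem 1's top level:

```python
import json

# Made-up stand-in: only the key names echo Problem 1; values are empty.
txt = '{"name": "realtime", "query": {}, "meta": {}, "data": []}'
data = json.loads(txt)

# Python 3.7+: keys come back in document order.
print(', '.join(data.keys()))          # name, query, meta, data

# Sorting gives a fixed alphabetical order on any Python version.
print(', '.join(sorted(data.keys())))  # data, meta, name, query
```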
Expected output
A. realtime
B. 2015-04-13T03:20:02.401Z
C. Active Users Right Now
D. 78000
E. name, query, meta, data, totals, taken_at
Answer
import requests
import json
data_url = "http://www.compjour.org/files/code/json-examples/analyticsgov-realtime.json"
# fetch the data file
response = requests.get(data_url)
text = response.text
# parse the data
data = json.loads(text)
print('A.', data['name'])
print('B.', data['taken_at'])
print('C.', data['meta']['name'])
print('D.', data['data'][0]['active_visitors'])
print('E.', ', '.join(data.keys()))
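Task D chains three lookups: data['data'][0]['active_visitors']. As a sketch, a hypothetical dig helper (not part of the json module) can follow any sequence of dict keys and list indexes; the sample dict is a made-up stand-in shaped like the response, not the real data.

```python
from functools import reduce

def dig(obj, *path):
    """Follow dict keys and list indexes into nested data.

    Hypothetical helper: dig(d, 'a', 0, 'b') is the same as d['a'][0]['b'].
    """
    return reduce(lambda current, step: current[step], path, obj)

# Made-up stand-in with the same nesting as the analytics.usa.gov file.
sample = {"name": "example", "meta": {"name": "example name"},
          "data": [{"active_visitors": 123}]}

print(dig(sample, 'meta', 'name'))                # example name
print(dig(sample, 'data', 0, 'active_visitors'))  # 123
```

A missing key or index raises KeyError or IndexError, exactly as plain bracket indexing would.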
2. Facebook Graph Object for The White House
Facebook's documentation of the Graph API
Data URL
http://2015.compjour.org/files/code/json-examples/graph.facebook-whitehouse.json
Original source: https://graph.facebook.com/WhiteHouse
Tasks
A. Print the value of the checkins attribute of the top-level object.
B. Print the value of the likes attribute of the top-level object.
C. Print the value of the longitude attribute of the location object.
D. Print the value of the name of the last object inside the category_list list.
Expected Output
A. 1474208
B. 3405748
C. -77.035704501759
D. Government Organization
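Task D asks for the last object in category_list without saying how long the list is. Python's negative indexing handles that. The list below is made up (only a "name" key is modeled), not the real Facebook data.

```python
# Made-up list shaped like category_list: a list of dicts with "name" keys.
category_list = [
    {"name": "First Category"},
    {"name": "Second Category"},
    {"name": "Last Category"},
]

# Index -1 counts from the end, so it always picks the last item,
# however long the list is.
print(category_list[-1]['name'])                      # Last Category

# Equivalent, but more verbose:
print(category_list[len(category_list) - 1]['name'])  # Last Category
```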
3. Google Maps Geocoder Response
This is the API that powers the location data behind Google Maps and many other location-focused apps (such as Uber).
Documentation for the Geocoding API Request Format.
Data URL
http://2015.compjour.org/files/code/json-examples/maps.googleapis-geocode-mcclatchy.json
Original source: https://maps.googleapis.com/maps/api/geocode/json?address=McClatchy+Hall+Stanford,CA
Tasks
Note that the top-level object contains a list of results; this is because, for any given address lookup, the address may be vague enough to produce more than one result (see the result for "Paris"). For this problem, though, results contains exactly one object, which I refer to as the "result object".
A. Print the value of the formatted_address attribute for the result object.
B. For the top-level object, print out the value of the status attribute.
C. For the result object, print out the location_type, which is part of the geometry object.
D. Print the value of the lat attribute of the location object that is part of the geometry object (which is, again, part of the result object).
E. Again, inside the geometry of the result object, print the value of the lng attribute of the southwest object, which is inside the geometry's viewport object.
F. For the result object, print the long_name values of the first 2 address_components, joined by a comma-and-space.
Expected Output
A. Stanford University, Main Quad, 450 Serra Mall McClatchy Hall, Stanford, CA 94305, USA
B. OK
C. ROOFTOP
D. 37.4283125
E. -122.1708429802915
F. McClatchy Hall, Stanford University
Partial answer
import requests
import json
data_url = "http://www.compjour.org/files/code/json-examples/maps.googleapis-geocode-mcclatchy.json"
response = requests.get(data_url)
text = response.text
data = json.loads(text)
result_obj = data['results'][0]
print('A.', result_obj['formatted_address'])
print('C.', result_obj['geometry']['location_type'])
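The partial answer goes straight to data['results'][0], which raises an IndexError if the lookup found nothing. A sketch of a more defensive version, checking the status attribute first; first_result is a hypothetical helper, and both sample dicts are heavily trimmed, made-up responses ("ZERO_RESULTS" is one of the statuses the Geocoding API documents).

```python
def first_result(data):
    """Return the first geocoding result, or None if there isn't one.

    Hypothetical helper: assumes the response dict has "status"
    and "results" keys, as in the McClatchy Hall example.
    """
    if data.get('status') != 'OK' or not data.get('results'):
        return None
    return data['results'][0]

# Made-up, trimmed responses (not real API output).
found = {"status": "OK", "results": [{"formatted_address": "Example Address"}]}
empty = {"status": "ZERO_RESULTS", "results": []}

print(first_result(found)['formatted_address'])  # Example Address
print(first_result(empty))                        # None
```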
4. Artists Related to Beyoncé according to Spotify
The Spotify music service has an API that will return a list of artists, in descending order of "similarity", that are considered similar to a given artist, in this case, Beyoncé.
The documentation for Spotify's Related Artists endpoint.
The "similarity" between artists is "based on analysis of the Spotify community's listening history". From Spotify's 2010 press release on this feature:
Previously we’ve used genre and artist tagging from AllMusic for related artists which worked well, but did not cover a large portion of our catalogue. What we’ve done now is to go through months and months of listening data and look closely at what people listen to.
This allows us to see that users who listen to a lot of The Rolling Stones, for example, are also big fans of Iggy Pop or The Byrds. The new feature pulls some of this information together to show you a range of related artists in one tab.
Data URL
http://2015.compjour.org/files/code/json-examples/spotify-related-to-beyonce.json
Original source: https://api.spotify.com/v1/artists/6vWDO969PvNqNYHIOW5v0m/related-artists
Tasks
For this problem set, I'm being more colloquial and less code-specific in referring to the values that you need to find.
A. How many related artists are in Spotify's response?
B. What is the name of the 5th most-similar artist to Beyoncé?
C. How many followers does the 12th most-similar artist to Beyoncé have?
D. Print a comma-separated list of the genres of the artist most related to Beyoncé.
E. What is the URL for the largest image file for the artist that is least-related (at least in this result set) to Beyoncé?
Expected Output
A. 20
B. Ciara
C. 74258
D. pop christmas,r&b,urban contemporary
E. https://i.scdn.co/image/7e8849593bcf16705c62128ed749de0e543c2e4e
Partial answer
import requests
import json
data_url = "http://www.compjour.org/files/code/json-examples/spotify-related-to-beyonce.json"
data = json.loads(requests.get(data_url).text)
print('A.', len(data['artists']))
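Tasks B and C use ordinal phrasing ("5th", "12th"), while Python lists are 0-based. This sketch uses a made-up list of 20 placeholder artists, assumed to be already in descending order of similarity like Spotify's response; nth_most_similar is a hypothetical helper, not part of any library.

```python
# Made-up list of artist dicts, most similar first.
artists = [{"name": f"Artist {n}"} for n in range(1, 21)]

def nth_most_similar(artists, n):
    """Return the n-th artist using 1-based counting ("5th most-similar").

    Hypothetical helper: lists are 0-based, so the 5th item is artists[4].
    """
    return artists[n - 1]

print(nth_most_similar(artists, 5)['name'])  # Artist 5
print(artists[-1]['name'])                   # Artist 20 (least similar)
```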
5. A single tweet from the Library of Congress
Passing by one vote, US Senate ratifies a treaty to purchase Alaska #OTD 1867. More: #ChronAm http://t.co/2RXNSMmOkf
— Library of Congress (@librarycongress) April 9, 2015
See the Twitter documentation on the get/statuses/show endpoint.
Data URL
http://2015.compjour.org/files/code/json-examples/single-tweet-librarycongress.json
Original source: https://twitter.com/librarycongress/status/586227094285225984
Fetching the data programmatically, using the Tweepy library to handle the authentication:
import json
# assuming that client is an authenticated instance of tweepy.api.API
tweet = client.statuses_lookup([586227094285225984])[0]
tweet_dict = tweet._json
print(json.dumps(tweet_dict, indent = 2))
Tasks
A. Print the timestamp of when the tweet was created.
B. Print the timestamp of when the account that sent the tweet was created.
C. Print the text of the tweet.
D. Print the Twitter account name of the tweet's author.
E. Print the id of the tweet.
F. Print the number of users mentioned in the tweet.
G. Print a comma-separated list of the hashtags used in the tweet.
H. Print a comma-separated list of the displayed URLs in the tweet. Note: your answer should be similar to the answer for G, in that the code would still work the same if there were 0 or 5 URLs.
Expected Output
A. Thu Apr 09 18:00:05 +0000 2015
B. Fri Jun 29 14:23:25 +0000 2007
C. Passing by one vote, US Senate ratifies a treaty to purchase Alaska #OTD 1867. More: #ChronAm http://t.co/2RXNSMmOkf
D. librarycongress
E. 586227094285225984
F. 0
G. OTD,ChronAm
H. go.usa.gov/3YD7V
Partial answer
import requests
import json
data_url = 'http://www.compjour.org/files/code/json-examples/single-tweet-librarycongress.json'
data = json.loads(requests.get(data_url).text)
### For G.
hashtag_objs = data['entities']['hashtags']
hashtag_texts = []
for h in hashtag_objs:
    hashtag_texts.append(h['text'])
print('G.', ','.join(hashtag_texts))
# alternatively, you could also use the list comprehension syntax:
# hashtag_texts = [h['text'] for h in data['entities']['hashtags']]
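G and H follow the same pattern: pull one field out of every entity object and join the results. As a sketch, a hypothetical join_entity_field helper captures it once; the sample entity lists are trimmed and made up (real tweet entities carry more keys).

```python
def join_entity_field(entities, field, sep=','):
    """Join one field from each entity object, e.g. every hashtag's "text".

    Hypothetical helper. An empty list yields an empty string, so the
    same code works whether a tweet has 0, 1, or 5 hashtags or URLs.
    """
    return sep.join(e[field] for e in entities)

# Made-up, trimmed entity lists.
hashtags = [{"text": "first"}, {"text": "second"}]
urls = []

print(join_entity_field(hashtags, 'text'))           # first,second
print(repr(join_entity_field(urls, 'display_url')))  # ''
```

With the real data, this would be called as join_entity_field(data['entities']['hashtags'], 'text') for G and join_entity_field(data['entities']['urls'], 'display_url') for H.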
All Solutions
1.
import requests
import json
data_url = "http://www.compjour.org/files/code/json-examples/analyticsgov-realtime.json"
# fetch the data file
response = requests.get(data_url)
text = response.text
# parse the data
data = json.loads(text)
print('A.', data['name'])
print('B.', data['taken_at'])
print('C.', data['meta']['name'])
print('D.', data['data'][0]['active_visitors'])
print('E.', ', '.join(data.keys()))
File found at: /files/code/answers/json-quiz/1.py
2.
import requests
import json
data_url = "http://www.compjour.org/files/code/json-examples/graph.facebook-whitehouse.json"
response = requests.get(data_url)
data = json.loads(response.text)
print('A.', data['checkins'])
print('B.', data['likes'])
print('C.', data['location']['longitude'])
print('D.', data['category_list'][-1]['name'])
File found at: /files/code/answers/json-quiz/2.py
3.
import requests
import json
data_url = "http://www.compjour.org/files/code/json-examples/maps.googleapis-geocode-mcclatchy.json"
response = requests.get(data_url)
data = json.loads(response.text)
obj = data['results'][0]
print('A.', obj['formatted_address'])
print('B.', data['status'])
print('C.', obj['geometry']['location_type'])
print('D.', obj['geometry']['location']['lat'])
print('E.', obj['geometry']['viewport']['southwest']['lng'])
a = obj['address_components'][0]['long_name']
b = obj['address_components'][1]['long_name']
print('F.', a + ', ' + b)
# or, to use list comprehensions:
# print('F.', ', '.join(c['long_name'] for c in obj['address_components'][0:2]))
File found at: /files/code/answers/json-quiz/3.py
4.
import requests
import json
data_url = "http://www.compjour.org/files/code/json-examples/spotify-related-to-beyonce.json"
data = json.loads(requests.get(data_url).text)
artists = data['artists']
print('A.', len(artists))
print('B.', artists[4]['name'])
print('C.', artists[11]['followers']['total'])
print('D.', ','.join(artists[0]['genres']))
print('E.', artists[-1]['images'][0]['url'])
# Note: the answer to E depends on that images array being sorted by size
# ...if that weren't the case, we'd have to sort it like so:
# from operator import itemgetter
# images = sorted(artists[-1]['images'], key = itemgetter('width', 'height'))
# print('E.', images[-1]['url'])
File found at: /files/code/answers/json-quiz/4.py
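Instead of sorting the images list, max() with a key function finds the largest image in a single pass and doesn't rely on the API's ordering. The image list below is made up and deliberately unsorted; the example.com URLs are placeholders.

```python
# Made-up image list, deliberately NOT sorted by size.
images = [
    {"url": "https://example.com/small.jpg", "width": 64, "height": 64},
    {"url": "https://example.com/large.jpg", "width": 640, "height": 640},
    {"url": "https://example.com/medium.jpg", "width": 320, "height": 320},
]

# max() scans the list once and returns the dict with the largest area.
largest = max(images, key=lambda img: img['width'] * img['height'])
print(largest['url'])  # https://example.com/large.jpg
```

This ranks images by area, whereas the sorted/itemgetter version compares width first and uses height only to break ties; the two can disagree for images with unusual aspect ratios.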
5.
import requests
import json
data_url = 'http://www.compjour.org/files/code/json-examples/single-tweet-librarycongress.json'
data = json.loads(requests.get(data_url).text)
print('A.', data['created_at'])
print('B.', data['user']['created_at'])
print('C.', data['text'])
print('D.', data['user']['screen_name'])
print('E.', data['id'])
print('F.', len(data['entities']['user_mentions']))
### For G.
hashtag_objs = data['entities']['hashtags']
hashtag_texts = []
for h in hashtag_objs:
    hashtag_texts.append(h['text'])
print('G.', ','.join(hashtag_texts))
# alternatively, you could also use the list comprehension syntax:
# hashtag_texts = [h['text'] for h in data['entities']['hashtags']]
### For H
urls = data['entities']['urls']
urltxts = []
for u in urls:
    urltxts.append(u['display_url'])
print('H.', ','.join(urltxts))
File found at: /files/code/answers/json-quiz/5.py