Opener Project Part 1

hello-world.py

A quick demonstration of simple Python 3.x syntax, including how to assign variables, do math, and print things to the screen.

If you don't use the print() function in your program, then the program won't tell you anything. Sometimes, this is just fine.
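For instance, a bare expression in a script is evaluated and then silently thrown away; only print() writes to the screen. A minimal illustration (separate from the project file):

# a script computes this value, then discards it without showing anything
2 + 2
# only an explicit print() call produces output
print(2 + 2)

The script itself: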

print("Hello world")
myname = "Dan"
print("My name is", myname)
# Note: probably shouldn't put your real age here if your
# repo is going to be posted online
myage = 42
print("I am", myage, "years old")
print("That is roughly", 42 * 365, "days")
print("Or,", 42 * 365 * 24 * 60 * 60, "seconds" )
        
File found at: /files/code/opener-project/hello-world.py

The output:

Hello world
My name is Dan
I am 42 years old
That is roughly 15330 days
Or, 1324512000 seconds

imports-and-setup.py

Relevant reading: LearnPython: Modules and Packages

# the import statement is used to load other files/libraries of code
import sys
print("I am using Python:", sys.version)

dirname = "data-hold/stuff"
print("I am making a new local directory named", dirname)

import os
# the `makedirs` function accepts keyword arguments: in this case, the `exist_ok`
# option tells it not to throw an error if the directory already exists
os.makedirs(dirname, exist_ok=True)
        
File found at: /files/code/opener-project/imports-and-setup.py

The output:

I am using Python: 3.4.3 |Anaconda 2.2.0 (x86_64)| (default, Mar  6 2015, 12:07:41) 
[GCC 4.2.1 (Apple Inc. build 5577)]
I am making a new local directory named data-hold/stuff
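As a quick sanity check (not part of the project file), you can confirm that exist_ok really does make the call safe to repeat:

import os
dirname = "data-hold/stuff"
# calling makedirs a second time does not raise an error, thanks to exist_ok=True
os.makedirs(dirname, exist_ok=True)
# exists() returns True once the directory has been created
print(os.path.exists(dirname))   # prints: True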

stanford-news-download.py

A quick demonstration of using the Requests package to fetch a web page and save it to a local file.

import requests
dirname = "data-hold/stuff"
print("Downloading the news.stanford.edu homepage")
# fetch the page from the URL
response = requests.get("http://news.stanford.edu/")
# get the "text" attribute of the response
rawhtml = response.text
print("news.stanford.edu has", len(rawhtml), "characters")
# fancy string interpolation
print("Saving the current news.stanford.edu to my %s folder" % dirname)
# open a file and prepare it for writing
f = open(dirname + "/news.stanford.edu.html", "w", encoding="utf-8")
f.write(rawhtml)
f.close()
        
File found at: /files/code/opener-project/stanford-news-download.py

The output:

Downloading the news.stanford.edu homepage
news.stanford.edu has 27422 characters
Saving the current news.stanford.edu to my data-hold/stuff folder
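Note that the script assumes the request succeeded. A hedged variation (same URL; the status check is an addition, not part of the original file) that fails loudly on a bad response:

import requests
response = requests.get("http://news.stanford.edu/")
# raise_for_status() throws an exception on a 4xx/5xx response
response.raise_for_status()
print("Fetched", len(response.text), "characters; status code:", response.status_code)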

stanford-news-heds.py

This is a simple demonstration of HTML parsing with the BeautifulSoup package.

import bs4
fname = 'data-hold/stuff/news.stanford.edu.html'
f = open(fname, encoding="utf-8").read()
# the BeautifulSoup function parses the raw HTML text into a
# searchable "soup" object
soup = bs4.BeautifulSoup(f, 'html.parser')
a_string = "The currently downloaded {fname} has {h_count} recent headlines and {img_count} images"
hc = len(soup.select('#more-news h3'))
ic = len(soup.select('img'))
# creating/separating the variables is a matter of style
print(a_string.format(fname = fname, h_count = hc, img_count = ic))

if hc > ic:
    print("There are more headlines")
else:
    print("There are more images")

        
File found at: /files/code/opener-project/stanford-news-heds.py

The output:

The currently downloaded data-hold/stuff/news.stanford.edu.html has 9 recent headlines and 19 images
There are more images
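If the CSS selectors seem opaque, here is a self-contained sketch of how select() works on a hand-written snippet of HTML (the markup below is invented for illustration, not taken from news.stanford.edu):

import bs4
html = """
<div id="more-news">
  <h3>First headline</h3>
  <h3>Second headline</h3>
</div>
<img src="a.png"> <img src="b.png">
"""
soup = bs4.BeautifulSoup(html, 'html.parser')
# '#more-news h3' matches h3 tags nested inside the element with id="more-news"
print(len(soup.select('#more-news h3')))  # prints: 2
print(len(soup.select('img')))            # prints: 2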

stanford-news-topics.py

Putting it all together: fetch the news homepage, collect its topic links, then visit each topic page and print its latest headline.

import requests
from urllib.parse import urljoin
import bs4

# first, download the news home page at:
# http://news.stanford.edu/news/
home_url = "http://news.stanford.edu/news/"
home_page = requests.get(home_url).text
home_soup = bs4.BeautifulSoup(home_page, 'html.parser')
# Then, get the list of topics
topic_links = home_soup.select('.topiclist li a')
print("There are {} topic links total".format(len(topic_links)))

# for each topic link, download the page, and print the first (i.e. latest)
# headline
for link in topic_links:
    # we only want URLs that have /tags as part of the URL
    if '/tags' in link['href']:
        print("Tagged topic:", link.text, "at URL:", link['href'])
        tag_url = urljoin(home_url, link['href'])
        # now fetch the topic page
        # notice how it's basically the same pattern as the first call
        tpage = requests.get(tag_url).text
        tsoup = bs4.BeautifulSoup(tpage, 'html.parser')
        heds = tsoup.select("#main-content .postcard-text")
        # getting the first member of the list
        h = heds[0]
        hed = h.find('h3').text
        print("   Latest headline: {}".format(hed))
        print("")
        
File found at: /files/code/opener-project/stanford-news-topics.py

Sample output:

There are 23 topic links total
Tagged topic: Arts & Creativity at URL: /tags/arts/
   Latest headline: Stanford's Cantor Arts Center presents solo exhibition of Jacob Lawrence's work, Promised Land

Tagged topic: Energy at URL: /tags/energy
   Latest headline: Stanford's Cardinal Cogeneration plant shuts down to make way for SESI
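The urljoin() call is what turns relative hrefs like /tags/arts/ in the output above into absolute URLs. A standalone demonstration:

from urllib.parse import urljoin
home_url = "http://news.stanford.edu/news/"
# a leading slash resolves against the site root rather than the /news/ path
print(urljoin(home_url, "/tags/arts/"))
# prints: http://news.stanford.edu/tags/arts/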

get-er-done.py

A self-check script: it verifies that every assignment file exists and is at least 100 bytes, then prints a report of anything missing.

import os.path

filenames = [
    "hello-world.py",
    "imports-and-setup.py",
    "stanford-news-download.py",
    "stanford-news-heds.py",
    "stanford-news-topics.py",
    "get-er-done.py",
    "fetch-and-unpack-twitter-data.py",
    "read-sunlight-csv.py",
    "read-twitter-json.py",
    "twitter_foo.py",
    "twitter_foo_fun.py",
    "twitter-tablemaker.py",
    "twitter-word-tweets.py"
]


# check to see if directory is named properly
current_path = os.path.dirname(os.path.realpath(__file__))
if os.path.basename(current_path) != 'opener-project':
    print("Warning: the project directory needs to be named: opener-project")

# check to see if each file exists and if it's at least 100 bytes or bigger
missing_files = []
for fn in filenames:
    if not os.path.exists(fn) or os.path.getsize(fn) < 100:
        missing_files.append(fn)
        print(fn, "...Unfinished")
    else:
        print(fn, "...Finished!")


# if there are missing files, print out a report
if len(missing_files) > 0:
    print("###################")
    print("{} missing files".format(len(missing_files)))
    for fn in missing_files:
        print(fn)
else:
    print("""
    All done (theoretically...)!
    You should now be able to turn in the assignment by pushing it to GitHub
    """)
        
File found at: /files/code/opener-project/get-er-done.py

The ideal output:

hello-world.py ...Finished!
imports-and-setup.py ...Finished!
stanford-news-download.py ...Finished!
stanford-news-heds.py ...Finished!
stanford-news-topics.py ...Finished!
get-er-done.py ...Finished!
fetch-and-unpack-twitter-data.py ...Finished!
read-sunlight-csv.py ...Finished!
read-twitter-json.py ...Finished!
twitter_foo.py ...Finished!
twitter_foo_fun.py ...Finished!
twitter-tablemaker.py ...Finished!
twitter-word-tweets.py ...Finished!

    All done (theoretically...)!
    You should now be able to turn in the assignment by pushing it to GitHub
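
The heart of the checker is the per-file test: a file counts as finished when it exists and holds at least 100 bytes. That test restated as a standalone function (a hypothetical helper, not part of the script itself):

import os.path

def is_finished(fn):
    # "Finished" means the file exists and is at least 100 bytes
    return os.path.exists(fn) and os.path.getsize(fn) >= 100

print(is_finished("hello-world.py"))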