Week 4


Some notes:

  • The JSON-Part-2 Quiz has been pushed back to next Tuesday
  • You have a new assignment, also for Tuesday: Twitterbot Skeleton
  • You will have a USAJobs Data assignment that I will post on Thursday, and it will be due on Tuesday. It will look similar to this

Wikipedia’s Women Problem - by James Gleick for The New York Review of Books. (to be elaborated on later)

Pulitzer stuff


WSJ Medicare News App - Medicare Unmasked: Behind the Numbers

Emails and signals

Random article about the digital times of our lives and the never-ending need for a better filter: How D.C. flacks and reporters manage all that email (or don't)

One of the first things Sarah Corley, the communications director for Rep. Tom Cole of Oklahoma, teaches new press interns is her meticulous email-management system, which involves some 170 folders and subfolders into which she sorts every electronic message she does not immediately delete. Within her folder for media requests, Corley has subfolders for both national media and Oklahoma media, and subfolders within those subfolders for individual publications. Corley aims to leave the office with zero emails in her inbox at the end of every workday.

Think about how a service like Gmail uses algorithms to separate spam from legit email, and, more impressively, to prioritize the legit emails, so that there are "important" emails – and everything else. Two of the signals they track are almost certainly:

  1. Did the user click on the email to open or otherwise interact with it (other than deleting/sending it to the Spam folder)?
  2. Once the user logged into their email account after a given email arrived, how many seconds passed before they actually clicked on that email?
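
The two signals above can be sketched in a few lines of Python. This is only a toy illustration, not how Gmail actually works: the `EmailEvent` record, its field names, and the `importance_score` formula (including the one-hour `half_life`) are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmailEvent:
    """Hypothetical log record for one email; every field name is an assumption."""
    received_at: float                   # Unix time the email arrived
    first_login_after: Optional[float]   # Unix time of the first login after arrival
    opened_at: Optional[float]           # Unix time the user first clicked it, if ever


def was_opened(event: EmailEvent) -> bool:
    """Signal 1: did the user ever click on the email?"""
    return event.opened_at is not None


def seconds_to_open(event: EmailEvent) -> Optional[float]:
    """Signal 2: seconds between the first login after arrival and the click.

    Returns None when the email was never opened or no login was recorded.
    """
    if event.opened_at is None or event.first_login_after is None:
        return None
    return max(0.0, event.opened_at - event.first_login_after)


def importance_score(event: EmailEvent, half_life: float = 3600.0) -> float:
    """Toy score in [0, 1]: 0 if never opened, otherwise halves every `half_life` seconds of delay."""
    delay = seconds_to_open(event)
    if delay is None:
        return 0.0
    return 0.5 ** (delay / half_life)


if __name__ == "__main__":
    quick = EmailEvent(received_at=0, first_login_after=100, opened_at=160)
    ignored = EmailEvent(received_at=0, first_login_after=100, opened_at=None)
    for name, ev in [("quick", quick), ("ignored", ignored)]:
        print(name, was_opened(ev), seconds_to_open(ev), round(importance_score(ev), 3))
```

A user who opens and files every single message by hand would produce nearly the same values for every email, which leaves a scheme like this with little to tell the important messages from the rest.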

Ms. Corley's behavior as described not only seems to be a sure recipe for carpal tunnel syndrome, it's also actively thwarting the kind of technology that would make her inbox easier to manage.

Speaking of signals: “It’s not that we control NewsFeed, you control NewsFeed…” Facebook: please stop with this.

Andy Mitchell and Facebook’s weird state of denial about news:

It was difficult to pass a day in Perugia without being reminded of how Facebook is making (usually via its algorithms) news decisions every hour. Someone reminded me of the survey in the US which showed large percentages of respondents quite unaware that Facebook has an adjustable formula which determines what their newsfeed shows. Rasmus Kleis Nielsen mentioned in a presentation the disagreements which temporarily took news from the Danish media company Berlinske off Facebook (at issue was a picture of some hippies in the 1960s frolicking nude in the sea). There was another row in Denmark when Facebook objected to a picture of Michelangelo’s (also nude) statue of David. An editor for the Turkish daily Milliyet reminded me that Facebook has strict rules about how Kurdish flags are seen on its feed in Turkey.

Cool viz

Where Your State Gets Its Money - Nice use of small multiples. And a nice demonstration of when a map doesn't quite need to be a map. What the visualization loses in recognition (none of the states look like what we're used to), it gains in clarity (big land mass states don't dominate the visual narrative).

Seattle Police Will Hire Programmer and Prolific Records Requester Tim Clemans

The Seattle Police Department is taking the unconventional step of bringing a programmer who bombarded it with public records requests in-house. Chief Operating Officer Mike Wagers has led efforts to hire 24-year-old self-taught programmer Tim Clemans—initially, at least, on a three-month trial basis to work on redaction and disclosure of data.

He'll make $22.60 an hour and start on May 6. If all goes well, Clemans will stay on as a full-time staffer.

For much of last year, Clemans was an unknown: an anonymous dude who'd filed a blanket request for virtually all of SPD's electronic data, as well as huge amounts of information from other police departments around the state. One newspaper columnist blasted his tactics as "outlandish" and "gimmicky." The Poulsbo police chief suggested his records requests were profit-motivated and illegitimate.

Records requester Tim Clemans pokes government agencies, then offers to help

Rather than fight the request, Seattle police enlisted his help. If Clemans would stand down, the department would let him try to develop software to help them…After he passed a background check, they turned over a batch of records he could experiment with. His approach is to have police “over-redact” documents and videos to eliminate privacy concerns, then routinely post them online at the end of each shift.


Due on Wednesday

Twitter Bot Memo. You can work with a partner. Don't worry about the code or actual implementation, just write out all the steps. Assume that the medium for receiving/sending messages will be Twitter.

Here's a sample memo.

SF salaries data

San Francisco compensation data: as filed by Michael Morisy of MuckRock

Great example of a public records request, asking for something that has already been released (see the data as released on SF's data portal).

I've put up a Google Spreadsheet version here.

Wikipedia API things (I'll finish these by Wednesday):