geostreaming olympics

What are people throughout the world thinking about the Olympics right now? It is truly an event that brings people all around the world together.

Last weekend I decided to refactor an older project of mine- real-time geolocated tweets- into something relevant to right now: the Olympics. A few weeks after I posted the twitter mapping project, I found a similar twitter mapping app with a beautiful implementation using Tuiter, a Node.js Twitter library, and Socket.IO.

I rewrote the backend using Tuiter and hooked it into the frontend, which uses Leaflet.js. You can check out the app here: http://geostreaming-olympics.herokuapp.com/. The app queries the Twitter API for geolocated tweets and displays the ones that contain "olympics". After a few seconds you should see tweets populating the map as they come in.

While working on the app I was trying to figure out the best way to display time. It was a bit confusing since the tweet timestamps were all in GMT, and sometimes the js Date() could not parse them properly, producing an "Invalid Date" output, so in the end I scrapped that UI element.
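For reference, the created_at field on a tweet has a fixed, parseable format; here is how it could be handled in Python (a sketch- the app itself was wrestling with this in the browser):

from datetime import datetime

# tweet timestamps look like 'Sat Aug 04 18:13:32 +0000 2012' (GMT)
def parse_created_at(created_at):
    return datetime.strptime(created_at, '%a %b %d %H:%M:%S +0000 %Y')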

During that process I came across something I thought was pretty funny. Below are more pics:

 

I like this application design, which I pulled from the Tuiter example. In the background, the Twitter stream is always running. Opening the website opens a websocket to receive the streaming tweets- so there is no need for unique API calls triggered on the client side. This sidesteps rate-limiting issues. The app is deployed on Heroku, which was surprisingly fast to get up and running.
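As a rough sketch of the same one-stream, many-listeners shape in Python (the app itself uses Tuiter and Socket.IO in Node; here Flask-SocketIO stands in, and tweet_source() is a placeholder for the single streaming connection):

from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app)

def tweet_source():
    # placeholder for the one long-lived Twitter streaming connection
    while True:
        socketio.sleep(1)
        yield {'text': '...', 'coordinates': [0.0, 0.0]}

def broadcast_tweets():
    # a single background reader; every connected websocket client
    # receives each tweet, so no per-client API calls are needed
    for tweet in tweet_source():
        socketio.emit('tweet', tweet)

if __name__ == '__main__':
    socketio.start_background_task(broadcast_tweets)
    socketio.run(app)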

introducing: right now all around

site example image

right now all around is a way to view recent public Instagram posts, with the option to retweet posts you like. It's kind of a mash-up of elements of twitter, Instagram, and tumblr.

There is no filtering of the posts; it is just a sample of the images posted most recently. The result is a tool of serendipity and exploration. What I find compelling is the juxtaposition of so many different captured experiences- like people-watching on a train, but spanning the world, and in so many different settings.

In some cases people who shared photos only had 100 or 200 followers to see them. Now many more can see, share, and comment on the photos.

Instagram made it easier for people to take beautiful photos. right now all around builds on that enabled creativity by creating a collective image stream. In a way the result is visual poetry of what people are doing, feeling, or wanting to remember.

The time of day plays a role- if you look at the app at 4 AM EST, for instance, you will see more photos from Southeast Asia.

In the non-mobile version, an updating elapsed time is presented. Usually as you scroll down, you are looking at older bits of content. In this case, with every API call, new images are brought up. If you refresh, you will only see new images. The app is always looking forward, and there is no memory.

I've found it a great tool to see new memes, especially ones that are subculture-specific, that I would normally never come across through my most frequent information channels. Last year there was a lot of talk of the filter bubble- the danger of siloing created by recommendation algorithms. With the growth of APIs we have more control to determine our information sharing experiences. This is in part an exploration of that, which I hope to pursue in other contexts and media (for example news and Facebook) as well.

See right now all around.

Take a look at the code.

Technical details: I used this as an opportunity to play with the Backbone.js framework, building a client-side app using JSONP against the Twitter API. No authentication is required, and the API is limited to 150 requests per hour per IP address, as described in the Twitter API documentation.

 

twitter applications and oauth in python with tweepy and flask

I want to build a larger-scale twitter application, which would require a user to authenticate. I found some great material in the Tweepy documentation on doing this, but there were some parts of the tutorial I was unfamiliar with, so I thought I'd fill in those holes here. Most python OAuth examples I found online assumed you already had the access tokens. What I wanted was a way to get that 'authorize' button.

Why would you want a user to authenticate?

You can make a certain number of API calls by authenticating an app under your own account, but if you really want to scale an app up, you would burn through your rate limit running methods for everyone who uses your app with your own keys. With per-user authentication, everyone who uses the app generates their own keys and can use the API to get information related to their own account.

Get authenticated

I went with Flask to write the server-side code since I am most familiar with python, and it was quick to get started. I chose Tweepy as the python twitter library since it has OAuth support.

You can grab the code from this example on github.  Make sure you have both those libraries installed.  The bulk of what I’ll talk about is in the server.py script, since that’s where the authentication occurs.  The static and template directories contain files that I use to render the results from the api object after authentication.

Here is the script in outline; I'll follow it with some details.
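This sketch is reconstructed around Tweepy's OAuthHandler- route function names other than start() are placeholders, and the exact file is in the repo linked above.

import tweepy
from flask import Flask, redirect, request, render_template

CONSUMER_TOKEN = 'fill this out'
CONSUMER_SECRET = 'and this'
CALLBACK_URL = 'http://localhost:5000/verify'

app = Flask(__name__)
session = {}  # holds the request token between the two OAuth steps
db = {}       # stands in for a real database

@app.route('/')
def send_token():
    # step 1: get a request token and send the user to twitter to authorize
    auth = tweepy.OAuthHandler(CONSUMER_TOKEN, CONSUMER_SECRET, CALLBACK_URL)
    redirect_url = auth.get_authorization_url()
    session['request_token_key'] = auth.request_token.key
    session['request_token_secret'] = auth.request_token.secret
    return redirect(redirect_url)

@app.route('/verify')
def verify():
    # step 2: twitter redirects back here with an oauth_verifier
    verifier = request.args.get('oauth_verifier')
    auth = tweepy.OAuthHandler(CONSUMER_TOKEN, CONSUMER_SECRET)
    auth.set_request_token(session['request_token_key'],
                           session['request_token_secret'])
    auth.get_access_token(verifier)
    # the access token and secret do not expire, so save them
    db['access_token_key'] = auth.access_token.key
    db['access_token_secret'] = auth.access_token.secret
    db['api'] = tweepy.API(auth)
    return redirect('/start')

@app.route('/start')
def start():
    api = db['api']
    tweets = api.user_timeline()
    return render_template('tweets.html', tweets=tweets)

if __name__ == '__main__':
    app.run()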

 

First fill out the consumer token stuff.

CONSUMER_TOKEN = 'fill this out'
CONSUMER_SECRET = 'and this'
CALLBACK_URL = 'http://localhost:5000/verify'

I chose the callback at localhost:5000 since that’s the default for Flask.  You can get the consumer token and secret from dev.twitter.com after registering an app.

You’ll see in the code I use a couple dicts to save data, session and db.  You can substitute a database to hold the data instead of dictionaries.

After starting the server with python server.py, head to 'http://localhost:5000/' in your browser and you'll execute the code in the @app.route('/') block. This is where authentication begins. I use the consumer token and secret to generate a request token key and secret, which I save in the session dict for a later step. Then I obtain a redirect URL from twitter, which brings up a screen asking the user whether they want to authorize your app. In the browser's address bar you'll see the redirect_url.
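In sketch form, that block comes down to something like this (variable names may differ from the repo):

auth = tweepy.OAuthHandler(CONSUMER_TOKEN, CONSUMER_SECRET, CALLBACK_URL)
redirect_url = auth.get_authorization_url()
session['request_token_key'] = auth.request_token.key
session['request_token_secret'] = auth.request_token.secret
return redirect(redirect_url)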

You can see here I used the same tokens as a previous app, so that's why the title has to do with tweet maps. After the user clicks 'Authorize app', they are redirected to '/verify' as specified in the callback URL. I use the request object to get the oauth_verifier, and set the request tokens to what I saved them as earlier. I then use the verifier received in the returned URL to gain access:

auth.get_access_token(verifier)

Finally I save the token, secret, and api object in the db dict to access later. The access token and secret do not expire.

api = tweepy.API(auth)  # build the api object from the authorized handler
db['api'] = api
db['access_token_key'] = auth.access_token.key
db['access_token_secret'] = auth.access_token.secret

I reroute the app and retrieve the api object from the database (in start()). Now you can make API calls that would have required authentication. In future sessions I can instead authenticate with the saved access token key and secret.

Make something cool with the data

Now for a quick example of what you can do. Using this api object, I retrieve the user's latest tweets with tweets = api.user_timeline(), and send that list of tweet objects to be rendered as 'tweets.html' with Flask's render_template function. In the template I simply have

{% for t in tweets %}
<div id='tweet'>{{ t.text }}</div>
{% endfor %}

So that after the app is authorized you can see, in this case, my most recent tweets.

You can do many other things, and I point you to the Tweepy API documentation to get some ideas.
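For instance, a few one-liners that work once you have the authorized api object (method names from the Tweepy docs; check them before copying):

api.me()                     # the authenticated user's profile
api.followers()              # the user's followers
api.update_status('hello!')  # post a tweet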

Some close up screenshots of real time geolocated tweets

I built a tool yesterday to observe geolocated tweets as they come in, on a world map. It's kind of cool to zoom in on specific cities. After just 30 seconds I had gathered quite a few in New York. Clicking on a marker reveals the profile pic and the tweet.

And then zooming out I could see clusters around the cities of the Northeast.

After several minutes I zoomed out to see the world map, but at that point there were too many tweets and my browser crashed. This is where selective coarsening- dropping or merging markers at low zoom levels- would be useful.
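One simple version of that coarsening, sketched in Python (the app would do this client-side, but the idea is the same): bucket markers into a coarse lat/lon grid and keep one representative per cell.

from collections import defaultdict

def coarsen(points, cell_deg=1.0):
    # group (lat, lon) points into cell_deg-sized grid cells and keep
    # a single representative per cell; shrink cell_deg as the user zooms in
    cells = defaultdict(list)
    for lat, lon in points:
        cells[(int(lat // cell_deg), int(lon // cell_deg))].append((lat, lon))
    return [members[0] for members in cells.values()]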

You can find the project on github.

Using nodejs event streams and leafletjs to display geotagged tweets on a world map

I thought it would be cool to see tweets come in live using the Twitter Streaming API, querying by location. When there is a location bounding box, all tweets that come in have geocoordinates (though a small fraction are null).

Initially I wanted to focus on a single city- Leaflet maps look incredible when you zoom into a city- but the Twitter Streaming API was taking too long to fetch tweets while testing, so I set the bounding box to the world. You can change the twitter fetch bounding box, as well as the initial map bounding box.
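For reference, here is what querying the streaming API with a world bounding box looks like, sketched with Tweepy in Python (the project itself does this in Node; the listener class is my own):

from tweepy import Stream
from tweepy.streaming import StreamListener

class GeoListener(StreamListener):
    def on_status(self, status):
        # each geotagged tweet arrives here as it comes in
        print(status.coordinates, status.text)

# auth: an authorized tweepy OAuthHandler
def run(auth):
    stream = Stream(auth, GeoListener())
    # locations is lon/lat of the southwest corner, then the northeast corner
    stream.filter(locations=[-180, -90, 180, 90])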

This is my first time using them, but from what I understand, nodejs event streams let you send chunks of data from the server to the browser as they come in. This is pretty cool for real-time applications. I wanted to focus this application on immediate tweets, so right now there is no database. Whenever you run it, you get whatever is coming in the twitter pipeline.
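A rough Python/Flask analogue of that chunks-as-they-arrive idea (the project itself uses Node event streams; tweet_source() is a stand-in for the live stream):

import json
import time

from flask import Flask, Response

app = Flask(__name__)

def tweet_source():
    # stand-in for the live twitter stream
    while True:
        time.sleep(1)
        yield {'text': '...', 'coordinates': [0.0, 0.0]}

@app.route('/stream')
def stream():
    def generate():
        # each yielded chunk is sent to the browser as soon as it is produced
        for tweet in tweet_source():
            yield json.dumps(tweet) + '\n'
    return Response(generate(), mimetype='application/json')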

Take a look at the project here.

UPDATE: Note that you need to edit the config.js file with a twitter username and password, because querying the streaming API with location is otherwise forbidden. If you are using an account that already has many apps querying the API constantly (tweet harvesting, for example), you may see a lag in the rate of fetching. This should not be an issue for most people, and can easily be remedied by creating another twitter account to query the API with.

keep using emoticons, they help me train my classifier

Classifying sentiment is a popular topic in natural language processing research, and is also a valuable tool in industry for its applications in understanding what groups think on a broader scale. One could read text and determine whether it is positive or negative, but for larger corpora this becomes impractical. This is where NLP and machine learning come in handy.

For a good overview of the value of this type of research, check out this O'Reilly Strata talk on the future of NLP.

I tried an approach using SentiWordNet and the movie reviews corpus in NLTK to classify twitter sentiment. This did not work- perhaps it would for larger bits of text, but a tweet is too short, and the way language is used is very different between tweets and movie reviews.

Soon I found recent work that used emoticons to label tweets. A tweet is labeled positive if it has a happy emoticon like : ) or : D, and negative if it has a sad emoticon. Here is one paper using this labeling method with several classifiers (Twitter Sentiment Classification using Distant Supervision; Go, Bhayani, Huang), and another that extends that work to account for neutral tweets (Twitter as a Corpus for Sentiment Analysis and Opinion Mining; Pak, Paroubek).

What's kind of cool about this method is that it can work in any language where the emoticons are the same- generally : ) for happy and : ( for sad. The method is called distant supervision with noisy labels, since the emoticons are not a completely accurate labeler; but with a ton of training data, which is easy to get from twitter, accuracy can exceed 80% (see the papers for details).
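To make the idea concrete, here is a minimal toy sketch of the labeling-and-training step with NLTK's Naive Bayes classifier (my own simplification, not the pipeline from either paper):

import nltk

POSITIVE = (':)', ': )', ':D', ': D')
NEGATIVE = (':(', ': (')

def noisy_label(tweet):
    # distant supervision: treat the emoticon as a noisy sentiment label
    if any(e in tweet for e in POSITIVE):
        return 'pos'
    if any(e in tweet for e in NEGATIVE):
        return 'neg'
    return None  # no emoticon, no label

def features(tweet):
    # bag-of-words features; drop the bare emoticon tokens so the
    # classifier cannot simply memorize its own labels
    return {w: True for w in tweet.lower().split() if w not in (':)', ':(', ':d')}

def train(tweets):
    # tweets: any iterable of raw tweet text, e.g. from the streaming API
    data = []
    for t in tweets:
        label = noisy_label(t)
        if label:
            data.append((features(t), label))
    return nltk.NaiveBayesClassifier.train(data)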

It's kind of funny- at some point using emoticons may have felt juvenile, perhaps signifying the degradation of language. Now, in this context, emoticons provide a way to machine-learn sentiment, enabling an understanding of what larger populations of people are thinking. So keep using emoticons, they are valuable- they help me train my classifier.

#OccupyData Hackathon at the Center for Civic Media

If you were able to catch the Mapping Media Ecosystems talk at the Center for Civic Media a few weeks back, you may recall all the visualizations and modeling techniques for understanding social systems by analyzing social media. In particular, the last couple of projects outlined, the Web Ecology Project and SocialFlow, both focused on Twitter. Ethan Zuckerman liveblogged the panel (while hosting, wow!) and you can read the notes here: http://www.ethanzuckerman.com/blog/2011/11/07/mapping-media-ecosystems-at-center-for-civic-media/

It was pretty cool to see what we could understand from looking at tweets- things like:

  • Dynamics between reporters and protestors in the Arab Spring
  • Whether or not Twitter censors hashtags as trending topics
  • What the audiences of the NYTimes, Al Jazeera, the Economist, and Fox News are interested in, based on their twitter behavior
  • Information flows of articles on Twitter, and cool ways to visualize them

Afterwards I got into a few conversations with Pablo (a visiting scientist at the Center for Civic Media) about having a twitter hacking session where we could learn some of these analysis and plotting techniques, i.e. a hackathon.

Events that are somewhere in between a talk and a workshop- where people can come and make something- have really intrigued me. I want to see more of them happening- I want to go to talks and get my hands dirty, even if it's something relatively simple, or something I don't fully understand.

This weekend there will be an OccupyData hackathon across several cities.  In Cambridge, there will be a hackathon at the Media Lab.  If you’re interested in participating, or learning more about twitter analysis techniques, swing by.  We’ll have access to tons of tweets, and also people who know how to work with them.  Should be fun!

I have work during the day, but will swing by before to help set up and organize, and after to see what's up. There will also probably be an Occupy Research call with the other cities on Saturday. Go here: http://bit.ly/occupyhackathon for more details as the event develops.

Tomorrow I'm meeting up with Pablo to go over more ideas for facilitating the hackathon. There have been quite a few hackathons in Boston- Music Hack Day, the Health Hackathon, the Synth-in, and side hackathons at BarCamp- so there's plenty of inspiration and experience to draw from.

update (12/8/11): The hackathon has been extended to Saturday. Here are some details

This weekend there will be a hackathon as part of a wider Occupy Data hackathon spanning several cities around the world: Utrecht, Cambridge, Los Angeles, and New York.

In Cambridge, we’ll be hosting a session open to the public:

STARTS: Friday December 9th: 9am
ENDS: Saturday December 10th: 9pm
WHERE: E14-240 at the Media Lab

Updated info: http://bit.ly/occupyhackathon
Post: http://civic.mit.edu/event/occupydata-hackathon-data-mining-and-visualization

There will be several datasets released for the hackathon. R-Shief has been collecting tweets related to the occupy movement, and those will become available.

We want to bring together journalists, activists, scientists, designers, and coders to share ideas and projects, and build things together. This data can be used for:

+scientific insights on social dynamics
+narratives of the movement
+web or mobile applications

This is not the limit. When we come together, the potential of what we can make exceeds what we thought was possible. There is an incredible wealth of talent around Boston. Please spread the word to others who may be interested.

For inspiration check out other occupy hack projects:

Creative Action Network- a global community of artists and designers, harnessing our collective talents for good.

http://occupyhack.com/?page_id=42

http://occupyresearch.wikispaces.com/data+and+visualization

a snapshot of twitter

This is a snapshot of Twitter. Take some time and observe it.

 

Notice all of the languages, mishmashed side by side. I like this a lot. Much of what I see of the rest of the world is through the news. Sometimes I travel, and it's great. But seeing this snapshot affects me the way reading fiction or seeing a film from another country does. It feels more personal; it feels like I'm getting closer to the heart of another culture, and I feel my perspective widen.

Some of these tweets convey so much more than the tweet itself. It's like when you people-watch and someone, just by their look, can evoke an entire narrative.

At any given moment, you can get a stream of 1% of all tweets. I'm used to seeing tweets either in my feed online, or processed in some form to make cool infographics. When I opened up these tweets to take a look, the first few I saw were in Japanese. I didn't expect that at first- who would? I'm so used to seeing the web in English. I moved further and continued to see tweets raw, unprocessed, full of characters and languages I didn't know. And the tweets I did recognize juxtaposed an array of emotions and styles.

A collective poem, growing, spanning the world.

Sketches: Twitter Network

I've been playing around with the Python Twitter API and made a simple network visualization of the people I follow. Links are made between people who follow each other. Surprisingly, it's not as trivial as one might imagine to get the names of the people one follows if the number of followees is greater than 100- the user-lookup endpoint only accepts 100 users per call, so the calls have to be batched (sketched below). For IDs the case is much simpler. Here is a sketch of my network in 2D, written in Processing. Disentangling a network, well, that's the fun part. I'll post again in the future as I develop better visualizations (maybe even in 3D!).

twitter network
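Here is the batching workaround in sketch form with Tweepy (users/lookup accepts at most 100 ids per call, which is why names are harder than IDs; the function name is mine):

def followee_names(api, screen_name):
    # ids of everyone the user follows (one call), then hydrate the
    # names 100 at a time, since users/lookup caps each request at 100
    ids = api.friends_ids(screen_name)
    names = []
    for i in range(0, len(ids), 100):
        users = api.lookup_users(user_ids=ids[i:i + 100])
        names.extend(u.screen_name for u in users)
    return names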

I'm doing this as a way to explore the information flows around me. In this case, it's pretty easy to get the twitter data. But on a wider scope, once I have a map of where I am getting information, I can be more attentive to the diversity of sources I actively listen to. This is part of an effort to create tools that move beyond the information silos that social software can create.