nodejitsu is awesome

I deployed a nodejs app on heroku over the weekend and thought the process was pretty quick. I was comparing heroku's app deployment process with the one I went through for a web app on Dreamhost (a python app, with some time spent configuring things on the server side). Heroku was truly much faster. Make it work locally, then deploy. Today I deployed on nodejitsu and it was even smoother.

Basically you just install jitsu, then cd to your app directory and run `jitsu deploy`. That's it. jitsu writes your package.json file for you, automatically increments the version, and lets you immediately customize the name. Pretty cool.
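The whole flow is something like this (the install step assumes you already have npm; the app directory name is a placeholder):

```shell
npm install -g jitsu   # install the nodejitsu command-line tool
cd my-app              # your app's directory
jitsu deploy           # writes/updates package.json and deploys
```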

On the other hand, when I deployed on heroku I wrote out the package.json file by hand and ended up having to change it because heroku didn't have the version of node or some package I was using (I can't remember if it was the node, npm, or express version I had to change). I spent some time going through previous versions until I found ones where socket.io, express, and everything else played well together. It was basically trial and error, since things were working fine on my local machine. (The package.json file lists the app's module dependencies and the version required of each module.)
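For reference, package.json is where those versions get pinned. A minimal one looks something like this (the name and version numbers here are illustrative, not the combination I actually settled on):

```json
{
  "name": "my-app",
  "version": "0.0.1",
  "dependencies": {
    "express": "2.5.x",
    "socket.io": "0.9.x"
  },
  "engines": {
    "node": "0.8.x",
    "npm": "1.1.x"
  }
}
```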

I went with nodejitsu because I wanted to play with real-time interactivity using socket.io. On heroku you have to configure socket.io to disable websockets and force long polling, and I didn't want to use long polling.
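That heroku configuration is a few lines of server setup, along these lines (a sketch using the socket.io 0.9-era API; `app` is your http server):

```javascript
var io = require('socket.io').listen(app);

// Disable the websocket transport and fall back to long polling,
// since heroku's stack did not support websockets at the time.
io.configure(function () {
  io.set('transports', ['xhr-polling']);
  io.set('polling duration', 10);
});
```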

Anyway, it’s exciting to see companies making the deployment process easier.  For python apps I can use heroku, for node apps I can use nodejitsu.

geostreaming olympics

What are people throughout the world thinking about the olympics right now? It is truly an event that brings people together.

Last weekend I decided to refactor an older project of mine, real time geolocated tweets, into something relevant right now: the olympics. A few weeks after I posted the twitter mapping project, I found a similar twitter mapping app with a beautiful implementation using the nodejs twitter framework Tuiter and socket.io.

I rewrote the backend using Tuiter, and hooked it into the frontend, which used Leafletjs.  You can check out the app here: http://geostreaming-olympics.herokuapp.com/.  The app queries the Twitter API for geolocated tweets and displays the ones that contain “olympics”.  After a few seconds you should see tweets populating the map as they come in.
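The core of the rewritten backend is short. Here's a sketch of the idea, assuming Tuiter's streaming filter call and a socket.io server named `io` (the credentials, event name, and the geo check are placeholders; adapt them to your setup):

```javascript
// Tuiter wraps the Twitter streaming API; keys here are placeholders.
var tu = require('tuiter')({
  consumer_key: 'KEY',
  consumer_secret: 'SECRET',
  access_token_key: 'TOKEN',
  access_token_secret: 'TOKEN_SECRET'
});

// Stream tweets matching "olympics" and push the geolocated ones
// out to every connected browser over socket.io.
tu.filter({ track: 'olympics' }, function (stream) {
  stream.on('tweet', function (tweet) {
    if (tweet.geo) {
      io.sockets.emit('tweet', tweet);
    }
  });
});
```

The stream runs once in the background regardless of how many browsers are connected, which is what keeps rate limiting simple.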

While working on the app I was trying to figure out the best way to display time. It was a bit confusing since the tweet timestamps were all GMT and sometimes the js Date() could not parse them, producing an "Invalid Date" output, so in the end I scrapped that UI element.
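One way around the "Invalid Date" problem is to skip Date's string parsing entirely and build the date by hand from Twitter's fixed created_at layout. A sketch:

```javascript
// Twitter's created_at looks like "Wed Aug 27 13:08:45 +0000 2008".
// Some browsers fail to parse that with new Date(), so build it manually.
var MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
               Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

function parseTwitterDate(createdAt) {
  var p = createdAt.split(' ');   // [dow, month, day, time, offset, year]
  var t = p[3].split(':');        // [hours, minutes, seconds]
  // Twitter timestamps are always UTC, so the +0000 offset can be ignored.
  return new Date(Date.UTC(+p[5], MONTHS[p[1]], +p[2], +t[0], +t[1], +t[2]));
}
```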

I like this application design, which I pulled from the Tuiter example. In the background the Twitter stream is always running. Opening the website opens a websocket to receive the streaming tweets, so there is no need for unique calls triggered on the client side. This simplifies any issues related to rate limiting. The app is deployed on Heroku, which was surprisingly fast to get up and running.

writing a web scraper over HTTP with nodejs

While working on a javascript web app I thought it would be pretty cool to write an API where I supply a URL and it sends back the data I want to scrape from that URL.

The basic flow here is that I encode the URL and include it as part of a GET request, and then on the server side I retrieve the page source, scrape from it, and send back the data I want.

In this particular case I have Instagram URLs, and I want to scrape the location of the image.  Here is a quick rundown.

I have this URL for example: http://instagr.am/p/L9RF3SxC9H/

I encode the URL with encodeURIComponent and get “http%3A%2F%2Finstagr.am%2Fp%2FL9RF3SxC9H%2F”
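In code, the encoding step and the request path it gets embedded in look like this (the `/scrape/` route path is a hypothetical name for the server endpoint):

```javascript
var url = 'http://instagr.am/p/L9RF3SxC9H/';

// encodeURIComponent escapes the ":" and "/" characters so the whole URL
// can safely ride inside a single path segment of a GET request.
var encoded = encodeURIComponent(url);
// encoded is "http%3A%2F%2Finstagr.am%2Fp%2FL9RF3SxC9H%2F"

// Embed it in the request path for a hypothetical /scrape/:link route:
var endpoint = '/scrape/' + encoded;
```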

I use that in a GET request to call a node script I wrote on the server. This script fetches the page source, pulls out the image URL, and sends it back; in this case http://distilleryimage7.instagram.com/8011e852b81f11e1a8761231381b4856_7.jpg

Note: While writing this post I found a quick way to handle this particular case with Instagram without having to scrape the page for its source. You just append ‘media/?size=l’ to the URL, so the above becomes ‘http://instagr.am/p/L9RF3SxC9H/media/?size=l’. You can use that as your img src and it will render the image. I’ll continue with the post just as an example of the idea; you can adjust the script (changing the jQuery selectors) to pull data specific to the page you want to scrape.

I’ll just include the route that handles the scraping. In it I use the packages express, jsdom, util, request, and fs. You can install them with npm.

I referred to this awesome post and associated code on using jQuery selectors on the server side to pull particular parts of the page source.

Here is the nodejs code:
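A sketch of the route, following the jsdom-plus-jQuery pattern from that post (the jQuery script URL, the `img.photo` selector, and the route path are illustrative placeholders; swap in selectors for the page you're scraping):

```javascript
var express = require('express');
var request = require('request');
var jsdom = require('jsdom');

var app = express();

// GET /scrape/:link, where :link is the encodeURIComponent-encoded URL.
app.get('/scrape/:link', function (req, res) {
  var link = decodeURIComponent(req.params.link);

  // Fetch the raw page source from the target URL.
  request({ uri: link }, function (err, response, body) {
    if (err || response.statusCode !== 200) {
      res.statusCode = 500;
      return res.end();
    }

    // Load the source into jsdom with jQuery injected, so jQuery
    // selectors can run server side against the fetched page.
    jsdom.env({
      html: body,
      scripts: ['http://code.jquery.com/jquery-1.7.2.min.js']
    }, function (err, window) {
      var $ = window.$;
      // Placeholder selector: grab the photo's src from the page.
      res.end($('img.photo').attr('src'));
    });
  });
});
```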

In the above, the ‘link’ variable in the http request is the encoded URL. I retrieve that link from the express req object, req.params.link, then use request to fetch the source from that URL, and then use jsdom and jQuery to pull the specific data I want.

Some close up screenshots of real time geolocated tweets

I built a tool yesterday to observe geolocated tweets as they come in on a world map. It’s kind of cool to zoom in on specific cities. After just 30 seconds I had gathered quite a few in New York. Clicking on a marker reveals the profile pic and the tweet.

And then zooming out I could see clusters along cities in the Northeast.

After several minutes I zoomed out to see the world map, but at that point there were too many tweets and my browser crashed.  This is where selective coarsening would be useful.

You can find the project on github.

Using nodejs event streams and leafletjs to display geotagged tweets on a world map

I thought it would be cool to see tweets come in live using the Twitter Streaming API, querying with location. When you filter with a location bounding box, the tweets that come in have geocoordinates (though a small fraction are null).

Initially I wanted to focus in on a city (Leaflet maps look incredible when you zoom into a city), but the Twitter Streaming API was taking too long to fetch tweets while testing, so I set the bounding box to the world. You can change both the twitter fetch bounding box and the initial mapping bounding box.
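The streaming API's locations parameter takes a bounding box as southwest-corner/northeast-corner coordinate pairs. A sketch of the two configurations (the city values are rough numbers around New York, just for illustration):

```javascript
// Bounding boxes for the streaming API's `locations` parameter,
// given as [SW longitude, SW latitude, NE longitude, NE latitude].
var worldBox = [-180, -90, 180, 90];
var nycBox = [-74.26, 40.48, -73.70, 40.92]; // rough box around New York

// The API expects the box as a comma-separated string:
function toLocationsParam(box) {
  return box.join(',');
}
```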

This is my first time using nodejs event streams, but from what I understand they allow you to send chunks of data from the server to the browser as the data comes in. This is pretty cool for real time applications. I wanted to focus this application on immediate tweets, and right now there is no database. Whenever you run it, you get whatever is coming in the twitter pipeline.

Take a look at the project here.

UPDATE: Note that you need to edit the config.js file with a twitter name and password because querying the streaming API with location is otherwise forbidden.  If you are using an account that already has many apps that query the API constantly (tweet harvesting for example), then you may experience a lag in rate of fetching.  This should not be an issue for most people and can be easily remedied by creating another twitter account to query the API with.