Welcome! Scroll down for the latest blog posts, find out a bit about me, or check out my projects.
SMS to e-mail gateway: The SMS doorbell
Over at NYC Resistor, it was getting cold, and we needed a doorbell so visitors wouldn’t be stranded outside when the building was locked. A standard wireless model didn’t work reliably (the space is on the fifth floor, just out of range), so various members generally resorted to writing their phone numbers on a sign on the front door when they were expecting guests.
Since almost everyone has a mobile phone already, an SMS-based solution seemed appropriate. To implement this, we need two things:
- An SMS shortcode
- A system to notify when the shortcode is triggered
It’s irritating and expensive to acquire your own shortcode, but there are several services that will allow you to use one in exchange for a small fee or advertisements in your messages. TextMarks is my favorite (I used TextMarks for my WhereAmI project). While TextMarks markets their service as a system for mobile mailing lists, they allow you to reserve a keyword and define a behavior (that can include pulling data from a URL!) to occur when that keyword is triggered.
Configuring TextMarks
Sign up for TextMarks and choose a keyword. Configure the keyword to respond with the “First 120 characters on web page”, and point it at the future home of your script (you can always come back and modify this later).
Note the \0 as the value of the msg parameter — this instructs TextMarks to send along any additional message contents as the value of that parameter. That means if someone were to text 41411 “doorbell hi this is hilary”, TextMarks would call the script with the param msg=hi this is hilary. This can be quite useful.
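To make the \0 substitution concrete, here is a rough sketch of how the callback request might be put together. The script URL and the helper name build_callback_url are hypothetical, and TextMarks' exact encoding of the message may differ from what urlencode produces.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical script location; substitute wherever your script actually lives.
SCRIPT_URL = "http://example.com/cgi-bin/doorbell.py"


def build_callback_url(sms_text, keyword="doorbell"):
    """Strip the keyword and pass the rest of the text as the msg parameter."""
    words = sms_text.strip().split(None, 1)
    if not words or words[0].lower() != keyword:
        raise ValueError("message does not start with the keyword")
    extra = words[1] if len(words) > 1 else ""
    return SCRIPT_URL + "?" + urlencode({"msg": extra})


url = build_callback_url("doorbell hi this is hilary")
print(url)
# What the script sees once the query string is decoded again:
print(parse_qs(urlsplit(url).query)["msg"][0])
```

The second print shows the value the script would read from the msg field after decoding.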
The Script
This script is written in Python, but you can use any scripting language you like. This particular script just sends an e-mail to an account when the ‘doorbell’ is rung, but you could have it do pretty much anything up to and including ringing a real bell (which may be coming soon!).
#!/usr/bin/env python
# encoding: utf-8
"""
doorbell.py

Created by Hilary Mason, feel free to use this code in your own projects.
"""

import sys, os
import smtplib
import cgi
import cgitb; cgitb.enable()

class Doorbell(object):
    GMAIL_USERNAME = 'YOURGMAILACCOUNT@gmail.com'
    GMAIL_PASSWORD = 'YOURPASSWORD'

    def __init__(self, msg):
        message = """\
From: YOURGMAILACCOUNT@gmail.com
To: YOURGMAILACCOUNT@gmail.com
Subject: KNOCK KNOCK, someone is at the door!

%s
""" % msg

        server = smtplib.SMTP('smtp.gmail.com:587')
        server.ehlo()
        server.starttls()
        server.ehlo()
        server.login(self.GMAIL_USERNAME, self.GMAIL_PASSWORD)
        server.sendmail('YOURGMAILACCOUNT@gmail.com', ['YOURGMAILACCOUNT@gmail.com'], message)
        server.quit()

        print "You knocked! You can also call us at 347-586-9270. <3, NYC Resistor"

if __name__ == '__main__':
    print "Content-Type: text/plain\n\n"
    form = cgi.FieldStorage()
    if 'msg' in form:
        w = Doorbell(form['msg'].value)
    else:
        w = Doorbell('There is an anonymous monkey at the door.')
And that’s it! Provided you have your keyword configured to point at your script, and the script living at an accessible address, you’ll get an e-mail whenever your SMS doorbell is rung and the person who sent the message will get back a cute response confirming their action.
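The script above uses Python 2 print statements. For anyone on Python 3, here is a minimal sketch of just the e-mail step, assuming the same placeholder Gmail account. The function names build_message and send_message are my own, and this is one possible arrangement rather than a drop-in replacement for the CGI script.

```python
import smtplib
from email.message import EmailMessage

# Placeholders, as in the script above; fill in your own account details.
GMAIL_USERNAME = "YOURGMAILACCOUNT@gmail.com"
GMAIL_PASSWORD = "YOURPASSWORD"


def build_message(msg):
    """Build the notification e-mail for one ring of the doorbell."""
    email = EmailMessage()
    email["From"] = GMAIL_USERNAME
    email["To"] = GMAIL_USERNAME
    email["Subject"] = "KNOCK KNOCK, someone is at the door!"
    email.set_content(msg)
    return email


def send_message(email):
    """Send the message through Gmail's SMTP server using STARTTLS."""
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()
        server.login(GMAIL_USERNAME, GMAIL_PASSWORD)
        server.send_message(email)


# Building the message needs no network access; sending it does.
note = build_message("hi this is hilary")
print(note["Subject"])
```

Keeping message construction separate from sending makes it possible to check the headers without touching the mail server.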
Finally…
This setup can be easily extended such that a message containing ‘doorbell hilary’ could e-mail only me, or be forwarded to my phone.
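One way that routing could look, as a sketch: treat the first word of the message as an optional member name and fall back to a shared address otherwise. The MEMBERS table, the addresses, and the route function are all hypothetical.

```python
# Hypothetical directory of members; names and addresses are placeholders.
MEMBERS = {
    "hilary": "hilary@example.com",
    "alex": "alex@example.com",
}
EVERYONE = "members@example.com"


def route(msg):
    """Return (recipient, remaining text) for an incoming doorbell message."""
    words = msg.strip().split(None, 1)
    if words and words[0].lower() in MEMBERS:
        rest = words[1] if len(words) > 1 else ""
        return MEMBERS[words[0].lower()], rest
    return EVERYONE, msg.strip()


print(route("hilary I'm downstairs"))
print(route("hi this is a visitor"))
```

The same lookup could return a phone gateway address instead of an e-mail address for members who prefer a text.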
I’m curious to see if having a remotely accessible ‘doorbell’ will encourage pranksters — we might need to add a password.
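If a password does turn out to be necessary, a simple version could require the message to begin with a shared passphrase. This is only a sketch: the passphrase and the check_passphrase helper are made up for illustration, and a real deployment would want to keep the secret out of the source file.

```python
import hmac

# Hypothetical shared passphrase; in practice keep it out of the source file.
PASSPHRASE = "open sesame"


def check_passphrase(msg):
    """Return the message with the passphrase removed, or None if it is wrong."""
    text = msg.strip()
    candidate = text[:len(PASSPHRASE)]
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(candidate.encode("utf-8"), PASSPHRASE.encode("utf-8")):
        return None
    return text[len(PASSPHRASE):].strip()


print(check_passphrase("open sesame it's hilary"))
print(check_passphrase("let me in"))
```

A message that fails the check could simply be dropped without sending any e-mail.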
January 3, 2010 4 Comments
IgniteNYC: The video!
The video of my IgniteNYC presentation is up, and has gotten a great response!
I’m working on removing the me-specific bits from the code and I’ll be posting it as open-source very soon!
December 24, 2009 4 Comments
IgniteNYC: How to Replace Yourself with a Very Small Shell Script
I recently gave a talk at IgniteNYC on How to Replace Yourself with a Very Small Shell Script.
The Ignite events are a fun blend of performance, technology, and speaking skill. Each presenter gives a five-minute talk with twenty slides that auto-advance every 15 seconds.
The title of my talk is a classic geek reference (you can get the t-shirt). I’m very interested in developing automated techniques for handling the massive and growing amounts of information that we all have to deal with. I started with e-mail and Twitter, both of which are easy to access programmatically (via IMAP and the Twitter API).
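As a small illustration of what programmatic access over IMAP looks like, here is a sketch using Python's standard imaplib module to count unread messages. It is not one of the scripts from the talk; the function names are my own, and the host and credentials passed to connect are placeholders.

```python
import imaplib


def count_unseen(conn, mailbox="INBOX"):
    """Count unread messages on an already logged-in IMAP connection."""
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError("could not open mailbox %r" % mailbox)
    status, data = conn.search(None, "UNSEEN")
    if status != "OK":
        raise RuntimeError("search failed")
    # data[0] is a space-separated byte string of message numbers.
    return len(data[0].split())


def connect(host, user, password):
    """Open an SSL IMAP connection; host and credentials are placeholders."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    return conn
```

With real account details, count_unseen(connect(host, user, password)) would return the number of unread messages in the inbox.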
In the talk, I went through several of the simple and successful e-mail management scripts that I’ve developed.
I decided to talk about this project because I’m not sure where it should go next. I got some great feedback, and I’m looking forward to future work on the project!
The slides are below, and the full talk will be online soon.
November 25, 2009 9 Comments
My code is on TV (and so am I)!
FoxNY did a piece featuring me and Diana as hackers who use our technical powers for good, not evil.
There are way too few female technologists on television, and I’m happy to do what I can to show that women kick ass with code! Look for my mischievous I’m-writing-infinite-nested-loops grin in the clip where I’m programming.
If this looks like fun to you, come join us at NYC Resistor (where the segment was filmed!) for Thursday night craft nights or for one of many awesome classes.
November 10, 2009 3 Comments
Yahoo OpenHackNYC: The Del.icio.us Cake
Last weekend Yahoo came to New York for an Open Hack Day, and it was great!
I was invited to speak on a panel on semantic metadata, moderated by Paul Ford (harpers.org) along with Marco Neumann (KONA) and Paul Tarjan (Yahoo/Search Monkey). The panel was a lively discussion, and we got some great questions from the audience.
After the panel, I stayed around to participate in the hack competition. Yahoo! provided a fantastic space, with free-flowing coffee, snacks, comfy chairs and plenty of Yahoo folks and other hackers around to give advice and play foosball with. I teamed up with Diana Eng, Alicia Gibb, and Bill Ward to create the Del.icio.us Cake!
The cake is attached to a laptop via USB. A program running on the laptop accepts a delicious tag and retrieves a list of recent popular sites for that tag from the delicious API. Finally, it iterates through each URL, downloads the page, and computes the sentiment of that page relative to the tag — basically, is the content of the page positive, neutral or negative?
The signal is output to an Arduino (hidden in the middle of the cake), which turns on the appropriate set of LEDs. There are four sets of LEDs on the cake, one in each quadrant of the delicious logo: one each for positive sentiment, neutral or inconclusive sentiment, and negative sentiment, and, of course, one to let us know that the cake is turned on.
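The last step of that pipeline can be sketched roughly as follows. The one-byte protocol, the majority-vote rule for combining pages, and the function names are all assumptions for illustration; the actual wire format and aggregation used on the cake are not described here. An in-memory buffer stands in for the USB serial connection.

```python
import io
from collections import Counter

# Hypothetical one-byte protocol; the real cake's wire format is not described.
SIGNALS = {"positive": b"P", "neutral": b"0", "negative": b"N"}


def overall_sentiment(page_sentiments):
    """Pick the most common label; ties or no pages count as neutral."""
    counts = Counter(page_sentiments).most_common(2)
    if not counts:
        return "neutral"
    if len(counts) == 2 and counts[0][1] == counts[1][1]:
        return "neutral"
    return counts[0][0]


def light_cake(port, page_sentiments):
    """Write the signal byte for the overall sentiment to a serial-like port."""
    label = overall_sentiment(page_sentiments)
    port.write(SIGNALS[label])
    return label


fake_port = io.BytesIO()  # stands in for the USB serial connection
print(light_cake(fake_port, ["positive", "negative", "positive"]))
```

On the hardware side, the Arduino would read that byte and switch on the matching set of LEDs.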
I wrote the sentiment classifiers between around 3am and 6am Saturday morning, so they really were a hack! I trained them on movie review data, working with the assumption that 5-star reviews contain positive terms and 1-star reviews contain negative terms. I wouldn’t recommend this approach for a serious attempt at sentiment analysis, but it worked well enough.
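For a sense of how little code that assumption needs, here is a toy sketch using naive Bayes-style word counts with equal class priors. The choice of classifier, the margin threshold, and the three example reviews are mine for illustration; this is not the code from the hack.

```python
import math
from collections import Counter


def train(reviews):
    """reviews: list of (text, stars). Only 5-star and 1-star reviews are used."""
    counts = {"positive": Counter(), "negative": Counter()}
    for text, stars in reviews:
        if stars == 5:
            counts["positive"].update(text.lower().split())
        elif stars == 1:
            counts["negative"].update(text.lower().split())
    return counts


def classify(counts, text, margin=0.5):
    """Return 'positive', 'negative', or 'neutral' when the scores are close."""
    vocab = set(counts["positive"]) | set(counts["negative"])
    scores = {}
    for label, words in counts.items():
        total = sum(words.values())
        # Add-one smoothing so unseen words do not zero out a class.
        scores[label] = sum(
            math.log((words[w] + 1) / (total + len(vocab)))
            for w in text.lower().split()
            if w in vocab
        )
    diff = scores["positive"] - scores["negative"]
    if abs(diff) < margin:
        return "neutral"
    return "positive" if diff > 0 else "negative"


# Made-up reviews; the 3-star one is ignored during training.
model = train([
    ("a wonderful funny delightful film", 5),
    ("boring awful dull mess", 1),
    ("it was fine", 3),
])
print(classify(model, "wonderful and funny"))
print(classify(model, "the weather today"))
```

Text with no known words scores the same for both classes, so it falls into the neutral bucket.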
We won the food/hardware hack prize, shared with the awesome MakerBot team!
We had a great time creating and presenting the hack. Thanks, Yahoo, and most of all, thanks to Alicia, Bill, and Diana for a really fantastic, silly weekend.
Further coverage:
- Yahoo’s summary of the Open Hack NYC event
- Diana’s writeup for Eyebeam
- CNN.com: Hackers Take Over Times Square
October 17, 2009 3 Comments
Data: first and last names from the US Census
I’ve found myself in need of a name distribution for a few projects recently, so I thought I would post it here so I won’t have to go looking for it again.
The data is available from the US Census Bureau (from the 1990 census) here, and I have it here in a friendly MySQL *.sql format (it will create the tables and insert the data). There are three tables: male first names, female first names, and surnames.
I’ve noted several issues in the data that are likely the result of typos, so make sure to do your own validation if your application requires it.
The format is simple:
- the name
- frequency (percentage of people in the sampled population with that name)
- cumulative frequency (as you read down the list, the percentage of total population covered)
- rank
If you want to use this to generate a random name, you can do so very easily with a query like this:
SELECT name FROM ref_census_surnames n ORDER BY (RAND() * (n.freq + .01)) LIMIT 0,1;
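If the goal is to draw names in proportion to how common they are, the cumulative frequency column makes that straightforward in application code. The sketch below assumes the rows have already been loaded in rank order; the ROWS values are made up for illustration and are not taken from the census data.

```python
import bisect
import random

# Hypothetical rows in the (name, freq, cumulative_freq, rank) layout described
# above; the numbers are made up for illustration, not taken from the census.
ROWS = [
    ("ALPHA", 50.0, 50.0, 1),
    ("BRAVO", 30.0, 80.0, 2),
    ("CHARLIE", 20.0, 100.0, 3),
]


def pick_name(rows, u):
    """Return the first name whose cumulative frequency exceeds u."""
    cumulative = [row[2] for row in rows]
    index = bisect.bisect_right(cumulative, u)
    # Clamp so that u equal to the final cumulative value still maps to a row.
    return rows[min(index, len(rows) - 1)][0]


def random_name(rows, rng=random):
    """Draw a name with probability proportional to its frequency."""
    return pick_name(rows, rng.uniform(0, rows[-1][2]))


print(random_name(ROWS))
```

Because the draw is scaled to the last cumulative value, this works even when the listed names do not cover the whole population.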
Download it here: census_names.tar.gz
October 16, 2009 No Comments
Hadoop World NYC
Yesterday, I attended the first Hadoop World NYC conference. Hadoop is a platform for scalable distributed computing. In essence, it makes analyzing large quantities of data much faster, and analyzing very large quantities of data possible.
Cloudera did a great job organizing the conference, and managed to assemble a diverse set of speakers. The sessions covered everything from academic research to fraud detection to bioinformatics and even helping people fall in love (eHarmony uses Hadoop)!
I’m not going to review every session, but I saw several themes emerging from the content and conversations.
Hadoop is Getting Easier
New integrated UIs like Cloudera Desktop and Karmasphere mean that developers will no longer be required to use a command-line interface to configure and execute Hadoop jobs. IBM’s M2 project hides Hadoop behind a spreadsheet metaphor, making the collection, analysis and visualization of data as easy as using Excel.
This doesn’t just speed up development time, it puts the tools for manipulating the data directly in the hands of the people who need the results, without requiring them to talk to a database programmer.
Hadoop is a Utility
The only organizations that talked about building their own Hadoop clusters are those that deal with very sensitive data (VISA) and those that deal with very, very large quantities of data (Yahoo, Facebook, eBay). Organizations with more manageable data sets, such as eHarmony and the New York Times, use EC2 and Amazon’s Elastic Map-Reduce. Amazon, Rackspace, and Softlayer have offerings in this area and were all event sponsors.
Yes, you can turn on a cluster of nodes from your living room in your PJs!
Hadoop Can Talk to Your Existing Systems
Hadoop has an ecosystem of supporting products that allow organizations to adapt their existing infrastructure. Cloudera’s Sqoop (which is just fun to say out loud) is a tool for importing data from SQL databases, HBase is a Hadoop database, and Pig lets you talk to the system in a SQL-like language.
I expect we’ll see more information available in the near future to clarify which systems are more appropriate for which kinds of users (an ecosystem decision tree?).
Hadoop is Changing Things
I heard the phrase “an order of magnitude improvement in speed” so many times that I lost count. Speaking from personal experience, the difference in productivity between waiting minutes or hours for results and waiting days is immense. When you can see the answer to a question shortly after you ask it, you still have the context you need to act on that answer immediately, without having to spend time figuring out why you were asking the question in the first place.
Most of the projects were doing fairly simple analysis over data like web user sessions or transactions. I was intrigued by Deepak Singh’s talk on bioinformatics and genome sequencing (slides) and Jake Hofman’s talk on social network analysis (slides). More and more massive datasets are becoming available and will drive techniques for new analysis. I do wish there had been a talk about Mahout, which is a very promising approach to developing machine learning algorithms on the Hadoop platform.
I left the event more excited about the technology and very enthusiastic about the community. Thanks for a great day!
Update: A few other people have written up their notes and impressions from the event:
- Stephen O’Grady posted The View from HadoopWorld
- Deepak Singh’s Post-HadoopWorld Thoughts
- HubSpot Dev Blog has two write-ups, by Dan and Steve
- Atbrox has notes from the morning session and the application session
- Alexander Sicular’s Are You New to Hadoop? Settle in…
- Pete Skomoroch posted his slides and thoughts
October 3, 2009 5 Comments
Do you do human subject research?
Dear friends and colleagues,
Do you do research that involves gathering data from human participants? This can be anything from marketing surveys to psychology experiments to medical science. If so, please take a short (5 to 10 minute) survey:
The results of the survey will help us design a new platform for online human research!
I’m very excited about this project and would very much appreciate your input. If you have colleagues who do this kind of work, please pass it on.
Thank you!
August 29, 2009 2 Comments
My NYC Python Meetup Presentation: Practical Data Analysis in Python
I gave a talk at the NYC Python Meetup on July 29 on Practical Data Analysis in Python.
I tend to use my slides for visual representations of the concepts I’m discussing, so there’s a lot of content that was in the presentation that you unfortunately won’t see here.
The talk starts with the immense opportunities for knowledge derived from data. I spent some time showing data systems ‘in the wild’ along with the appropriate algorithmic vocabulary (for example, amazon.com’s ‘books you might like’ feature is a recommender system).
Once we can describe the problems properly, we can look for tools, and Python has many! Finally, in the fun part of the presentation, I demoed working code that uses NLTK to build a Twitter spam filter with 90% accuracy*.
Please let me know if you have questions or comments.
* I’ll post the code and training data shortly
August 12, 2009 No Comments
My Barcamp Presentation: Have Data? What Now?!
I gave a talk at BarCampNYC4 on Saturday on common data problems and a very light overview of algorithms that address them.
I delivered the majority of the content verbally, by talking through examples of problems and how to solve them, so there’s no guarantee that these slides will make sense, but they might be funny!
Sanford took some excellent notes during the presentation.
There were some very nice comments on Twitter.
The discussion was so lively and engaging that I’m planning to expand on this content — I really welcome your suggestions and comments!
June 1, 2009 No Comments





