Monday, July 29, 2013

Amazon EC2 with R, FastRWeb, Shiny server

I have no idea whether this blog gets much traffic these days, but a few people found the FastRWeb "howto" helpful, so I'll try again.

Working with graduate student Xiaofei Wang, we compiled a "howto" for the use of Amazon EC2: using Ubuntu, and installing Apache, R, FastRWeb, and Shiny server.   The write-up and associated scripts can be found at http://www.stat.yale.edu/~jay/EC2/.  All of this information can be found elsewhere in bits and pieces, but we've found it helpful to put this pipeline in writing.

Of course, you could rely on images (AMIs) built by others.  That may carry some security risks, though, and doing it on your own isn't terribly hard.

Enjoy.

Jay

Friday, June 22, 2012

Update: Rserve/FastRWeb

I just installed Ubuntu 12.04 LTS and Rserve/FastRWeb per my own instructions, below in the blog.  Installing cairo (required for package Cairo) was more work than I remembered, but perhaps I had simply forgotten.

However, I had a real gotcha when testing Simon's example1.png.R: the page displayed with a black background, the red points visible but the axes (in black) almost completely invisible.  I was sure it was my fault.  Nope.

It turns out to be an issue with the newest Mozilla Firefox, and it has an easy fix:

https://addons.mozilla.org/en-US/firefox/addon/old-default-image-style/

I hope that helps someone.

Thursday, October 6, 2011

Setting up FastRWeb/Rserve on Ubuntu

This blog entry documents my recent (successful) attempt to use Simon Urbanek's Rserve and FastRWeb for CGI scripting with R.  This is a working blog entry and will be updated or replaced as needed (last updated 4:15 PM 10/6/2011).

#### Helpful documentation:

    http://rforge.net/FastRWeb/
    http://urbanek.info/research/pub/urbanek-iasc08.pdf
    http://www.rforge.net/Rserve/
    http://cran.r-project.org/web/packages/Rserve/
    (Plus personal communications with Simon, the results
     of which are included in the summary below)

#### The steps used (your configuration probably varies):

0. Ubuntu Linux, 64-bit, Version 10.04 LTS (plus updates).  I did the following steps as root, but will return to security issues below.

1. I did a fresh installation of the apache2 web server.  I noted that the default location of the cgi-bin (used later) is /usr/lib/cgi-bin; yours may vary.  I confirmed that this was up and running and that I could use the toy CGI script foo.cgi placed in the cgi-bin:

    #!/usr/bin/perl
    print "Content-type: text/html\n\n";
    print "<html>Hello World</html>";

To test this I pointed my browser to http://localhost/cgi-bin/foo.cgi; if there are problems, consult your system administrator or do detective work (probably in the log files, /var/log/apache2 on my system).  Do not continue until you have Hello World working!

2. I did a fresh installation of R, version 2.13.2, using the required --enable-R-shlib option to configure.

3. I installed the R packages Rserve, Cairo, and FastRWeb.  I also installed XML, which first required installing some libxml2... package in Ubuntu; again, XML is NOT required for Rserve/FastRWeb.

4. After installing FastRWeb, I went into the inst directory of the package and ran the install.sh script; this created /var/FastRWeb, used extensively below.

5. I went into /var/FastRWeb/code and examined the files; in a slightly older version of FastRWeb I commented out a few lines, but the current (10/6/2011) version removed that need for me.

6. I fired up R, and per Simon's instructions did the following:

    system.file("cgi-bin", package="FastRWeb")

This revealed the location of a binary called Rcgi.  I copied this into /usr/lib/cgi-bin, and renamed it R (instead of Rcgi).

7. Finally, I created a file /var/FastRWeb/web.R/foo.png.R:

    # foo.png.R:
    run <- function(n=100, ...) {
      n <- as.integer(n)
      p <- WebPlot(800, 600)
      plot(rnorm(n), rnorm(n), pch=19, col=2)
      p
    }

8. I tested it with the URL: http://localhost/cgi-bin/R/foo.png?n=500
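
One note on steps 7 and 8: as far as I understand it, values from the query string (the n=500 in the URL) reach run() as character strings, which is why the script calls as.integer().  A slightly more defensive variant might look like the sketch below.  The helper clean_n() and its default and max_n limits are hypothetical additions for illustration, not part of FastRWeb.

```r
# Hypothetical helper (not part of FastRWeb): turn a query-string value
# into a usable point count, falling back to a default and capping the size.
clean_n <- function(n, default = 100L, max_n = 10000L) {
  n <- suppressWarnings(as.integer(n))   # "abc" becomes NA instead of warning
  if (length(n) != 1L || is.na(n) || n < 1L) return(default)
  min(n, max_n)
}

# Same idea as foo.png.R, with the guarded conversion.
# WebPlot() comes from FastRWeb and is only available when the script
# is served through Rserve/FastRWeb; defining run() here does not call it.
run <- function(n = 100, ...) {
  n <- clean_n(n)
  p <- WebPlot(800, 600)
  plot(rnorm(n), rnorm(n), pch = 19, col = 2)
  p
}
```

With this in place, a request such as foo.png?n=abc would fall back to 100 points rather than failing inside rnorm(), and a very large n would be capped at max_n.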

#### Security Issues

I have a feeling that if you have a "trusted machine" without user access, the steps above may not technically pose security risks (even as root); but they do not represent good security practices and *would* introduce security risks on shared servers.  For my purposes, I added to the beginning of /var/FastRWeb/code/rserve.conf:

    gid 33
    uid 33

because www-data (uid and gid 33) is the user that my apache2 instances run as, and it seemed like a reasonable choice.  For good measure, I also changed ownership in /var/FastRWeb (running these from within that directory):

    chown www-data:www-data .
    chown -R www-data:www-data ./*

Finally, I set

    sockmod 0660
    umask 0007

based on Simon's recommendation for further security. To stop Rserve and FastRWeb:

    killall -INT Rserve
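
To double-check the chown step in this section, a small R sketch can list anything under a directory that is not owned by the expected user.  The function name not_owned_by() is hypothetical; it uses only base R's list.files() and file.info(), whose uname column is available on Unix-alikes such as Ubuntu.

```r
# Hypothetical helper: return paths under `dir` (including `dir` itself)
# whose owner is not `owner`.  Paths whose owner cannot be determined
# (NA) are reported too, so that nothing slips through silently.
not_owned_by <- function(dir, owner = "www-data") {
  paths <- list.files(dir, recursive = TRUE, full.names = TRUE,
                      all.files = TRUE, include.dirs = TRUE)
  paths <- c(dir, paths)
  owners <- file.info(paths)$uname
  paths[is.na(owners) | owners != owner]
}

# Example call, assuming the layout described in this post:
#   not_owned_by("/var/FastRWeb")
```

An empty result would suggest that the ownership change reached every file and subdirectory; anything listed deserves a closer look.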

Monday, September 26, 2011

The Inaugural "Least Interesting Stat" Award

I hereby give the first award to the Yale Daily News for its sports page caption, Monday, September 26, 2011:

"STAT OF THE DAY 4: THE NUMBER OF YEAR SINCE THE FOOTBALL TEAM HAS SCORED 70 POINTS AFTER THE FIRST TWO GAMES OF THE SEASON.  The Bulldogs have scored 74 points after two weeks, a total that was last matched in 2007, when Yale put up 79 in what would become a 9-1 season."

For a slightly more invigorating use of statistics and Yale football, see my Yale-Harvard graphical exploration.  I need to update it with the last few years of results.

Sunday, September 4, 2011

New York Predictive Analytics Talk

I'll be giving an evening talk at the New York Predictive Analytics World, http://www.predictiveanalyticsworld.com/newyork/2011/.  The rough plan:

This talk will touch upon topics in data analysis, statistics, and computing relating to modern massive data challenges.  How do classical theories in statistical inference and asymptotics translate into statistical practice in the modern world?  What role should complex Bayesian procedures and other cutting-edge methodologies have in the data analyst toolkit?  Computationally, how can we manage the data deluge and how is statistical software evolving?  What are the implications for the data analyst?  What are the dangers posed by addressing these very questions?  I'll suggest possible answers to some of these questions, and hope to spur further debate by posing others.


Wednesday, August 17, 2011

Blogs on Trade and the Environment

http://environment.yale.edu/envirocenter/

The blog posts on the Yale Center for Environmental Law & Policy site discuss issues arising from our recent study of linkages between trade and the environment.

Tuesday, August 16, 2011

Fantasy Football 2011

It's that time of year again!  Yesterday I scraped some ranking and points projection data from http://fftoolbox.com.

I was interested in how the projected points declined with rank, across the player positions.  The plot below helps explain why running backs are selected ahead of wide receivers, for example: the decline in production of wide receivers is much shallower than that of running backs.  You get hurt less (in expectation) by taking lower-ranked wide receivers than you do by taking lower-ranked running backs.  What I'd really like to do is integrate weekly variation into the analysis... but this requires a more substantial data scrape than I had time for.
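
One rough way to put a number on "decline with rank" is a per-position least-squares slope of projected points on rank.  The sketch below assumes a data frame with columns position, rank, and points; those column names, the function decline_by_position(), and the toy numbers are all hypothetical and are not the scraped fftoolbox data.

```r
# Hypothetical helper: slope of projected points on rank, by position.
# A more negative slope means production falls off faster with rank.
decline_by_position <- function(proj) {
  sapply(split(proj, proj$position), function(d) {
    unname(coef(lm(points ~ rank, data = d))[["rank"]])
  })
}

# Made-up numbers purely to exercise the function; NOT real projections.
toy <- data.frame(
  position = rep(c("RB", "WR"), each = 4),
  rank     = rep(1:4, times = 2),
  points   = c(300, 260, 220, 180,   # toy RB values, dropping 40 per rank
               250, 240, 230, 220)   # toy WR values, dropping 10 per rank
)

slopes <- decline_by_position(toy)
print(slopes)   # RB: -40, WR: -10 for these toy numbers
```

A straight line is of course a crude summary, since the real curves flatten out at lower ranks, but comparing slopes over the top few ranks gives a quick sense of which positions drop off fastest.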