I purchased a pair of Raspberry Pis (Model B) from ModMyPi to use for my data gathering. My doctoral work requires data from bike-share systems such as Velib’ in Paris, and I will be scraping many sites repeatedly for the next couple of years. I wanted a distributed system that could resist theft/fire, hard-drive failure and internet service interruptions while still being affordable. Staying with my current host was not feasible because of the cost of my long-term capacity requirement of about 40 GB. I considered Amazon Web Services, but I don’t want to lose control of my data and I’m not sure the cost is worthwhile compared to using a couple of Raspberry Pis (RPi). I wanted to share my process because most existing tutorials ask you to connect your RPi to a monitor/TV for setup. That’s no longer required since SSH is enabled by default. I also wanted to document my steps so that I can exactly replicate the setup on my second RPi.
There are all sorts of instructions online about needing to hook your RPi to a screen for setup. Since my SD cards already had the OS installed, I decided to just attach the RPi to my home network and see if I could connect to it using my Mac. After connecting the LAN cable and power source, I checked my router’s LAN client list and there was my raspberrypi at IP 192.168.1.42. The IP address will be different for you, depending on your router brand/settings and its DHCP allocation. From what I’ve read, SSH was not typically enabled by default in the past and you were supposed to enable it yourself. Having it on by default must be a new setting, and it allows setup directly over the network.
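Before trying to log in, it can be handy to confirm that something is actually listening on the SSH port at the address the router reported. The following is only a minimal sketch, assuming Python 3 on the laptop; the function name ssh_port_open is made up for illustration, and the address is the one from my network, so substitute your own.

```python
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable or timed out: treat all of these as "not open".
        return False
    sock.close()
    return True


# Example call (the IP is an assumption, use the one your router shows):
# ssh_port_open('192.168.1.42')
```

A True result only means the port accepts connections; it does not prove the machine behind it is the RPi.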
Using Terminal I ssh’d into my RPi:
ssh pi@192.168.1.42
I accepted the key and logged on with the default password, raspberry:
The authenticity of host '192.168.1.42 (192.168.1.42)' can't be established.
RSA key fingerprint is XXXXXXXX.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.42' (RSA) to the list of known hosts.
pi@192.168.1.42's password: raspberry
Linux raspberrypi 3.2.27+ #250 PREEMPT Thu Oct 18 19:03:02 BST 2012 armv6l

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.

NOTICE: the software on this Raspberry Pi has not been fully configured. Please run 'sudo raspi-config'

Blue setup menu

pi@raspberrypi ~ $ sudo raspi-config
In the blue screen menu I expanded the root partition to fill the card and changed my password. I set my locale/language, which is important even if you just want the default English. I set my timezone to Luxembourg: having the correct time will be important for my data scraping, and daylight saving changes need to be applied on the correct day. I disabled the boot-to-desktop option as I will be using the RPis strictly as servers. Finally I performed an update.
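Since the clock matters so much for scraped data, one way to stay safe across daylight saving changes is to stamp each record in UTC, which never jumps forward or back. This is only a sketch under my own assumptions (Python 3, and utc_stamp is an illustrative name rather than anything from an existing library):

```python
import time


def utc_stamp(epoch_seconds=None):
    """Format a Unix timestamp as an ISO 8601 string in UTC.

    If no timestamp is given, the current system time is used.
    """
    if epoch_seconds is None:
        epoch_seconds = time.time()
    # gmtime() ignores the local timezone, so DST changes cannot affect it.
    return time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime(epoch_seconds))


# utc_stamp(0) returns '1970-01-01T00:00:00Z'
```

The local timezone set in the menu still matters for logs and cron jobs; the UTC stamp just removes ambiguity from the stored data.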
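So to recapitulate what I did on the blue screen: expanded the root partition, changed the password, set the locale, set the timezone, disabled booting to desktop, and ran the update.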
I exited the blue menu and rebooted:
pi@raspberrypi ~ $ sudo reboot
This closed my connection. After a minute I ssh’d back into my RPi, entered my new password and upgraded the installed packages.
pi@raspberrypi ~ $ sudo apt-get upgrade
This downloads a long list of packages, and unpacking them and replacing the old versions takes even longer. The whole process took about 25 minutes.
I received many warnings about the locale not being set. After unsuccessfully trying to fix it, I came across a workaround which turned out not to be a good idea. I edited /etc/environment:
sudo vi /etc/environment
I wrote my locale manually there: LC_ALL="en_CA.UTF-8"
This forces one locale on every user and overrides their individual settings, which is a problem if people who work in different languages use your server.
After reading up, it seems the ssh client on my local machine (my Mac) was causing the problem: it forwards the Mac’s locale variables to the RPi, which may not have that locale available. You won’t see this if you use your Pi directly rather than over ssh. To fix the real source of the problem, I modified /etc/ssh_config on my Mac:
sudo vim /etc/ssh_config
and commented out the SendEnv line (the 21st line in my copy):
# SendEnv LANG LC_*
I then went back and undid my earlier changes in /etc/environment.
If at any time you wish to shut down your RPi (turn it off, not reboot it), use the following command:
sudo shutdown -h now
In order to always shut down with the -h flag I modified the global profile /etc/profile and added a new alias:
alias shutdown='shutdown -h'
On a whim I changed the name of my box from raspberrypi to vegetable:
sudo vi /etc/hostname
sudo vi /etc/hosts
In the two files above change the name raspberrypi to your new server name. You should reboot before proceeding.
Next I updated the firmware using rpi-update:
sudo apt-get install ca-certificates
sudo apt-get install git-core
sudo wget https://raw.github.com/Hexxeh/rpi-update/master/rpi-update -O /usr/bin/rpi-update && sudo chmod +x /usr/bin/rpi-update
sudo rpi-update
sudo reboot
It downloads about 40 MB of data and takes 5 minutes to update the firmware. I then rebooted my machine and ssh’d back in.
At this point the RPi is up to date and we can begin customization. I have purchased a pair of powered D-Link DUB-H4 (Ver/Rev. C1) four-port hubs and two 2.5″ 250 GB USB-powered hard drives to create some servers. These will be connected with very short, custom-length USB cables into a tight mini-server package.
At this point you need to decide if you will serve dynamic content using PHP or Python (other options exist, but I’ll only cover these two). My scraping requests will be done in Python. My server is not meant to serve pages to the public, so I will stick with PHP, but the instructions for using the WSGI/python-apache plugin are included below.
Regardless of the customization path you probably want to install apache2 and php5:
sudo apt-get install apache2 php5
This will also install libapache2-mod-php5, a dependency. If you are curious what dependencies a package such as php5 has, simply enter the command:
apt-cache depends php5
I can test that Apache was installed successfully by visiting the IP of my server (192.168.1.42; yours will be different) with a web browser. This brings up the standard Apache “It works!” page located at /var/www/index.html.
I would like my pi user to have a web folder. For now let’s just make a directory:
mkdir ~/public_html
I created a simple HTML page, named index.html, in the public_html folder:
<html>
<body>
<h1>Pi's home</h1>
</body>
</html>
I need to enable the user directory module so each user’s ~/public_html folder will be accessible.
sudo a2enmod userdir
sudo service apache2 restart
After restarting Apache, the pi user’s home page (192.168.1.42/~pi/) should be visible.
PHP should be working in the root directory but not for users. To enable php in each user’s public_html directory we also need to modify a file. Open /etc/apache2/mods-available/php5.conf and follow the directions in the file to comment out the last section so it looks like this:
#<IfModule mod_userdir.c>
#    <Directory /home/*/public_html>
#        php_admin_value engine Off
#    </Directory>
#</IfModule>
Restart apache and you should have user pages functioning.
If you want Apache set up to serve Python applications, follow these directions. Using Python with the WSGI python-apache module should allow you to serve more requests per second.
sudo apt-get install libapache2-mod-wsgi
This installs mod_wsgi, the Apache module that runs Python WSGI applications.
Just as we enabled the user directory module, we can now enable the WSGI module. Let’s create a simple page to test it first. Again in my public_html directory I will create a file named index.wsgi with the following contents:
def application(environ, start_response):
    """Simplest possible application object"""
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    # The body is an iterable of byte strings (b'' is also valid on Python 2.6+)
    return [b'Hello, World!\n']
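Before involving Apache at all, the application object can be called directly to check that it behaves. The sketch below repeats the same application and drives it with a fake request built by the standard library’s wsgiref; it assumes Python 3, and call_app is just a helper name I made up for this illustration.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Simplest possible application object (same shape as index.wsgi)."""
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [b'Hello, World!\n']


def call_app(app):
    """Invoke a WSGI app once and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal fake GET request
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record what the app passes instead of sending it to a client.
        captured['status'] = status
        captured['headers'] = headers

    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


# Example: status, headers, body = call_app(application)
```

If this works but the page fails under Apache, the problem is in the Apache configuration rather than in the Python file.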
We enable WSGI (it may be already):
sudo a2enmod wsgi
sudo service apache2 restart
And do some more changes to apache. Change the file /etc/apache2/apache2.conf and add at the bottom:
WSGIRestrictEmbedded On

# Duplicate for each user as needed
WSGIDaemonProcess pi user=pi home=/home/pi/public_html
<Directory /home/pi/public_html>
    WSGIProcessGroup pi
</Directory>
Note the second section. You will need to duplicate this for each user. Replace ‘pi’ with your username. This indicates where WSGI files will be executed from.
Now we need to edit /etc/apache2/sites-available/default and add the following near the bottom of the file, inside the VirtualHost block (above the closing </VirtualHost> tag):
<Directory /home/*/public_html>
    Options Indexes FollowSymLinks MultiViews ExecCGI
    AddHandler wsgi-script .wsgi
    Order allow,deny
    Allow from all
</Directory>
Restart Apache and take a look at your WSGI file (192.168.1.42/~pi/index.wsgi).
You will likely need MySQL or some other database. I will be using it to keep all my cleaned data easily accessible. I will mainly be accessing the MySQL DB using Python, so I’ve also installed the python-mysqldb interface.
sudo apt-get install mysql-server mysql-client php5-mysql python-mysqldb
We now have a functioning basic server. The typical required programs have been installed. As it stands the server is limited to the SD card size of 8 GB. This is insufficient for almost all applications. In my next post I’ll explain how I connected my hub, external hard drive and RPi together as well as setting up MySQL and data archives to be located on the external drive.