I previously walked through setting up a Raspberry Pi, from an unconfigured box to a stable, simple server on the network. In this post I will mount a hard drive and place MySQL on it, with data on a separate partition. We’ll also enable outbound email, cron jobs and a few other bits, so the topics covered will be a bit scattered. As we dive a little deeper into the OS, it’s important to note that this Raspberry Pi is running Raspbian, a Debian derivative.
Scraping often needs a good HTML/XML parser; for Python, BeautifulSoup is the best resource. If you don’t have root, you can still install it for your own user account. After extracting BeautifulSoup (gzip -d, tar -xf), install it to the user account with the --user flag:
python setup.py install --user
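To confirm where a --user install ends up, and whether the package is visible afterwards, something like the following can help. This is only a sketch and assumes Python 3; `describe_user_install` is a name I made up for illustration, and `bs4` is the module name that beautifulsoup4 installs.

```python
import importlib.util
import site


def describe_user_install(module_name="bs4"):
    """Return the per-user site-packages path and whether a module is importable."""
    user_site = site.getusersitepackages()
    # find_spec returns None when a top-level module cannot be found
    installed = importlib.util.find_spec(module_name) is not None
    return user_site, installed


# Example use (prints a path under your home directory and True/False):
# print(describe_user_install("bs4"))
```

If the second value is False after installing, the user site directory is probably not on the interpreter's path for the Python version you ran setup.py with.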
I can download Beautiful soup and install it normally on my RPis:
wget http://www.crummy.com/software/BeautifulSoup/bs4/download/beautifulsoup4-4.1.3.tar.gz
gzip -d beautifulsoup4-4.1.3.tar.gz
tar -xf beautifulsoup4-4.1.3.tar
cd beautifulsoup4-4.1.3/
sudo python setup.py install
I would like my RPi to have a static IP on my network. I could set a static IP by editing /etc/network/interfaces, but I chose to simply have my router always allocate the same IP to the MAC address of my Raspberry Pi.
Connecting the RPi to the D-Link DUB-H4 hub was more complex than I would like. When you plug the RPi’s USB port (not its power USB) into the control port on the back of the hub, the hub’s high-powered USB cuts power to the RPi, which then reboots continually. You have to plug the RPi into one of the other three ports instead; this may have repercussions if we draw too much power. It’s too bad because it’s a beautiful hub. Next time I’ll buy one of these RPi hubs.
You can mount the external drive to any location you desire. I decided to not be too creative and place it in /mnt:
sudo mkdir /mnt/mysql
I changed user to root for the next step. This required that I first define a password for root. I also formatted the hard drive to ext4 rather than the FAT32 which was installed on the drive.
# set/change root password, as there is none (unable to login)
sudo passwd root
# switch user to root, enter password
su
# partitioned the drive by following the sfdisk instructions
sfdisk /dev/sda
# I created two partitions
# One for the mysql data of ~ 80 GB /mnt/mysql
# Another ~170 GB, for data located at /home/pi/somepath/data/
#
# Formatting partitions
# If you need to install the file system type that is not present (I will use ext4)
apt-get install ntfs-3g
# You can determine which file systems are installed by typing mkfs.
# and tab autocomplete to see the list
# I formatted my drive as ext4
mkfs.ext4 /dev/sda1
mkfs.ext4 /dev/sda2
#
# Then mounted the HDD partitions
mount /dev/sda1 /mnt/mysql
mount /dev/sda2 /home/pi/somepath/data/
# we can see our mounted drives and file format by typing:
mount
# let's go back to being pi, exit root
exit
We want the partition mounted each time our Pi restarts so we should edit our /etc/fstab file:
sudo vim /etc/fstab
# add the following in your fstab file
# <file system> <dir> <type> <options> <dump> <pass>
/dev/sda1 /mnt/mysql ext4 defaults 0 0
/dev/sda2 /home/pi/somepath/data/ ext4 defaults 0 0
You can read more about formatting the fstab file on Debian’s fstab page.
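A typo in fstab only shows up at the next reboot, so it can be worth checking the file after editing it. The sketch below splits each entry into the six fields named in the comment line above and reports any mount point that is missing. It assumes Python 3; the function names are my own, and the parser only handles plain whitespace-separated entries.

```python
def parse_fstab(text):
    """Return a list of dicts, one per non-comment fstab entry."""
    fields = ("file_system", "dir", "type", "options", "dump", "passno")
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 4:
            continue  # not a usable entry
        # the dump and pass fields default to 0 when omitted
        parts += ["0"] * (6 - len(parts))
        entries.append(dict(zip(fields, parts[:6])))
    return entries


def missing_mount_points(text, expected):
    """Return the expected mount points that have no fstab entry."""
    listed = {entry["dir"].rstrip("/") for entry in parse_fstab(text)}
    return [d for d in expected if d.rstrip("/") not in listed]


# Example use:
# with open("/etc/fstab") as fh:
#     print(missing_mount_points(fh.read(), ["/mnt/mysql", "/home/pi/somepath/data"]))
```

An empty list means both partitions have an entry; it does not prove the entries are correct, only that they are present.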
If you’re curious as to how much space you have, or are using, use:
df -h
To see the same info for a specific folder, use:
du -h somefolder
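Since the scraper will eventually be filling these partitions unattended, the same check can be made from a script. A minimal sketch, assuming Python 3; the 10% threshold and the function names are arbitrary choices of mine, not something the setup requires.

```python
import shutil


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def is_low_on_space(path, minimum_free=0.10):
    """True when less than minimum_free of the filesystem is available."""
    return free_fraction(path) < minimum_free


# Example use:
# if is_low_on_space("/mnt/mysql"):
#     print("less than 10% free on /mnt/mysql")
```

Printing a warning from a cron job is enough to trigger an email once MAILTO is set, which is covered further down.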
MySQL is currently storing data on the SD card. Surprisingly it’s quite easy to move. We would like to move mysql to the path of the hard drive we just mounted. We first need to stop MySQL:
sudo /etc/init.d/mysql stop
We will give ownership of our mysql destination directory/drive to mysql, move the data from the old location to the new and change some settings.
# make the directory to be the destination for mysql (if it does not already exist)
sudo mkdir -p /mnt/mysql
# change the owner of the destination mysql directory
sudo chown mysql:mysql /mnt/mysql
Move the original mysql data to the new path on the external hard drive.
# need to be root
su
mv /var/lib/mysql/* /mnt/mysql
We can find the configuration file containing the path by searching for it:
# still as root
find / -name my.cnf
# result
/etc/mysql/my.cnf
I edited my.cnf by changing the datadir value to /mnt/mysql; the socket value can stay as it is.
# from
datadir = /var/lib/mysql
# Some tutorials suggest changing it, but I don't see the need
#socket = /var/run/mysqld/mysql.sock
#
# to
datadir = /mnt/mysql
# I could change the below as well, but again there is no need
# Note that the mysql.sock file only exists when mysql is running
#socket = /mnt/mysql/mysql.sock
Let’s restart MySQL:
sudo /etc/init.d/mysql start
# If something was completed incorrectly you will get an error
# You should receive a message saying:
[info] Checking for tables which need an upgrade, are corrupt or were not closed cleanly..
# This is normal
To set up MySQL we need to log into mysql as root:
mysql -u root -p
# has the same password as your server root account
# this need not be the case for your account
Welcome to mysql – let’s create another user with some privileges:
CREATE USER 'pi'@'localhost' IDENTIFIED BY 'somepassword';
GRANT ALL PRIVILEGES ON *.* TO 'pi'@'localhost';
# you can always change the password again:
SET PASSWORD FOR 'pi'@'localhost' = PASSWORD('new-password-here');
We should now have a functioning user with permission to create a database and tables. If you check the /mnt/mysql directory you should see new directories existing for any new databases you have created. Success.
I need my scraping to recur on a regular basis. Let’s see if cron is already running. Here are a few ways of doing it:
# see if crontab, used to load the tables to cron is functioning (debian)
# see if cron is running
pgrep cron
# see if the cron service is running the 'proper' way
service cron status
If you encounter difficulties, cron messages are in /var/log/syslog. Check them out:
cat /var/log/syslog | grep -i cron
Inside my cron file I will do two things: state which email to contact in the case of any output/error from a job, and define the recurrence of the job.
MAILTO='firstname.lastname@example.org'
# m h dom mon dow command
*/10 * * * * /usr/bin/python /home/pi/python_script.py
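The `*/10` in the minute column is a step value: it matches every tenth minute in the field's range. To make that concrete, here is a small sketch that expands a numeric crontab field into the values it matches. It assumes Python 3, `expand_field` is a hypothetical helper of mine, and it ignores month and weekday names.

```python
def expand_field(field, low, high):
    """Expand a numeric crontab field ('*', '*/10', '5', '1-5', '0,30') into sorted values."""
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            first, last = rng.split("-")
            start, end = int(first), int(last)
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return sorted(values)


# Minute field of the job above (minutes run from 0 to 59):
# expand_field("*/10", 0, 59)
```

For the entry above this gives minutes 0, 10, 20, 30, 40 and 50 of every hour, so the script runs six times an hour.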
I discovered that the system default text editor is not what I desire. I set vim as the default by selecting it from the prompted list.
sudo update-alternatives --config editor
If you would like to log on to your RPi without using a password you can generate key pairs for your local machine and RPi. First generate a key on your laptop/desktop:
ssh-keygen -t rsa
Simply hit return when prompted and it will create the .ssh directory in your home directory. The second prompt asks for a passphrase; this isn’t necessary either, as a password is what our key is meant to replace.
# create a directory on your RPi named .ssh
ssh email@example.com mkdir -p .ssh
# copy your public key you generated into the directory you just created
cat .ssh/id_rsa.pub | ssh firstname.lastname@example.org 'cat >> .ssh/authorized_keys'
You should now be able to ssh into your RPi without using a password from this computer. If you would like to access your RPi from additional computers without using your password you just repeat the steps above but must append the new public key to the .ssh/authorized_keys file.
If you do go ahead with making your server public it’s a good idea to disable password authentication. Passwords can be guessed through brute force. Edit /etc/ssh/sshd_config parameters to the following values:
ChallengeResponseAuthentication no
PasswordAuthentication no
UsePAM no
PubkeyAuthentication yes
PermitRootLogin no
Then restart the service:
sudo /etc/init.d/ssh reload
If you get the following errors when restarting the service it is because you are not using sudo.
Could not load host key: /etc/ssh/ssh_host_rsa_key
Could not load host key: /etc/ssh/ssh_host_dsa_key
Could not load host key: /etc/ssh/ssh_host_ecdsa_key
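Because a mistake in sshd_config can lock you out, it may be worth double-checking the five values listed earlier before closing your current session. The sketch below is one way to do that, assuming Python 3. The names are my own, and the parsing is simplified: it stops at the first Match block and relies on sshd using the first value it finds for each keyword.

```python
EXPECTED = {
    "challengeresponseauthentication": "no",
    "passwordauthentication": "no",
    "usepam": "no",
    "pubkeyauthentication": "yes",
    "permitrootlogin": "no",
}


def effective_settings(config_text):
    """Map lower-cased keyword to value, keeping the first occurrence of each keyword."""
    settings = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts[0].lower(), parts[1].strip()
        if keyword == "match":
            break  # anything after a Match line only applies conditionally
        settings.setdefault(keyword, value)
    return settings


def mismatches(config_text, expected=EXPECTED):
    """Return {keyword: actual value or None} for every setting that differs."""
    actual = effective_settings(config_text)
    return {
        key: actual.get(key)
        for key, wanted in expected.items()
        if (actual.get(key) or "").lower() != wanted
    }


# Example use (reading the file may itself need sudo):
# with open("/etc/ssh/sshd_config") as fh:
#     print(mismatches(fh.read()))
```

An empty dict means all five settings are present with the values above; a value of None means the keyword is missing or commented out.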
I want to know if my cron job encounters an error. I can enable email sending by installing ssmtp and configuring a few files.
# update
sudo apt-get update
# install ssmtp
sudo apt-get install ssmtp
# edit /etc/ssmtp/ssmtp.conf
mailhub=smtp.gmail.com:587
AuthUser=YourGMailUserName@gmail.com
AuthPass=YourGMailPassword
UseSTARTTLS=YES
#optional
rewriteDomain=something_other_than_gmail.com
Some further edits:
# edited /etc/ssmtp/revaliases to add:
root:root@DOMAINNAME:smtp.gmail.com:587
pi:pi@DOMAINNAME:smtp.gmail.com:587
# replace DOMAINNAME with what you wish
# allow all users to send emails
sudo chmod 774 /etc/ssmtp/ssmtp.conf
# give username a pretty name by editing passwd file
sudo vim /etc/passwd
# example
pi:x:1000:1000:Yummy Pi:/home/pi:/bin/bash
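With ssmtp configured, a script can also send mail itself rather than relying only on cron's MAILTO. This is a minimal sketch, assuming Python 3 and assuming that the ssmtp package provides a sendmail-compatible binary at /usr/sbin/sendmail (worth verifying on your own system). The function names and addresses are placeholders of mine.

```python
import subprocess
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_sendmail(msg, sendmail_path="/usr/sbin/sendmail"):
    """Pipe the message to a sendmail-compatible binary (assumed to be ssmtp here)."""
    # -t tells sendmail to take the recipients from the message headers
    subprocess.run([sendmail_path, "-t"], input=msg.as_bytes(), check=True)


# Example use (addresses are placeholders):
# msg = build_message("pi@DOMAINNAME", "you@example.org", "scraper", "finished with errors")
# send_via_sendmail(msg)
```

Keeping the message building separate from the sending makes it easy to inspect what would be sent without actually contacting Gmail.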
Currently I can only connect to my RPi from within my home network. My home has a dynamic IP address, so when I am away I cannot know which IP address my router currently has. Using a dynamic DNS service is the typical way to solve this problem. I don’t plan on doing it at this point as I don’t need or plan to access my server externally. I will however gloss over the steps to get it working.
You will need to install ddclient:
sudo apt-get install ddclient
During the installation you will be prompted for configuration. Your dynamic DNS provider should have some suggested settings.
You will then need to configure your ddclient configuration file (/etc/ddclient.conf) further, again following the directions from your DNS provider. If you are behind a router it will be important to reset the method of obtaining the IP address:
use=web, web=myip.dnsdynamic.com # get ip from server.
You should also take a look at the /etc/default/ddclient file for daemon settings.
Restart the service and it should be working (hopefully):
sudo /etc/init.d/ddclient restart
Alternatively you could do something similar without ddclient, by simply retrieving your IP regularly using an API or site.
#!/bin/sh
EMAIL=email@example.com
PASSWORD=changeit
DOMAIN=user.dnsdynamic.com
IP=`curl --silent http://myip.dnsdynamic.com/`
curl --silent --user "$EMAIL:$PASSWORD" -k "https://www.dnsdynamic.org/api/?hostname=$DOMAIN&myip=$IP"
Called regularly with cron this could be a good solution.
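Since the scraping scripts are already in Python, the same update could be written there too. The following is only a sketch of the shell script's two requests, assuming Python 3 and the same two dnsdynamic URLs used in that script. I have not run it against the service, the function names are my own, and unlike curl -k it does not skip certificate verification.

```python
import base64
import urllib.parse
import urllib.request


def build_update_url(hostname, ip, api="https://www.dnsdynamic.org/api/"):
    """Build the update URL with the hostname and myip query parameters."""
    query = urllib.parse.urlencode({"hostname": hostname, "myip": ip})
    return api + "?" + query


def update_dns(email, password, hostname, ip_service="http://myip.dnsdynamic.com/"):
    """Look up the current public IP, then send it to the update API."""
    with urllib.request.urlopen(ip_service, timeout=10) as resp:
        ip = resp.read().decode().strip()
    request = urllib.request.Request(build_update_url(hostname, ip))
    # HTTP basic auth, the equivalent of curl --user "$EMAIL:$PASSWORD"
    token = base64.b64encode(f"{email}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.read().decode().strip()


# Example use (credentials are placeholders):
# print(update_dns("email@example.com", "changeit", "user.dnsdynamic.com"))
```

Any output printed by the script would be mailed by cron if MAILTO is set, so it may be preferable to print only when the update fails.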
I plan on appending further steps for setting up a scraping server as I encounter the need myself.