I’m really excited that An Introduction to Machine Learning with Web Data is now available for purchase!
This is a 2 hour and 43 minute instructional video that walks you through basic machine learning algorithms, first theoretically and mathematically, and then with Python example code (which is available here).
This video takes an instructional approach and builds on the material I covered in my Strange Loop 2010 keynote, Machine Learning: A Love Story, and in the Data Bootcamp I did with Joe Adler, Drew Conway, and Jake Hofman at the Strata Conference in February.
I’d also like to acknowledge the many collaborators, colleagues, and friends who have made definite contributions to my thinking about this material and how best to present it, particularly Chris Wiggins who co-authored A Taxonomy of Data Science and Andrew, Dennis, Jan, Jesse, and Julie, the members of the studio audience for the class (who were amazing).
If you like it, please leave it a good review! As always, questions and comments are welcome here or by e-mail.
This document steps through the process of accessing a Linux server and introduces several basic commands.
We will access the server via the SSH, or “Secure SHell”, protocol. SSH provides encrypted communication between two machines. It is a secure alternative to telnet or rlogin, which send everything, including passwords, unencrypted.
First, make sure that you have an SSH client. If you are running Windows, I highly recommend downloading PuTTY. Click on the top Windows binary and save it to the desktop. You can run putty.exe directly, without installing.
If you are using OS X, open a terminal and run ‘ssh’.
Type your hostname into the box. You can find your hostname by asking your ISP, but a good bet is that “www.yourdomain.com” will resolve to your server. You may need to request shell access.
Click open. You’ll be presented with a warning, because you haven’t connected to that machine before. Approve the connection.
Next, you’ll see a login prompt. Enter your username. Usernames will be assigned in class, but a good guess is your first initial and last name.
Enter your password. Both your username and password are case-sensitive.
You should see a screen like the one to the left. Congratulations! You’re logged into the system.
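If you are using a command-line ssh client (as on OS X) rather than PuTTY, the steps above collapse into a single command. The sketch below only builds and prints that command instead of running it; the username jsmith is a made-up example of the first-initial-plus-last-name pattern, and the hostname is the placeholder used above.

```shell
# Hypothetical values: substitute the username and hostname you were given.
user="jsmith"
host="www.yourdomain.com"

# The general form is: ssh username@hostname
ssh_command="ssh ${user}@${host}"

# Print the command so you can check it before running it yourself.
echo "$ssh_command"
```

When you run the printed command, ssh shows the same first-connection warning and then asks for your password.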
You’re now looking at an interactive shell prompt. The shell reads the commands that you type, interprets them, and asks the kernel to run the corresponding programs. This is how you interact with the operating system.
The prompt currently shows your username, the name of the machine (PVDACDLN-01), and your current directory (~, see below).
By default, our system assigns users to the bash shell. There are many different shells available. In general, they all do the same thing, so most users choose a shell based on what they are most comfortable with.
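If you are curious which shell you have been assigned, the login shell is normally recorded in the SHELL environment variable. A minimal sketch, assuming that variable is set on your system (the fallback text "unknown" is only there for the case where it is not):

```shell
# SHELL usually holds the path to your login shell, for example /bin/bash.
current_shell="${SHELL:-unknown}"
echo "Login shell: $current_shell"
```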
Tip: Older versions of OS X use bash as the default shell in a terminal (macOS 10.15 and later default to zsh instead). All of these commands will work nearly identically in OS X.
The remainder of this document is an introduction to Linux commands and a reference.
The Linux filesystem is organized into a hierarchical series of folders, or directories. ‘/’ indicates the root directory, which contains all other files and directories.
pwd – pwd stands for “print working directory”. It echoes the current directory to the screen (this is where, in the filesystem, you currently are).
cd – cd stands for “change directory”. It changes your current working directory to a new directory. If you try to change to a directory that does not exist, you’ll get a No such file or directory error.
Tip: you can type “cd ~” at any time to return to your home directory, or, if you know someone’s username, type “cd ~username” to cd to their home directory.
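The following short session is a sketch of pwd and cd working together. It runs inside a throwaway directory created with mktemp so that nothing in your home directory is touched; the directory names projects and no_such_directory are made up for illustration.

```shell
# Create a scratch directory and move into it.
workdir=$(mktemp -d)
cd "$workdir"
pwd                      # prints the scratch directory's full path

# Make a subdirectory and change into it using a relative path.
mkdir projects
cd projects
pwd                      # the path now ends in /projects

# ".." refers to the parent directory, so this goes back up one level.
cd ..

# Changing to a directory that does not exist fails and leaves you where you were.
cd no_such_directory || echo "cd failed, still in $(pwd)"
```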
ls – ls stands for “list”. It prints a list of files and directories in the current working directory to the screen. This version of ls presents directories in blue.
The ls command also accepts various arguments that change the output of the command. For example, the “-a” argument means “display hidden files”, while the “-l” argument means “use long listings”. Command-line switches can be used singly or combined.
Filenames that begin with a “.” are hidden by default. Also note the special entries “.” (the current directory) and “..” (the parent directory), which appear in every directory.
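To see how the switches change the output, here is a small sketch that lists a scratch directory in several ways. The file names notes.txt and .hidden are invented for the example.

```shell
# Set up a scratch directory with one visible and one hidden file.
workdir=$(mktemp -d)
cd "$workdir"
touch notes.txt .hidden

# Plain ls skips names that start with a dot.
visible=$(ls)
echo "$visible"

# -a also shows hidden files, plus the "." and ".." entries.
all=$(ls -a)
echo "$all"

# -l uses the long listing (permissions, owner, size, date).
ls -l

# Switches can be combined: long listing including hidden files.
ls -la
```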
clear – The clear command requires no parameters. It clears the terminal and presents a prompt. It’s almost never necessary to do this, but it can be good for mental clarity.
who – The who command requires no parameters. It prints a list of all logged in users to the screen.
top – Every program or command that runs on the machine is a process. The top command displays running processes, sorted by default by CPU usage, and refreshes the list continuously. Type ‘q’ to quit.
ps – ps stands for “process status”. It prints a snapshot of your own processes in the current terminal. Use the “-e” argument to see every process on the system, including other users’ processes.
man – man stands for “manual”. It accepts one command name as a parameter and displays the man page for that command in a pager (typically less). Use the spacebar to page through it, or type ‘q’ to quit. Try typing ‘man ls’.
Man pages are generally dense and occasionally incomprehensible, but they do usually present every possible option for a command.
help – help is a bash command that provides help for builtin commands. It accepts a command name or pattern (partial name) as an argument.
Unfortunately, help only documents commands that are built into the shell itself. Try ‘help help’.
Manipulating Files and Directories
cp – the cp, or “copy” command, accepts two parameters. It copies the file named by the first parameter to the name or location given by the second. Each parameter may include a directory (relative or absolute) and a filename.
Use the -R (recursive) switch to copy entire directories.
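A brief sketch of cp in action, run in a scratch directory so nothing important is overwritten. All file and directory names here are made up for the example.

```shell
# Work in a scratch directory with one sample file.
workdir=$(mktemp -d)
cd "$workdir"
echo "hello" > notes.txt

# Copy a file to a new name in the same directory.
cp notes.txt notes_backup.txt

# Copy a file into another directory using a relative path.
mkdir archive
cp notes.txt archive/notes.txt

# Copy an entire directory and its contents with -R.
cp -R archive archive_copy

# List everything recursively to confirm the copies exist.
ls -R
```

Note that the original notes.txt is still in place after every step; cp never removes the source.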
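mv – the mv, or “move” command, accepts two parameters. It moves the file named by the first parameter to the name or location given by the second. Each parameter may include a directory (relative or absolute) and a filename.
Because moving a file to a new name in the same directory simply changes its name, the mv command is also how you rename files.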
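The two uses of mv look like this in practice. This is a sketch in a scratch directory; draft.txt, final.txt, and documents are invented names.

```shell
# Work in a scratch directory with one sample file.
workdir=$(mktemp -d)
cd "$workdir"
echo "draft" > draft.txt
mkdir documents

# Rename: same directory, new filename.
mv draft.txt final.txt

# Move: same filename, different directory.
mv final.txt documents/final.txt

# The file now lives only in documents/.
ls documents
```

Unlike cp, mv leaves nothing behind under the old name.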
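rm – the rm, or “remove” command, deletes a file. The deletion is immediate and permanent: there is no trash folder and no undo, so double-check the filename before you press Enter.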
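A safe way to try rm is on a file you have just created for the purpose. This sketch uses a scratch directory and a made-up file name, scratch.txt.

```shell
# Work in a scratch directory with one disposable file.
workdir=$(mktemp -d)
cd "$workdir"
echo "temporary" > scratch.txt

# The file is listed before removal.
ls

# Delete it. No confirmation is asked and the file cannot be recovered.
rm scratch.txt

# The directory is now empty, so this prints nothing.
ls
```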
Resources and Help
Explore these links for additional information and explanations.