Password protecting entire directories in MacOSX

Every once in a while I have some files in a directory that need a password. I am not looking for a fancy encryption mechanism like PGP… I just want to compress the entire directory, put in a strong password and forget about it. There are several applications that do this on the Mac App Store… but none provide the flexibility that I was looking for. Plus, I realized this can be done in a couple of lines in the terminal.

Let's say you have a directory tmp with a bunch of files that you want to compress and password protect. You can go the standard route using the zip utility. Just type the following:

zip -er storage.zip tmp

This will compress the directory tmp/ and store all of its files in storage.zip. It will also ask you for the password you would like to use. Because the -e option reads the password interactively, you are stuck typing it every time you want to compress a new directory.

There is a hack you can use to bypass the manual password entry… It involves using expect. This program is a really neat utility. It basically allows you to script interactive dialogues with your terminal programs, which makes task automation a walk in the park. You can check its manual pages here.

Step #1: Go into the Mac terminal and find out where the expect utility is located. This is done using the whereis command. Type the following in your terminal window.

whereis expect

In my Mac OS X version, the expect program is located at /usr/bin/expect.

Step #2: Create a file (e.g. protect_directory.sh) with the following lines. Make sure you modify the location of the expect program in the first line of the script.
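
A minimal sketch of such a script is shown here, not necessarily the original listing. It is written as a shell wrapper that feeds a short expect script on standard input, so it relies on expect being on your PATH rather than on a hard-coded first line. It assumes zip prompts with "Enter password: " and "Verify password: ", as the Info-ZIP version does; the function name and the ZIP_* environment variables are made up for this example.

```shell
#!/bin/sh
# protect_directory OUTPUT.zip DIRECTORY PASSWORD
# Hypothetical helper: drives "zip -er" through expect so the two
# password prompts are answered automatically.
protect_directory() {
    if [ "$#" -ne 3 ]; then
        echo "usage: protect_directory output.zip directory password" >&2
        return 1
    fi
    # The values travel through the environment so that the expect script
    # below can stay in a quoted here-document, untouched by the shell.
    ZIP_OUT="$1" ZIP_DIR="$2" ZIP_PASS="$3" expect - <<'EOF'
set timeout 30
spawn zip -er $env(ZIP_OUT) $env(ZIP_DIR)
# Assumed prompt texts (Info-ZIP); adjust them if your zip differs.
expect "Enter password: "
send -- "$env(ZIP_PASS)\r"
expect "Verify password: "
send -- "$env(ZIP_PASS)\r"
expect eof
EOF
}

# When saved as a script, forward the command line arguments, if any.
if [ "$#" -gt 0 ]; then
    protect_directory "$@"
fi
```

Saved as protect_directory.sh, this takes the same three arguments as the command in Step #3.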

Step #3: Make sure your script is executable and run it.

chmod u=+rwx protect_directory.sh
./protect_directory.sh file.zip tmp hello

The first argument is the output filename (file.zip), the second argument is the directory you wish to compress (tmp) and the last argument is the password you wish to use (hello). Neat, no? Here is a screenshot of all the steps.





Autonomously crawling through DICE job postings

I am currently working on a book that requires me to search through thousands of job advertisements. For the last couple of days I have been looking at various websites, collecting data and looking for patterns in employment listings. Even if you are not working on a book, I am sure at some point you will be looking for a new job online. I love searching for jobs, and if you don’t love it too… you are probably doing it wrong.

First of all, don’t search for jobs manually! It is a waste of time and it will drive you insane. Instead, use a scripting language, such as Perl, to mine website databases and pick out the best matches. In fact, I wrote a post a few days ago about mining employment postings on craigslist. If you are new to the field of data mining, I recommend the book “Mining the Social Web” by Russell… Nice chap… Met him at Harvard Square a couple of years ago.



Here I outline the steps I took to extract all job postings from DICE. First of all, you have to know how everything is stored in the database. Run any search on the initial screen (e.g. embedded).

"Embedded" search on dice.com

This particular search generated the following very-long URL… so long that I had to include spaces:

http://seeker.dice.com/jobsearch/servlet/JobSearch?op=300&N=0&Hf=0&NUM_PER_PAGE=30&Ntk=JobSearchRanking&Ntx=mode+matchall &AREA_CODES=&AC_COUNTRY=1525&QUICK=1&ZIPCODE=&RADIUS=64.37376 &ZC_COUNTRY=0&COUNTRY=1525&STAT_PROV=0&METRO_AREA=33.78715899%2C-84.39164034&TRAVEL=0&TAXTERM=0&SORTSPEC=0&FRMT=0 &DAYSBACK=30&LOCATION_OPTION=2&FREE_TEXT=embedded&WHERE=

Since this particular search found 1689 job postings, we just have to change NUM_PER_PAGE=30 to NUM_PER_PAGE=1689 in order to see every single job post on a single page. Save that page to your hard disk in HTML format. For completeness, here is the file with all 1689 postings I just downloaded. The following Perl script parses the contents of this file and looks for the associated URL for each job posting.
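
A rough shell equivalent of that URL-extraction step might look like the sketch below. It assumes each posting is linked through a double-quoted href attribute whose value contains some recognizable substring; that substring is a placeholder you would pick after looking at the saved page, and the function name is made up for this example.

```shell
# extract_urls SAVED_PAGE.html LINK_SUBSTRING
# Hypothetical helper: prints every distinct href value that contains
# LINK_SUBSTRING, one per line.
extract_urls() {
    grep -oE 'href="[^"]+"' "$1" |       # pull out each href="..." attribute
        sed -e 's/^href="//' -e 's/"$//' |   # strip the attribute wrapper
        grep -F -- "$2" |                # keep only links with the substring
        sort -u                          # drop duplicate links
}
```

Something like `extract_urls embedded_Jobs_at_Dice.html SUBSTRING > embedded_url.txt` would then produce the one-URL-per-line file used in the next steps.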

Save the file (e.g. dice.pl) and execute it with the following command:

perl dice.pl embedded_Jobs_at_Dice.html > embedded_url.txt

This will store every single URL, one per line, in the file embedded_url.txt. Once again… for completeness, here is my generated file.

The next step is to download every single job posting into a separate file. Since I am a Mac OS X user, I need to download the contents of each of the URLs from the web via the OS X command line. This is easily accomplished with the following bash script:
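
A minimal sketch of such a download loop, assuming curl is available (it ships with OS X); the function name, the output directory argument and the job_N.html naming scheme are choices made for this example:

```shell
# download_all URL_LIST OUTPUT_DIR
# Hypothetical helper: fetches every URL listed in URL_LIST (one per line)
# into OUTPUT_DIR/job_1.html, job_2.html, ...
download_all() {
    mkdir -p "$2"
    n=0
    while IFS= read -r url; do
        [ -n "$url" ] || continue      # skip blank lines
        n=$((n + 1))
        # -s: quiet, -L: follow redirects, -o: write to the named file
        curl -s -L -o "$2/job_$n.html" "$url"
    done < "$1"
}
```

For example, `download_all embedded_url.txt dice_jobs` would fill the dice_jobs directory with one file per posting.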

In the same directory as the output of the previous Perl script (e.g. embedded_url.txt), save this bash script (e.g. download_all_jobs.sh) and execute it with the following commands:

chmod u=+rwx download_all_jobs.sh
./download_all_jobs.sh embedded_url.txt

The compressed outcome of this last step is a 27 MB file.

Now that I have all this data, I need to extract the skills required for each advertised position. So, I placed all the compressed files in the sub-directory dice_jobs and ran the following script:

The skill extraction is actually done by the following Perl script (extract_data.pl).
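
To give a feel for what such an extraction boils down to, here is a much simpler shell sketch (not the Perl script itself). It counts in how many downloaded postings each skill keyword appears, assuming the postings sit uncompressed in one directory and that a case-insensitive whole-word match is good enough. The function name and the skill list are placeholders.

```shell
# count_skill_ads DIRECTORY SKILL [SKILL ...]
# Hypothetical helper: prints "SKILL COUNT", where COUNT is the number of
# files under DIRECTORY that mention SKILL at least once.
count_skill_ads() {
    dir=$1
    shift
    for skill in "$@"; do
        # -r recurse, -l list matching files, -i ignore case,
        # -w whole words only, -F treat the skill as a fixed string
        count=$(grep -rliwF -- "$skill" "$dir" | wc -l | tr -d ' ')
        echo "$skill $count"
    done
}
```

For single-word skills, piping the result through `awk '$2 >= 30'` would drop anything mentioned in fewer than 30 postings. Note that a whole-word match for C also hits the C in C++, so short names need a stricter pattern.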

I then feed the extracted data into a Mathematica script; a (readable) pdf version of the Mathematica script is here, and the source is here. In this script, I combined all the skills found and ignored skills that were required in fewer than 30 distinct advertisements (e.g. COBOL and Pascal). Below is the resulting pie chart.

Most requested skills in embedded computing jobs.

As expected, the most sought-after skills in “embedded computing” jobs are C, C++ and Linux. Java, MySQL and kernel development are also in strong demand these days. Surprisingly, I saw lots of requests for mobile computing and networking skills. However, the most surprising requested skill is databases (MySQL)!

Finally, I am aware that I could have done everything in this post in a single Perl script. However, writing a post about a single script would get tedious very quickly. I also wanted to save the outcome of every single step on my hard disk so I could perform some additional data tests without having to connect to dice.com each time.



Automated craigslist job search with Perl and Bash

Most of my students are in the job market, and after suggesting websites where they could look for jobs, I took a peek at craigslist. I like craigslist; it's a simple, bare-bones website with pure text. However, the search functionality is a bit awkward, and it is hard to find a good match between a candidate's skills and a particular job posting. If you seriously go through every single “filtered” post, it may still take you over an hour to find the best skill-to-job matches. So I created two scripts, one in Perl and the other in Bash, that scavenge all the job postings for skill matches and create a new webpage listing the appropriate positions and matched skills as ready-to-click links. Data mining at its best!

There are some “limitations” to these scripts. First of all, they were only tested on Mac OS X and Linux; however, I am sure you can convert them quite easily to Windows. Secondly, I’ve focused all the craigslist searches around New England. You may add other craigslist locations quite easily by following the instructions in the Perl script.

This automated job search requires two files: search_jobs.sh and craigslist.pl. Both can be found below, or at my github repository. To run the code place both files in the same directory, and edit the search_jobs.sh, shown below, with a text editor (after emacs, my second favorite text editor is TextWrangler). In this file modify the appropriate keywords that are being assigned to the variable SEARCH_SKILLS. Currently the search skills are the standard qualifications for an engineering graduate.

This script runs with the following command line:

chmod u=+rwx search_jobs.sh
./search_jobs.sh

Running the scripts on a macosx terminal window.

After it is done executing, it will have created two files, engineering.html and finance.html, where the candidate can see their best job matches.

Two html files are created with the best job matches

Generated HTML file with the best job matches


Below is the Perl script that parses the craigslist job postings.


Using a SD card in Mac OS X terminal

Since all MacBook Pros come with an SD card drive, the other day I started to work on a set of automated scripts that would back up all my data to an SD card. This means I had to learn how to read and write data to an SD card using the Mac OS X terminal.

Step #1- When you insert a card into the system, you need to find out what is the system identifier for that particular SD card. So go to the console and type:

diskutil list

And you will get something like:

In my case, the SD card is identified as disk1.

Step #2- In order for you to write anything directly into the SD card, you need to know where it is mounted. Go to the console and type:

mount

And you will get something like:

In my case, the mount point for the SD card is /Volumes/BEAGLE_BONE.

Step #3- To actually copy something into the SD card, just type any unix command considering the mount point for the SD card as a normal directory. For example:

ls > /Volumes/BEAGLE_BONE/tmp.txt

This command will create a new file (tmp.txt) in the SD card with a listing of all files in the current directory.
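
The same idea extends to a small backup helper. The sketch below assumes the card is already mounted and simply refuses to copy when the mount point is missing; the function name is made up, and the mount point is whatever the mount command reported for your card.

```shell
# backup_to_card SOURCE MOUNT_POINT
# Hypothetical helper: copies SOURCE (a file or directory) onto the SD card,
# but only if the card's mount point actually exists.
backup_to_card() {
    src=$1
    card=$2
    if [ ! -d "$card" ]; then
        echo "SD card not mounted at $card" >&2
        return 1
    fi
    cp -R "$src" "$card"/     # -R copies directories recursively
}
```

For example, `backup_to_card ~/Documents /Volumes/BEAGLE_BONE` would copy the Documents directory onto the card from Step #2.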

Step #4- When you are done, make sure to unmount the SD card by typing something like:

diskutil umountDisk /dev/disk1

File related Linux bash snippets

Here are some extremely useful Linux bash snippets that I use all the time to parse experimental data from my simulations.


How to extract the top (insert number here) lines from a file
Consider a file named test-file.txt. You can extract the top 14 lines from that file using the following:
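
One common way to do this is with head, wrapped here in a small function (top_lines is a hypothetical name) so the line count is a parameter:

```shell
# top_lines N FILE: print the first N lines of FILE.
top_lines() {
    head -n "$1" "$2"
}
```

`top_lines 14 test-file.txt` is then the same as typing `head -n 14 test-file.txt`.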


How to extract specific lines in a file using regular expressions.
Consider a file, named test-file.txt, with the following lines:
_N37_:0:_N262_:1:_N696_:0
_N37_:0:_N233_:0
_N37_:0:_N263_:0:_N694_:0
_N37_:1:_N113_:0

To extract the lines that have 5 elements we can type:
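
A possible grep sketch follows. It reads "5 elements" as the lines with exactly five ':' separators (the first and third lines above), which is an assumption about what was meant; the function name is made up.

```shell
# five_separator_lines FILE: print the lines of FILE that contain
# exactly five ':' characters.
five_separator_lines() {
    # ([^:]*:){5} = five "text followed by a colon" groups,
    # [^:]*$      = then no further colon up to the end of the line
    grep -E '^([^:]*:){5}[^:]*$' "$1"
}
```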

To extract the other lines, we can simply negate that regular expression:
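
With grep, negation is the -v option. The sketch below uses it with a pattern for "exactly five ':' separators", which is one assumed reading of the example; the function name is hypothetical.

```shell
# other_lines FILE: print the lines of FILE that do NOT contain
# exactly five ':' characters.
other_lines() {
    # -v inverts the match, so every non-matching line is printed
    grep -vE '^([^:]*:){5}[^:]*$' "$1"
}
```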


How to search and replace text on a file
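
A minimal sed sketch, wrapped in a hypothetical helper. It writes to a temporary file and moves it back, because the in-place flag of sed behaves differently on Linux and Mac OS X. It also assumes neither string contains a '/' or other characters that are special to sed.

```shell
# replace_text STRING1 STRING2 FILE: replace every STRING1 with STRING2.
replace_text() {
    sed "s/$1/$2/g" "$3" > "$3.tmp" && mv "$3.tmp" "$3"
}
```

Typed directly, the core of it is `sed 's/string1/string2/g' file.txt`.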

Where string1 is what you are searching for, string2 is what you want it to be replaced with and file.txt is the file you want to perform this operation on.


How to print a specific line number inside a file using a variable
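
One way to do this is sed with printing suppressed (-n) and a single line address built from the variable. The wrapper function is a hypothetical convenience.

```shell
# print_line NUMBER FILE: print only line NUMBER of FILE.
print_line() {
    LINENBR=$1
    # Double quotes let the shell expand ${LINENBR} before sed sees it.
    sed -n "${LINENBR}p" "$2"
}
```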

Where $LINENBR is the number of the line you want to print.


How to append 2 files, column by column, keeping particular columns
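
A sketch using paste and awk. As an illustration it assumes both files have two whitespace-separated columns and keeps column 1 of the first file and column 2 of the second; the column numbers and the function name are placeholders to adapt.

```shell
# merge_columns FILE1 FILE2: join the files side by side, then keep the
# 1st column of FILE1 and the 2nd column of FILE2.
merge_columns() {
    # After paste, FILE2's columns follow FILE1's, so FILE2's 2nd column
    # is field 4 of the pasted line.
    paste "$1" "$2" | awk '{ print $1, $4 }'
}
```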


How to transfer files across computers with ssh
The scp command copies files to a remote Linux system.
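
The general shape is `scp local-file user@host:remote-directory`. In the sketch below, the function name and all four arguments are placeholders for your own values.

```shell
# copy_to_remote LOCAL_FILE USER HOST REMOTE_DIR
# Hypothetical helper: copies LOCAL_FILE into REMOTE_DIR on HOST over ssh.
copy_to_remote() {
    scp "$1" "$2@$3:$4"
}
```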

To copy files from a remote system to your local system:
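
Here the remote path comes first and the local destination last. As a sketch, with a hypothetical function name and placeholder arguments:

```shell
# copy_from_remote USER HOST REMOTE_FILE LOCAL_DIR
# Hypothetical helper: copies REMOTE_FILE on HOST into LOCAL_DIR over ssh.
copy_from_remote() {
    scp "$1@$2:$3" "$4"
}
```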


How to remove a file extension
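
One way that needs no external program is shell parameter expansion: ${NAME%.*} removes the shortest trailing match of ".*" from the value. The wrapper function is hypothetical.

```shell
# strip_ext FILENAME: print FILENAME without its last extension.
strip_ext() {
    printf '%s\n' "${1%.*}"
}
```

Only the last extension is removed, so `strip_ext archive.tar.gz` prints archive.tar.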


How to remove redundant lines inside a file
In the terminal type:
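
A likely candidate is sort followed by uniq, which removes duplicate lines at the cost of reordering the file. The function name is made up for this sketch.

```shell
# unique_lines FILE: print FILE with duplicate lines removed (sorted).
unique_lines() {
    # uniq only collapses adjacent duplicates, hence the sort first
    sort "$1" | uniq
}
```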


How to delete the last line of a file
In the terminal type:
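
With sed, the address $ means the last line and d deletes it. The sketch below prints the result to standard output; the function name is hypothetical.

```shell
# drop_last_line FILE: print FILE without its final line.
drop_last_line() {
    sed '$d' "$1"
}
```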


How to count the number of lines in a file and write that number into another file
In the terminal type:
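
A sketch with wc -l. Reading the file through a redirect keeps the filename out of the output, and tr removes the padding spaces that some versions of wc add. The function name and both file arguments are placeholders.

```shell
# write_line_count INPUT OUTPUT: store the number of lines of INPUT in OUTPUT.
write_line_count() {
    wc -l < "$1" | tr -d ' ' > "$2"
}
```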

Alternatively you can also store the number of lines into a shell variable.
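
For instance, using command substitution (count_lines and NLINES are names chosen for this sketch):

```shell
# count_lines FILE: print the number of lines in FILE, without padding.
count_lines() {
    wc -l < "$1" | tr -d ' '
}
```

`NLINES=$(count_lines test-file.txt)` then leaves the count in $NLINES.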


How to perform the same operation over several files
In a bash script type:
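
A minimal sketch of such a loop over the files in the current directory, wrapped in a hypothetical function. The existence check is an addition that skips the literal pattern when nothing matches.

```shell
# list_faulty: run a command (here echo) on every file named FAULTY*.
list_faulty() {
    for FILE in FAULTY*; do
        [ -e "$FILE" ] || continue   # no match: the pattern stays unexpanded
        echo "$FILE"
    done
}
```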

This will print, via an echo command, every filename that matches the glob pattern FAULTY*.


How to write a loop inside a bash file
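
One simple form is a counting while loop; the function name and the message it prints are only illustrative.

```shell
# count_to N: run the loop body N times, with i going from 1 to N.
count_to() {
    i=1
    while [ "$i" -le "$1" ]; do
        echo "iteration $i"
        i=$((i + 1))        # shell arithmetic to advance the counter
    done
}
```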


How to split a string inside a bash file
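
One portable way is cut, which picks a field out of a delimited string. The helper name is hypothetical and the delimiter must be a single character.

```shell
# nth_field STRING DELIMITER N: print the N-th DELIMITER-separated field.
nth_field() {
    printf '%s\n' "$1" | cut -d "$2" -f "$3"
}
```

For example, `nth_field "_N37_:0:_N233_:0" ":" 3` prints _N233_.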


How to do input parameter error testing
The variable ${CIRCUIT} is the first command line argument (${1})
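
A minimal sketch of such a check, written as a function so it can be tried out interactively. In a standalone script you would typically use exit 1 instead of return 1. The function name and the usage message are assumptions.

```shell
# check_args "$@": fail with a usage message when the first argument is missing.
check_args() {
    CIRCUIT=${1}
    if [ -z "${CIRCUIT}" ]; then
        echo "Usage: $0 <circuit>" >&2
        return 1
    fi
}
```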


How to perform a particular operation on each line of a file
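
The usual pattern is a while read loop fed from the file. As a stand-in for the "particular operation", this sketch just prints each line with its number; the function name is made up.

```shell
# number_lines FILE: process FILE line by line.
number_lines() {
    n=0
    # IFS= and -r keep leading spaces and backslashes intact; the extra
    # test also handles a last line with no trailing newline.
    while IFS= read -r LINE || [ -n "$LINE" ]; do
        n=$((n + 1))
        echo "$n: $LINE"
    done < "$1"
}
```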


How to ensure that two files have the same number of lines
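
One way is to compare the two wc -l counts. The helper below (a hypothetical name) succeeds when they match and fails otherwise, so it can be used directly in an if statement.

```shell
# same_line_count FILE1 FILE2: succeed only if both have the same line count.
same_line_count() {
    a=$(wc -l < "$1" | tr -d ' ')
    b=$(wc -l < "$2" | tr -d ' ')
    [ "$a" -eq "$b" ]
}
```

For example: `if same_line_count file1.txt file2.txt; then echo "same"; else echo "different"; fi`.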

Where file1.txt and file2.txt are the filenames you wish to compare.


How to count the occurrences of a character in a file
In this particular example, the character I am counting is the 0.
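
A possible approach is to delete everything except that character with tr and count what is left. The function name is hypothetical.

```shell
# count_char FILE CHAR: print how many times CHAR occurs in FILE.
count_char() {
    # -c complements the set and -d deletes, so only CHAR survives
    tr -cd "$2" < "$1" | wc -c | tr -d ' '
}
```

`count_char test-file.txt 0` would count the zeros.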


How to replace characters in a file
This particular command will replace all “:” with the newline character.
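
A tr sketch that does this (the function name is made up):

```shell
# colons_to_newlines FILE: print FILE with every ":" turned into a newline.
colons_to_newlines() {
    tr ':' '\n' < "$1"
}
```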


How to delete all instances of a particular character from a file
This particular command will delete all “:” from the file longString.txt and it will write it on the file readableString.txt.
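
With tr this could look like the sketch below; the function name is hypothetical and the two filenames become parameters.

```shell
# delete_colons INPUT OUTPUT: copy INPUT to OUTPUT with every ":" removed.
delete_colons() {
    tr -d ':' < "$1" > "$2"
}
```

`delete_colons longString.txt readableString.txt` matches the filenames used above.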
