Converting base-16 roman numbers to arabic numbers (and vice-versa)
Posted: July 2, 2012 Filed under: Programming Languages | Tags: Mac OS X, Python, Unittest
Here is a neat Python programming challenge.
A hex roman numeral is very much like the standard roman numeral, except with different values. In normal roman numerals, I = 1, V = 5, X = 10 and so on. In hex roman numerals, I = 1, V = 8, X = 16, L = 128, C = 256, D = 2048 and M = 4096. So for example:
VIIII = 8 + 1 + 1 + 1 + 1 = 12
IX = 16 – 1 = 15
XV = 16 + 8 = 24
XL = 128 – 16 = 112
The goal is to write a program in Python that converts in either direction: given a decimal number, it should return the hex roman numeral, and given a hex roman numeral, it should return the decimal number.
I started this by creating a program that performs a normal roman to arabic conversion. This wasn’t too hard, especially since Python has a ton of neat features such as dictionaries and solid string parsing methods. Since I am using unittest to test my code, I’ve named this file roman_numerals.py.
```python
import re
import sys


def roman_to_arabic(number):
    """return the arabic numeral integer representation of roman string number"""
    roman_dict = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
    lst = [roman_dict[i] for i in number]
    # a symbol smaller than its right-hand neighbour is subtracted (IX = 10 - 1)
    for n in range(len(lst) - 1):
        if lst[n] < lst[n + 1]:
            lst[n] = -lst[n]
    return sum(lst)


def arabic_to_roman(number):
    """return the roman numeral string representation of integer number"""
    # digit d selects index d-1; digit 0 selects index -1, the trailing ""
    units = ("I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "")
    tens = ("X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC", "")
    hundreds = ("C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM", "")
    thousands = ("M", "MM", "MMM", "MMMM", "MMMMM", "MMMMMM", "MMMMMMM",
                 "MMMMMMMM", "MMMMMMMMM", "")
    #not quite sure how the romans dealt with very small or very large
    #numbers... also not quite sure how they worked with floating points
    assert(number <= 7000)
    assert(number > 0)

    a = list(str(number))  #string-ify numbers
    b = a[-1::-1]  #reverse order of list

    conversion = ""
    if len(b) > 0:
        conversion = units[int(b[0]) - 1] + conversion
    if len(b) > 1:
        conversion = tens[int(b[1]) - 1] + conversion
    if len(b) > 2:
        conversion = hundreds[int(b[2]) - 1] + conversion
    if len(b) > 3:
        conversion = thousands[int(b[3]) - 1] + conversion
    return conversion


if __name__ == '__main__':
    try:
        if re.match("I|V|X|L|D|C|M", sys.argv[1]):
            print(roman_to_arabic(sys.argv[1]))
        else:
            # int() instead of eval(): never evaluate command line input
            print(arabic_to_roman(int(sys.argv[1])))
    except (IndexError, KeyError, ValueError, AssertionError):
        print("Error: You either specified an: \n\t-invalid number \n\t-out of [1 to 7000] range \n\t-inexistent number")
```
The package unittest provides a great way to test your programs. I love it. You run a separate script, and it performs all the necessary assertions against the module under test. Here is my unittest code, which I named test_roman_numerals.py.
```python
import unittest

import roman_numerals


class ProductTestCase(unittest.TestCase):

    def test_arabic_to_roman(self):
        # assertEqual replaces the deprecated failUnless(a == b)
        self.assertEqual("I", roman_numerals.arabic_to_roman(1))
        self.assertEqual("III", roman_numerals.arabic_to_roman(3))
        self.assertEqual("V", roman_numerals.arabic_to_roman(5))
        self.assertEqual("X", roman_numerals.arabic_to_roman(10))
        self.assertEqual("XI", roman_numerals.arabic_to_roman(11))
        self.assertEqual("VIII", roman_numerals.arabic_to_roman(8))
        self.assertEqual("IX", roman_numerals.arabic_to_roman(9))
        self.assertEqual("XV", roman_numerals.arabic_to_roman(15))
        self.assertEqual("XL", roman_numerals.arabic_to_roman(40))
        self.assertEqual("CXV", roman_numerals.arabic_to_roman(115))
        self.assertEqual("XLVI", roman_numerals.arabic_to_roman(46))
        self.assertEqual("MMXII", roman_numerals.arabic_to_roman(2012))

    def test_roman_to_arabic(self):
        self.assertEqual(1, roman_numerals.roman_to_arabic("I"))
        self.assertEqual(3, roman_numerals.roman_to_arabic("III"))
        self.assertEqual(5, roman_numerals.roman_to_arabic("V"))
        self.assertEqual(10, roman_numerals.roman_to_arabic("X"))
        self.assertEqual(11, roman_numerals.roman_to_arabic("XI"))
        self.assertEqual(8, roman_numerals.roman_to_arabic("VIII"))
        self.assertEqual(9, roman_numerals.roman_to_arabic("IX"))
        self.assertEqual(15, roman_numerals.roman_to_arabic("XV"))
        self.assertEqual(40, roman_numerals.roman_to_arabic("XL"))
        self.assertEqual(115, roman_numerals.roman_to_arabic("CXV"))
        self.assertEqual(46, roman_numerals.roman_to_arabic("XLVI"))
        self.assertEqual(2012, roman_numerals.roman_to_arabic("MMXII"))


if __name__ == '__main__':
    unittest.main()
```
Here are some screenshots of the program in action: first testing through the command line, and then testing it with unittest.

Taking this base code and making it compatible with base-16 numerals was trivial. All I needed to do was make a minor modification to the dictionary roman_dict and add extra elements to the lists units, tens, hundreds and thousands. Of course, I had to perform a base conversion with the hex2dec function each time I wanted to access a position in the list.
Here is the code that converts base-16 roman numbers to arabic numbers. I saved this file as roman_numerals_base16.py.
```python
import re
import sys


def roman_to_hex_arabic(number):
    """return the arabic numeral integer representation of hex roman string number"""
    roman_dict = {"I": 1, "V": 8, "X": 16, "L": 128, "C": 256, "D": 2048, "M": 4096}
    lst = [roman_dict[i] for i in number]
    # a symbol smaller than its right-hand neighbour is subtracted (IX = 16 - 1)
    for n in range(len(lst) - 1):
        if lst[n] < lst[n + 1]:
            lst[n] = -lst[n]
    return sum(lst)


def arabic_to_hex_roman(number):
    """return the hex roman numeral string representation of integer number"""
    # hex digit d selects index d-1; digit 0 selects index -1, the trailing ""
    units = ("I", "II", "III", "IIII", "IIIII", "IIIIII", "IV", "V", "VI", "VII",
             "VIII", "VIIII", "VIIIII", "VIIIIII", "IX", "")
    tens = ("X", "XX", "XXX", "XXXX", "XXXXX", "XXXXXX", "XL", "L", "LX", "LXX",
            "LXXX", "LXXXX", "LXXXXX", "LXXXXXX", "XC", "")
    hundreds = ("C", "CC", "CCC", "CCCC", "CCCCC", "CCCCCC", "CD", "D", "DC", "DCC",
                "DCCC", "DCCCC", "DCCCCC", "DCCCCCC", "CM", "")
    # 7000 is 0x1B58, so the fourth hex digit is at most 1 and only "M" is reachable
    thousands = ("M",)
    #not quite sure how the romans dealt with very small or very large
    #numbers... also not quite sure how they worked with floating points
    assert(number <= 7000)
    assert(number > 0)

    a = list(dec2hex(number))  #string-ify numbers
    b = a[-1::-1]  #reverse order of list

    conversion = ""
    if len(b) > 0:
        conversion = units[hex2dec(b[0]) - 1] + conversion
    if len(b) > 1:
        conversion = tens[hex2dec(b[1]) - 1] + conversion
    if len(b) > 2:
        conversion = hundreds[hex2dec(b[2]) - 1] + conversion
    if len(b) > 3:
        conversion = thousands[hex2dec(b[3]) - 1] + conversion
    return conversion


def dec2hex(n):
    """return the hexadecimal string representation of integer n"""
    return "%X" % n


def hex2dec(s):
    """return the integer value of a hexadecimal string s"""
    return int(s, 16)


if __name__ == '__main__':
    try:
        if re.match("I|V|X|L|D|C|M", sys.argv[1]):
            print(roman_to_hex_arabic(sys.argv[1]))
        else:
            # int() instead of eval(): never evaluate command line input
            print(arabic_to_hex_roman(int(sys.argv[1])))
    except (IndexError, KeyError, ValueError, AssertionError):
        print("Error: You either specified an: \n\t-invalid number \n\t-out of [1 to 7000] range \n\t-inexistent number")
```
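The lookup tuples in arabic_to_hex_roman follow one regular pattern per hex position, so as a cross-check they can also be generated instead of typed by hand. The sketch below is not part of the original program: the names `digit_table`, `TABLES` and `to_hex_roman` are made up for illustration, and it assumes the symbol values from the problem statement (I = 1, V = 8, X = 16, L = 128, C = 256, D = 2048, M = 4096) and the same 1 to 7000 range.

```python
def digit_table(one, half, whole):
    """Numerals for the hex digits 1..15 of one position (hypothetical helper).

    one   -- symbol worth 1 unit of this position (e.g. "I")
    half  -- symbol worth 8 units of this position (e.g. "V")
    whole -- symbol worth 16 units of this position (e.g. "X")
    """
    table = []
    for digit in range(1, 16):
        if digit <= 6:
            table.append(one * digit)               # I .. IIIIII
        elif digit == 7:
            table.append(one + half)                # IV = 8 - 1
        elif digit <= 14:
            table.append(half + one * (digit - 8))  # V .. VIIIIII
        else:
            table.append(one + whole)               # IX = 16 - 1
    return table


# one table per hex position, least significant position first
TABLES = [
    digit_table("I", "V", "X"),
    digit_table("X", "L", "C"),
    digit_table("C", "D", "M"),
]


def to_hex_roman(number):
    """Convert 1 <= number <= 7000 to a hex roman numeral (illustrative sketch)."""
    if not 0 < number <= 7000:
        raise ValueError("number must be in the range 1 to 7000")
    # the fourth hex digit is at most 1 in this range, so repeating "M" suffices
    thousands, rest = divmod(number, 16 ** 3)
    conversion = "M" * thousands
    for position in (2, 1, 0):
        digit = (rest // 16 ** position) % 16
        if digit:
            conversion += TABLES[position][digit - 1]
    return conversion


print(to_hex_roman(12))   # VIIII
print(to_hex_roman(112))  # XL
```

Generating the tables makes it harder for a single mistyped entry to slip in, at the cost of being a little less obvious to read than the literal tuples.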
And here are my test cases, taken directly from the problem statement and saved as test_roman_numerals_base16.py.
```python
import unittest

import roman_numerals_base16


class ProductTestCase(unittest.TestCase):

    def test_arabic_to_hex_roman(self):
        self.assertEqual("VIIII", roman_numerals_base16.arabic_to_hex_roman(12))
        self.assertEqual("IX", roman_numerals_base16.arabic_to_hex_roman(15))
        self.assertEqual("XV", roman_numerals_base16.arabic_to_hex_roman(24))
        self.assertEqual("XL", roman_numerals_base16.arabic_to_hex_roman(112))
        self.assertEqual("XI", roman_numerals_base16.arabic_to_hex_roman(17))

    def test_roman_to_hex_arabic(self):
        self.assertEqual(12, roman_numerals_base16.roman_to_hex_arabic("VIIII"))
        self.assertEqual(15, roman_numerals_base16.roman_to_hex_arabic("IX"))
        self.assertEqual(24, roman_numerals_base16.roman_to_hex_arabic("XV"))
        self.assertEqual(112, roman_numerals_base16.roman_to_hex_arabic("XL"))
        self.assertEqual(17, roman_numerals_base16.roman_to_hex_arabic("XI"))


if __name__ == '__main__':
    unittest.main()
```
Finally, here is a screenshot of the program in action.
Password protecting entire directories in MacOSX
Posted: June 26, 2012 Filed under: Operating Systems, Programming Languages | Tags: Expect, Mac OS X
Every once in a while I have some files in a directory that need a password. I am not looking for a fancy encryption mechanism like PGP… I just want to compress the entire directory, put in a strong password and forget about it. Anyway, there are several applications that do this on the mac store… but none provide the flexibility that I was looking for. Plus, I realized this can be done in a couple of lines in the terminal.
Let's say you have a directory tmp with a bunch of files that you want to compress and password protect. You can go the standard route using the zip utility. Just type the following:
This will compress the directory tmp/ and store all the files in the storage.zip. It will also ask you for the password you would like to use. Unfortunately you cannot give the password as an argument. So, you are stuck typing the password every time you want to compress a new directory.
There is a hack you can use to bypass the manual password entry… It involves using expect. This program is a really neat utility. It basically allows you to script interactive dialogues with your terminal programs, which makes task automation a walk in the park. You can check its manual pages here.
Step #1: Go into the mac terminal and find out where the expect utility is located. This is done using the whereis command. Type the following on your terminal window.
In my macosx version, the program expect is located at /usr/bin/expect.
Step #2: Create a file (e.g. protect_directory.sh) with the following lines. Make sure you modify the location of the expect program in the first line of the script.
```
#!/usr/bin/expect -f
eval spawn zip [lindex $argv 0].zip -R [lindex $argv 1]/* -e
expect "password:"
send "[lindex $argv 2]\r"
expect "password:"
send "[lindex $argv 2]\r"
interact
```
Step #3: Make sure your script is executable and run it.
./protect_directory.sh file.zip tmp hello
The first argument will be the output filename (file.zip), the second argument is the directory you wish to compress (tmp) and the last argument is the password you wish to use (hello). Neat no? Here is a screenshot of all the steps.
Choose your own adventure… in audio
Posted: May 17, 2012 Filed under: Programming Languages | Tags: Mac OS X, Python
One of the things that bores me the most is driving. Ten minutes into my daily commute and I am checking out my email and reading my twitter feed. Anyway, the other day when I was coming from Boston I had this interesting idea: why not create a choose your own adventure game that could be played while driving? Instead of reading a book, the book would be read to us by a speech synthesizer. The choices would be made by pressing buttons instead of manually flipping pages. The idea was so neat that I proceeded to create a prototype, so I could send it to a magazine, but the outcome was so bad that I pushed it aside until I find a better technology.
For my failed prototype I used the Linux open-source flite speech synthesizer and a BeagleBoard-xM. As the driving engine I created a simple script in Python that read a particular story and used buttons to control the flow of the adventure. The first problem is that the BeagleBoard requires a pretty clean 5V power supply, which is a mess to get in a car. Also, the BeagleBoard is fairly expensive ($120) to use as a dedicated game engine. Finally, creating a customized OS that is fast enough to launch a particular application on the BeagleBoard is not trivial.
Anyway, I still think that audio based interactive entertainment systems have potential, but the technical solution I chose was not the best. Most definitely I will revisit this idea soon with Android phones and tablets. Implementing this in Android seems to be very easy, thanks to the IOIO connectors. For those interested, this connector is available for purchase through SparkFun. Regardless, here is some very simple code for my choose your own adventure. It is written in Python, and if you are on Mac OS X it will read out the entries and choices using the built-in speech synthesizer.
```python
import subprocess


def say(text):
    """print text and read it aloud with the Mac OS X speech synthesizer"""
    print(text)
    # an argument list avoids the shell, so quotes in the story cannot break the command
    subprocess.call(["say", text])


#reads the current entry
def read_entry(entry_number, filename):
    found = False
    with open(filename, 'r') as f:
        for line in f:
            line = line.strip('\n')
            if line == '<' + str(entry_number) + '>':
                found = True
            if line == '[choices]' or line == '[end]':
                found = False
            if found and line != '<' + str(entry_number) + '>':
                say(line)


#creates a dictionary with the choices at each entry
def read_choices(entry_number, filename):
    entry_found = False
    choice_section = False
    items = {}
    with open(filename, 'r') as f:
        for line in f:
            line = line.strip('\n')
            if line == '<' + str(entry_number) + '>':
                entry_found = True
            if entry_found and choice_section and line != '[end choices]':
                #remove ">" and "-" from line: "- Jump <2>" becomes [" Jump ", "2"]
                line = line.replace('>', '').replace('-', '').split('<')
                items[line[0]] = line[1]
            if entry_found and line == '[choices]':
                choice_section = True
            if entry_found and line == '[end choices]':
                choice_section = False
                entry_found = False
            if line == '[end]':
                entry_found = False
    return items


def ask_choices(choices):
    say("What do you want to do?")
    keys = list(choices)
    for i, current_choice in enumerate(keys, 1):
        say('Choice number %d:%s' % (i, current_choice))
    say("What is your choice?")
    user_choice = int(input(''))
    #the function return value is next story page
    return choices[keys[user_choice - 1]]


story_page = 1
terminate = False
while not terminate:
    read_entry(story_page, 'test_story.txt')
    choices = read_choices(story_page, 'test_story.txt')
    #if there are no choices, then we reached the end
    if len(choices) == 0:
        terminate = True
    else:
        story_page = ask_choices(choices)
```
The story itself (hardwired in the previous script as test_story.txt) is pretty self-explanatory.
<1>
You are in the top of a very tall building.
[choices]
- Jump <2>
- Yell <3>
- Do nothing <1>
[end choices]
<2>
You decided to jump… thats too bad.
[end]
<3>
You yell something. No one replies.
[choices]
- Jump <2>
- Do nothing <1>
[end choices]
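Because every entry is delimited by the same handful of markers, the story can also be parsed in a single pass instead of re-reading the file for each page. The following is only a sketch under assumptions: it expects every entry, including the first, to start with a `<n>` line, and the function name `parse_story` and the returned structure are my own illustration rather than part of the original script.

```python
import re


def parse_story(text):
    """Parse the story format into {entry: (lines, choices)} (hypothetical helper).

    lines   -- list of narration lines for the entry
    choices -- list of (label, target_entry) pairs, in file order
    """
    story = {}
    entry = None
    in_choices = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = re.match(r"^<(\d+)>$", line)
        if header:
            entry = int(header.group(1))
            story[entry] = ([], [])
            in_choices = False
        elif entry is None:
            continue  # ignore anything before the first entry marker
        elif line == "[choices]":
            in_choices = True
        elif line in ("[end choices]", "[end]"):
            in_choices = False
        elif in_choices:
            choice = re.match(r"^-\s*(.*?)\s*<(\d+)>$", line)
            if choice:
                story[entry][1].append((choice.group(1), int(choice.group(2))))
        else:
            story[entry][0].append(line)
    return story


SAMPLE = """<1>
You are in the top of a very tall building.
[choices]
- Jump <2>
- Yell <3>
- Do nothing <1>
[end choices]
<2>
You decided to jump.
[end]
"""

story = parse_story(SAMPLE)
print(story[1][1])  # [('Jump', 2), ('Yell', 3), ('Do nothing', 1)]
```

Keeping the choices as an ordered list rather than a dictionary also guarantees that "Choice number 1" is always the first choice written in the file.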
Finally, here is a screenshot of the program in action.
Enjoy.
Autonomously crawling through DICE job postings
Posted: April 13, 2012 Filed under: Operating Systems, Programming Languages | Tags: Bash, Mac OS X, Mathematica, Perl
I am currently working on a book that requires me to search through thousands of job advertisements. For the last couple of days I have been looking at the various websites, collecting data and looking for patterns in employment listings. Even if you are not working on a book, I am sure at some point in time you will be looking for a new job online. I love searching for jobs, and if you don’t love it too… you are probably doing it wrong.
First of all, don’t manually search for jobs! It is a waste of time and it will drive you insane. Instead, use a scripting language such as Perl to mine website databases and outline the best matches. In fact, I wrote a post a few days ago about mining employment postings on craigslist. If you are new to this entire field of data mining, I recommend the book “Mining the Social Web” by Russell… Nice chap… Met him at Harvard Square a couple of years ago.
Here I outline the steps I took to extract all job postings from DICE. First of all you have to know how everything is stored in the database. Make any random search on the initial screen (e.g. embedded).
This particular search generated the following very-long URL… so long that I had to include spaces:
Since this particular search detected 1689 job postings, we just have to change NUM_PER_PAGE=30 to NUM_PER_PAGE=1689 in the URL in order to see every single job post on a single page. Save that page to your hard disk in HTML format. For completeness, here is the file with all 1689 postings I just downloaded. The following Perl script parses the contents of this file and looks for the associated URL for each job posting.
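Editing the query string by hand works, but the same change can be scripted. The sketch below uses only Python's standard urllib.parse; the helper name `set_query_param` and the example.com URL are placeholders of mine, since only the NUM_PER_PAGE and FREE_TEXT parameters are known from this post.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_query_param(url, name, value):
    """Return url with query parameter `name` set to `value` (hypothetical helper)."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # overwrite the parameter where it already exists, keeping the original order
    updated = [(k, str(value) if k == name else v) for k, v in query]
    if name not in [k for k, _ in query]:
        updated.append((name, str(value)))
    return urlunsplit(parts._replace(query=urlencode(updated)))


# Placeholder URL: host and path are assumptions, not the real search URL.
search_url = "http://example.com/jobsearch?FREE_TEXT=embedded&NUM_PER_PAGE=30"
print(set_query_param(search_url, "NUM_PER_PAGE", 1689))
```

Note that urlencode re-encodes every value, so characters in the other parameters may come back escaped slightly differently than in the original URL.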
```perl
#neat function that helps catching scripting errors like undefined variables...
use strict;

#use the generated HTML as an input argument
my $input_file=$ARGV[0];

#parse through every single line in the file
open (MYFILE, "<$input_file") or die $!;
while (<MYFILE>)
{
  chomp;
  #each job posting starts with a well defined pattern...
  #search for <div><a href="/jobsearch/servlet/Jo
  if (/<div><a href=\"\/jobsearch\/servlet\/Jo/)
  {
    #...the following line:
    #print "$_\n";
    #...will print for example:
    #<div><a href="/jobsearch/servlet/JobSearch?op=302&amp;dockey=xml/7/0/70b58be12a872e8268939a525389b927@endecaindex&amp;source=19&amp;FREE_TEXT=embedded&amp;rating=99">ASIC/FPGA Verification Engineer</a></div>

    #get the job title
    my $job_title;
    my @tmp_data;
    @tmp_data=split(/">|<\/a><\/div>/,$_);
    $job_title=$tmp_data[1];
    #...the following line:
    #print $job_title . "\n";
    #...will print for example:
    #ASIC/FPGA Verification Engineer

    @tmp_data=split(/">|div><a href="\//,$_);
    #...the following line:
    #print $tmp_data[1] . "\n";
    #...will print for example:
    #jobsearch/servlet/JobSearch?op=302&amp;dockey=xml/7/0/70b58be12a872e8268939a525389b927@endecaindex&amp;source=19&amp;FREE_TEXT=embedded&amp;rating=99
    #but that is not the correct URL... instead we want something like:
    #http://seeker.dice.com/jobsearch/servlet/JobSearch?op=302&dockey=xml/b/4/b4ba4b9ed60baf2cf7a3397f336e451e@endecaindex&source=19&FREE_TEXT=embedded&rating=0
    #in essence, we must replace all &amp; with &
    $tmp_data[1] =~ s/\&amp;/\&/g;
    $tmp_data[1] =~ s/\/jobsearch\/servlet//g;
    print "http://seeker.dice.com/" . $tmp_data[1] . "\n";
  }
}
close (MYFILE);
```
Save the file (e.g. dice.pl) and execute it with the following command:
This will store every single URL, one per line, in the file embedded_url.txt. Once again… for completeness, here is my generated file.
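For comparison, the extraction that the Perl script performs can be sketched in a few lines of Python. This is an illustration, not a replacement for the script above: the function name `extract_postings` is hypothetical, the sample line is shortened, and it assumes each posting sits on one line in the `<div><a href="/jobsearch/servlet/Jo…">title</a></div>` form with `&` written as the HTML entity `&amp;`.

```python
import html
import re

# Same anchor pattern the Perl script looks for; capture the href and the title.
POSTING = re.compile(r'<div><a href="(/jobsearch/servlet/Jo[^"]*)">(.*?)</a></div>')


def extract_postings(page, base="http://seeker.dice.com"):
    """Yield (title, absolute_url) for every job posting link in `page`."""
    for href, title in POSTING.findall(page):
        # html.unescape turns &amp; back into &, among other entities
        yield html.unescape(title), base + html.unescape(href)


# Shortened sample line (the real hrefs carry more parameters).
sample = ('<div><a href="/jobsearch/servlet/JobSearch?op=302&amp;source=19'
          '&amp;FREE_TEXT=embedded">ASIC/FPGA Verification Engineer</a></div>')
for title, url in extract_postings(sample):
    print(title, url)
```

Using html.unescape instead of a single hand-written substitution also covers other entities that may appear in job titles.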
The next step is to download every single job posting into a separate file. Since I am a Mac OS X user, I need to download the contents of each of the URLs from the web via the OS X command line. This is easily accomplished with the following bash script:
```bash
#!/bin/bash
#the file with one URL per line; defaults to the output of the perl script
FILENAME=${1:-embedded_url.txt}
NUMBERLINES=`wc -l < ${FILENAME}`
echo ${NUMBERLINES} > xxx.tmp
PNUMBERLINES=`perl -n -e '@splitline=split(/ /,$_); $splitline[1]=~s/ //g; print $splitline[0] ."\n"; ' xxx.tmp`
rm -f xxx.tmp
echo ${PNUMBERLINES}

RUN=0
until [ ${RUN} -eq ${PNUMBERLINES} ]
do
  RUN=$(( $RUN + 1 ))
  LCONTENTS=`sed -n $RUN'p' ${FILENAME}`
  #echo "line # ${RUN} with contents : ${LCONTENTS}"
  wget "${LCONTENTS}"
done
```
In the same directory as the output of the previous Perl script (e.g. embedded_url.txt), save this bash script (e.g. download_all_jobs.sh) and execute it with the following commands:
./download_all_jobs.sh embedded_url.txt
The compressed outcome of this last step is a file of 27 MBs.
Now that I have all this data, I need to extract the skills required for each advertised position. So, I placed all the compressed files in the sub-directory dice_jobs and ran the following script:
```bash
#!/bin/bash

JOBS_DIRECTORY=dice_jobs
ls ${JOBS_DIRECTORY}/* > all_jobs.txt

#count the number of lines in the file listing
impcount=`wc -l < all_jobs.txt`
echo ${impcount} > xxx.tmp
NIMPS=`perl -n -e '@splitline=split(/ /,$_); $splitline[1]=~s/ //g; print $splitline[0] ."\n"; ' xxx.tmp`;
rm -f xxx.tmp

#rename jobs postings into something more readable
RUN=1
until [ $RUN -gt ${NIMPS} ]
do
  FNAME=`sed -n $RUN'p' all_jobs.txt`
  #quoted, because downloaded file names may contain ? and & characters
  mv "${FNAME}" ${JOBS_DIRECTORY}/${RUN}.txt
  RUN=$(( $RUN + 1 ))
done

#extract the job title and necessary skills
rm -f parsed_job_data.txt
RUN=1
until [ $RUN -gt ${NIMPS} ]
do
  perl extract_data.pl ${JOBS_DIRECTORY}/${RUN}.txt >> parsed_job_data.txt
  RUN=$(( $RUN + 1 ))
done

rm -f all_jobs.txt
echo "...parsed job data is @ parsed_job_data.txt"
```
The skill extraction is actually done in the following Perl script (extract_data.pl).
```perl
use strict;

#use the generated HTML as an input argument
my $input_file=$ARGV[0];

my $next_line_is_job_title=0;
my $jobTitle="";
my $next_line_is_area_code=0;
my $areaCode="";
my $next_line_is_skills=0;
my $skills="";
my $next_line_is_company_name=0;
my $companyName="";

#parse through every single line in the file
open (MYFILE, "<$input_file") or die $!;
while (<MYFILE>)
{
  #chomp;

  #the value of each field sits on the line after its label
  if ($next_line_is_skills==1)
  {
    $skills=$_;
    $skills =~ s/<dd>|<dt>|<\/dt>|<\/dd>| |\t|\n//g;
    $skills =~ s/\/assets\/images\/detail\/default\/highlite.gif//g;
    $skills =~ s/<b style="background:url\(//g;
    $skills =~ s/\); font-weight: bold;">|<\/b>//g;
    #print "skills=" . $skills . "\n";
    $next_line_is_skills=0;
  }

  if ($next_line_is_company_name==1)
  {
    $companyName=$_;
    $companyName =~ s/<dd>|<dt>|<\/dt>|<\/dd>| |\t|\n//g;
    my @splitString=split(/>|</,$companyName);
    $companyName = $splitString[2];
    $next_line_is_company_name=0;
    #print "companyName = " . $companyName . "\n";
  }

  if ($next_line_is_area_code==1)
  {
    $areaCode=$_;
    $areaCode =~ s/<dd>|<\/dd>|<dt>|<\/dt>| |\t|\n//g;
    $next_line_is_area_code=0;
    #print "areaCode=" . $areaCode . "\n";
  }

  if (/<h1 id="jobTitle">/)
  {
    my @split_tmp=split(/>|</,$_);
    $jobTitle=$split_tmp[2];
    #print "jobTitle=" . $jobTitle . "\n";
  }

  if (/<dt>Skills:<\/dt>|<dd>Skills<\/dd>/)
  {
    $next_line_is_skills=1;
  }

  if (/<dt>Company:|<dd>Company/)
  {
    $next_line_is_company_name=1;
  }

  if (/<dt>Area Code:|<dd>Area Code/)
  {
    $next_line_is_area_code=1;
  }
}
close (MYFILE);

#one tab-separated record per job posting
print "$input_file\t$companyName\t$jobTitle\t$areaCode\t$skills\n";
```
I then feed the extracted data into a Mathematica script; a (readable) pdf version of the Mathematica script is here, and the source is here. In this script, I combined all the skills found and ignored skills that were required in fewer than 30 distinct advertisements (e.g. COBOL and Pascal). Below is the resulting pie chart.
As expected, the most sought after skills in “embedded computing” jobs are C, C++ and Linux. Java, MySQL and kernel development are also in strong demand these days. Surprisingly, I saw lots of requests for mobile computing and networking skills. However, the most surprising requested skill is databases (MySQL)!
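The counting step itself (combine all skills, then drop the ones that appear in fewer than 30 distinct advertisements) is simple enough to sketch in Python as well. This is not the Mathematica script: `count_skills` is a hypothetical name, the column order follows the print statement at the end of extract_data.pl, and splitting the skills field on commas is an assumption about how the postings list their skills.

```python
from collections import Counter


def count_skills(rows, min_ads=30):
    """Count in how many distinct ads each skill appears; drop rare skills.

    rows    -- iterable of tab-separated lines: file, company, title, area code, skills
    min_ads -- minimum number of ads a skill must appear in to be kept
    """
    counts = Counter()
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed lines
        # ASSUMPTION: skills are comma-separated; compare case-insensitively
        skills = {s.strip().lower() for s in fields[4].split(",") if s.strip()}
        counts.update(skills)  # a set, so each ad counts a skill only once
    return {skill: n for skill, n in counts.items() if n >= min_ads}


# Two made-up records, only to show the input shape.
rows = [
    "1.txt\tAcme\tFirmware Engineer\t617\tC, C++, Linux\n",
    "2.txt\tInitech\tKernel Developer\t603\tC, Linux, COBOL\n",
]
print(sorted(count_skills(rows, min_ads=2).items()))  # [('c', 2), ('linux', 2)]
```

With the real parsed_job_data.txt the call would be `count_skills(open("parsed_job_data.txt"))`, leaving the threshold at its default of 30.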
Finally, I am aware that I could have done everything in this post in a single Perl script. However, writing a post about a single script would get tedious very quickly. I also wanted to save the outcome of every single step on my hard disk so I could perform some additional data tests without having to connect to dice.com each time.
Automated craigslist job search with Perl and Bash
Posted: March 30, 2012 Filed under: Operating Systems, Programming Languages | Tags: Bash, Linux, Mac OS X, Perl
Most of my students are in the job market, and after suggesting websites where they could look for jobs, I took a peek at craigslist. I like craigslist; it's a simple, bare-bones website with pure text. However, the search functionality is a bit awkward, and it is hard to find a good match between the candidate's skills and a particular job posting. If you are seriously looking at every single “filtered” post, it may still take you over an hour to look for the best skill-to-job matches. So I created two scripts, one in Perl and the other one in Bash, that scavenge all the job postings for skill matches and create a new webpage with all the appropriate positions and matched skills as ready-to-click links. Data mining at its best!
There are some “limitations” of these scripts. First of all they were only tested in macOSX and Linux, however I am sure you can convert them quite easily to Windows. Secondly, I’ve focused all the craigslist searches around New England. You may add other craigslist locations quite easily by following the instructions on the perl script.
This automated job search requires two files: search_jobs.sh and craigslist.pl. Both can be found below, or at my github repository. To run the code place both files in the same directory, and edit the search_jobs.sh, shown below, with a text editor (after emacs, my second favorite text editor is TextWrangler). In this file modify the appropriate keywords that are being assigned to the variable SEARCH_SKILLS. Currently the search skills are the standard qualifications for an engineering graduate.
```bash
#!/bin/bash

#################################################################################
#Functions (do not modify anything here)
#################################################################################
SEARCH_SKILLS=""
function search_jobs {
  SEARCH_NAME=${1}
  #strip the commas, otherwise each skill reaches perl as "embedded," etc.
  perl craigslist.pl ${SEARCH_SKILLS//,/} > raw_data.txt

  #create the header of an html file
  echo "<html><title>Job Search Results</title><body>" > job_data.html

  #sort the entire file contents and make sure the best matches are on top
  sort -t! -n -r -k3 raw_data.txt >> job_data.html

  #clean up the file
  perl -p -i -e "s/!/\ \ \ \ \ \ /g" job_data.html

  #terminate the html file
  echo "</body></html>" >> job_data.html
  mv job_data.html ${SEARCH_NAME}.html
  rm -f raw_data.txt
}

#################################################################################
#You may modify your skills below
#################################################################################
SEARCH_SKILLS="embedded, circuit, transistor, VLSI, firmware, RTOS, kernel, MacOSX, JTAG, oscilloscope, HDL, FPGA, Arduino, MSP430, OMAP3540, micro-controllers, microcontrollers, SVN, programmer, Perl, linux, Mathematica, LabVIEW, schematics, Verilog, VHDL"
SEARCH_NAME="engineering"
search_jobs ${SEARCH_NAME}

SEARCH_SKILLS="quantitative, mathematica, finance, programmer, developer, high-frequency, fpga, microcontroller"
SEARCH_NAME="finance"
search_jobs ${SEARCH_NAME}
```
This script runs with the following command line:
./search_jobs.sh
After it is done executing it will create two files engineering.html and finance.html, where the candidate can see his best job matches.
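The core of the pipeline (count how many of the skills appear in each posting, then list the best matches first) can be sketched in Python as follows. This is a simplified illustration, not a port of the scripts: the names `match_skills` and `rank_postings` and the example.org postings are made up, nothing is downloaded, and unlike the Perl regex match it escapes each skill so that it is treated as literal text.

```python
import re


def match_skills(text, skills):
    """Return the skills that occur in `text`, ignoring case (substring match)."""
    return [s for s in skills if re.search(re.escape(s), text, re.IGNORECASE)]


def rank_postings(postings, skills):
    """Sort (url, text) pairs by number of matched skills, best first."""
    scored = []
    for url, text in postings:
        matched = match_skills(text, skills)
        if matched:  # postings without any match are dropped
            scored.append((len(matched), url, matched))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


skills = ["FPGA", "Verilog", "linux"]
# Made-up postings standing in for downloaded craigslist pages.
postings = [
    ("http://example.org/1.html", "Looking for a Linux administrator"),
    ("http://example.org/2.html", "FPGA design in Verilog on Linux hosts"),
    ("http://example.org/3.html", "Pastry chef wanted"),
]
for count, url, matched in rank_postings(postings, skills):
    print(count, url, " ".join(matched))
```

Because this is a substring match, a short skill such as "HDL" also matches inside "VHDL"; wrapping the pattern in word boundaries would be one way to tighten that.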
Below is the Perl script that parses the craigslist job postings.
```perl
#This script fetches the last 2 days new job postings from craigslist that match
#a specific criteria and reports the URLs that correspond to that match.
#The search criteria comes from the input arguments. The craigslist sites
#are hardwired to the New England area. You may change them by manually
#altering the variables in Section #3.
#
#Version 0.2 30/march/2012
#Author: Nuno Alves
#
#############################################################################
#Section #1 - load libraries
#############################################################################
use strict;
use POSIX;
use LWP::Simple;

#############################################################################
#Section #2 - input arguments are your skillsets
#############################################################################
my $num_args = $#ARGV + 1;
if ($num_args < 1)
{
  print "You must add some skills as arguments\n";
  exit;
}

#############################################################################
#Section #3 - defining variables
#############################################################################
#what craigslist sites
my @search_site=("http://boston.craigslist.org","http://nh.craigslist.org","http://maine.craigslist.org","http://burlington.craigslist.org","http://westernmass.craigslist.org","http://worcester.craigslist.org");

#type what positions you are looking for (egr = engineering, sof = software)
my @positions=("egr","sof","bus","acc");

#this array contains the arguments which are your resume skills
my @skills=@ARGV ;

#############################################################################
#Section #4 - debug code
#############################################################################
#instead of working on every single URL, setting $debug=1 will just scan
#the three webpages listed below
my $debug=0;
my @debug_urls=("http://boston.craigslist.org/gbs/egr/2902012136.html","http://boston.craigslist.org/bmw/egr/2929181526.html","http://boston.craigslist.org/gbs/egr/2926742528.html");

#############################################################################
#Section #5 - subroutines for collecting craigslist data
#############################################################################
sub collect_job_posting_http
{
  my $url=$_[0];
  my $content = get $url;
  #get returns undef when the download fails
  $content = "" unless defined $content;
  #print $content . "\n";

  my @splitcontents=split(/<h4 class=\"ban\"/,$content);
  my $size_splitcontents=@splitcontents;

  my @url_data=();
  for (my $i=1 ; $i<$size_splitcontents ; $i++)
  {
    #just want the last 2 days of postings
    if ($i<3)
    {
      #print "============\n\n\n";
      #print $splitcontents[$i] . "\n";

      #get all the posting urls for this particular day
      my @postingdata=split(/<p><a href=\"|\">/,$splitcontents[$i]);
      for (my $j=0; $j<@postingdata ; $j++)
      {
        #print ">>[$j]>>" . $postingdata[$j] . "<<<\n";
        if ($postingdata[$j]=~m/^http/)
        {
          push(@url_data,$postingdata[$j]);
        }
      }
    }
  }
  return(@url_data);
}

sub extract_date
{
  my @url_data=$_[0];
  my @date_data=split(/Date: 2012-|EDT<br>/,$url_data[0]);
  return("2012-" . $date_data[1]);
}

#############################################################################
#Section #6 - main program: collecting http data for each job posting
#############################################################################
my @urls=();

if ($debug == 0)
{
  for (my $k=0;$k<@search_site;$k++)
  {
    for (my $z=0;$z<@positions;$z++)
    {
      my $base_url=$search_site[$k]."/".$positions[$z];
      my @tmp_data=collect_job_posting_http $base_url;
      push(@urls,@tmp_data);
    }
  }
}
else
{
  @urls=@debug_urls;
}

#foreach (@urls)
#{
#  print $_ . "\n";
#}

#############################################################################
#Section #7 - check if each posting matches at least one skill
#############################################################################
my @matched_skills=();
my @skill_type=();
my @post_date=();

for (my $i=0 ; $i<@urls ; $i++)
{
  my $url=$urls[$i];
  my $content = get $url;
  #get returns undef when the download fails
  $content = "" unless defined $content;
  my $counter=0;
  my $date;

  # print $url . "\n";
  # print $content . "\n";

  my $skill_type_desc="";
  for (my $k=0; $k<@skills ; $k++)
  {
    if ($content =~ m/$skills[$k]/i)
    {
      $counter++;
      $skill_type_desc = $skill_type_desc . $skills[$k] . " ";
    }
  }
  push(@matched_skills,$counter);
  push(@skill_type,$skill_type_desc);
  push(@post_date,extract_date($content));
}

#############################################################################
#Section #8 - print results to the screen
#############################################################################
#fields are separated by "!" so the bash script can sort on the match count
for (my $i=0; $i < @matched_skills ; $i++)
{
  if ($matched_skills[$i]>0)
  {
    print "<li><a href=\"$urls[$i]\">site #$i\<\/a\>" . "!" . $post_date[$i] . "!" . $matched_skills[$i] . "!" . $skill_type[$i] . "\n";
  }
}
```
Using a SD card in Mac OS X terminal
Posted: March 11, 2012 Filed under: Operating Systems | Tags: Mac OS X
Since all MacBook Pros come with an SD card slot, the other day I started to work on a set of automated scripts that would back up all my data to an SD card. This means I had to learn how to read and write data on an SD card using the Mac OS X terminal.
Step #1- When you insert a card into the system, you need to find out what is the system identifier for that particular SD card. So go to the console and type:
And you will get something like:

In my case, the SD card is identified as disk1.
Step #2- In order for you to write anything directly into the SD card, you need to know where it is mounted. Go to the console and type:
And you will get something like:
In my case, the mount point for the SD card is /Volumes/BEAGLE_BONE.
Step #3- To actually copy something into the SD card, just type any unix command considering the mount point for the SD card as a normal directory. For example:
This command will create a new file (tmp.txt) in the SD card with a listing of all files in the current directory.
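Since the goal is a set of automated backup scripts, the same step can also be done from Python, treating the mount point as an ordinary directory. A minimal sketch, assuming the card is mounted at /Volumes/BEAGLE_BONE as in Step #2; the function name `write_listing` is hypothetical.

```python
import os


def write_listing(mount_point, source_dir=".", name="tmp.txt"):
    """Write the file names in `source_dir` to `name` on the mounted card."""
    if not os.path.isdir(mount_point):
        raise IOError("nothing mounted at %s" % mount_point)
    target = os.path.join(mount_point, name)
    with open(target, "w") as out:
        for entry in sorted(os.listdir(source_dir)):
            out.write(entry + "\n")
    return target


if __name__ == "__main__":
    # /Volumes/BEAGLE_BONE is the mount point from Step #2; yours will differ
    if os.path.isdir("/Volumes/BEAGLE_BONE"):
        print(write_listing("/Volumes/BEAGLE_BONE"))
```

Checking that the mount point exists first avoids silently writing the file somewhere else when the card is not inserted.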
Step #4- When you are done, make sure to unmount the SD card by typing something like:












