Commands and references (GNU/Linux kernel 2.4.18-3 and 2.4.18-14)
Linux is a registered trademark of Linus Torvalds. The commands, with their most common usage, are shown in brackets like this: [ command ]. Don't type the brackets, just what is inside of them.
page last modified: 2005-02-04

Can I tell if visitors are finding my site without a search?

Below is a shell script that might help find repeat visitors and/or those who access your site without searching for it.

To make it executable, type chmod +x and then the name of the file. For example: name the file check.sh, then type [ chmod +x check.sh ]. To run it, type [ ./check.sh ]. Keep check.sh in the same directory as the log file you want to process, and don't run it as root. It took 2 minutes to process an 8 MB log file, and 8 minutes to process a 39 MB log file, on a 900 MHz machine.

The input must be a file named combined_log, as this is what the script looks for. The output is a file named unique_visits containing a single column of IP addresses that are, for the most part, unique visitors who have accessed your site without searching for it. If you have several months' worth of log information in one file, you can determine whether there are unique IP addresses accessing your site repeatedly. This is far from perfect because of dynamic IP addresses, but it does give you a good idea of whether the information on your site is of use to people for more than just a quick look.

Lots of traffic and hits is not necessarily a sign that people are interested in the content of your site. The search engines allow people to find your site by the words and/or phrases typed in; but do they like what they see, and do they tell other people about your site?

This script separates unique IP addresses from the thousands that are logged each day. It also separates hits that do not come from search engines from those that do. You end up with a file named "unique_visits" with IP addresses in a column. If you compare the contents of this file each month using [ sdiff -s ], you can find repeat visitors. By counting the number of lines in the file (gvim will tell you this when you open the file with it), you can tell how many hits you are getting that don't come from search engines.
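The month-to-month comparison mentioned above can be sketched like this. The file names (unique_visits_jan, unique_visits_feb) and the sample addresses are hypothetical; comm -12 is used here as an alternative to [ sdiff -s ] because it prints only the lines common to both sorted files, which are exactly the repeat visitors:

```shell
#!/bin/sh
# Hypothetical example: two saved copies of unique_visits from
# different months, one IP address per line, already sorted.
printf '10.0.0.1\n10.0.0.2\n10.0.0.3\n' > unique_visits_jan
printf '10.0.0.2\n10.0.0.4\n' > unique_visits_feb

# comm -12 suppresses lines unique to either file, leaving only
# the addresses that appear in both months (likely repeat visitors).
comm -12 unique_visits_jan unique_visits_feb > repeat_visitors

# wc -l counts the lines -- an alternative to opening the file in
# gvim just to see the line count.
wc -l < repeat_visitors
```

With the sample data above, repeat_visitors would contain the single address 10.0.0.2.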
Don't forget to rename the "unique_visits" file before you re-run the script, or the script will overwrite it and you won't know what happened the last time you ran it. The formatting takes up lots of room because I have not changed the line breaks in the script. It should work if you copy and paste it into a text editor. This works on Red Hat with kernel 2.4.20-6. I haven't tried it on others yet. I do not know if it will work on any other system.

#!/bin/sh
# eliminate search engine referrals and zombie hunters.
# combined_log is the original file
egrep '(google)|(yahoo)|(mamma)|(query)|(msn)|(ask.com)|(search)|(altavista)|(images.google)|(xb1)|(cmd.exe)|(trexmod)|(robots.txt)|(copernic.com)|(POST)' combined_log > search
# now sort them to eliminate duplicates and put them in order
sort -un search > search_sort
# do the same with the original file
sort -un combined_log > combined_log_sort
# now get all the ip addresses. only the numbers
grep -o '[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' search_sort > search_sort_ip
grep -o '[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' combined_log_sort > combined_log_sort_ip
sdiff -s combined_log_sort_ip search_sort_ip > final_result_ip
# get rid of the extra column
grep -o '^\|[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' final_result_ip > bookmarked_ip
# remove stuff like browser versions and system versions
# (note: the dots in these patterns are unescaped, so each dot also
# matches any single character)
egrep -v '(4.4.2.0)|(1.6.3.1)|(0.9.2.1)|(4.0.0.42)|(4.1.8.0)|(1.305.2.109)|(1.305.2.12)|(0.0.43.45)|(5.0.0.0)|(1.6.2.0)|(4.4.5.0)|(1.305.2.137)|(4.3.5.0)|(1.2.0.7)|(4.1.5.0)|(5.0.2.6)|(4.4.9.0)|(6.1.0.1)|(4.4.9.0)|(5.0.8.6)|(5.0.2.4)|(4.4.8.0)|(4.4.6.0)' bookmarked_ip > unique_visits
exit 0

Here is another version. It does not take out the extra column, but it filters a longer list of version numbers and removes its temporary files when it finishes:

#!/bin/sh
# eliminate search engine referrals and zombie hunters.
# combined_log is the original file
egrep '(google)|(yahoo)|(mamma)|(query)|(msn)|(ask.com)|(search)|(altavista)|(images.google)|(xb1)|(cmd.exe)|(trexmod)|(robots.txt)|(copernic.com)|(POST)' combined_log > search
# now sort them to eliminate duplicates and put them in order
sort -un search > search_sort
# do the same with the original file
sort -un combined_log > combined_log_sort
# now get all the ip addresses. only the numbers
grep -o '[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' search_sort > search_sort_ip
grep -o '[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' combined_log_sort > combined_log_sort_ip
sdiff -s combined_log_sort_ip search_sort_ip > final_result_ip
# remove stuff like browser versions and system versions
egrep -v '(4.4.2.0)|(1.6.3.1)|(1.6.4.0)|(1.3.3.7)|(0.9.2.1)|(4.0.0.42)|(4.1.8.0)|(1.305.2.109)|(1.305.2.12)|(0.0.43.45)|(5.0.0.0)|(1.6.2.0)|(4.4.5.0)|(1.305.2.137)|(4.3.5.0)|(1.2.0.7)|(4.1.5.0)|(5.0.2.6)|(4.4.9.0)|(6.1.0.1)|(4.4.9.0)|(5.0.8.6)|(5.0.2.4)|(4.4.8.0)|(4.4.6.0)|(1.305.2.148)|(4.2.8.0)|(4.2.13.0)|(4.4.7.0)|(4.5.0.0)' final_result_ip > unique_visits
# clean up the temporary files
rm final_result_ip
rm search_sort_ip
rm combined_log_sort_ip
rm search
rm search_sort
rm combined_log_sort
exit 0
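The renaming step mentioned above (saving unique_visits before a re-run overwrites it) can be sketched like this. The date-stamped file name is an assumption, not part of the original script:

```shell
#!/bin/sh
# Hypothetical sketch: archive the previous run's output before the
# main script overwrites it.
echo '192.168.0.1' > unique_visits    # stand-in for a real result file

# If a unique_visits file exists, move it to a name stamped with the
# current year and month, e.g. unique_visits_2005-02.
if [ -f unique_visits ]; then
    mv unique_visits "unique_visits_$(date +%Y-%m)"
fi
```

After this runs, unique_visits is gone and the archived copy remains, so the main script can be re-run without losing last month's results.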