Log Analysis


Re: Log Analysis

Post by david »

Dear Miguel

Thanks for your comments. I think that we can abandon Webalizer and test only Stone Steps Webalizer and Hadoop/Pig. Have you tried the following configuration with Stone Steps Webalizer (I replaced the white spaces with '\t')?

Code:

LogType apache
ApacheLogFormat %h\t%u\t%t\t\"%r\"\t%>s\t%b\t-\t-

I look forward to your feedback after your tests.

Best Regards,
David Janeway
CacheGuard Technical Team
https://www.cacheguard.com

Re: Log Analysis

Post by miguelp »

Hi,
With the latest version of Stone Steps Webalizer, it is finally working (check their forum).

This is the working configuration:

LogType apache
ApacheLogFormat %h %u %t \"%r\" %>s %b %- %-
DNSChildren 1

Cheers,
Miguel

Re: Log Analysis

Post by david »

Hello Miguel

Good news :-) Thanks for the update.

Do you still think that we need the possibility to have "|" or "\t" (<TAB>) as a delimiter in log files? Don't you think that Hadoop/Pig should be able to import CacheGuard log files as they are? After all, CacheGuard log files are very similar to Apache Web server log files, and Hadoop/Pig is an Apache project.

Best Regards,
David Janeway
CacheGuard Technical Team
https://www.cacheguard.com

Re: Log Analysis

Post by miguelp »

Hi,
That question (about Apache products working together) is too philosophical for me :).
I will try to write the Pig script for the current format and see what happens.
I will keep you updated.
Cheers,
Miguel

Re: Log Analysis

Post by miguelp »

Hi,
You were right, Apache products talk to each other.

With this command in Pig:

raw_logs = LOAD '/user/admin/data/log.csv' USING org.apache.pig.piggybank.storage.apachelog.CommonLogLoader() AS (PCAddr: chararray, user: chararray, time: chararray, request: chararray, status: int, bytes: int, dummy1: chararray, dummy2: chararray);

I can use apachelog.CommonLogLoader, but as expected the date and time part causes problems, because the file is not in the Apache common log format.

For example this is detected as one field:

,0400] "CONNEC

I will see what else can be done.
Thanks,
Miguel

Re: Log Analysis

Post by miguelp »

Hi,
I found the ILLUSTRATE command; with it the problem is easier to see:
(192.168.15.94,justigab,[15/Sep/2015:08:18:51,0400] "CONNEC,sl.gstatic.com:443,HTTP/1.0",200 6201,TCP_MISS,HIER_DIRECT)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| raw_logs | PCAddr:chararray | user:chararray | time:chararray | request:chararray | status:int | bytes:int | dummy1:chararray | dummy2:chararray |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| | 192.168.15.94 | justigab | [15/Sep/2015:08:18:51 | 0400] "CONNEC | sl.gstatic.com:443 | HTTP/1.0" | 200 6201 | TCP_MISS |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Cheers,
Miguel

Re: Log Analysis

Post by david »

Hi Miguel

Sorry I'm a little confused. Is it working with Hadoop/Pig now or not? And do you still need some upgrade in CG?

Best Regards,
David Janeway
CacheGuard Technical Team
https://www.cacheguard.com

Re: Log Analysis

Post by miguelp »

Hello David,

It did not work with the module for analysing Apache logs.

But this is the regular expression for your log format:

(\S+)\s+(\S+)\s\[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] \"(.+?)\" (\S+) (\S+) (\S+) (\S+)

You can test it at https://regex101.com/ by pasting one line of your log file as the test text.
This regular expression can be used in any tool / language.
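
As an illustration of using the expression outside Pig, here is a minimal Python sketch. The pattern is copied verbatim from above. The field names and the sample line are assumptions: the sample is reconstructed from the ILLUSTRATE output earlier in this thread (in particular the "-0400" timezone sign is a guess), not copied from a real CacheGuard log file.

```python
import re

# The pattern from the post, copied verbatim into a raw string.
LOG_PATTERN = re.compile(
    r'(\S+)\s+(\S+)\s\[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] \"(.+?)\" (\S+) (\S+) (\S+) (\S+)'
)

# Hypothetical field names, one per capture group (ten groups in total).
FIELDS = ("remote_addr", "user", "date", "time", "timezone",
          "request", "status", "bytes", "code1", "code2")


def parse_line(line):
    """Return a dict of the ten captured fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


# Assumed sample line, reconstructed for illustration only.
SAMPLE = ('192.168.15.94 justigab [15/Sep/2015:08:18:51 -0400] '
          '"CONNECT sl.gstatic.com:443 HTTP/1.0" 200 6201 TCP_MISS HIER_DIRECT')

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

All values come back as strings; converting "status" and "bytes" to integers is left to the caller, much as the Pig schema does with "bytes: int".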

For loading using PIG you can use:

Code:

raw_log = LOAD 'log.csv' USING org.apache.pig.piggybank.storage.MyRegExLoader('(\\S+)\\s+(\\S+)\\s\\[([^:]+):(\\d+:\\d+:\\d+) ([^\\]]+)\\] \\"(.+?)\\" (\\S+) (\\S+) (\\S+) (\\S+)') AS (remoteAddr, user, date, time, timezone, url, statusCode, bytes: int, code1, code2);
DUMP raw_log;

It is the same regular expression, but with every \ doubled for escaping.
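
The doubling can be checked mechanically. The short Python sketch below doubles every backslash in the plain expression and compares the result with the string used in the LOAD statement above. The helper name escape_for_pig is made up for this example and is not part of Pig or piggybank.

```python
# The regular expression as it would be pasted into regex101.
PLAIN_REGEX = r'(\S+)\s+(\S+)\s\[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] \"(.+?)\" (\S+) (\S+) (\S+) (\S+)'

# The string from the LOAD statement, without its surrounding single quotes.
PIG_REGEX = r'(\\S+)\\s+(\\S+)\\s\\[([^:]+):(\\d+:\\d+:\\d+) ([^\\]]+)\\] \\"(.+?)\\" (\\S+) (\\S+) (\\S+) (\\S+)'


def escape_for_pig(regex):
    """Double every backslash (hypothetical helper, for illustration only)."""
    return regex.replace("\\", "\\\\")


if __name__ == "__main__":
    # Compare the mechanically escaped expression with the one in the post.
    print(escape_for_pig(PLAIN_REGEX) == PIG_REGEX)
```

The same helper could be applied to any other expression tested on regex101 before pasting it into a Pig script.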
Now we have both the Stone Steps analyser and Hadoop via Pig working, which is more than enough for me.

Cheers,
Miguel

Re: Log Analysis

Post by david »

Dear Miguel
Thanks for the update and the solutions you proposed. I'm sure they can help others.
Best Regards
David Janeway
CacheGuard Technical Team
https://www.cacheguard.com