Hello David,
The module for analysing Apache logs did not work.
However, this is the regular expression for your log format:
(\S+)\s+(\S+)\s\[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] \"(.+?)\" (\S+) (\S+) (\S+) (\S+)
You can test it at https://regex101.com/ by pasting one line of your log file as the test string.
This regular expression can be used in any tool / language.
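For instance, here is a minimal Python sketch of the same idea. The sample log line is made up for illustration (I am assuming a line shaped like the fields the expression captures), and the names LOG_PATTERN, FIELDS, SAMPLE_LINE and parse_line are hypothetical, so swap in a real line from your own file.

```python
import re

# Same pattern as above, written as a raw string so each backslash stays single.
LOG_PATTERN = re.compile(
    r'(\S+)\s+(\S+)\s\[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] \"(.+?)\" (\S+) (\S+) (\S+) (\S+)'
)

# Field names, one per capture group (the same names the Pig loader uses).
FIELDS = ("remoteAddr", "user", "date", "time", "timezone",
          "url", "statusCode", "bytes", "code1", "code2")

# Hypothetical log line, invented for this example; replace it with a real one.
SAMPLE_LINE = '192.168.0.1 - [10/Oct/2015:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2326 - -'


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


if __name__ == "__main__":
    print(parse_line(SAMPLE_LINE))
```

All captured values come back as strings; convert bytes or statusCode yourself if you need numbers.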
To load the log with Pig you can use:
raw_log = LOAD 'log.csv' USING org.apache.pig.piggybank.storage.MyRegExLoader('(\\S+)\\s+(\\S+)\\s\\[([^:]+):(\\d+:\\d+:\\d+) ([^\\]]+)\\] \\"(.+?)\\" (\\S+) (\\S+) (\\S+) (\\S+)') AS (remoteAddr, user, date, time, timezone, url, statusCode, bytes: int, code1, code2);
DUMP raw_log;
It is the same regular expression, but with every \ doubled for escaping.
Now we have the Stone Steps analyser and Hadoop via Pig working, which is more than enough for me.
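If you change the expression later, the doubling can be done mechanically rather than by hand. This is only a small Python sketch; REGEX and pig_literal are names I made up for it.

```python
# The plain regular expression, as you would paste it into regex101.
REGEX = r'(\S+)\s+(\S+)\s\[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] \"(.+?)\" (\S+) (\S+) (\S+) (\S+)'

# Double every backslash so the text can sit inside the quoted Pig string.
pig_literal = REGEX.replace('\\', '\\\\')

if __name__ == "__main__":
    print(pig_literal)
```

The printed text is what goes between the single quotes of MyRegExLoader('...').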
Cheers,
Miguel