Wednesday, November 6, 2013

R installation and usage for data analysis


Introduction to R


Today I will give a brief introduction to R, a language that has recently become popular.
R is a very effective tool for statistics and graphics.
There are established statistical programs such as SAS and SPSS.
However, R is open source and has a large number of packages.



The R site is http://r-project.org.
There you can download the R program, documentation, add-on packages, and more.

If you need to search for R-related material, visit http://rseek.org.

Using R for Big Data

There are a few things to consider when R is used with Hadoop for big data processing.
By default, R runs on a single core and keeps its data in memory.
In other words, it is not designed for a distributed environment.

For this reason, some vendors use integrations such as R-Hadoop and R-Hive.



R Installation

Installing R is very simple.
Download the installer for your OS (Mac, Windows, or Linux) from the R project site and run it.


Now you can try out R programming.


Saturday, November 2, 2013

Real-Time Technology for Big Data



Hadoop is used to process big data.
However, as the need for real-time processing grows, there is increasing interest in in-memory technology.

In the past, OLTP databases were used for real-time processing.
In the big data era, we need a new approach to handle data streams quickly.

Google, for example, processes huge amounts of data in a short time with Dremel.
Let's look at some technologies for real-time processing.

Redis

Redis is an open source project released under the BSD license, and its name is short for "Remote Dictionary Server".
It is classified as a NoSQL database because it is a key-value store.
It is also used as a message queue and as shared memory.
For that reason it is sometimes used for real-time processing.

This is an architecture using node.js and Redis. (Source: http://simonhampshire.wordpress.com/)

It is similar to memcached in that it is an in-memory technology, but unlike memcached it can also save the data to disk.

Apache Kafka


Kafka was created by LinkedIn to improve their log and tracking system.
It is now an Apache project.

It is a publish/subscribe messaging system, and it is often used in conjunction with Hadoop.
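
To make the publish/subscribe idea concrete, here is a minimal in-memory sketch in plain Java. It is not Kafka's API: the class TinyPubSub, the topic name "access-log", and the two subscribers are made up for illustration, and a real broker adds persistence, partitions, and networking on top of this idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical toy message bus; it only illustrates the publish/subscribe pattern.
public class TinyPubSub {
    // topic name -> handlers subscribed to that topic
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    public void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    // Delivers the message to every subscriber of the topic and returns how many received it.
    public int publish(String topic, String message) {
        List<Consumer<String>> handlers = subscribers.getOrDefault(topic, new ArrayList<>());
        for (Consumer<String> handler : handlers) {
            handler.accept(message);
        }
        return handlers.size();
    }

    public static void main(String[] args) {
        TinyPubSub bus = new TinyPubSub();
        List<String> batchLoader = new ArrayList<>(); // stands in for a Hadoop loading job
        bus.subscribe("access-log", batchLoader::add);
        bus.subscribe("access-log", line -> System.out.println("monitor saw: " + line));

        bus.publish("access-log", "GET /index.html 200");
        System.out.println(batchLoader);
    }
}
```

The publisher does not know who is listening, so a batch consumer and a real-time consumer can read the same log stream independently.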


I found out that Netflix processes its log data using Kafka.





Esper

I believe that the best technology for in-memory, real-time processing is CEP (Complex Event Processing).
Oracle and SAP are now focusing on CEP technology.

Esper, an open source CEP engine, uses EPL, an SQL-like scripting language, to define event queries.


CEP is a technology for filtering specific events out of a large number of events.
It is real-time processing based on an event-driven architecture.
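
As a rough illustration of what "filtering specific events" means, here is a small sketch in plain Java. It does not use Esper or EPL. The class name, the "LOGIN_FAILED" event type, and the rule (three failures within ten seconds) are assumptions chosen only for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical detector: fires when enough failed logins arrive within a time window.
public class FailedLoginDetector {
    private final int threshold;
    private final long windowMs;
    // timestamps of the failed logins that are still inside the window
    private final Deque<Long> recentFailures = new ArrayDeque<>();

    FailedLoginDetector(int threshold, long windowMs) {
        this.threshold = threshold;
        this.windowMs = windowMs;
    }

    // Feed one event at a time; returns true when the pattern is matched.
    boolean onEvent(String type, long timestampMs) {
        if (!"LOGIN_FAILED".equals(type)) {
            return false; // filter: ignore every other kind of event
        }
        recentFailures.addLast(timestampMs);
        // drop failures that have fallen out of the time window
        while (timestampMs - recentFailures.peekFirst() > windowMs) {
            recentFailures.removeFirst();
        }
        return recentFailures.size() >= threshold;
    }

    public static void main(String[] args) {
        FailedLoginDetector detector = new FailedLoginDetector(3, 10_000);
        System.out.println(detector.onEvent("LOGIN_FAILED", 0));      // false
        System.out.println(detector.onEvent("PAGE_VIEW", 1_000));     // false
        System.out.println(detector.onEvent("LOGIN_FAILED", 2_000));  // false
        System.out.println(detector.onEvent("LOGIN_FAILED", 3_000));  // true
        System.out.println(detector.onEvent("LOGIN_FAILED", 20_000)); // false
    }
}
```

A CEP engine lets you declare a rule like this as a query instead of writing the window handling by hand.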

Esper has an architecture like this.


Sometimes Storm is used to get the data into Esper.



Monday, October 14, 2013

Using Eclipse shortcuts!



Why do we use shortcut keys so often when programming?
They can reduce coding time and help avoid mistakes.

You do not have to use shortcuts, but you should know the essential ones.

The shortcuts shown in blue are the ones used most often.

Shortcut                              Description
F3                                    Open Declaration: jump to the declaration of the selected class, method, or parameter (same as Ctrl + mouse click)
F4                                    Open the Type Hierarchy window for the selected item
F5                                    Step into the function while debugging
F6                                    Step over to the next line while debugging
F7                                    Step out of the current function while debugging
F8                                    Resume and run to the next breakpoint while debugging
F11                                   Debug
F12                                   Jump to the editor window
Ctrl + 1                              Open Quick Fix and Quick Assist
Ctrl + 7                              Comment / uncomment the line or selection (adds '//')
Ctrl + . / Ctrl + ,                   Jump to the next / previous compiler warning or error
Ctrl + Space                          Open Content Assist (e.g. show available methods or field names)
Ctrl + T                              Show the Quick Type Hierarchy for the selected item
Ctrl + O                              Show the code outline / structure
Ctrl + L                              Jump to a line number. To hide/show line numbers, press Ctrl + F10 and select 'Show Line Numbers'
Ctrl + F                              Open the Find/Replace dialog
Ctrl + J / Ctrl + Shift + J           Incremental search forward / backward. Type the search term after pressing Ctrl + J; there is no search window
Ctrl + K / Ctrl + Shift + K           Find the next / previous occurrence of the search term (close the find window first)
Ctrl + H                              Search the workspace (Java Search, Task Search, and File Search)
Ctrl + F6                             Show the list of open editors. Similar to Ctrl + E but switches immediately upon release of Ctrl
Ctrl + F7 / Ctrl + Shift + F7         Switch forward / backward between views (panels). Useful for switching back and forth between Package Explorer and the editor
Ctrl + F8 / Ctrl + Shift + F8         Switch forward / backward between perspectives
Ctrl + F11                            Save and launch the application (run)
Ctrl + Shift + F                      Auto-format all code in the editor using the code formatter
Ctrl + Shift + O                      Organize imports (removes unused import declarations)
Ctrl + Alt + H                        Open Call Hierarchy
Alt + Enter                           Show and access file properties
Alt + Arrow Left / Alt + Arrow Right  Go back / forward through the editor navigation history
Alt + Shift + R                       Rename the selected element and all references
Alt + Shift + M                       Extract the selection to a method

Sunday, October 6, 2013

Code That Removes HTML Tags Using Regular Expressions



I needed some source code that extracts text from HTML.
With regular expressions it is very simple.

The code removes scripts, styles, tags, and entities, and collapses repeated whitespace.

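 import java.util.regex.Pattern;

 public class HtmlTextExtractor {
      // Compile each pattern once. DOTALL lets '.' match line breaks inside script/style blocks,
      // and the reluctant ".*?" stops at the first closing tag instead of the last one.
      private static final Pattern SCRIPTS = Pattern.compile("<(no)?script[^>]*>.*?</(no)?script>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
      private static final Pattern STYLE = Pattern.compile("<style[^>]*>.*?</style>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
      private static final Pattern TAGS = Pattern.compile("<(\"[^\"]*\"|'[^']*'|[^'\">])*>");
      // Matches named and numeric entities such as &amp; or &#39; without running across words
      private static final Pattern ENTITY_REFS = Pattern.compile("&#?\\w+;");
      private static final Pattern WHITESPACE = Pattern.compile("\\s\\s+");

      // Removes scripts, styles, tags, and entity references, then collapses runs of whitespace
      public static String getText(String content) {
           content = SCRIPTS.matcher(content).replaceAll("");
           content = STYLE.matcher(content).replaceAll("");
           content = TAGS.matcher(content).replaceAll("");
           content = ENTITY_REFS.matcher(content).replaceAll("");
           content = WHITESPACE.matcher(content).replaceAll(" ");
           return content;
      }
 }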

If you are interested, please try it out.
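
One detail worth checking in patterns like these is greedy versus reluctant matching. The small sketch below (the class name QuantifierDemo and the sample HTML are made up for illustration) shows that ".*" between an opening and closing tag swallows everything up to the last closing tag, while ".*?" stops at the first one.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Greedy: ".*" runs to the LAST </style> in the input
    static final Pattern GREEDY = Pattern.compile("<style[^>]*>.*</style>", Pattern.DOTALL);
    // Reluctant: ".*?" stops at the FIRST </style>
    static final Pattern RELUCTANT = Pattern.compile("<style[^>]*>.*?</style>", Pattern.DOTALL);

    static String strip(Pattern pattern, String html) {
        return pattern.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<style>a{}</style>Hello<style>b{}</style>World";
        System.out.println(strip(GREEDY, html));    // prints World (the text "Hello" is lost)
        System.out.println(strip(RELUCTANT, html)); // prints HelloWorld
    }
}
```

For pages with more than one style or script block, the reluctant form is usually what you want.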

Monday, September 30, 2013

The Function That Changes Global Time to Local Time in Java

Time zones become an issue when you build a global service.

There are four time zones in the contiguous U.S.
PST: Pacific Standard Time
MST: Mountain Standard Time
CST: Central Standard Time
EST: Eastern Standard Time

Functions for converting between UTC/GMT and local time

Sometimes you need to convert from standard time to local time.
You can use the functions below.

 import java.util.TimeZone;  // required by both functions
   
 // Local Time -> UTC/GMT Time  
 // Note: getOffset() expects a UTC timestamp, so the result can be off by the
 // daylight saving shift for local times close to a DST transition.
 public static long convertLocalTimeToUTC(long pv_localDateTime)  
 {  
   TimeZone z = TimeZone.getDefault();  
   //int offset = z.getRawOffset(); // offset without daylight saving time  
   int offset = z.getOffset(pv_localDateTime); // offset including daylight saving time  
   return pv_localDateTime - offset;  
 }  
   
 // UTC/GMT Time -> Local Time  
 public static long convertUTCToLocalTime(long pv_UTCDateTime)  
 {  
   TimeZone z = TimeZone.getDefault();  
   //int offset = z.getRawOffset(); // offset without daylight saving time  
   int offset = z.getOffset(pv_UTCDateTime); // offset including daylight saving time  
   return pv_UTCDateTime + offset;  
 }  
   

It is very easy to understand.
Note that the long parameter is the value returned by the getTime() method of a Date object.
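
The difference between getRawOffset() and getOffset() is easiest to see with a concrete zone. The sketch below is only an illustration: the class name OffsetDemo, the choice of "America/Los_Angeles", and the two sample timestamps are my own assumptions.

```java
import java.util.TimeZone;

public class OffsetDemo {
    static final long HOUR_MS = 60L * 60L * 1000L;

    // Offset from UTC in whole hours at the given instant, daylight saving time included
    static long offsetHours(TimeZone zone, long utcMillis) {
        return zone.getOffset(utcMillis) / HOUR_MS;
    }

    // Standard offset from UTC in whole hours, daylight saving time ignored
    static long rawOffsetHours(TimeZone zone) {
        return zone.getRawOffset() / HOUR_MS;
    }

    public static void main(String[] args) {
        TimeZone pacific = TimeZone.getTimeZone("America/Los_Angeles");
        long julyUtc = 1342162075000L;    // 2012-07-13 06:47:55 UTC (summer)
        long januaryUtc = 1326437275000L; // 2012-01-13 06:47:55 UTC (winter)

        System.out.println(rawOffsetHours(pacific));          // -8
        System.out.println(offsetHours(pacific, julyUtc));    // -7
        System.out.println(offsetHours(pacific, januaryUtc)); // -8
    }
}
```

In summer the raw offset is one hour off for this zone, which is why the functions use getOffset().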


Converting from UTC/GMT to local time using SimpleDateFormat

In practice, you often need to convert a time that is given as a string such as "20120713064755".
This is a solution using SimpleDateFormat.

 import java.text.ParseException;  
 import java.text.SimpleDateFormat;  
 import java.util.Date;  
 import java.util.TimeZone;  
   
 public class ConverTimeZone {  
   
      public static void main(String[] args) {  
           // changes from UTC to local time  
           //String utcTime = "20120713184755";  
           String utcTime = "20120713064755";  
             
           String localTime = convertUtcToLocal(utcTime);  
             
           System.out.println("GMT/UTC: " + utcTime + ", local time: " + localTime);  
      }  
   
      /**  
       * the function that changes from UTC to local time   
       * @author xmlmanager  
       * @param utcTime GMT/UTC (format: 20120713064755)  
       * @return String local time   
       */  
      private static String convertUtcToLocal(String utcTime) {  
           String localTime = "";  
             
           // declare the date format; use UTC so the input is parsed as UTC and not as local time  
           SimpleDateFormat dateFormat = new SimpleDateFormat("yyyyMMddHHmmss");  
           dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));  
             
           try {  
                // parse the UTC string into a Date  
                Date dateUtcTime = dateFormat.parse(utcTime);  
                  
                // get the UTC time as a long (milliseconds since the epoch)  
                long longUtcTime = dateUtcTime.getTime();  
                  
                // add the time zone offset (getOffset includes daylight saving time, getRawOffset does not)  
                TimeZone zone = TimeZone.getDefault();  
                int offset = zone.getOffset(longUtcTime);  
                long longLocalTime = longUtcTime + offset;  
                  
                // wrap the shifted value in a Date  
                Date dateLocalTime = new Date(longLocalTime);  
                  
                // format it with the same UTC formatter, which prints the local wall-clock time  
                localTime = dateFormat.format(dateLocalTime);  
                  
           } catch (ParseException e) {  
                e.printStackTrace();  
           }             
             
           return localTime;  
      }  
 }  
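
On Java 8 or later, the java.time package can do the same conversion without manual offset arithmetic. This is only a sketch of an alternative, not a replacement for the code above. The class name UtcToLocalSketch is made up, and the target zone is passed in explicitly (an assumption on my part) so the result does not depend on the machine's default time zone.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class UtcToLocalSketch {
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Reads utcTime as UTC and returns the same instant in the given zone, in the same format
    static String convertUtcToZone(String utcTime, ZoneId zone) {
        return LocalDateTime.parse(utcTime, FORMAT)
                .atOffset(ZoneOffset.UTC)
                .atZoneSameInstant(zone)
                .format(FORMAT);
    }

    public static void main(String[] args) {
        String utcTime = "20120713064755";
        System.out.println(convertUtcToZone(utcTime, ZoneId.of("Asia/Seoul")));          // 20120713154755
        System.out.println(convertUtcToZone(utcTime, ZoneId.of("America/Los_Angeles"))); // 20120712234755
    }
}
```

Passing ZoneId.systemDefault() gives the same behavior as TimeZone.getDefault() in the version above.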
   

That's all!!

Sunday, September 29, 2013

A Tip for Programming Comments

There are two types of comments in Java and C.
"//" is the line comment and "/* .. */" is the block comment.

Sometimes mixing the two types of comments is very useful.

If you write it like this, the first line is a comment and the second line runs.
/*
This is comment.
/*/
This runs.
//*/

When you add "/" in front of the "/*" at the beginning, the two lines switch roles.

//*
This runs.
/*/
This is comment.
//*/

I usually use this when I need a test account.
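
Here is a small runnable sketch of that use case in Java. The class name CommentToggle and the two account names are made up for the example.

```java
public class CommentToggle {
    // Change the first "/*" below to "//*" to switch to the test account.
    static String account() {
        /*
        return "test-account";
        /*/
        return "real-account";
        //*/
    }

    public static void main(String[] args) {
        System.out.println(account()); // prints real-account
    }
}
```

Only one character changes between the two states, so it is easy to switch back before you commit.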