JavaScript Study <2> - Validation with Regular Expressions

by oimb 2018. 8. 9. 14:53

Continuing from <1>, this time let's use regular expressions for validation.


1. Regular expressions


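A regular expression (RegExp) is a pattern that describes a set of strings. Here we use it to decide whether the data typed into an input element follows the expected rules.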

(https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp)


Validating data without regular expressions, as we saw last time, tends to leave loopholes and can be hard to write.

So let's learn regular expressions to make validation easier.


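var variable = new RegExp(pattern, flags)

var variable = /pattern/flags


Create the object in one of these two forms,

then run the check with its test() or exec() method. Note that a regular expression can also be passed to the String methods match, replace, search, and split.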
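
As a rough sketch of the two creation forms and the methods just mentioned (the sample pattern and text are assumptions chosen for illustration):

```javascript
// Two equivalent ways to build the same pattern.
const fromLiteral = /css/i;
const fromConstructor = new RegExp("css", "i");

const text = "Html Css Jquery";

// test() returns true or false.
const found = fromLiteral.test(text);

// exec() returns a match array (with an index property), or null if nothing matches.
const match = fromConstructor.exec(text);
const matchedText = match[0];
const matchedAt = match.index;

// String methods that accept a regular expression.
const position = text.search(/jquery/i);   // index of the first match, or -1
const parts = text.split(/\s+/);           // split on runs of whitespace

console.log(found, matchedText, matchedAt, position, parts);
// true Css 5 9 [ 'Html', 'Css', 'Jquery' ]
```

The literal form is usually easier to read. The constructor form is handy when the pattern has to be assembled from a string at run time.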


Now let's look more closely at what goes into a regular expression object.


2. Pattern and flags


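The pattern is simply the text, written in regular expression syntax, that describes what to look for.


Flags are extra options that change how the pattern is searched for.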


g - global match; find all matches rather than stopping after the first match
i - ignore case; if the u flag is also enabled, use Unicode case folding
m - multiline; treat the beginning and end characters (^ and $) as working over multiple lines (i.e., match the beginning or end of each line (delimited by \n or \r), not only the very beginning or end of the whole input string)
u - Unicode; treat the pattern as a sequence of Unicode code points
y - sticky; match only from the index indicated by the lastIndex property of this regular expression in the target string (and do not attempt to match from any later index)

Let's look at them one by one.

g - Without this flag only the first match is found; with g, every match is found.
var str = "Html Css Jquery css";

// Without g: only the first match is replaced (i ignores case)
var reg1 = /css/i;
document.write(str.replace(reg1, "a"), "<br/>");

// With g: every match is replaced
var reg2 = /css/ig;
document.write(str.replace(reg2, "a"));

Result:

Html a Jquery css
Html a Jquery a


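i - As the example above shows, when the text being searched for is alphabetic, upper and lower case are not distinguished.


m - Multiline: ^ and $ match at the start and end of every line, so a pattern anchored with them is also found on lines after a line break.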


var str2 = " Html Css JavaScript\n";
str2 += "Html Css JavaScript";

// Without m: ^ matches only at the very start of the string
var reg1 = /^(\w|\ )*/ig;
document.write(str2.match(reg1), "<br/>");

// With m: ^ also matches right after each line break
var reg2 = /^(\w|\ )*/igm;
document.write(str2.match(reg2), "<br/>");

Result:

Html Css JavaScript
Html Css JavaScript,Html Css JavaScript


I couldn't think of good examples for u and y, so I'll post them later if I end up using them.
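
In the meantime, here is a minimal sketch of what the two flags do, based only on the descriptions in the list above. The sample strings are assumptions, not anything from a real project.

```javascript
// u: treat the pattern as Unicode code points, so "." matches a whole
// character even when it is stored as two UTF-16 code units.
const emoji = "\u{1F600}";
const withoutU = /^.$/.test(emoji);   // "." sees only half of the surrogate pair
const withU = /^.$/u.test(emoji);     // "." consumes the whole code point

// y: sticky, so the match must start exactly at lastIndex (no scanning ahead).
const sticky = /css/y;
const source = "html css";

sticky.lastIndex = 0;
const atStart = sticky.test(source);  // "css" does not start at index 0

sticky.lastIndex = 5;
const atFive = sticky.test(source);   // "css" starts exactly at index 5

console.log(withoutU, withU, atStart, atFive);
// false true false true
```

The sticky flag is mostly useful when writing a tokenizer that walks through a string position by position.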


3. Characters used in regular expressions


There are too many special characters to list them all, so here is the link first. (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp)


Still, these are the ones I use most often:


Character Classes

Character

Meaning

.

(The dot, the decimal point) matches any single character except line terminators: \n, \r, \u2028 or \u2029.

Inside a character set, the dot loses its special meaning and matches a literal dot.

Note that the m multiline flag doesn't change the dot behavior. So to match a pattern across multiple lines, the character set [^] can be used (if you don't mean an old version of IE, of course), it will match any character including newlines.

For example, /.y/ matches "my" and "ay", but not "yes", in "yes make my day".

\d

Matches any digit (Arabic numeral). Equivalent to [0-9].

For example, /\d/ or /[0-9]/ matches "2" in "B2 is the suite number".

\D

Matches any character that is not a digit (Arabic numeral). Equivalent to [^0-9].

For example, /\D/ or /[^0-9]/ matches "B" in "B2 is the suite number".

\w

Matches any alphanumeric character from the basic Latin alphabet, including the underscore. Equivalent to [A-Za-z0-9_].

For example, /\w/ matches "a" in "apple", "5" in "$5.28", and "3" in "3D".

\W

Matches any character that is not a word character from the basic Latin alphabet. Equivalent to [^A-Za-z0-9_].

For example, /\W/ or /[^A-Za-z0-9_]/ matches "%" in "50%".

\s

Matches a single white space character, including space, tab, form feed, line feed and other Unicode spaces. Equivalent to [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].

For example, /\s\w*/ matches " bar" in "foo bar".

\S

Matches a single character other than white space. Equivalent to [^ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].

For example, /\S\w*/ matches "foo" in "foo bar".

\t

Matches a horizontal tab.

\r

Matches a carriage return.

\n

Matches a linefeed.

\

For characters that are usually treated literally, indicates that the next character is special and not to be interpreted literally.

For example, /b/ matches the character "b". By placing a backslash in front of "b", that is by using /\b/, the character becomes special to mean match a word boundary.

or

For characters that are usually treated specially, indicates that the next character is not special and should be interpreted literally.

For example, "*" is a special character that means 0 or more occurrences of the preceding character should be matched; for example, /a*/ means match 0 or more "a"s. To match * literally, precede it with a backslash; for example, /a\*/ matches "a*".
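
The character classes above can be tried out with a short sketch like the one below. It reuses the sample strings from the table; the variable names are only illustrative.

```javascript
const sample = "B2 is the suite number";

const firstDigit = sample.match(/\d/)[0];           // first digit
const firstNonDigit = sample.match(/\D/)[0];        // first non-digit
const firstWordChar = "$5.28".match(/\w/)[0];       // "$" is not a word character
const firstNonWord = "50%".match(/\W/)[0];          // first non-word character
const spaceThenWord = "foo bar".match(/\s\w*/)[0];  // a space followed by word characters

// The dot does not match a line break.
const dotAcrossLines = /a.b/.test("a\nb");

// A backslash turns the special "*" into a literal one.
const literalStar = /a\*/.test("a*");

console.log(firstDigit, firstNonDigit, firstWordChar, firstNonWord);
// 2 B 5 %
console.log(JSON.stringify(spaceThenWord), dotAcrossLines, literalStar);
// " bar" false true
```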



Character Sets

Character

Meaning

[xyz]

A character set. Matches any one of the enclosed characters. You can specify a range of characters by using a hyphen.

For example, [abcd] is the same as [a-d]. They match the "b" in "brisket" and the "c" in "chop".

[^xyz]

A negated or complemented character set. That is, it matches anything that is not enclosed in the brackets. You can specify a range of characters by using a hyphen.

For example, [^abc] is the same as [^a-c]. They initially match "o" in "bacon" and "h" in "chop".

Alternation

Character

Meaning

x|y

Matches either x or y.

For example, /green|red/ matches "green" in "green apple" and "red" in "red apple".
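
A small sketch of character sets and alternation, again using the sample words from the tables above (the variable names are made up for illustration):

```javascript
// [a-d] is the same set as [abcd].
const inSet = "brisket".match(/[a-d]/)[0];
const sameSet = "chop".match(/[abcd]/)[0];

// A leading ^ inside the brackets negates the set.
const notInSet = "bacon".match(/[^a-c]/)[0];

// Inside a set the dot is a literal dot, not "any character".
const dotIsLiteral = /[.]/.test("a");

// Alternation: either side of | may match.
const color = "red apple".match(/green|red/)[0];

console.log(inSet, sameSet, notInSet, dotIsLiteral, color);
// b c o false red
```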

Boundaries

Character

Meaning

^

Matches beginning of input. If the multiline flag is set to true, also matches immediately after a line break character.

For example, /^A/ does not match the "A" in "an A", but does match the first "A" in "An A".

$

Matches end of input. If the multiline flag is set to true, also matches immediately before a line break character.

For example, /t$/ does not match the "t" in "eater", but does match it in "eat".

\b

Matches a zero-width word boundary, such as between a letter and a space. (Not to be confused with [\b])

For example, /\bno/ matches the "no" in "at noon"; /ly\b/ matches the "ly" in "possibly yesterday".

\B

Matches a zero-width non-word boundary, such as between two letters or between two spaces.

For example, /\Bon/ matches "on" in "at noon", and /ye\B/ matches "ye" in "possibly yesterday".
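
The boundary characters can be checked with a sketch like this one. The two-line string is an assumption added to show the effect of the m flag.

```javascript
// ^ and $ anchor to the start and end of the input.
const startsWithA = [/^A/.test("an A"), /^A/.test("An A")];
const endsWithT = [/t$/.test("eater"), /t$/.test("eat")];

// \b is a word boundary, \B is a position that is not a word boundary.
const wordStart = "at noon".match(/\bno/).index;    // "no" at the start of "noon"
const insideWord = "at noon".match(/\Bon/).index;   // "on" inside "noon"

// With the m flag, ^ and $ also match at line breaks.
const lines = "one\ntwo";
const wholeInput = /^two$/.test(lines);
const perLine = /^two$/m.test(lines);

console.log(startsWithA, endsWithT, wordStart, insideWord, wholeInput, perLine);
// [ false, true ] [ false, true ] 3 5 false true
```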

Grouping and back references

Character

Meaning

(x)

Matches x and remembers the match. These are called capturing groups.

For example, /(foo)/ matches and remembers "foo" in "foo bar". 

The capturing groups are numbered according to the order of left parentheses of capturing groups, starting from 1. The matched substring can be recalled from the resulting array's elements [1], ..., [n] or from the predefined RegExp object's properties $1, ..., $9.

Capturing groups have a performance penalty. If you don't need the matched substring to be recalled, prefer non-capturing parentheses (see below).

\n

Where n is a positive integer. A back reference to the last substring matching the n parenthetical in the regular expression (counting left parentheses).

For example, /apple(,)\sorange\1/ matches "apple, orange," in "apple, orange, cherry, peach". A more complete example follows this table.

(?:x)

Matches x but does not remember the match. These are called non-capturing groups. The matched substring can not be recalled from the resulting array's elements [1], ..., [n] or from the predefined RegExp object's properties $1, ..., $9.
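
Here is a minimal sketch of capturing groups, back references, and non-capturing groups. The date string is an assumed sample; the "apple, orange" text comes from the table above.

```javascript
// Each pair of parentheses is remembered in the result array, starting at [1].
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const dateMatch = datePattern.exec("2018-08-09");
const year = dateMatch[1];
const month = dateMatch[2];
const day = dateMatch[3];

// In replace(), $1, $2, ... refer to the captured groups.
const reordered = "2018-08-09".replace(datePattern, "$3/$2/$1");

// \1 inside the pattern means "the same text that group 1 matched".
const withBackRef = "apple, orange, cherry, peach".match(/apple(,)\sorange\1/)[0];

// (?:...) groups without remembering, so only one group is captured here.
const nonCapturing = /(?:foo)(bar)/.exec("foobar");

console.log(year, month, day, reordered);
// 2018 08 09 09/08/2018
console.log(withBackRef, nonCapturing.length, nonCapturing[1]);
// apple, orange, 2 bar
```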

Quantifiers

Character

Meaning

x*

Matches the preceding item x 0 or more times.

For example, /bo*/ matches "boooo" in "A ghost booooed" and "b" in "A bird warbled", but nothing in "A goat grunted".

x+

Matches the preceding item x 1 or more times. Equivalent to {1,}.

For example, /a+/ matches the "a" in "candy" and all the "a"'s in "caaaaaaandy".

x*?
x+?

Matches the preceding item x like * and + from above, however the match is the smallest possible match.

For example, /".*?"/ matches '"foo"' in '"foo" "bar"' and does not match '"foo" "bar"' as without the ? behind the *.

x?

Matches the preceding item x 0 or 1 time.

For example, /e?le?/ matches the "el" in "angel" and the "le" in "angle."

If used immediately after any of the quantifiers *, +, ?, or {}, makes the quantifier non-greedy (matching the minimum number of times), as opposed to the default, which is greedy (matching the maximum number of times).

x{n}

Where n is a positive integer. Matches exactly n occurrences of the preceding item x.

For example, /a{2}/ doesn't match the "a" in "candy", but it matches all of the "a"'s in "caandy", and the first two "a"'s in "caaandy".

x{n,}

Where n is a positive integer. Matches at least n occurrences of the preceding item x.

For example, /a{2,}/ doesn't match the "a" in "candy", but matches all of the a's in "caandy" and in "caaaaaaandy".

x{n,m}

Where n and m are positive integers. Matches at least n and at most m occurrences of the preceding item x.

For example, /a{1,3}/ matches nothing in "cndy", the "a" in "candy", the two "a"'s in "caandy", and the first three "a"'s in "caaaaaaandy". Notice that when matching "caaaaaaandy", the match is "aaa", even though the original string had more "a"'s in it.
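
The quantifiers above, including the greedy versus non-greedy difference, can be sketched like this using the sample strings from the table:

```javascript
// * allows zero or more of the preceding item.
const manyO = "A ghost booooed".match(/bo*/)[0];
const zeroO = "A bird warbled".match(/bo*/)[0];
const noB = "A goat grunted".match(/bo*/);          // no "b" at all, so null

// Greedy (default) takes as much as possible; adding ? takes as little as possible.
const quoted = '"foo" "bar"';
const greedy = quoted.match(/".*"/)[0];
const lazy = quoted.match(/".*?"/)[0];

// {n}, {n,} and {n,m} set explicit repetition counts.
const exactlyTwo = "caaandy".match(/a{2}/)[0];
const atLeastTwo = "caaaaaaandy".match(/a{2,}/)[0];
const oneToThree = "caaaaaaandy".match(/a{1,3}/)[0];

console.log(manyO, zeroO, noB);
// boooo b null
console.log(greedy, lazy);
// "foo" "bar" "foo"
console.log(exactlyTwo, atLeastTwo, oneToThree);
// aa aaaaaaa aaa
```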


Assertions

Character

Meaning

x(?=y)

Matches x only if x is followed by y.

For example, /Jack(?=Sprat)/ matches "Jack" only if it is followed by "Sprat".
/Jack(?=Sprat|Frost)/ matches "Jack" only if it is followed by "Sprat" or "Frost". However, neither "Sprat" nor "Frost" is part of the match results.

x(?!y)

Matches x only if x is not followed by y.

For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point.
/\d+(?!\.)/.exec('3.141') matches "141" but not "3.141".
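
A short sketch of the two lookahead assertions follows. The "JackHorner" string and the password rule at the end are assumptions added for illustration; the rest comes from the table above.

```javascript
// (?=...) checks what follows without including it in the match.
const jack = "JackSprat".match(/Jack(?=Sprat|Frost)/)[0];
const otherJack = /Jack(?=Sprat|Frost)/.test("JackHorner");

// (?!...) requires that the following text does NOT match.
const fraction = /\d+(?!\.)/.exec("3.141")[0];

// Hypothetical rule: at least 8 word characters, one of which must be a digit.
const passwordRule = /^(?=.*\d)\w{8,}$/;
const withDigit = passwordRule.test("abcdefg1");
const lettersOnly = passwordRule.test("abcdefgh");

console.log(jack, otherJack, fraction, withDigit, lettersOnly);
// Jack false 141 true false
```

Because a lookahead consumes nothing, it can be placed at the start of a pattern to add an extra condition on top of the main rule, as the password example does.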



There is a site where you can practice each of these (https://regexr.com/).

Try them out one at a time.


Finally, let's use regular expressions to build a simple validation script.


// Name: 2 to 5 Hangul syllables
var userName = prompt("What is your name?");
var reg1 = /^[가-힣]{2,5}$/;

// Keep asking until the input matches the pattern
while (true) {
    if (reg1.test(userName)) {
        break;
    }
    alert("The name format is invalid.");
    userName = prompt("What is your name?");
}

// Mobile number: 010 followed by 7 or 8 digits, without hyphens
var userCell = prompt("What is your mobile number?");
var reg2 = /^(010)\d{3,4}\d{4}$/;
while (true) {
    if (reg2.test(userCell)) {
        break;
    }
    alert("The mobile number format is invalid.");
    userCell = prompt("What is your mobile number?");
}

// Email: 5 to 12 word characters, @, a lowercase domain, then a suffix such as .com or .co.kr
var userEmail = prompt("What is your email address?");
var reg3 = /^\w{5,12}@[a-z]{2,10}[\.][a-z]{2,3}[\.]?[a-z]{0,2}$/;
while (true) {
    if (reg3.test(userEmail)) {
        break;
    }
    alert("The email format is invalid.");
    userEmail = prompt("What is your email address?");
}

document.write(userName, "<br/>");
document.write(userCell, "<br/>");
document.write(userEmail, "<br/>");
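
The same three patterns can also be wrapped in a plain function, which makes them easy to try without prompt() dialogs. This is only a sketch: the function name validateUser, the shape of the user object, and the sample values are all assumptions, not part of the script above.

```javascript
// The three patterns from the script above.
const NAME_PATTERN = /^[가-힣]{2,5}$/;
const CELL_PATTERN = /^(010)\d{3,4}\d{4}$/;
const EMAIL_PATTERN = /^\w{5,12}@[a-z]{2,10}[\.][a-z]{2,3}[\.]?[a-z]{0,2}$/;

// Hypothetical helper: returns the names of the fields that failed validation.
function validateUser(user) {
  const errors = [];
  if (!NAME_PATTERN.test(user.name)) errors.push("name");
  if (!CELL_PATTERN.test(user.cell)) errors.push("cell");
  if (!EMAIL_PATTERN.test(user.email)) errors.push("email");
  return errors;
}

const good = validateUser({
  name: "홍길동",
  cell: "01012345678",
  email: "hello@naver.com"
});

const bad = validateUser({
  name: "Hong",            // not Hangul
  cell: "010-1234-5678",   // hyphens are not allowed by the pattern
  email: "a@b.c"           // local part shorter than 5 characters
});

console.log(good); // []
console.log(bad);  // [ 'name', 'cell', 'email' ]
```

Returning a list of failed fields, rather than stopping at the first one, lets the caller decide how to report the problems.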









