How to Use Regular Expression in Java That Reads the First Value

Regular expressions are a language of their own. When you learn a new programming language, they're this little sub-language that makes no sense at first glance. Many times you have to read another tutorial, article, or book just to understand the "simple" pattern described. Today, we'll review eight regular expressions that you should know for your next coding project.

Before we start, you might want to check out some of the regex apps on Envato Market, such as RegEx Extractor. This powerful script lets you extract emails, proxies, IPs, phone numbers, addresses, HTML tags, URLs, links, dates, etc. Just insert one or multiple regular expressions and source URLs, and start the process.

Extract, scrape, parse, harvest.

Usage Examples

  • Extract emails from an old CSV address book.
  • Extract image sources from HTML files.
  • Extract proxies from online websites.
  • Extract URL results from Google.

Background Info on Regular Expressions

This is what Wikipedia has to say about them:

In computing, regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters. Regular expressions (abbreviated as regex or regexp, with plural forms regexes, regexps, or regexen) are written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.

Now, that doesn't really tell me much about the actual patterns. The regexes I'll be going over today contain characters such as \w, \s, \1, and many others that represent something totally different from what they look like.

If you'd like to learn a little about regular expressions before you continue reading this article, I'd suggest watching the Regular Expressions for Dummies screencast series.

The eight regular expressions we'll be going over today will allow you to match a(n): username, password, email, hex value (like #fff or #000), slug, URL, IP address, and an HTML tag. As the list goes down, the regular expressions get more and more confusing. The pictures for each regex in the beginning are easy to follow, but the last four are more easily understood by reading the explanation.

The key thing to remember about regular expressions is that they are almost read forwards and backwards at the same time. This sentence will make more sense when we talk about matching HTML tags.

Note: The delimiters used in the regular expressions are forward slashes, "/". Each pattern begins and ends with a delimiter. If a forward slash appears in a regex, we must escape it with a backslash: "\/".
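Since the title mentions Java, it may help to see how the delimiter note carries over. Java has no regex literals: a pattern is an ordinary string passed to Pattern.compile, so there are no delimiters and a forward slash needs no escape, but every regex backslash has to be doubled inside the string literal. The class name DelimiterNote and the fraction pattern below are made up for illustration and are not from the article.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a pattern written as a Java string
// instead of a /delimited/ literal.
class DelimiterNote {
    // With delimiters this would be written /^\d+\/\d+$/.
    // In Java the slash is left alone and each backslash is doubled.
    static final Pattern FRACTION = Pattern.compile("^\\d+/\\d+$");

    static boolean isFraction(String input) {
        // matches() succeeds only if the whole input fits the pattern.
        return FRACTION.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isFraction("10/20"));
        System.out.println(isFraction("10-20"));
    }
}
```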


1. Matching a Username

Pattern:

Description:

We begin by telling the parser to find the beginning of the string (^), followed by any lowercase letter (a-z), number (0-9), an underscore, or a hyphen. Next, {3,16} makes sure that there are at least 3 of those characters, but no more than 16. Finally, we want the end of the string ($).

String that matches:

my-us3r_n4m3

String that doesn't match:

th1s1s-wayt00_l0ngt0beausername (too long)
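The pattern itself is missing from this copy of the article, so here is a minimal Java sketch that rebuilds it from the description above. Treat the regex string as a reconstruction, not the author's exact text; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class UsernameCheck {
    // Assumption: rebuilt from the description (start of string,
    // 3 to 16 lowercase letters, digits, underscores or hyphens, end).
    static final Pattern USERNAME = Pattern.compile("^[a-z0-9_-]{3,16}$");

    static boolean isValidUsername(String candidate) {
        // matches() requires the entire input to fit the pattern.
        return USERNAME.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidUsername("my-us3r_n4m3"));
        System.out.println(isValidUsername("th1s1s-wayt00_l0ngt0beausername"));
    }
}
```

The hyphen sits last inside the brackets so that it is read as a literal hyphen and not as a range.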


2. Matching a Password

Pattern:

Description:

Matching a password is very similar to matching a username. The only difference is that instead of 3 to 16 letters, numbers, underscores, or hyphens, we want 6 to 18 of them ({6,18}).

String that matches:

myp4ssw0rd

String that doesn't match:

mypa$$w0rd (contains a dollar sign)
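Because only the length bounds change, one way to sketch this in Java is a small factory that takes the bounds as parameters. The pattern is again reconstructed from the description (the original is not shown here), and lengthBounded is a hypothetical helper name.

```java
import java.util.regex.Pattern;

class PasswordCheck {
    // Hypothetical helper: same character set as the username pattern,
    // with the {min,max} quantifier filled in from the arguments.
    static Pattern lengthBounded(int min, int max) {
        return Pattern.compile("^[a-z0-9_-]{" + min + "," + max + "}$");
    }

    // Assumption: 6 to 18 characters, as the description states.
    static final Pattern PASSWORD = lengthBounded(6, 18);

    static boolean isValidPassword(String candidate) {
        return PASSWORD.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidPassword("myp4ssw0rd"));
        System.out.println(isValidPassword("mypa$$w0rd"));
    }
}
```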


3. Matching a Hex Value

Pattern:

Description:

We begin by telling the parser to find the beginning of the string (^). Next, a number sign is optional because it is followed by a question mark. The question mark tells the parser that the preceding character (in this case a number sign) is optional, but to be "greedy" and capture it if it's there. Next, inside the first group (first set of parentheses), we can have two different situations. The first is any lowercase letter between a and f or a number six times. The vertical bar tells us that we can also have three lowercase letters between a and f or numbers instead. Finally, we want the end of the string ($).

The reason that I put the six character option first is so that the parser will capture a hex value like #ffffff. If I had reversed it so that the three characters came first, the parser would only pick up #fff and not the other three f's. Strictly speaking, that happens when the pattern is used without the end-of-string anchor; with ^ and $ in place, a backtracking engine retries the six character option and still matches the whole value.

String that matches:

#a3c113

String that doesn't match:

#4d82h4 (contains the letter h)
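Here is a hedged Java sketch of the hex check, with the pattern reconstructed from the description because the original is not shown in this copy. The two extra unanchored patterns are my own addition, included only to make the effect of alternation order visible; all names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HexValueCheck {
    // Assumption: rebuilt from the description (optional #, then six
    // or three characters from a-f and 0-9, anchored at both ends).
    static final Pattern HEX = Pattern.compile("^#?([a-f0-9]{6}|[a-f0-9]{3})$");

    // Unanchored variants, used only to compare alternation order.
    static final Pattern SIX_FIRST = Pattern.compile("#?([a-f0-9]{6}|[a-f0-9]{3})");
    static final Pattern THREE_FIRST = Pattern.compile("#?([a-f0-9]{3}|[a-f0-9]{6})");

    static boolean isHex(String candidate) {
        return HEX.matcher(candidate).matches();
    }

    // Returns the first substring the pattern finds, or null if none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(isHex("#a3c113"));                   // true
        System.out.println(firstMatch(SIX_FIRST, "#ffffff"));   // #ffffff
        System.out.println(firstMatch(THREE_FIRST, "#ffffff")); // #fff
    }
}
```

Without anchors, the first alternative that succeeds wins, which is why the three-first variant stops at #fff.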


4. Matching a Slug

Pattern:

Description:

You will be using this regex if you ever have to work with mod_rewrite and pretty URLs. We begin by telling the parser to find the beginning of the string (^), followed by one or more (the plus sign) letters, numbers, or hyphens. Finally, we want the end of the string ($).

String that matches:

my-title-here

String that doesn't match:

my_title_here (contains underscores)
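A minimal Java sketch of the slug check follows. The pattern is rebuilt from the description above and assumes lowercase letters, in line with the earlier patterns; the description only says "letters", so that part is a guess. The names are hypothetical.

```java
import java.util.regex.Pattern;

class SlugCheck {
    // Assumption: one or more lowercase letters, digits or hyphens.
    static final Pattern SLUG = Pattern.compile("^[a-z0-9-]+$");

    static boolean isSlug(String candidate) {
        return SLUG.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSlug("my-title-here"));
        System.out.println(isSlug("my_title_here"));
    }
}
```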


5. Matching an Email

Pattern:

Description:

We begin by telling the parser to find the beginning of the string (^). Inside the first group, we match one or more lowercase letters, numbers, underscores, dots, or hyphens. I have escaped the dot because a non-escaped dot means any character. Directly after that, there must be an at sign. Next is the domain name which must be: one or more lowercase letters, numbers, underscores, dots, or hyphens. Then another (escaped) dot, with the extension being two to six letters or dots. I have 2 to 6 because of the country specific TLDs (.ny.us or .co.uk). Finally, we want the end of the string ($).

String that matches:

john@doe.com

String that doesn't match:

john@doe.something (TLD is too long)
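The three groups in the description map naturally onto Matcher.group in Java. The sketch below rebuilds the pattern from the description only (the original is missing here), so the exact character classes are an assumption, and it is far from a complete email validator. The names EmailCheck and parts are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailCheck {
    // Assumption: local part, "@", domain, an escaped dot, then a
    // 2 to 6 character extension of letters or dots.
    static final Pattern EMAIL =
        Pattern.compile("^([a-z0-9_\\.-]+)@([a-z0-9_\\.-]+)\\.([a-z\\.]{2,6})$");

    // Returns {local part, domain, extension}, or null if there is no match.
    static String[] parts(String email) {
        Matcher m = EMAIL.matcher(email);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] p = parts("john@doe.com");
        System.out.println(p[0] + " | " + p[1] + " | " + p[2]); // john | doe | com
        System.out.println(parts("john@doe.something") == null); // true
    }
}
```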


6. Matching a URL

Pattern:

Description:

This regex is almost like taking the ending part of the above regex and slapping it between "http://" and some file structure at the end. It sounds a lot simpler than it really is. To start off, we search for the beginning of the line with the caret.

The first capturing group is entirely optional. It allows the URL to begin with "http://", "https://", or neither of them. I have a question mark after the s to allow URLs that have http or https. In order to make this entire group optional, I just added a question mark to the end of it.

Next is the domain name: one or more numbers, letters, dots, or hyphens followed by another dot, then two to six letters or dots. The following section is the optional files and directories. Inside the group, we want to match any number of forward slashes, letters, numbers, underscores, spaces, dots, or hyphens. Then we say that this group can be matched as many times as we want. Pretty much this allows multiple directories to be matched along with a file at the end. I have used the star instead of the question mark because the star says zero or more, not zero or one. If a question mark was to be used there, only one file/directory would be able to be matched.

Then a trailing slash is matched, but it can be optional. Finally we end with the end of the line.

String that matches:

https://net.tutsplus.com/about

String that doesn't match:

http://google.com/some/file!.html (contains an exclamation point)
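Below is a Java sketch of the URL check, reconstructed from the description since the pattern is not shown in this copy. One deliberate difference: the description repeats the path group with a second star, but because the character class inside already allows slashes, a single star accepts the same strings, and it avoids nesting one star inside another, which can make a backtracking engine very slow on non-matching input. That simplification is my choice, not the author's. Forward slashes are not escaped because Java patterns have no delimiters.

```java
import java.util.regex.Pattern;

class UrlCheck {
    // Assumption: optional http:// or https://, a domain, a dot,
    // a 2 to 6 character extension, an optional path, an optional slash.
    static final Pattern URL = Pattern.compile(
        "^(https?://)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([/\\w \\.-]*)/?$");

    static boolean isUrl(String candidate) {
        return URL.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUrl("https://net.tutsplus.com/about"));
        System.out.println(isUrl("http://google.com/some/file!.html"));
    }
}
```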


7. Matching an IP Address

Pattern:

Description:

Now, I'm not going to lie, I didn't write this regex; I got it from here. Now, that doesn't mean that I can't rip it apart character for character.

The first capture group actually isn't a captured group because ?: was placed inside, which tells the parser not to capture this group (more on this in the last regex). We also want this non-captured group to be repeated three times (the {3} at the end of the group). This group contains another group, a subgroup, and a literal dot. The parser looks for a match in the subgroup and then a dot to move on.

The subgroup is also another non-capture group. It's just a bunch of character sets (things inside brackets): the string "25" followed by a number between 0 and 5; or the string "2" and a number between 0 and 4 and any number; or an optional zero or one followed by two numbers, with the second being optional.

After we match three of those, it's onto the next non-capturing group. This one wants: the string "25" followed by a number between 0 and 5; or the string "2" with a number between 0 and 4 and another number at the end; or an optional zero or one followed by two numbers, with the second being optional.

We end this confusing regex with the end of the string.

String that matches:

73.60.124.136 (no, that is not my IP address :P)

String that doesn't match:

256.60.124.136 (the first group must be "25" and a number between zero and five)
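The repeated subgroup is easier to read in Java if it is written once and reused. This sketch rebuilds the pattern from the description (the original is missing from this copy), so take the exact text as an assumption; OCTET and the class name are hypothetical.

```java
import java.util.regex.Pattern;

class IpAddressCheck {
    // One number from 0 to 255: "25" plus 0-5, or "2" plus 0-4 plus any
    // digit, or an optional 0 or 1 followed by one or two digits.
    static final String OCTET = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";

    // Three "octet plus dot" repetitions, then a final octet.
    static final Pattern IPV4 =
        Pattern.compile("^(?:" + OCTET + "\\.){3}" + OCTET + "$");

    static boolean isIpv4(String candidate) {
        return IPV4.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("73.60.124.136"));
        System.out.println(isIpv4("256.60.124.136"));
    }
}
```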


8. Matching an HTML Tag

Pattern:

Description:

One of the more useful regexes on the list. It matches any HTML tag with the content inside. As usual, we begin with the start of the line.

First comes the tag's name. It must be one or more letters long. This is the first capture group; it comes in handy when we have to grab the closing tag. The next thing is the tag's attributes. This is any character but a greater than sign (>). Since this is optional, but I want to match more than one character, the star is used. The plus sign makes up the attribute and value, and the star says as many attributes as you want.

Next comes the third group, which is a non-capture group. Inside, it will contain either a greater than sign, some content, and a closing tag; or some spaces, a forward slash, and a greater than sign. The first option looks for a greater than sign followed by any number of characters, and the closing tag. \1 is used, which represents the content that was captured in the first capturing group. In this case it was the tag's name. Now, if that couldn't be matched, we want to look for a self-closing tag (like an img, br, or hr tag). This needs to have one or more spaces followed by "/>".

The regex is ended with the end of the line.

String that matches:

<a href="http://net.tutsplus.com/">Nettuts+</a>

String that doesn't match:

<img src="img.jpg" alt="My image>" /> (attributes can't contain greater than signs)
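To see the back-reference at work in Java, here is a sketch rebuilt from the description; the original pattern is not shown in this copy, so the exact text is an assumption. The description has a plus inside a starred group for the attributes. I have written that as a single starred character class, which accepts the same strings without nesting quantifiers. The names are hypothetical, and this only handles a single tag on its own, not real-world HTML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlTagCheck {
    // Group 1: tag name. Group 2: attributes (anything but ">").
    // Then either ">content</name>" using \1, or spaces and "/>".
    static final Pattern TAG =
        Pattern.compile("^<([a-z]+)([^>]*)(?:>(.*)</\\1>|\\s+/>)$");

    // Returns the tag's name, or null if the input is not a single tag.
    static String tagName(String html) {
        Matcher m = TAG.matcher(html);
        return m.matches() ? m.group(1) : null;
    }

    // Returns the content between the tags; null for self-closing tags
    // and for input that does not match.
    static String content(String html) {
        Matcher m = TAG.matcher(html);
        return m.matches() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        String link = "<a href=\"http://net.tutsplus.com/\">Nettuts+</a>";
        System.out.println(tagName(link)); // a
        System.out.println(content(link)); // Nettuts+
    }
}
```

The non-capture group is not counted, so the content is still group 3.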


Conclusion

I hope that you have grasped the ideas behind regular expressions a little bit better. Hopefully you'll be using these regexes in future projects! Many times you won't need to decipher a regex character by character, but sometimes if you do this it helps you learn. Just remember, don't be afraid of regular expressions. They might not seem it, but they make your life a lot easier. Just try and pull out a tag's name from a string without regular expressions!

Follow us on Twitter, or subscribe to the NETTUTS RSS Feed for more daily web development tuts and articles. And check out some of those regex apps on Envato Market.

Source: https://code.tutsplus.com/tutorials/8-regular-expressions-you-should-know--net-6149