Introduction
In modern web design, most components are modular, resulting in repeated structures like task lists, product information, or navigation menus. These consistent frameworks simplify extracting data in a structured format.
Examples:
- CoolPC CPU Product Page: How can we quickly extract price details for products?
- Agile development Kanban tasks: How can we consolidate the tasks relevant to our annual objectives?
- Other use cases: Searching across multiple files for specific content and parsing it.
These scenarios can be addressed by efficiently extracting necessary data in VS Code, enabling further analysis.
This article uses practical examples to demonstrate how to leverage regular expressions in VS Code to extract meaningful information from raw data and transform it into analyzable formats.
Preparation
Install VS Code
Ensure VS Code is installed on your computer. If not, refer to the Visual Studio Code section in “Awesome Windows - Essential Productivity Software Installation and Guide”.
Common Regular Expressions
In VS Code, we use the search functionality to find strings that match specific patterns. By utilizing regular expressions, we can select the content we need. Below is a compilation of commonly used regular expressions that every developer should be familiar with:
Description | Regex | Example | Explanation |
---|---|---|---|
Match any character | . | a.c matches abc or a3c | Any single character (excluding line breaks) |
Match one or more times | + | a+ matches a or aaa | Matches at least 1 occurrence |
Match zero or more times | * | a* matches aaa , a , or an empty string | Matches 0 or more occurrences |
Match zero or one time | ? | a? matches a or an empty string | Matches at most 1 occurrence |
Match start of a line | ^ | ^hello matches hello world | Matches start of a line |
Match end of a line | $ | world$ matches hello world | Matches end of a line |
Match digits | \d | \d+ matches 123 or 56 | Matches digits 0-9 |
Match alphanumeric | \w | \w+ matches hello123 | Includes letters, digits, and underscores |
Match specific counts | {n} | a{3} matches aaa | Matches exactly n occurrences |
Match at least n times | {n,} | a{2,} matches aa or aaaa | Matches at least n occurrences |
Match range of times | {n,m} | a{2,4} matches aa or aaa | Matches at least n but no more than m occurrences |
Match from set | [abc] | [abc] matches a , b , or c | Matches characters from the set |
Exclude set | [^abc] | [^abc] does not match a , b , or c | Matches characters not in the set |
Match new line | \n | a\nb matches a + newline + b | Matches new line character |
Lazy matching | *?, +?, ??, {n,m}? | a+? matches a (non-greedy) | Prioritizes minimum matches |
Grouping | () | (ab)+ matches abab or ab | Groups content for reuse |
Escape special chars | \ | \[ matches [ or \) matches ) | Escapes special regex characters |
There are also some more specialized regular expressions, such as lookaheads, word boundaries, and non-digits, which are less commonly used in searches and will not be covered here. For those interested, refer to Microsoft’s documentation.
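To get a feel for the table above, it can help to try a few of its rows outside the editor. The sketch below uses Python's `re` module purely as an illustration. That choice is an assumption: VS Code uses its own regex engine, although the basic tokens listed above behave the same way in these simple cases. The sample strings are made up for the demonstration.

```python
import re
from typing import Optional

# Each entry: (pattern, sample text, expected first match).
# The patterns come from the table above; the sample texts are illustrative.
samples = [
    (r"a.c", "a3c", "a3c"),             # any single character
    (r"a+", "caaat", "aaa"),            # one or more
    (r"\d+", "price 123", "123"),       # digits
    (r"\w+", "hello123!", "hello123"),  # letters, digits, underscore
    (r"a{2,4}", "aaaaa", "aaaa"),       # between 2 and 4, greedy
    (r"[^abc]", "abcd", "d"),           # first character outside the set
    (r"a+?", "aaa", "a"),               # lazy: as few as possible
    (r"(ab)+", "xababx", "abab"),       # repeated group
    (r"\[", "x[1]", "["),               # escaped special character
]


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the text of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


results = [first_match(p, t) for p, t, _ in samples]
for (pattern, text, expected), got in zip(samples, results):
    print(f"{pattern!r} on {text!r} -> {got!r} (expected {expected!r})")
```

Changing a sample string and re-running is a quick way to see how greedy and lazy quantifiers differ before using them in the search bar.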
Extracting Product Information
This example uses the CoolPC CPU page, which displays product names and pricing information. How can we extract product names and prices from the webpage and convert them into an Excel file for further analysis? Below are the detailed steps and workflow:
Retrieve Web Page Source Code
- Navigate to the CoolPC CPU Product Page.
- Right-click on any blank area of the webpage and select View Page Source, or use the shortcut Ctrl + U.
- Use Ctrl + A to select all the page source content, then Ctrl + C to copy it.
- Paste the copied content into VS Code using Ctrl + V.
Search Target
In VS Code, use Search (Ctrl + F). A search bar will appear at the top-right of the editor, with an option to enable regular expressions. This option must be enabled to search using regular expressions.
We can observe that the product name is enclosed within <div class=t> and </div>, while the price information is on the next line, starting with <div class=x> followed by either a tax-included (含稅) or tax-excluded (未稅) label, then NT and a number.
We can enter <div class=t>.*?</div>\n<.*?\d+ in the search bar to perform a matching search.
This regular expression can be interpreted as follows:
- Matches a starting tag: the HTML tag <div class=t>.
- Then matches any characters with .*?, zero or more times, in a non-greedy manner.
- Next, it matches the closing tag </div> and proceeds to a newline (\n).
- After the newline, it matches < at the start of a tag. The following .*? indicates an unknown length, but it must eventually reach one or more digits, matched by \d+.
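The breakdown above can be checked against a small fragment. The following is a minimal sketch in Python, under two assumptions: the HTML fragment is invented to mimic the structure described in this article (the real page markup may differ), and Python's `re` module stands in for the VS Code search box.

```python
import re

# Hypothetical fragment shaped like the structure described above;
# it is not copied from the real page.
html = (
    "<div class=t>Sample CPU A</div>\n"
    "<div class=x>含稅 NT4990</div>\n"
    "<p>unrelated line 123</p>\n"
    "<div class=t>Sample CPU B</div>\n"
    "<div class=x>未稅 NT2550</div>\n"
)

# Same pattern as typed into the search bar.
pattern = re.compile(r"<div class=t>.*?</div>\n<.*?\d+")

# Each match spans the name line, the newline, and the next line up to the price digits.
matches = pattern.findall(html)
for m in matches:
    print(repr(m))
```

Because `.` does not match a line break, the first `.*?` cannot run past the name line, so the unrelated line in the middle is never part of a match.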
If you’re not familiar with these matching rules, it’s recommended to try typing them yourself to get a better sense of how they work.
For those comfortable with regular expressions, this can even be simplified to: <d.*=t>.*?.*\n<d.*=x>.*?\d+.
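Before relying on a shortened pattern, it is worth confirming that it selects the same text as the longer one. The sketch below does that in Python on an invented fragment (an assumption; the real page may contain markup where the looser `.*` parts match more than intended).

```python
import re

# Hypothetical fragment mimicking the structure described in this article.
html = (
    "<div class=t>Sample CPU A</div>\n"
    "<div class=x>含稅 NT4990</div>\n"
    "<p>unrelated line 123</p>\n"
    "<div class=t>Sample CPU B</div>\n"
    "<div class=x>未稅 NT2550</div>\n"
)

full = r"<div class=t>.*?</div>\n<.*?\d+"
short = r"<d.*=t>.*?.*\n<d.*=x>.*?\d+"

full_matches = re.findall(full, html)
short_matches = re.findall(short, html)

# True when both patterns select exactly the same spans on this sample.
same = full_matches == short_matches
print("patterns agree on the sample:", same)
```

Agreement on a small sample does not guarantee agreement on the whole page, so a quick look at the match count in the search bar is still a good habit.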
Select All Occurrences
In VS Code, you can quickly select all matching items using the Command Palette or a shortcut. Here are the steps:
- Open the Command Palette: Press F1 or Ctrl + Shift + P, then type Select All Occurrences of Find Match in the search bar. Alternatively, you can use the shortcut Ctrl + Shift + L.
- Execute Selection: After confirming the matching items, press the shortcut. VS Code will automatically select all matching items. Exit the search mode to view the selected results.
- Copy and Paste: Press Ctrl + C to copy the selected content. Open a new blank document using Ctrl + N, then paste the copied content with Ctrl + V.
- Format the Data: Use regular expressions to format the data for easier analysis:
  - Use Ctrl + Shift + L again to select unrelated content and delete it.
  - Perform a regex-based replace operation to convert specific strings into \t (a TAB character), making the data easier to import into Excel for analysis. Format it as Product Name\tPrice.
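The formatting step can be sketched as a single capture-group replacement. The article does not give the exact replace pattern, so the one below is an assumption, as is the sample text, which imitates what the pasted selections might look like.

```python
import re

# Hypothetical text as it might look after pasting the selected matches
# into a new file (each match ends right after the price digits).
selected = (
    "<div class=t>Sample CPU A</div>\n"
    "<div class=x>含稅 NT4990\n"
    "<div class=t>Sample CPU B</div>\n"
    "<div class=x>未稅 NT2550\n"
)

# Group 1 captures the product name, group 2 the digits of the price.
row_pattern = re.compile(r"<div class=t>(.*?)</div>\n<.*?(\d+)")


def to_row(m: re.Match) -> str:
    """Build one 'Product Name<TAB>Price' row from a match."""
    return f"{m.group(1)}\t{m.group(2)}"


tsv = row_pattern.sub(to_row, selected)
print(tsv)
```

In the VS Code replace box, the equivalent would likely be the same search pattern with a replacement such as $1\t$2, where $1 and $2 refer to the two captured groups.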
Paste into Excel
Once the data is organized, paste it into Excel for further processing:
- Copy Selected Content: Select all rows and press Ctrl + C to copy the data to the clipboard.
- Paste Special: Select the target cell (e.g., A1) and use the shortcut Ctrl + Shift + V to paste the content as Text Format. Excel will automatically parse the \t (TAB) delimiters, splitting the data into multiple columns.
  - Column A will contain product names.
  - Column B will contain product prices.
Once pasted, you can proceed with further data processing or analysis.
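To preview how tab-delimited text splits into columns before pasting, the same split can be reproduced in a few lines. This is only a sketch using Python's standard `csv` module with made-up rows; it imitates the column split and does not involve Excel itself.

```python
import csv
import io

# Hypothetical tab-delimited rows in the Product Name\tPrice format.
tsv = "Sample CPU A\t4990\nSample CPU B\t2550\n"

# QUOTE_NONE keeps any quote characters in product names as literal text.
reader = csv.reader(io.StringIO(tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)

names = [row[0] for row in rows]         # would land in column A
prices = [int(row[1]) for row in rows]   # would land in column B

for name, price in zip(names, prices):
    print(f"{name:<15} {price:>6}")
```

A row that does not split into exactly two fields here is a sign that the formatting step left a stray tab or missed a price.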
Output Results
Below are the extracted CPU prices in Taiwan as of December 2024.
CPU | TWD | USD |
---|---|---|
AMD 8500G + Any MB Bundle (With same invoice as motherboard) | 4990 | 153.02 |
AMD R5 3400G【4C/8T】3.7G(↑4.2G)65W/12nm/3-Year Warranty/Includes iGPU | 2550 | 78.20 |
AMD R5 5500GT【6C/12T】3.6G(↑4.4G)65W/Includes iGPU/7nm | 3900 | 119.60 |
AMD R5 5600GT【6C/12T】3.6G(↑4.6G)65W/Includes iGPU/7nm | 4450 | 136.46 |
AMD R5 7600X【6C/12T】4.7G(↑5.3G)105W/With RDNA iGPU | 7600 | 233.06 |
AMD R5 8400F【6C/12T】4.2G(↑4.7G)65W | 5700 | 174.79 |
AMD R5 8500G【6C/12T】3.5G(↑5.0G)65W/RDNA 3 iGPU/4nm Tech/Min 45W | 5150 | 157.93 |
AMD R5 8600G【6C/12T】4.3G(↑5.0G)65W/RDNA 3 iGPU/Built-in NPU for AI | 6350 | 194.73 |
AMD R5 9600X【6C/12T】3.9G(↑5.4G)65W/With RDNA iGPU | 8650 | 265.26 |
AMD R7 5700X3D【8C/16T】3.0G(↑4.1G)105W/96M | 7550 | 231.52 |
AMD R7 5700X3D【8C/16T】3.0G(↑4.1G)105W/96M (Any MB Bundle) | 6990 | 214.35 |
AMD R7 7700 MPK(Includes Fan)【8C/16T】3.8G(↑5.3G)65W | 7390 | 226.62 |
AMD R7 7700 MPK(Includes Fan)【8C/16T】3.8G(↑5.3G)65W (Any MB Bundle) | 6990 | 214.35 |
AMD R7 7800X3D【8C/16T】4.2G(↑5.0G)96M/120W/With RDNA iGPU | 13950 | 427.78 |
AMD R7 8700F【8C/16T】4.1G(↑5.0G)65W/Built-in NPU for AI | 9200 | 282.12 |
AMD R7 8700G【8C/16T】4.2G(↑5.1G)65W/RDNA 3 iGPU/Built-in NPU for AI | 9450 | 289.79 |
AMD R7 9700X【8C/16T】3.8G(↑5.5G)65W/With RDNA iGPU | 11550 | 354.19 |
AMD R9 7900【12C/24T】3.7G(↑5.4G)65W/With RDNA iGPU | 13400 | 410.92 |
AMD R9 7950X3D【16C/32T】4.2G(↑5.7G)128M/120W/With RDNA iGPU | 21450 | 657.77 |
AMD R9 7950X【16C/32T】4.5G(↑5.7G)170W/With RDNA iGPU | 18900 | 579.58 |
AMD R9 9900X【12C/24T】4.4G(↑5.6G)120W/With RDNA iGPU | 14850 | 455.38 |
AMD R9 9950X【16C/32T】4.3G(↑5.7G)170W/With RDNA iGPU | 20950 | 642.44 |
AMD Ryzen TR 7980X【64C/128T】3.2G(↑5.1G)350W/320M/7nm | 182700 | 5602.58 |
AMD Ryzen TR PRO 7975WX【32C/64T】4.0G(↑5.3G)350W/144M/7nm | 137700 | 4222.63 |
Intel Core Ultra 5 245K【14C/14T】4.2G(↑5.2G)/24M/Integrated Xe-core/Fanless | 10100 | 309.72 |
Intel Core Ultra 5 245KF【14C/14T】4.2G(↑5.2G)/24M/No iGPU/Fanless | 9650 | 295.92 |
Intel Core Ultra 7 265K【20C/20T】3.9G(↑5.5G)/30M/Integrated Xe-core/Fanless | 13600 | 417.05 |
Intel Core Ultra 7 265KF【20C/20T】3.9G(↑5.5G)/30M/No iGPU/Fanless | 13000 | 398.65 |
Intel Core Ultra 9 285K【24C/24T】3.7G(↑5.7G)/36M/Integrated Xe-core/Fanless | 19700 | 604.11 |
Intel i3-12100【4C/8T】(With specified motherboard invoice, Save $150) | 3100 | 95.06 |
Intel i3-12100【4C/8T】3.3G(↑4.3G)/12M/UHD730/60w Global 3-Year Warranty | 3250 | 99.66 |
Intel i3-14100【4C/8T】(With specified motherboard invoice, Save $300) | 3500 | 107.33 |
Intel i3-14100【4C/8T】3.5GHz(↑4.7GHz)/20M/UHD730/60W | 3800 | 116.53 |
Intel i3-14100F【4C/8T】3.5GHz(↑4.7GHz)/20M/No iGPU/58W | 2880 | 88.32 |
Intel i5-12400【6C/12T】(With specified motherboard invoice, Save $150) | 4250 | 130.33 |
Intel i5-12400【6C/12T】2.5G(↑4.4G)/18M/UHD730/65w Global 3-Year Warranty | 4400 | 134.93 |
Intel i5-12400F【6C/12T】2.5G(↑4.4G)/18M/No iGPU/65w Global 3-Year Warranty | 3500 | 107.33 |
Intel i5-14400【10C/16T】(With specified motherboard invoice, Save $200) | 6100 | 187.06 |
Intel i5-14400【10C/16T】2.5GHz(↑4.7G)/24M/UHD730/65W | 6300 | 193.19 |
Intel i5-14400F【10C/16T】2.5GHz(↑4.7G)/24M/No iGPU/65W | 5400 | 165.59 |
Intel i5-14500【14C/20T】(With specified motherboard invoice, Save $200) | 7300 | 223.86 |
Intel i5-14500【14C/20T】2.6GHz(↑5G)/24M/UHD770/65W | 7500 | 229.99 |
Intel i5-14600K【14C/20T】(With specified motherboard invoice, Save $400) | 7590 | 232.75 |
Intel i5-14600K【14C/20T】3.5G(↑5.3G)/24M/UHD770/Fanless | 7990 | 245.02 |
Intel i5-14600KF【14C/20T】(With specified motherboard invoice, Save $200) | 7200 | 220.79 |
Intel i5-14600KF【14C/20T】3.5G(↑5.3G)/24M/No iGPU/Fanless | 7400 | 226.92 |
Intel i7-14700【20C/28T】(With specified motherboard invoice, Save $450) | 9999 | 306.62 |
Intel i7-14700【20C/28T】2.1GHz(↑5.4G)/33M/UHD770/65W | 10450 | 320.45 |
Intel i7-14700F【20C/28T】(With specified motherboard invoice, Save $500) | 9200 | 282.12 |
Intel i7-14700F【20C/28T】2.1GHz(↑5.4G)/33M/No iGPU/65W | 9700 | 297.45 |
Intel i7-14700K【20C/28T】(With specified motherboard invoice, Save $600) | 11900 | 364.92 |
Intel i7-14700K【20C/28T】3.4G(↑5.6G)/33M/UHD770/Fanless | 12500 | 383.32 |
Intel i7-14700KF【20C/28T】(With specified motherboard invoice, Save $500) | 10900 | 334.25 |
Intel i7-14700KF【20C/28T】3.4G(↑5.6G)/33M/No iGPU/Fanless | 11400 | 349.59 |
Intel i9-14900F【24C/32T】(With specified motherboard invoice, Save $1200) | 13700 | 420.12 |
Intel i9-14900F【24C/32T】2.0GHz(↑5.8G)/36M/No iGPU/65W | 14900 | 456.92 |
Intel i9-14900K【24C/32T】(With specified motherboard invoice, Save $1000) | 15700 | 481.45 |
Intel i9-14900K【24C/32T】3.2G(↑6.0G)/36M/UHD770/Fanless | 16700 | 512.11 |
Intel i9-14900KF【24C/32T】(With specified motherboard invoice, Save $1200) | 14100 | 432.38 |
Intel i9-14900KF【24C/32T】3.2G(↑6.0G)/36M/No iGPU/Fanless | 15300 | 469.18 |
Intel Processor 300【2C/4T】3.9GHz/6M/UHD710/46W | 2680 | 82.18 |
Intel Xeon W5-2455X【12C/24T】3.20GHz(↑4.6GHz)/30M/200W | 36700 | 1125.42 |
Intel Xeon W5-2465X【16C/32T】3.10GHz(↑4.7GHz)/33.75M/200W | 47900 | 1468.87 |
Intel Xeon W5-3435X【16C/32T】3.10GHz(↑4.7GHz)/45M/270W | 56500 | 1732.60 |
Intel Xeon W7-2475X【20C/40T】2.60GHz(↑4.8GHz)/37.5M/225W | 61200 | 1876.72 |
Intel Xeon W7-2495X【24C/48T】2.50GHz(↑4.8GHz)/45M/225W | 75500 | 2315.24 |
Intel Xeon W7-3465X【28C/56T】2.50GHz(↑4.8GHz)/75M/300W | 100500 | 3081.88 |
Intel Xeon W9-3475X【36C/72T】2.20GHz(↑4.8GHz)/82.5M/300W | 132500 | 4063.17 |
Conclusion
This article highlighted the effective use of the Select All Occurrences of Find Match feature to filter relevant content and organize it into structured data for further analysis. Notably, this feature can be used without opening the search bar by selecting specific strings and pressing the shortcut Ctrl + Shift + L to effortlessly find all matching items.
With VS Code’s regular expressions, we can precisely search for key information. During processes requiring extensive filtering, these actions can be performed entirely within VS Code, eliminating the need to rely on GPT for accurate data extraction.
The application of regular expressions extends beyond VS Code. Mastering them can yield long-term technical benefits in various programming languages and tools such as JavaScript, Python, and more.