
TB Wiki

Regression Test

Expected HTML for page "DocParsedTables"

	expected html
	1	A "parsed table" is one that is read using the <i>scan_data</i> module,
	2	(the same module used by the WebSed processor) to create a list of records.
	3	<p>
	4	This is similar to a tbwiki table, where the table fields are read from level-1 sections and definition lists. But generic parsed tables are completely
	5	free-form in their format. Due to this, there is no mechanism to write
	6	the data back. A parsed table database is always read-only.
	7	<p>
	8	<h1><a name="source">source</a>
	9	<span align=right class="section_edit_link">[<a href="/tbwiki/DocParsedTables?action=edit&section=source">edit section</a>]</font></span>
	10	</h1>
	11	The source_spec for a parsed table can be wiki pages, absolute paths,
	12	or external URL references. They may contain wildcards to read more than
	13	one file. They can be lists of specifications (separated by colons).
	14	<p>
	15	Note that parsing data from external web pages (scraping them) may
	16	introduce long delays in displaying a table.
	17	<p>
	18	<h1><a name="match_spec">match_spec</a>
	19	<span align=right class="section_edit_link">[<a href="/tbwiki/DocParsedTables?action=edit&section=match_spec">edit section</a>]</font></span>
	20	</h1>
	21	See <a href="/tbwiki/DocWebSed">DocWebSed</a> for more information about the scan_data module and
	22	how to declare regular expressions for finding data in a page of text.
	23	<p>
	24	Here are some miscellaneous rules for match_spec:
	25	<ul><li>if record_id is not a declared field for the parser, then the first field encountered is treated as the record_id for the database
	26	<ul><li>This means that this field is duplicated in the database. That is, if your first field is "year", your records will have a "year" field and a "record_id" field, with the same value in each.
	27	<li>Only one record for each record_id value is stored in the database. So, if your record_id is the year, and there are two records with the year "2010" in the parsed page, then only one of them appears in the parsed database.
	28	<ul><li>FIXTHIS - should add an error message when a record is overwritten in the database
	29	</ul><li>you can specify the field to use for the record_id, by passing a match expression of "record_id_name=<field_name>" in the match_spec.
	30	<ul><li>ex: record_id_name=full_date
	31	</ul></ul><li>you must declare a record_start expression in order for the parser to determine when to reset itself and start looking for the next record
	32	<li>more than one match expression can match a single line or a single piece of text
	33	<ul><li>For example: if the line were "date: August 20, 2015" you could use the following matches (simultaneously):
	34	<ul><li>full_date=date: (.*)
	35	<li>month=date: (\w*) \d{1,2},
	36	<li>day=date: \w* (\d{1,2}),
	37	<li>year=date: \w* \d{1,2}, (\d\d\d\d)
	38	</ul></ul><li>every match expression is evaluated against every line in the data text. This means that each expression must contain something sufficiently unique to prevent it from matching incorrect lines. Overly general expressions will lead to confusion.
	39	</ul>
	40	<p>
	41	<h1><a name="examples">examples</a>
	42	<span align=right class="section_edit_link">[<a href="/tbwiki/DocParsedTables?action=edit&section=examples">edit section</a>]</font></span>
	43	</h1>
	44	<h2><a name="Fixthis_table">Fixthis table</a>
	45	<span align=right class="section_edit_link">[<a href="/tbwiki/DocParsedTables?action=edit&section=Fixthis_table">edit section</a>]</font></span>
	46	</h2>
	47	Here is a sample table declaration for the FIXTHIS list for the tbwiki software. It reads the code files (.py files), looking for FIXTHIS entries
	48	in the code itself.
	49	<p>
	50	<pre>
	51	{{{#!Table
	52	source_spec=/home/tbird/work/tbwiki/cgi-bin/.*[.]py$:/home/tbird/work/tbwiki/cgi-bin/plugins/.*[.]py$
	53
	54	match_spec="""
	55	record_start=FIXTHIS
	56	description=FIXTHIS[ -](.*)$
	57	file=%(basename)s
	58	line_no=%(line_no)s
	59	"""
	60
	61	cols=file:line_no:description
	62	sortby=file:alpha,line_no:int
	63	}} }
	64	</pre>
	65	<p>
	66	<h2><a name="busybox_page_scrape">busybox page scrape</a>
	67	<span align=right class="section_edit_link">[<a href="/tbwiki/DocParsedTables?action=edit&section=busybox_page_scrape">edit section</a>]</font></span>
	68	</h2>
	69	Here is an example of scraping the busybox mailing list archive summary
	70	page to extract the size of the archive for each month (giving an approximation
	71	of community activity level for that month).
	72	<p>
	73	<pre>
	74	{{{#!Table
	75	source_spec=http://lists.busybox.net/pipermail/busybox/
	76	match_spec="""
	77	record_start=.*<tr>
	78	year_month=href="(.*)[.]txt[.]gz"
	79	year=href="(20\d\d)-.*[.]txt[.]gz"
	80	month=href="20\d\d-(.*)[.]txt[.]gz"
	81	size=Gzip'd Text (.*) KB
	82	"""
	83	cols=year_month:year:month:size
	84	sortby=year:alpha,month:month
	85	</pre>
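The busybox expressions can be checked in isolation with Python's re module. This is only a sketch: the sample row below is a hypothetical stand-in for one line of the archive index (the real page markup may differ), and using re.search is an assumption about how the parser applies each expression.

```python
import re

# Hypothetical archive-index row; the real page markup may differ.
sample = '<td><A href="2015-August.txt.gz">[ Gzip\'d Text 123 KB ]</A></td>'

# The four field expressions from the match_spec above.
exprs = {
    "year_month": r'href="(.*)[.]txt[.]gz"',
    "year": r'href="(20\d\d)-.*[.]txt[.]gz"',
    "month": r'href="20\d\d-(.*)[.]txt[.]gz"',
    "size": r"Gzip'd Text (.*) KB",
}

# Every expression is tried against the same line; each one that
# matches contributes its first capture group as the field value.
fields = {}
for name, expr in exprs.items():
    m = re.search(expr, sample)
    if m:
        fields[name] = m.group(1)

print(fields)
```

All four expressions match the same line, which is why each one carries a distinctive anchor (href=" or Gzip'd Text) rather than a bare wildcard.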
	86	<p>
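The match_spec rules described in the page can be illustrated with a small Python sketch. This is not the scan_data module: parse_records, its parameters, and the sample text are hypothetical, and the behavior (every expression tried on every line, first field used as record_id by default, a later record silently replacing an earlier one with the same record_id) is modeled only on the description above.

```python
import re

def parse_records(text, record_start, matches, record_id_name=None):
    """Hypothetical sketch of the match_spec rules, not tbwiki's parser."""
    start_re = re.compile(record_start)
    compiled = [(name, re.compile(expr)) for name, expr in matches]
    records = {}    # keyed by record_id, so a duplicate id overwrites
    current = None  # fields of the record being collected

    def store(rec):
        if not rec:
            return
        # Default record_id: the first field encountered in the record.
        key_field = record_id_name or next(iter(rec))
        if key_field not in rec:
            return
        rec = dict(rec, record_id=rec[key_field])
        records[rec["record_id"]] = rec

    for line in text.splitlines():
        if start_re.search(line):
            # record_start tells the parser to reset for the next record.
            store(current)
            current = {}
        if current is None:
            continue
        # Every expression is evaluated against every line.
        for name, rx in compiled:
            m = rx.search(line)
            if m:
                current[name] = m.group(1)
    store(current)
    return records

text = """entry
date: August 20, 2015
entry
date: March 3, 2010
entry
date: July 9, 2010
"""
matches = [
    ("year", r"date: \w* \d{1,2}, (\d\d\d\d)"),
    ("full_date", r"date: (.*)"),
    ("month", r"date: (\w*) \d{1,2},"),
    ("day", r"date: \w* (\d{1,2}),"),
]

# "year" is the first field, so it becomes the record_id and the two
# 2010 entries collapse into one record.
records = parse_records(text, r"^entry$", matches)

# Naming full_date as the record_id keeps all three records distinct.
by_date = parse_records(text, r"^entry$", matches, record_id_name="full_date")
```

In this sketch the later 2010 entry replaces the earlier one, which is the silent overwrite the FIXTHIS note refers to; choosing a more specific field through record_id_name avoids it.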

Legends

Colors
Added
Changed
Deleted

Links
(f)irst change
(n)ext change
(t)op

Differences for page "DocParsedTables"

