Ticket #35 (closed defect: fixed)
Search does not display subdirectory pages.
| Reported by: | rselzler | Owned by: | sheep |
|---|---|---|---|
| Priority: | High | Milestone: | |
| Component: | Hatta Wiki | Version: | 1.3.3dev |
| Keywords: | | Cc: | |
Description
'Search' with an empty list of words will only show pages in
the first directory level, even when subdirectories are enabled.
Hatta 1.3.3dev
changeset: 758:51cf7924f16f
user: sheep@ghostwheel
date: Fri Feb 19 22:26:43 2010 +0100
summary: make punctuation work before quotes and dashes
The problem is easily reproduced.
Clone the repository, start Hatta with -D,
Create 'Home' and 'subdir/foo',
Click on 'Search'
Change History
comment:1 in reply to: ↑ description Changed 2 years ago by rselzler
comment:2 follow-up: ↓ 3 Changed 2 years ago by sheep
- Status changed from new to closed
- Resolution set to fixed
I see the problem with index -- the page list is actually just a file list, for efficiency. And since the same function is used to get the list of pages for indexing, it also affected all the search features.
It's hopefully fixed in 6b9210896d0f, please test (don't forget to delete the cache directory).
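To illustrate the kind of bug described here, the following is a minimal sketch, not Hatta's actual code. It contrasts a flat file listing, which misses 'subdir/foo', with a recursive walk that finds it. The function names and the `skip_dirs` default are assumptions made for the example.

```python
import os

def list_pages_flat(root):
    """Top-level files only: pages inside subdirectories are missed."""
    return sorted(
        name for name in os.listdir(root)
        if os.path.isfile(os.path.join(root, name))
    )

def list_pages_recursive(root, skip_dirs=(".hg", "cache")):
    """Walk subdirectories too, returning titles such as 'subdir/foo'."""
    pages = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune repository and cache directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            pages.append(rel.replace(os.sep, "/"))
    return sorted(pages)
```

With only 'Home' and 'subdir/foo' in the pages directory, the flat version lists just 'Home', which matches the symptom reported in this ticket.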
comment:3 in reply to: ↑ 2 Changed 2 years ago by rselzler
Replying to sheep:
> I see the problem with index -- the page list is actually just a file list, for efficiency. And since the same function is used to get the list of pages for indexing, it also affected all the search features.
> It's hopefully fixed in 6b9210896d0f, please test (don't forget to delete the cache directory).
YES, it works !!!
Score another one for Radomir !
I hit one small bump, which is probably beyond your control.
When I initially restarted the DreamHost Apache server, running Hatta under flup,
the 'Menu' needed to be rebuilt (+edit/Menu, no change, just saved it).
The CPU pegged at 99% for a few minutes and then an error page was displayed.
I deleted the cache, restarted the server and repeated the Menu update 3 times; it failed consistently.
I then executed Hatta directly, instead of via Apache.
The process ran at 50% CPU for 3:10 and finished rebuilding the Menu without error.
When I started the server again, the Menu was correct and Search worked as expected.
I'm guessing the Apache server has a time-out that kicked in when
Hatta took so long to rebuild the cache for the Menu...
Do you think this is a reasonable guess?
Should I look for an Apache time-out option to increase?
Is there another solution?
--Randy
comment:4 follow-up: ↓ 5 Changed 2 years ago by sheep
On the first run (or the first change) Hatta has to index all the text pages. I did some tests and with 5000 RFC text documents totalling about 40MB it took about 5 to 10 minutes on my computer. I have no idea why it takes so long in your case, especially when a fresh wiki should only have a few pages...
Of course, subsequent updating of the cache should be much faster -- only the pages that changed are reindexed. So unless you delete the cache directory again, it should be rather fast.
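The difference between the slow first run and the fast later updates can be sketched as below. This is an assumption-laden illustration, not how Hatta tracks changes: it compares file modification times, and `pages_to_reindex` and `seen_mtimes` are hypothetical names.

```python
import os

def pages_to_reindex(root, seen_mtimes):
    """Return pages whose modification time differs from the previous run.

    `seen_mtimes` maps page title -> mtime recorded last time and is updated
    in place. An empty dict (for example after deleting the cache) makes
    every page look changed, which is the slow "first run" case.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            title = os.path.relpath(path, root).replace(os.sep, "/")
            mtime = os.stat(path).st_mtime_ns
            if seen_mtimes.get(title) != mtime:
                changed.append(title)
                seen_mtimes[title] = mtime
    return sorted(changed)
```

Calling it twice in a row with the same dict returns every page the first time and an empty list the second time, unless a page was edited in between.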
comment:5 in reply to: ↑ 4 Changed 2 years ago by rselzler
Replying to sheep:
> On the first run (or the first change) Hatta has to index all the text pages. I did some tests and with 5000 RFC text documents totalling about 40MB it took about 5 to 10 minutes on my computer. I have no idea why it takes so long in your case, especially when a fresh wiki should only have a few pages...
> Of course, subsequent updating of the cache should be much faster -- only the pages that changed are reindexed. So unless you delete the cache directory again, it should be rather fast.
Your description is consistent with my understanding.
Question: what events can trigger a total rescan, besides deleting the cache?
Does changing 'Menu' or 'Logo' or 'style.css' etc. trigger big rescans?
Knowing more will help me respond to questions from my users.
I may also 'Lock' critical pages (or at least add cautionary notices).
A larger timeout may be appropriate in Apache (I suspect such a thing exists).
Are there any timeouts in Hatta itself?
FYI, here are some current statistics from my docs directory.
30 MB total size
- 9 MB for .hg repository
====
21 MB content
900 files and/or directories (pages, icons, a couple of smallish tarballs)
There are a few large log files (half dozen, each 2mb with more to come).
Each is highly repetitive (think Makefile output from large packages).
These really don't need to be indexed and might warrant special handling.
Question: could I choose a filename extension that Hatta would ignore?
What is currently ignored? .pdf ? .tar ? .tgz ? .png ?
Question: would it be reasonable for Hatta to support a 'data' subdirectory?
Hatta markup might direct large, binary uploads and downloads to data?
comment:6 follow-up: ↓ 7 Changed 2 years ago by sheep
> Question: could I choose a filename extension that Hatta would ignore?
> What is currently ignored? .pdf ? .tar ? .tgz ? .png ?
At the moment Hatta can only index the contents of text files (the ones whose MIME type starts with text/) and wiki pages (the files with no extension). All other files are only indexed by name.
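The rule just described can be sketched with Python's standard `mimetypes` module. This is only an illustration under the assumption that the MIME type is guessed from the file name; `index_mode` is a hypothetical helper, not a function from hatta.py.

```python
import mimetypes
import os

def index_mode(filename):
    """Return 'content' if the file's text would be indexed, else 'name'."""
    _base, ext = os.path.splitext(filename)
    if not ext:
        return "content"  # no extension: treated as a wiki page
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is not None and mime.startswith("text/"):
        return "content"  # for example .txt or .css
    return "name"  # binaries and unknown types are indexed by name only
```

Under this sketch, 'Home' and 'style.css' are indexed by content, while 'logo.png' and 'docs.pdf' are indexed by name only. How an extension such as .log is classified depends on the MIME table of the machine, so it is worth checking locally.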
> Question: would it be reasonable for Hatta to support a 'data' subdirectory?
> Hatta markup might direct large, binary uploads and downloads to data?
I don't fully understand. By default Hatta "supports" data subdirectories in that it ignores any files inside subdirectories. Maybe you need some small web application for uploading files, or even a gallery/file browser. I know that the guys who make the Vanilla forum software had something like that once, but I can't find it.
comment:7 in reply to: ↑ 6 Changed 2 years ago by rselzler
Replying to sheep:
> > Question: could I choose a filename extension that Hatta would ignore?
> > What is currently ignored? .pdf ? .tar ? .tgz ? .png ?
> At the moment Hatta can only index the contents of text files (the ones whose MIME type starts with text/) and wiki pages (the files with no extension). All other files are only indexed by name.
> > Question: would it be reasonable for Hatta to support a 'data' subdirectory?
> > Hatta markup might direct large, binary uploads and downloads to data?
> I don't fully understand. By default Hatta "supports" data subdirectories in that it ignores any files inside subdirectories. Maybe you need some small web application for uploading files, or even a gallery/file browser. I know that the guys who make the Vanilla forum software had something like that once, but I can't find it.
That helps, thanks...
I need to learn more about mime and review hatta.py for 'text/' usage.
Mercurial documentation suggests that large binary files are not handled
efficiently in the current implementation. Changes to binary content aren't
a good fit for storage as small, line-based deltas. Keeping a separate file
for each version probably makes more sense. I experienced it firsthand
when attempting to stash some large tarballs in it.
The documentation also hints that future Mercurial versions may provide
special support for large binary files.
I'm trying to think ahead regarding my project needs and Hatta's role.
Large binary files are common in my industry and some will need to be
accessible via a web site. Hatta and Mercurial may or may not be the
appropriate tool to manage them, although they may play a supporting role.
Perhaps, like you suggest, some small web application may be the answer.
FYI, the binary files in our industry don't compress very well because
they are primarily floating point. In the future, the www site may need to
routinely serve multi-megabyte files for "small" examples and tests.
In this industry, for "production" mode, we routinely process individual files
that are > 1 terabyte using supercomputers, with run times measured in days.

Replying to rselzler:
Actually, this problem appears to break 'Changes', 'History' and 'Backlinks' too.
Hopefully it's fixed soon, because I've used the development version
on my pseis.org Web site... It's only an alpha-preview, not production,
so my foolishness isn't a complete disaster.