[Lemon-devel] [Lemon-user] GraphReader
Balazs Dezso
deba at inf.elte.hu
Sun Jan 20 20:03:56 CET 2008
Hi
> This seems to contradict both your and Peter's previous mails. You
> said that the columns cannot be determined without knowing their type.
> Similarly, in Peter's example there are columns consisting of two words.
The format x y or x, y is not recommended in lgf, because it does not conform
to the UnformattedReader, which is used to read values of unknown types. In
general it is not forbidden in the current system, but if you do not want to
read a map written in these formats, then you have to use a skip...Map function
to define the reading (skipping) method for the column.
A global restriction on value representations could be a good choice for the
usability and safety of LEMON IO, but I prefer the less restricted format.
> Could you tell me a single real advantage of the 'extended format'?
- The most obvious example is the format (x, y) of points. This is more
natural than the format you suggested.
- The type wrapping system could become harder with quoting restrictions.
For example, suppose you want to use a vector of strings as an edge map (this
can be used in a finite state machine for text processing). The current format
is { v_1, v_2, v_3, ..., v_n }, where each v_i is a quoted string. If we quote
the whole vector, then the quoting of each v_i has to be changed (escaped),
which could be hard overall.
> * it is impossible to process a general .lgf file (either by a
> C(++) code or an awk script)
Why? The restricted format is well defined, and it is easy to parse.
> * it is impossible to read a column twice (into two maps). It is
> not a theoretical problem as you might think. You will come
> across this whenever you want to write a program where the user
> can decide at run time which column should be used for some
> purpose. (Why shouldn't the user be allowed to use the same
> column for different purposes?) Currently, if you want to handle
> this situation, it is easier to write your own reader, which is
> ridiculous.
Yes, your example could be very annoying, and it is a reason for changing the
implementation. As I see it, the implementation would be changed in the
following way: each value will first be read into a string according to the
restricted format rules, and then the typed value will be read from an
istringstream over that string.
> * The current GraphReader/GraphWriter implementation is extremely
> and overly complex. Probably nobody understands how it works
> except you. Nobody is able to make a single change to it
> except you.
I think the complexity comes from the different dimensions of freedom: we want
to handle different and extensible file sections, different map types and
different graph item types. But I do not see where your stricter format could
reduce the complexity of the LEMON IO code. Please explain!
Best, Balazs
On Sunday 20 January 2008 18.46.16 Alpár Jüttner wrote:
> Hi,
>
> > The current LEMON reader uses the following format to read unknown tokens:
> > Well-formed expressions with the following parentheses:
> > (), [], {}, //
> > Well-formed string literals with escape processing:
> > "(.|\")*", '(.|\')*'
> > Sequences without any whitespace (outside quotes and parentheses);
> > such a sequence may contain string literals
>
> This seems to contradict both your and Peter's previous mails. You
> said that the columns cannot be determined without knowing their type.
> Similarly, in Peter's example there are columns consisting of two words.
>
> > In my view, the value representation can be restricted to this
> > format, but I would not like a stricter format such as the one Alpar suggested.
>
> Could you tell me a single real advantage of the 'extended format'?
>
> Remember that these advantages should compensate for the fact that
> currently
> * it is impossible to process a general .lgf file (either by a
> C(++) code or an awk script)
> * it is impossible to read a column twice (into two maps). It is
> not a theoretical problem as you might think. You will come
> across this whenever you want to write a program where the user
> can decide at run time which column should be used for some
> purpose. (Why shouldn't the user be allowed to use the same
> column for different purposes?) Currently, if you want to handle
> this situation, it is easier to write your own reader, which is
> ridiculous.
> * The current GraphReader/GraphWriter implementation is extremely
> and overly complex. Probably nobody understands how it works
> except you. Nobody is able to make a single change to it
> except you.
>
> The last bullet is a really serious problem. Just one example: once I
> wanted to change the '@uedgeset' string in the GraphReader for some
> reason :). It took me more than half an hour just to find where
> the string is hidden in the source code, and I still didn't understand
> how I could properly change it to '@edgeset'.
>
> Alpar
>
> > If
> > you want to develop a general lgf handling tool, you can use this
> > short specification, or the lemon::UnformattedReader
> > class. Unfortunately, reading a column multiple times is not possible,
> > but I do not consider it an important issue.
> >
> > Best, Balazs
> >
> >
> >
> >
> > ________________________________________
> > From: lemon-devel-bounces at lemon.cs.elte.hu
> > [lemon-devel-bounces at lemon.cs.elte.hu] On Behalf Of Alpár Jüttner
> > [alpar at cs.elte.hu]
> > Sent: Sunday, January 20, 2008 1:09 PM
> > To: Kovács Péter
> > Cc: LEMON Development; lemon-user at lemon.cs.elte.hu
> > Subject: Re: [Lemon-devel] [Lemon-user] GraphReader
> >
> > Hi,
> >
> > > Alpar's idea may well be better. However, there are special cases where
> > > it is not so good. For example, if the .lgf file contains a map of
> > > dim2::Point values, the following lines are currently valid:
> > >
> > > coords
> > > 1 10 20
> > > 2 (30 40)
> > > 3 50, 60
> > > 4 (70, 80)
> > > 5 90,100
> > > 6 (110,120)
> >
> > Why on earth would anyone want to write a file like this? And even if
> > someone wants to, why should we allow it?
> > By the way, these formats can also be used in my approach; they just
> > have to be quoted like this:
> >
> > coords
> > 1 "10 20"
> > 2 "(30 40)"
> > 3 "50, 60"
> > 4 "(70, 80)"
> > 5 90,100
> > 6 (110,120)
> >
> > Regards,
> > Alpar
> >
> > > If we used the method suggested by Alpar, only the last two lines would
> > > be valid, but with the current implementation of the
> > > dim2::Point operator<< function, a graph writer would generate the 4th
> > > line. (Of course, it could be changed to the last format: (x,y) without
> > > spaces.)
> > >
> > > Peter
> > >
> > > Alpár Jüttner wrote:
> > > > Hi,
> > > >
> > > > This design has many more drawbacks than advantages.
> > > >
> > > > The worst thing is that it makes it completely impossible to write any
> > > > tool that would manipulate a general .lgf file. For example, it is
> > > > even impossible to correctly read a general .lgf file into a graph
> > > > editor.
> > > >
> > > > Do we really need such a free file-format? Probably we don't.
> > > >
> > > > I think we should insist on the original definition, i.e. the edgeset
> > > > and the nodeset of the file should consist of plain whitespace
> > > > separated columns. The only exception I would allow is that a column
> > > > can also be a string enclosed in double-quote characters (in
> > > > order to allow strings containing whitespace). Of course, we may
> > > > also use a more complex evaluation similar to the way Unix shells
> > > > evaluate their arguments, but I'm not sure if it is worth doing
> > > > that.
> > > > This approach would make it possible to write general .lgf
> > > > manipulation tools, and it would also enable a much simpler API for
> > > > making custom readers/writers for special data types, as they would
> > > > basically be converters between std::string and the data type.
> > > >
> > > > Of course, it also means that we must write e.g. "x + y / 82" instead
> > > > of just x + y / 82.
> > > > I think it is really worth doing that.
> > > >
> > > > Regards,
> > > > Alpar
> > > >
> > > > On Fri, 2008-01-18 at 17:20 +0100, Balazs Dezso wrote:
> > > >> Hi
> > > >>
> > > >>> What is the reason for that?
> > > >>
> > > >> The reader does not cache the string representation of the map
> > > >> value, because the map reader reads the value directly from the
> > > >> input stream. More generally, the graph IO design is based on
> > > >> the following rule: only the map knows where its input ends. For
> > > >> example, the value may contain spaces, or it may have to be a
> > > >> well-formed expression.
> > > >>
> > > >> For example, these are valid values for some readers:
> > > >> Hello\ world!
> > > >> <xml> AB <tag> C </tag> D </xml>
> > > >> x + y / 82
> > > >> separator;
> > > >> "\"Hello\" \"World\""
> > > >>
> > > >> Best, Balazs
> > > >>
> > > >> On Friday 18 January 2008 16.50.25 you wrote:
> > > >>> Hi,
> > > >>>
> > > >>>> It is not possible; an exception is thrown when you try to do
> > > >>>> this.
> > > >>>
> > > >>> What is the reason for that?
> > > >>>
> > > >>>> The
> > > >>>> only solution is to use maps such as ForkWriteMap...
> > > >>>>
> > > >>>> Best, Balazs
> > > >>>>
> > > >>>> On Thursday 17 January 2008 11.07.26 Alpár Jüttner wrote:
> > > >>>>> Hi,
> > > >>>>>
> > > >>>>> Is it possible to read a column of an .lgf file into two
> > > >>>>> different maps (of different types, such as NodeMap<double> and
> > > >>>>> NodeMap<std::string>) at the same time?
> > > >>>>>
> > > >>>>> Regards,
> > > >>>>> Alpar
> > > >>>>>
> > > >>>>>
> > > >>>>> _______________________________________________
> > > >>>>> Lemon-user mailing list
> > > >>>>> Lemon-user at lemon.cs.elte.hu
> > > >>>>> http://lemon.cs.elte.hu/mailman/listinfo/lemon-user
> >
> > _______________________________________________
> > Lemon-devel mailing list
> > Lemon-devel at lemon.cs.elte.hu
> > http://lemon.cs.elte.hu/mailman/listinfo/lemon-devel